
Optimum Statistical Classifiers

Introduction

- The approaches to pattern recognition are divided into two principal areas: decision-theoretic and structural.
- The first category deals with patterns described using quantitative descriptors, such as length, area, and texture.
- The second category deals with patterns best described by qualitative descriptors, such as relational descriptors.

Recognition Based on Decision-Theoretic Methods

Decision-theoretic approaches to recognition are based on the use of decision functions.
Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ represent an n-dimensional pattern vector. For $W$ pattern classes $\omega_1, \omega_2, \ldots, \omega_W$, we want to find $W$ decision functions $d_1(\mathbf{x}), d_2(\mathbf{x}), \ldots, d_W(\mathbf{x})$ with the property that, if a pattern $\mathbf{x}$ belongs to class $\omega_i$, then

$$d_i(\mathbf{x}) > d_j(\mathbf{x}) \qquad j = 1, 2, \ldots, W;\; j \neq i$$

The decision boundary separating classes $\omega_i$ and $\omega_j$ is given by $d_i(\mathbf{x}) = d_j(\mathbf{x})$, or equivalently $d_i(\mathbf{x}) - d_j(\mathbf{x}) = 0$.
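To make the rule concrete, here is a minimal Python sketch (the function name, the NumPy dependency, and the callable-per-class representation are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def classify(x, decision_functions):
    """Assign pattern x to the class whose decision function is largest.

    decision_functions: a list of callables d_j(x), one per class.
    Returns the 0-based index of the winning class.
    """
    scores = np.array([d(x) for d in decision_functions])
    return int(np.argmax(scores))
```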
Optimum statistical classifiers
The probability that a particular pattern $\mathbf{x}$ comes from class $\omega_i$ is denoted $p(\omega_i \mid \mathbf{x})$. If the pattern classifier decides that $\mathbf{x}$ came from $\omega_j$ when it actually came from $\omega_i$, it incurs a loss, denoted $L_{ij}$. The average loss incurred in assigning $\mathbf{x}$ to class $\omega_j$ is then

$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\omega_k \mid \mathbf{x}) \qquad \text{(1)}$$

This quantity is also known as the conditional average risk or loss. The Bayes classifier is the one that minimizes the total average loss.
From basic probability theory, $p(A \mid B) = p(A)\, p(B \mid A) / p(B)$, so Eqn. 1 becomes

$$r_j(\mathbf{x}) = \frac{1}{p(\mathbf{x})} \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k)$$

Since $1/p(\mathbf{x})$ is positive and the same for all $j$, it can be dropped without changing the ordering of the risks:

$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) \qquad \text{(2)}$$

Thus $\mathbf{x}$ is assigned to the class $\omega_i$ whose $r_i(\mathbf{x})$ is minimum.
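A minimal sketch of this risk computation, assuming the likelihoods $p(\mathbf{x} \mid \omega_k)$ and priors $P(\omega_k)$ have already been evaluated for the given $\mathbf{x}$ (function and variable names are illustrative):

```python
import numpy as np

def conditional_risks(likelihoods, priors, loss):
    """Evaluate Eqn. 2 for all classes at once.

    likelihoods: p(x | w_k) for k = 1..W, shape (W,)
    priors:      P(w_k)     for k = 1..W, shape (W,)
    loss:        loss matrix with loss[k, j] = L_kj, shape (W, W)
    Returns the vector of risks r_j(x), shape (W,).
    """
    return loss.T @ (likelihoods * priors)

# x is assigned to the class with the minimum conditional risk:
# j_star = np.argmin(conditional_risks(likelihoods, priors, loss))
# For the zero-one loss introduced below, loss = 1 - np.eye(W).
```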


Thus the Bayes classifier assigns an unknown pattern $\mathbf{x}$ to class $\omega_i$ if $r_i(\mathbf{x}) < r_j(\mathbf{x})$ for all $j = 1, 2, \ldots, W$, $j \neq i$; that is, if

$$\sum_{k=1}^{W} L_{ki}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) < \sum_{q=1}^{W} L_{qj}\, p(\mathbf{x} \mid \omega_q)\, P(\omega_q)$$

For the common zero-one loss, $L_{ij} = 1 - \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta function:

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

that is, the loss is 0 for a correct decision and 1 for any incorrect decision. Eqn. 2 then becomes

$$r_j(\mathbf{x}) = \sum_{k=1}^{W} (1 - \delta_{kj})\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) = p(\mathbf{x}) - p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

using the fact that $\sum_{k} p(\mathbf{x} \mid \omega_k)\, P(\omega_k) = p(\mathbf{x})$.

The Bayes classifier then assigns a pattern $\mathbf{x}$ to class $\omega_i$ if

$$p(\mathbf{x}) - p(\mathbf{x} \mid \omega_i)\, P(\omega_i) < p(\mathbf{x}) - p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

or, equivalently, if

$$p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j) \qquad j = 1, 2, \ldots, W;\; j \neq i$$

Thus the discriminant (decision) function is

$$d_j(\mathbf{x}) = p(\mathbf{x} \mid \omega_j)\, P(\omega_j) \qquad \text{(3)}$$
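To make Eqn. 3 concrete with illustrative numbers (not from the original slides): suppose $W = 2$, $P(\omega_1) = 0.3$, $P(\omega_2) = 0.7$, and for a given $\mathbf{x}$ the likelihoods are $p(\mathbf{x} \mid \omega_1) = 0.5$ and $p(\mathbf{x} \mid \omega_2) = 0.2$. Then $d_1(\mathbf{x}) = 0.5 \times 0.3 = 0.15$ and $d_2(\mathbf{x}) = 0.2 \times 0.7 = 0.14$, so $\mathbf{x}$ is assigned to $\omega_1$ even though $\omega_1$ has the smaller prior.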

Bayes Classifier for Gaussian Pattern Classes
Let us consider a 1-D problem ($n = 1$) involving two pattern classes ($W = 2$) governed by Gaussian densities with means $m_j$ and standard deviations $\sigma_j$:

$$d_j(x) = p(x \mid \omega_j)\, P(\omega_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\, e^{-\frac{(x - m_j)^2}{2\sigma_j^2}}\, P(\omega_j) \qquad j = 1, 2$$
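A minimal sketch of this two-class 1-D rule in Python; the parameter values are illustrative assumptions, not from the slides:

```python
import numpy as np

def gaussian_decision_1d(x, m, sigma, prior):
    """d_j(x) = p(x | w_j) P(w_j) for a 1-D Gaussian class density."""
    likelihood = np.exp(-((x - m) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return likelihood * prior

# Illustrative parameters: two equally likely classes with unit variance.
x = 1.2
d1 = gaussian_decision_1d(x, m=0.0, sigma=1.0, prior=0.5)
d2 = gaussian_decision_1d(x, m=3.0, sigma=1.0, prior=0.5)
print("assign to class", 1 if d1 > d2 else 2)   # -> class 1
```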


In the n-dimensional case, the Gaussian density of the vectors in the jth pattern class has the form

$$p(\mathbf{x} \mid \omega_j) = \frac{1}{(2\pi)^{n/2}\, \lvert \mathbf{C}_j \rvert^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)}$$

where $\mathbf{m}_j$ is the mean vector and $\mathbf{C}_j$ is the covariance matrix of the jth class.
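A direct transcription of this density into Python (for clarity; in practice one would work with the log form derived next, which is numerically safer):

```python
import numpy as np

def gaussian_density(x, m, C):
    """Multivariate Gaussian p(x | w_j) with mean vector m and covariance C."""
    n = x.shape[0]
    diff = x - m
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff) / norm
```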


Taking natural logarithms (the logarithm is monotonically increasing, so it preserves the ordering of the decision functions), Eqn. 3 becomes

$$d_j(\mathbf{x}) = \ln[p(\mathbf{x} \mid \omega_j)\, P(\omega_j)] = \ln p(\mathbf{x} \mid \omega_j) + \ln P(\omega_j)$$

$$= \ln P(\omega_j) - \frac{n}{2}\ln 2\pi - \frac{1}{2}\ln \lvert \mathbf{C}_j \rvert - \frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)$$

$$= \ln P(\omega_j) - \frac{1}{2}\ln \lvert \mathbf{C}_j \rvert - \frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)$$

(the term $-\frac{n}{2}\ln 2\pi$ is the same for all classes and can be dropped).

Thus the decision boundaries are hyperquadrics (quadratic functions in n-dimensional space).
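A minimal sketch of this log-discriminant; it is a transcription of the formula above, not an optimized implementation (np.linalg.slogdet is used to compute $\ln\lvert\mathbf{C}_j\rvert$ stably, an implementation choice not in the slides):

```python
import numpy as np

def gaussian_discriminant(x, m, C, prior):
    """d_j(x) = ln P(w_j) - 0.5 ln|C_j| - 0.5 (x - m_j)^T C_j^{-1} (x - m_j)."""
    diff = x - m
    _, logdet = np.linalg.slogdet(C)              # ln|C_j|
    mahalanobis = diff @ np.linalg.inv(C) @ diff  # squared Mahalanobis distance
    return np.log(prior) - 0.5 * logdet - 0.5 * mahalanobis

# The pattern is assigned to the class with the largest discriminant:
# j_star = np.argmax([gaussian_discriminant(x, m, C, p)
#                     for m, C, p in zip(means, covs, priors)])
```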

If all covariance matrices are equal ($\mathbf{C}_j = \mathbf{C}$ for $j = 1, 2, \ldots, W$), then, dropping all terms independent of $j$, we have

$$d_j(\mathbf{x}) = \ln P(\omega_j) + \mathbf{x}^T \mathbf{C}^{-1} \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{C}^{-1} \mathbf{m}_j$$

which is linear in $\mathbf{x}$, so the decision boundaries become hyperplanes. In addition, if $\mathbf{C} = \mathbf{I}$ and $P(\omega_j) = 1/W$, then

$$d_j(\mathbf{x}) = \mathbf{x}^T \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{m}_j$$

Thus the minimum distance classifier (MDC, sketched after this list) is optimum in the Bayes sense if:

- the pattern classes are Gaussian,
- all covariance matrices are equal to the identity matrix, and
- all classes are equally likely to occur.
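A minimal sketch of the MDC decision function; maximizing it is equivalent to minimizing the Euclidean distance $\lVert \mathbf{x} - \mathbf{m}_j \rVert$, since $\lVert \mathbf{x} - \mathbf{m}_j \rVert^2 = \lVert \mathbf{x} \rVert^2 - 2\,(\mathbf{x}^T\mathbf{m}_j - \frac{1}{2}\mathbf{m}_j^T\mathbf{m}_j)$ and $\lVert \mathbf{x} \rVert^2$ is the same for all classes (names are illustrative):

```python
import numpy as np

def mdc_discriminant(x, m):
    """d_j(x) = x^T m_j - 0.5 m_j^T m_j for the minimum distance classifier."""
    return x @ m - 0.5 * m @ m

# With the W class mean vectors stacked as rows of M (shape (W, n)):
# j_star = np.argmax([mdc_discriminant(x, m) for m in M])
```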


Advantages:
1. It can be combined with other methods to achieve high accuracy.

Disadvantages:
1. Estimating the required class statistics from training samples is time-consuming.
2. It often has to be combined with other methods.
