
Naive Bayes Classifier

The naive Bayes classifier assigns an instance sk with attribute values (A1 = v1, A2 = v2, ..., Am = vm) to the class Ci with maximum Prob(Ci | (v1, v2, ..., vm)) over all i. The naive Bayes classifier exploits Bayes's rule and assumes independence of the attributes.

Likelihood of sk belonging to Ci:

    Prob(Ci | v1, v2, ..., vm) = P(v1, v2, ..., vm | Ci) P(Ci) / P(v1, v2, ..., vm)

Likelihood of sk belonging to Cj:

    Prob(Cj | v1, v2, ..., vm) = P(v1, v2, ..., vm | Cj) P(Cj) / P(v1, v2, ..., vm)

Therefore, when comparing Prob(Ci | (v1, v2, ..., vm)) and Prob(Cj | (v1, v2, ..., vm)), we only need to compute P((v1, v2, ..., vm) | Ci) P(Ci) and P((v1, v2, ..., vm) | Cj) P(Cj), since the denominator P(v1, v2, ..., vm) is the same for every class.

Under the assumption of independent attributes:

    P(v1, v2, ..., vm | Cj) = P(A1 = v1 | Cj) P(A2 = v2 | Cj) ... P(Am = vm | Cj)

                            = ∏_{h=1}^{m} P(Ah = vh | Cj)

Furthermore, P(Cj) can be computed by

    P(Cj) = (number of training samples belonging to Cj) / (total number of training samples)
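Putting the pieces above together, the decision rule can be sketched as follows. This is a minimal sketch: the priors and conditional probabilities are hypothetical stand-ins for values estimated from training data.

```python
# Minimal sketch of the naive Bayes decision rule for categorical
# attributes. All probabilities below are hypothetical examples.
from functools import reduce

def nb_score(prior, cond_probs):
    """Return P(v1,...,vm|C) * P(C) under the independence assumption."""
    return prior * reduce(lambda a, b: a * b, cond_probs, 1.0)

# Hypothetical two-class problem with three observed attribute values.
score_c1 = nb_score(0.6, [0.2, 0.5, 0.4])  # P(C1) = 0.6
score_c2 = nb_score(0.4, [0.3, 0.3, 0.6])  # P(C2) = 0.4

# The class with the larger score wins; the shared denominator
# P(v1,...,vm) cancels out and never needs to be computed.
prediction = "C1" if score_c1 > score_c2 else "C2"
```

Note that the scores are not probabilities (they are not normalized), but their ordering matches the ordering of the true posteriors.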

An Example of the Naive Bayes Classifier
The weather data, with counts and probabilities:

    outlook             temperature         humidity            windy              play
              yes  no           yes  no             yes  no            yes  no     yes   no
    sunny     2/9  3/5  hot     2/9  2/5   high     3/9  4/5   false   6/9  2/5    9/14  5/14
    overcast  4/9  0/5  mild    4/9  2/5   normal   6/9  1/5   true    3/9  3/5
    rainy     3/9  2/5  cool    3/9  1/5

A new day:

    outlook  temperature  humidity  windy  play
    sunny    cool         high      true   ?

Likelihood of yes = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053

Likelihood of no = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Therefore, the prediction is no.
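The two likelihoods above can be reproduced directly from the probabilities in the table:

```python
# Naive Bayes scores for the new day
# (outlook = sunny, temperature = cool, humidity = high, windy = true),
# using the conditional probabilities from the weather table.
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)

print(round(like_yes, 4))  # 0.0053
print(round(like_no, 4))   # 0.0206
prediction = "no" if like_no > like_yes else "yes"
```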

The Naive Bayes Classifier for Data Sets with Numerical Attribute Values

One common practice for handling numerical attribute values is to assume that each numerical attribute follows a normal distribution within each class.

The numeric weather data, with summary statistics:

    outlook             temperature          humidity              windy              play
              yes  no            yes   no             yes   no            yes  no     yes   no
    sunny     2/9  3/5           83    85             86    85     false  6/9  2/5    9/14  5/14
    overcast  4/9  0/5           70    80             96    90     true   3/9  3/5
    rainy     3/9  2/5           68    65             80    70
                                 64    72             65    95
                                 69    71             70    91
                                 75                   80
                                 75                   70
                                 72                   90
                                 81                   75
                       mean     73    74.6   mean    79.1  86.2
                       std dev   6.2   7.9   std dev 10.2   9.7

Let x1, x2, ..., xn be the values of a numerical attribute in the training data set. Its mean and variance are

    μ = (1/n) Σ_{i=1}^{n} xi

    σ² = (1/(n-1)) Σ_{i=1}^{n} (xi - μ)²

and the corresponding normal density is

    f(w) = (1 / (√(2π) σ)) e^(-(w - μ)² / (2σ²))

For example,

    f(temperature = 66 | yes) = (1 / (√(2π) × 6.2)) e^(-(66 - 73)² / (2 × 6.2²)) = 0.0340

For a new day with outlook = sunny, temperature = 66, humidity = 90, and windy = true:

Likelihood of yes = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036

Likelihood of no = 3/5 × 0.0279 × 0.0381 × 3/5 × 5/14 = 0.000136
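The densities and likelihoods above can be recomputed from the summary statistics in the table; this sketch implements the normal density formula directly:

```python
import math

def gaussian(w, mean, std):
    """Normal density f(w) with the given mean and standard deviation."""
    return math.exp(-(w - mean) ** 2 / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Densities for the new day (temperature = 66, humidity = 90), using
# the class-conditional means and standard deviations from the table.
f_temp_yes = gaussian(66, 73.0, 6.2)    # ≈ 0.0340
f_hum_yes  = gaussian(90, 79.1, 10.2)   # ≈ 0.0221
f_temp_no  = gaussian(66, 74.6, 7.9)
f_hum_no   = gaussian(90, 86.2, 9.7)

# Combine with the categorical probabilities for sunny and windy=true.
like_yes = (2/9) * f_temp_yes * f_hum_yes * (3/9) * (9/14)
like_no  = (3/5) * f_temp_no * f_hum_no * (3/5) * (5/14)
```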

Instance-Based Learning

In instance-based learning, we take the k nearest training samples of a new instance (v1, v2, ..., vm) and assign the new instance to the class that has the most instances among those k nearest training samples. Classifiers that adopt instance-based learning are commonly called KNN (k-nearest-neighbor) classifiers.

The basic version of the KNN classifier works only for data sets with numerical values; however, extensions have been proposed for handling data sets with categorical attributes. If the number of training samples is sufficiently large, then it can be proved statistically that the KNN classifier can deliver the accuracy achievable with learning from the training data set.
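The basic numerical KNN procedure can be sketched in a few lines; the training points below are hypothetical toy data, and Euclidean distance is used as the distance measure:

```python
# Minimal KNN sketch over numerical attributes. The training points
# and labels are hypothetical toy data for illustration.
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (vector, label) pairs.
    Returns the majority label among the k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "X"), ((1.2, 0.9), "X"),
         ((5.0, 5.0), "O"), ((5.1, 4.8), "O"), ((4.9, 5.2), "O")]

print(knn_predict(train, (1.1, 1.0), 1))  # nearest neighbor is an X
print(knn_predict(train, (3.5, 3.5), 3))  # two of three nearest are O
```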

However, if the number of training samples is not large enough, the KNN classifier may not work well.

If the data set is noiseless, then the 1NN classifier should work well. In general, the noisier the data set, the higher k should be set; the optimal k value should be determined through cross validation. The ranges of attribute values should be normalized before the KNN classifier is applied. There are two common normalization approaches:

    w = (v - vmin) / (vmax - vmin)

    w = (v - μ) / σ

where μ and σ² are the mean and the variance of the attribute values, respectively.
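Both normalization approaches can be sketched directly; the attribute values below are hypothetical:

```python
# Min-max and z-score normalization of one attribute's values.
# The values themselves are hypothetical.
import statistics

values = [50.0, 60.0, 70.0, 80.0, 90.0]

# Min-max normalization: w = (v - v_min) / (v_max - v_min), mapping to [0, 1].
v_min, v_max = min(values), max(values)
minmax = [(v - v_min) / (v_max - v_min) for v in values]

# Z-score normalization: w = (v - mu) / sigma.
mu = statistics.mean(values)
sigma = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
zscores = [(v - mu) / sigma for v in values]
```

The normalization parameters (v_min, v_max, or μ and σ) should be computed from the training set and then applied unchanged to new instances.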

Cross Validation

Most data classification algorithms require some parameters to be set, e.g., k in the KNN classifier and the tree-pruning threshold in the decision tree. One way to find an appropriate parameter setting is through k-fold cross validation, normally with k = 10. In k-fold cross validation, the training data set is divided into k subsets. Then k runs of the classification algorithm are conducted, with each subset serving as the test set once, while the remaining (k - 1) subsets are used as the training set.

The parameter values that yield the maximum accuracy in cross validation are then adopted.
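The fold-splitting step described above can be sketched as follows; the folds here are taken by interleaving for simplicity, and the classifier to be scored on each split is left out as it depends on the algorithm being tuned:

```python
# Minimal sketch of k-fold splitting for cross validation.
# Each sample appears in the test set of exactly one fold.
def k_fold_splits(samples, k):
    """Yield (train, test) pairs; folds are taken by interleaving."""
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test

samples = list(range(20))
splits = list(k_fold_splits(samples, 5))

# In a full tuning loop, each candidate parameter value would be
# scored on every (train, test) split and the best average kept.
all_test = sorted(s for _, test in splits for s in test)
```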

Example of the KNN Classifiers

If a 1NN classifier is employed, then the prediction of the new instance is X. If a 3NN classifier is employed, then the prediction of the new instance is O.