Lecture 12
14-08-09
[Figure: example decision tree for classifying vertebrates. The root node tests Body Temperature (cold vs. warm). Cold-blooded animals reach a leaf node labeled Non-mammals. Warm-blooded animals reach an internal node testing Gives Birth (yes / no), whose leaf nodes are Mammals (yes) and Non-mammals (no).]
Example: predicting whether a person will buy a computer or not.

[Figure: decision tree with internal nodes testing attributes such as student? (yes / no) and credit rating? (excellent / fair), and leaf nodes labeled yes / no.]

Decision trees can be converted easily to classification rules.
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

[Figure: decision tree. Root node tests Refund (Yes → leaf NO; No → MarSt node). MarSt tests Marital Status (Single, Divorced → TaxInc node; Married → leaf NO). TaxInc tests Taxable Income (< 80K → leaf NO; > 80K → leaf YES).]

Start from the root of the tree:
1. Refund = No, so follow the No branch to the MarSt node.
2. Marital Status = Married, so follow the Married branch and reach a leaf labeled NO.
3. Assign Cheat to "No" for the test record.
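The traversal above can be sketched with a small dictionary-based tree. The node representation and attribute names are illustrative assumptions, not an implementation given in the lecture:

```python
# Sketch of applying the slide's decision tree to a test record.
# The dict-based node structure is an assumption made for illustration.

def classify(node, record):
    """Walk the tree from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        value = record[node["attr"]]
        node = node["branches"][node["test"](value)]
    return node

# Tree from the slide: Refund -> MarSt -> TaxInc.
tree = {
    "attr": "Refund",
    "test": lambda v: v,  # branch directly on the attribute value
    "branches": {
        "Yes": "NO",
        "No": {
            "attr": "MaritalStatus",
            "test": lambda v: "Married" if v == "Married" else "Single/Divorced",
            "branches": {
                "Married": "NO",
                "Single/Divorced": {
                    "attr": "TaxableIncome",
                    "test": lambda v: "<80K" if v < 80 else ">=80K",
                    "branches": {"<80K": "NO", ">=80K": "YES"},
                },
            },
        },
    },
}

record = {"Refund": "No", "MaritalStatus": "Married", "TaxableIncome": 80}
print(classify(tree, record))  # Refund=No -> MarSt=Married -> leaf NO
```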
Lecture 13/17-08-09
Methods for expressing attribute test conditions:

• Binary attributes: the test condition for a binary attribute generates two potential outcomes. [Figure: a node testing Home Owner with branches Yes / No; a node testing Body Temperature with branches Warm-blooded / Cold-blooded.]
• Nominal attributes: a binary split groups the attribute values into two subsets. E.g., CarType: {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports}.
• Ordinal attributes: grouped splits must preserve the order of the attribute values. E.g., Size: {Small, Medium} vs. {Large}, or {Medium, Large} vs. {Small}.
• What about the split Size: {Small, Large} vs. {Medium}? It violates the order property of the ordinal attribute, so it is not a valid grouping.
Which attribute test gives the best split? [Figure: three candidate splits with per-child class counts. Gender (M: C0=6, C1=4; F: C0=4, C1=6); Car Type (Family: C0=1, C1=3; Sports: C0=8, C1=0; Luxury: C0=1, C1=7); Customer ID (c1, c2, ..., c20: each child node contains a single record, so each has C0=1 or C1=1).]
• Entropy(t) = − Σ_{i=0}^{c−1} p_i log2(p_i), where p_i is the fraction of records at node t that belong to class i and c is the number of classes.
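The entropy formula can be transcribed directly; a minimal sketch, handling the usual 0 · log2(0) = 0 convention explicitly:

```python
import math

def entropy(probs):
    """Entropy(t) = -sum_i p_i * log2(p_i), taking 0 * log2(0) as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # uniform two-class node: 1.0 (maximum)
print(entropy([1.0, 0.0]))              # pure node: 0.0 (minimum)
print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform four-class node: 2.0
```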
[Figure: comparing two candidate splits. The parent node has impurity M0. Splitting on attribute A (Yes / No) produces nodes N1 and N2 with impurities M1 and M2, combined into the weighted impurity M12. Splitting on attribute B (Yes / No) produces nodes N3 and N4 with impurities M3 and M4, combined into M34.]

Gain = M0 − M12 vs. M0 − M34: the split with the larger gain (equivalently, the smaller weighted child impurity) is preferred.
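The gain comparison can be made concrete with entropy as the impurity measure. The class counts below are hypothetical; only the M0 / M12 / M34 bookkeeping mirrors the slide:

```python
import math

def entropy(counts):
    """Entropy from raw class counts at a node."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def weighted_impurity(children):
    """Weighted child impurity, e.g. M12 = sum_i (n_i / n) * Entropy(N_i)."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * entropy(c) for c in children)

parent = [10, 10]                             # M0: impurity before splitting
m0 = entropy(parent)
m12 = weighted_impurity([[7, 3], [3, 7]])     # split on attribute A (hypothetical counts)
m34 = weighted_impurity([[9, 1], [1, 9]])     # split on attribute B (hypothetical counts)
print(m0 - m12, m0 - m34)  # B yields the larger gain, so B is the better split
```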
Measure of Impurity: GINI

• Gini index for a given node t:

  GINI(t) = 1 − Σ_j [p(j | t)]²

  where p(j | t) is the relative frequency of class j at node t.

  C1:    0      1      2      3
  C2:    6      5      4      3
  Gini:  0.000  0.278  0.444  0.500
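The table values follow directly from the formula; a small sketch:

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, computed from raw class counts at node t."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

for c1, c2 in [(0, 6), (1, 5), (2, 4), (3, 3)]:
    print(f"C1={c1}, C2={c2}: Gini={gini([c1, c2]):.3f}")
# reproduces the table: 0.000, 0.278, 0.444, 0.500
```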
Examples for computing GINI

GINI(t) = 1 − Σ_j [p(j | t)]²

Classification error: Error(t) = 1 − max_i p_i
Comparison of impurity measures for binary classification problems:

• p refers to the fraction of records that belong to one of the two classes.
• All measures attain their maximum value when the class distribution is uniform (p = 0.5) and their minimum value when all records belong to the same class (p = 0 or p = 1).
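The maximum/minimum behaviour of the three measures can be checked numerically; a sketch for the binary case, with p the fraction of records in one class:

```python
import math

def entropy(p):
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def gini(p):
    return 1 - (p ** 2 + (1 - p) ** 2)

def cls_error(p):
    return 1 - max(p, 1 - p)

for f in (entropy, gini, cls_error):
    # Each measure peaks at the uniform class distribution p = 0.5 ...
    assert f(0.5) == max(f(p / 100) for p in range(101))
    # ... and is zero when all records belong to a single class.
    assert f(0.0) == 0 and f(1.0) == 0

print(entropy(0.5), gini(0.5), cls_error(0.5))  # 1.0 0.5 0.5
```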
Customer ID  Gender  Car Type  Shirt Size  Class
1            M       Family    S           C0
2            M       Sports    M           C0
3            M       Sports    M           C0
4            M       Sports    L           C0
5            M       Sports    XL          C0
6            M       Sports    XL          C0
7            F       Sports    S           C0
8            F       Sports    S           C0
9            F       Sports    M           C0
10           F       Luxury    L           C0
11           M       Family    L           C1
12           M       Family    XL          C1
13           M       Family    M           C1
14           M       Luxury    XL          C1
15           F       Luxury    S           C1
16           F       Luxury    S           C1
17           F       Luxury    M           C1
18           F       Luxury    M           C1
19           F       Luxury    M           C1
20           F       Luxury    L           C1
• Q1: Consider the training examples shown in the previous table for a binary classification problem.
  (a) Compute the Gini index for the Customer ID attribute.
  (b) Compute the Gini index for the Gender attribute.
  (c) Compute the Gini index for the Car Type attribute using a multiway split.

When a node p is split into k partitions (children), the quality of the split is computed as:

  GINI_split = Σ_{i=1}^{k} (n_i / n) GINI(i)

where n_i is the number of records at child i and n is the number of records at node p.
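As a sketch of the GINI_split formula applied to the training table above, with the (C0, C1) counts per attribute value read off the 20 records:

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2 from raw class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    """GINI_split = sum_i (n_i/n) * GINI(i) over the child partitions."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

# (C0, C1) counts per attribute value, taken from the training table.
customer_id = [(1, 0)] * 10 + [(0, 1)] * 10   # one record per child: all pure
gender      = [(6, 4), (4, 6)]                # M, F
car_type    = [(1, 3), (8, 0), (1, 7)]        # Family, Sports, Luxury

print(gini_split(customer_id))  # 0.0: pure nodes, yet useless for prediction
print(gini_split(gender))       # ≈ 0.48
print(gini_split(car_type))     # ≈ 0.1625
```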
[Table: candidate binary splits for a continuous attribute. For each split position, the counts of No-class records below / above the split are (0, 7), (1, 6), (2, 5), (3, 4), (3, 4), (3, 4), (3, 4), (4, 3), (5, 2), (6, 1), (7, 0), giving Gini values 0.420, 0.400, 0.375, 0.343, 0.417, 0.400, 0.300, 0.343, 0.375, 0.400, 0.420. The best split is the position with minimum Gini = 0.300.]
Splitting based on information gain:

  GAIN_split = Entropy(p) − Σ_{i=1}^{k} (n_i / n) Entropy(i)

where parent node p is split into k partitions and n_i is the number of records in partition i.

Gain ratio:

  GainRATIO_split = GAIN_split / SplitINFO,  where  SplitINFO = − Σ_{i=1}^{k} (n_i / n) log(n_i / n)
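A sketch of GAIN_split and the SplitINFO normalization; the class counts below are hypothetical, chosen only to exercise a 3-way split:

```python
import math

def entropy(counts):
    """Entropy from raw class counts at a node."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def gain_ratio(parent, children):
    """GainRATIO_split = GAIN_split / SplitINFO for a k-way split."""
    n = sum(parent)
    gain = entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)
    split_info = -sum(sum(c) / n * math.log2(sum(c) / n) for c in children)
    return gain / split_info

# Hypothetical 3-way split of a node holding class counts (10, 10).
parent = [10, 10]
children = [[4, 0], [3, 7], [3, 3]]
print(gain_ratio(parent, children))
```

SplitINFO grows with the number of partitions, so the gain ratio penalizes splits into many small children (such as a split on Customer ID).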
Classification error at a node t: Error(t) = 1 − max_i P(i | t)