Decision Tree Learning
Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in
Decision Tree Learning

[Figure: two example trees, a classification tree and a regression tree. Each internal (split) node tests a feature against a threshold (e.g. < 3.16, < 1.2, < 5.9); each terminal (leaf) node carries a prediction: a class label for the classification tree, a real value such as -1.35 or 2.49 for the regression tree.]
Example Dataset

example   age     income   student   credit rating   buys computer
x1        ≤ 30    high     no        fair            no
x2        ≤ 30    high     no        excellent       no
x3        31-40   high     no        fair            yes
x4        > 40    medium   no        fair            yes
x5        > 40    low      yes       fair            yes
x6        > 40    low      yes       excellent       no
x7        31-40   low      yes       excellent       yes
x8        ≤ 30    medium   no        fair            no
x9        ≤ 30    low      yes       fair            yes
x10       > 40    medium   yes       fair            yes
x11       ≤ 30    medium   yes       excellent       yes
x12       31-40   medium   no        excellent       yes
x13       31-40   high     yes       fair            yes
x14       > 40    medium   no        excellent       no
Sorting the examples on age:

example   age     buys computer
x1        ≤ 30    no
x2        ≤ 30    no
x8        ≤ 30    no
x9        ≤ 30    yes
x11       ≤ 30    yes
x3        31-40   yes
x7        31-40   yes
x12       31-40   yes
x13       31-40   yes
x4        > 40    yes
x5        > 40    yes
x6        > 40    no
x10       > 40    yes
x14       > 40    no

[Figure: the age split as a tree: the branch ≤ 30 holds 3 no / 2 yes, the branch 31-40 holds 0 no / 4 yes, and the branch > 40 holds 2 no / 3 yes.]
[Figure: the same age split, now annotated with entropy: the parent node holds 5 no / 9 yes, giving E = 0.9403.]
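The parent entropy can be checked directly from the class counts; as a worked step:

$$\mathrm{Entropy}(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.9403$$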
Sorting the examples on student:

example   student   buys computer
x1        no        no
x2        no        no
x3        no        yes
x4        no        yes
x8        no        no
x12       no        yes
x14       no        no
x5        yes       yes
x6        yes       no
x7        yes       yes
x9        yes       yes
x10       yes       yes
x11       yes       yes
x13       yes       yes

[Figure: the student split: the branch no holds 4 no / 3 yes, the branch yes holds 1 no / 6 yes; the parent node again holds 5 no / 9 yes, E = 0.9403.]
Information Gain

Gain(S, T): the expected reduction in entropy due to sorting on attribute T.

$$\mathrm{Gain}(S, T) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(T)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$$

where $S_v$ is the subset of $S$ for which attribute $T$ has value $v$.
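These two quantities are easy to check in code. A minimal Python sketch on the buys-computer dataset above (the variable and function names are mine, not the slides'):

```python
from collections import Counter
from math import log2

# The 14-example buys-computer dataset from the slides.
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
DATA = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31-40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31-40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no",  "excellent", "yes"),
    ("31-40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]

def entropy(labels):
    """Entropy of a collection of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Gain(S, T): expected entropy reduction from splitting rows on attr."""
    i = COLUMNS.index(attr)
    base = entropy([r[-1] for r in rows])
    n = len(rows)
    for v in {r[i] for r in rows}:
        subset = [r[-1] for r in rows if r[i] == v]
        base -= (len(subset) / n) * entropy(subset)
    return base

if __name__ == "__main__":
    for attr in COLUMNS[:-1]:
        print(attr, round(information_gain(DATA, attr), 4))
    # -> age 0.2467, income 0.0292, student 0.1519, credit_rating 0.0481
```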
Decision Tree Learning

Candidate splits of the full dataset (5 no / 9 yes, E = 0.9403) on each attribute:

age: ≤ 30 gives 3 no / 2 yes (E = 0.9710), 31-40 gives 0 no / 4 yes (E = 0), > 40 gives 2 no / 3 yes (E = 0.9710); IG = 0.2467
income: high gives 2 no / 2 yes (E = 1), medium gives 2 no / 4 yes (E = 0.9183), low gives 1 no / 3 yes (E = 0.8113); IG = 0.0292
student: no gives 4 no / 3 yes (E = 0.9852), yes gives 1 no / 6 yes (E = 0.5917); IG = 0.1519
credit rating: fair gives 2 no / 6 yes (E = 0.8113), excellent gives 3 no / 3 yes (E = 1); IG = 0.0481

age has the highest information gain, so it is chosen for the root split.
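The gain for age, for example, follows directly from the branch entropies:

$$\mathrm{Gain}(S, \mathrm{age}) = 0.9403 - \frac{5}{14}(0.9710) - \frac{4}{14}(0) - \frac{5}{14}(0.9710) \approx 0.2467$$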
[Figure: the tree after splitting on age (IG = 0.2467). The branch 31-40 holds examples (3, 7, 12, 13), all yes (E = 0), and becomes a leaf. The branch ≤ 30 holds (1, 2, 8, 9, 11) with E = 0.9710, and the branch > 40 holds (4, 5, 6, 10, 14) with E = 0.9710; both are split further.]

The examples under the ≤ 30 branch:

example   income   student   credit rating   buys computer
x1        high     no        fair            no
x2        high     no        excellent       no
x8        medium   no        fair            no
x9        low      yes       fair            yes
x11       medium   yes       excellent       yes
If Examples_v is empty:
Then below this new branch add a leaf node labelled with the most frequently occurring value of Target_Attribute in Examples
Else below this new branch add the subtree ID3(Examples_v, Target_Attribute, Attributes - {A})
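A compact recursive sketch of this algorithm in Python, reusing COLUMNS, DATA, and information_gain from the earlier snippet (the names are mine, not the slides'):

```python
from collections import Counter

def id3(rows, attrs):
    """Grow a tree over `rows`. Returns either a class label (leaf)
    or a pair (attribute, {value: subtree})."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # all examples agree: leaf
        return labels[0]
    if not attrs:                      # attributes exhausted: majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest gain.
    best = max(attrs, key=lambda a: information_gain(rows, a))
    i = COLUMNS.index(best)
    branches = {}
    for v in {r[i] for r in rows}:
        # Branch values are taken from rows, so no subset is empty here;
        # the pseudocode's empty-branch case would yield a majority leaf.
        subset = [r for r in rows if r[i] == v]
        branches[v] = id3(subset, [a for a in attrs if a != best])
    return (best, branches)

tree = id3(DATA, COLUMNS[:-1])
# tree == ('age', {'<=30': ('student', {'no': 'no', 'yes': 'yes'}),
#                  '31-40': 'yes',
#                  '>40': ('credit_rating', {'fair': 'yes', 'excellent': 'no'})})
```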
[Figure: the final decision tree. The root tests age: the ≤ 30 branch leads to a student node (no -> no, yes -> yes), the 31-40 branch is a yes leaf, and the > 40 branch leads to a credit rating node (fair -> yes, excellent -> no).]
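Read as code, the learned tree is just nested conditionals; a sketch:

```python
def buys_computer(age, student, credit_rating):
    """The final tree from the slide, written as nested conditionals."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31-40":
        return "yes"
    # age > 40
    return "yes" if credit_rating == "fair" else "no"
```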
ID3 performs a greedy search: at each node it commits to the attribute with the highest information gain, with no backtracking.
Add/Drop
Current strength ~ 70
Last day to add/drop
If you added the course during the past few days and have doubts about continuing, please contact me after class.
A continuous-valued attribute, with the examples sorted by value:

value           189   245   256   370   377   394   605   720
buys computer   no    no    yes   no    yes   yes   no    no
The same idea on a numeric version of age:

example   age   buys computer
x1        20    no
x2        21    no
x3        23    yes
x4        34    yes
x5        25    yes
x6        41    no
x7        37    yes
x8        18    no
x9        27    yes
x10       45    yes
x11       29    yes
x12       35    yes
x13       36    yes
x14       46    no
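With numeric attributes, candidate thresholds lie at midpoints between adjacent sorted examples whose labels differ, and the binary split value < t with the highest information gain is kept. A self-contained sketch (the function names are mine):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(pairs):
    """pairs: list of (numeric value, label). Returns (threshold, gain)
    for the binary split `value < threshold` maximising information gain."""
    pairs = sorted(pairs)
    labels = [y for _, y in pairs]
    base, n = entropy(labels), len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        # Only boundaries between differing labels can be optimal.
        if labels[i - 1] == labels[i]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left, right = labels[:i], labels[i:]
        gain = (base - (len(left) / n) * entropy(left)
                     - (len(right) / n) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

ages = [20, 21, 23, 34, 25, 41, 37, 18, 27, 45, 29, 35, 36, 46]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no"]
print(best_threshold(list(zip(ages, labels))))   # -> (22.0, ~0.403)
```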
$$\mathrm{SplitInformation}(S, T) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

where $S_1, \ldots, S_c$ are the subsets of $S$ produced by partitioning on the $c$ values of attribute $T$.

$$\mathrm{GainRatio}(S, T) = \frac{\mathrm{Gain}(S, T)}{\mathrm{SplitInformation}(S, T)}$$
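For the 5/4/5 age split of the 14 examples:

$$\mathrm{SplitInformation}(S, \mathrm{age}) = -2 \cdot \frac{5}{14}\log_2\frac{5}{14} - \frac{4}{14}\log_2\frac{4}{14} \approx 1.577$$

$$\mathrm{GainRatio}(S, \mathrm{age}) = \frac{0.2467}{1.577} \approx 0.156$$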
Generate different bootstrapped training datasets by sampling with replacement.

[Figure: test and out-of-bag (OOB) error versus the number of trees (50 to 300) for Bagging and RandomForest; error roughly between 0.10 and 0.25.]
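A minimal sketch of the procedure, assuming some tree learner fit_tree that returns a callable classifier (both names are mine):

```python
import random

def bootstrap(data, rng):
    """Sample len(data) examples with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def bag(data, fit_tree, n_trees=100, seed=0):
    """Fit one tree per bootstrapped dataset; predict by majority vote.
    Examples left out of a bootstrap sample (about 37% of them) can
    serve as the out-of-bag (OOB) test set plotted above."""
    rng = random.Random(seed)
    trees = [fit_tree(bootstrap(data, rng)) for _ in range(n_trees)]
    def predict(x):
        votes = [t(x) for t in trees]
        return max(set(votes), key=votes.count)
    return predict
```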
Build a number of decision trees on bootstrapped training samples, with one change: each split considers only a random subset of m out of the p features.
This decorrelates the trees, which in turn reduces the variance when we average the trees.
Provides an improvement over instance bagging.

[Figure: test error versus the number of trees (100 to 500) for m = p, m = p/2, and m = √p; error roughly between 0.2 and 0.5.]
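For reference, scikit-learn's random forest (not part of the slides) exposes m as max_features; a usage sketch:

```python
from sklearn.ensemble import RandomForestClassifier

# m = p       -> max_features=None   (plain bagging of trees)
# m = p/2     -> max_features=0.5
# m = sqrt(p) -> max_features="sqrt" (the usual default)
model = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                               oob_score=True, random_state=0)
# model.fit(X, y); model.oob_score_ then reports the OOB accuracy.
```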
Summary
Popular concept learning method
Uses an information-theoretic heuristic for the construction of the tree
The model is completely expressive: any finite discrete-valued function of the attributes can be represented
Inductive bias is towards shorter trees