
1

ID3 and Decision tree


2
Decision Trees
Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute.

3
Why Decision Trees?
They generalize well to unobserved instances.
They are computationally efficient: the cost of building a tree is proportional to the number of training instances observed.
The tree representation gives a good understanding of how instances are classified: attributes are arranged according to the information they provide, which makes the classification process self-evident.
Algorithms in this area: ID3, C4.5, etc.
4
ID3 algorithm
ID3 is an algorithm for constructing a decision tree.
ID3 constructs the tree by a top-down, greedy search through the given set of training data, testing each attribute at every node.
It uses a statistical property called information gain to select which attribute to test at each node in the tree.
Information gain measures how well a given attribute separates the training examples according to their target classification.
Entropy is used to compute the information gain, and the attribute with the best (highest) value is then selected.
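As a rough illustration of this top-down, greedy search, the following minimal Python sketch (not from the original slides) builds a tree recursively. It assumes each training example is a dict of attribute values plus a target label, and that an information_gain(examples, attribute, target) helper exists, as sketched after the Information Gain slide.

from collections import Counter

def id3(examples, attributes, target="PlayTennis"):
    """Top-down, greedy construction of a decision tree (ID3 sketch)."""
    labels = [ex[target] for ex in examples]

    # All examples share one class: return a leaf carrying that class.
    if len(set(labels)) == 1:
        return labels[0]

    # No attributes left to test: return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Greedy step: test the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))

    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree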
5
Entropy
A measure from information theory that characterizes the impurity of an arbitrary collection of examples.
For a two-class sample S, the formula for entropy is:

E(S) = -(p+)*log2(p+) - (p-)*log2(p-)

Where p+ is the proportion of positive examples in S
Where p- is the proportion of negative examples in S
Where S is the sample of training examples
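A minimal sketch of this formula in Python (the function name and label encoding are illustrative, not from the slides); it handles any number of classes but reduces to the two-class formula above:

import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum over classes c of p_c * log2(p_c)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

# A sample with 9 positive and 5 negative examples
# (the PlayTennis data used later in this deck):
print(entropy(["+"] * 9 + ["-"] * 5))  # ~0.940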
6
Example
Attribute A1 splits the sample S = [29+, 35-] into a True branch [21+, 5-] and a False branch [8+, 30-].

E(S) = -29/(29+35)*log2(29/(29+35)) - 35/(35+29)*log2(35/(35+29))
     = 0.9937

The entropy of the True branch:
E(TRUE) = -21/(21+5)*log2(21/(21+5)) - 5/(5+21)*log2(5/(5+21))
        = 0.7063

The entropy of the False branch:
E(FALSE) = -8/(8+30)*log2(8/(8+30)) - 30/(30+8)*log2(30/(30+8))
         = 0.7426
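The entropy values above can be checked with a few lines of Python; the helper h below is only for this check and is not part of the slides:

import math

def h(pos, neg):
    """Two-class entropy from counts of positive and negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes (log2(0) is undefined)
            p = count / total
            result -= p * math.log2(p)
    return result

print(h(29, 35))  # E(S)     ~0.9937
print(h(21, 5))   # E(TRUE)  ~0.7063
print(h(8, 30))   # E(FALSE) ~0.7426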
7
Information Gain
Gain(Sample, Attributes), or Gain(S, A), is the expected reduction in entropy due to sorting S on attribute A:

Gain(S, A) = Entropy(S) - Σ over v ∈ values(A) of (|S_v| / |S|) * Entropy(S_v)

So, for the previous example, the information gain is calculated as:
Gain(S, A1) = E(S) - (21+5)/(29+35) * E(TRUE) - (8+30)/(29+35) * E(FALSE)
            = E(S) - 26/64 * E(TRUE) - 38/64 * E(FALSE)
            = 0.9937 - 26/64 * 0.7063 - 38/64 * 0.7426
            = 0.2658
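A minimal Python sketch of this formula, assuming examples are represented as dicts mapping attribute names (including the target column) to values; the names are illustrative, not from the slides. The entropy helper from the Entropy slide is repeated so the snippet is self-contained:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    labels = [ex[target] for ex in examples]
    gain = entropy(labels)
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain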
8
The complete example
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
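For the sketches in this deck, the table can be encoded in Python as a list of dicts (the column names, in particular "Temp" and "PlayTennis", are assumed keys chosen to match the slides):

# The PlayTennis training data from the table above.
COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]
ROWS = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),    # D1
    ("Sunny",    "Hot",  "High",   "Strong", "No"),    # D2
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),   # D3
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),   # D4
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),   # D5
    ("Rain",     "Cool", "Normal", "Strong", "No"),    # D6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),   # D7
    ("Sunny",    "Mild", "High",   "Weak",   "No"),    # D8
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),   # D9
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),   # D10
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),   # D11
    ("Overcast", "Mild", "High",   "Strong", "Yes"),   # D12
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),   # D13
    ("Rain",     "Mild", "High",   "Strong", "No"),    # D14
]
EXAMPLES = [dict(zip(COLUMNS, row)) for row in ROWS]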
9
Decision tree
We want to build a decision tree for the tennis matches.
The schedule of matches depends on the weather attributes (Outlook, Temperature, Humidity, and Wind).
So we apply what we know to build a decision tree based on this table.
10
Example
Entropy (E) = -(9/14)*log2(9/14) - (5/14)*log2(5/14)
            = 0.940
Calculating the information gains for each of the weather attributes (a numeric check follows this list):
For the Temp
For the Wind
For the Humidity
For the Outlook
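The figures on the next slides can be reproduced with the entropy and information_gain sketches and the EXAMPLES list introduced earlier in this deck (this snippet depends on those definitions):

labels = [ex["PlayTennis"] for ex in EXAMPLES]
print(round(entropy(labels), 3))  # ~0.940

for attr in ("Temp", "Wind", "Humidity", "Outlook"):
    print(attr, round(information_gain(EXAMPLES, attr, "PlayTennis"), 3))
# Temp ~0.029, Wind ~0.048, Humidity ~0.152 (0.151 with the slides' rounding),
# Outlook ~0.247 -- Outlook gives the largest gain.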
11
For the Temp
Temp splits S = [9+, 5-] (E = 0.940) into:
Hot [2+, 2-], E = 1.0; Mild [4+, 2-], E = 0.918; Cool [3+, 1-], E = 0.811

Gain(S, Temp)
= 0.940 - (4/14)*1.0 - (6/14)*0.918 - (4/14)*0.811
= 0.029
12
For the Wind
Wind splits S = [9+, 5-] (E = 0.940) into:
Weak [6+, 2-], E = 0.811; Strong [3+, 3-], E = 1.0

Gain(S, Wind)
= 0.940 - (8/14)*0.811 - (6/14)*1.0
= 0.048
13
For the Humidity

Humidity splits S = [9+, 5-] (E = 0.940) into:
High [3+, 4-], E = 0.985; Normal [6+, 1-], E = 0.592

Gain(S, Humidity)
= 0.940 - (7/14)*0.985 - (7/14)*0.592
= 0.151
14
For the Outlook
Outlook splits S = [9+, 5-] (E = 0.940) into:
Sunny [2+, 3-], E = 0.971; Overcast [4+, 0-], E = 0.0; Rain [3+, 2-], E = 0.971

Gain(S, Outlook)
= 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971
= 0.247
15
Choosing Attributes
Select the attribute with the maximum information gain, which is Outlook, for splitting at the root.
Apply the algorithm recursively to each child node of this root, until leaf nodes (nodes with entropy = 0) are reached.
16
Complete tree
Outlook is tested at the root:
Outlook = Sunny: test Humidity
    Humidity = High: No [D1, D2, D8]
    Humidity = Normal: Yes [D9, D11]
Outlook = Overcast: Yes [D3, D7, D12, D13]
Outlook = Rain: test Wind
    Wind = Strong: No [D6, D14]
    Wind = Weak: Yes [D4, D5, D10]
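Running the id3 sketch from earlier in this deck on the PlayTennis data reproduces the tree above as a nested dict (this assumes the id3, information_gain, and entropy sketches and the EXAMPLES list are all defined; the output format is that of the sketch, not of the original slides):

from pprint import pprint

tree = id3(EXAMPLES, ["Outlook", "Temp", "Humidity", "Wind"])
pprint(tree)
# Expected structure, matching the tree above:
# {'Outlook': {'Overcast': 'Yes',
#              'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}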
