The ID3 Algorithm

This paper details the ID3 classification algorithm. Very simply, ID3 builds a
decision tree from a fixed set of examples. The resulting tree is used to classify
future samples. Each example has several attributes and belongs to a class (like
yes or no). The leaf nodes of the decision tree contain the class name whereas a
non-leaf node is a decision node. The decision node is an attribute test with each
branch (to another decision tree) being a possible value of the attribute. ID3 uses
information gain to help it decide which attribute goes into a decision node. The
advantage of learning a decision tree is that a program, rather than a knowledge
engineer, elicits knowledge from an expert.
J. Ross Quinlan originally developed ID3 at the University of Sydney. He first
presented ID3 in 1975 in a book, Machine Learning, vol. 1, no. 1. ID3 is based on
the Concept Learning System (CLS) algorithm. The basic CLS algorithm over a set
of training instances C:
Step 1: If all instances in C are positive, then create a YES node and halt.
If all instances in C are negative, create a NO node and halt.
Otherwise select a feature, F, with values v1, ..., vn and create a decision node.
Step 2: Partition the training instances in C into subsets C1, C2, ..., Cn according
to the values of F.
Step 3: Apply the algorithm recursively to each of the sets Ci.
Note that the trainer (the expert) decides which feature to select.
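The three CLS steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original implementation: the instance representation, the names `cls` and `choose`, the error raised when features run out, and the choice not to reuse a feature further down a branch are all hypothetical. The expert who selects the feature is modeled as a callback.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical representation: an instance is (feature values, is_positive).
Instance = Tuple[Dict[str, str], bool]
# A tree is either a leaf label ("YES"/"NO") or (feature, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Chooser = Callable[[List[Instance], List[str]], str]


def cls(instances: List[Instance], features: List[str], choose: Chooser) -> Tree:
    """Build a decision tree following CLS steps 1 to 3."""
    if not instances:
        raise ValueError("CLS needs at least one training instance")
    # Step 1: stop when every instance has the same class.
    if all(positive for _, positive in instances):
        return "YES"
    if not any(positive for _, positive in instances):
        return "NO"
    if not features:
        # Assumption: the text does not say what to do in this case.
        raise ValueError("mixed classes but no features left to test")
    # Otherwise the trainer (here: the callback) selects a feature F.
    feature = choose(instances, features)
    # Step 2: partition the instances by their value of F.
    subsets: Dict[str, List[Instance]] = {}
    for values, positive in instances:
        subsets.setdefault(values[feature], []).append((values, positive))
    # Step 3: recurse on each subset (assumption: F is not tested again).
    remaining = [f for f in features if f != feature]
    return feature, {v: cls(sub, remaining, choose) for v, sub in subsets.items()}


# Toy usage: a stand-in "expert" that always picks the first feature.
toy = [({"wind": "weak"}, True), ({"wind": "strong"}, False)]
tree = cls(toy, ["wind"], lambda instances, features: features[0])
print(tree)
```

Passing the selection step in as a callback keeps the sketch faithful to CLS, where the choice of feature is left to the trainer.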
ID3 improves on CLS by adding a feature selection heuristic. ID3 searches
through the attributes of the training instances and extracts the attribute that
best separates the given examples. If the attribute perfectly classifies the training
sets then ID3 stops; otherwise it recursively operates on the n (where n =
number of possible values of an attribute) partitioned subsets to get their "best"
attribute. The algorithm uses a greedy search, that is, it picks the best attribute
and never looks back to reconsider earlier choices.
ID3 is a nonincremental algorithm, meaning it derives its classes from a fixed set
of training instances. An incremental algorithm revises the current concept
definition, if necessary, with a new sample. The classes created by ID3 are
inductive, that is, given a small set of training instances, the specific classes
created by ID3 are expected to work for all future instances. The distribution of
the unknowns must be the same as the test cases. Induction classes cannot be
proven to work in every case since they may classify an infinite number of
instances. Note that ID3 (or any inductive algorithm) may misclassify data.

Data Description

The sample data used by ID3 has certain requirements, which are:
Attribute-value description - the same attributes must describe each example
and have a fixed number of values.
Predefined classes - an example's attributes must already be defined, that is,
they are not learned by ID3.
Discrete classes - classes must be sharply delineated. Continuous classes broken
up into vague categories such as a metal being "hard, quite hard, flexible, soft,
quite soft" are suspect.
Sufficient examples - since inductive generalization is used (i.e. not provable)
there must be enough test cases to distinguish valid patterns from chance
occurrences.

Attribute Selection

How does ID3 decide which attribute is the best? A statistical property, called
information gain, is used. Gain measures how well a given attribute separates
training examples into targeted classes. The one with the highest information
(information being the most useful for classification) is selected. In order to
define gain, we first borrow an idea from information theory called entropy.
Entropy measures the amount of information in an attribute.
Given a collection S of c outcomes

$$\mathrm{Entropy}(S) = \sum_{I=1}^{c} -p(I)\,\log_2 p(I)$$

where p(I) is the proportion of S belonging to class I, the sum runs over the c
classes, and log2 is the logarithm to base 2.
Note that S is not an attribute but the entire sample set.

Example 1

If S is a collection of 14 examples with 9 YES and 5 NO examples then

$$\mathrm{Entropy}(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$
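
The entropy calculation of Example 1 can be checked with a few lines of Python. This is a minimal sketch; the function name `entropy` and the representation of S as a list of class labels are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a collection S, given as a list of class labels."""
    total = len(labels)
    # Sum of -p(I) * log2 p(I) over the classes that actually occur in S.
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


# Example 1: 14 examples, 9 YES and 5 NO.
print(round(entropy(["YES"] * 9 + ["NO"] * 5), 3))  # 0.94
```

Only classes that occur in S are counted, so the undefined term log2(0) never arises.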

Notice entropy is 0 if all members of S belong to the same class (the data is
perfectly classified). With two classes, the range of entropy is 0 ("perfectly
classified") to 1 ("totally random").
Gain(S, A), the information gain of example set S on attribute A, is defined as

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where:
the sum runs over each value v of all possible values of attribute A
$S_v$ = subset of S for which attribute A has value v
$|S_v|$ = number of elements in $S_v$
$|S|$ = number of elements in S

Example 2

Suppose S is a set of 14 examples in which one of the attributes is wind speed.
The values of Wind can be Weak or Strong. The classification of these 14
examples is 9 YES and 5 NO. For attribute Wind, suppose there are 8
occurrences of Wind = Weak and 6 occurrences of Wind = Strong. For Wind =
Weak, 6 of the examples are YES and 2 are NO. For Wind = Strong, 3 are YES
and 3 are NO. Therefore


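$$\begin{aligned}
\mathrm{Gain}(S, \mathrm{Wind}) &= \mathrm{Entropy}(S) - \frac{8}{14}\,\mathrm{Entropy}(S_{weak}) - \frac{6}{14}\,\mathrm{Entropy}(S_{strong}) \\
&= 0.940 - \frac{8}{14} \cdot 0.811 - \frac{6}{14} \cdot 1.00 \\
&= 0.048
\end{aligned}$$

where

$$\mathrm{Entropy}(S_{weak}) = -\frac{6}{8}\log_2\frac{6}{8} - \frac{2}{8}\log_2\frac{2}{8} = 0.811$$

$$\mathrm{Entropy}(S_{strong}) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1.00$$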
For each attribute, the gain is calculated and the highest gain is used in the
decision node.
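
The arithmetic of Example 2 can be reproduced from the class counts alone. The sketch below is illustrative; the helper names `entropy_from_counts` and `gain_from_counts`, and the idea of passing each attribute value's (YES, NO) counts, are assumptions rather than part of ID3 as published.

```python
import math


def entropy_from_counts(counts):
    """Entropy of a set described only by its per-class counts."""
    total = sum(counts)
    return sum(-(n / total) * math.log2(n / total) for n in counts if n > 0)


def gain_from_counts(partition):
    """Information gain; `partition` maps each attribute value to its class counts."""
    # Class counts of the whole set S are the column sums over all values.
    totals = [sum(column) for column in zip(*partition.values())]
    size = sum(totals)
    # Weighted entropy of the subsets S_v, with weights |S_v| / |S|.
    remainder = sum(sum(counts) / size * entropy_from_counts(counts)
                    for counts in partition.values())
    return entropy_from_counts(totals) - remainder


# Example 2: Wind = Weak has 6 YES / 2 NO, Wind = Strong has 3 YES / 3 NO.
wind = {"Weak": (6, 2), "Strong": (3, 3)}
print(round(gain_from_counts(wind), 3))  # 0.048
```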
Example of ID3

Suppose we want ID3 to decide whether the weather is amenable to playing
baseball. Over the course of 2 weeks, data is collected to help ID3 build a
decision tree (see table 1).
The target classification is "should we play baseball?" which can be yes or no.
The weather attributes are outlook, temperature, humidity, and wind speed. They
can have the following values:
outlook = { sunny, overcast, rain }
temperature = { hot, mild, cool }
humidity = { high, normal }
wind = { weak, strong }
Examples of set S are:
Day   Outlook    Temperature   Humidity   Wind     Play ball
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

Table 1

We need to find which attribute will be the root node in our decision tree. The
gain is calculated for all four attributes:
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048 (calculated in example 2)
The Outlook attribute has the highest gain, therefore it is used as the decision
attribute in the root node.
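
The choice of root attribute can be reproduced directly from Table 1. The following is a sketch under assumptions: the tuple layout of `TABLE` and the names `entropy` and `gain` are chosen here for illustration.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Outlook", "Temperature", "Humidity", "Wind")
# Rows D1..D14 of Table 1: (outlook, temperature, humidity, wind, play ball).
TABLE = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def gain(rows, column):
    """Information gain of splitting `rows` on the attribute at index `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # class labels per attribute value
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: gain(TABLE, i) for i, name in enumerate(COLUMNS)}
root = max(gains, key=gains.get)  # attribute with the highest gain
for name, value in gains.items():
    print(f"Gain(S, {name}) = {value:.3f}")
print("root:", root)
```

The printed values may differ from the figures quoted above in the third decimal place, because the quoted figures appear to be truncated rather than rounded; the ranking of the attributes is unaffected.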
Since Outlook has three possible values, the root node has three branches
(sunny, overcast, rain). The next question is "what attribute should be tested at
the Sunny branch node?" Since we've used Outlook at the root, we only
decide among the remaining three attributes: Humidity, Temperature, or Wind.
Ssunny = {D1, D2, D8, D9, D11} = 5 examples from table 1 with outlook = sunny
Gain(Ssunny, Humidity) = 0.970
Gain(Ssunny, Temperature) = 0.570
Gain(Ssunny, Wind) = 0.019
Humidity has the highest gain; therefore, it is used as the decision node. This
process goes on until all data is classified perfectly or we run out of attributes.
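
The whole procedure (pick the attribute with the highest gain, split, and recurse until the data is perfectly classified or the attributes run out) can be sketched as below. It is an illustrative reading of the description above, not Quinlan's code. The nested-dict tree format, the function names, and the fallback to the majority class when attributes run out are assumptions; the text does not say which class to assign in that case.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Rows D1..D14 of Table 1; the last field is the class (play ball).
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return sum(-(n / len(rows)) * math.log2(n / len(rows))
               for n in counts.values())


def split(rows, col):
    """Partition rows by the value found in column `col`."""
    parts = {}
    for row in rows:
        parts.setdefault(row[col], []).append(row)
    return parts


def gain(rows, col):
    parts = split(rows, col).values()
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)


def id3(rows, cols):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    classes = Counter(row[-1] for row in rows)
    if len(classes) == 1 or not cols:
        # Leaf: a pure subset, or (assumption) the majority class if no
        # attributes remain.
        return classes.most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))  # greedy choice, never revisited
    rest = [c for c in cols if c != best]
    return {ATTRIBUTES[best]: {value: id3(part, rest)
                               for value, part in split(rows, best).items()}}


tree = id3(ROWS, [0, 1, 2, 3])
print(tree)
```

On Table 1 this sketch places Outlook at the root and Humidity under the Sunny branch, in line with the gains computed above.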
[Figure: the final decision tree]
The decision tree can also be expressed in rule format:
IF outlook = sunny AND humidity = high THEN playball = no
IF outlook = sunny AND humidity = normal THEN playball = yes
IF outlook = overcast THEN playball = yes
IF outlook = rain AND wind = strong THEN playball = no
IF outlook = rain AND wind = weak THEN playball = yes
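
Converting a tree to rule format amounts to listing every root-to-leaf path. The sketch below assumes a nested-dict tree of the form {attribute: {value: subtree}}; that format, the name `tree_to_rules`, and the hard-coded `TREE` (the tree that Table 1 yields when the attribute with the highest gain is chosen at each node) are illustrative assumptions.

```python
# Tree derived from Table 1; leaves are the class labels.
TREE = {
    "outlook": {
        "sunny": {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"wind": {"strong": "no", "weak": "yes"}},
    }
}


def tree_to_rules(tree, conditions=()):
    """Return one IF ... THEN rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        # Leaf reached: the accumulated attribute tests form the rule's premise.
        if not conditions:
            return [f"playball = {tree}"]
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {premise} THEN playball = {tree}"]
    (attribute, branches), = tree.items()  # a decision node tests one attribute
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules


for rule in tree_to_rules(TREE):
    print(rule)
```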
ID3 has been incorporated in a number of commercial rule-induction packages.
Some specific applications include medical diagnosis, credit risk assessment of
loan applications, diagnosis of equipment malfunctions by their cause,
classification of soybean diseases, and web search classification.
The discussion and examples given show that ID3 is easy to use. Its primary use
is replacing the expert who would normally build a classification tree by hand. As
industry has shown, ID3 has been effective.