
Decision Tree Algorithm

Muhammad Monjurul Islam 11-95100-3 Jafiul Mahmud 10-95000-3

ID3 Decision Tree Algorithm


Iterative Dichotomiser 3 (ID3) builds a decision tree from a fixed set of examples, and the resulting tree is used to classify future samples. Each example has several attributes and belongs to a class (such as Yes or No). Here we will look at the attribute selection procedure used in the ID3 algorithm.

ID3 Decision Tree Algorithm (cont)


The attribute selection section first covers basic information about the data set, then discusses entropy and information gain, and uses a few examples to show how to calculate entropy and information gain from the example data.

Attribute selection
Attribute selection is the fundamental step in constructing a decision tree. Two terms, Entropy and Information Gain, are used for attribute selection: using them, the ID3 algorithm decides which attribute becomes the next node of the decision tree, and so on recursively. Before diving deeper, we need to introduce a few terms used in the attribute selection process.
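For orientation, the recursion can be sketched as below. This is a minimal illustrative sketch in Python, assuming the data set is given as a list of dicts; the helper names entropy, info_gain and id3, and the tiny example rows, are ours and not part of the slides.

```python
import math
from collections import Counter

def entropy(rows, target):
    # Entropy(S) = sum over classes I of -p(I) * log2 p(I)
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    # Gain(S, A) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)
    total = len(rows)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == v]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    classes = set(r[target] for r in rows)
    if len(classes) == 1:                 # all records share one class: leaf node
        return classes.pop()
    if not attributes:                    # no attributes left: take the majority class
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    # Pick the attribute with the highest information gain to become the next node.
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, rest, target)
    return tree

# Tiny made-up rows in the spirit of the Play ball table, just to show the call.
rows = [
    {"Wind": "Weak", "Play ball": "Yes"},
    {"Wind": "Strong", "Play ball": "No"},
    {"Wind": "Weak", "Play ball": "Yes"},
]
print(id3(rows, ["Wind"], "Play ball"))   # e.g. {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}
```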

Attribute selection (cont.)

Attributes

In the example table (fig 1), Day, Outlook, Temperature, Humidity, Wind and Play ball are the attributes.

Class(C) or Classifier
Among these attributes, Play ball is referred to as the Class (C) or classifier: based on Outlook, Temperature, Humidity and Wind we need to decide whether we can play ball or not, and that is why Play ball is the classifier used to make the decision.

Collection (S)

All the records in the table together are referred to as the Collection (S).

Entropy Calculation
Entropy is a measure from information theory. Here we will see how to calculate the entropy of a given set of data; the test data used here is the fig 1 data.

Entropy Calculation (cont.)


Entropy(S) = Σ -p(I)log2p(I), where the summation runs over every class I. p(I) refers to the proportion of S belonging to class I. In the example table we have two classes {No, Yes} with counts {5, 9}, and the collection size is |S| = 14. So p(I) over C for the entire collection is No (5/14) and Yes (9/14).

Entropy Calculation (cont.)


log2p(I) refers to log2(5/14) and log2(9/14) over C. Σ is taken over C, i.e. it is the summation of the terms for all the classifier values. In this case it is the sum of the No and Yes terms: Entropy(S) = -p(No)log2p(No) - p(Yes)log2p(Yes).

Entropy Calculation (cont.)


Entropy(S) = Σ -p(I)log2p(I)
=> Entropy(S) = -p(No)log2p(No) - p(Yes)log2p(Yes)
=> Entropy(S) = ( -(5/14)log2(5/14) ) + ( -(9/14)log2(9/14) )
=> Entropy(S) = ( -0.35714 x -1.48543 ) + ( -0.64286 x -0.63743 )
=> Entropy(S) = 0.53051 + 0.40977 = 0.940

So the Entropy of S is 0.940
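As a quick check of this arithmetic, here is a minimal Python sketch; the entropy helper and the class-count list are ours, assuming the 5 No / 9 Yes counts from the example table.

```python
import math

def entropy(counts):
    # Entropy = sum over classes of -p(I) * log2 p(I); empty classes contribute 0.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([5, 9]), 3))  # 0.94  (5 "No" and 9 "Yes" out of 14 records)
```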

Entropy Calculation (cont.)


Entropy is 0 if all members of S belong to the same class (the data is perfectly classified). For a two-class problem, the range of entropy is 0 ("perfectly classified") to 1 ("totally random").
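For instance, with the same 14 records (a quick check, not from the slides): if all 14 belong to one class, Entropy = -(14/14)log2(14/14) = 0; if the split is 7 and 7, Entropy = -(7/14)log2(7/14) - (7/14)log2(7/14) = 0.5 + 0.5 = 1.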

Information Gain G(S,A)


Information gain is the measure used to select a particular attribute to be a decision node of the decision tree. Information gain is written G(S, A), where S is the collection of data in the data set and A is the attribute for which information gain will be calculated over the collection S. So Gain(S, Wind) refers to the gain of Wind over S.

Information Gain G(S,A) (cont.)


Gain(S, A) = Entropy(S) - Σ ( ( |Sv|/|S| ) x Entropy(Sv) )
Where S is the total collection of records, A is the attribute for which gain will be calculated, and v ranges over all possible values of attribute A; for instance, if we consider the Wind attribute, then the set of values v is { Weak, Strong }.

Information Gain G(S,A) (cont.)


Sv is the subset of S having value v, and |Sv| is its number of elements; for instance |Sweak| = 8 and |Sstrong| = 6. Σ is the summation of ( ( |Sv|/|S| ) x Entropy(Sv) ) over all the values in the set of v, i.e. ( ( |Sweak|/|S| ) x Entropy(Sweak) ) + ( ( |Sstrong|/|S| ) x Entropy(Sstrong) ).

Information Gain G(S,A) (cont.)


If we want to calculate the information gain of Wind over the collection S using the following formula,

Gain(S, A) = Entropy(S) - Σ ( ( |Sv|/|S| ) x Entropy(Sv) )

then it will be as below:
=> Gain(S, Wind) = Entropy(S) - ( ( |Sweak|/|S| ) x Entropy(Sweak) ) - ( ( |Sstrong|/|S| ) x Entropy(Sstrong) )

Information Gain G(S,A) (cont.)


=> Gain(S, Wind) = 0.940 - ( (8/14) x Entropy(Sweak) ) - ( (6/14) x Entropy(Sstrong) )
Where
Entropy(Sweak) = Σ -p(I)log2p(I) = ( -(2/8) x log2(2/8) ) + ( -(6/8) x log2(6/8) ) = 0.811
Entropy(Sstrong) = Σ -p(I)log2p(I) = ( -(3/6) x log2(3/6) ) + ( -(3/6) x log2(3/6) ) = 1.000

Information Gain G(S,A) (cont.)


So Gain(S, Wind) will be as below:
=> Gain(S, Wind) = 0.940 - ( (8/14) x Entropy(Sweak) ) - ( (6/14) x Entropy(Sstrong) )
=> Gain(S, Wind) = 0.940 - ( (8/14) x 0.811 ) - ( (6/14) x 1.000 )
=> Gain(S, Wind) = 0.940 - 0.463 - 0.429 = 0.048
So the information gain of Wind over S is 0.048.
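The same calculation as a short Python sketch; the helper names and count lists are ours, while the per-value class counts (Weak: 2 No / 6 Yes, Strong: 3 No / 3 Yes) come from the slides.

```python
import math

def entropy(counts):
    # Entropy = sum over classes of -p(I) * log2 p(I); empty classes contribute 0.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(total_counts, subsets):
    # Gain = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)
    total = sum(total_counts)
    remainder = sum((sum(sub) / total) * entropy(sub) for sub in subsets)
    return entropy(total_counts) - remainder

# S has 5 No / 9 Yes; Wind splits it into Sweak (2 No / 6 Yes) and Sstrong (3 No / 3 Yes).
print(round(gain([5, 9], [[2, 6], [3, 3]]), 3))  # 0.048
```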
