Académique Documents
Professionnel Documents
Culture Documents
An Example
Outlook sunny sunny overcast rain rain rain overcast sunny sunny rain sunny overcast overcast rain Tempreature Humidity Windy Class hot high false N hot high true N hot high false P mild high false P cool normal false P cool normal true N cool normal true P mild high false N cool normal false P mild normal false P mild normal true P mild high true P hot normal false P mild high true N
humidity
normal P
windy
true N false P
high N
Yes Yes
Grade = B Grade = C
No
No
No
Etc...
3
1 of 2
2 of 2
Child
Leaf
Leaf
Introduction
Decision Trees
Powerful/popular for classification & prediction Represent rules
Rules can be expressed in English
IF Age <=43 & Sex = Male & Credit Card Insurance = No THEN Life Insurance Promotion = No
Useful to explore data to gain insight into relationships of a large number of candidate input variables to a target (output) variable
Scoring
Often it is useful to show the proportion of the data in each of the desired classes Clarify Fig 6.2
10
Gender
Age
Good Split
12
Split Criteria
The best split is defined as one that does the best job of separating the data into groups where a single class predominates in each group Measure used to evaluate a potential split is purity
The best split is one that increases purity of the sub-sets by the greatest amount A good split also creates nodes of similar size or at least does not create very small nodes
13
15
Pruning
Decision Trees can often be simplified or pruned:
CART C5 Stability-based
16
be complex
18
Alternative Representations
Box Diagram Tree Ring Diagram Decision Table Supplementary Material
19
End of Chapter 6
20