
Applied Predictive Analytics

for Business
Decision Trees

1
Today's Agenda

Class Business Items


Homework 2, Homework 3
Projects, Review Session, Exam
Finish Decision Trees
Bagging
Random Forests
Boosting

2
Decision Tree Representation

Internal nodes test attributes
Branches correspond to attribute values
Each leaf node assigns a classification or value

3
Classification and Regression Trees

The tree-building algorithm is formally called recursive partitioning.
It's a greedy algorithm: it doesn't look ahead; it takes the best split currently available.
The algorithm only considers binary splits.
Two types, based on Y:
Classification trees: Y is categorical.
Regression trees: Y is quantitative.

4
Top-Down Induction of Decision Trees

1. A ← the best decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For the binary split according to the decision attribute A, create two new descendants.
4. Sort the training examples to the leaf nodes.
5. If the training examples are perfectly classified, or a node-size stopping rule is met, then STOP; else iterate over the new leaf nodes.

5
Regression Trees
Customer  Assets  Income  Amount
1         H       75      150
2         L       50      30
3         M       25      25
4         M       50      100
5         M       100     110
6         H       25      200
7         L       25      15
8         M       75      90

Two predictors:
Assets = {Low, Medium, High}
Income, in $1000s
Response (quantitative):
Borrowing amount, in $1000s
Goal: Can you create a decision rule (a tree!) to predict the borrowing amount?

6
Building Regression Trees
Regression tree building process:
Suppose we are starting at the root node.
The algorithm considers all the partitions of the predictors into a left and a right node and computes the RSS (residual sum of squares) for each partition:

RSS = Σ_{i ∈ left node} (yᵢ − ȳ_left)² + Σ_{i ∈ right node} (yᵢ − ȳ_right)²

It then picks the partition that gives the lowest RSS.


We continue the process at each node until some stopping
rule is met.
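A minimal R sketch of this computation for a single candidate split (the function and variable names are my own, not from the course code):

  # RSS of one candidate binary split: squared deviations from each
  # child node's mean response, summed over both children.
  split_rss <- function(y, goes_left) {
    left  <- y[goes_left]
    right <- y[!goes_left]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }

  # Example: the borrowing data on slide 6, split into Assets = L or M vs. Assets = H
  amount <- c(150, 30, 25, 100, 110, 200, 15, 90)
  assets <- c("H", "L", "M", "M", "M", "H", "L", "M")
  split_rss(amount, goes_left = assets %in% c("L", "M"))   # about 10383

The split with the smallest value of this quantity is the one the algorithm keeps.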

7
Recursive Partitioning in Action

Suppose we are starting at the root node. We consider:

Partition Left Node Right Node RSS


1 Asset = M or H Asset = L ?
2 Asset = L or H Asset = M ?
3 Asset = L or M Asset = H ?
4 Income < 37.5 Income > 37.5 ?
5 Income < 62.5 Income > 62.5 ?
6 Income < 87.5 Income > 87.5 ?

8
Partition 3
Partition Left Node Right Node RSS
1 Asset = M or H Asset = L ?
2 Asset = L or H Asset = M ?
3 Asset = L or M Asset = H ?
4 Income < 37.5 Income > 37.5 ?
5 Income < 62.5 Income > 62.5 ?
6 Income < 87.5 Income > 87.5 ?

Customer  Assets  Income  Amount  Node   RSS contribution
1         H       75      150     right  625
2         L       50      30      left   1003
3         M       25      25      left   1344
4         M       50      100     left   1469
5         M       100     110     left   2336
6         H       25      200     right  625
7         L       25      15      left   2178
8         M       75      90      left   803

Left node (Asset = L or M): mean Amount = 61.67, RSS = 9133
Right node (Asset = H): mean Amount = 175.00, RSS = 1250
Total RSS for Partition 3: 9133 + 1250 = 10383

9
Partition 4
Partition Left Node Right Node RSS
1 Asset = M or H Asset = L ?
2 Asset = L or H Asset = M ?
3 Asset = L or M Asset = H ?
4 Income < 37.5 Income > 37.5 ?
5 Income < 62.5 Income > 62.5 ?
6 Income < 87.5 Income > 87.5 ?

Customer  Assets  Income  Amount  Node   RSS contribution
1         H       75      150     right  2916
2         L       50      30      right  4356
3         M       25      25      left   3025
4         M       50      100     right  16
5         M       100     110     right  196
6         H       25      200     left   14400
7         L       25      15      left   4225
8         M       75      90      right  36

Left node (Income < 37.5): mean Amount = 80.00, RSS = 21650
Right node (Income > 37.5): mean Amount = 96.00, RSS = 7520
Total RSS for Partition 4: 21650 + 7520 = 29170

Note: RSS for Partition 3 < RSS for Partition 4, so Partition 3 is chosen over Partition 4.
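As a check on slides 8-10, here is a short R script (my own sketch, not the course code) that evaluates all six candidate partitions and reports each total RSS; Partition 3 comes out lowest:

  # Toy borrowing data from slide 6
  loans <- data.frame(
    assets = c("H", "L", "M", "M", "M", "H", "L", "M"),
    income = c(75, 50, 25, 50, 100, 25, 25, 75),
    amount = c(150, 30, 25, 100, 110, 200, 15, 90)
  )

  # Total RSS of a binary split, given a logical vector marking the left node
  split_rss <- function(y, goes_left) {
    left  <- y[goes_left]
    right <- y[!goes_left]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }

  partitions <- list(
    "1: M or H vs. L"  = loans$assets %in% c("M", "H"),
    "2: L or H vs. M"  = loans$assets %in% c("L", "H"),
    "3: L or M vs. H"  = loans$assets %in% c("L", "M"),
    "4: income < 37.5" = loans$income < 37.5,
    "5: income < 62.5" = loans$income < 62.5,
    "6: income < 87.5" = loans$income < 87.5
  )

  round(sapply(partitions, function(left) split_rss(loans$amount, left)))
  # Partition 3 has the smallest total RSS (about 10383), so it is chosen first.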

10
Executing Tree Commands in R
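The R output on the original slide is not reproduced here. As a minimal sketch of the kind of commands involved, assuming the tree package and the ISLR Hitters data used on the next slide:

  library(tree)   # tree-fitting package used in the ISLR labs
  library(ISLR)   # provides the Hitters data set

  hitters <- na.omit(Hitters)                        # drop players with missing Salary
  fit <- tree(log(Salary) ~ Years + Hits, data = hitters)

  summary(fit)           # splits used, number of terminal nodes, residual deviance
  plot(fit)              # draw the tree
  text(fit, pretty = 0)  # label the splits and leaf predictions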

11
Decision Tree with the Hitters Data Set

12
Cost Complexity Pruning
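The pruning slides are figure-only; as a minimal sketch (my own, not the course code), cost-complexity pruning with cv.tree and prune.tree from the tree package, using the same Hitters fit as above, might look like:

  library(tree)
  library(ISLR)

  hitters <- na.omit(Hitters)
  fit <- tree(log(Salary) ~ Years + Hits, data = hitters)

  # Cross-validate over the cost-complexity sequence of subtrees
  cv_fit <- cv.tree(fit)
  plot(cv_fit$size, cv_fit$dev, type = "b",
       xlab = "Tree size (terminal nodes)", ylab = "CV deviance")

  # Prune back to the size with the smallest cross-validated deviance
  best_size <- cv_fit$size[which.min(cv_fit$dev)]
  pruned <- prune.tree(fit, best = best_size)
  plot(pruned)
  text(pruned, pretty = 0)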

13
Pruning Through Cost Complexity Analysis

14
Pruned Tree

15
Regression Tree Lab

16
Classification Trees
With a regression tree, the predicted response for an observation is given by the mean response of the training observations that belong to the same terminal node.

For a classification tree, we predict that each observation belongs to the most commonly occurring class among the training observations in the region to which it belongs.
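As a small illustration (my own sketch, using the built-in iris data rather than the course data set), a classification tree in R predicts the majority class of each leaf:

  library(tree)

  # Each terminal node predicts the most common species among the
  # training observations that fall into it.
  fit <- tree(Species ~ ., data = iris)
  head(predict(fit, iris, type = "class"))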

17
Growing a Classification Tree
Similar to regression, but we can't use RSS.

Classification error would be a natural criterion, but it is not sensitive enough for tree growing.

Instead, use the Gini index or cross-entropy.

The basic idea behind these indices is that each split is chosen to reduce node impurity.
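The slide does not define these measures. With p̂ₖ the proportion of class k observations in a node, the standard definitions (not given on the slide) are:

Gini index:     G = Σₖ p̂ₖ (1 − p̂ₖ)
Cross-entropy:  D = − Σₖ p̂ₖ log(p̂ₖ)

Both are close to zero when the node is nearly pure (one p̂ₖ near 1), which is why a split that reduces them reduces impurity.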
18
Entropy

S is a sample of training examples
p⊕ is the proportion of positive examples in S
p⊖ is the proportion of negative examples in S
Entropy measures the impurity of S:

Entropy(S) = − p⊕ log₂(p⊕) − p⊖ log₂(p⊖)

19
Information Gain

Gain(S, A) = expected reduction in entropy due to sorting on A:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) · Entropy(Sᵥ)
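A small R sketch of these two quantities (the helper names are mine, not from the course code); on the Play Tennis data that follows, the root node has 9 Yes and 5 No, giving an entropy of about 0.940:

  # Entropy of a node, given the count of observations in each class
  entropy <- function(counts) {
    p <- counts[counts > 0] / sum(counts)
    -sum(p * log2(p))
  }

  # Information gain from splitting a parent node into child nodes;
  # `children` is a list of class-count vectors, one per attribute value
  info_gain <- function(parent, children) {
    weighted <- sapply(children, function(ch) sum(ch) / sum(parent) * entropy(ch))
    entropy(parent) - sum(weighted)
  }

  entropy(c(9, 5))   # about 0.940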

20
Decision Tree for Play Tennis

21
Training Examples
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

22
Which attribute is the best classifier?

23
Which attribute should be tested?

[Figure: partial tree after splitting on Outlook. The full training set has entropy E = 0.940; the Sunny and Rain branches each have entropy E = 0.971, and the Overcast branch is pure.]

Which attribute should be tested here?

24
Test Temperature
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Ssunny = {D1, D2, D8, D9, D11}


Gain(Ssunny, Temperature) = 0.971 − (2/5)·1 [Mild] − (2/5)·0 [Hot] − (1/5)·0 [Cool] = 0.571
25
Test Wind
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Ssunny = {D1, D2, D8, D9, D11}


Gain(Ssunny, Wind) = 0.971 − (3/5)·0.918 [Weak] − (2/5)·1.0 [Strong] = 0.019
26
Test Humidity
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Ssunny = {D1, D2, D8, D9, D11}


Gain(Ssunny, Humidity) = 0.971 − (3/5)·0 [High] − (2/5)·0 [Normal] = 0.971
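As a check on the three calculations above, a short R script (my own sketch; class counts are given as c(Yes, No)):

  entropy <- function(counts) {
    p <- counts[counts > 0] / sum(counts)
    -sum(p * log2(p))
  }
  info_gain <- function(parent, children) {
    entropy(parent) - sum(sapply(children, function(ch) sum(ch) / sum(parent) * entropy(ch)))
  }

  # Ssunny = {D1, D2, D8, D9, D11}: 2 Yes, 3 No
  info_gain(c(2, 3), list(Hot = c(0, 2), Mild = c(1, 1), Cool = c(1, 0)))   # Temperature: 0.571
  info_gain(c(2, 3), list(Weak = c(1, 2), Strong = c(1, 1)))                # Wind: about 0.02
  info_gain(c(2, 3), list(High = c(0, 3), Normal = c(2, 0)))                # Humidity: 0.971
  # Humidity has the largest gain, so it is tested at the Sunny branch.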
27
Regression Tree Lab

28
Advantages of Trees
Most common transformations of the predictors will not change the tree results. (Transformations of the response could make a difference.)
Interaction terms (often used in multiple regression) are not needed; interactions are handled automatically within the context of a tree.
They are easy to visualize. (Unless you have too many
branches.)
There is no need to have dummy variables for categorical data.
They are easy to explain and use.
Missing values can be handled easily.
Many people use trees as an exploratory tool.

29
Disadvantages of Trees
Trees are generally not robust: they can be severely unstable when small changes are made to the data.
Regression trees give only the mean (or mode) of the Y values in each leaf as predictions. Why just a mean?
Often they don't predict well!

There are fixes for these disadvantages, but they come at a price: they take away from the advantages!

30
