Outline
Decision Tree for PlayTennis
[Figure: decision tree rooted at Outlook, with Yes/No leaves]
Instituto Superior de Estatística e Gestão de Informação
Universidade Nova de Lisboa
Decision Tree for PlayTennis
[Figure: complete PlayTennis decision tree rooted at Outlook, with Yes/No leaves]

Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak
[Figure: decision tree rooted at Outlook; the Sunny branch tests Wind (Strong → No, Weak → Yes); the Overcast and Rain branches lead to No]
Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak
[Figure: decision tree rooted at Outlook; the Sunny branch leads to Yes; the Overcast and Rain branches test Wind (Weak → Yes, Strong → No)]
Decision Tree
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
Top-Down Induction of Decision Trees (ID3)
Entropy
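The slide's formula did not survive extraction; for a two-class sample S, the standard definition is Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋. A minimal sketch (the [9+,5−] counts are the PlayTennis figures used on the later slides):

```python
import math

def entropy(pos, neg):
    """Two-class entropy in bits: -p+ log2 p+ - p- log2 p-."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result

# S = [9+, 5-] from the PlayTennis training set
print(round(entropy(9, 5), 3))  # 0.940
```

Entropy is 0 for a pure sample and peaks at 1 bit when the classes are balanced.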
Information Gain
Training Examples
[Figure: two candidate splits of S = [9+,5−] (E = 0.940), one on Humidity and one on Wind]
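The comparison in the figure uses Gain(S, A) = Entropy(S) − Σᵥ |Sᵥ|/|S| · Entropy(Sᵥ). A sketch of the computation; the per-branch counts for Humidity and Wind are assumed from the classic PlayTennis table, since only the totals survived extraction:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def gain(parent, subsets):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total = sum(sum(s) for s in subsets)
    return entropy(parent) - sum(sum(s) / total * entropy(s) for s in subsets)

S = (9, 5)  # [9+, 5-], E = 0.940
# Per-branch counts assumed from the classic PlayTennis table:
humidity = [(3, 4), (6, 1)]   # High, Normal
wind     = [(6, 2), (3, 3)]   # Weak, Strong
print(round(gain(S, humidity), 3))  # 0.152
print(round(gain(S, wind), 3))      # 0.048
```

Humidity yields the larger gain of the two, so it would be preferred over Wind at this node.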
Selecting the Next Attribute
[Figure: candidate splits of S = [9+,5−] (E = 0.940) on Outlook and Temp; the Outlook branches are Sunny, Overcast, Rain]

ID3 Algorithm
[Figure: Outlook chosen as root for the full sample [D1, D2, ..., D14] = [9+,5−]]
ID3 Algorithm
[Figure: partially grown tree rooted at Outlook after the first split, with Yes/No leaves]

Hypothesis Space Search ID3
[Figure: ID3 searches the space of decision trees, growing candidate trees that split on attributes A1, A2, A3, A4 over +/− labeled examples]
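The top-down, greedy search the slides describe can be sketched compactly. This is an illustrative implementation, not the slides' own code; the toy rows below are assumptions in the spirit of PlayTennis:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def id3(rows, labels, attributes):
    """Grow a decision tree top-down, greedily choosing the
    highest-information-gain attribute at each node."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attributes:                 # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        rem = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - rem
    best = max(attributes, key=gain)
    tree = {}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[v] = id3(sub_rows, sub_labels,
                      [a for a in attributes if a != best])
    return (best, tree)

# Toy usage (illustrative rows, not the full PlayTennis table):
rows = [{"Outlook": "Sunny", "Wind": "Weak"},
        {"Outlook": "Sunny", "Wind": "Strong"},
        {"Outlook": "Overcast", "Wind": "Weak"},
        {"Outlook": "Overcast", "Wind": "Strong"}]
root, branches = id3(rows, ["No", "No", "Yes", "Yes"], ["Outlook", "Wind"])
print(root)  # Outlook
```

Because the search is greedy and never backtracks, ID3 explores a single path through the hypothesis space rather than enumerating all trees.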
Continuous Valued Attributes
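A standard way to handle a continuous attribute (as in C4.5-style extensions of ID3) is to sort its values and take midpoints where the class label changes as candidate thresholds. This sketch is an assumption about what the slide covered; the temperature values are illustrative:

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values whose class label changes."""
    pairs = sorted(zip(values, labels))
    return [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1] and pairs[i][0] != pairs[i + 1][0]]

# Illustrative temperature column with PlayTennis-style labels
temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(candidate_thresholds(temps, labels))  # [54.0, 85.0]
```

Each candidate threshold then defines a boolean attribute (e.g. Temperature > 54) that can be scored by information gain like any discrete attribute.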
Unknown Attribute Values
Occam's Razor
Overfitting
Boosting: Combining Classifiers
Cross Validation
Bagging
Boosting
INTUITION
Combining the predictions of an ensemble is often more accurate than any single classifier.
Reasons:
- It is easy to find rules of thumb that are often correct, but hard to find a single highly accurate prediction rule.
- If the training examples are few and the hypothesis space is large, there are several equally accurate classifiers.
- The hypothesis space may not contain the true function, but it may contain several good approximations of it.
- Exhaustive global search of the hypothesis space is expensive, so we can instead combine the predictions of several locally accurate classifiers.
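The intuition can be made concrete: if several classifiers err roughly independently, a majority vote errs only when most of them do. A small simulation sketch (the 70% accuracy figure and the ensemble size are illustrative assumptions):

```python
import random

random.seed(0)

def weak_predict(true_label):
    """A hypothetical weak classifier that is right 70% of the time."""
    return true_label if random.random() < 0.7 else 1 - true_label

def majority_vote(true_label, n_classifiers=11):
    votes = sum(weak_predict(true_label) for _ in range(n_classifiers))
    return 1 if votes > n_classifiers / 2 else 0

trials = 10_000
single   = sum(weak_predict(1) for _ in range(trials)) / trials
ensemble = sum(majority_vote(1) for _ in range(trials)) / trials
print(f"single: {single:.2f}, ensemble of 11: {ensemble:.2f}")
```

With independent errors the vote of 11 such classifiers is correct far more often than any one of them; real ensembles gain less because their members' errors are correlated.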
Boosting

Boosting (Algorithm)
Outline
Background
AdaBoost Algorithm
Theory/Interpretations
What's So Good About AdaBoost
Simple to implement
AdaBoost Terminology
Each training sample has a weight, which determines the probability of being selected for training the component classifier.
Find the Weak Classifier
The Algorithm Core: Reweighting
y · h(x) = +1 when the weak classifier is correct on (x, y): that example's weight is decreased.
y · h(x) = −1 when it is wrong: that example's weight is increased.
Reweighting
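The update the two cases above refer to is the standard AdaBoost reweighting rule, stated here from the usual formulation since the slide's formula did not survive extraction:

```latex
D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t\, y_i\, h_t(x_i)}}{Z_t},
\qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
```

Here ε_t is the weighted error of h_t and Z_t normalizes D_{t+1} back to a distribution; the exponent is negative (the weight shrinks) exactly when y_i h_t(x_i) = +1.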
Algorithm Recapitulation
[Figure: AdaBoost illustrated round by round, starting at t = 1]
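The recapitulated algorithm can be sketched end to end. This is a minimal AdaBoost using one-dimensional threshold stumps as the weak learner; the data and the stump form are illustrative assumptions, not the slides' example:

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: +polarity if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity

def train_stump(xs, ys, weights):
    """Pick the threshold/polarity pair minimizing weighted error."""
    best = None
    for threshold in xs:
        for polarity in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n                     # D_1: uniform
    ensemble = []                               # (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = max(err, 1e-10)                   # guard against log of zero error
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Reweight: increase weight on mistakes, decrease on correct points
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)                        # Z_t: renormalize to a distribution
        weights = [w / z for w in weights]
    def predict(x):
        score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
        return 1 if score >= 0 else -1
    return predict

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, 1, 1, 1, -1, -1]   # not separable by any single stump
clf = adaboost(xs, ys, rounds=3)
print([clf(x) for x in xs])  # [-1, -1, 1, 1, 1, 1, -1, -1]
```

No single stump can classify this interval-shaped target, but the weighted vote of three stumps fits it exactly, mirroring the three-round toy example on the next slide.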
AdaBoost (Example)
[Figure: toy example over three boosting rounds (ROUND 1, ROUND 2, ROUND 3); the final classifier combines the three weak classifiers]
Advantages
- Very simple to implement
- Performs feature selection, resulting in a relatively simple classifier
- Fairly good generalization
Disadvantages
- Suboptimal solution
- Sensitive to noisy data and outliers