Académique Documents
Professionnel Documents
Culture Documents
Cheng Li
chengli@ccs.neu.edu
College of Computer and Information Science
Northeastern University
Gradient Boosting
P
I Fit an additive model (ensemble) t t ht (x) in a forward
stage-wise manner.
I In each stage, introduce a weak learner to compensate the
shortcomings of existing weak learners.
I In Adaboost,shortcomings are identified by high-weight data
points.
What is Gradient Boosting
Simple solution:
You wish to improve the model such that
F (x1 ) + h(x1 ) = y1
F (x2 ) + h(x2 ) = y2
...
F (xn ) + h(xn ) = yn
Gradient Boosting for Regression
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Gradient Boosting for Regression
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Simple solution:
Or, equivalently, you wish
h(x1 ) = y1 F (x1 )
h(x2 ) = y2 F (x2 )
...
h(xn ) = yn F (xn )
Simple solution:
yi F (xi ) are called residuals. These are the parts that existing
model F cannot do well.
The role of h is to compensate the shortcoming of existing model
F.
Gradient Boosting for Regression
Simple solution:
yi F (xi ) are called residuals. These are the parts that existing
model F cannot do well.
The role of h is to compensate the shortcoming of existing model
F.
If the new model F + h is still not satisfactory,
Gradient Boosting for Regression
Simple solution:
yi F (xi ) are called residuals. These are the parts that existing
model F cannot do well.
The role of h is to compensate the shortcoming of existing model
F.
If the new model F + h is still not satisfactory, we can add another
regression tree...
Gradient Boosting for Regression
Simple solution:
yi F (xi ) are called residuals. These are the parts that existing
model F cannot do well.
The role of h is to compensate the shortcoming of existing model
F.
If the new model F + h is still not satisfactory, we can add another
regression tree...
We are improving the predictions of training data, is the procedure
also useful for test data?
Gradient Boosting for Regression
Simple solution:
yi F (xi ) are called residuals. These are the parts that existing
model F cannot do well.
The role of h is to compensate the shortcoming of existing model
F.
If the new model F + h is still not satisfactory, we can add another
regression tree...
We are improving the predictions of training data, is the procedure
also useful for test data?
Yes! Because we are building a model, and the model can be
applied to test data as well.
Gradient Boosting for Regression
Simple solution:
yi F (xi ) are called residuals. These are the parts that existing
model F cannot do well.
The role of h is to compensate the shortcoming of existing model
F.
If the new model F + h is still not satisfactory, we can add another
regression tree...
We are improving the predictions of training data, is the procedure
also useful for test data?
Yes! Because we are building a model, and the model can be
applied to test data as well.
How is this related to gradient descent?
Gradient Boosting for Regression
Gradient Descent
Minimize a function by moving in the opposite direction of the
gradient.
J
i := i
i
L(yi , F (xi ))
g (xi ) = = yi F (xi )
F (xi )
Pn
i y
start with an initial model, say, F (x) = i=1n
iterate until converge:
calculate negative gradients g (xi )
fit a regression tree h to negative gradients g (xi )
F := F + h, where = 1
Gradient Boosting for Regression
L(yi , F (xi ))
g (xi ) = = yi F (xi )
F (xi )
Pn
i y
start with an initial model, say, F (x) = i=1n
iterate until converge:
calculate negative gradients g (xi )
fit a regression tree h to negative gradients g (xi )
F := F + h, where = 1
L(y , F ) = |y F |
Gradient Boosting for Regression
L(y , F ) = |y F |
L(y , F ) = |y F |
yi 0.5 1.2 2 5*
F (xi ) 0.6 1.4 1.5 1.7
Square loss 0.005 0.02 0.125 5.445
Absolute loss 0.1 0.2 0.5 3.3
Huber loss( = 0.5) 0.005 0.02 0.125 1.525
Gradient Boosting for Regression
L(yi , F (xi ))
g (xi ) = = sign(yi F (xi ))
F (xi )
Pn
i y
start with an initial model, say, F (x) = i=1n
iterate until converge:
calculate gradients g (xi )
fit a regression tree h to negative gradients g (xi )
F := F + h
Gradient Boosting for Regression
L(yi , F (xi ))
g (xi ) =
F (xi )
(
yi F (xi ) |yi F (xi )|
=
sign(yi F (xi )) |yi F (xi )| >
Pn
y
i
start with an initial model, say, F (x) = i=1n
iterate until converge:
calculate negative gradients g (xi )
fit a regression tree h to negative gradients g (xi )
F := F + h
Gradient Boosting for Regression
In general,
negative gradients 6 residuals
We should follow negative gradients rather than residuals. Why?
Gradient Boosting for Regression
Update by Residual:
h(xi ) = yi F (xi )
Problem
Recognize the given hand written capital letter.
I Multi-class classification
I 26 classes. A,B,C,...,Z
Data Set
I http://archive.ics.uci.edu/ml/datasets/Letter+
Recognition
I 20000 data points, 16 features
Gradient Boosting for Classification
Feature Extraction
Model
I 26 score functions (our models): FA , FB , FC , ..., FZ .
I FA (x) assigns a score for class A
I scores are used to calculate probabilities
e FA (x)
PA (x) = PZ
Fc (x)
c=A e
e FB (x)
PB (x) = PZ
Fc (x)
c=A e
...
e FZ (x)
PZ (x) = PZ
Fc (x)
c=A e
Goal
I minimize the total loss (KL-divergence)
I for each data point, we wish the predicted probability
distribution to match the true probability distribution as
closely as possible
Gradient Boosting for Classification
Goal
I minimize the total loss (KL-divergence)
I for each data point, we wish the predicted probability
distribution to match the true probability distribution as
closely as possible
I we achieve this goal by adjusting our models FA , FB , ..., FZ .
Gradient Boosting for Regression: Review
Differences
I FA , FB , ..., FZ vs F
I a matrix of parameters to optimize vs a column of parameters
to optimize
FA (x1 ) FB (x1 ) ... FZ (x1 )
FA (x2 ) FB (x2 ) ... FZ (x2 )
... ... ... ...
FA (xn ) FB (xn ) ... FZ (xn )
I a matrix of gradients vs a column of gradients
L L
FA (x1 ) FB (x1 ) ... FZL(x1 )
L L L
FA (x2 ) FB (x2 ) ... FZ (x2 )
... ... ... ...
L L L
FA (xn ) FB (xn ) ... FZ (xn )
Gradient Boosting for Classification
round 0
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
1 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 N 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 G 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ
1 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 N 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
2 I 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
3 D 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
4 N 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
5 G 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
2 I -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
3 D -0.04 -0.04 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
4 N -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
5 G -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Gradient Boosting for Classification
(
0.98 feature 10 of x 2.0
hA (x) =
0.07 feature 10 of x > 2.0
(
0.07 feature 15 of x 8.0
hB (x) =
0.22 feature 15 of x > 8.0
...
(
0.07 feature 8 of x 8.0
hZ (x) =
0.82 feature 8 of x > 8.0
FA := FA + A hA
FB := FB + B hB
...
FZ := FZ + Z hZ
Gradient Boosting for Classification
round 1
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
1 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 N 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 G 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ
1 T -0.08 -0.07 -0.06 -0.07 -0.02 -0.02 -0.08 -0.02 -0.03 -0.03 -0.06 -0.04 -0.08 -0.08 -0.07 -0.07 -0.02 -0.04 -0.04 0.59 -0.01 -0.07 -0.07 -0.05 -0.06 -0.07
2 I -0.08 0.23 -0.06 -0.07 -0.02 -0.02 0.16 -0.02 -0.03 -0.03 -0.06 -0.04 -0.08 -0.08 -0.07 -0.07 -0.02 -0.04 -0.04 -0.07 -0.01 -0.07 -0.07 -0.05 -0.06 -0.07
3 D -0.08 0.23 -0.06 -0.07 -0.02 -0.02 -0.08 -0.02 -0.03 -0.03 -0.06 -0.04 -0.08 -0.08 -0.07 -0.07 -0.02 -0.04 -0.04 -0.07 -0.01 -0.07 -0.07 -0.05 -0.06 -0.07
4 N -0.08 -0.07 -0.06 -0.07 -0.02 -0.02 0.16 -0.02 -0.03 -0.03 0.26 -0.04 -0.08 0.3 -0.07 -0.07 -0.02 -0.04 -0.04 -0.07 -0.01 -0.07 -0.07 -0.05 -0.06 -0.07
5 G -0.08 0.23 -0.06 -0.07 -0.02 -0.02 0.16 -0.02 -0.03 -0.03 -0.06 -0.04 -0.08 -0.08 -0.07 -0.07 -0.02 -0.04 -0.04 -0.07 -0.01 -0.07 -0.07 -0.05 -0.06 -0.07
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.07 0.04 0.04 0.04 0.04 0.04 0.04
2 I 0.04 0.05 0.04 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
3 D 0.04 0.05 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
4 N 0.04 0.04 0.04 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.05 0.04 0.04 0.05 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
5 G 0.04 0.05 0.04 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 0.93 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
2 I -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 -0.05 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
3 D -0.04 -0.05 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
4 N -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.05 -0.04 -0.04 -0.04 -0.05 -0.04 -0.04 0.95 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
5 G -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 0.95 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Gradient Boosting for Classification
(
0.37 feature 10 of x 2.0
hA (x) =
0.07 feature 10 of x > 2.0
(
0.07 feature 14 of x 5.0
hB (x) =
0.22 feature 14 of x > 5.0
...
(
0.07 feature 8 of x 8.0
hZ (x) =
0.35 feature 8 of x > 8.0
FA := FA + A hA
FB := FB + B hB
...
FZ := FZ + Z hZ
Gradient Boosting for Classification
round 2
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
1 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 N 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 G 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ
1 T -0.15 -0.14 -0.12 -0.14 -0.03 0.28 -0.14 -0.04 1.49 -0.07 -0.11 -0.08 -0.14 -0.17 -0.13 -0.13 -0.04 -0.11 -0.07 1.05 0.19 0.25 -0.16 -0.09 0.33 -0.14
2 I -0.15 0.16 -0.12 -0.14 -0.03 -0.08 0.33 -0.04 -0.07 -0.07 -0.11 -0.08 -0.14 -0.17 -0.13 -0.13 -0.04 -0.11 -0.07 -0.11 -0.07 -0.15 -0.16 -0.09 -0.13 -0.14
3 D -0.15 0.16 -0.12 -0.14 -0.03 -0.08 0.1 -0.04 -0.07 -0.07 -0.11 -0.08 -0.14 -0.17 -0.13 -0.13 -0.04 0.19 -0.07 -0.11 -0.07 -0.15 -0.16 -0.09 -0.13 -0.14
4 N -0.15 -0.14 -0.12 -0.14 -0.03 -0.08 0.1 -0.04 -0.07 -0.07 0.46 -0.08 -0.14 0.5 -0.13 -0.13 -0.04 -0.11 -0.07 -0.11 -0.07 -0.15 0.25 -0.09 -0.13 -0.14
5 G -0.15 0.16 -0.12 -0.14 -0.03 -0.08 0.33 -0.04 -0.07 -0.07 -0.11 -0.08 -0.14 -0.17 -0.13 -0.13 -0.04 0.19 -0.07 -0.11 -0.07 -0.15 -0.16 -0.09 -0.13 -0.14
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T 0.03 0.03 0.03 0.03 0.03 0.04 0.03 0.03 0.15 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.09 0.04 0.04 0.03 0.03 0.05 0.03
2 I 0.04 0.05 0.04 0.04 0.04 0.04 0.06 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
3 D 0.04 0.05 0.04 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04
4 N 0.03 0.03 0.03 0.03 0.04 0.04 0.04 0.04 0.04 0.04 0.06 0.04 0.03 0.06 0.03 0.03 0.04 0.04 0.04 0.04 0.04 0.03 0.05 0.04 0.03 0.03
5 G 0.03 0.05 0.04 0.04 0.04 0.04 0.06 0.04 0.04 0.04 0.04 0.04 0.04 0.03 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.04 0.03 0.04 0.04 0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T -0.03 -0.03 -0.03 -0.03 -0.03 -0.04 -0.03 -0.03 -0.15 -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 0.91 -0.04 -0.04 -0.03 -0.03 -0.05 -0.03
2 I -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 -0.06 -0.04 0.96 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
3 D -0.04 -0.05 -0.04 0.96 -0.04 -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04
4 N -0.03 -0.03 -0.03 -0.03 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.06 -0.04 -0.03 0.94 -0.03 -0.03 -0.04 -0.04 -0.04 -0.04 -0.04 -0.03 -0.05 -0.04 -0.03 -0.03
5 G -0.03 -0.05 -0.04 -0.04 -0.04 -0.04 0.94 -0.04 -0.04 -0.04 -0.04 -0.04 -0.04 -0.03 -0.04 -0.04 -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 -0.03 -0.04 -0.04 -0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Gradient Boosting for Classification
round 100
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
1 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 N 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 G 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ
1 T -3.26 -2.7 -2.2 -2.22 -2.48 -0.31 -2.77 -1.19 2.77 0.1 -1.49 -1.02 -1.64 -0.8 -2.4 -3.57 -0.9 -2.45 -0.2 4.61 0.5 -0.71 -1.21 -0.24 0.49 -1.66
2 I -1.64 -1.09 -2.29 -1.8 0.45 -0.43 2.14 -1.56 1.19 1.09 -1.5 -0.5 -3.64 -3.98 -0.39 -2.3 1.42 -0.59 0.27 -2.88 -1.96 -1.67 -4.38 -2.06 -2.95 -1.76
3 D -2.45 0.18 -3.01 0.18 -2.79 -1.7 -2.21 0.43 -1.12 0.32 0.67 -2.16 -2.91 -2.76 -1.92 -3.04 -1.47 -0.48 -1.48 -1.25 -2.25 -3.23 -4.38 0.17 -2.95 -2.65
4 N -3.95 -3.38 -0.22 -0.94 -1.33 -1.38 -1.22 -0.12 -2.33 -3.13 0.58 -0.65 -0.25 2.96 -2.84 -1.82 0.19 0.55 -1.22 -1.25 0.45 -1.8 0.11 -0.69 -1.6 -3.78
5 G -3.14 -0.04 -2.37 -0.78 0.02 -2.68 2.6 -1.48 -1.93 0.42 -1.44 -1.45 -3.36 -3.98 -0.94 -3.42 1.84 1.44 0.62 -1.25 -1.33 -4.41 -4.71 -2.62 -2.15 -1.09
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T 0 0 0 0 0 0.01 0 0 0.13 0.01 0 0 0 0 0 0 0 0 0.01 0.79 0.01 0 0 0.01 0.01 0
2 I 0.01 0.01 0 0.01 0.06 0.02 0.32 0.01 0.12 0.11 0.01 0.02 0 0 0.03 0 0.16 0.02 0.05 0 0.01 0.01 0 0 0 0.01
3 D 0.01 0.11 0 0.11 0.01 0.02 0.01 0.14 0.03 0.12 0.17 0.01 0 0.01 0.01 0 0.02 0.05 0.02 0.03 0.01 0 0 0.11 0 0.01
4 N 0 0 0.02 0.01 0.01 0.01 0.01 0.03 0 0 0.05 0.02 0.02 0.59 0 0 0.04 0.05 0.01 0.01 0.05 0.01 0.03 0.02 0.01 0
5 G 0 0.03 0 0.01 0.03 0 0.42 0.01 0 0.05 0.01 0.01 0 0 0.01 0 0.19 0.13 0.06 0.01 0.01 0 0 0 0 0.01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
i y YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ
PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ
1 T -0 -0 -0 -0 -0 -0.01 -0 -0 -0.13 -0.01 -0 -0 -0 -0 -0 -0 -0 -0 -0.01 0.21 -0.01 -0 -0 -0.01 -0.01 -0
2 I -0.01 -0.01 -0 -0.01 -0.06 -0.02 -0.32 -0.01 0.88 -0.11 -0.01 -0.02 -0 -0 -0.03 -0 -0.16 -0.02 -0.05 -0 -0.01 -0.01 -0 -0 -0 -0.01
3 D -0.01 -0.11 -0 0.89 -0.01 -0.02 -0.01 -0.14 -0.03 -0.12 -0.17 -0.01 -0 -0.01 -0.01 -0 -0.02 -0.05 -0.02 -0.03 -0.01 -0 -0 -0.11 -0 -0.01
4 N -0 -0 -0.02 -0.01 -0.01 -0.01 -0.01 -0.03 -0 -0 -0.05 -0.02 -0.02 0.41 -0 -0 -0.04 -0.05 -0.01 -0.01 -0.05 -0.01 -0.03 -0.02 -0.01 -0
5 G -0 -0.03 -0 -0.01 -0.03 -0 0.58 -0.01 -0 -0.05 -0.01 -0.01 -0 -0 -0.01 -0 -0.19 -0.13 -0.06 -0.01 -0.01 -0 -0 -0 -0 -0.01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Things not covered
I How to choose proper learning rates. See [Friedman, 2001]
I Other possible loss functions. See [Friedman, 2001]
References I
Breiman, L. (1999).
Prediction games and arcing algorithms.
Neural computation, 11(7):14931517.
Breiman, L. et al. (1998).
Arcing classifier (with discussion and a rejoinder by the
author).
The annals of statistics, 26(3):801849.
Freund, Y. and Schapire, R. E. (1997).
A decision-theoretic generalization of on-line learning and an
application to boosting.
Journal of computer and system sciences, 55(1):119139.
Freund, Y., Schapire, R. E., et al. (1996).
Experiments with a new boosting algorithm.
In ICML, volume 96, pages 148156.
References II