Introduction to Machine Learning
Linear regression

Jeff Howbert
Winter 2012
[Slides 2-6: introductory figures, courtesy of Greg Shakhnarovich (CS195-5, Brown Univ., 2006).]
Loss function
Suppose target labels come from a set $Y$:
- Binary classification: $Y = \{0, 1\}$
- Regression: $Y = \mathbb{R}$ (real numbers)

A loss function maps decisions to costs: $L(\hat{y}, y)$ defines the penalty for predicting $\hat{y}$ when the true value is $y$.

Standard choice for classification: 0/1 loss (same as misclassification error):

$$L_{0/1}(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{otherwise} \end{cases}$$

Standard choice for regression: squared loss:

$$L(\hat{y}, y) = (\hat{y} - y)^2$$
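A minimal MATLAB sketch of these two losses, on toy values invented for illustration:

```matlab
% 0/1 loss and squared loss on invented toy data.
y_true = [0 1 1 0];                 % toy classification labels
y_pred = [0 1 0 0];                 % toy classifier decisions
loss01 = mean(y_pred ~= y_true);    % average 0/1 loss = misclassification rate

t_true = [1.0 2.0 3.0];             % toy regression targets
t_pred = [1.1 1.8 3.5];             % toy regression predictions
sqloss = (t_pred - t_true).^2;      % per-example squared loss
```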
The most popular estimation method is least squares:
- Determine the linear coefficients $\alpha, \beta$ that minimize the sum of squared loss (SSL).
- Use standard (multivariate) differential calculus:
  - differentiate SSL with respect to $\alpha, \beta$
  - find the zeros of each partial derivative
  - solve for $\alpha, \beta$
One dimension:

$$\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - (\alpha + \beta x_j) \right)^2$$

Solution:

$$\beta = \frac{\mathrm{cov}[x, y]}{\mathrm{var}[x]}, \qquad \alpha = \bar{y} - \beta \bar{x}$$

Prediction for a new input $x_t$: $\hat{y}_t = \alpha + \beta x_t$.
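A minimal MATLAB sketch of these one-dimensional formulas, on toy data invented for illustration:

```matlab
% 1-D least squares via the covariance/variance formulas above.
x = [1; 2; 3; 4; 5];                % toy inputs
y = [2.1; 3.9; 6.2; 8.1; 9.8];      % toy targets, roughly y = 2x
C = cov(x, y);                      % 2x2 sample covariance matrix of [x y]
beta  = C(1, 2) / var(x);           % slope:     beta  = cov[x,y] / var[x]
alpha = mean(y) - beta * mean(x);   % intercept: alpha = ybar - beta*xbar
y_hat = alpha + beta * x;           % predictions y_t = alpha + beta*x_t
```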
Multiple dimensions:

To simplify notation and derivation, change $\alpha$ to $\beta_0$, and add a new feature $x_0 = 1$ to the feature vector $\mathbf{x}$:

$$\hat{y} = \beta_0 \cdot 1 + \sum_{i=1}^{d} \beta_i x_i = \boldsymbol{\beta} \cdot \mathbf{x}$$
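A minimal MATLAB sketch of the multivariate fit on invented toy data, using the backslash operator (the commented normal-equations form is equivalent):

```matlab
% Multivariate least squares with the bias absorbed as x0 = 1.
X = [1 2; 2 1; 3 4; 4 3; 5 5];      % toy N-by-d design matrix (invented)
y = [3; 4; 9; 10; 13];              % toy targets
Xa = [ones(size(X,1), 1), X];       % prepend the constant feature x0 = 1
beta = Xa \ y;                      % least-squares solution; equivalently
                                    % beta = (Xa'*Xa) \ (Xa'*y)
y_hat = Xa * beta;                  % fitted values
```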
The inputs X for linear regression can be:
- original quantitative inputs
- transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
- polynomial transformations, example: $x_3 = x_1 x_2$

This allows use of linear regression techniques to fit much more complicated, non-linear datasets (see the sketch below).
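A minimal MATLAB sketch: a cubic relationship fit by ordinary linear regression after a polynomial feature expansion (toy data invented for illustration):

```matlab
% Fit a non-linear curve with a model that is still linear in the coefficients.
x = linspace(-2, 2, 50)';                    % toy 1-D inputs
y = 1 - 2*x + 0.5*x.^3 + 0.3*randn(50, 1);   % noisy non-linear targets
Xa = [ones(size(x)), x, x.^2, x.^3];         % polynomial feature expansion
beta = Xa \ y;                               % ordinary least squares on the
                                             % transformed inputs
```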
Prostate cancer data:
- 97 samples, partitioned into 67 training samples and 30 test samples
- Eight predictors (features): 6 continuous (4 log transforms), 1 binary, 1 ordinal
- Continuous outcome variable: lpsa = log(prostate specific antigen level)
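A hypothetical MATLAB sketch of applying the 67/30 split; the file name 'prostate.data' and its 'train' indicator column are assumptions about the distributed data file, not taken from the slides:

```matlab
% Load the prostate data and separate the published train/test partition.
T = readtable('prostate.data', 'FileType', 'text');  % assumed file name
train = T(strcmp(T.train, 'T'), :);    % 67 training samples (assumed 'T' flag)
test  = T(~strcmp(T.train, 'T'), :);   % 30 test samples
```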
Regularization
Complex models (lots of parameters) are often prone to overfitting. Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters.

Two common types of regularization in linear regression:

L2 regularization (a.k.a. ridge regression): find $\boldsymbol{\beta}$ which minimizes

$$\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{j,i} \right)^2 + \lambda \sum_{i=1}^{d} \beta_i^2$$

L1 regularization (a.k.a. the lasso): find $\boldsymbol{\beta}$ which minimizes

$$\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{j,i} \right)^2 + \lambda \sum_{i=1}^{d} |\beta_i|$$

where $\lambda$ is the regularization parameter controlling the strength of the penalty.
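A minimal MATLAB sketch of the ridge objective's closed-form minimizer, on invented toy data; leaving the bias $\beta_0$ unpenalized follows the sums above, whose penalty starts at $i = 1$:

```matlab
% Ridge regression in closed form.
X = [1 2; 2 1; 3 4; 4 3; 5 5];               % toy design matrix (invented)
y = [3; 4; 9; 10; 13];                       % toy targets
Xa = [ones(size(X,1), 1), X];                % add the constant feature x0 = 1
lambda = 0.1;                                % regularization strength
P = eye(size(Xa, 2));  P(1,1) = 0;           % penalty matrix; bias unpenalized
beta_ridge = (Xa'*Xa + lambda*P) \ (Xa'*y);  % minimizes SSL + lambda*sum(beta_i^2)
```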
Example of L2 regularization
L2 regularization shrinks coefficients towards (but not to) zero, and towards each other.
Example of L1 regularization
L1 regularization shrinks coefficients to zero at different rates; different values of $\lambda$ give models with different subsets of features.
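A minimal sketch of this effect: sweep $\lambda$ over a grid and inspect which coefficients are exactly zero at each value. Here lasso() is the function from MATLAB's Statistics and Machine Learning Toolbox, and the data are invented for illustration:

```matlab
% Trace lasso coefficients over a grid of lambda values.
X = [1 2; 2 1; 3 4; 4 3; 5 5];              % toy design matrix (invented)
y = [3; 4; 9; 10; 13];                      % toy targets
lambdas = logspace(-3, 1, 20);              % grid of regularization strengths
[B, FitInfo] = lasso(X, y, 'Lambda', lambdas);
% Each column of B is a coefficient vector; its exact zeros identify the
% feature subset selected at that lambda.
```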
MATLAB interlude