APPLIED MACHINE LEARNING IN PYTHON
Kevyn Collins-Thompson
Associate Professor of Information & Computer Science
University of Michigan
Learning objectives
- Understand how a number of different supervised learning algorithms learn by estimating their parameters from data to make new predictions.
- Understand the strengths and weaknesses of particular supervised learning methods.
- Learn how to apply specific supervised machine learning algorithms in Python with scikit-learn.
- Learn about general principles of supervised machine learning, like overfitting and how to avoid it.
Overfitting in regression
(Figure: regression fits of increasing complexity; axes: input variable vs. target variable.)

Overfitting in classification
(Figure: classifier decision boundaries of increasing complexity; axes: Feature 1 vs. Feature 2.)

Target variable: per capita violent crimes
K-Nearest Neighbors:
Classification and Regression
KNeighborsClassifier and KNeighborsRegressor: important parameters

Model complexity
- n_neighbors: number of nearest neighbors (k) to consider. Default = 5.

Model fitting
- metric: distance function between data points. Default: Minkowski distance with power parameter p = 2 (Euclidean).
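A minimal sketch of how these parameters appear in scikit-learn, using a small made-up dataset (the data values are purely illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny illustrative dataset: two features per sample, two well-separated classes.
X_train = [[0, 0], [1, 1], [2, 2], [9, 9], [10, 10], [11, 11]]
y_train = [0, 0, 0, 1, 1, 1]

# n_neighbors controls model complexity; metric/p control the distance function.
# Defaults shown explicitly: Minkowski with p=2 is Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
knn.fit(X_train, y_train)

print(knn.predict([[1, 2], [10, 9]]))  # majority vote among the 5 nearest neighbors
```

KNeighborsRegressor takes the same parameters but averages the target values of the k nearest neighbors instead of taking a majority vote.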
(Figure: model accuracy vs. model complexity; the best model lies between underfitting and overfitting.)
Linear Models
A linear model is a sum of weighted variables that predicts a target output value given an input data instance. Example: predicting housing prices.

House features: taxes per year (X_tax), age in years (X_age)

price = 212000 + 109 * X_tax - 2000 * X_age

A house with feature values (X_tax, X_age) of (10000, 75) would have a predicted selling price of:

price = 212000 + 109 * 10000 - 2000 * 75 = 1,152,000
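The arithmetic on this slide can be checked directly (the function and argument names are illustrative, not from the course code):

```python
# Linear model from the slide: price = 212000 + 109 * tax - 2000 * age
def predicted_price(tax_per_year, age_in_years):
    return 212000 + 109 * tax_per_year - 2000 * age_in_years

print(predicted_price(10000, 75))  # 1152000
```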
Linear Regression is an Example of a Linear Model

Input instance feature vector: x = (x₀, x₁, …, xₙ)

Predicted output: ŷ = ŵ₀x₀ + ŵ₁x₁ + … + ŵₙxₙ + b̂

Parameters to estimate:
- ŵ = (ŵ₀, …, ŵₙ): feature weights / model coefficients
- b̂: constant bias term / intercept
With a single feature:

Input instance: x = (x₀)
Predicted output: ŷ = ŵ₀x₀ + b̂
Parameters to estimate: ŵ₀ (slope), b̂ (y-intercept)
In scikit-learn, the learned parameters are stored in linreg.coef_ (ŵ) and linreg.intercept_ (b̂):

ŷ = ŵ₀x₀ + b̂
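A minimal sketch showing where coef_ and intercept_ come from, fit on noise-free data lying exactly on a known line (the data are made up, so the recovered slope and intercept are exact):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free data on the line y = 3*x + 7, so the least-squares fit is exact.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3 * X[:, 0] + 7

linreg = LinearRegression().fit(X, y)
print(linreg.coef_)       # w-hat: approximately [3.]
print(linreg.intercept_)  # b-hat: approximately 7.0
```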
Linear Regression:
Ridge, Lasso, and Polynomial Regression
Ridge Regression
Ridge regression learns w, b using the same least-squares criterion, but adds a penalty for large variations in the w parameters:

RSS_ridge(w, b) = Σᵢ₌₁ᴺ (yᵢ − (w · xᵢ + b))² + α Σⱼ₌₁ᵖ wⱼ²

Once the parameters are learned, the ridge regression prediction formula is the same as for ordinary least-squares.

- The addition of a parameter penalty is called regularization. Regularization prevents overfitting by restricting the model, typically to reduce its complexity.
- Ridge regression uses L2 regularization: minimize the sum of squares of the w entries.
- The influence of the regularization term is controlled by the α parameter: higher alpha means more regularization and simpler models.
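A sketch of the alpha effect in scikit-learn, on synthetic data (the weights and noise level are made up): larger alpha shrinks the norm of the learned coefficient vector.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data with known weights plus a little noise.
rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ np.array([4.0, -2.0, 3.0, 0.5, 1.0]) + rng.randn(50) * 0.1

# Higher alpha -> stronger L2 penalty -> smaller weight magnitudes.
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.linalg.norm(ridge.coef_))
```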
- Fit the scaler using the training set, then apply the same scaler to transform the test set.
- Do not scale the training and test sets using different scalers: this could lead to random skew in the data.
- Do not fit the scaler using any part of the test data: referencing the test data can lead to a form of data leakage. More on this issue later in the course.
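A sketch of the correct pattern, assuming a MinMaxScaler (the dataset here is illustrative): fit on the training set only, then reuse the fitted scaler on the test set.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Illustrative data: 10 samples, 2 features.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # same scaler applied to the test set
```

Note that test values outside the training range can legitimately fall outside [0, 1] after transform; that is expected, not a bug.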
Polynomial features with linear regression:

ŷ = ŵ₀x₀ + ŵ₁x₁ + ŵ₀₀x₀² + ŵ₀₁x₀x₁ + ŵ₁₁x₁² + b̂

- Generate new features consisting of all polynomial combinations of the original two features x₀, x₁.
- The degree of the polynomial specifies how many variables participate at a time in each new feature (above example: degree 2).
- This is still a weighted linear combination of features, so it's still a linear model, and can use the same least-squares estimation method for w and b.
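A sketch of the expansion, assuming scikit-learn's PolynomialFeatures transformer: for one sample (x₀, x₁) = (2, 3), degree 2 produces the bias column plus all degree-≤2 combinations.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features (x0, x1) = (2, 3).
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# columns: 1, x0, x1, x0^2, x0*x1, x1^2 -> [[1. 2. 3. 4. 6. 9.]]
```

Feeding the expanded features into LinearRegression (or Ridge) then fits the polynomial model with ordinary least-squares.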
Logistic regression
Linear Regression
(Diagram: input features x₁, x₂, x₃ connected by weights w₁, w₂, w₃ and bias b to the output y.)

ŷ = b̂ + ŵ₁x₁ + … + ŵₙxₙ
Logistic regression passes the same linear output through the logistic function, which compresses it into the range (0, 1):

ŷ = 1 / (1 + exp[−(b̂ + ŵ₁x₁ + … + ŵₙxₙ)])
(Figures: logistic function fit to exam data; x-axis: hours studying (0–6), y-axis: probability of passing exam (0.00–1.00).)
(Figures: logistic regression in a two-feature space; axes: Feature 1 vs. Feature 2. The predicted value ŷ ranges from 0 to 1; the line where ŷ = 0.5 is the decision boundary, with ŷ < 0.5 predicted "not an apple" and ŷ ≥ 0.5 predicted "apple".)
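The exam example above can be sketched with scikit-learn's LogisticRegression; the hours/pass data here are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D "hours studying" data with pass (1) / fail (0) labels.
hours = np.array([[0.5], [1.0], [1.5], [2.0], [4.0], [4.5], [5.0], [5.5]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)

# predict_proba returns [P(fail), P(pass)]; the decision boundary is at P = 0.5.
print(clf.predict_proba([[1.0], [5.0]])[:, 1])
print(clf.predict([[1.0], [5.0]]))  # class 0 below the boundary, class 1 above
```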
Linear Classifiers:
Support Vector Machines
A linear classifier maps a feature vector x to a class value using a linear function of the input:

f(x, w, b) = sign(w · x + b) = sign(Σᵢ w[i] x[i] + b)
Example: a linear classifier with decision boundary x₁ − x₂ = 0, i.e. w = [1, −1] and b = 0.

f(x, w, b) = sign(w · x + b)

For the point (−0.75, −2.25):
f = sign(1 · (−0.75) + (−1) · (−2.25) + 0) = sign(−0.75 + 2.25) = sign(1.50) = +1

For the point (−1.75, −0.25):
f = sign(1 · (−1.75) + (−1) · (−0.25) + 0) = sign(−1.75 + 0.25) = sign(−1.50) = −1
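The worked example above can be verified in a few lines (the sign-of-zero convention is an assumption, since the slide's boundary points are not classified):

```python
import numpy as np

w = np.array([1.0, -1.0])
b = 0.0

def f(x):
    # Linear classifier: sign(w . x + b); a score of exactly 0 is treated as +1 here.
    score = np.dot(w, x) + b
    return 1 if score >= 0 else -1

print(f(np.array([-0.75, -2.25])))  # +1
print(f(np.array([-1.75, -0.25])))  # -1
```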
Linear Classifiers and Classifier Margin

f(x, w, b) = sign(w · x + b)

(Figures: several linear classifiers with the same functional form but different decision boundaries. The classifier margin is the width the decision boundary area can be increased before hitting a data point; a linear support vector machine chooses the linear classifier with maximum margin.)
Multi-Class Classification
With four classes and two features, a one-vs-rest linear classifier learns one coefficient row and one intercept per class:

print(clf.coef_)
[[-0.23401135  0.72246132]
 [-1.63231901  1.15222281]
 [ 0.0849835   0.31186707]
 [ 1.26189663 -1.68097   ]]
print(clf.intercept_)
[-3.31753728  1.19645936 -2.7468353   1.16107418]
(Figures: a 1-D dataset x that is not linearly separable around x = 0 becomes linearly separable after mapping each point to a 2-D feature space: vᵢ = (xᵢ, xᵢ²).)
(Figures: a 2-D dataset xᵢ = (x₀, x₁) that is not linearly separable becomes linearly separable after mapping each point to 3-D: vᵢ = (x₀, x₁, 1 − (x₀² + x₁²)).)
Radial Basis Function (RBF) kernel:

K(x, x′) = exp(−γ · ‖x − x′‖²)

(Figures: effect of increasing gamma and increasing C on the SVM decision boundary.)
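A sketch of a kernelized SVM in scikit-learn on data that is not linearly separable; the concentric-circles dataset (make_circles) is an illustrative stand-in for the figures above:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2); gamma and C control complexity.
clf = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X, y)
print(clf.score(X, y))  # training accuracy on this easily separable dataset
```

Increasing gamma makes the kernel more local (sharper, more complex boundaries); increasing C reduces regularization (fits the training data more tightly).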
Pros:
- Can perform well on a range of datasets.
- Versatile: different kernel functions can be specified, or custom kernels can be defined for specific data types.
- Works well for both low- and high-dimensional data.

Cons:
- Efficiency (runtime speed and memory usage) decreases as training set size increases (e.g. over 50,000 samples).
- Needs careful normalization of input data and parameter tuning.
- Does not provide direct probability estimates (but these can be estimated using e.g. Platt scaling).
- Difficult to interpret why a prediction was made.
Cross-validation
Cross-validation
- Uses multiple train-test splits, not just a single one.
- Each split is used to train and evaluate a separate model.

Why is this better?
- The accuracy score of a supervised learning method can vary, depending on which samples happen to end up in the training set.
- Using multiple train-test splits gives more stable and reliable estimates for how the classifier is likely to perform on average.
- Results are averaged over multiple different training sets instead of relying on a single model trained on a particular training set.

Accuracy of a k-NN classifier (k=5) on the fruit data test set for different random_state values in train_test_split:

random_state | Test set accuracy
0            | 1.00
1            | 0.93
5            | 0.93
7            | 0.67
10           | 0.87
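A sketch of multiple-split evaluation, assuming scikit-learn's cross_val_score helper (the iris dataset stands in for the fruit data used on the slide):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

# One accuracy score per fold; averaging them gives a more stable estimate
# than any single train-test split.
scores = cross_val_score(clf, X, y, cv=5)
print(scores)
print(scores.mean())
```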
Stratified Cross-validation
(Folds and dataset shortened for illustration purposes.)

The example dataset has 20 samples, sorted by class: 4 classes (1 = apple, 2 = mandarin, 3 = orange, 4 = lemon) with 5 samples each.

- 5-fold CV: 5 folds of 4 samples each.
- Fold 1 uses the first 20% of the dataset as the test set, which only contains samples from class 1.
- Classes 2, 3, 4 are missing entirely from the test set and so will be missing from the evaluation.
Stratified Cross-validation
(Figure: Fold 1 of the sorted fruit table, with the first 20% as the test set and the remaining 80% as the training set.)

Stratified folds each contain a proportion of classes that matches the overall dataset. Now, all classes will be fairly represented in the test set.
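A sketch of the stratification fix, assuming scikit-learn's StratifiedKFold, on the same shape of dataset as the slide (20 sorted samples, 4 classes of 5):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 20 samples sorted by class: 4 classes with 5 samples each, as in the slide.
y = np.repeat([1, 2, 3, 4], 5)
X = np.arange(20).reshape(20, 1)

skf = StratifiedKFold(n_splits=5)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold contains exactly one sample from every class.
    print(sorted(y[test_idx]))
```

Plain (unstratified) KFold on this sorted data would instead put all five apples into the first test fold, exactly the failure the slide describes.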
(Figure: leave-one-out cross-validation — fold N uses sample N alone as its test set.)
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

param_range = np.logspace(-3, 3, 4)
train_scores, test_scores = validation_curve(
    SVC(), X, y, param_name="gamma",
    param_range=param_range, cv=5)
Decision Trees
(Figure: a decision tree, with internal decision nodes and terminal leaf nodes.)
(Figure: the iris dataset — 150 flowers, 3 species, 50 examples per species, with petal and sepal measurements as features.)

In the example tree, setosa is separated off by an earlier petal-length split; samples at the versicolor leaf have petal length > 2.35 AND petal width <= 1.2, while samples at the virginica leaf have petal length > 2.35 AND petal width > 1.2.
Informativeness of Splits
The value list gives the number of samples of each class that end up at this leaf node during training.

(Figure: classifying a new flower — petal length 3.0, petal width 2.0, sepal width 2.0, sepal length 4.2 — by following the tree's decision nodes down to a leaf: "What class (species) is this flower?")

Other parameters:
- max_leaf_nodes: maximum number of leaf nodes
- min_samples_leaf: minimum number of samples each leaf must contain (a split is only considered if it leaves at least this many samples in each child)
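A sketch of these complexity controls with scikit-learn's DecisionTreeClassifier on the iris data (the specific parameter values here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_leaf_nodes and min_samples_leaf limit tree complexity to reduce overfitting.
tree = DecisionTreeClassifier(max_leaf_nodes=4, min_samples_leaf=5,
                              random_state=0).fit(X, y)

# Feature order in sklearn's iris: sepal length, sepal width, petal length, petal width.
print(tree.predict([[4.2, 2.0, 3.0, 2.0]]))
print(tree.feature_importances_)  # how informative each feature's splits were
```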
(Figure: decision tree feature importances for the iris features, including petal length, sepal width, and sepal length.)