Academic documents
Professional documents
Cultural documents
Basic Concepts
[Figure: data points in the (Feature 1, Feature 2) plane, separated by a decision boundary.]
Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery.
Data everywhere!
1. Google: processes 24 petabytes of data per day.
6. . . .
Texts
Numbers
Clickstreams
Graphs
Tables
Images
Transactions
Videos
[Figure: machine learning at the crossroads of several fields: statistics, databases, signal processing, visualization, biology, and domain expertise.]
ML versus Statistics
For a comparison of the two fields, see:
http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf
Machine Learning definition
Unsupervised learning:
Learning a model from unlabeled data.
Supervised learning:
Learning a model from labeled data.
Unsupervised Learning
Training data: “examples” x.
x1, . . . , xn, xi ∈ X ⊂ Rd (n examples, each described by d features).
• Clustering/segmentation:
[Figure: unlabeled data in the (Feature 1, Feature 2) plane, progressively grouped into clusters.]
Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.
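To make the clustering idea concrete, here is a minimal K-means sketch in plain NumPy (Lloyd's algorithm); the two-blob toy data and all names are illustrative, not from the slides:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation (a simple variant of k-means++ seeding).
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d_near = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d_near)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs in the (Feature 1, Feature 2) plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, labels = kmeans(X, k=2)
```

With well-separated blobs the two recovered clusters coincide with the blobs; on harder data K-means only finds a local optimum and is usually restarted several times.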
Supervised learning
Training data:“examples” x with “labels” y.
[Figure: labeled data of two classes in the (Feature 1, Feature 2) plane, separated by a decision boundary.]
Methods: Support Vector Machines, neural networks, decision
trees, K-nearest neighbors, naive Bayes, etc.
Supervised learning
Classification:
f : Rd −→ {−1, +1}; f is called a classifier.
[Figure: several examples of two-class datasets in the (Feature 1, Feature 2) plane.]
Supervised learning
Non-linear classification
Supervised learning
Training data:“examples” x with “labels” y.
f : Rd −→ R; f is called a regressor.
Example: amount of credit, weight of a fruit.
Supervised learning
Regression:
[Figure: a regression curve fitted to data points, y plotted against Feature 1.]
Example: income as a function of age, weight of a fruit as a function of its length.
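A linear regressor can be fitted in closed form by least squares; the "income as a function of age" numbers below are made up for illustration:

```python
import numpy as np

# Toy data: a noiseless linear relation, "income as a function of age".
age = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
income = 2.0 * age + 10.0                      # true regressor f(x) = 2x + 10

# Least-squares fit of f(x) = w*x + b (design matrix with a bias column).
A = np.column_stack([age, np.ones_like(age)])
w, b = np.linalg.lstsq(A, income, rcond=None)[0]
print(w, b)   # recovers w ≈ 2.0 and b ≈ 10.0
```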
Training and Testing
[Diagram: Training set −→ ML Algorithm −→ Model (f).]
[Diagram: input features (income, gender, age, family status, zipcode) −→ Model (f) −→ prediction (credit amount? / credit yes/no).]
K-nearest neighbors
• Not every ML method builds a model!
Distance between two examples (Euclidean distance):
d(xi, xj) = √( Σ_{k=1}^{d} (xik − xjk)² )
K-nearest neighbors
Training algorithm:
Add each training example (x, y) to the dataset D.
x ∈ Rd, y ∈ {+1, −1}.
K-nearest neighbors
Training algorithm:
Add each training example (x, y) to the dataset D.
x ∈ Rd, y ∈ {+1, −1}.
Classification algorithm:
ŷq = sign( Σ_{xi ∈ Nk(xq)} yi )
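Both steps fit in a few lines of NumPy; the toy data below are illustrative:

```python
import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """ŷq = sign of the summed labels of the k nearest neighbours of x_query."""
    # Euclidean distance from the query to every stored training example.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest points
    return int(np.sign(y_train[nearest].sum()))

# "Training" is just storing the dataset D = {(x, y)}.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([+1, +1, +1, -1, -1, -1])

pred = knn_classify(X, y, np.array([0.3, 0.3]), k=3)   # → +1
```

Choosing k odd avoids ties in the binary case.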
K-nearest neighbors
Cons:
- Requires large space to store the entire training dataset.
- Slow! Given n examples and d features, classifying a single query takes O(n × d) time.
- Suffers from the curse of dimensionality.
Applications of K-NN
1. Information retrieval.
Training and Testing
• We calculate E_train, the in-sample error (training error or empirical error/risk):
E_train(f) = Σ_{i=1}^{n} loss(yi, f(xi))
• Examples of loss functions:
– Classification error:
loss(yi, f(xi)) = 1 if sign(yi) ≠ sign(f(xi)), 0 otherwise
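The in-sample error with the 0/1 classification loss can be computed directly; the threshold classifier below is a made-up example:

```python
import numpy as np

def zero_one_loss(y, fx):
    """Classification error: 1 if the signs disagree, 0 otherwise."""
    return int(np.sign(y) != np.sign(fx))

def train_error(f, X, y):
    """E_train(f): sum of loss(yi, f(xi)) over the training set."""
    return sum(zero_one_loss(yi, f(xi)) for xi, yi in zip(X, y))

# Toy 1-D data with a (hypothetical) threshold classifier f(x) = sign(x - 1).
X = np.array([-2.0, -1.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, +1, +1, +1])
f = lambda x: np.sign(x - 1.0)

err = train_error(f, X, y)   # x = 0.5 is misclassified, so E_train = 1
```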
An intuitive example
Structural Risk Minimization
[Figure: prediction error versus model complexity. Training error decreases as complexity grows; test error first decreases, then rises again. Low complexity: high bias, low variance. High complexity: low bias, high variance.]
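The left half of that trade-off can be reproduced numerically: nested polynomial classes mean the training error can only shrink as complexity grows, even once the fit starts chasing noise. The data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.shape)  # noisy target

def poly_train_mse(degree):
    """Training MSE of a degree-`degree` least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Larger hypothesis classes contain the smaller ones, so the in-sample
# error is non-increasing in the degree (model complexity).
errs = [poly_train_mse(d) for d in (1, 3, 9)]
```

Estimating the test-error curve requires held-out data, which is exactly what the validation-set and cross-validation procedures below provide.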
Training and Testing
[Figure: three models fitted to income-versus-age data: an underfitting fit, a reasonable fit, and an overfitting fit.]
Σ_{i=1}^{n} loss(yi, f(xi)) + C × R(f)
Regularization: Intuition
[Figure: the income-versus-age data fitted with models of increasing complexity, from underfitting to overfitting.]
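A common concrete choice is R(f) = ||w||² for a linear model, which gives ridge regression with a closed-form solution. The data here are synthetic, and C is the regularization strength from the penalized objective:

```python
import numpy as np

def ridge_fit(X, y, C):
    """Minimise Σ (yi − xi·w)² + C·||w||² via the normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + C * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, size=50)

w_weak = ridge_fit(X, y, C=0.01)    # weak penalty: close to least squares
w_strong = ridge_fit(X, y, C=1e4)   # strong penalty: weights shrink toward 0
```

Increasing C trades training-set fit for a simpler (smaller-norm) model.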
Example: Split the data randomly into 60% for training, 20% for
validation and 20% for testing.
Train, Validation and Test
TRAIN VALIDATION TEST
Note: Never use the test set in any way to further tune
the parameters or revise the model.
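The 60/20/20 split can be sketched index-based (names are illustrative):

```python
import numpy as np

def split_60_20_20(n, seed=0):
    """Randomly partition n example indices into 60% train, 20% val, 20% test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_60_20_20(100)
# Tune hyperparameters on val_idx; touch test_idx only once, at the very end.
```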
K-fold Cross Validation
A method for estimating test error using training data.
Algorithm:
Step 1: Randomly split the training data into k equal-sized parts D1, . . . , Dk.
Step 2:
For j = 1 to k:
Train A on all Di, i ∈ {1, . . . , k} and i ≠ j, and get fj.
Apply fj to Dj and compute EDj.
Step 3: Estimate the test error as the average (1/k) Σ_{j=1}^{k} EDj.
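The loop above can be sketched directly; `train_and_eval` stands in for training algorithm A plus the error computation on the held-out fold, and the mean-predictor learner is a made-up placeholder:

```python
import numpy as np

def kfold_cv_error(train_and_eval, X, y, k=5, seed=0):
    """Average the held-out-fold errors E_Dj over the k folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)              # D_1, ..., D_k
    errors = []
    for j in range(k):
        rest = np.concatenate([folds[i] for i in range(k) if i != j])
        # Train on D \ D_j, evaluate the resulting f_j on D_j.
        errors.append(train_and_eval(X[rest], y[rest], X[folds[j]], y[folds[j]]))
    return float(np.mean(errors))

# Placeholder learner: predict the training mean, report mean squared error.
def mean_predictor(Xtr, ytr, Xte, yte):
    return np.mean((yte - ytr.mean()) ** 2)

X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.ones(20)
cv_err = kfold_cv_error(mean_predictor, X, y, k=5)   # constant target → 0.0
```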
Confusion matrix
                                Actual label
                         Positive              Negative
Predicted   Positive     True Positive (TP)    False Positive (FP)
label       Negative     False Negative (FN)   True Negative (TN)
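Counting the four cells for ±1 labels is straightforward; the label vectors below are illustrative:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels in {+1, -1}."""
    tp = int(np.sum((y_pred == +1) & (y_true == +1)))   # predicted +, actual +
    fp = int(np.sum((y_pred == +1) & (y_true == -1)))   # predicted +, actual -
    fn = int(np.sum((y_pred == -1) & (y_true == +1)))   # predicted -, actual +
    tn = int(np.sum((y_pred == -1) & (y_true == -1)))   # predicted -, actual -
    return tp, fp, fn, tn

y_true = np.array([+1, +1, -1, -1, +1, -1])
y_pred = np.array([+1, -1, -1, +1, +1, -1])
tp, fp, fn, tn = confusion_counts(y_true, y_pred)   # → (2, 1, 1, 2)
```

Metrics such as accuracy, precision, and recall are simple ratios of these counts.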