ETHEM ALPAYDIN
The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Why Reduce Dimensionality?
Reduces time complexity: less computation
Reduces space complexity: fewer parameters
Saves the cost of observing the feature
Simpler models are more robust on small datasets
More interpretable; simpler explanation
Data visualization (structure, groups, outliers, etc.) when plotted in 2 or 3 dimensions
Lecture Notes for E. Alpaydın, 2010. Introduction to Machine Learning, 2e. The MIT Press (V1.0)
Feature Selection vs Extraction
Feature selection: Choose k < d important features, ignoring the remaining d − k
Subset selection algorithms
Feature extraction: Project the original d dimensions x_i, i = 1, ..., d, to new k < d dimensions z_j, j = 1, ..., k
Subset Selection
There are 2^d possible subsets of d features
Forward search: Add the best feature at each step
Set of features F is initially empty
At each iteration, find the best new feature:
j = argmin_i E(F ∪ x_i)
Add x_j to F if E(F ∪ x_j) < E(F)
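The forward-search loop above can be sketched directly in Python. The error function E is whatever validation error the task defines, so here it is passed in as a callable (the toy error in the usage note is made up for illustration):

```python
def forward_select(error, d):
    """Greedy forward subset selection.

    error: callable mapping a set of feature indices to a validation error
    d: total number of features
    Start from the empty set; at each step add the single feature that
    decreases the error most, stopping when no feature helps.
    """
    F = set()
    e_best = error(F)
    while len(F) < d:
        # j = argmin_i E(F ∪ {x_i}) over features not yet selected
        rest = [i for i in range(d) if i not in F]
        j = min(rest, key=lambda i: error(F | {i}))
        if error(F | {j}) < e_best:        # add x_j only if E decreases
            F = F | {j}
            e_best = error(F)
        else:
            break
    return F
```

For example, with a toy error in which only features 0 and 2 reduce the error and every other feature adds a small penalty, the loop selects exactly {0, 2} and then stops.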
Principal Components Analysis (PCA)
Find a low-dimensional space such that when x is
projected there, information loss is minimized.
The projection of x on the direction of w is: z = w^T x
Find w such that Var(z) is maximized:
Var(z) = Var(w^T x) = E[(w^T x − w^T μ)²]
= E[(w^T x − w^T μ)(w^T x − w^T μ)]
= E[w^T (x − μ)(x − μ)^T w]
= w^T E[(x − μ)(x − μ)^T] w = w^T Σ w
where Var(x) = E[(x − μ)(x − μ)^T] = Σ
Maximize Var(z) subject to ||w|| = 1, written as a Lagrangian:
max_{w1} w1^T Σ w1 − α(w1^T w1 − 1)
Setting the gradient to zero gives Σ w1 = α w1, so w1 is an eigenvector of Σ; since Var(z) = w1^T Σ w1 = α, choose the eigenvector with the largest eigenvalue for Var(z) to be maximum
What PCA does
z = W^T (x − m)
where the columns of W are the eigenvectors of Σ, and m is the sample mean
Centers the data at the origin and rotates the axes
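These steps (center, eigendecompose the sample covariance, project on the top-k eigenvectors) fit in a short numpy sketch:

```python
import numpy as np

def pca(X, k):
    """Project the N x d data X onto its first k principal components.

    Returns (Z, eigvals): Z = (X - m) W, where the columns of W are the
    top-k eigenvectors of the sample covariance, plus all eigenvalues
    sorted in descending order.
    """
    m = X.mean(axis=0)                      # sample mean m
    S = np.cov(X - m, rowvar=False)         # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort descending
    W = eigvecs[:, order[:k]]               # eigenvectors as columns
    return (X - m) @ W, eigvals[order]
```

Because the data are centered first, the projected coordinates have zero mean, and the sample variance of the first coordinate equals the largest eigenvalue.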
How to choose k?
Proportion of Variance (PoV) explained:
PoV = (λ1 + λ2 + ... + λk) / (λ1 + λ2 + ... + λk + ... + λd)
where the λi are sorted in descending order. Typically, stop at PoV > 0.9
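Given the sorted eigenvalues, choosing k by this rule is a short computation; a sketch with the usual 0.9 cutoff as the default:

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest k such that the proportion of variance explained
    by the first k eigenvalues reaches the threshold."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    pov = np.cumsum(lam) / lam.sum()        # PoV for k = 1, ..., d
    return int(np.searchsorted(pov, threshold) + 1)
```

For eigenvalues (5, 3, 1, 1) the PoV sequence is 0.5, 0.8, 0.9, 1.0, so the 0.9 cutoff gives k = 3.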
Factor Analysis
Find a small number of factors z which, when combined, generate x:
x_i − μ_i = v_i1 z_1 + v_i2 z_2 + ... + v_ik z_k + ε_i
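The factor model can be checked numerically by simulating from it. A sketch with made-up loadings V and noise variances ψ, using the fact that the model implies Cov(x) = V V^T + diag(ψ):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 5000, 5, 2
V = rng.normal(size=(d, k))          # loadings v_ij (assumed values)
psi = 0.1 * np.ones(d)               # variances of the specific errors eps_i

z = rng.normal(size=(N, k))          # latent factors, z_j ~ N(0, 1)
eps = rng.normal(size=(N, d)) * np.sqrt(psi)
X = z @ V.T + eps                    # x_i - mu_i = sum_j v_ij z_j + eps_i (mu = 0 here)

S_model = V @ V.T + np.diag(psi)     # covariance implied by the factor model
S_sample = np.cov(X, rowvar=False)   # matches S_model up to sampling noise
```

This is the sense in which a few factors "generate" x: all d(d−1)/2 covariances among the observed variables come from the d×k loading matrix alone.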
PCA vs FA
PCA, from x to z:  z = W^T (x − μ)
FA, from z to x:  x − μ = V z + ε
Factor Analysis
In FA, factors zj are stretched, rotated and translated to
generate x
Multidimensional Scaling
Given pairwise distances between N points,
dij, i,j =1,...,N
place the points on a low-dimensional map such that the distances are preserved.
z = g(x | θ). Find θ that minimizes the Sammon stress:
E(θ | X) = ∑_{r,s} ( ||z^r − z^s|| − ||x^r − x^s|| )² / ||x^r − x^s||²
         = ∑_{r,s} ( ||g(x^r | θ) − g(x^s | θ)|| − ||x^r − x^s|| )² / ||x^r − x^s||²
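A crude gradient-descent sketch of this objective, in which the map coordinates z themselves play the role of θ (the step size and iteration count are arbitrary choices, not from the text):

```python
import numpy as np

def pairdist(Y):
    """N x N matrix of pairwise Euclidean distances between rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def sammon_stress(Z, X):
    """Sum over pairs of (|z^r - z^s| - |x^r - x^s|)^2 / |x^r - x^s|^2."""
    dx, dz = pairdist(X), pairdist(Z)
    iu = np.triu_indices(len(X), k=1)       # each pair (r, s) once
    return (((dz[iu] - dx[iu]) ** 2) / dx[iu] ** 2).sum()

def mds_sammon(X, k=2, eta=0.02, iters=500, seed=0):
    """Place N points on a k-dim map by gradient descent on the stress."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(scale=0.1, size=(len(X), k))
    dx = pairdist(X)
    np.fill_diagonal(dx, 1.0)               # diagonal is never used
    for _ in range(iters):
        dz = pairdist(Z)
        np.fill_diagonal(dz, 1.0)
        coef = (dz - dx) / (dx ** 2 * dz)   # per-pair gradient coefficient
        np.fill_diagonal(coef, 0.0)
        diff = Z[:, None, :] - Z[None, :, :]
        grad = (coef[:, :, None] * diff).sum(axis=1) / len(X)
        Z -= eta * grad                     # move each z^r downhill
    return Z
```

Starting from a small random layout, each step pushes a pair apart when its map distance is too small and pulls it together when too large, weighted by 1/||x^r − x^s||² so that short original distances matter most.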
Map of Europe by MDS
Linear Discriminant Analysis
Find a low-dimensional
space such that when x is
projected, classes are
well-separated.
Find w that maximizes
J(w) = (m1 − m2)² / (s1² + s2²)
where the projected class means and scatters are
m1 = ∑_t w^T x^t r^t / ∑_t r^t
s1² = ∑_t (w^T x^t − m1)² r^t
with r^t = 1 if x^t ∈ C1 and r^t = 0 otherwise (m2 and s2² are defined similarly using 1 − r^t)
Between-class scatter:
(m1 − m2)² = (w^T m_1 − w^T m_2)²
= w^T (m_1 − m_2)(m_1 − m_2)^T w
= w^T S_B w, where S_B = (m_1 − m_2)(m_1 − m_2)^T
Within-class scatter:
s1² = ∑_t (w^T x^t − m1)² r^t
= ∑_t w^T (x^t − m_1)(x^t − m_1)^T w r^t = w^T S_1 w
where S_1 = ∑_t r^t (x^t − m_1)(x^t − m_1)^T
and s1² + s2² = w^T S_W w with S_W = S_1 + S_2
Fisher's Linear Discriminant
Find w that maximizes
J(w) = w^T S_B w / w^T S_W w = ||w^T (m_1 − m_2)||² / w^T S_W w
LDA solution: w = c · S_W⁻¹ (m_1 − m_2)
Parametric solution:
w = Σ⁻¹ (μ_1 − μ_2)
when p(x | C_i) ~ N(μ_i, Σ)
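The closed-form LDA solution is easy to compute directly; a numpy sketch for two classes:

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fisher's discriminant direction w = S_W^{-1} (m_1 - m_2)
    for two classes given as N1 x d and N2 x d arrays."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)            # within-class scatter, class 1
    S2 = (X2 - m2).T @ (X2 - m2)            # within-class scatter, class 2
    Sw = S1 + S2
    return np.linalg.solve(Sw, m1 - m2)     # direction, up to the scale c
```

Projecting both classes onto w (z = X w) then separates the projected means well relative to the projected within-class scatters, which is exactly what J(w) measures.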
K > 2 Classes
Within-class scatter:
S_i = ∑_t r_i^t (x^t − m_i)(x^t − m_i)^T    S_W = ∑_{i=1}^K S_i
Between-class scatter:
S_B = ∑_{i=1}^K N_i (m_i − m)(m_i − m)^T    where m = (1/K) ∑_{i=1}^K m_i
Find W that maximizes
J(W) = |W^T S_B W| / |W^T S_W W|
The solution is given by the largest eigenvectors of S_W⁻¹ S_B; S_B has maximum rank K − 1
Isomap
Geodesic distance is the distance along the manifold that
the data lies in, as opposed to the Euclidean distance in
the input space
Isomap
Instances r and s are connected in the graph if
||x^r − x^s|| < ε, or if x^s is one of the k nearest neighbors of x^r
The edge length is ||x^r − x^s||
For two nodes r and s not connected directly, the distance is the length of the shortest path between them
Once the N×N distance matrix is thus formed, use MDS to find a lower-dimensional mapping
Optdigits after Isomap (with neighborhood graph).
[Figure: two-dimensional Isomap embedding of the Optdigits digits; points of the same digit class cluster together.]
Matlab source from http://web.mit.edu/cocosci/isomap/isomap.html
Locally Linear Embedding
1. Given x^r, find its neighbors x_(r)^s
2. Find the reconstruction weights W_rs that minimize
E(W | X) = ∑_r ||x^r − ∑_s W_rs x_(r)^s||²
3. Keeping the weights fixed, find the new coordinates z^r that minimize
E(z | W) = ∑_r ||z^r − ∑_s W_rs z_(r)^s||²
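A compact numpy sketch of both steps: the weights come from a small linear system per point, and the embedding from the bottom eigenvectors of M = (I − W)^T (I − W). The regularization term on the local Gram matrix is a standard numerical safeguard, not part of the slide:

```python
import numpy as np

def lle(X, n_neighbors=5, k=2, reg=1e-3):
    """Tiny LLE: local reconstruction weights, then the embedding from
    the bottom eigenvectors of M = (I - W)^T (I - W)."""
    N = len(X)
    D = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.zeros((N, N))
    for r in range(N):
        nbrs = np.argsort(D[r])[1:n_neighbors + 1]      # neighbors of x^r
        C = (X[nbrs] - X[r]) @ (X[nbrs] - X[r]).T       # local Gram matrix
        C += reg * np.trace(C) * np.eye(len(nbrs))      # regularize
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[r, nbrs] = w / w.sum()                        # weights sum to 1
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    lam, V = np.linalg.eigh(M)
    return V[:, 1:k + 1]       # skip the constant eigenvector (eigenvalue ~0)
```

Because each row of W sums to 1, the weights are invariant to translation, rotation, and scaling of the neighborhood, which is why the same W can be reused to place the low-dimensional z^r.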
LLE on Optdigits
[Figure: two-dimensional LLE embedding of the Optdigits digits; points of the same digit class appear close together.]