
Principal Components Analysis (PCA)

• Reading Assignments

S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College
Press, 2001 (pp. 168-173 and Appendix C: Mathematical Details, hard copy).
K. Castleman, Digital Image Processing, Prentice Hall (Appendix 3: Mathematical
Background, hard copy).
F. Ham and I. Kostanic, Principles of Neurocomputing for Science and Engineering,
Prentice Hall (Appendix A: Mathematical Foundation for Neurocomputing,
hard copy).
A. Jain, R. Duin, and J. Mao, "Statistical Pattern Recognition: A Review", IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp.
4-37, 2000 (read pp. 11-13, on-line).

• Case Studies

M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive
Neuroscience, vol. 3, no. 1, pp. 71-86, 1991 (hard copy).
K. Ohba and K. Ikeuchi, "Detectability, Uniqueness, and Reliability of Eigen Windows
for Stable Verification of Partially Occluded Objects", IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 1043-1048,
1997 (on-line).
H. Murase and S. Nayar, "Visual Learning and Recognition of 3D Objects from
Appearance", International Journal of Computer Vision, vol. 14, pp. 5-24, 1995
(hard copy).

• Pattern recognition in high-dimensional spaces

- Problems arise when performing recognition in a high-dimensional space (e.g., the curse of dimensionality).

- Significant improvements can be achieved by first mapping the data into a lower-dimensional space:

$x = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \;\longrightarrow\; \text{reduce dimensionality} \;\longrightarrow\; y = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix} \qquad (K \ll N)$
- The goal of PCA is to reduce the dimensionality of the data while retaining as
much as possible of the variation present in the original dataset.

• Dimensionality reduction

- PCA allows us to compute a linear transformation that maps data from a high-dimensional space to a lower-dimensional space.

$b_1 = t_{11} a_1 + t_{12} a_2 + \dots + t_{1N} a_N$
$b_2 = t_{21} a_1 + t_{22} a_2 + \dots + t_{2N} a_N$
$\vdots$
$b_K = t_{K1} a_1 + t_{K2} a_2 + \dots + t_{KN} a_N$

or $y = Tx$, where

$T = \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1N} \\ t_{21} & t_{22} & \dots & t_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ t_{K1} & t_{K2} & \dots & t_{KN} \end{bmatrix}$
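As a concrete illustration of y = Tx, here is a minimal numpy sketch; the matrix T and the vector x are made-up values chosen only to show the shapes involved (K = 2, N = 4).

import numpy as np

# Hypothetical K x N transformation matrix (rows hold t_11..t_1N and t_21..t_2N).
T = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
x = np.array([1.0, 2.0, 3.0, 4.0])   # original N-dimensional vector
y = T @ x                            # reduced K-dimensional vector
print(y)                             # [1.5 3.5]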

• Lower dimensionality basis

- Approximate the vectors by finding a basis in an appropriate lower-dimensional space.

(1) Higher-dimensional space representation:

$x = a_1 v_1 + a_2 v_2 + \dots + a_N v_N$

where $v_1, v_2, \dots, v_N$ is a basis of the $N$-dimensional space.

(2) Lower-dimensional space representation:

$\hat{x} = b_1 u_1 + b_2 u_2 + \dots + b_K u_K$

where $u_1, u_2, \dots, u_K$ is a basis of the $K$-dimensional space.

- Note: if both bases have the same size ($N = K$), then $x = \hat{x}$.

Example

$v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ (standard basis)

$x_v = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} = 3v_1 + 3v_2 + 3v_3$

$u_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ (some other basis)

$x_u = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} = 0u_1 + 0u_2 + 3u_3$

thus, $x_v = x_u$
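The coefficients in the second basis can be checked numerically by stacking u1, u2, u3 as the columns of a matrix and solving the resulting linear system; a small sketch (variable names are my own):

import numpy as np

# Columns are the basis vectors u1, u2, u3 from the example above.
U = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
x = np.array([3.0, 3.0, 3.0])

b = np.linalg.solve(U, x)   # coordinates of x in the u-basis
print(b)                    # [0. 0. 3.]  i.e., x = 0*u1 + 0*u2 + 3*u3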

• Information loss

- Dimensionality Reduction implies Information Loss !!

- Preserve as much information as possible, that is, minimize the error $\|x - \hat{x}\|$.

• How to determine the best lower-dimensional space?

The best low-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e., the eigenvectors corresponding to the "largest" eigenvalues, also called the "principal components").

• Methodology

- Suppose $x_1, x_2, \dots, x_M$ are $N \times 1$ vectors

Step 1: compute the sample mean: $\bar{x} = \frac{1}{M} \sum_{i=1}^{M} x_i$

Step 2: subtract the mean: $\Phi_i = x_i - \bar{x}$

Step 3: form the matrix $A = [\Phi_1 \; \Phi_2 \; \dots \; \Phi_M]$ (an $N \times M$ matrix), then compute

$C = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T = \frac{1}{M} A A^T$

(the sample covariance matrix, $N \times N$, which characterizes the scatter of the data)

Step 4: compute the eigenvalues of $C$: $\lambda_1 > \lambda_2 > \dots > \lambda_N$

Step 5: compute the eigenvectors of $C$: $u_1, u_2, \dots, u_N$

- Since $C$ is symmetric, $u_1, u_2, \dots, u_N$ form a basis (i.e., any vector $x$, or actually $(x - \bar{x})$, can be written as a linear combination of the eigenvectors):

$x - \bar{x} = b_1 u_1 + b_2 u_2 + \dots + b_N u_N = \sum_{i=1}^{N} b_i u_i$

Step 6: (dimensionality reduction step) keep only the terms corresponding to the $K$ largest eigenvalues:

$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i \quad \text{where } K \ll N$

- The representation of $\hat{x} - \bar{x}$ in the basis $u_1, u_2, \dots, u_K$ is thus

$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}$
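A minimal numpy sketch of Steps 1-6 (the variable names and the random data below are my own stand-ins for $x_1, \dots, x_M$; nothing here is prescribed by the notes):

import numpy as np

rng = np.random.default_rng(0)
N, M, K = 5, 100, 2                    # dimensionality, number of samples, components kept
X = rng.normal(size=(N, M))            # columns are the vectors x_1, ..., x_M

x_bar = X.mean(axis=1, keepdims=True)  # Step 1: sample mean
Phi = X - x_bar                        # Step 2: subtract the mean
A = Phi                                # Step 3: A = [Phi_1 ... Phi_M]
C = (A @ A.T) / M                      #          C = (1/M) A A^T
lam, U = np.linalg.eigh(C)             # Steps 4-5: eigenvalues/eigenvectors (C is symmetric)
order = np.argsort(lam)[::-1]          # sort so that lambda_1 >= lambda_2 >= ...
lam, U = lam[order], U[:, order]

U_K = U[:, :K]                         # Step 6: keep the K eigenvectors with the largest eigenvalues
B = U_K.T @ Phi                        # coefficients b_1, ..., b_K for every sample (K x M)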

• Linear transformation implied by PCA

- The linear transformation $R^N \to R^K$ that performs the dimensionality reduction is:

$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix} = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_K^T \end{bmatrix} (x - \bar{x}) = U^T (x - \bar{x})$
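A sketch of applying this transformation to a single new vector, assuming the mean and the eigenvector matrix have already been computed as in the previous sketch (the numbers below are made up, with N = 3 and K = 2):

import numpy as np

# Hypothetical eigenvectors stacked as rows u_1^T, u_2^T (K x N), and a hypothetical mean.
U_T   = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
x_bar = np.array([2.0, 2.0, 2.0])
x_new = np.array([3.0, 1.0, 2.0])      # a new N-dimensional vector

b = U_T @ (x_new - x_bar)              # b = U^T (x - x_bar): an R^N -> R^K mapping
print(b)                               # [ 1. -1.]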

• An example

(see Castleman’s appendix, pp. 648-649)



• Geometrical interpretation

- PCA projects the data along the directions where the data varies the most.

- These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.

- The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.

• Properties and assumptions of PCA

- The new variables (i.e., the $b_i$'s) are uncorrelated (a numerical check is sketched after this list); the covariance matrix of the $b_i$'s is

$U^T C U = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_K \end{bmatrix}$

- The covariance matrix represents only second-order statistics among the vector values.

- Since the new variables are linear combinations of the original variables, it is usually difficult to interpret their meaning.
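A quick numerical check of the first property, using synthetic data (names are my own); with all N eigenvectors kept, the covariance of the projected variables should come out diagonal, with the eigenvalues on the diagonal, up to floating-point noise:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 500))            # 3-dimensional data, 500 samples as columns
Phi = X - X.mean(axis=1, keepdims=True)  # centered data
C = (Phi @ Phi.T) / Phi.shape[1]         # sample covariance matrix

lam, U = np.linalg.eigh(C)               # eigenvalues/eigenvectors of C
B = U.T @ Phi                            # the new variables b_i
C_b = (B @ B.T) / B.shape[1]             # covariance of the b_i's
print(np.round(C_b, 6))                  # diagonal: the b_i's are uncorrelated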

• How to choose the principal components?

- To choose $K$, use the following criterion:

$\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > \text{Threshold} \qquad (\text{e.g., } 0.9 \text{ or } 0.95)$
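A sketch of this criterion, assuming the eigenvalues are already sorted in decreasing order (the values below are made up):

import numpy as np

lam = np.array([4.0, 2.5, 1.0, 0.3, 0.2])   # eigenvalues, largest first (made-up values)
threshold = 0.9

ratio = np.cumsum(lam) / np.sum(lam)        # fraction of variance retained by the first K components
K = int(np.argmax(ratio > threshold)) + 1   # smallest K whose ratio exceeds the threshold
print(K)                                    # 3 (cumulative ratios: 0.5, 0.8125, 0.9375, ...)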

• What is the error due to dimensionality reduction?

- We saw above that an original vector $x$ can be reconstructed using its principal components:

$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i \quad \text{or} \quad \hat{x} = \sum_{i=1}^{K} b_i u_i + \bar{x}$

- It can be shown that the low-dimensional basis based on the principal components minimizes the reconstruction error:

$e = \|x - \hat{x}\|$

- It can be shown that this error is equal to:

$e = \frac{1}{2} \sum_{i=K+1}^{N} \lambda_i$
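A sketch that reconstructs each sample from its first K principal components and measures the reconstruction error empirically; the sum of the discarded eigenvalues is printed alongside for comparison (how it lines up with the expression above depends on the covariance normalization and the 1/2 convention used there). The data and names are my own.

import numpy as np

rng = np.random.default_rng(2)
N, M, K = 5, 1000, 2
scales = np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])
X = rng.normal(size=(N, M)) * scales           # synthetic data with unequal variances

x_bar = X.mean(axis=1, keepdims=True)
Phi = X - x_bar
C = (Phi @ Phi.T) / M
lam, U = np.linalg.eigh(C)
lam, U = lam[::-1], U[:, ::-1]                 # largest eigenvalues first

X_hat = U[:, :K] @ (U[:, :K].T @ Phi) + x_bar  # x_hat = sum of the first K terms, plus the mean
mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))
print(mse, lam[K:].sum())                      # empirical mean squared error vs. discarded eigenvalue sum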

• Standardization

- The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.

- We should always standardize the data prior to using PCA.

- A common standardization method is to transform all the data to have zero mean and unit standard deviation:

$\frac{x_i - \mu}{\sigma}$  (where $\mu$ and $\sigma$ are the mean and standard deviation of the $x_i$'s)
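A minimal sketch of this standardization (z-scoring each variable before applying PCA; the data layout follows the earlier sketches, rows = variables, columns = samples, and the numbers are synthetic):

import numpy as np

rng = np.random.default_rng(3)
# Three variables with very different scales, 200 samples as columns.
X = rng.normal(size=(3, 200)) * np.array([[1.0], [10.0], [100.0]]) + 5.0

mu    = X.mean(axis=1, keepdims=True)   # per-variable mean
sigma = X.std(axis=1, keepdims=True)    # per-variable standard deviation
Z = (X - mu) / sigma                    # standardized data: zero mean, unit standard deviation

print(Z.mean(axis=1).round(6), Z.std(axis=1).round(6))   # ~[0 0 0] and [1 1 1]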

• PCA and classification

- PCA is not always an optimal dimensionality-reduction procedure for classification purposes.

• Other problems

- How to handle occlusions?

- How to handle different views of a 3D object?
