
Comparative Study of PCA, ICA, LDA using SVM Classifier

Journal of Emerging Technologies in Web Intelligence, Vol. 6, No. 1, February 2014

Anissa Bouzalmat, Jamal Kharroubi, Arsalane Zarghili
Sidi Mohamed Ben Abdellah University, Department of Computer Science,
Faculty of Science and Technology, Route d'Imouzzer B.P. 2202, Fez 30000, Morocco
anissabouzalmat@yahoo.fr, jamal.kharroubi@yahoo.fr, a.zarghili@ieee.ma

Abstract—Feature representation and classification are two key steps for face recognition. We compared three automated methods for face recognition, each using a different method for feature extraction: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and ICA (Independent Component Analysis), with an SVM (Support Vector Machine) used for classification. The experiments were implemented on two face databases, the ATT Face Database [1] and the Indian Face Database (IFD) [2], with the combinations (PCA+SVM), (ICA+SVM) and (LDA+SVM). The results showed that the (LDA+SVM) method had a higher recognition rate than the other two methods for face recognition.

Index Terms—Face Recognition, SVM, LDA, PCA, ICA.

I. INTRODUCTION

Face recognition is a term that covers several sub-stages organized as a two-step process: feature extraction and classification.

Feature extraction for face representation is one of the central issues in face recognition systems; it can be defined as the procedure of extracting relevant information from a face image.

There are many feature extraction algorithms, most of them originally used in areas other than face recognition. Researchers in face recognition have used, modified and adapted many algorithms and methods to their purpose. For example, Principal Component Analysis (PCA) was applied to face representation and recognition [3, 4, 5].

The PCA method [5] is clearly advantageous for feature extraction, but it is more suitable for image reconstruction because it takes no account of the separability of the various classes. Aiming at optimal separability of the feature subspace, LDA (Linear Discriminant Analysis) can make up for this deficiency of PCA [6]. ICA (Independent Component Analysis) is a method that finds a better basis by capturing the high-order relationships between image pixels [7]. Once the features are extracted, the next step is to classify the image. Large-margin classifiers such as the Support Vector Machine (SVM) [8] have recently been proposed in machine learning. The method used in this step is the SVM, which was developed in the framework of statistical learning theory and has been successfully applied to a number of applications, ranging from time series prediction to face recognition to biological data processing for medical diagnosis [9, 10]. Based on VC (Vapnik-Chervonenkis) dimension theory and the SRM (Structural Risk Minimization) principle, SVMs can handle practical problems such as small sample size, nonlinearity and high dimensionality [11, 12].

In this paper SVMs were used for classification with different methods for feature extraction: PCA, LDA and ICA. The experiments were implemented on two face databases, the ATT Face Database [1] and the Indian Face Database (IFD) [2].

The face recognition system is shown in Fig. 1.

Fig 1: The face recognition system (training and test images, feature extraction with PCA/ICA/LDA, SVM classifier).

The outline of the paper is as follows: Section II presents feature extraction, Section III the SVM classifier, Section IV the experimental results, and the last section concludes the paper.


II. FEATURE EXTRACTION

Feature extraction involves several steps: dimensionality reduction, feature extraction and feature selection. We start with a large feature vector covering the whole image, which requires reducing the dimension and selecting the important features. These new features are then used for training and testing the SVM classifier. In this section we describe three feature extraction techniques: Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Linear Discriminant Analysis (LDA).

2.1 Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a powerful tool for feature extraction, as proposed by Turk and Pentland [13]. The main advantage of PCA is that it can reduce the dimension of the data without losing much information. Suppose there are N images I_i (i = 1, 2, ..., N), each image is denoted as a column vector x_i, and the dimension is M. The mean of the images is given by:

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i    (1)

The covariance matrix of the images is given by:

C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{N} X X^T    (2)

where X = [x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_N - \bar{x}]. The projection space is made up of the eigenvectors corresponding to the significant eigenvalues. When M >> N, the computational complexity increases, so we can use the singular value decomposition (SVD) theorem to simplify the computation. The matrix X, whose dimension is M x N and whose rank is N, can be decomposed as:

X = U \Lambda^{1/2} V^T    (3)

U = X V \Lambda^{-1/2}    (4)

where \Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_N], \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_N, are the nonzero eigenvalues of X X^T and X^T X, and U = [u_1, u_2, \ldots, u_M], V = [v_1, v_2, \ldots, v_N] are orthogonal matrices. u_i is an eigenvector of X X^T, v_i is an eigenvector of X^T X, and \lambda_i is the corresponding eigenvalue. u_i is calculated as follows:

u_i = \frac{1}{\sqrt{\lambda_i}} X v_i, \quad i = 1, 2, \ldots, N    (5)

The p eigenvectors U = [u_1, u_2, \ldots, u_p], p \leq N, corresponding to the p most significant eigenvalues are selected to form the projection space, and the sample feature is obtained by projecting the mean-subtracted sample onto this space.
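To make the eigenface computation concrete, the following is a minimal NumPy sketch of Eqs. (1)-(5). It uses the small-matrix trick described above: the N x N matrix X^T X is eigen-decomposed instead of the much larger M x M matrix X X^T, and the eigenfaces are recovered with u_i = X v_i / sqrt(lambda_i). Variable names such as `images` and `n_components` are illustrative, not taken from the paper.

```python
import numpy as np

def pca_features(images, n_components):
    """PCA feature extraction (Eqs. 1-5).

    images: array of shape (N, M), one flattened face image per row.
    Returns the mean image, the projection matrix U (M x p) and the projected features (N x p).
    """
    X = images.astype(np.float64)
    mean = X.mean(axis=0)                      # Eq. (1): mean image
    Xc = (X - mean).T                          # M x N matrix of mean-subtracted images

    # Small-matrix trick: eigen-decompose X^T X (N x N) instead of X X^T (M x M)
    gram = Xc.T @ Xc                           # N x N
    eigvals, V = np.linalg.eigh(gram)          # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    eigvals, V = eigvals[order], V[:, order]

    # Keep the p most significant components and map back: u_i = X v_i / sqrt(lambda_i)  (Eq. 5)
    keep = slice(0, n_components)
    U = Xc @ V[:, keep] / np.sqrt(eigvals[keep])

    features = (X - mean) @ U                  # project the samples onto the eigenfaces
    return mean, U, features
```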
2.2 Linear Discriminant Analysis (LDA)

LDA, also known as Fisher's Discriminant Analysis, is another dimensionality reduction technique. It determines a subspace in which the between-class scatter (extra-personal variability) is as large as possible, while the within-class scatter (intra-personal variability) is kept constant. In this sense, the subspace obtained by LDA optimally discriminates the classes (faces).

Suppose we have a set of C-class, D-dimensional samples \{x_1, x_2, \ldots, x_N\}, N_1 of which belong to class w_1, N_2 to class w_2, and N_C to class w_C. In order to find a good discrimination of these classes we need to define a measure of separation. We define the within-class scatter by Eq. (6):

S_i = \sum_{x \in w_i} (x - \mu_i)(x - \mu_i)^T    (6)

where S_W = \sum_{i=1}^{C} S_i and \mu_i = \frac{1}{N_i} \sum_{x \in w_i} x.

The between-class scatter, Eq. (7), becomes:

S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T    (7)

where \mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i.

The matrix S_T = S_B + S_W is called the total scatter. Similarly, we define the mean vectors and scatter matrices for the projected samples as:

\tilde{S}_W = \sum_{i=1}^{C} \sum_{y \in w_i} (y - \tilde{\mu}_i)(y - \tilde{\mu}_i)^T, \qquad \tilde{S}_B = \sum_{i=1}^{C} N_i (\tilde{\mu}_i - \tilde{\mu})(\tilde{\mu}_i - \tilde{\mu})^T

where \tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in w_i} y and \tilde{\mu} = \frac{1}{N} \sum_{\forall y} y.

From our derivation for the two-class problem, we can write \tilde{S}_B = W^T S_B W and \tilde{S}_W = W^T S_W W.

Recall that we are looking for a projection that maximizes the ratio of between-class to within-class scatter. Since the projection is no longer a scalar (it has C-1 dimensions), we use the determinant of the scatter matrices to obtain a scalar objective function, Eq. (8):

J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}    (8)

We seek the projection matrix W^* that maximizes this ratio. It can be shown that the optimal projection matrix W^* is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the following generalized eigenvalue problem, Eq. (9):

W^* = [w_1^* \; w_2^* \; \ldots \; w_{C-1}^*] = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|} \;\Rightarrow\; (S_B - \lambda_i S_W) w_i^* = 0    (9)

S_B is the sum of C matrices of rank at most 1, and the mean vectors are constrained by \frac{1}{C} \sum_{i=1}^{C} \mu_i = \mu. Therefore, S_B will be of rank (C-1) or less, and this means that only (C-1) of the eigenvalues \lambda_i will be non-zero.


The projections with maximum class separability information are the eigenvectors corresponding to the largest eigenvalues of S_W^{-1} S_B.

We seek the (C-1) projections [y_1, y_2, \ldots, y_{C-1}] by means of (C-1) projection vectors w_i arranged by columns into a projection matrix W = [w_1 | w_2 | \ldots | w_{C-1}]:

y_i = w_i^T x \;\Rightarrow\; y = W^T x
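A small NumPy sketch of the Fisherface step just described: the within- and between-class scatter matrices of Eqs. (6)-(7) are accumulated, and the columns of W are taken as the leading eigenvectors of S_W^{-1} S_B (Eq. 9). In practice LDA is applied to PCA-reduced features so that S_W is invertible; names such as `features` and `labels` are illustrative.

```python
import numpy as np

def lda_projection(features, labels, n_components=None):
    """Fisher LDA (Eqs. 6-9).

    features: (N, D) array of (typically PCA-reduced) samples.
    labels:   (N,) array of class identifiers.
    Returns the projection matrix W with at most C-1 columns.
    """
    classes = np.unique(labels)
    D = features.shape[1]
    mu = features.mean(axis=0)                          # global mean

    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in classes:
        Xc = features[labels == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)               # Eq. (6): within-class scatter
        diff = (mu_c - mu)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)             # Eq. (7): between-class scatter

    # Eq. (9): leading eigenvectors of Sw^{-1} Sb (at most C-1 non-zero eigenvalues)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    if n_components is None:
        n_components = len(classes) - 1
    W = eigvecs[:, order[:n_components]].real
    return W

# Projected features: y = W^T x, i.e. for row-stacked samples: lda_features = features @ W
```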
2.3 Independent Component Analysis (ICA)

The most common method for generating spatially localized features is to apply Independent Component Analysis (ICA) to produce basis vectors that are statistically independent (not just linearly decorrelated, as with PCA) [14]. It is an alternative to PCA that provides a more powerful data representation [15], and it is a discriminant analysis criterion which can be used to enhance PCA.

ICA for face recognition has been proposed under two architectures by Bartlett et al. [16]. Architecture 1 aims at finding a set of statistically independent basis images, while Architecture 2 finds a factorial code. In this paper, Architecture 1 has been used. This process involves the following two initial steps:

1. The face images in the database are organized as a matrix X in which each row corresponds to an image.

2. The face database is processed to obtain a reduced dataset in order to lower the computational cost of the ICA algorithm. The reduced dataset is obtained from the first m principal component (PC) eigenvectors of the image database. Hence the first step is applying PCA to determine the m PCs; then the ICA algorithm is performed on the principal components using the mathematical procedure described in [17] (a sketch of these two steps is given below).
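The following sketch illustrates the two steps above, assuming scikit-learn is available. FastICA is used here as a stand-in for the InfoMax procedure of [17], and this simplified version runs ICA directly on the PCA coefficients; the exact Architecture 1 formulation of [16] factors the PCA basis images themselves. Names and the value of m are illustrative.

```python
from sklearn.decomposition import PCA, FastICA

def ica_architecture1_features(train_images, test_images, m_components=50):
    """ICA-based features (Architecture 1, simplified sketch).

    train_images, test_images: (N, M) arrays, one flattened face per row (step 1).
    """
    # Step 2: reduce the data with the first m principal components
    pca = PCA(n_components=m_components, whiten=True)
    train_pc = pca.fit_transform(train_images)
    test_pc = pca.transform(test_images)

    # ICA on the reduced representation (FastICA used instead of the InfoMax algorithm of [17])
    ica = FastICA(n_components=m_components, max_iter=1000, random_state=0)
    train_feat = ica.fit_transform(train_pc)
    test_feat = ica.transform(test_pc)
    return train_feat, test_feat
```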
III. CLASSIFICATION: SUPPORT VECTOR MACHINE (SVM)

SVMs (Support Vector Machines) are a useful technique for data classification and are still under intensive research [18], [19]. Although the SVM is considered easier to use than neural networks, several kernels have been proposed by researchers; the four basic kernels are linear, polynomial, sigmoid and radial basis function (RBF). We chose the RBF kernel function for the SVM classifier in our face recognition experiments, Eq. (10), since it has fewer numerical difficulties [18]:

K(x_i, x_j) = \exp\left(-\gamma \| x_i - x_j \|^2\right), \quad \gamma > 0    (10)

\gamma is the kernel parameter and is parameterized as \gamma = \frac{1}{2\sigma^2}.

3.1 Maximal Margin Hyperplanes

We change the representation of the training examples by mapping the data to a feature space F in which the optimal separating hyperplane (OSH) is constructed (Fig. 2). We limited our study to the case of two-class discrimination [8] and consider the training data S as a set of l feature vectors, each of dimension n, where each point x_i belongs to one of two classes identified by the label -1 or 1, Eq. (11):

S = \{ (x_i, y_i) \}_{i=1}^{l}, \quad x_i \in \mathbb{R}^n, \; y_i \in \{-1, 1\}    (11)

Fig 2: Maximum-margin hyperplane for SVM trained with samples from two classes.

We then solve a quadratic optimization problem with linear constraints, which can be interpreted in terms of the Lagrange multipliers calculated by quadratic programming, Eq. (12):

\max_{\alpha} \; L(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j), \quad \text{subject to } 0 \leq \alpha_i \leq c \; (i = 1, \ldots, l), \quad \sum_{i=1}^{l} \alpha_i y_i = 0    (12)

The \alpha_i are the Lagrange multiplier parameters to be adjusted, and c is the penalty parameter of the classification error term; it must be adjusted because the data are rarely completely separable. The x_i are the training examples.

The solution of the optimization problem is a vector w \in F that can be written as a linear combination of the training inputs, Eq. (13):

w = \sum_i \alpha_i y_i x_i    (13)

(w, b) define the hyperplane OSH = \{ x : w \cdot x + b = 0 \}, where b is the bias.

Once the separating hyperplane (OSH) has been trained on the training set, it divides \mathbb{R}^n into two regions: one where w \cdot x_i + b \geq 0 and one where w \cdot x_i + b \leq 0. To use the maximal margin classifier, we determine on which side the test vector lies and assign the corresponding class label. Hence, the predicted class of a test point x is the output of the decision function, Eq. (14):

d(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i k(x_i, x) + b \right)    (14)
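For reference, a minimal sketch of the RBF kernel of Eq. (10) and of a two-class maximal-margin classifier, using scikit-learn's SVC, which solves the same dual problem as Eq. (12) with penalty parameter C and evaluates the support-vector expansion of Eqs. (13)-(14). The parameter values shown are placeholders, not those used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(xi, xj, gamma):
    """Eq. (10): K(xi, xj) = exp(-gamma * ||xi - xj||^2), with gamma = 1 / (2 sigma^2)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

# Two-class SVM with RBF kernel; C plays the role of the penalty parameter c in Eq. (12)
clf = SVC(kernel="rbf", C=10.0, gamma=0.01)
# clf.fit(train_features, train_labels)            # labels in {-1, +1}
# predictions = clf.predict(test_features)         # sign of Eq. (14)
# margins = clf.decision_function(test_features)   # value inside sgn(.) in Eq. (14)
```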

3.2 Multiclass Classification

SVM was originally designed for binary classification, while face recognition is a multi-class classification problem. There are two basic methods for face recognition with SVMs: one-against-one and one-against-all. The one-against-one method performs classification between each pair of classes; the one-against-all method performs classification between each class and all the rest of the classes.


In our experiments the one-against-all method was used for classification.

In real-world problems we often have to deal with n >= 2 classes. Our training set will consist of pairs (x_i, y_i), where x_i \in \mathbb{R}^n and y_i \in \{1, \ldots, n\}, i = 1, \ldots, l. The extension of the two-class case to the multiclass case is described briefly below.

3.2.1 One-vs-all approach

In the one-vs-all approach, n SVMs are trained. Each of the SVMs separates a single class from all remaining classes [20, 21]. A more recent comparison between several multi-class techniques [22] favors the one-vs-all approach because of its simplicity and excellent classification performance. Regarding the training effort, the one-vs-all approach is preferable over the one-vs-one approach since only n SVMs have to be trained, compared to n(n-1)/2 SVMs in the pairwise (one-vs-one) approach [23], [24], [25]; for the 10-class setup used in our experiments this means 10 SVMs instead of 45. The construction of an n-class classifier using two-class discrimination methods is usually done by the following procedure.

Construct n two-class decision functions d_k(x), k = 1, \ldots, n, that separate the examples of class k from the training points of all other classes:

d_k(x) = \begin{cases} +1 & \text{if } x \text{ belongs to class } k \\ -1 & \text{otherwise} \end{cases}
In the face database there are n individuals with 10 face images each. For every individual, 5 of the 10 images were taken to compose the training samples and the remaining 5 compose the test samples.

The five training images of the first individual were marked as positive samples, and all images of the other individuals in the training set as negative samples. Both positive and negative samples were taken as input to train an SVM classifier, yielding the corresponding support vectors and optimal hyperplane. This SVM was labeled SVM1. In turn we obtain an SVM for every individual, labeled SVM1, ..., SVMn respectively.

The n SVMs can divide the samples into n classes. When a test sample is input in turn to every SVM, there are several cases (illustrated in the sketch after this list):

• If the sample is decided to be positive by SVMi and negative by all other SVMs at the same time, then the sample is classified as class i.

• If the sample is decided to be positive by several SVMs simultaneously and negative by the others, then the classification is false.

• If the sample is decided to be negative by all SVMs, then the sample is decided not to belong to the face database.
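A sketch of the one-against-all scheme just described: one binary SVM per individual is trained with that person's training images as positives and all other training images as negatives, and a test image is accepted only when exactly one SVM answers positive. Function and variable names, and the SVM parameters, are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(train_features, train_labels, classes):
    """Train SVM1..SVMn, one per individual (positive = that class, negative = the rest)."""
    svms = {}
    for c in classes:
        y = np.where(train_labels == c, 1, -1)
        svms[c] = SVC(kernel="rbf", C=10.0, gamma=0.01).fit(train_features, y)
    return svms

def classify_one_vs_all(x, svms):
    """Apply the three decision cases to a single test sample x (1-D feature vector)."""
    positives = [c for c, svm in svms.items() if svm.predict(x.reshape(1, -1))[0] == 1]
    if len(positives) == 1:
        return positives[0]               # positive for exactly one SVM: class i
    if len(positives) > 1:
        return "rejected: ambiguous"      # positive for several SVMs: classification is false
    return "rejected: not in database"    # negative for all SVMs: not in the face database
```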

IV. EXPERIMENTATION AND RESULTS

Our experiments were performed on two face databases, the ATT Face Database [1] and the Indian Face Database (IFD) [2]. The ATT database contains images with very small changes in orientation for each subject, while the IFD contains a set of 10 images for each subject in which each image is oriented at a different angle from the others.

Both databases contain 10 classes, and each class has 5 images for training and 5 images for testing (Fig. 3 and Fig. 4). We use these databases to compare the face recognition algorithms PCA+SVM, LDA+SVM and ICA+SVM. We extract features from the training and test sets using the PCA, LDA and ICA methods, train the SVM classifier on these features, and measure the accuracy of the three methods. The recognition rates of the three methods PCA+SVM, LDA+SVM and ICA+SVM are shown in Fig. 5.

Fig 3: Examples of (a) training and (b) test images of the ATT Face Database.

Fig 4: Examples of (c) training and (d) test images of the IFD Face Database.

The comparison is done on the basis of recognition accuracy. Comparative results obtained by testing the PCA+SVM, LDA+SVM and ICA+SVM algorithms on both the IFD and the ATT databases are shown in Fig. 5.
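Putting the pieces together, a hedged sketch of the experimental protocol: for each database, 5 images per class are used for training and 5 for testing, features are extracted with PCA, LDA or ICA, an SVM is trained on the training features, and the recognition rate is measured on the test features. The `extract` callables are assumed to wrap the earlier sketches, and `load_att_split` is a hypothetical loader, not code from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def recognition_rate(extract, train_images, train_labels, test_images, test_labels):
    """Train an RBF SVM on extracted features and return the recognition rate on the test set."""
    train_feat, test_feat = extract(train_images, test_images)
    clf = SVC(kernel="rbf", C=10.0, gamma=0.01).fit(train_feat, train_labels)
    return np.mean(clf.predict(test_feat) == test_labels)

# Example usage (hypothetical loader returning the 5 train / 5 test split per class):
# tr_x, tr_y, te_x, te_y = load_att_split()
# for name, extract in [("PCA+SVM", pca_extract), ("LDA+SVM", lda_extract), ("ICA+SVM", ica_extract)]:
#     print(name, recognition_rate(extract, tr_x, tr_y, te_x, te_y))
```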


Fig 5: Comparison of the combined algorithms PCA+SVM, LDA+SVM and ICA+SVM on the basis of recognition accuracy.

It is observed that the recognition rate of the LDA+SVM method is 93.9% on the ATT face database and 70% on the IFD face database, which is higher than the rates of the PCA+SVM and ICA+SVM methods on both the IFD and ATT databases.

CONCLUSION

We presented a face recognition method based on an SVM classifier combined with LDA feature extraction, and implemented experiments on the IFD and ATT face databases: first LDA for dimension reduction, then SVM for classification. The experimental results showed that the LDA+SVM method had a higher recognition rate than the other two methods for face recognition.

REFERENCES

[1] AT&T Laboratories Cambridge, http://www.cl.cam.ac.uk/Research/DTG/attarchive:pub/data/att_faces.tar.Z.
[2] Vidit Jain and Amitabha Mukherjee, The Indian Face Database, http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/, 2002.
[3] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America A - Optics, Image Science and Vision, vol. 4, no. 3, pp. 519-524, March 1987.
[4] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 103-108, 1990.
[5] M. Turk and A. P. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[6] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[7] M. Bartlett, J. Movellan, and T. Sejnowski, "Face recognition by independent component analysis," IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1450-1464, November 2002.
[8] V. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, 1998.
[9] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[10] G. Paliouras, V. Karkaletsis, and C. D. Spyropoulos (Eds.), "Support Vector Machines: Theory and Applications," ACAI '99, LNAI 2049, pp. 249-257, 2001.
[11] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, Cambridge, 2000.
[12] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 2000.
[13] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proc. IEEE CVPR, 1991, pp. 586-591.
[14] M. S. Bartlett, Face Image Analysis by Unsupervised Learning, Kluwer Academic, 2001.
[15] C. Liu and H. Wechsler, "Comparative assessment of independent component analysis (ICA) for face recognition," in Proc. of the Second International Conference on Audio- and Video-based Biometric Person Authentication, AVBPA '99, Washington D.C., USA, March 1999.
[16] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1450-1464, November 2002.
[17] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1450-1464, November 2002.
[18] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[19] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[20] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 1-25, 1995.
[21] B. Schölkopf, C. Burges, and V. Vapnik, "Extracting support data for a given task," in U. Fayyad and R. Uthurusamy (Eds.), Proc. First Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1995.
[22] R. Rifkin, Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning, Ph.D. thesis, M.I.T., 2002.
[23] M. Pontil and A. Verri, "Support vector machines for 3-D object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 637-646, 1998.
[24] G. Guodong, S. Li, and C. Kapluk, "Face recognition by support vector machines," in Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition, 2000, pp. 196-201.
[25] J. Platt, N. Cristianini, and J. Shawe-Taylor, "Large margin DAGs for multiclass classification," in Advances in Neural Information Processing Systems 12, pp. 547-553, MIT Press, 2000.
