
Face recognition by generalized two-dimensional FLD method and multi-class support vector machines

Shiladitya Chowdhury a, Jamuna Kanta Sing b,*, Dipak Kumar Basu b, Mita Nasipuri b

a Department of Master of Computer Application, Techno India, EM-4/1, Sector V, Salt Lake, Kolkata 700 091, India
b Department of Computer Science & Engineering, Jadavpur University, 188, Raja S. C. Mullick Road, Kolkata, West Bengal 700 032, India

* Corresponding author.
E-mail addresses: dityashila@yahoo.com (S. Chowdhury), jksing@ieee.org (J.K. Sing), dipakkbasu@gmail.com (D.K. Basu), mitanasipuri@yahoo.com (M. Nasipuri).

Article info

Article history:
Received 20 April 2010
Received in revised form 27 October 2010
Accepted 1 December 2010
Available online xxx

Keywords:
Generalized two-dimensional FLD
Fisher's criteria
Feature extraction
Face recognition
Multi-class SVM
SVM-based classifier

Abstract
This paper presents a novel scheme for feature extraction, namely the generalized two-dimensional Fisher's linear discriminant (G-2DFLD) method, and its use for face recognition with multi-class support vector machines as the classifier. The G-2DFLD method is an extension of the 2DFLD method for feature extraction. Like the 2DFLD method, the G-2DFLD method is also based on the original 2D image matrix. However, unlike the 2DFLD method, which maximizes class separability either from the row or from the column direction, the G-2DFLD method maximizes class separability from both the row and column directions simultaneously. To realize this, two alternative Fisher's criteria have been defined corresponding to the row-wise and column-wise projection directions. Unlike the 2DFLD method, the principal components extracted from an image matrix by the G-2DFLD method are scalars, yielding a much smaller image feature matrix. The proposed G-2DFLD method was evaluated on two popular face recognition databases, the AT&T (formerly ORL) and the UMIST face databases. The experimental results using different experimental strategies show that the new G-2DFLD scheme outperforms the PCA, 2DPCA, FLD and 2DFLD schemes, not only in terms of computation time, but also for the task of face recognition using multi-class support vector machines (SVM) as the classifier. The proposed method also outperforms some of the neural network and other SVM-based methods for face recognition reported in the literature.

(c) 2010 Elsevier B.V. All rights reserved.

1. Introduction

Since the last decade, human face recognition has been an active research area in the field of pattern recognition and computer vision due to its wide range of applications, such as identity authentication, access control, surveillance systems, security, etc. As a result, numerous methods have been proposed in the past. Surveys of these methods can be found in [1-4]. Often, a single method involves techniques motivated by different principles. The use of a mixture of techniques makes it difficult to classify these methods based purely on the types of techniques used for feature representation or classification. Based on the psychological study of how humans use holistic and local features, face recognition techniques may be classified into three categories: (i) holistic matching methods, (ii) feature-based (structural) matching methods, and (iii) hybrid methods.

1.1. Holistic matching methods

These methods use the whole face region as the raw input to a recognition system. One of the most widely used methods is the eigenface approach, which is based on principal component analysis (PCA) [5,6]. It generates a set of orthogonal bases that capture the directions of maximum variance in the training images. The eigenface approach can preserve the global structure of the input space and is optimal in terms of image representation and reconstruction. The Fisher's linear discriminant (FLD) method has also been widely used for feature extraction and recognition [7-9]. The key idea of the FLD technique is to find the optimal projection that maximizes the ratio of the between-class to the within-class scatter of the projected samples. However, a difficulty in using the FLD technique in face recognition is the small sample size (SSS) problem [10]. This problem usually arises when the number of samples is smaller than the dimension of the samples. In the face recognition domain, the dimension of a face image is generally very high. Therefore, the within-class scatter matrix is almost always singular, thereby making the direct implementation of the FLD method impossible. One direct solution to the SSS problem is to down-sample the face images to a considerably smaller size and then apply the FLD technique. However, this process is not computationally efficient, as the pre-processing

of images takes a considerable amount of time before the actual application of the FLD technique. Er et al. [11] proposed a PCA + FLD technique to avoid the SSS problem. In [11], face features are first extracted by the principal component analysis (PCA) method and the resultant features are then further processed by the FLD technique to acquire lower-dimensional discriminant features. An improved PCA technique, the two-dimensional PCA (2DPCA), was proposed by Yang et al. [12]. Unlike PCA, which works on the stretched image vector, the 2DPCA works directly on the original 2D image matrix. The 2DPCA is not only computationally more efficient, but also superior for the task of face recognition and image reconstruction compared with the conventional PCA technique [12]. However, the PCA techniques yield projection directions that maximize the total scatter across all classes, i.e., across all face images. Therefore, the PCA retains unwanted variations caused by lighting, facial expression, and other factors [7,11]. The PCA techniques do not provide any information for class discrimination, only dimension reduction [11]. Recently, Xiong et al. [13] proposed a two-dimensional FLD (2DFLD) method, which also works directly on the original 2D image matrix and maximizes class separability either from the row or from the column direction. The so-called SSS problem does not arise in the 2DFLD method, as the size of its scatter matrices is much smaller. The 2DFLD method is found to be superior to the PCA and 2DPCA in terms of feature extraction and face recognition [13]. Apart from the eigenface and fisherface approaches, Bayesian methods, which use a probabilistic distance metric [14], neural networks [11,15-19] and support vector machine (SVM) methods [20-26] have also been developed. To utilize higher-order statistics, some nonlinear forms of the eigenface and fisherface methods have been developed [27-32] for better recognition performance.
The advantage of using neural networks for face recognition [11,15-19] is that the networks can be trained to capture more knowledge about the variation of face images and thereby achieve good generalization. In recent times, among the neural network approaches, many researchers have used RBF neural networks (RBFNN) for face recognition [11,15-19]. RBF neural networks can be trained faster than the multi-layer perceptron (MLP) because of their locally tuned neurons, and they have a more compact topology compared to other neural network models. Er et al. [11] have used the principal component analysis (PCA) method with RBF networks for face recognition. In their recent work [18], the discrete cosine transform (DCT) and Fisher's linear discriminant (FLD) technique have been employed in an RBFNN for high-speed face recognition. In our earlier work [15], we used a modified k-means clustering algorithm with a point symmetry distance as the similarity measure to model the hidden layer neurons of an RBFNN for face recognition. In this method we generated cluster centers from each individual of the database independently to capture more knowledge about the distribution of facial images. Recently, we have proposed a high-speed face recognition method using pixel-based features and an RBFNN [16]. Yang and Paindovoine [17] have down-sampled the face images to 16 × 16 pixels and applied them to an RBFNN for recognition. Haddadnia et al. [19] have combined shape information and PCA to extract features from a face image and used them in RBF neural networks for face recognition. The main drawback of this technique is that the networks have to be extensively tuned to get exceptional performance.
A few methods for face recognition using SVMs have also been proposed in the past [20-26]. Among the earlier works, Phillips [20] used SVMs for face recognition. Zhaohui and Guiming [21] proposed a method based on a multi-class bias SVM (BSVM), where local facial features are automatically extracted and combined to form a single feature vector, which is then classified by the BSVM for recognition. Lee et al. [22] proposed an SVM-based method using the PCA + FLD feature subspace. The method reduces the number of face classes by selecting a few classes closest to the test data after projection into the PCA + LDA feature subspace. Ko and Byun [23] proposed a method combining one-per-class (OPC) and pairwise coupling (PWC) SVMs with rejection criteria. Guo et al. [24] proposed a binary tree-based multi-class SVM for face recognition. Wang and Sun [25] presented a face recognition method using a simple Gabor feature space (SGFS) and SVM. Thakur et al. [26] proposed an SVM-based face recognition technique using FLD features.
More recently, some new developments on holistic matching methods can be found in the literature [33-36]. Zhi and Ruan [33] proposed a two-dimensional direct and weighted linear discriminant analysis (2D-DWLDA) for feature extraction. The method tries to weaken the overlap between neighbouring classes by introducing a weighting function. Wang et al. [34] proposed a feature extraction method which combines the ideas of the 2D-PCA and 2D maximum scatter difference methods. The method can simultaneously make use of the discriminant and descriptive information of the image. Song et al. [35] proposed a face recognition method based on complete fuzzy linear discriminant analysis (CF-LDA) and decision tree fuzzy support vector machines (DT-FSVM). The method uses a relaxed normalized condition in the definition of the fuzzy membership function to improve the classification results. Jiang et al. [36] proposed a method for facial eigenfeature regularization and extraction. The image space spanned by the eigenvectors of the within-class scatter matrix is decomposed into three subspaces. The eigenfeatures are then regularized differently in these three subspaces based on an eigenspectrum model to address the problems of instability, overfitting and poor generalization. After discriminant assessment, features are extracted from these three subspaces.
1.2. Feature-based (structural) matching methods

Most earlier methods of face recognition belong to this category. Local structural features such as the eyes, nose, mouth, etc. are extracted from frontal-view images, and their locations, angles, distances, etc. are used for recognition [37-39]. Without finding the exact locations of the facial features, Hidden Markov Model (HMM)-based methods use strips of pixels to cover the forehead, eyes, nose, mouth and chin [40,41]. One of the most successful methods in this category is the graph matching technique [42], which is based on the Dynamic Link Architecture (DLA). The main disadvantage of these methods is that profile (side-view) images and illumination variations can increase the complexity and time of the approach.
1.3. Hybrid methods

These types of methods try to realize human perception by integrating holistic and feature-based approaches to recognize a face. Some of the hybrid methods are the modular eigenface method [43], hybrid local feature analysis (LFA) [44], the shape-normalized method [45] and the component-based method [46]. The modular eigenface method [43] uses hybrid features by combining eigenfaces and other eigenmodules such as eigeneyes, eigenmouths and eigennoses. This method is found to be slightly superior to the holistic eigenface method. The hybrid LFA method [44] uses a set of hybrid features obtained using the PCA and LFA methods. The shape-normalized method uses both shape and gray-level information for automatic face recognition [45]. The component-based method [46] decomposes a face into a set of facial components, such as the mouth and eyes, that are interconnected by a flexible geometrical model. One drawback of this method is that it needs a large number of training images taken from different viewpoints and under different lighting conditions.
In this paper, we have extended the 2DFLD algorithm and present a novel generalized two-dimensional FLD (G-2DFLD) technique, which maximizes class separability from both the row and column directions simultaneously. Like the 2DFLD method, the G-2DFLD method is also based on the original 2D image matrix. In the G-2DFLD method, two alternative Fisher's criteria have been defined corresponding to the row-wise and column-wise projection directions. Unlike the 2DFLD method, the principal components extracted from an image matrix by the G-2DFLD method are scalars. Therefore, the size of the resultant image feature matrix is much smaller using the G-2DFLD method than that obtained using the 2DFLD method. A non-linear multi-class SVM has been designed to classify the face images. The experimental results on the AT&T and the UMIST databases show that the new G-2DFLD scheme outperforms the PCA, 2DPCA, FLD and 2DFLD schemes, not only in terms of computation time, but also for the task of face recognition.

The remaining part of the paper is organized as follows. Section 2 describes the procedure for extracting face features using the 2DFLD technique. Section 3 presents the key idea and algorithm of the proposed G-2DFLD method for feature extraction and fisherface calculation. The key idea of SVMs is described in Section 4. The experimental results on the AT&T and the UMIST face databases are presented in Section 5. Finally, Section 6 draws the concluding remarks.
2. Two-dimensional FLD (2DFLD) method for feature extraction

The 2DFLD method [13] is based on the 2D image matrix. It does not need to form a stretched large image vector from the 2D image matrix. The key idea is to project an image matrix X, an m × n random matrix, onto an optimal projection matrix A of dimension n × k (k is the number of projection vectors and k ≤ n) to get an image feature matrix Y of dimension m × k by the following linear transformation [13]:

Y = XA    (1)

Let there be N training images, each denoted by an m × n image matrix X_i (i = 1, 2, ..., N). The training images contain C classes (subjects), and the cth class C_c has N_c samples (\sum_{c=1}^{C} N_c = N). Let the mean image of the training samples be denoted by \bar{X} and the mean image of the cth class by \bar{X}_c. The between-class and within-class scatter matrices G_b and G_w, respectively, are defined as follows:

G_b = \sum_{c=1}^{C} N_c (\bar{X}_c - \bar{X})^T (\bar{X}_c - \bar{X})    (2)

G_w = \sum_{c=1}^{C} \sum_{i \in c} (X_i - \bar{X}_c)^T (X_i - \bar{X}_c)    (3)

Then the two-dimensional Fisher's criterion J(Q) is defined as follows:

J(Q) = \frac{|Q^T G_b Q|}{|Q^T G_w Q|}    (4)

where Q is the projection matrix.

It may be noted that the size of both G_b and G_w is n × n. If G_w is a nonsingular matrix, the ratio in (4) is maximized when the column vectors of the projection matrix Q are the eigenvectors of G_b G_w^{-1}. The optimal projection matrix Q_opt is defined as follows:

Q_{opt} = \arg\max_{Q} |G_b G_w^{-1}| = [q_1, q_2, ..., q_k]    (5)

where {q_i | i = 1, 2, ..., k} is the set of normalized eigenvectors of G_b G_w^{-1} corresponding to the k largest eigenvalues {\lambda_i | i = 1, 2, ..., k}.

Now, each face image X_i (i = 1, 2, ..., N) is projected onto the optimal projection matrix Q_opt to obtain its (m × k)-dimensional 2DFLD-based feature matrix Y_i, which is defined as follows:

Y_i = \tilde{X}_i Q_{opt},  i = 1, 2, ..., N    (6)

where \tilde{X}_i is the mean-subtracted image of X_i, defined as follows:

\tilde{X}_i = X_i - \bar{X}    (7)
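As a concrete illustration of Eqs. (2)-(7), the following Python sketch computes the 2DFLD scatter matrices and projects the mean-subtracted images onto Q_opt. It is only an illustrative reading of the method under assumed array names and shapes, not the authors' implementation.

import numpy as np
from scipy.linalg import eig

def twodfld_projection(images, labels, k):
    """images: (N, m, n) array; labels: length-N class ids; k: number of projection vectors."""
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    mean_all = images.mean(axis=0)                      # \bar{X}
    n = images.shape[2]
    G_b = np.zeros((n, n))
    G_w = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = images[labels == c]
        mean_c = Xc.mean(axis=0)                        # \bar{X}_c
        d = mean_c - mean_all
        G_b += len(Xc) * d.T @ d                        # Eq. (2), n x n
        for X in Xc:
            e = X - mean_c
            G_w += e.T @ e                              # Eq. (3), n x n
    # columns of Q_opt: eigenvectors of the generalized problem G_b q = lambda G_w q,
    # which maximizes the ratio criterion J(Q) of Eq. (4)
    vals, vecs = eig(G_b, G_w)
    Q_opt = vecs[:, np.argsort(vals.real)[::-1][:k]].real        # Eq. (5), n x k
    features = [(X - mean_all) @ Q_opt for X in images]          # Eqs. (6)-(7), each m x k
    return Q_opt, features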

3. Generalized two-dimensional FLD (G-2DFLD) method for feature extraction

3.1. Key idea and the algorithm

Like the 2DFLD method, the generalized two-dimensional FLD (G-2DFLD) method is also based on the 2D image matrix. The only difference is that it maximizes class separability from both the row and column directions simultaneously by the following linear transformation:

Z = U^T X V    (8)

where U and V are two projection matrices of dimension m × p (p ≤ m) and n × q (q ≤ n), respectively. Therefore, our goal is to find the optimal projection directions U and V so that the projected vector in the (p × q)-dimensional space reaches its maximum class separability.

3.1.1. Alternate Fisher's criteria

We have defined two alternative Fisher's criteria J(U) and J(V) corresponding to the row-wise and column-wise projection directions as follows:

J(U) = \frac{|U^T G_{br} U|}{|U^T G_{wr} U|}    (9)

J(V) = \frac{|V^T G_{bc} V|}{|V^T G_{wc} V|}    (10)

where

G_{br} = \sum_{c=1}^{C} N_c (\bar{X}_c - \bar{X})(\bar{X}_c - \bar{X})^T    (11)

G_{wr} = \sum_{c=1}^{C} \sum_{i \in c} (X_i - \bar{X}_c)(X_i - \bar{X}_c)^T    (12)

and

G_{bc} = \sum_{c=1}^{C} N_c (\bar{X}_c - \bar{X})^T (\bar{X}_c - \bar{X})    (13)

G_{wc} = \sum_{c=1}^{C} \sum_{i \in c} (X_i - \bar{X}_c)^T (X_i - \bar{X}_c)    (14)

We call the matrices G_{br}, G_{wr}, G_{bc} and G_{wc} the image row between-class scatter matrix, image row within-class scatter matrix, image column between-class scatter matrix and image column within-class scatter matrix, respectively. It may be noted that the size of the scatter matrices G_{br} and G_{wr} is m × m, whereas for G_{bc} and G_{wc} the size is n × n. The sizes of these scatter matrices are much smaller than those of the conventional FLD algorithm, whose scatter matrices are mn × mn in size. For a square image, m = n and we have G_{br} = G_{bc}^T and G_{wr} = G_{wc}^T, and vice versa.

The ratios in (9) and (10) are maximized when the column vectors of the projection matrices U and V are the eigenvectors of G_{br} G_{wr}^{-1} and G_{bc} G_{wc}^{-1}, respectively. The optimal projection (eigenvector) matrices U_opt and V_opt are defined as follows:

U_{opt} = \arg\max_{U} |G_{br} G_{wr}^{-1}| = [u_1, u_2, ..., u_p]    (15)

V_{opt} = \arg\max_{V} |G_{bc} G_{wc}^{-1}| = [v_1, v_2, ..., v_q]    (16)

where {u_i | i = 1, 2, ..., p} is the set of normalized eigenvectors of G_{br} G_{wr}^{-1} corresponding to the p largest eigenvalues {\lambda_i | i = 1, 2, ..., p} and {v_j | j = 1, 2, ..., q} is the set of normalized eigenvectors of G_{bc} G_{wc}^{-1} corresponding to the q largest eigenvalues {\lambda_j | j = 1, 2, ..., q}.
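A corresponding sketch of the four G-2DFLD scatter matrices of Eqs. (11)-(14) and of the projection matrices of Eqs. (15)-(16) is given below. Again, this is only an illustration under assumed array names and shapes, not the authors' code; the last function returns the p × q feature matrix of Eq. (8).

import numpy as np
from scipy.linalg import eig

def g2dfld_projections(images, labels, p, q):
    """images: (N, m, n) array; labels: length-N class ids."""
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    mean_all = images.mean(axis=0)
    m, n = mean_all.shape
    G_br = np.zeros((m, m)); G_wr = np.zeros((m, m))    # row scatter matrices
    G_bc = np.zeros((n, n)); G_wc = np.zeros((n, n))    # column scatter matrices
    for c in np.unique(labels):
        Xc = images[labels == c]
        mean_c = Xc.mean(axis=0)
        d = mean_c - mean_all
        G_br += len(Xc) * d @ d.T                       # Eq. (11)
        G_bc += len(Xc) * d.T @ d                       # Eq. (13)
        for X in Xc:
            e = X - mean_c
            G_wr += e @ e.T                             # Eq. (12)
            G_wc += e.T @ e                             # Eq. (14)
    def leading(Gb, Gw, k):
        # top-k eigenvectors of the generalized problem Gb u = lambda Gw u,
        # which maximizes the ratio criteria (9) and (10)
        vals, vecs = eig(Gb, Gw)
        return vecs[:, np.argsort(vals.real)[::-1][:k]].real
    U_opt = leading(G_br, G_wr, p)                      # Eq. (15), m x p
    V_opt = leading(G_bc, G_wc, q)                      # Eq. (16), n x q
    return U_opt, V_opt

def g2dfld_feature_matrix(X, U_opt, V_opt):
    # Z = U_opt^T X V_opt of Eq. (8); a p x q feature matrix
    return U_opt.T @ np.asarray(X, dtype=float) @ V_opt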
3.1.2. Feature extraction

The optimal projection matrices U_opt and V_opt are used for feature extraction. For a given image sample X, an image feature is obtained by the following linear projection:

z_{ij} = u_i^T X v_j,  i = 1, 2, ..., p;  j = 1, 2, ..., q    (17)

The z_{ij} (i = 1, 2, ..., p; j = 1, 2, ..., q) is called a principal component of the sample image X. It may be noted that each principal component of the 2DFLD method is a vector, whereas the principal component of the G-2DFLD method is a scalar. The principal components thus obtained are used to form a G-2DFLD-based image feature matrix Z of dimension p × q (p ≤ m, q ≤ n), which is much smaller than the 2DFLD-based image feature matrix Y of dimension m × k (k ≤ n). Therefore, in this case an image matrix is reduced considerably in both the row and column directions simultaneously.

3.1.3. Calculating fisherfaces

Let an image A_i (i = 1, 2, ..., N) be an m × n matrix of intensity values. The dimensions of the row and column scatter matrices G_{br} G_{wr}^{-1} and G_{bc} G_{wc}^{-1} are m × m and n × n, respectively. Since the eigenvectors of these two scatter matrices together define a subspace of the face images, we can combine them linearly to form fisherfaces.

Let U_opt = [u_1, u_2, ..., u_p] and V_opt = [v_1, v_2, ..., v_q] be the optimal orthonormal eigenvector matrices corresponding to the p and q largest eigenvalues of G_{br} G_{wr}^{-1} and G_{bc} G_{wc}^{-1}, respectively. The fisherfaces are generated by linear combination of the eigenvectors as follows:

I_{ij} = u_i v_j^T,  i = 1, 2, ..., p;  j = 1, 2, ..., q    (18)

4. Support vector machines

After the feature extraction, we have designed a non-linear multi-class support vector machine (SVM) to classify and recognize the image samples. Support vector machines were originally designed for binary-class classification problems [47,48]. Several binary-class SVMs can be combined to form a multi-class SVM for multi-class classification problems, like the face recognition problem.

4.1. Key idea of binary-class support vector machines

The key idea of a binary-class SVM [47,48] is to separate the two classes by a function which is induced from the available samples. The SVM finds the hyperplane that separates the largest fraction of samples of the same class on the same side, while maximizing the distance from either class to the hyperplane. This hyperplane is called the optimal separating hyperplane (OSH), which minimizes the risk of misclassification on the training set as well as on the unknown test set. The basic algorithm of the binary-class SVM can be described as follows.

Given N labeled training samples,

D = \{(x_i, y_i)\}_{i=1}^{N},  x_i \in Z \subset \mathbb{R}^d,  y_i \in M = \{+1, -1\}    (19)

where x_i is the G-2DFLD-based image feature matrix of the ith training sample, d (d = p × q) is the dimension of the image feature vector and y_i is the class of the ith sample.

An SVM separates the training samples belonging to two separate classes by forming an optimal hyperplane (w \cdot x) + b = 0, w \in \mathbb{R}^d, b \in \mathbb{R}, which maximizes the margin from x to the hyperplane. The constraint of the hyperplane can be written as:

y_i ((w \cdot x_i) + b) \geq 1,  i = 1, 2, ..., N    (20)

The discriminant function implemented by a support vector machine for an input sample x is defined as follows:

f(x) = \sum_{i=1}^{N} \alpha_i y_i (x_i \cdot x) + b    (21)

The distance of a sample x from the hyperplane is 1/||w||. Therefore, the total distance between the two classes will be 2/||w||. Hence the optimal separating hyperplane (OSH) minimizes the following function:

\Phi(w) = \frac{1}{2} ||w||^2    (22)

The solution to the optimization problem of (22) subject to the constraint of (20) is given by the saddle point of the following Lagrange function:

L(w, b, \alpha) = \frac{1}{2} ||w||^2 - \sum_{i=1}^{N} \alpha_i \{ y_i ((w \cdot x_i) + b) - 1 \}    (23)

L(w, b, \alpha) = \frac{1}{2} ||w||^2 - \sum_{i=1}^{N} \alpha_i y_i (w \cdot x_i) - \sum_{i=1}^{N} \alpha_i y_i b + \sum_{i=1}^{N} \alpha_i    (24)

where \alpha_i is the Lagrange multiplier of the training samples. The Lagrange function has to be minimized with respect to w and b and maximized with respect to \alpha_i \geq 0. The Lagrange function can be transformed into its dual problem, which is easier to solve, as follows:

\max_{\alpha} W(\alpha) = \max_{\alpha} \{ \min_{w, b} L(w, b, \alpha) \}    (25)

We can derive two optimality conditions from Eq. (24) as follows:

\frac{\partial}{\partial w} L(w, b, \alpha) = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0    (26)

\frac{\partial}{\partial b} L(w, b, \alpha) = \sum_{i=1}^{N} \alpha_i y_i = 0    (27)

Substituting Eqs. (26) and (27) into the right-hand side of the Lagrange function (24) reduces the function to the dual objective function with \alpha_i as the dual variable. The dual problem (25) is then defined as follows:

\alpha^* = \arg\max_{\alpha} \left\{ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \right\}    (28)

with constraints,

\sum_{i=1}^{N} \alpha_i y_i = 0    (29)

\alpha_i \geq 0,  i = 1, 2, ..., N    (30)
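As a concrete illustration of the dual problem just stated, the following minimal Python sketch solves (28)-(30) for a small toy data set with a generic constrained optimizer; this is not how the paper proceeds (it uses the SMO algorithm, see the end of this section), and the data, variable names and the recovery of w and b below are assumptions for the example only.

import numpy as np
from scipy.optimize import minimize

# toy linearly separable data in R^2, labels in {+1, -1}
X = np.array([[0.0, 0.0], [0.3, 0.1], [1.0, 1.2], [1.1, 0.9]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
N = len(y)
K = X @ X.T                                       # inner products (x_i . x_j)

def neg_W(alpha):                                 # negative of the objective in Eq. (28)
    return 0.5 * alpha @ (np.outer(y, y) * K) @ alpha - alpha.sum()

res = minimize(neg_W, np.zeros(N), method="SLSQP",
               bounds=[(0.0, None)] * N,                              # Eq. (30)
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # Eq. (29)
alpha = res.x
w = (alpha * y) @ X                               # weight vector from the multipliers
sv = np.argmax(alpha)                             # a support vector (alpha_i > 0)
b = y[sv] - w @ X[sv]                             # bias from the active constraint (20)
print(np.sign(X @ w + b))                         # recovers the labels for this toy set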


Solving Eq. (28) with constraints (29) and (30) determines the Lagrange multipliers \alpha_i, and the OSH is defined as follows:

w^* = \sum_{i=1}^{N} \alpha_i y_i x_i    (31)

b^* = y_i - (w^* \cdot x_i)  for some \alpha_i > 0    (32)

For a new sample x, the classification is defined as follows:

f(x) = \mathrm{sign}((w^* \cdot x) + b^*)    (33)

In the face recognition domain, due to variations in illumination, pose, etc., face images are highly non-linear. Therefore, each sample is non-linearly mapped into a high-dimensional feature space with a non-linear function \Phi : \mathbb{R}^d \rightarrow \mathbb{R}^D, D > d. Then, a linear SVM is implemented in the feature space. To avoid explicit mapping and computational overhead in the high-dimensional feature space, a positive definite kernel function K is chosen a priori to perform the inner product of vectors in the feature space as follows:

K(x_i, x) = \Phi(x_i) \cdot \Phi(x)    (34)

where \Phi(x) is the transformed vector of the sample x by the non-linear function \Phi.

Two of the commonly used kernel functions are the polynomial and Gaussian radial basis function kernels. These kernels are defined as follows:

Polynomial kernel:  K(x_i, x) = (x_i \cdot x)^r    (35)

Gaussian radial basis function:  K(x_i, x) = \exp\left( -\frac{||x_i - x||^2}{2\sigma^2} \right)    (36)

where r is a positive integer and \sigma > 0.
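Written out directly, the two kernels of Eqs. (35) and (36) are simply the following (an illustrative NumPy rendering, with r and sigma as in the text):

import numpy as np

def polynomial_kernel(xi, x, r=2):                 # Eq. (35)
    return float(np.dot(xi, x)) ** r

def gaussian_rbf_kernel(xi, x, sigma=1.0):         # Eq. (36)
    return float(np.exp(-np.linalg.norm(np.asarray(xi) - np.asarray(x)) ** 2
                        / (2.0 * sigma ** 2)))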

The discriminant function implemented by a non-linear support vector machine for an input sample x is defined as follows:

f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b    (37)

The dual objective function (28) in a non-linear SVM then becomes:

\alpha^* = \arg\max_{\alpha} \left\{ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \right\}    (38)

with constraints,

\sum_{i=1}^{N} \alpha_i y_i = 0    (39)

0 \leq \alpha_i \leq C,  i = 1, 2, ..., N    (40)

where C is a regularization parameter controlling the compromise between maximizing the margin and minimizing the number of training set errors.

The Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient conditions for an optimal point of a positive definite dual problem. The dual problem is solved when, for all i:

\alpha_i = 0 \Rightarrow y_i f(x_i) \geq 1    (41)

0 < \alpha_i < C \Rightarrow y_i f(x_i) = 1    (42)

\alpha_i = C \Rightarrow y_i f(x_i) \leq 1    (43)

In our work, the dual objective function (38) is solved by the sequential minimal optimization (SMO) algorithm [49].

4.2. Multi-class support vector machines

Support vector machines are originally designed for binary pattern classification. Multi-class pattern recognition problems are commonly solved using a combination of binary SVMs and a decision strategy to decide the class of the input pattern. Each SVM is trained independently. A multi-class SVM can be implemented using the one-against-all [48] or the one-against-one [50] strategy. In our work, we have implemented the one-against-all strategy due to its smaller memory requirement, as discussed below.

Let the training set (x_i, c_i) consist of N samples of M classes, where c_i (c_i \in \{1, 2, ..., M\}) represents the class label of the sample x_i. An SVM is constructed for each class by discriminating that class against the remaining (M - 1) classes. The number of SVMs used in this approach is therefore M. A test pattern x is classified using the winner-takes-all decision strategy, i.e., it is assigned to the class with the maximum value of the discriminant function f(x). All the N training samples are used in constructing an SVM for a class. The SVM for class k is constructed using the set of training samples and their desired outputs, (x_i, y_i). The desired output y_i for a training sample x_i is defined as follows:

y_i = \begin{cases} +1 & \text{if } c_i = k \\ -1 & \text{if } c_i \neq k \end{cases}    (44)

The samples with the desired output y_i = +1 are called positive samples and the samples with the desired output y_i = -1 are called negative samples.

5. Experimental results

The performance of the proposed method has been evaluated on the AT&T Laboratories Cambridge database (formerly the ORL database) [51] and the UMIST face database [52]. The AT&T database is used to test the performance of the proposed method under conditions of minor variations in rotation and scaling, whereas the UMIST database is used to examine the performance of the method when the angle of rotation of the facial images is quite large. The experiments were carried out using three different strategies to test the performance of the proposed method: (i) randomly partitioning the database, (ii) the n-fold cross validation test and (iii) the leave-one-out strategy.

The recognition rate has been defined as the percentage ratio of the total number of correct recognitions by the method to the total number of images in the test set for a single experimental run. Therefore, the average recognition rate, R_avg, of the method is defined as follows:

R_{avg} = \frac{\sum_{i=1}^{l} n_{cls}^{i}}{l \cdot n_{tot}} \times 100    (45)

where l is the number of experimental runs, each of which has been performed by randomly partitioning the database into two sets, a training set and a test set; n_{cls}^{i} is the number of correctly recognized faces in the ith run and n_{tot} is the total number of test faces in each run.

The performance of the method has also been evaluated using rejection criteria. We believe that an ideal face recognition system should reject intruders (faces belonging to other classes) while recognizing its own faces. Here, an SVM of a class should recognize all the faces of its own class and reject the faces belonging to the other classes (intruders). To calculate the success rate of the method, two parameters, namely the sensitivity and the specificity, are evaluated. Sensitivity is defined as the probability of correctly recognizing a face, whereas specificity refers to the probability of correctly rejecting an intruder. They can be computed as follows:

Sensitivity = \frac{TP}{TP + FN}    (46)

Specificity = \frac{TN}{TN + FP}    (47)

where TP is the total number of faces correctly recognized (true positives) and FN is the total number of faces falsely recognized as intruders (false negatives) in each run. TN is the total number of faces of the other classes correctly rejected as intruders (true negatives) and FP is the total number of faces of other classes falsely recognized as the SVM's own (false positives) in each run. It may be noted that the percentage of the sensitivity is also referred to as the recognition rate.
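Before turning to the individual experiments, the following sketch shows one way the one-against-all machines of Section 4.2 and the evaluation quantities of Eqs. (45)-(47) could be wired together. It uses scikit-learn's SVC as the binary machine purely for illustration (the paper trains its own SVMs via SMO), and all function and variable names are assumptions.

import numpy as np
from sklearn.svm import SVC

def train_one_against_all(features, class_ids, C=10.0, sigma=1.0):
    """features: (N, d) flattened p*q G-2DFLD feature matrices; class_ids: length N."""
    class_ids = np.asarray(class_ids)
    machines = {}
    for k in np.unique(class_ids):
        y = np.where(class_ids == k, 1, -1)                    # Eq. (44)
        machines[k] = SVC(kernel="rbf", C=C,
                          gamma=1.0 / (2.0 * sigma ** 2)).fit(features, y)
    return machines

def classify(machines, x):
    # winner-takes-all: the class whose SVM gives the largest discriminant value
    scores = {k: m.decision_function(x.reshape(1, -1))[0] for k, m in machines.items()}
    return max(scores, key=scores.get)

def sensitivity_specificity(tp, fn, tn, fp):
    return tp / (tp + fn), tn / (tn + fp)                      # Eqs. (46)-(47)

def average_recognition_rate(correct_per_run, n_test_per_run):
    l = len(correct_per_run)
    return sum(correct_per_run) / (l * n_test_per_run) * 100   # Eq. (45)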

5.1. Experiments on the AT&T face database

The AT&T database contains 400 gray-scale images of 40 persons. Each person has 10 gray-scale images with a resolution of 112 × 92 pixels. The images of each individual were taken by varying the light intensity, facial expression (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses) against a dark homogeneous background, with tilt and rotation up to 20° and scale variation up to 10%. Sample face images of a person are shown in Fig. 1.

Fig. 1. Sample images of a subject from the AT&T database.

5.1.1. Randomly partitioning the database

In this experimental strategy, we randomly select s images from each subject to form the training set and include the remaining images in the test set. To ensure sufficient training and to test the effectiveness of the proposed technique for different sizes of the training set, we choose the value of s as 3, 4, 5, 6 and 7. It may be noted that there is no overlap between the training and test images. To reduce the influence of the particular training and test sets on the performance, for each value of s the experiment is repeated 20 times with different training and test sets. Since the numbers of projection vectors p and q have a considerable impact on the performance of the G-2DFLD algorithm, we performed several experiments by varying the values of p and q. Fig. 2 shows the recognition rates (sensitivity (%)) of the G-2DFLD algorithm using a multi-class support vector machine (SVM); for each value of s, average recognition rates are plotted by varying the values of p and q. For s = 3, 4, 5, 6 and 7 the best average recognition rates are found to be 92.82%, 95.94%, 97.68%, 98.72% and 98.42%, respectively, and the dimensions (p × q) of the corresponding image feature matrices are (16 × 16), (16 × 16), (14 × 14), (14 × 14) and (8 × 8), respectively. The average specificities (%) are found to be 99.82%, 99.90%, 99.94%, 99.97% and 99.96% for s = 3, 4, 5, 6 and 7, respectively.

Fig. 2. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the AT&T database for different values of s by varying the values of p and q.

We have constructed the fisherfaces using the eigenvectors for s = 5. Some samples of these fisherfaces I_{ii} (i = 1, 2, ..., 14) are shown in Fig. 3.

Fig. 3. Fourteen of the fisherfaces calculated from a training set of the AT&T database.
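For reference, the fisherface images of Eq. (18) can be formed from the G-2DFLD eigenvectors as simple outer products. The snippet below is an illustrative sketch only, with U_opt and V_opt assumed to come from the step sketched in Section 3.

import numpy as np

def fisherfaces(U_opt, V_opt):
    # I_ij = u_i v_j^T (Eq. (18)); each I_ij is an m x n image that can be displayed
    p, q = U_opt.shape[1], V_opt.shape[1]
    return [[np.outer(U_opt[:, i], V_opt[:, j]) for j in range(q)] for i in range(p)]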
5.1.2. n-Fold cross validation test

In this experiment, we divide the AT&T database randomly into 10 folds, taking one image of each person into a fold. Therefore, each fold consists of 40 images, each one corresponding to a different person. For the 10-fold cross validation test, in each experimental run 9 folds are used to train the multi-class SVM and the remaining fold is used for testing. Therefore, the training and test sets consist of 360 and 40 images, respectively. The average recognition rates (sensitivity (%)) obtained by varying the image feature matrix (i.e. p × q) are shown in Fig. 4. The best average recognition rate is found to be 99.75%, using an image feature matrix of size (8 × 8). The average specificity (%) is found to be 99.99%.

Fig. 4. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the AT&T database for the 10-fold cross validation test by varying the values of p and q. The upper and lower extrema of the error bars represent the maximum and minimum values, respectively.
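The fold construction used above (one randomly chosen image of every subject per fold, giving 10 folds of 40 images each for the AT&T set) could be realized as follows; a small sketch with assumed variable names, not the authors' code.

import numpy as np

def folds_one_image_per_subject(labels, images_per_subject=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(images_per_subject)]
    for subject in np.unique(labels):
        idx = rng.permutation(np.where(labels == subject)[0])
        for f, i in enumerate(idx):
            folds[f].append(i)          # one image of this subject goes into fold f
    return [np.array(f) for f in folds]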


5.1.3. Leave-one-out method

To classify an image of a subject, the image is removed from the database of N images and placed into a test set. The remaining N - 1 images are used in the corresponding training set. In this way, experiments were performed N times, removing one image from the database at a time. For the AT&T database, we have performed 400 experimental runs for the database of 400 images. Table 1 shows the average recognition rate (sensitivity (%)) and specificity (%) using an 8 × 8 image feature matrix. We have achieved 99.00% average recognition rate and 99.97% average specificity, respectively.

Table 1
Experimental results using the leave-one-out strategy on the AT&T database.

Feature matrix | # of features | Avg. recognition rate (sensitivity (%)) | Avg. specificity (%)
8 × 8          | 64            | 99.00                                   | 99.97

5.1.4. Comparison with other methods

For a fair comparison, we have implemented the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms and used the same multi-class SVM and parameters for classification. The comparisons in terms of the best average recognition rates (sensitivity (%)) and specificity (%) of the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms along with the proposed G-2DFLD algorithm, using the two different experimental strategies on the AT&T database, are shown in Tables 2 and 3, respectively. Table 2 also shows the comparison of performance between the proposed method and the neural network and SVM-based methods reported in [16-18,24,25,53]. It may be noted that the results reported in [16-18,24,25,53] are based on 10, 1, 10, 4, 1 and 4 experimental runs, respectively, whereas the proposed method is based on 20 experimental runs. We can see that in all the cases the performance of the G-2DFLD method is better than that of the PCA, 2DPCA, PCA + FLD and 2DFLD methods, and also better than the methods reported in [17,18,24,25,53].

Table 2
Comparison of different methods in terms of average recognition rates (sensitivity (%)) on the AT&T database.

Experiment: Randomly partition, s images/subject
Method          | s = 3 | s = 4 | s = 5 | s = 6 | s = 7
G-2DFLD         | 92.82 | 95.94 | 97.68 | 98.72 | 98.42
PCA             | 85.58 | 89.42 | 93.10 | 95.28 | 96.01
2DPCA           | 91.27 | 94.33 | 96.83 | 97.72 | 97.79
PCA + FLD       | 83.65 | 88.65 | 92.60 | 95.30 | 95.83
2DFLD           | 92.30 | 95.08 | 97.50 | 98.26 | 97.88
SA-RBF [16]     | 93.86 | 96.25 | 97.30 | -     | -
RBF [17]        | 93.50 | -     | 96.90 | -     | -
DCT + RBF [18]  | -     | -     | 97.55 | -     | -
PCA + SVM [24]  | -     | -     | 97.00 | -     | -
SGFS + SVM [25] | -     | -     | 95.00 | -     | -
NFL [53]        | -     | -     | 96.87 | -     | -

Experiment: 10-fold cross validation test
Method    | Avg. recognition rate (sensitivity (%))
G-2DFLD   | 99.75
PCA       | 97.00
2DPCA     | 99.25
PCA + FLD | 98.25
2DFLD     | 99.00

Table 3
Comparison of different methods in terms of average specificity (%) on the AT&T database.

Experiment: Randomly partition, s images/subject
Method    | s = 3 | s = 4 | s = 5 | s = 6 | s = 7
G-2DFLD   | 99.82 | 99.90 | 99.94 | 99.97 | 99.96
PCA       | 99.63 | 99.73 | 99.82 | 99.88 | 99.90
2DPCA     | 99.78 | 99.85 | 99.92 | 99.94 | 99.94
PCA + FLD | 99.58 | 99.71 | 99.81 | 99.88 | 99.89
2DFLD     | 99.80 | 99.87 | 99.94 | 99.96 | 99.95

Experiment: 10-fold cross validation test
Method    | Avg. specificity (%)
G-2DFLD   | 99.99
PCA       | 99.92
2DPCA     | 99.98
PCA + FLD | 99.96
2DFLD     | 99.97

Table 4 shows the average feature extraction, recognition and total times (in s) taken by the G-2DFLD, PCA, 2DPCA, PCA + FLD and 2DFLD methods with 200 training and 200 test images of the AT&T database, using an IBM Intel Pentium 4 (Hyper-Threading technology, 3.0 GHz, 2 GB DDR-II RAM) computer running the Fedora 9 Linux operating system. It may again be noted that the proposed G-2DFLD method is more efficient than the PCA, 2DPCA, PCA + FLD and 2DFLD methods in terms of total computation time.

Table 4
Comparison of different methods in terms of average feature extraction, recognition and total times (in s) using 200 training and 200 test images on the AT&T database.

Method    | # of features    | Avg. feature extraction time (s) | Avg. recognition time (s) | Avg. total time (s)
G-2DFLD   | 14 × 14 = 196    | 12.95                            | 53.42                     | 66.37
PCA       | 60               | 55.10                            | 13.75                     | 68.85
2DPCA     | 112 × 14 = 1568  | 32.55                            | 313.29                    | 345.84
PCA + FLD | 25               | 55.75                            | 13.31                     | 69.06
2DFLD     | 112 × 14 = 1568  | 22.35                            | 313.03                    | 335.38

5.2. Experiments on the UMIST face database

The UMIST face database is a multi-view database consisting of 575 gray-scale images of 20 people (subjects), each covering a wide range of poses from profile to frontal views. Each image has a resolution of 112 × 92 pixels. The subjects also cover a range of race, sex and appearance. Unlike the AT&T database, the number of images per person is not fixed; it varies from 19 to 48. Fig. 5 shows some of the sample images of a subject from the database. (At present the UMIST database contains 475 images; however, we have used the earlier version of the database to test with a larger number of images.)

Fig. 5. Some sample images of a subject from the UMIST database.

5.2.1. Randomly partitioning the database

As with the AT&T database, we randomly select s images from each subject to form the training set and include the remaining images in the test set. We choose the value of s as 4, 6, 8 and 10. It may again be noted that there is no overlap between the training and test images. For each value of s, the experiment is repeated 20 times with different training and test sets. Fig. 6 shows the recognition rates (sensitivity (%)) of the G-2DFLD algorithm using a multi-class SVM; for each value of s, average recognition rates are plotted by varying the values of p and q. For s = 4, 6, 8 and 10 the best average recognition rates are found to be 86.22%, 92.28%, 95.54% and 96.92%, respectively, and the dimensions (p × q) of the corresponding image feature matrices are (14 × 14), (14 × 14), (14 × 14) and (18 × 18), respectively. The average specificities (%) are found to be 99.28%, 99.59%, 99.77% and 99.84% for s = 4, 6, 8 and 10, respectively.

Fig. 6. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the UMIST database for different values of s by varying the values of p and q.

5.2.2. n-Fold cross validation test

Since the number of images per subject varies from 19 to 48, we have randomly divided the database into 19 folds, taking one image of each subject into a fold. Therefore, in each fold there are 20 images, each one corresponding to a different subject. For the 19-fold cross validation test, in each experimental run 18 folds are used to train the multi-class SVM and the remaining fold is used for testing. Therefore, the training and test sets consist of 360 and 20 images, respectively, in a particular experimental run. The average recognition rates (sensitivity (%)) obtained by varying the image feature matrix (i.e. p × q) are shown in Fig. 7. The best average recognition rate is found to be 98.95%, using an image feature matrix of size (14 × 14). The average specificity (%) is found to be 99.95%.

Fig. 7. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the UMIST database for the 19-fold cross validation test by varying the values of p and q. The upper and lower extrema of the error bars represent the maximum and minimum values, respectively.

5.2.3. Leave-one-out method

In this experiment, we have performed 575 experimental runs for the database of 575 images. Table 5 shows the average recognition rate (sensitivity (%)) and specificity (%) using a 14 × 14 image feature matrix. We have achieved 98.96% average recognition rate and 99.95% average specificity, respectively.

Table 5
Experimental results using the leave-one-out strategy on the UMIST database.

Feature matrix | # of features | Avg. recognition rate (sensitivity (%)) | Avg. specificity (%)
14 × 14        | 196           | 98.96                                   | 99.95


5.2.4. Comparison with other methods

For a fair comparison, as for the AT&T database, we have implemented the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms and used the same multi-class SVM and parameters for classification. The comparisons in terms of the best average recognition rates (sensitivity (%)) and specificity (%) of the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms along with the proposed G-2DFLD method, using the two different experimental strategies, are shown in Tables 6 and 7, respectively. Table 6 also shows the comparison of performance between the proposed method and the neural network and SVM-based methods reported in [16,54]. The results reported in [16] are based on 10 experimental runs, whereas in [54] the result is based on 1 experimental run using only 380 images out of the 575 images of the database. It may be recalled that the results of the proposed method are based on 20 experimental runs using all the 575 images. It may again be noted that in all the cases the performance of the G-2DFLD method is better than that of the PCA, 2DPCA, PCA + FLD and 2DFLD methods, except in the 19-fold cross validation test, where the performance of the 2DPCA matches that of the proposed G-2DFLD method. The G-2DFLD method is also comparable to the KPCA SVM GSFS method [54] in spite of using more experimental runs and images.

Table 6
Comparison of different methods in terms of average recognition rates (sensitivity (%)) on the UMIST database.

Experiment: Randomly partition, s images/subject
Method             | s = 4 | s = 6 | s = 8 | s = 10
G-2DFLD            | 86.22 | 92.28 | 95.54 | 96.92
PCA                | 80.72 | 86.53 | 94.01 | 95.11
2DPCA              | 85.70 | 91.91 | 95.07 | 96.60
PCA + FLD          | 76.31 | 85.69 | 90.93 | 93.72
2DFLD              | 86.12 | 92.16 | 95.25 | 96.55
SA-RBF [16]        | 89.46 | 92.84 | 96.36 | -
KPCA SVM GSFS [54] | -     | 92.30 | -     | -

Experiment: 19-fold cross validation test
Method    | Avg. recognition rate (sensitivity (%))
G-2DFLD   | 98.95
PCA       | 98.68
2DPCA     | 98.95
PCA + FLD | 96.36
2DFLD     | 98.68

Table 7
Comparison of different methods in terms of average specificity (%) on the UMIST database.

Experiment: Randomly partition, s images/subject
Method    | s = 4 | s = 6 | s = 8 | s = 10
G-2DFLD   | 99.28 | 99.59 | 99.77 | 99.84
PCA       | 98.99 | 99.29 | 99.68 | 99.74
2DPCA     | 99.25 | 99.57 | 99.74 | 99.82
PCA + FLD | 98.75 | 99.25 | 99.52 | 99.67
2DFLD     | 99.27 | 99.59 | 99.75 | 99.83

Experiment: 19-fold cross validation test
Method    | Avg. specificity (%)
G-2DFLD   | 99.95
PCA       | 99.93
2DPCA     | 99.95
PCA + FLD | 99.81
2DFLD     | 99.93

6. Conclusion

In this paper, we have presented a face recognition system based on a novel feature extraction method, namely the generalized two-dimensional FLD (G-2DFLD) method, which works on the original 2D image matrix. The G-2DFLD algorithm maximizes class separability from both the row and column directions simultaneously, resulting in a smaller image feature matrix. To realize this, we have defined two alternative Fisher's criteria. The principal components extracted from an image matrix by the G-2DFLD method are scalars. Since the size of the scatter matrices in the proposed G-2DFLD algorithm is much smaller than in the conventional PCA and FLD schemes, the computational time for feature extraction is much lower. Moreover, the image feature matrix generated by the G-2DFLD algorithm is much smaller than those generated by the 2DPCA and 2DFLD algorithms. As a result, the overall time (feature extraction time + recognition time) of the G-2DFLD algorithm is also much lower than that of the 2DPCA and 2DFLD algorithms. Several experiments were carried out on the AT&T and UMIST databases using three different experimental strategies, namely (i) randomly partitioning the database, (ii) the n-fold cross validation test and (iii) the leave-one-out method, to test the performance of the proposed method. A non-linear multi-class SVM has been designed to classify the face images. The experimental results show that the G-2DFLD method is more efficient than the PCA, 2DPCA, PCA + FLD and 2DFLD methods, not only in terms of computation time, but also for the task of face recognition. The proposed method also outperforms some of the neural network and other SVM-based methods for face recognition reported in the literature.

Acknowledgements

This work was supported by the UGC major research project (F. No.: 37-218/2009(SR), dated: 12-01-2010), and the CMATER and SRUVM projects of the Department of Computer Science & Engineering, Jadavpur University, Kolkata, India. The author Shiladitya Chowdhury would like to thank Techno India, Kolkata for providing computing facilities and allowing time for conducting research work. The author D.K. Basu would also like to thank the AICTE, New Delhi for providing him the Emeritus Fellowship (F. No.: 151/RID/EF(13)/2007-08, dated 28-02-2008). Last but not the least, the authors would also like to thank the anonymous reviewers for their constructive suggestions to improve the quality of the paper.


References

[1] A. Samal, P. Iyengar, Automatic recognition and analysis of human faces and facial expressions: a survey, Pattern Recogn. 25 (1992) 65-77.
[2] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (1995) 705-740.
[3] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Comput. Surveys 35 (2003) 399-458.
[4] A.S. Tolba, A.H. El-Baz, A.A. El-Harby, Face recognition: a literature review, Int. J. Signal Process. 2 (2006) 88-103.
[5] L. Sirovich, M. Kirby, Low-dimensional procedure for the characterization of human faces, J. Opt. Soc. Am. 4 (1987) 519-524.
[6] M. Kirby, L. Sirovich, Application of the KL procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 103-108.
[7] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces versus fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711-720.
[8] C. Liu, H. Wechsler, A shape- and texture-based enhanced Fisher classifier for face recognition, IEEE Trans. Image Process. 10 (2001) 598-608.
[9] W. Zhao, R. Chellappa, A. Krishnaswamy, Discriminant analysis of principal components for face recognition, in: Proceedings of the International Conference on Automatic Face and Gesture Recognition, 1998, pp. 336-341.
[10] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
[11] M.J. Er, S. Wu, J. Lu, H.L. Toh, Face recognition with radial basis function (RBF) neural networks, IEEE Trans. Neural Netw. 13 (2002) 697-710.
[12] J. Yang, D. Zhang, A.F. Frangi, J.Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 131-137.
[13] H. Xiong, M.N.S. Swamy, M.O. Ahmad, Two-dimensional FLD for face recognition, Pattern Recogn. 38 (2005) 1121-1124.
[14] B. Moghaddam, A. Pentland, Probabilistic visual learning for object representation, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 696-710.
[15] J.K. Sing, D.K. Basu, M. Nasipuri, M. Kundu, Face recognition using point symmetry distance-based RBF network, Appl. Soft Comput. 7 (2007) 58-70.
[16] J.K. Sing, S. Thakur, D.K. Basu, M. Nasipuri, M. Kundu, High-speed face recognition using self-adaptive radial basis function neural networks, Neural Comput. Appl. 18 (2009) 979-990.
[17] F. Yang, M. Paindovoine, Implementation of an RBF neural network on embedded systems: real-time face tracking and identity verification, IEEE Trans. Neural Netw. 14 (2003) 1162-1175.
[18] M.J. Er, W. Chen, S. Wu, High-speed face recognition based on discrete cosine transform and RBF neural networks, IEEE Trans. Neural Netw. 16 (2005) 679-691.
[19] J. Haddadnia, K. Faez, M. Ahmadi, A fuzzy hybrid learning algorithm for radial basis function neural network with application in human face recognition, Pattern Recogn. 36 (2003) 1187-1202.
[20] P.J. Phillips, Support vector machines applied to face recognition, Adv. Neural Inform. Process. Syst. 11 (1998) 803-809.
[21] C. Zhaohui, H. Guiming, Face recognition using multi-class BSVM with component features, in: Proceedings of the 2005 IEEE International Conference on Neural Networks and Brain, 2005, pp. 1449-1452.
[22] C.-H. Lee, S.-W. Park, W. Chang, J.-W. Park, Improving the performance of multi-class SVMs in face recognition with nearest neighbor rule, in: Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, 2003.
[23] J. Ko, H. Byun, Combining SVM classifiers for multiclass problem: its application to face recognition, in: Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 531-539.
[24] G.D. Guo, S.Z. Li, K.L. Chen, Support vector machine for face recognition, J. Image Vis. Comput. 19 (2001) 631-638.
[25] L. Wang, Y. Sun, A new approach for face recognition based on SGFS and SVM, in: Proceedings of IEEE, 2007, pp. 527-530.
[26] S. Thakur, J.K. Sing, D.K. Basu, M. Nasipuri, Face recognition using Fisher linear discriminant analysis and support vector machine, in: Proceedings of the 2nd International Conference on Contemporary Computing, 2009, pp. 318-326.
[27] M.-H. Yang, N. Ahuja, D. Kriegman, Face recognition using kernel eigenfaces, in: Proceedings of the IEEE International Conference on Image Processing, 2000, pp. 37-40.
[28] K.I. Kim, K. Jung, H.J. Kim, Face recognition using kernel principal component analysis, IEEE Signal Proc. Lett. 9 (2002) 40-42.
[29] V.D.M. Nhat, S.Y. Lee, Kernel-based 2DPCA for face recognition, in: Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2007, pp. 35-39.
[30] S. Mika, G. Ratsch, J. Weston, Fisher discriminant analysis with kernels, in: Proceedings of the Neural Networks for Signal Processing Workshop, 1999, pp. 41-48.
[31] G. Baudat, F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput. 12 (2000) 2385-2404.
[32] Q. Liu, X. Tang, H. Lu, S. Ma, Face recognition using kernel scatter-difference-based discriminant analysis, IEEE Trans. Neural Netw. 17 (2006) 1081-1085.
[33] R. Zhi, Q. Ruan, Two-dimensional direct and weighted linear discriminant analysis for face recognition, Neurocomputing 71 (2008) 3607-3611.
[34] J. Wang, W. Yang, Y. Lin, J. Yang, Two-directional maximum scatter difference discriminant analysis for face recognition, Neurocomputing 72 (2008) 352-358.
[35] X.-N. Song, Y.-J. Zheng, X.-J. Wu, X.-B. Yang, J.-Y. Yang, A complete fuzzy discriminant analysis approach for face recognition, Appl. Soft Comput. 10 (2010) 208-214.
[36] X. Jiang, B. Mandal, A. Kot, Eigenfeature regularization and extraction in face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 383-394.
[37] M.D. Kelly, Visual Identification of People by Computer, Tech. Rep. AI-130, Stanford AI Project, Stanford, CA, 1970.
[38] T. Kanade, Computer Recognition of Human Faces, Birkhauser, Basel, Switzerland, and Stuttgart, Germany, 1973.
[39] I.J. Cox, J. Ghosn, P.N. Yianilos, Feature-based face recognition using mixture distance, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1996, pp. 209-216.
[40] F. Samaria, S. Young, HMM based architecture for face identification, Image Vis. Comput. 12 (1994) 537-583.
[41] A.V. Nefian, M.H. Hayes III, Hidden Markov Models for face recognition, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 2721-2724.
[42] L. Wiskott, J.-M. Fellous, C. Von Der Malsburg, Face recognition by elastic bunch graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 775-779.
[43] A. Pentland, B. Moghaddam, T. Starner, View-based and modular eigenspaces for face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 84-91.
[44] P. Penev, J. Atick, Local feature analysis: a general statistical theory for object representation, Netw. Comput. Neural Syst. 7 (1996) 477-500.
[45] A. Lanitis, C.J. Taylor, T.F. Cootes, Automatic face identification system using flexible appearance models, Image Vis. Comput. 13 (1995) 393-401.
[46] J. Huang, B. Heisele, V. Blanz, Component-based face recognition with 3D morphable models, in: Proceedings of the International Conference on Audio- and Video-Based Person Authentication, 2003, pp. 27-34.
[47] C. Cortes, V. Vapnik, Support-vector network, Mach. Learn. 20 (1995) 273-297.
[48] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
[49] J. Platt, Fast training of SVMs using sequential minimal optimization, in: Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, 1999, pp. 185-208.
[50] S. Knerr, L. Personnaz, G. Dreyfus, Single-layer learning revisited: a stepwise procedure for building and training a neural network, in: Neurocomputing: Algorithms, Architectures and Applications, Springer, 1990.
[51] The Database of Faces, AT&T Laboratories, Cambridge, U.K. [Online]. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[52] D.B. Graham, N.M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition, in: H. Wechsler, P.J. Phillips, V. Bruce, F. Fogelman-Soulie, T.S. Huang (Eds.), Face Recognition: From Theory to Applications, NATO ASI Series F, Computer and Systems Sciences, vol. 163, 1998, pp. 446-456.
[53] S.Z. Li, J. Lee, Face recognition using the nearest feature line method, IEEE Trans. Neural Netw. 10 (1999) 439-443.
[54] W. Li, W. Gang, Y. Liang, W. Chen, Feature selection based on KPCA, SVM and GSFS for face recognition, in: Proceedings of the International Conference on Advances in Pattern Recognition, 2005, pp. 344-350.

Shiladitya Chowdhury received his Bachelor of Technology degree in Computer Science and Engineering from West Bengal University of Technology, Kolkata, India, in 2005 and the Master of Technology degree in Computer Technology from Jadavpur University, Kolkata, India, in 2009. He has been working as a Lecturer at the Department of Master of Computer Application, Techno India, Kolkata, India since January 2007. He is currently pursuing his Doctorate degree in Engineering at Jadavpur University. His research interests include face recognition, pattern recognition and image processing.

Jamuna Kanta Sing received his B.E. (Computer Science & Engineering) degree from Jadavpur University in 1992, M.Tech. (Computer & Information Technology) degree from the Indian Institute of Technology (IIT), Kharagpur in 1993 and Ph.D. (Engineering) degree from Jadavpur University in 2006. Dr. Sing has been a faculty member of the Department of Computer Science & Engineering, Jadavpur University since March 1997. He did his post-doctoral research work as a BOYSCAST Fellow of the Department of Science & Technology, Govt. of India, at the University of Pennsylvania and the University of Iowa during 2006. He is a member of the IEEE, USA. His research interests include face recognition/detection, medical image processing, and pattern recognition.


Dipak Kumar Basu received his B.E.Tel.E., M.E.Tel. and Ph.D. (Engg.) degrees from Jadavpur University in 1964, 1966 and 1969, respectively. Prof. Basu was a faculty member of the Department of Computer Science & Engineering, Jadavpur University from 1968 to January 2008. He is presently an A.I.C.T.E. Emeritus Fellow at the Department of Computer Science & Engineering, Jadavpur University. His current fields of research interest include pattern recognition, image processing, and multimedia systems. He is a senior member of the IEEE, USA, a Fellow of IE (India) and WBAST, Kolkata, India, and a former Fellow of the Alexander von Humboldt Foundation, Germany.

Mita Nasipuri received her B.E.E., M.E.Tel.E. and Ph.D. (Engg.) degrees from Jadavpur University in 1979, 1981 and 1990, respectively. Prof. Nasipuri has been a faculty member of the Department of Computer Science & Engineering, Jadavpur University since 1987. Her current research interests include image processing, pattern recognition, and multimedia systems. She is a senior member of the IEEE, USA, and a Fellow of IE (India) and WBAST, Kolkata, India.
