
1-4244-1355-9/07/$25.00 ©2007 IEEE
International Conference on Intelligent and Advanced Systems 2007
Face recognition in video, a combination of eigenface and adaptive skin-color model

Le Ha Xuan, Supot Nitsuwat
Department of Computer Science IV, RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Faculty of Information Technology, King Mongkut's Institute of Technology North Bangkok, Phibulsongram, Bangkok, Thailand
xuanlh@gmail.com, sns@kmitnb.ac.th




Abstract- A general statement of the face recognition problem in computer vision can be formulated as follows: given still or video images of a scene, identify or verify one or more persons in the scene using a database of stored faces. Toward a fully automatic face recognition system, much research has been done in many directions to eliminate the difficulties of finding and identifying faces under illumination and pose variations. However, most of it concentrates on these issues separately instead of considering the efficiency of the whole system when the face detection and recognition methods work together. This research tries to balance those two main components. In the recognition step, the eigenface technique is used to normalize the faces in the database. First, we calculate the face covariance matrix and the associated eigenvectors; these are the face prints. Different vectors of weights represent different faces, and any face can be reconstructed from its set of weights, so we can recognize a new picture of a familiar face. For the face detection process, a skin-color model is used to separate skin areas from the background in order to cut out the face block for recognition. The whole process of face recognition can be separated into two phases, a training phase and a recognition phase. First, we train the recognition system with a sequence of images of each person to create the corresponding face print. In the second phase, the result of the first phase is matched against the input face image in the video stream to decide whether a person in the video is granted access or not. The experiments covered both aspects, the independent components and the holistic system, on popular face detection and recognition databases of still images and video. The system achieved recognition rates from 80 to 100 percent with a small number of training samples (10 or 24).

1. INTRODUCTION
During the last couple oI years more and more research
has been done in the area oI Iace recognition Irom image
sequences. Recognizing humans Irom real surveillance videos
is diIIicult because oI the low quality oI images and because
Iace images are small. Still, a lot oI improvement has been
made.
The still image problem has several inherent
advantages and disadvantages. For applications such as
drivers` licenses, due to the controlled nature oI the image
acquisition process, the segmentation problem is rather easy.
However, iI only a static picture oI an airport scene is
available, automatic location and segmentation oI a Iace could
pose serious challenges to any segmentation algorithm. On the
other hand, iI a video sequence is available, segmentation oI a
moving person can be more easily accomplished using motion
as a cue. But the small size and low image quality oI Iaces
captured Irom video can signiIicantly increase the diIIiculty in
recognition. |4, 5, 6|.
The recent FERET test [8] has revealed that there are at least two major challenges:
- The illumination variation problem
- The pose variation problem
1.1 The illumination problem
Images of the same face appear different due to changes in lighting. If the change induced by illumination is larger than the difference between individuals, systems will not be able to recognize the input image. To handle the illumination problem, researchers have proposed various methods. It has been suggested that one can reduce the variation by discarding the most significant eigenfaces, and it is verified in [9] that discarding the first few eigenfaces seems to work reasonably well. However, this degrades system performance for input images taken under frontal illumination.
In [10], different image representations and distance measures are evaluated. One important conclusion that the paper draws is that none of these methods is sufficient by itself to overcome the illumination variations. More recently, a new image comparison method was proposed by Jacobs et al. [11]. However, this measure is not strictly illumination-invariant, because the measure changes for a pair of images of the same object when the illumination changes.
An illumination subspace for a person has been constructed in [12, 13] for a fixed viewpoint; under a fixed viewpoint, the recognition result can thus be illumination-invariant. One drawback of using this method is that we need many images per person to construct the basis images of the illumination subspace.
In [14], the authors suggest using Principal Component Analysis (PCA) to solve the parametric shape-from-shading (SFS) problem. Their idea is quite simple: they reconstruct the 3D face surface from a single image using computer vision techniques, then compute the frontal view image under frontal illumination. Very good results are demonstrated.
1.2 The pose problem
System performance drops significantly when pose variations are present in the input images. Basically, the existing solutions can be divided into three types: 1) multiple images per person are required in both the training stage and the recognition stage; 2) multiple images per person are used in the training stage but only one database image per person is available in the recognition stage; 3) single-image based methods. The second type is the most popular one.
1.3 Multiple images approaches
An illumination-based image synthesis method [15] has been proposed for handling both pose and illumination problems. This method is based on an illumination cone to deal with illumination variation. For variations due to rotation, it needs to completely resolve the GBR (generalized bas-relief) ambiguity when reconstructing the 3D surface.
1.4 Hybrid approaches
Many algorithms of this type have been proposed; it is probably the most practical solution up to now. Three representative methods are: 1) the linear class based method [16], 2) the graph matching based method [17], 3) the view-based eigenface method [18]. The image synthesis method in [16] is based on the assumption of linear 3D object classes and the extension of linearity to images. In [17], a robust face recognition scheme based on EBGM is proposed. The authors demonstrate substantial improvement in face recognition under rotation. Their method is also fully automatic, including face localization, landmark detection and the graph matching scheme. The drawback of this method is the requirement of accurate landmark localization, which is not easy when illumination variations are present. The popular eigenface approach [19] has been modified to achieve pose invariance [18]; this method constructs eigenfaces for each pose. More recently, a general framework called the bilinear model has been proposed [20]. The methods in this category have some common drawbacks: 1) they need many images per person to cover the possible poses; 2) the illumination problem is treated separately from the pose problem.
1.5 Single Image Based Approaches
Gabor wavelet based feature extraction has been proposed for face recognition [21] and is robust to small-angle rotation. There are many papers on invariant features in the computer vision literature, but little of it discusses applying this technology to face recognition; recent work in [22] sheds some light in this direction. For synthesizing face images under different lighting or expressions, 3D facial models have been explored in [23]. Due to their complexity and computation cost, it is hard to apply this technology to face recognition.
2. METHODS
One of the holistic methods, view-based eigenfaces, is simple, fast and accurate in constrained environments. By contrast, an adaptive skin-color model can provide a complete face area without any unwanted extra pixels. This is a large advantage for the eigenface recognition process, which treats a face area as a matrix of pixels.
2.1 Face detection
The adaptive skin-color model is an emerging method for the face detection problem. With this method there is no need to spend time, for instance, training a neural network in the face detection step. As in [24], the algorithm detects faces independently of the background color of the scene through the following steps:
- Represent the image in the chromatic color system.
- Segment skin areas from the other parts based on the distinct chromatic color of skin.
- Separate the face area from the skin areas by the features of a face.
- Match the face template against the suspected regions to select the actual face area.
a. Skin segmentation:
Skin colors of different people are very close; they differ mainly in intensity [25]. This finding is the foundation for developing a skin-color model that can separate the skin regions from the background. First, the system needs to be trained to determine the color distribution of human skin in chromatic color space, which can be obtained by the following transformation:
r = R / (R + G + B)    (1)

b = B / (R + G + B)    (2)

The green component is redundant after the normalization because r + g + b = 1.
The training process uses a set of 121,968 skin-sample pixels from 33 color images. The training samples were taken from persons of different ethnicities (Asian, Caucasian and African) to cover the range of possible skin colors. The color histogram revealed that the skin colors of different people are clustered in the chromatic color space, and that a skin color distribution can be represented by a Gaussian model N(m, C), where:

Mean: m = E{x}, where x = (r, b)^T    (3)

Covariance: C = E{(x - m)(x - m)^T}    (4)
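Fitting the Gaussian skin model N(m, C) from training skin pixels is a direct application of the equations above. A minimal Python/NumPy sketch (function and array names are ours, not from the paper):

```python
import numpy as np

def fit_skin_model(skin_rgb):
    """Fit a Gaussian skin-color model N(m, C) in chromatic (r, b) space.

    skin_rgb: (N, 3) float array of RGB training pixels sampled from skin.
    Returns the mean vector m (Eq. 3) and covariance matrix C (Eq. 4).
    """
    skin_rgb = np.asarray(skin_rgb, dtype=np.float64)
    s = skin_rgb.sum(axis=1, keepdims=True)    # R + G + B per pixel
    s[s == 0] = 1.0                            # guard against black pixels
    r = skin_rgb[:, 0:1] / s                   # Eq. (1)
    b = skin_rgb[:, 2:3] / s                   # Eq. (2)
    x = np.hstack([r, b])                      # x = (r, b)^T per pixel
    m = x.mean(axis=0)                         # Eq. (3): m = E{x}
    C = np.cov(x, rowvar=False)                # Eq. (4): covariance of x
    return m, C
```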
Therefore, if a pixel, having been transformed from RGB color space to chromatic color space, has a chromatic pair value (r, b), the likelihood of skin for this pixel can be computed as follows:

Likelihood = P(r, b) = exp[-0.5 (x - m)^T C^-1 (x - m)]    (5)

where x = (r, b)^T    (6)

Hence, this skin color model can transform a color image into a gray-scale image such that the gray value at each pixel shows the likelihood of the pixel belonging to skin. With appropriate thresholding, the gray-scale image can then be further transformed into a binary image showing skin regions and non-skin regions.
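Computing the skin-likelihood gray image from Eq. (5) can be sketched as follows (names are ours; a fixed threshold in the closing comment is only a placeholder, since the paper selects the threshold adaptively):

```python
import numpy as np

def skin_likelihood(image_rgb, m, C):
    """Map an RGB image to a skin-likelihood gray image via Eq. (5).

    image_rgb: (H, W, 3) float array; m, C: Gaussian skin model parameters.
    """
    img = np.asarray(image_rgb, dtype=np.float64)
    s = img.sum(axis=2)
    s[s == 0] = 1.0                            # guard against black pixels
    r = img[..., 0] / s                        # Eq. (1)
    b = img[..., 2] / s                        # Eq. (2)
    x = np.stack([r, b], axis=-1) - m          # (x - m) at every pixel
    Cinv = np.linalg.inv(C)
    # Quadratic form (x - m)^T C^-1 (x - m), evaluated pixel-wise
    d2 = np.einsum('...i,ij,...j->...', x, Cinv, x)
    return np.exp(-0.5 * d2)                   # Eq. (5)

# With a chosen threshold t, the gray image becomes a binary skin mask:
# mask = skin_likelihood(img, m, C) > t
```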
The threshold value at which the minimum increase in region size is observed while stepping down the threshold will be the optimal threshold. In this program, the threshold value is decremented from 0.65 to 0.05 in steps of 0.1. If the minimum increase occurs when the threshold value is changed from 0.45 to 0.35, then the optimal threshold is taken as 0.4.
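The stepping rule above can be sketched as follows; the 0.65-to-0.05 range and 0.1 step are the paper's settings, while the function name and the midpoint convention are our own reading of the worked example:

```python
import numpy as np

def optimal_threshold(likelihood):
    """Step the threshold down from 0.65 to 0.05 in steps of 0.1 and
    return the midpoint of the step showing the minimum increase in
    segmented region size (e.g. min increase at 0.45 -> 0.35 gives 0.4)."""
    thresholds = np.arange(0.65, 0.04, -0.1)
    # Region size (pixel count) at each candidate threshold
    sizes = [int((likelihood > t).sum()) for t in thresholds]
    increases = np.diff(sizes)           # growth at each step down
    k = int(np.argmin(increases))        # step with the minimum increase
    return (thresholds[k] + thresholds[k + 1]) / 2.0
```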

International Conference on Intelligent and Advanced Systems 2007
744 ~
FIGURE 2-1 Skin-likelihood image.
Using this technique of adaptive thresholding, many images yield good results; the skin-colored regions are effectively segmented from the non-skin-colored regions. The skin-segmented image of the previous color image resulting from this technique is shown in Figure 2-2.



FIGURE 2-2 The skin-segmented image
b. Face segmentation
- Skin Regions.
A skin region is a set of connected components within an image. It can be defined as a closed region in the image, which can have 0, 1 or more holes inside it. Its color boundary is represented by pixels with value 1 in the binary image, and all holes have a pixel value of zero (black).
Figure 2-3 shows the segmented skin regions from the last section, as well as the particular skin region selected by the system that corresponds to the face in the image.


FIGURE 2-3 Segmented skin regions and a skin region

After experimenting with several images, we decided that a skin region should have at least one hole inside it.
Some faces have a slight inclination, so a better match needs a rotation of the template face by the right angle. One way to determine a unique orientation is by elongating the object: the orientation of the axis of elongation determines the orientation of the region.
The axis is computed by finding the line for which the sum of the squared distances between the region points and the line is minimized; in other words, we compute the least-squares fit of a line to the region points in the image [26]. At the end of the process, the angle of inclination (theta) is given by:

theta = (1/2) atan( b / (a - c) )    (7)

where:

a = Σ_{i=1..n} Σ_{j=1..m} (x'_{ij})^2 B[i, j]    (8)

b = 2 Σ_{i=1..n} Σ_{j=1..m} x'_{ij} y'_{ij} B[i, j]    (9)

c = Σ_{i=1..n} Σ_{j=1..m} (y'_{ij})^2 B[i, j]    (10)

and:

x' = x - x_mean,  y' = y - y_mean    (11)
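The angle-of-inclination computation above amounts to the second-order central moments of the binary region; a sketch (function name ours):

```python
import numpy as np

def region_orientation(B):
    """Angle of the axis of elongation of a binary region B (Eqs. 7-11)."""
    ys, xs = np.nonzero(B)                 # coordinates of region pixels
    xp = xs - xs.mean()                    # x' = x - x_mean  (Eq. 11)
    yp = ys - ys.mean()                    # y' = y - y_mean
    a = np.sum(xp ** 2)                    # Eq. (8)
    b = 2.0 * np.sum(xp * yp)              # Eq. (9)
    c = np.sum(yp ** 2)                    # Eq. (10)
    # arctan2 also handles the degenerate case a == c
    return 0.5 * np.arctan2(b, a - c)      # Eq. (7)
```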
- Width and height of the region.
We still need to determine the width and height of the region in order to resize our template face so that it has the same width and height as our region.
First, we fill in any holes that the region might have, to avoid problems when we encounter them. Since the region is rotated at some angle theta, we need to rotate it by -theta degrees so that it is completely vertical. We then determine the height and width by moving four pointers: one each from the left, right, top and bottom of the image. When we find a pixel value different from 0, we stop; this is the coordinate of a boundary. Once we have obtained the four values, we compute the height by subtracting the top value from the bottom value, and the width by subtracting the left value from the right value.
We can use the width and the height of the region to improve our decision process. The height-to-width ratio of a human face is around 1. In order to have fewer misses, however, we determined that a good minimum value is 0.8. Ratio values below 0.8 do not suggest a face, since human faces are oriented vertically.
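The four-pointer scan and the 0.8 ratio test can be sketched as below, assuming the region has already been rotated so that it is vertical (names ours):

```python
import numpy as np

def region_extent(B):
    """Width and height of a vertically aligned binary region, via the
    four-pointer scan: first nonzero row/column found from each side."""
    rows = np.any(B, axis=1)
    cols = np.any(B, axis=0)
    top = int(np.argmax(rows))
    bottom = len(rows) - 1 - int(np.argmax(rows[::-1]))
    left = int(np.argmax(cols))
    right = len(cols) - 1 - int(np.argmax(cols[::-1]))
    return right - left, bottom - top      # width, height

def could_be_face(B, min_ratio=0.8):
    """Reject regions with height/width below 0.8, since human faces are
    oriented vertically with a ratio around 1."""
    w, h = region_extent(B)
    return w > 0 and h / w >= min_ratio
```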
While the above improves the classification, it can also be a drawback in cases such as very long arms: if the skin region for the arms has holes near the top, this might result in a false classification.
- Template Face and Template Matching.
One of the most important characteristics of this method is the use of a human face template to determine whether a skin region represents a face. The template was obtained by averaging 16 frontal-view faces of males and females wearing no glasses and having no facial hair; it is shown in Figure 2-4. Notice that the left and right borders of the template are located at the centers of the left and right ears of the averaged faces. The template is also vertically centered at the tip of the nose of the model.



FIGURE 2-4 Template face model

Using the results above, we can find the face areas by looking for the hypothesized features of a face; our method takes these to be the holes inside the regions of the binary image. By applying the face template to the 'suspected' areas, we determine which one is the face area.
We determined empirically, from our experiments, that a good threshold value for classifying a region as a face is a resulting correlation value greater than 0.6.
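The template match can be sketched as a normalized correlation between the resized template and the candidate region; the 0.6 acceptance threshold is the paper's empirical value, while the function names are ours:

```python
import numpy as np

def template_score(region, template):
    """Normalized correlation between a gray-scale candidate region and
    the template face, both already brought to the same size."""
    r = region - region.mean()
    t = template - template.mean()
    denom = np.sqrt((r ** 2).sum() * (t ** 2).sum())
    return float((r * t).sum() / denom) if denom > 0 else 0.0

def is_face(region, template, threshold=0.6):
    """Accept the region as a face if the correlation exceeds 0.6."""
    return template_score(region, template) > threshold
```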
After the system decides that a skin region corresponds to a frontal human face, we get a new image with a hole exactly the size and shape of the processed template face. We then invert the pixel values of this image to generate a new one which, multiplied by the original gray-scale image, yields an image like the original one but with the template face located in the selected skin region. This is shown in Figure 2-5, in which the face is replaced by the template face.

FIGURE 2-5 Adding the template face to the image

We finally get the coordinates of the part of the image that contains the template face. With these coordinates, we draw a rectangle in the original color image. This is the output of the system, which in this case detected the face as shown in Figure 2-6.


FIGURE 2-6 Final Result

2.2 Face recognition
The face recognition component uses the eigenface technique. The view-based eigenface model is simple, fast and accurate in constrained environments. The goal of using it is to implement a model for a particular face and distinguish that face from a large number of stored faces with some real-time variations [19].
The scheme is based on an information theory approach that decomposes face images into a small set of characteristic feature images called 'eigenfaces', which are actually the principal components of the initial training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the eigenfaces ('face space') and then classifying the face by comparing its position in face space with the positions of the known individuals.
According to [19], the process of face recognition using eigenfaces can be described as follows:
a. Pre-Processing
- The first step is to obtain a set S of M face images (M = 25 in our example), as shown at the beginning of the tutorial:

S = {Γ_1, Γ_2, Γ_3, ..., Γ_M}    (12)

Each image is transformed into a vector of size N^2 and placed into the set.
- After obtaining the set, we compute the mean image Ψ (Figure 2-7):

Ψ = (1/M) Σ_{n=1..M} Γ_n    (13)

- Then we find the difference Φ between each input image and the mean image:

Φ_i = Γ_i - Ψ    (14)

FIGURE 2-7 Mean image
b. Eigenfaces calculation
- Next we seek a set of M orthonormal vectors u_n which best describes the distribution of the data. The k-th vector, u_k, is chosen such that

λ_k = (1/M) Σ_{n=1..M} (u_k^T Φ_n)^2    (15)

is a maximum, subject to

u_l^T u_k = δ_lk = 1 if l = k, 0 otherwise    (16)

Note: u_k and λ_k are the eigenvectors and eigenvalues of the covariance matrix C.
- We obtain the covariance matrix C in the following manner:

C = (1/M) Σ_{n=1..M} Φ_n Φ_n^T = A A^T,  where A = [Φ_1, Φ_2, Φ_3, ..., Φ_M]    (17)

- Since C is an N^2 x N^2 matrix, computing its eigenvectors directly is not computationally feasible. Instead, we find the eigenvectors v_l of the M x M matrix L = A^T A, whose elements are

L_mn = Φ_m^T Φ_n    (18)

and whose eigenvalues are the same as those of C = A A^T.
- Once we have found the eigenvectors v_l of the L matrix, we can find our eigenfaces u_l:

u_l = Σ_{k=1..M} v_lk Φ_k,  l = 1, ..., M    (19)

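The computation above (mean face, difference images, and the small M x M eigenproblem) can be sketched in Python with NumPy; array and function names are ours:

```python
import numpy as np

def compute_eigenfaces(images):
    """Eigenfaces via the small M x M matrix L = A^T A.

    images: (M, N2) array, one flattened face per row.
    Returns the mean face psi and the eigenfaces as rows of U,
    sorted by decreasing eigenvalue.
    """
    X = np.asarray(images, dtype=np.float64)
    psi = X.mean(axis=0)                       # mean face
    A = (X - psi).T                            # columns are the differences
    L = A.T @ A                                # small M x M matrix
    lam, V = np.linalg.eigh(L)                 # eigenvectors v_l of L
    order = np.argsort(lam)[::-1]              # largest eigenvalues first
    U = A @ V[:, order]                        # map back: u_l = sum_k v_lk * Phi_k
    U /= np.linalg.norm(U, axis=0) + 1e-12     # make each eigenface unit length
    return psi, U.T
```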
c. Recognition
1. A new face is transformed into its eigenface components. First we compare the input image with the mean image and multiply their difference with each eigenface. Each resulting value represents a weight, and the weights are saved in a vector Ω:

Ω = [ω_1, ω_2, ..., ω_M]    (20)

The weights are given by

ω_k = u_k^T (Γ - Ψ)    (21)

where ω_k is the weight, u_k the eigenvector, Γ the input image and Ψ the mean face.
2. We then compute the distance ε_k between the new weight vector Ω and the stored weight vector Ω_k of each known face class:

ε_k = ||Ω - Ω_k||^2    (22)

3. The input face is considered to belong to a class if ε_k is below an established threshold; the face image is then considered a known face. If the difference is above the given threshold but below a second threshold, the image is determined to be an unknown face. If the input image is above both thresholds, the image is determined NOT to be a face.
4. If the image is found to be an unknown face, we can decide whether or not to add it to the training set for future recognition.
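The recognition steps above can be sketched as follows; the two thresholds are placeholders to be tuned, and all names are ours:

```python
import numpy as np

def project(face, psi, eigenfaces):
    """Weight vector Omega of a flattened face image (step 1)."""
    return eigenfaces @ (face - psi)

def classify(face, psi, eigenfaces, known_weights, theta1, theta2):
    """Steps 1-4: match a new face against the known weight vectors.

    known_weights: (P, M) array, one stored Omega_k per known person.
    Returns the best class index, or 'unknown' / 'not a face'.
    """
    omega = project(face, psi, eigenfaces)                 # step 1
    eps = np.sum((known_weights - omega) ** 2, axis=1)     # step 2: distances
    k = int(np.argmin(eps))
    if eps[k] < theta1:                                    # step 3: known face
        return k
    if eps[k] < theta2:                                    # unknown face
        return "unknown"
    return "not a face"                                    # above both thresholds
```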


3. EXPERIMENTAL RESULT

TABLE 5-1 Experimental result of detection testing

Database    Number of images   Detection rate (%)   Missing rate (%)
UCDColour   94                 40                   71
VLIDDBase   530                23                   80
Bao         222                80                   24
Single      100                88                   20
Multi       100                78                   25

TABLE 5-2 Experimental result of recognition testing with total samples in the cropped-face still image

Database   Training examples   Recognition rate (%)
UMIST      433                 88
UoT        1823                90
Yale       125                 76
JAFFE      176                 79
IOEC       82                  95

TABLE 5-3 Experimental result of recognition testing with 75 samples

Database   Training examples   Recognition rate (%)
UMIST      288                 76
UoT        2215                85
Yale       84                  66
JAFFE      107                 71
IOEC       64                  88

TABLE 5-4 Experimental result of recognition testing with 75 samples

Database    Number of frames   Missing rate (%)   Recognition rate (%)
CMUMobo     2700               25                 82
MaxPlanck   960                8                  92
XM2VTSDB    400                0                  100



TABLE 5-5 Experimental result of video-based recognition testing with 50 samples

Database    Number of frames   Missing rate (%)   Recognition rate (%)
CMUMobo     2700               25                 82
MaxPlanck   960                8                  92
XM2VTSDB    400                0                  100

4. CONCLUSION
- Recognizing faces from video is probably the most difficult problem: one must be able to track the location of the face, estimate its pose and then recognize it. However, the skin-color model is an emerging method that can provide an accurate result for the detection phase, and we can expand it into a face recognition system by adding the view-based eigenspace algorithm. Our system followed this direction.
- The balance between consecutive tasks in the whole system is very important.
- The face detection step plays an important role. By improving the quality of this step, we gain a tremendous benefit in our work.

REFERENCES

[4] S. Zhou, V. Krueger, R. Chellappa, "Probabilistic recognition of human faces from video," Computer Vision and Image Understanding, Vol. 91, 2003, pp. 214-245.
[5] S. Zhou, R. Chellappa, B. Moghaddam, "Visual tracking and recognition using appearance-adaptive models in particle filters," IEEE Trans. on Image Processing, Vol. 13, No. 11, Nov 2004, pp. 1491-1506.
[6] S. Zhou, R. Chellappa, "Probabilistic identity characterization for face recognition," Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 27 June - 2 July 2004.
[7] W.Y. Zhao, R. Chellappa, "SFS based view synthesis for robust face recognition," Proceedings, Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 285-292.
[8] H. Moon, S.A. Rizvi, P.J. Rauss, P.J. Phillips, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, Oct 2000, pp. 1090-1104.
[9] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, Jul 1997, pp. 711-720.
[10] Y. Adini, Y. Moses, S. Ullman, "Face recognition: the problem of compensating for changes in illumination direction," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, Jul 1997, pp. 721-732.
[11] D.W. Jacobs, P.N. Belhumeur, R. Basri, "Comparing images under variable illumination," Proceedings, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun 1998, pp. 610-617.
[12] P.N. Belhumeur, D.J. Kriegman, "What is the set of images of an object under all possible lighting conditions?" Proceedings CVPR '96, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 18-20 Jun 1996, pp. 270-277.
[13] P.W. Hallinan, "A low-dimensional representation of human faces for arbitrary lighting conditions," Proceedings CVPR '94, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 21-23 Jun 1994, pp. 995-999.
[14] J.J. Atick, P.A. Griffin, A.N. Redlich, "Statistical approach to shape from shading: reconstruction of 3D face surfaces from single 2D images," Neural Computation, Vol. 8, 1996, pp. 1321-1340.
[15] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, "Illumination-based image synthesis: creating novel images of human faces under differing pose and lighting," Proceedings, IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes (MVIEW '99), 1999, pp. 47-54.
[16] T. Vetter, T. Poggio, "Linear object classes and image synthesis from a single example image," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, Jul 1997, pp. 733-742.
[17] L. Wiskott, J.-M. Fellous, N. Krüger, C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, Jul 1997, pp. 775-779.
[18] A. Pentland, B. Moghaddam, T. Starner, "View-based and modular eigenspaces for face recognition," Proceedings CVPR '94, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 21-23 Jun 1994, pp. 84-91.
[19] M. Turk, A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, Winter 1991, pp. 71-86.
[20] W.T. Freeman, J.B. Tenenbaum, "Learning bilinear models for two-factor problems in vision," Proceedings, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 17-19 Jun 1997, pp. 554-560.
[21] B.S. Manjunath, R. Chellappa, C. von der Malsburg, "A feature based approach to face recognition," Proceedings CVPR '92, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun 1992, pp. 373-378.
[22] R. Alferez, Y.-F. Wang, "Geometric and illumination invariants for object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 6, Jun 1999, pp. 505-536.
[23] T. Akimoto, Y. Suenaga, R.S. Wallace, "Automatic creation of 3D facial models," IEEE Computer Graphics and Applications, Vol. 13, No. 5, Sep 1993, pp. 16-22.
[24] H. Chang, U. Robles, "Face Detection," EE368 Final Project Report, May 25, 2000.
[25] J. Yang, A. Waibel, "A Real-Time Face Tracker," CMU CS Technical Report.
[26] R. Jain, R. Kasturi, B.G. Schunck, "Machine Vision," McGraw Hill, New York, 1995, pp. 31-51.
