
3D Face Recognition using Normal Sphere and General Fourier Descriptor

Andrea F. Abate, Michele Nappi, Daniel Riccio and Gabriele Sabatino


University of Salerno
Via ponte don Melillo, 84084, Fisciano
{abate, mnappi, driccio, gsabatino}@unisa.it
Abstract
Today, face figures among the most promising
biometrics, allowing people to be identified without
requiring any physical contact. In this research field,
3D provides a significant improvement in recognition
performance, but existing approaches show
limitations in dealing with pose variations; indeed, 3D
face surfaces need to be aligned before the matching
operation. This paper proposes an approach that
overcomes this limitation by projecting the 3D shape
information onto the 2D surface of a normal sphere,
while a rotation-invariant descriptor is used to extract
key features from this surface. In addition, using a 2D
descriptor reduces the computing time, a typical
drawback of 3D methods. Experiments have been
conducted on a proprietary face dataset to assess the
robustness of the method with respect to a large set of
facial expression and pose variations.
1. Introduction
Face represents a quite interesting biometric, being
one of the most common cues humans use in their
visual interactions. Furthermore, it is largely
considered one of the most accepted biometrics among
all the existing ones, since it provides a high
recognition rate without requiring any contact with the
sensor surface. Many 2D algorithms are available in the
literature [5], but none of them is free of limitations
(i.e. illumination/pose changes). Thus 3D processing
has been proposed as an alternative; it has the
potential of improving performance by overcoming the
shortcomings of 2D systems. Indeed, 2D data provides just
a bidimensional projection of facial features, while 3D
data represents a discrete approximation of the face's
three-dimensional geometry.
In this way, it becomes easier to address some intra-
class variations such as facial expression changes
and the presence of glasses or a beard. Furthermore, exploiting
the depth information allows us to use topological
descriptors (i.e. local curvatures) for localizing face
features.
The early research on 3D face recognition was
conducted over a decade ago, as reported by Bowyer
et al. [3] in their recent survey on this topic, and many
different approaches have been developed over time to
address this challenging task. All the existing methods
can be grouped into two classes according to the way in
which they address the pose variation problem. The
first class contains all the approaches working with
pre-aligned models; two examples are given by [6] and
[2]. In [6] the authors compare faces by measuring
distances between any two coarsely aligned 3D facial
surfaces by means of the Iterative Closest Point (ICP)
method, while in [2] the matching operation is
performed by combining similarity scores coming from
comparisons of 3D and 2D pre-aligned profiles.
We present a 3D face recognition method based on
the normal sphere, a spherical surface
representing the local curvature of a 3D polygonal
mesh in terms of RGB color data (similar to the normal
maps used in [1]). A 2D classification process is then
used to save a meaningful part of the computational
complexity while preserving the rotation invariance
property. The rest of the paper is organized as follows.
Section 2 presents the system overview in more detail.
In Section 3 the indexing and retrieval techniques are
discussed. Section 4 discusses some experimental results.
The paper concludes in Section 5 with conclusions
and directions for future research.
2. A System Overview
The key idea of the system consists in projecting
the 3D shape information of the face onto the 2D
surface of a normal sphere. The normal sphere is a 3D
primitive that is recursively generated from an
icosahedron, in which all faces are spatially equi-
distributed equilateral triangles.
Each triangle on this surface is characterized by the
normal v = (v_x, v_y, v_z) to its surface. While face
rotations can change the values of the normal components, the
angles between the normals of contiguous triangles
remain unchanged under different poses; in order to
exploit this property, we store only the angles between
these normals. The homogeneous distribution of these
triangles retains the spatial relationships between
facial features and gives us a two-dimensional
representation of the original face geometry.

The 18th International Conference on Pattern Recognition (ICPR'06)
0-7695-2521-0/06 $20.00 © 2006 IEEE
Each band (Red, Green and Blue) is then processed
independently by applying a 2D Fourier-based
descriptor that produces a fixed number of coefficients,
as shown in Figure 1. Finally, the global feature vector
is obtained by concatenating the coefficients of all
three bands.
Figure 1. Enrolment workflow.
2.1. Face Acquisition and Processing
3D face acquisition is the first step of the enrolment
process which results in a polygonal surface
representing the input face. We opted for a structured
laser scanner that scans the face, reconstructing vertices and
polygons.
Computational complexity represents a serious
limitation for 3D applications, including 3D-based
recognition systems. Here we describe a way of
sub-sampling the 3D information into a 2D space while
still preserving most of the original information.
Indeed, the local curvature of a polygonal mesh is
faithfully represented by polygon normals, and the
angles between pairs of normals can be represented
as an RGB colour in a 2D space.
To this aim, we first need to project the vertex
coordinates onto a 2D curvilinear space; this task can
be thought of as the inverse of the well-known mapping
technique. We use a spherical projection (re-adapted to
the mesh size), because it fits the 3D shape of the face
mesh in a sound way. In more detail, let
M ⊂ ℝ³, M = {v_i = (x_i, y_i, z_i)}, be a mesh; we want to relate
each v_i ∈ M to a point on the spherical surface
represented by the ordered pair of polar coordinates
(θ_i, φ_i), where 0 ≤ θ < 2π and 0 ≤ φ ≤ π. This is done
by the following formulas:

θ = tan⁻¹(y / x),  φ = cos⁻¹(z / r)  (1)

where r is the diameter of the mesh M (Figure 2 shows
a spherical projection). Then, we can store the normals of
the mesh M in a two-dimensional structure, namely the
normal sphere N, by using the 2D-projected
vertex coordinates given by equation (1).
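As a sketch, the projection of equation (1) can be implemented as follows. Centering the mesh on its centroid, computing a per-vertex radius, and using atan2 for a full-circle θ are our assumptions; the paper only states that the projection is re-adapted to the mesh size.

```python
import numpy as np

def spherical_project(vertices):
    """Project 3D mesh vertices onto polar coordinates (theta, phi),
    sketching equation (1). Vertices are centered on the mesh centroid
    and r is taken per vertex (both are assumptions)."""
    v = np.asarray(vertices, dtype=float)
    v = v - v.mean(axis=0)                      # center the mesh (assumed)
    r = np.linalg.norm(v, axis=1)               # per-vertex radius
    theta = np.arctan2(v[:, 1], v[:, 0]) % (2 * np.pi)   # 0 <= theta < 2*pi
    phi = np.arccos(np.clip(v[:, 2] / np.maximum(r, 1e-12), -1.0, 1.0))
    return theta, phi                            # 0 <= phi <= pi
```

atan2 is used instead of a plain tan⁻¹(y/x) so that θ covers the full [0, 2π) range without quadrant ambiguity.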
Figure 2. Projecting the (a) 3D mesh vertices in (b)
spherical coordinates.
In order to obtain an optimized tessellation of the 3D
geometry, we reshape the triangles by fixing the
vertex coordinates according to a quite regular
structure recursively generated from an icosahedron
(see Figure 3). Thus there is only one normal for each
point (triangle) on the surface of the normal sphere;
each normal is characterized by its components, so that
the spatial distribution of the normals can be
represented in a two-dimensional structure N by
representing each normal as a point on the normal
sphere surface. In detail, for each triangle t_i in N with
vertex coordinates (v_1, v_2, v_3) ∈ (θ, φ), we assign the
RGB value obtained from the corresponding normal n_i
to the triangle t_i on the mesh M. We refer to this resulting
structure as the Normal Sphere N of the mesh M.
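The recursive icosahedron subdivision can be sketched as below. The vertex and face bookkeeping (midpoint caching, re-normalization onto the unit sphere) is the standard icosphere construction and is not spelled out in the paper, so treat this as one plausible realization.

```python
import numpy as np

def icosphere(subdivisions=2):
    """Recursively subdivide an icosahedron into a sphere of roughly
    equi-distributed equilateral triangles (the normal-sphere support).
    Returns unit-length vertices and triangle index lists."""
    t = (1 + 5 ** 0.5) / 2                       # golden ratio
    verts = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
             (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
             (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    verts = [np.array(v, dtype=float) / np.linalg.norm(v) for v in verts]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(subdivisions):
        cache, new_faces = {}, []
        def midpoint(i, j):
            key = (min(i, j), max(i, j))
            if key not in cache:                 # one new vertex per edge
                m = verts[i] + verts[j]
                verts.append(m / np.linalg.norm(m))  # push onto the sphere
                cache[key] = len(verts) - 1
            return cache[key]
        for a, b, c in faces:                    # split 1 triangle into 4
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return np.array(verts), faces
```

Each subdivision multiplies the face count by four (20·4ⁿ faces, 10·4ⁿ + 2 vertices), which keeps the triangles spatially equi-distributed as stated in Section 2.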
Figure 3. Mesh details before and after resampling.
In order to nullify the effects of face rotations we
consider the angles between the normals of contiguous
triangles. In more detail, for each triangle t_i on N its
three contiguous neighbors t_j, t_k, t_l are considered, and
the angles α(t_i, t_j), α(t_i, t_k) and α(t_i, t_l) between
their normals are computed and sorted (α_1 ≤ α_2 ≤ α_3).
Then α_1, α_2 and α_3 are quantized with 8 bits in the
range [0, 255] and treated as the R, G and B bands, so
the resulting codification can be seen as a 24-bit color
(see Figure 1). This is the new facial shape to which the
GFD descriptor is applied.
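A minimal sketch of this angle-to-color codification is given below. The linear mapping of [0, π] onto [0, 255] is an assumption; the paper only specifies 8-bit quantization into that range.

```python
import numpy as np

def angles_to_rgb(a1, a2, a3):
    """Quantize three inter-normal angles (radians, assumed in [0, pi])
    into 8-bit R, G, B channels, i.e. one 24-bit color per triangle.
    The linear [0, pi] -> [0, 255] mapping is an assumption."""
    angles = np.sort([a1, a2, a3])               # alpha1 <= alpha2 <= alpha3
    rgb = np.clip(np.round(angles / np.pi * 255), 0, 255).astype(np.uint8)
    return tuple(rgb)                            # (R, G, B)
```

Sorting before quantization makes the color independent of the order in which the three neighbors are visited, which is what lets the codification survive face rotations.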
3. Face Classification
Turning the 3D classification problem into a 2D
classification task allows us to save a meaningful part
of the computational complexity. However, we want
the rotation invariance property to be preserved, so
some attention has to be paid in selecting a 2D face
descriptor. In this case, we opted for a Fourier-based
operator. The Fourier Transform (FT) is largely used
in image processing, because working in the frequency
domain often allows some limitations (e.g. noise,
shifts) to be overcome. In particular, we
adopted an FT-based descriptor defined by Zhang in [7]
for classifying binary trademarks, readapted
to our purposes.
Given an image I in the Cartesian space xOy, we
convert it into a polar space ρOθ by relating I(x, y) with
I(ρ, θ), defining

ρ = √((x − x_c)² + (y − y_c)²)  and  θ = atan2(y − y_c, x − x_c),

where O(x_c, y_c) is the center of the Cartesian space.
The polar Fourier Descriptor is defined on this polar
space as follows:

FD(ρ, φ) = Σ_r Σ_i I(r, θ_i) · exp[ j2π( (r/R)·ρ + (2πi/T)·φ ) ]  (2)

where 0 ≤ r < R and θ_i = 2πi/T, 0 ≤ i < T; 0 ≤ ρ < R
and 0 ≤ φ < T. The ρ and φ represent the
number of selected radial and angular frequencies. The
FD descriptor is made rotation invariant by retaining
only the magnitude of the coefficients, while
robustness with respect to scaling is achieved by
dividing the first coefficient by the area containing the
image and all the remaining coefficients by the first
one, as follows:

V = ( |FD(0, 0)| / area, |FD(0, 1)| / |FD(0, 0)|, …, |FD(m, n)| / |FD(0, 0)| )  (3)
The most important factor is the way in which the
point O(x_c, y_c) is localized. Indeed, imposing
restrictions here would place this method on a par with all the
approaches requiring a pre-alignment of the face.
We solve this problem by keeping in mind that the
objects to be classified are faces, and extensive
experimentation proved that we can locate the nose tip
as the point of maximum curvature on the face surface.
By fixing this point on the normal sphere as the
center of the Cartesian space, left/right and
bottom/up rotations are made irrelevant, while rollings
(7th case in Figure 4) vanish thanks to the
rotation/scaling invariance property of the FD
descriptor.
4. Experimental Results
We built a gallery of 120 subjects enrolled by a
structured light scanner from Inspeck Corp. In this
case the pose variations were acquired through
multiple scans of the same individual. Acquisitions
differ in pose and expression; in more detail, ten 3D
models with different facial expressions have been
acquired for each subject, while five poses per subject
have been considered. Figure 4 shows a subset of the
poses and expressions we considered.
Figure 4. Pose variations (1st, 2nd and 3rd:
neutral, 45° and 60° right side, bottom, and rolled
orientation).
In the first experiment we investigated the
resolution providing the highest value of the Cumulative
Match Score (CMS). The gallery set contains 120 images
with neutral expression, one per subject, while the probe
consists of 1080 images, i.e. the remaining 9 expressions
for the 120 subjects. Figure 5 shows that 16×16 pixels are
too few, giving very poor results in terms of CMS. The
best results are achieved for a resolution of 32×32,
while the performance worsens again when the
resolution increases. This behavior can be explained by
considering that high resolutions carry too much high-
frequency information, which is of no use in this case.
The second experiment tests the robustness of the
method with respect to pose variations. In particular, as
discussed in Section 3, left/right and bottom/up face
rotations are made irrelevant by locating the nose tip
on the normal sphere, so here we considered a
gallery set of 20 aligned images, as in the first
experiment, while the probe consists of 180 images (9
per subject) with different facial expressions, rolled
by a random angle α ∈ [0, π/6]. In Figure 6 we
compared the performance of NS+GFD, NS+PCA and
the normal map (NM) based method from [1] when a 30°
head rolling occurs. Results confirm the superiority of
the proposed method with respect to both facial
expression and pose variation.
Figure 5. CMS for different image resolutions.
5. Concluding remarks
We presented a novel 3D face recognition approach
aimed at biometric applications, based on the normal
sphere, a 2D data structure representing the local
curvature of the facial surface. The 2D classification task
allows us to save a meaningful part of the
computational complexity while preserving the rotation
invariance property. The method proved to be simple,
invariant to pose variations, fast, and with a high
average recognition rate. As the normal sphere is a 2D
mapping of mesh features, ongoing research will
integrate additional 2D color information (texture) captured
during the same enrolment session. Implementing a
true multi-modal version of the basic algorithm which
correlates the texture and the normal image could further
enhance the discriminating power even for complex
3D recognition issues such as the presence of beard,
moustache, eyeglasses, etc.
Figure 6. CMS on rolled faces.
6. References
[1] A. F. Abate, M. Nappi, S. Ricciardi, G. Sabatino, "Fast
3D Face Recognition Based On Normal Map", in Proc. of
IEEE International Conference on Image Processing
(ICIP 2005), Genova, Italy, 2005.
[2] C. Beumier and M. Acheroy, "Face verification from
3D and grey level cues", in Pattern Recognition Letters, Vol.
22, No. 12, pp. 1321-1329, 2001.
[3] K. Bowyer, K. Chang, P. Flynn, "A survey of
approaches to three-dimensional face recognition", in Proc. of
17th International Conference on Pattern Recognition (ICPR
2004), Vol. 1, pp. 358-361, Aug. 2004.
[4] A. M. Bronstein, M. M. Bronstein, and R. Kimmel,
"Expression invariant 3D face recognition", in Proc. of
Audio- and Video- Based Person Authentication (AVBPA
2003), Guildford, UK, Lecture Notes in Computer Science, J.
Kittler and M.S. Nixon, Vol. 2688, pp. 62-69, 2003.
[5] R. Chellapa, C.L. Wilson, S. Sirohey, "Human and
machine recognition of faces: A Survey," in Proc. of the
IEEE, Vol. 83, No. 5, pp. 705-740, 1995.
[6] G. Medioni and R. Waupotitsch, "Face recognition and
modeling in 3D", in Proc. of IEEE Int'l Workshop on
Analysis and Modeling of Faces and Gestures (AMFG
2003), Nice, France, pp. 232-233, Oct. 2003.
[7] D. Zhang, G. Lu, "Shape-based image retrieval using
generic Fourier descriptor", in Signal Processing: Image
Communication, Vol. 17, pp. 825-848, 2002.
