RECOGNITION
A Seminar Report
Submitted By
JASMEET SINGH
0822231019
In partial fulfilment for the award of the degree
of
B. Tech
IN
At
Certificate
H.O.D. (E.C.E)
April, 2011
Biometrics is the science and technology of measuring and analyzing biological data. In
information technology, biometrics refers to technologies that measure and analyze human body
characteristics, such as DNA, fingerprints, eye retinas and irises, voice patterns, facial patterns
and hand measurements, for authentication purposes.
Introduction
In recent years face recognition has received substantial attention from researchers in
biometrics, pattern recognition, and computer vision communities. The machine
learning and computer graphics communities are also increasingly involved in face
recognition. This common interest among researchers working in diverse fields is
motivated by our remarkable ability to recognize people and the fact that human
activity is a primary concern both in everyday life and in cyberspace. Besides,
there is a large number of commercial, security, and forensic applications requiring the
use of face recognition technologies. These applications include automated
crowd surveillance, access control, mugshot identification (e.g., for issuing
driver licenses), face reconstruction, design of human-computer interfaces
(HCI), multimedia communication (e.g., generation of synthetic faces), and
content-based image database management. A number of commercial face
recognition systems have been deployed, such as Cognitec, Eyematic,
Viisage, and Identix. Facial scan is an effective biometric
attribute/indicator. Different biometric indicators are suited to different
kinds of identification applications because they vary in intrusiveness,
accuracy, cost, and ease of sensing (see Fig. 1(a)). Among the six
biometric indicators considered in [10], facial features scored the highest
compatibility, shown in Fig. 1(b), in a machine readable travel document
(MRTD) system based on a number of evaluation factors [10].
Figure 1: Comparison of various biometric features: (a) based on Zephyr analysis;
(b) based on MRTD compatibility.
Face recognition scenarios can be classified into two types: (i) face
verification (or authentication) and (ii) face identification (or recognition). In
the Face Recognition Vendor Test (FRVT) 2002, which was conducted by the
National Institute of Standards and Technology (NIST), a third scenario was
added, called the “watch list”.
The watch list (“Are you looking for me?”) method is an open-universe test.
The test individual may or may not be in the system database. That person
is compared to the others in the system’s database and a similarity score is
reported for each comparison. These similarity scores are then numerically
ranked so that the highest similarity score is first. If a similarity score is
higher than a preset threshold, an alarm is raised. If an alarm is raised, the
system assumes that the individual is in the system’s database. There
are two main items of interest for watch list applications. The first is the
percentage of times the system raises the alarm and correctly identifies a person
on the watch list. This is called the “Detection and Identification Rate.” The second
item of interest is the percentage of times the system raises the alarm for an
individual who is not on the watch list (database). This is called the “False Alarm
Rate.” In this report, all the experiments are conducted in the identification
scenario.
Human face image appearance has potentially very large intra-subject
variations due to
1. 3D head pose
2. Illumination (including indoor/outdoor)
3. Facial expression
4. Occlusion due to other objects or accessories (e.g., sunglasses, scarf)
5. Facial hair
6. Aging
A number of face recognition algorithms, along with their modifications, have been developed
during the past several decades. They can be broadly divided into global and component-based
approaches.
(i) In the global approach, a single feature vector that represents the whole face image is
used as input to a classifier. Global techniques work well for classifying frontal views of
faces. However, they are not robust against pose changes because global features are highly
sensitive to translation and rotation of the face.
(ii) The main idea of component-based recognition is to compensate for pose changes by
allowing a flexible geometric relation between components in the classification.
O. Ayinde et al. categorized face recognition techniques into five groups:
neural network, principal component analysis (PCA), template, graph-based, and
other approaches. It is difficult to categorize the different face recognition
techniques cleanly; however, the techniques found in the literature can be grouped
into six categories: geometric or feature-based face recognition, analytical-to-holistic face
recognition, elastic bunch graph matching (EBGM), neural network approaches to
face recognition, neural network approaches with principal component analysis (PCA), and other
approaches. We discuss the previous work performed in each category in the following sections.
A convolutional neural network for face recognition has a sophisticated architecture for
image recognition. The input to such a network is the whole image. The system combines local image
sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The
SOM provides a quantization of the image samples into a topological space where inputs that are
nearby in the original space are also nearby in the output space, thereby providing dimensionality
reduction and invariance to minor changes in the image sample, and the convolutional network
provides partial invariance to translation, rotation, scale, and deformation. The convolutional
network extracts successively larger features in a hierarchical set of layers. The reported recognition
rate ranges from 96% to 98.5%.
Autoassociative neural networks for face recognition are a special kind of neural
network used to simulate associative processes in which the input patterns are
associated with themselves. During training, patterns are presented to the network and the weights
are gradually adjusted so that the final pattern of connectivity matches all the patterns being
presented. One complete presentation of all the patterns with which the network is trained is called
one epoch; usually a network requires many such epochs to perform satisfactorily. The weights
can therefore be seen as a distributed representation of the data. A face image can be modeled
using an autoassociative memory. In this model, each face image is coded as a vector, $x_k$, whose
elements give the gray levels of the corresponding pixels; these vectors are the inputs
of the system. Each element of the face vector $x_k$ is used as input to a cell of the autoassociative
memory, so the number of cells in the autoassociative memory is equal to the number
of pixels in the face vector $x_k$. The output of a given cell for a given face is simply the sum of its
inputs (the elements of the face vector $x_k$) weighted by the connection strengths between itself
and all of the other cells.
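The autoassociative memory described above can be sketched in a few lines of plain Python. This is only an illustration under our own assumptions: the two 4-pixel “face vectors”, the learning rate, and the number of epochs are made up, and the weights are adjusted with a simple error-correction (Widrow-Hoff) update, which is one possible way of “gradually adjusting” them.

```python
# Sketch of a linear autoassociative memory (toy data, assumed parameters).

def recall(W, x):
    """Output of each cell: the sum of its inputs weighted by its connections."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def train(patterns, epochs=200, lr=0.1):
    """Adjust the weights so that every pattern is associated with itself."""
    n = len(patterns[0])              # one cell per pixel
    W = [[0.0] * n for _ in range(n)]
    for _ in range(epochs):           # one epoch = one presentation of all patterns
        for x in patterns:
            o = recall(W, x)
            for i in range(n):
                err = x[i] - o[i]     # the desired output is the input itself
                for j in range(n):
                    W[i][j] += lr * err * x[j]
    return W

# Two hypothetical face vectors of gray levels (4 pixels each).
faces = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
W = train(faces)
reconstructed = [recall(W, x) for x in faces]
```

After training, presenting a stored face to the memory returns (up to rounding) the same face, which is what “associated with themselves” means here.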
Face recognition using eigenfaces, also called principal component analysis (PCA), is
a fairly popular approach to face recognition. It encodes the most relevant information in a group
of faces, namely the information that best distinguishes them from one another. The approach
transforms face images into
a small set of characteristic feature images called “eigenfaces”, which are the principal
components of the initial training set of face images. Recognition is performed by projecting a
new face image into the subspace spanned by the eigenfaces (face space) and then classifying
the face by comparing its position in the face space with the positions of known individuals. Each
face image in the training set can be represented exactly as a linear combination of the
eigenfaces with associated weights. A new face is recognized by comparing the feature weights
needed to approximately reconstruct it with the weights associated with the known individuals.
PCA finds a set of basis images and represents faces as a linear combination of
those images. The basis images found by PCA depend only on pairwise relationships between
pixels in the image database. Independent component analysis (ICA), a generalization of PCA,
finds basis images that are sensitive to higher-order relationships among pixels. The face images
used in the reported experiments were a subset of the FERET database. The data set contained images of 425
individuals, cropped to 60 × 50 pixels. The goal of this approach was to find a set of
statistically independent basis images. A data matrix X was organized so that the images are in
the rows and the pixels are in the columns, i.e., X has 425 rows and 3000 columns, and each
image has zero mean. ICA was performed on the face images under two
different architectures, one which treats the images as random variables and the pixels as
outcomes, and a second which treats the pixels as random variables and the images as outcomes.
The first architecture finds spatially local basis images for the faces. Under this architecture, ICA
found a basis set of statistically independent images. The images in this basis set were sparse and
localized in space, resembling spatial features. The second architecture treated pixels as random
variables and images as random trials. Under this architecture, the image coefficients were
approximately independent, resulting in a factorial face code. Both ICA representations were
reported to be superior to representations based on PCA for recognizing faces across days and changes in
expression. A classifier that combined the two ICA representations gave the best performance.
A novel neural network architecture has been proposed that can recognize human faces from any view
within a defined viewing angle range (from 30 degrees left to 30 degrees right of out-of-plane rotation).
View-specific eigenface analysis is used as the front end of the system to extract
features, and a neural network ensemble is used for recognition. The approach
extends the eigenface approach by building one eigenface set for each view. The feature
coefficients of each image are then extracted in the corresponding eigenspace. An ensemble neural
network (consisting of two layers) is used as the classifier to perform the pose-invariant face
recognition. The first layer contains four view-specific neural networks, each of which is a
conventional feed-forward network trained with the backpropagation algorithm on the
training data of a specific view. Each network accepts the 20-dimensional eigenface coefficients
as the input vector, has 15 hidden units, and has 6 output units. The training data for each neural
network contains 300 vectors, all of which are eigenface coefficients calculated for the
corresponding view. Among the 300 images, each of the 5 persons to be recognized contributes
40 images, and each of the 5 persons to be rejected contributes 20 images. The second layer is a
combinational neural network trained on the outputs of all the networks in the first layer.
The output of the second-layer network gives not only the identity of the input image but also
its pose, which is a bonus for face recognition. The system achieves an
average recognition ratio as high as 98.75%.
Principal component analysis (PCA) is one of the most successful techniques
that have been used in image recognition and compression. PCA is a statistical
method under the broad title of factor analysis. The purpose of PCA is to reduce the
large dimensionality of the data space (observed variables) to the smaller intrinsic
dimensionality of the feature space (independent variables) needed to
describe the data economically. This is the case when there is a strong correlation
between observed variables. The jobs that PCA can do include prediction, redundancy
removal, feature extraction, and data compression. Because PCA is a classical
technique that operates in the linear domain, applications having linear
models are suitable, such as signal processing, image processing, system and
control theory, and communications. Face recognition has many application areas.
Moreover, it can be categorized into face identification, face classification, or sex
determination. The most useful applications include crowd surveillance, video
content indexing, personal identification (e.g., driver’s license), mug shot matching,
and entrance security. The main idea of using PCA for face recognition is to express
the large 1-D vector of pixels constructed from a 2-D facial image in terms of the compact
principal components of the feature space. This can be called eigenspace
projection. The eigenspace is calculated by identifying the eigenvectors of the
covariance matrix derived from a set of facial images (vectors). Here we describe
the mathematical formulation of PCA.
4.1 Mathematics of PCA
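A 2-D facial image can be represented as a 1-D vector by concatenating each row (or column) into
a long thin vector. Suppose we have M vectors of size N (= rows of image × columns of
image) representing a set of sampled images, where the $p_j$ represent the pixel values:
$$x_i = [p_1, \ldots, p_N]^T, \quad i = 1, \ldots, M.$$
The images are mean centered by subtracting the mean image from each image vector. Let m
represent the mean image,
$$m = \frac{1}{M} \sum_{i=1}^{M} x_i,$$
and let $w_i = x_i - m$ denote the mean-centered images.
Our goal is to find a set of $e_i$ that have the largest possible projection onto each of the $w_i$.
We wish to find a set of M orthonormal vectors $e_i$ for which the quantity
$$\lambda_i = \frac{1}{M} \sum_{n=1}^{M} \left( e_i^T w_n \right)^2$$
is maximized subject to the orthonormality constraint $e_l^T e_k = \delta_{lk}$. The $e_i$ and
$\lambda_i$ are the eigenvectors and eigenvalues of the covariance matrix
$$C = \frac{1}{M} W W^T,$$
where W is the N × M matrix whose columns are the $w_i$.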
The eigenvectors corresponding to nonzero eigenvalues of the covariance matrix produce an
orthonormal basis for the subspace within which most image data can be represented with a
small amount of error. The eigenvectors are sorted from high to low according to their
corresponding eigenvalues. The eigenvector associated with the largest eigenvalue is the one that
reflects the greatest variance in the images; conversely, the smallest eigenvalue is associated with the
eigenvector that captures the least variance. The eigenvalues decrease roughly exponentially, so that
roughly 90% of the total variance is contained in the first 5% to 10% of the dimensions. A facial
image can be projected onto $M'$ ($\ll M$) dimensions by computing
$$\Omega = [v_1, v_2, \ldots, v_{M'}]^T,$$
where $v_i = e_i^T w$ and $w$ is the mean-centered image. $v_i$ is the $i$th coordinate of the facial
image in the new space, that is, its $i$th
principal component. The vectors $e_i$ are also images, so-called eigenimages, or eigenfaces in
our case, a term first used in [1]. They can be viewed as images and indeed look like faces.
So $\Omega$ describes the contribution of each eigenface in representing the facial image by treating the
eigenfaces as a basis set for facial images. The simplest method for determining which face class
provides the best description of an input facial image is to find the face class k that minimizes
the Euclidean distance
$$\epsilon_k = \| \Omega - \Omega_k \|,$$
where $\Omega_k$ is a vector describing the kth face class. If $\epsilon_k$ is less than some predefined
threshold $\theta_\epsilon$, the
face is classified as belonging to the class k.
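The steps of this section (mean centering, eigenvectors of the covariance matrix, projection, and nearest-class matching with a threshold) can be sketched as follows. This is a minimal illustration and not the implementation used in any reported experiment: the four 2 × 2 toy “images”, the labels A and B, the threshold value, and the use of power iteration with deflation to obtain the leading eigenvectors are all our own assumptions, chosen so that the example needs only the Python standard library.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def mean_image(vectors):
    """Element-wise mean of a list of vectors (the mean image m)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def center(x, m):
    """Mean-centered image w = x - m."""
    return [a - b for a, b in zip(x, m)]

def covariance(W):
    """N x N covariance matrix C = (1/M) * sum_n w_n w_n^T."""
    M, N = len(W), len(W[0])
    return [[sum(w[i] * w[j] for w in W) / M for j in range(N)]
            for i in range(N)]

def top_eigenpairs(C, k, iters=200):
    """Largest k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    C = [row[:] for row in C]                   # work on a copy
    N = len(C)
    pairs = []
    for _ in range(k):
        e = [1.0 / (i + 1) for i in range(N)]   # arbitrary non-zero start vector
        for _ in range(iters):
            y = [dot(row, e) for row in C]
            norm = math.sqrt(dot(y, y))
            e = [v / norm for v in y]
        lam = dot(e, [dot(row, e) for row in C])
        pairs.append((lam, e))
        for i in range(N):                      # deflate: remove the found component
            for j in range(N):
                C[i][j] -= lam * e[i] * e[j]
    return pairs

def project(x, m, eigenfaces):
    """Omega = [v_1, ..., v_M'] with v_i = e_i^T (x - m)."""
    w = center(x, m)
    return [dot(e, w) for e in eigenfaces]

def classify(x, m, eigenfaces, class_means, threshold):
    """Nearest face class in eigenspace, or None if no class is within the threshold."""
    omega = project(x, m, eigenfaces)
    dists = {k: math.sqrt(sum((a - b) ** 2 for a, b in zip(omega, ok)))
             for k, ok in class_means.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < threshold else None

# Toy training set: 2x2 "images" flattened to N = 4 pixels, two hypothetical people.
train = {"A": [[8, 8, 0, 0], [10, 10, 2, 2]],
         "B": [[0, 0, 8, 8], [2, 2, 10, 10]]}
images = [x for xs in train.values() for x in xs]
m = mean_image(images)
pairs = top_eigenpairs(covariance([center(x, m) for x in images]), k=2)
eigenvalues = [lam for lam, _ in pairs]
eigenfaces = [e for _, e in pairs]
class_means = {k: mean_image([project(x, m, eigenfaces) for x in xs])
               for k, xs in train.items()}

probe_label = classify([9, 9, 1, 1], m, eigenfaces, class_means, threshold=5.0)
unknown_label = classify([5, 5, 5, 5], m, eigenfaces, class_means, threshold=5.0)
```

Here the probe image lies close to the class A mean in face space and is accepted, while the uniform gray image is farther than the threshold from both classes and is rejected. For real images N is very large, so practical implementations usually avoid forming the N × N matrix and work with the much smaller M × M matrix $W^T W$ or a singular value decomposition instead.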
4.2 PCA Features
1. PCA computes means, variances, covariances, and correlations of large data sets.
2. PCA computes and ranks principal components and their variances.
3. PCA automatically transforms data sets.
4. PCA can analyze datasets up to 50,000 rows and 200 columns.
Neural networks are adaptive statistical models based on an analogy with the structure of the
brain. They are adaptive because they can learn to estimate the parameters of some population
using a small number of exemplars (one or a few) at a time. They do not differ essentially from
standard statistical models. For example, one can find neural network architectures akin to
discriminant analysis, principal component analysis, logistic regression, and other techniques. In
fact, the same mathematical tools can be used to analyze standard statistical models and neural
networks. Neural networks are used as statistical tools in a variety of fields, including
psychology, statistics, engineering, econometrics, and even physics. They are used also as
models of cognitive processes by neuro- and cognitive scientists. Basically, neural networks are
built from simple units, sometimes called neurons or cells by analogy with the real thing. These
units are linked by a set of weighted connections. Learning is usually accomplished by
modification of the connection weights. Each unit codes or corresponds to a feature or a
characteristic of a pattern that we want to analyze or that we want to use as a predictor. These
networks usually organize their units into several layers. The first layer is called the input layer,
the last one the output layer. The intermediate layers (if any) are called the hidden layers. The
information to be analyzed is fed to the neurons of the first layer and then propagated to the
neurons of the second layer for further processing. The result of this processing is then
propagated to the next layer and so on until the last layer. Each unit receives some information
from other units (or from the external world through some devices) and processes this
information, which will be converted into the output of the unit. The goal of the network is to
learn or to discover some association between input and output patterns, or to analyze, or to find
the structure of the input patterns. The learning process is achieved through the modification of
the connection weights between units. In statistical terms, this is equivalent to interpreting the
value of the connections between units as parameters (e.g., like the values of a and b in the
regression equation ŷ = a + bx) to be estimated.
The learning process specifies the “algorithm” used to estimate the parameters.
Neural networks are made of basic units (see Figure 1) arranged in layers. A unit collects
information provided by other units (or by the external world) to which it is connected with
weighted connections called synapses. These weights, called synaptic weights, multiply (i.e.,
amplify or attenuate) the input information:
A positive weight is considered excitatory, a negative weight inhibitory.
Figure 1: The basic neural unit processes the input information into the output information.
Each of these units is a simplified model of a neuron and transforms its input information into an
output response. This transformation involves two steps: first, the activation of the neuron is
computed as the weighted sum of its inputs, and second, this activation is transformed into a
response by using a transfer function. Formally, if each input is denoted $x_i$ and each weight $w_i$,
then the activation is $a = \sum_i x_i w_i$, and the output, denoted $o$, is obtained
as $o = f(a)$. Any function whose domain is the real numbers can be used as a transfer function.
The most popular ones are the linear function ($o$ proportional to $a$), the step function (activation
values less than a given threshold are set to 0 or to −1 and the other values are set to +1), the
logistic function
$$f(x) = \frac{1}{1 + e^{-x}},$$
which maps the real numbers into the interval (0, 1) and whose derivative, needed for
learning, is easily computed as $f'(x) = f(x)[1 - f(x)]$, and the normal or Gaussian function
$$o = \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{a}{\sigma} \right)^2 \right\}.$$
Some of these functions can include probabilistic variations; for example, a neuron can transform
its activation into the response +1 with a probability of 1/2 when the activation is larger than a
given threshold.
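The two-step computation of a unit (weighted sum, then transfer function) can be written out directly. The sketch below uses plain Python; the function names, default parameters, and the example inputs and weights are our own illustrative choices.

```python
import math

def activation(inputs, weights):
    """Step 1: activation a = sum_i x_i * w_i."""
    return sum(x * w for x, w in zip(inputs, weights))

def linear(a, slope=1.0):
    """Linear transfer function: output proportional to the activation."""
    return slope * a

def step(a, threshold=0.0, low=0.0):
    """Step function: `low` (0 or -1) below the threshold, +1 otherwise."""
    return 1.0 if a >= threshold else low

def logistic(a):
    """Logistic function; its values lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))

def logistic_derivative(a):
    """Derivative of the logistic function: f(a) * (1 - f(a))."""
    f = logistic(a)
    return f * (1.0 - f)

def gaussian(a, sigma=1.0):
    """Normal (Gaussian) transfer function."""
    return math.exp(-0.5 * (a / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Step 2: turn the activation into a response with a chosen transfer function.
a = activation([1.0, 0.5], [0.4, -0.8])   # one excitatory and one inhibitory weight
o = logistic(a)
```

In this example the excitatory and inhibitory contributions cancel, so the activation is zero and the logistic response sits at the midpoint of its range.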
The architecture (i.e., the pattern of connectivity) of the network, along with the transfer
functions used by the neurons and the synaptic weights, completely specifies the behavior of the
network.
Neural networks are adaptive statistical devices. This means that they can change iteratively the
values of their parameters (i.e., the synaptic weights) as a function of their performance. These
changes are made according to learning rules which can be characterized as supervised (when a
desired output is known and used to compute an error signal) or unsupervised (when no such
error signal is used). The Widrow-Hoff rule (a.k.a. gradient descent or Delta rule) is the most widely
known supervised learning rule. It uses the difference between the actual output of the cell and the
desired output as an error signal for units in the output layer. Units in the hidden layers cannot
compute directly their error signal but estimate it as a function (e.g., a weighted average) of the
error of the units in the following layer. This adaptation of the Widrow-Hoff learning rule is
known as error backpropagation. With Widrow-Hoff learning, the correction to the synaptic
weights is proportional to the error signal multiplied by the value of the activation given by the
derivative of the transfer function. Using the derivative has the effect of making finely tuned
corrections when the activation is near its extreme values (minimum or maximum) and larger
corrections when the activation is in its middle range. Each correction has the immediate effect
of making the error signal smaller if a similar input is applied to the unit. In general, supervised
learning rules implement optimization algorithms akin to descent techniques because they search
for a set of values for the free parameters (i.e., the synaptic weights) of the system such that
some error function computed for the whole network is minimized.
The Hebbian rule is the most
widely known unsupervised learning rule. It is based on work by the Canadian neuropsychologist
Donald Hebb, who theorized that neuronal learning (i.e., synaptic change) is a local phenomenon
expressible in terms of the temporal correlation between the activation values of neurons.
Specifically, the change in a synaptic weight is a function of the temporal correlation between the
presynaptic and postsynaptic activities: the value of the synaptic weight between
two neurons increases whenever they are in the same state and decreases when they are in
different states.
Some important neural network architectures
One of the most popular architectures
in neural networks is the multi-layer perceptron (see Figure 2). Most of the networks with this
architecture use the Widrow-Hoff rule as their learning algorithm and the logistic function as the
transfer function of the units of the hidden layer (the transfer function is in general non-linear for
these neurons). These networks are very popular because they can approximate any multivariate
function relating the input to the output. In a statistical framework, these networks are akin to
multivariate non-linear regression. When the input patterns are the same as the output patterns,
these networks are called auto-associators. They are closely related to linear (if the hidden units
are linear) or non-linear (if not) principal component analysis and other statistical techniques
linked to the general linear model (see Abdi et al., 1996), such as discriminant analysis or
correspondence analysis.
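The two learning rules discussed above can be sketched for a single unit. The example below is an illustration under our own assumptions: the task (the logical AND of two inputs), the learning rates, the number of epochs, and the coding of neuron states as +1/−1 in the Hebbian update are all chosen for the example. The supervised update follows the description of the Widrow-Hoff rule given earlier: the correction is proportional to the error signal multiplied by the derivative of the transfer function.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_delta_rule(samples, lr=0.5, epochs=5000):
    """Widrow-Hoff (delta rule) training of a single logistic unit."""
    n = len(samples[0][0])
    weights = [0.0] * (n + 1)            # last weight acts on a constant input of 1 (bias)
    for _ in range(epochs):
        for x, target in samples:
            inputs = list(x) + [1.0]
            o = logistic(sum(w * v for w, v in zip(weights, inputs)))
            error = target - o           # error signal: desired minus actual output
            slope = o * (1.0 - o)        # derivative of the logistic function
            for j, v in enumerate(inputs):
                weights[j] += lr * error * slope * v
    return weights

def predict(weights, x):
    inputs = list(x) + [1.0]
    return logistic(sum(w * v for w, v in zip(weights, inputs)))

def hebbian_update(weight, pre, post, lr=0.1):
    """Hebbian rule for states coded +1/-1: grow when equal, shrink when different."""
    return weight + lr * pre * post

# Supervised example: learn the logical AND of two binary inputs.
AND = [((0, 0), 0.0), ((0, 1), 0.0), ((1, 0), 0.0), ((1, 1), 1.0)]
weights = train_delta_rule(AND)
outputs = [predict(weights, x) for x, _ in AND]
```

After training, the unit's response is above 0.5 only for the input (1, 1). The Hebbian update needs no target at all: it depends only on the states of the two connected neurons.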
A recent development generalizes the radial basis function (RBF) networks (see Abdi, Valentin, &
Edelman, 1999) and integrates them with statistical learning theory (see Vapnik, 1999) under the
name of support vector machine or SVM (see Schölkopf & Smola, 2003). In these networks, the
hidden units (called the support vectors) represent possible (or even real) input patterns and their
response is a function of their similarity to the input pattern under consideration. The similarity is
evaluated by a kernel function (e.g., dot product; in the radial basis function the kernel is the
Gaussian transformation of the Euclidean distance between the support vector and the input). In
the specific case of RBF networks, which we use as an example of SVM, the outputs of
the units of the hidden layer are connected to an output layer composed of linear units. In fact,
these networks work by breaking the difficult problem of a nonlinear approximation into two
simpler ones. The first step is a simple nonlinear mapping (the Gaussian transformation of
the distance from the kernel to the input pattern); the second step corresponds to a linear
transformation from the hidden layer to the output layer. Learning occurs at the level of the
output layer. The main difficulty with these architectures resides in the choice of the support
vectors and the specific kernels to use. These networks are used for pattern recognition,
classification, and for clustering data.
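The two-step structure of an RBF network (a fixed Gaussian mapping followed by a linear output layer, with learning only at the output) can be sketched as follows. The choices made here are assumptions for illustration: the XOR task, the use of the four training inputs themselves as the hidden-unit centers, the kernel width sigma = 0.5, and training the output weights with the Widrow-Hoff rule.

```python
import math

def gaussian_kernel(x, center, sigma=0.5):
    """Gaussian transformation of the Euclidean distance between input and center."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def hidden_layer(x, centers):
    """Step 1: nonlinear mapping; one hidden unit per stored pattern."""
    return [gaussian_kernel(x, c) for c in centers]

def train_output_weights(samples, centers, lr=0.5, epochs=200):
    """Step 2: learning happens only in the linear output unit (Widrow-Hoff rule)."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in samples:
            h = hidden_layer(x, centers)
            o = sum(w * v for w, v in zip(weights, h))
            for j, v in enumerate(h):
                weights[j] += lr * (target - o) * v
    return weights

def rbf_predict(x, centers, weights):
    return sum(w * v for w, v in zip(weights, hidden_layer(x, centers)))

# XOR is not linearly separable in the input space, but it is after the Gaussian mapping.
XOR = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]
centers = [x for x, _ in XOR]
weights = train_output_weights(XOR, centers)
predictions = [rbf_predict(x, centers, weights) for x, _ in XOR]
```

Using every training pattern as a center is the simplest possible choice and sidesteps the difficulty mentioned above; in practice the number, position, and width of the kernels have to be selected with more care.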
6.0 CONCLUSION
Face Recognition