
FACE RECOGNITION
A Seminar Report

Submitted By
JASMEET SINGH
0822231019
In partial fulfilment for the award of the degree

of

B. Tech

IN

Electronics and Communication Engineering

At

I. T. S. ENGINEERING COLLEGE, GREATER NOIDA


I. T. S. ENGINEERING COLLEGE, GREATER NOIDA

Certificate

This is to certify that this is a bona fide record of the seminar report on "Face Recognition" by Jasmeet Singh of B.Tech (Electronics and Communication Engineering) during the year 2010-2011, in partial fulfillment of the requirements for the award of the degree of B.Tech in Electronics and Communication Engineering by Gautam Buddha Technical University, Lucknow.

Date: Mr. R. K. Yadav

H.O.D. (E.C.E)

April, 2011
TABLE OF CONTENTS

CHAPTER NO.  TITLE
1.0 BIOMETRICS

1.1 WHY WE CHOOSE FACE RECOGNITION


2.0 FACE RECOGNITION
3.0 DIFFERENT TECHNIQUES FOR FACE
RECOGNITION
3.1 NEURAL NETWORK APPROACH
3.2 NEURAL NETWORK APPROACH WITH PCA
4.0 PRINCIPAL COMPONENT ANALYSIS
4.1 MATHEMATICS OF PCA
4.2 FEATURES OF PCA
4.3 BENEFITS OF PCA
5.0 NEURAL NETWORK
5.1 BUILDING BLOCK OF NN
5.2 LEARNING RULES
6.0 CONCLUSION
List of Figures

FIGURE NO.  NAME OF FIGURE
1  Comparison of various biometric features: (a) based on zephyr analysis; (b) based on MRTD compatibility
2  Face identification scenario
3  The basic neural unit processes the input information into the output information
4  Multi-layer perceptron
1.0 BIOMETRICS

Biometrics is the science and technology of measuring and analyzing biological data. In
information technology, biometrics refers to technologies that measure and analyze human body
characteristics, such as DNA, fingerprints, eye retinas and irises, voice patterns, facial patterns,
and hand measurements, for authentication purposes.

Biometrics is the automated technique of measuring a physical characteristic or personal
trait of an individual and comparing that characteristic or trait to a database for the purpose of
recognizing that individual.

Authentication by biometric verification is becoming increasingly common in corporate and
public security systems, consumer electronics, and point-of-sale (POS) applications. In addition
to security, the driving force behind biometric verification has been convenience.

A biometric is a unique, measurable characteristic of a human being that can be used to
automatically recognize an individual or verify an individual's identity. Biometrics can measure
both physiological and behavioral characteristics. Physiological biometrics (based on
measurements and data derived from direct measurement of a part of the human body) include:
a. Finger-scan
b. Facial Recognition
c. Iris-scan
d. Retina-scan
e. Hand-scan
Behavioural biometrics (based on measurements and data derived from an action) include:
a. Voice-scan
b. Signature-scan
c. Keystroke-scan
A “biometric system” refers to the integrated hardware and software used to conduct biometric
identification or verification.

1.1 Why choose face recognition over other biometrics?

There are a number of reasons to choose face recognition, including the following:
a. It requires no physical interaction on behalf of the user.
b. It is accurate and allows for high enrolment and verification rates.
c. It does not require an expert to interpret the comparison result.
d. It can use your existing hardware infrastructure; existing cameras and image capture
devices will work with no problems.
e. It is the only biometric that allows you to perform passive identification in a one-to-many
environment (e.g., identifying a terrorist in a busy airport terminal).

2.0 FACE RECOGNITION

Introduction

In recent years face recognition has received substantial attention from researchers in the
biometrics, pattern recognition, and computer vision communities. The machine learning and
computer graphics communities are also increasingly involved in face recognition. This common
interest among researchers working in diverse fields is motivated by our remarkable ability to
recognize people and by the fact that human activity is a primary concern both in everyday life
and in cyberspace. In addition, there is a large number of commercial, security, and forensic
applications requiring the use of face recognition technologies. These applications include
automated crowd surveillance, access control, mugshot identification (e.g., for issuing driver
licenses), face reconstruction, design of human computer interfaces (HCI), multimedia
communication (e.g., generation of synthetic faces), and content-based image database
management. A number of commercial face recognition systems have been deployed, such as
Cognitec, Eyematic, Viisage, and Identix. The facial scan is an effective biometric
attribute/indicator. Different biometric indicators are suited for different kinds of identification
applications due to their variations in intrusiveness, accuracy, cost, and ease of sensing (see
Fig. 1(a)). Among the six biometric indicators considered in [10], facial features scored the
highest compatibility, shown in Fig. 1(b), in a machine readable travel documents (MRTD)
system based on a number of evaluation factors [10].

Figure 1: Comparison of various biometric features: (a) based on zephyr analysis;
(b) based on MRTD compatibility.

Face recognition scenarios can be classified into two types: (i) face verification (or
authentication) and (ii) face identification (or recognition). In the Face Recognition Vendor
Test (FRVT) 2002, which was conducted by the National Institute of Standards and
Technology (NIST), another scenario was added, called the "watch list".

Face verification ("Am I who I say I am?") is a one-to-one match that compares a query
face image against a template face image whose identity is being claimed. To evaluate
verification performance, the verification rate (the rate at which legitimate users are
granted access) is plotted against the false accept rate (the rate at which imposters are
granted access); this is called the ROC curve. A good verification system should balance
these two rates based on operational needs.
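As a concrete illustration, both rates at a given decision threshold can be computed from sets of genuine (same person) and impostor (different person) similarity scores. The following is a minimal Python sketch; the score arrays are hypothetical stand-ins, not data from any real system.

import numpy as np

# Hypothetical similarity scores from matching experiments.
genuine = np.array([0.91, 0.84, 0.77, 0.95, 0.68])   # same-person comparisons
impostor = np.array([0.32, 0.45, 0.51, 0.28, 0.60])  # different-person comparisons

def roc_point(threshold):
    # Verification rate: fraction of legitimate users granted access.
    vr = np.mean(genuine >= threshold)
    # False accept rate: fraction of imposters granted access.
    far = np.mean(impostor >= threshold)
    return vr, far

# Sweeping the threshold over the score range traces out the ROC curve.
for t in np.linspace(0.0, 1.0, 11):
    vr, far = roc_point(t)
    print(f"threshold={t:.1f}  verification rate={vr:.2f}  false accept rate={far:.2f}")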

Face identification ("Who am I?") is a one-to-many matching process that compares a
query face image against all the template images in a face database to determine the
identity of the query face (see Fig. 2). The identification of the test image is done by
locating the image in the database that has the highest similarity with the test image. The
identification process is a "closed" test, which means the sensor takes an observation of
an individual that is known to be in the database. The test subject's (normalized) features
are compared to the other features in the system's database and a similarity score is
found for each comparison. These similarity scores are then ranked in descending order.
The percentage of times that the highest similarity score is the correct match for all
individuals is referred to as the "top match score." If any of the top r similarity scores
corresponds to the test subject, it is considered a correct match in terms of the cumulative
match. The percentage of times one of those similarity scores is the correct match for all
individuals is referred to as the "cumulative match score." The cumulative match score
curve plots rank n versus the percentage of correct identification, where rank n is the
number of top similarity scores reported.
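The rank bookkeeping can be made concrete with a small sketch. The similarity matrix below is hypothetical; in practice it would come from matching each test image against the database.

import numpy as np

# similarity[i, j]: similarity of test subject i to database identity j.
# Hypothetical values; the correct identity for row i is assumed to be column i.
similarity = np.array([
    [0.9, 0.4, 0.3],
    [0.5, 0.7, 0.6],
    [0.2, 0.8, 0.6],
])
num_subjects = similarity.shape[0]

# For each test subject, rank database identities by descending similarity.
ranked = np.argsort(-similarity, axis=1)

# Rank of the correct identity for each subject (1 = top match).
correct_rank = np.array(
    [np.where(ranked[i] == i)[0][0] + 1 for i in range(num_subjects)]
)

# Cumulative match score at rank r: fraction of subjects whose correct
# identity appears among the top r similarity scores.
for r in range(1, num_subjects + 1):
    cms = np.mean(correct_rank <= r)
    print(f"rank {r}: cumulative match score = {cms:.2f}")
# The rank-1 value is the "top match score".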

Figure 2: Face identification scenario

The watch list ("Are you looking for me?") method is an open-universe test. The test
individual may or may not be in the system database. That person is compared to the
others in the system's database and a similarity score is reported for each comparison.
These similarity scores are then ranked so that the highest similarity score is first. If a
similarity score is higher than a preset threshold, an alarm is raised, and the system
concludes that the individual is located in the system's database. There are two main
items of interest for watch list applications. The first is the percentage of times the system
raises the alarm and correctly identifies a person on the watch list; this is called the
"detection and identification rate." The second is the percentage of times the system
raises the alarm for an individual that is not on the watch list (database); this is called the
"false alarm rate." In this report, all the experiments are conducted in the identification
scenario.

Human face image appearance has potentially very large intra-subject variations due to:

1. 3D head pose.
2. Illumination (including indoor / outdoor)
3. Facial expression
4. Occlusion due to other objects or accessories (e.g., sunglasses, scarf, etc.)
5. Facial hair
6. Aging

A number of face recognition algorithms, along with their modifications, have been developed
during the past several decades.

Figure: Different face recognition methods.

3.0 DIFFERENT FACE RECOGNITION TECHNIQUES


There are many approaches to the task of face recognition. Some of the approaches,
called analytic approaches, involve the detection of some conspicuous features (called salient
facial features) from the face, and comparison of the features extracted. Other approaches make
use of the information derived from the whole face pattern; these are called holistic approaches.
There are still other approaches that are in-between analytic and holistic approaches, or even a
blend of the two. Attempts have been made to categorize the face recognition techniques found
in the literature. Heisele et al. divided face recognition techniques into two categories, namely
the global approach and the component-based approach.

(i) In the global approach, a single feature vector that represents the whole face image is
used as input to a classifier. Global techniques work well for classifying frontal views of faces.
However, they are not robust against pose changes, because global features are highly
sensitive to translation and rotation of the face.

(ii) The main idea of component-based recognition is to compensate for pose changes by
allowing a flexible geometric relation between the components in the classification.

O. Ayinde et al. categorized the face recognition techniques into five categories:
neural networks, principal component analysis (PCA), template-based techniques, graph-based
techniques, and other approaches. We find it very difficult to categorize the different face
recognition techniques; however, the various techniques found in the literature can be grouped
into six categories: geometric or feature-based face recognition, analytical-to-holistic face
recognition, elastic bunch graph matching (EBGM), neural network approaches to face
recognition, neural network approaches with principal component analysis (PCA), and other
approaches. We discuss the previous work performed in each category in the following sections.

3.1 Neural Network Approaches to Face Recognition


The Multi-layer Perceptron (MLP) neural network is a good tool for classification
purposes. The network weights are adjusted by a supervised training procedure called
backpropagation. During the training procedure the MLP builds separating hypersurfaces in the
input space. After training, the MLP can successfully apply the acquired skills to previously
unseen samples; it has good extrapolative and interpolative abilities. In one reported
configuration there is one hidden layer with 20 to 30 units. The number of units in the input
layer is equal to the number of image pixels, 2576 (i.e., 46x56). The number of output units is
equal to the number of classes, i.e., 40, the number of persons in the ORL database. Each
output unit has its corresponding "own" class. A hyperbolic tangent is used as the activation
function. A multilayer perceptron having one hidden layer with the number of hidden units
varying from 60 to 80 achieved a recognition rate from 94% to 97%. The input of this neural
network is a set of discrete cosine transform coefficients; the first 30 coefficients out of
10304 are used.
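A minimal sketch of such a classifier follows, using scikit-learn's MLPClassifier as a stand-in (an assumption; the works cited predate this library) and random arrays in place of the ORL images. The layer size and tanh activation follow the configuration described above.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in data: 40 classes (ORL identities), 46x56 = 2576 pixels per image.
# Random vectors here; real use would load the flattened ORL face images.
rng = np.random.default_rng(0)
X = rng.random((400, 2576))          # 10 images per person
y = np.repeat(np.arange(40), 10)     # class label = person identity

# One hidden layer with 20-30 units and a tanh activation, as in the text.
mlp = MLPClassifier(hidden_layer_sizes=(25,), activation="tanh",
                    solver="sgd", max_iter=200)
mlp.fit(X, y)                        # backpropagation training
print(mlp.predict(X[:5]))            # classify a few (seen) samples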

A convolutional neural network for face recognition has a sophisticated architecture for
image recognition; the input of such a network is the whole image. One such system combines
local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural
network. The SOM provides a quantization of the image samples into a topological space where
inputs that are nearby in the original space are also nearby in the output space, thereby providing
dimensionality reduction and invariance to minor changes in the image sample, while the
convolutional network provides partial invariance to translation, rotation, scale, and deformation.
The convolutional network extracts successively larger features in a hierarchical set of layers.
The reported recognition rate is from 96% to 98.5%.

Autoassociative neural networks for face recognition are special kinds of neural
networks that are used to simulate associative processes in which the input patterns are
associated with themselves. During training, patterns are presented to the network and the
weights are gradually adjusted so that the final pattern of connectivity matches all the patterns
being presented. One complete presentation of all the patterns with which the network is trained
is called one epoch; usually a network requires many such epochs to perform satisfactorily. The
weights can therefore be seen as a distributed representation of the data. A face image can be
modeled using an autoassociative memory. In this model, each face image is coded as a vector,
x_k, whose elements give the gray levels of the corresponding pixels, and these are the inputs
of the system. Each element of the face vector x_k is used as input to a cell of the
autoassociative memory, so the number of cells is equal to the number of pixels in the face
vector. The output of a given cell for a given face is simply the sum of its inputs (the elements
of the face vector x_k) weighted by the connection strengths between itself and all of the other
cells.
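A minimal sketch of such a linear autoassociative memory follows, with a simple Hebbian-style outer-product prescription for the weights (one common choice; the text does not fix a particular learning rule) and random vectors standing in for face images.

import numpy as np

# Hypothetical stand-in: each face vector x_k holds pixel gray levels.
rng = np.random.default_rng(1)
faces = [rng.random(64) for _ in range(5)]   # five tiny 8x8 "faces"

# One cell per pixel; the weight matrix accumulates outer products so
# each pattern is associated with itself.
n = faces[0].size
W = np.zeros((n, n))
for x in faces:
    xc = x - x.mean()                # center the pattern
    W += np.outer(xc, xc)            # strengthen co-active connections
np.fill_diagonal(W, 0.0)             # a cell has no connection to itself

# Recall: each cell outputs the sum of its inputs weighted by the
# connection strengths, as described in the text.
probe = faces[0] - faces[0].mean()
output = W @ probe
# The stored pattern most similar to the recalled output wins.
sims = [float(np.dot(output, f - f.mean())) for f in faces]
print("best match:", int(np.argmax(sims)))   # expected: 0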

3.2 Neural Network Approach with Principal Component Analysis

Face recognition using eigenfaces, also called principal component analysis (PCA), is
a fairly popular approach to face recognition. It encodes the most relevant information in a group
of faces, which best distinguishes them from one another. The approach transforms face images
into a small set of characteristic feature images called "eigenfaces", which are the principal
components of the initial training set of face images. Recognition is performed by projecting a
new face image into the subspace spanned by the eigenfaces (face space) and then classifying
the face by comparing its position in face space with the positions of known individuals. Each
face image in the training set can be represented exactly as a linear combination of the
eigenfaces with associated weights. A new face is recognized by comparing the feature weights
needed to approximately reconstruct it with the weights associated with the known individuals.

Reconstruction and recognition of face images can also be performed using a
recirculation neural network (RNN). A recirculation neural network is based on the multi-layer
perceptron and has one hidden layer. The number of units in the input and output layers is n,
equal to the number of pixels in the input image. The number of units in the hidden layer is
m, where m << n and m is equal to the predefined number of principal components. During
training the network learns to compress and reconstruct input images through the small number
of hidden units, so the output of the hidden units is a compressed representation of the image.
Recognition is performed by calculating the Euclidean distance between points whose
coordinates are the outputs of the hidden units: the set of distances from the unknown image to
all images from the training set is calculated, and the image with minimal distance to the
unknown image is considered recognized. The average recognition rate is 92%. The advantage
of the RNN is that the training time depends linearly on the number of required principal
components, not on the number of pixels in the face images.
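The recirculation network described above is, in essence, a bottleneck autoencoder. The sketch below is an interpretation under that assumption, not the original implementation: it trains a linear encoder-decoder pair by gradient descent and recognizes an unknown image by the Euclidean distance between hidden-unit codes.

import numpy as np

# n input/output units (pixels), m << n hidden units (principal components).
rng = np.random.default_rng(2)
n, m = 100, 8
train = rng.random((50, n))                  # stand-in training "face images"
train -= train.mean(axis=0)

W1 = rng.normal(scale=0.01, size=(n, m))     # encoder: compress to m units
W2 = rng.normal(scale=0.01, size=(m, n))     # decoder: reconstruct the image
lr = 0.01
for _ in range(500):                         # train to reconstruct the inputs
    h = train @ W1                           # hidden outputs = compressed code
    out = h @ W2
    err = out - train                        # reconstruction error
    W2 -= lr * h.T @ err / len(train)
    W1 -= lr * train.T @ (err @ W2.T) / len(train)

# Recognition: compare hidden-unit codes by Euclidean distance; the
# training image with minimal distance to the unknown one is "recognized".
unknown = train[3] + rng.normal(scale=0.01, size=n)
code = unknown @ W1
dists = np.linalg.norm(train @ W1 - code, axis=1)
print("recognized as training image", int(np.argmin(dists)))  # expected: 3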

PCA finds a set of basis images and represents faces as a linear combination of
those images. The basis images found by PCA depend only on pairwise relationships between
pixels in the image database. A generalization of PCA, independent component analysis (ICA),
finds better basis images that are sensitive to higher-order relationships among pixels. The face
images employed in this work were a subset of the FERET database. The data set contained
images of 425 individuals, cropped to 60 x 50 pixels. The goal of this approach was to find a set
of statistically independent basis images. A data matrix X was organized so that the images are
in the rows and the pixels are in the columns, i.e., X has 425 rows and 3000 columns, and each
image has zero mean. ICA is performed on face images in the FERET database under two
different architectures: one which treats the images as random variables and the pixels as
outcomes, and a second which treats the pixels as random variables and the images as outcomes.
The first architecture finds spatially local basis images for the faces: under this architecture, ICA
found a basis set of statistically independent images, which were sparse and localized in space,
resembling spatial features. Under the second architecture, the image coefficients were
approximately independent, resulting in a factorial face code. Both ICA representations are
superior to representations based on PCA for recognizing faces across days and changes in
expression, and a classifier that combines the two ICA representations gives the best
performance.

A novel neural network architecture has been proposed that can recognize human faces
at any view within a defined viewing-angle range (from 30 degrees left to 30 degrees right of
out-of-plane rotation). View-specific eigenface analysis is used as the front end of the system to
extract features, and a neural network ensemble is used for recognition. The approach extends
the eigenface approach: one eigenface set is built for each view, and the feature coefficients of
each image are then extracted in the corresponding eigenspace. An ensemble neural network
consisting of two layers is used as the classifier to perform pose-invariant face recognition. The
first layer contains four view-specific neural networks, each of which is a conventional
feed-forward network trained with the backpropagation algorithm on the training data of a
specific view. Each network accepts the 20-dimensional eigenface coefficients as the input
vector, has 15 hidden units, and has 6 output units. The training data for each neural network
contains 300 vectors, all of which are eigenface coefficients calculated for the corresponding
view. Among the 300 images, each of the 5 persons to be recognized contributes 40 images, and
each of the 5 persons to be rejected contributes 20 images. The second layer is a combinational
neural network trained on the output results of all the networks in the first layer. The output of
the second-layer network can tell not only the identity of the input image but also the pose of
the image, which is a bonus for face recognition. The system achieves an average recognition
ratio as high as 98.75%.

4.0 PRINCIPAL COMPONENT ANALYSIS


Introduction

Principal Component Analysis (PCA) is one of the most successful techniques used in
image recognition and compression. PCA is a statistical method under the broad title of factor
analysis. The purpose of PCA is to reduce the large dimensionality of the data space (observed
variables) to the smaller intrinsic dimensionality of the feature space (independent variables)
needed to describe the data economically. This is the case when there is a strong correlation
between observed variables. PCA can be used for prediction, redundancy removal, feature
extraction, data compression, and so on. Because PCA is a classical technique for the linear
domain, applications with linear models are suitable, such as signal processing, image
processing, system and control theory, and communications. Face recognition has many
applicable areas; moreover, it can be categorized into face identification, face classification, or
sex determination. The most useful applications include crowd surveillance, video content
indexing, personal identification (e.g., driver's licenses), mug shot matching, entrance security,
etc. The main idea of using PCA for face recognition is to express the large 1-D vector of pixels
constructed from a 2-D facial image in terms of the compact principal components of the
feature space. This is called eigenspace projection. The eigenspace is calculated by identifying
the eigenvectors of the covariance matrix derived from a set of facial images (vectors). The
mathematical formulation of PCA is described below.
4.1 Mathematics of PCA

A 2-D facial image can be represented as a 1-D vector by concatenating each row (or column)
into a long thin vector. Suppose we have $M$ vectors of size $N$ (= rows of image × columns of
image) representing a set of sampled images, $x_i = [p_1, p_2, \ldots, p_N]^T$, where the $p_j$
represent the pixel values.

The images are mean centered by subtracting the mean image from each image vector. Let $m$
represent the mean image:

$$m = \frac{1}{M}\sum_{i=1}^{M} x_i$$

and let $w_i$ be defined as the mean centered image:

$$w_i = x_i - m$$

Our goal is to find a set of $e_i$'s which have the largest possible projection onto each of the
$w_i$'s. We wish to find a set of $M$ orthonormal vectors $e_i$ for which the quantity

$$\lambda_i = \frac{1}{M}\sum_{n=1}^{M} \left(e_i^T w_n\right)^2$$

is maximized subject to the orthonormality constraint

$$e_l^T e_k = \delta_{lk}$$

It has been shown that the $e_i$'s and $\lambda_i$'s are given by the eigenvectors and
eigenvalues of the covariance matrix

$$C = W W^T \qquad (6)$$

where $W$ is the matrix composed of the column vectors $w_i$ placed side by side. The size of
$C$ is $N \times N$, which could be enormous. For example, images of size 64 × 64 create a
covariance matrix of size 4096 × 4096. It is not practical to solve for the eigenvectors of $C$
directly. A common theorem in linear algebra states that the vectors $e_i$ and scalars
$\lambda_i$ can be obtained by solving for the eigenvectors and eigenvalues of the $M \times M$
matrix $W^T W$. Let $d_i$ and $\mu_i$ be the eigenvectors and eigenvalues of $W^T W$,
respectively:

$$W^T W d_i = \mu_i d_i \qquad (7)$$

Multiplying both sides on the left by $W$,

$$W W^T (W d_i) = \mu_i (W d_i)$$

which means that the first $M - 1$ eigenvectors $e_i$ and eigenvalues $\lambda_i$ of $W W^T$
are given by $W d_i$ and $\mu_i$, respectively. $W d_i$ needs to be normalized in order to be
equal to $e_i$. Since we only sum up a finite number of image vectors, $M$, the rank of the
covariance matrix cannot exceed $M - 1$ (the $-1$ comes from the subtraction of the mean
vector $m$).

The eigenvectors corresponding to nonzero eigenvalues of the covariance matrix produce an
orthonormal basis for the subspace within which most image data can be represented with a
small amount of error. The eigenvectors are sorted from high to low according to their
corresponding eigenvalues. The eigenvector associated with the largest eigenvalue is the one
that reflects the greatest variance in the image; the smallest eigenvalue is associated with the
eigenvector that finds the least variance. The eigenvalues decrease in exponential fashion,
meaning that roughly 90% of the total variance is contained in the first 5% to 10% of the
dimensions. A facial image can be projected onto $M'$ ($\ll M$) dimensions by computing

$$y = [v_1, v_2, \ldots, v_{M'}]^T$$

where $v_i = e_i^T w$ is the $i$-th coordinate of the mean-centered facial image $w$ in the new
space, i.e., the $i$-th principal component. The vectors $e_i$ are also images, the so-called
eigenimages, or eigenfaces in our case, as first named in [1]. They can be viewed as images and
indeed look like faces. Thus $y$ describes the contribution of each eigenface in representing the
facial image, treating the eigenfaces as a basis set for facial images. The simplest method for
determining which face class provides the best description of an input facial image is to find the
face class $k$ that minimizes the Euclidean distance

$$\epsilon_k = \left\| y - y_k \right\|$$

where $y_k$ is a vector describing the $k$th face class. If $\epsilon_k$ is less than some
predefined threshold $\theta_\epsilon$, the face is classified as belonging to class $k$.
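The procedure above translates almost line for line into NumPy. The following sketch uses random stand-in data in place of real face images and applies the $W^T W$ trick of Eqs. (6)-(7).

import numpy as np

# Stand-in training set: M images of N pixels each (random data here).
rng = np.random.default_rng(4)
M, N = 20, 4096                       # e.g., 64x64 images
images = rng.random((M, N))

m = images.mean(axis=0)               # mean image
W = (images - m).T                    # N x M matrix of mean-centered columns w_i

# Solve the small M x M eigenproblem of W^T W instead of the N x N one (Eq. 7).
mu, d = np.linalg.eigh(W.T @ W)       # eigenvalues in ascending order
order = np.argsort(mu)[::-1]          # sort high to low
mu, d = mu[order], d[:, order]

E = (W @ d)[:, :M - 1]                # W d_i; rank of C cannot exceed M - 1
E /= np.linalg.norm(E, axis=0)        # normalize to obtain the eigenfaces e_i

# Project a face onto M' dimensions: v_i = e_i^T w.
M_prime = 10
def project(x):
    return E[:, :M_prime].T @ (x - m)

# Classify by minimal Euclidean distance to the known class projections.
gallery = np.array([project(img) for img in images])
probe = project(images[7] + 0.01 * rng.random(N))
k = int(np.argmin(np.linalg.norm(gallery - probe, axis=1)))
print("closest face class:", k)       # expected: 7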
4.2 PCA Features

1. PCA computes means, variances, covariances, and correlations of large data sets.
2. PCA computes and ranks principal components and their variances.
3. It automatically transforms data sets into the new coordinate system.
4. PCA can analyze datasets of up to 50,000 rows and 200 columns.

4.3 Benefits of PCA

The basic benefit of PCA is to reduce the dimension of the data.

1. There is no data redundancy, as the components are orthogonal.
2. With the help of PCA, the complexity of grouping images can be reduced.
3. PCA is beneficial in the prominent field of criminal investigation.
4. PCA also benefits entrance control in buildings, access control for computers in general
and automatic teller machines in particular, day-to-day affairs like withdrawing money
from a bank account, dealing with the post office, passport verification, and identifying
faces in a given database.

5.0 NEURAL NETWORK

Neural networks are adaptive statistical models based on an analogy with the structure of the
brain. They are adaptive because they can learn to estimate the parameters of some population
using a small number of exemplars (one or a few) at a time. They do not differ essentially from
standard statistical models; for example, one can find neural network architectures akin to
discriminant analysis, principal component analysis, logistic regression, and other techniques. In
fact, the same mathematical tools can be used to analyze standard statistical models and neural
networks. Neural networks are used as statistical tools in a variety of fields, including
psychology, statistics, engineering, econometrics, and even physics. They are also used as
models of cognitive processes by neuro- and cognitive scientists.

Basically, neural networks are built from simple units, sometimes called neurons or cells by
analogy with the real thing. These units are linked by a set of weighted connections, and
learning is usually accomplished by modification of the connection weights. Each unit codes or
corresponds to a feature or a characteristic of a pattern that we want to analyze or use as a
predictor. These networks usually organize their units into several layers: the first layer is called
the input layer, the last one the output layer, and the intermediate layers (if any) are called the
hidden layers. The information to be analyzed is fed to the neurons of the first layer and then
propagated to the neurons of the second layer for further processing; the result of this processing
is then propagated to the next layer, and so on until the last layer. Each unit receives some
information from other units (or from the external world through some device) and processes
this information into the output of the unit.

The goal of the network is to learn or to discover some association between input and output
patterns, or to analyze, or to find, the structure of the input patterns. The learning process is
achieved through the modification of the connection weights between units. In statistical terms,
this is equivalent to interpreting the values of the connections between units as parameters (e.g.,
like the values of $a$ and $b$ in the regression equation $\hat{y} = a + bx$) to be estimated.
The learning process specifies the "algorithm" used to estimate the parameters.

5.1 The building blocks of neural networks

Neural networks are made of basic units (see Figure 3) arranged in layers. A unit collects
information provided by other units (or by the external world) to which it is connected with
weighted connections called synapses. These weights, called synaptic weights, multiply (i.e.,
amplify or attenuate) the input information: a positive weight is considered excitatory, a
negative weight inhibitory.

Figure 3: The basic neural unit processes the input information into the output information.

Each of these units is a simplified model of a neuron and transforms its input information into an
output response. This transformation involves two steps: first, the activation of the neuron is
computed as the weighted sum of its inputs, and second, this activation is transformed into a
response by using a transfer function. Formally, if each input is denoted $x_i$, and each weight
$w_i$, then the activation is equal to

$$a = \sum_i x_i w_i$$

and the output, denoted $o$, is obtained as $o = f(a)$. Any function whose domain is the real
numbers can be used as a transfer function. The most popular ones are the linear function ($o$
proportional to $a$), the step function (activation values less than a given threshold are set to 0
or to −1 and the other values are set to +1), the logistic function

$$f(x) = \frac{1}{1 + e^{-x}}$$

which maps the real numbers into the interval (0, 1) and whose derivative, needed for learning,
is easily computed, $f'(x) = f(x)\,[1 - f(x)]$, and the normal or Gaussian function

$$f(x) = \exp\!\left(-\frac{x^2}{2}\right)$$

Some of these functions can include probabilistic variations; for example, a neuron can
transform its activation into the response +1 with a probability of 1/2 when the activation is
larger than a given threshold.

The architecture (i.e., the pattern of connectivity) of the network, along with the transfer
functions used by the neurons and the synaptic weights, completely specifies the behavior of the
network.
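In code, the two-step transformation of a single unit is one line per step. The inputs, weights, and transfer functions below are illustrative, a minimal sketch rather than any particular network.

import numpy as np

def unit_output(x, w, f):
    # Step 1: activation = weighted sum of the inputs.
    a = np.dot(x, w)
    # Step 2: the transfer function turns the activation into a response.
    return f(a)

logistic = lambda a: 1.0 / (1.0 + np.exp(-a))
step = lambda a: 1.0 if a >= 0.0 else -1.0      # threshold at 0

x = np.array([0.5, -1.0, 2.0])                  # input information
w = np.array([0.8, 0.3, -0.2])                  # synaptic weights (+ excitatory, - inhibitory)
print(unit_output(x, w, logistic), unit_output(x, w, step))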

5.2 Learning rules

Neural networks are adaptive statistical devices, meaning that they can iteratively change the
values of their parameters (i.e., the synaptic weights) as a function of their performance. These
changes are made according to learning rules, which can be characterized as supervised (when a
desired output is known and used to compute an error signal) or unsupervised (when no such
error signal is used).

The Widrow-Hoff rule (a.k.a. gradient descent or the delta rule) is the most widely known
supervised learning rule. It uses the difference between the actual output of the cell and the
desired output as an error signal for units in the output layer. Units in the hidden layers cannot
compute their error signal directly but estimate it as a function (e.g., a weighted average) of the
error of the units in the following layer; this adaptation of the Widrow-Hoff learning rule is
known as error backpropagation. With Widrow-Hoff learning, the correction to the synaptic
weights is proportional to the error signal multiplied by the derivative of the transfer function
evaluated at the current activation. Using the derivative has the effect of making finely tuned
corrections when the activation is near its extreme values (minimum or maximum) and larger
corrections when the activation is in its middle range. Each correction has the immediate effect
of making the error signal smaller if a similar input is applied to the unit. In general, supervised
learning rules implement optimization algorithms akin to descent techniques, because they
search for a set of values of the free parameters (i.e., the synaptic weights) of the system such
that some error function computed for the whole network is minimized. A minimal sketch of the
rule is given below.
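The sketch trains a single logistic output unit on hypothetical data; a bias input is added (an assumption) so the unit can place its threshold away from the origin.

import numpy as np

logistic = lambda a: 1.0 / (1.0 + np.exp(-a))

# Widrow-Hoff (delta rule) for one logistic output unit.
rng = np.random.default_rng(5)
X = np.hstack([rng.random((100, 4)), np.ones((100, 1))])  # inputs + bias unit
t = (X[:, :4].sum(axis=1) > 2.0).astype(float)            # desired outputs
w = np.zeros(5)
lr = 0.5
for _ in range(200):
    for x, target in zip(X, t):
        o = logistic(np.dot(x, w))       # actual output of the cell
        error = target - o               # supervised error signal
        # Correction = error signal times the derivative f'(a) = o(1 - o),
        # so corrections are finely tuned near the extremes of the activation.
        w += lr * error * o * (1 - o) * x

o = logistic(X @ w)
print("training accuracy:", np.mean((o > 0.5) == t))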
The Hebbian rule is the most widely known unsupervised learning rule. It is based on work by
the Canadian neuropsychologist Donald Hebb, who theorized that neuronal learning (i.e.,
synaptic change) is a local phenomenon expressible in terms of the temporal correlation between
the activation values of neurons. Specifically, the synaptic change depends on both presynaptic
and postsynaptic activities: the change in a synaptic weight is a function of the temporal
correlation between the presynaptic and postsynaptic activities. The value of the synaptic weight
between two neurons increases whenever they are in the same state and decreases when they are
in different states.
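In code, the Hebbian prescription reduces to accumulating outer products of unit states; a minimal sketch with hypothetical ±1 patterns follows.

import numpy as np

# Hebbian rule: the change in a synaptic weight follows the correlation
# of presynaptic and postsynaptic activity.
patterns = np.array([[1, -1, 1, -1],
                     [1,  1, -1, -1]], dtype=float)   # unit states in {+1, -1}
lr = 0.1
W = np.zeros((4, 4))
for p in patterns:
    # The weight grows when two units are in the same state, shrinks otherwise.
    W += lr * np.outer(p, p)
np.fill_diagonal(W, 0.0)
print(W)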
Some important neural network architectures

One of the most popular architectures in neural networks is the multi-layer perceptron (see
Figure 4). Most networks with this architecture use the Widrow-Hoff rule as their learning
algorithm and the logistic function as the transfer function of the units of the hidden layer (the
transfer function is in general non-linear for these neurons). These networks are very popular
because they can approximate any multivariate function relating the input to the output; in a
statistical framework, they are akin to multivariate non-linear regression. When the input
patterns are the same as the output patterns, these networks are called auto-associators. They are
closely related to linear (if the hidden units are linear) or non-linear (if not) principal component
analysis and other statistical techniques linked to the general linear model (see Abdi et al.,
1996), such as discriminant analysis or correspondence analysis.

A recent development generalizes radial basis function (RBF) networks (see Abdi, Valentin, &
Edelman, 1999) and integrates them with statistical learning theory (see Vapnik, 1999) under the
name of support vector machines, or SVMs (see Schölkopf & Smola, 2003). In these networks,
the hidden units (called the support vectors) represent possible (or even real) input patterns, and
their response is a function of their similarity to the input pattern under consideration. The
similarity is evaluated by a kernel function (e.g., the dot product; in the radial basis function the
kernel is the Gaussian transformation of the Euclidean distance between the support vector and
the input). In the specific case of RBF networks, which we will use as an example of SVMs, the
outputs of the units of the hidden layer are connected to an output layer composed of linear
units. In fact, these networks work by breaking the difficult problem of nonlinear approximation
into two simpler ones: the first step is a simple nonlinear mapping (the Gaussian transformation
of the distance from the kernel to the input pattern), and the second step corresponds to a linear
transformation from the hidden layer to the output layer. Learning occurs at the level of the
output layer. The main difficulty with these architectures resides in the choice of the support
vectors and of the specific kernels to use. These networks are used for pattern recognition,
classification, and clustering data.
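A minimal RBF-network sketch follows, on hypothetical data: the Gaussian hidden layer is the fixed nonlinear first step, and learning occurs only in the linear output layer, here solved directly by least squares. Sampling the support vectors from the training patterns is one common heuristic, an assumption rather than a prescription from the text.

import numpy as np

rng = np.random.default_rng(6)
X = rng.random((60, 2))                        # hypothetical input patterns
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)    # two-class labels

centers = X[rng.choice(len(X), 10, replace=False)]  # chosen support vectors
width = 0.5

def hidden(X):
    # Kernel: Gaussian transformation of the Euclidean distance between
    # each support vector and the input (the nonlinear first step).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2 * width ** 2))

H = hidden(X)
w, *_ = np.linalg.lstsq(H, y, rcond=None)      # linear second step
pred = (hidden(X) @ w > 0.5).astype(float)
print("training accuracy:", np.mean(pred == y))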

6.0 CONCLUSION

Face recognition is an attractive biometric because it is accurate, non-intrusive, and able to work
with existing camera infrastructure. This report surveyed the main face recognition scenarios
and techniques, with particular attention to principal component analysis (eigenfaces) and neural
network approaches; the two can be combined, with PCA providing compact face
representations and neural networks providing flexible classifiers that achieve high reported
recognition rates.
