

LEARNING IN A SINGLE PASS: A NEURAL MODEL FOR PRINCIPAL COMPONENTS ANALYSIS AND LINEAR REGRESSION


D. Rosenblatt (*), A. Lelu (**), A. Georgel (**)

(*) Concept Technologies, FRANCE
(**) INIST-CNRS, FRANCE

ABSTRACT

We describe a neural data analyser. We first prove that the factors of principal
components analysis (PCA) - i.e. the eigenvectors of the data covariance matrix -
can be computed according to a recursive algorithm. We then derive a neural
network extracting the factors of a vector data flow. This is achieved by a
learning law involving Hebbian reinforcement and lateral interaction between
neurons. A realistic implementation is then discussed. We then show how the
former model can be used to implement learning gain control on a neural network
achieving linear regression analysis. An important feature of our model is that
it obtains the exact solutions to the problems of PCA and regression in a single
learning pass over the data patterns.
INTRODUCTION
Data analysis is attracting increasing interest in neural network research, on
account of possible applications in many fields, such as socio-economic
analysis, prospective analysis [Turner (1)] and document querying systems
[Lelu (2)]. But the domains involved often require that problems like
intermediate storage space, execution time and incremental learning be solved
beforehand. For instance, a documentary querying system may store millions of
documents with thousands of descriptors (keywords), and therefore it is
imperative that the learning of a new data pattern does not imply a
recalculation involving all the data already stored.
Well-tried classical algorithms rapidly find their limits with respect to the
previous problems. Even stochastic iterative algorithms used in advanced data
analysis, such as Jacobi's, Lanczos's or Arnoldi's [Lascaux and Théodor (3)],
are not actually suited for large amounts of data because they require heavy
matrix calculations. Furthermore, they do not solve the problem of incremental
learning. In the factorial analysis field, the stochastic approximation
algorithm described by Benzécri (4) implies a smaller amount of calculation and
storage, but needs several passes over the data before a "good" convergence is
reached, leading to an unsatisfactory solution to the problem of incremental
learning.
Neural networks appear to be a natural framework for solving these problems.
Nevertheless, a major effort is required in formalizing learning algorithms. In
the PCA domain, Oja (5) describes a stochastic learning neuron extracting the
first principal component from an input vector sequence. In (6), Lelu and
Rosenblatt derive a single-pass learning algorithm extracting an approximate
value of the first principal component.

Gallinari and Fogelman-Soulié (7) suggest an algorithm extracting a new factor
at every complete pass over the data. Nevertheless, none of these algorithms
takes into account all the desired criteria.
In the regression analysis domain, Kohonen (8) describes an algorithm, derived
from Greville's algorithm for pseudo-inverse calculation, extracting the
regression axis between two sets of vector patterns. This algorithm is
theoretically an exact solution to the regression problem, but leads to
practical computation instabilities, as noted by Kohonen.
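For reference, the kind of stochastic rule referred to above fits in a few lines. The following is a minimal sketch of Oja's single-neuron rule (5) - not of the algorithm derived in this paper - with an arbitrary decreasing learning rate chosen purely for illustration.

    import numpy as np

    def oja_first_component(patterns, eta0=0.1):
        """Stochastic estimate of the first principal component (Oja's rule)."""
        patterns = np.asarray(patterns, dtype=float)
        rng = np.random.default_rng(0)
        w = rng.normal(size=patterns.shape[1])
        w /= np.linalg.norm(w)
        for k, x in enumerate(patterns, start=1):
            eta = eta0 / k                 # illustrative step-size schedule
            u = w @ x                      # neuron output
            w += eta * u * (x - u * w)     # Hebbian term with implicit normalization
        return w / np.linalg.norm(w)

Such a rule is iterative and approximate - it typically needs many presentations of the data - which is precisely what the single-pass algorithm derived below avoids.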

In this paper we derive a recursive algorithm achieving a complete PCA, using
the mathematical properties of the covariance matrix of the learned data
patterns and of its eigenvectors and eigenvalues. These eigenvalues and
eigenvectors can be extracted in a single pass over the data, and their
properties, such as orthogonality, normalization, etc., are rigorously and
constantly preserved, provided some realistic conditions on the data patterns
are assumed. We then interpret this algorithm as the learning law of a linear
response neural network, where each neuron represents one of the eigenvectors.
Initialization, stability and convergence of the network are then discussed and
solutions are provided for limiting cases. Great emphasis is put upon
constructing a realistic version of the algorithm for real world
implementations. Experimental results are then discussed.
After that, we give a brief description of Kohonen's regression algorithm, and
then derive a neural model achieving PCA and regression simultaneously. This
model is formed of two networks, the first one achieving PCA, the second one
using the extracted factors to perform gain control. Previous results on
stability and convergence of PCA can then be successfully applied to this new
model.
A NEW ALGORITHM FOR PCA LEARNING
Principal Components Analysis is based on the extraction of factors - i.e. the
eigenvectors of the covariance matrix of the data patterns.
In the case of continuous data varying with time t, all of which are equally
weighted, the covariance matrix can be written as a time average over the data
patterns observed up to t (1).
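A concrete form, consistent with the discrete algorithm given later and assumed in the illustrative derivations and sketches that follow, is the running mean of the outer products (whether the patterns are centred beforehand is a preprocessing choice):

    C_xx(t) = (1/t) ∫_0^t x(τ) x(τ)^T dτ ,   so that   dC_xx/dt = (x x^T − C_xx) / t .        (cf. (1))

An unnormalized integral would serve as well and would only rescale the eigenvalues λ_i.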

The hypothesis of equally weighted data, and even the use of other metrics, can
be achieved by a transformation of the data patterns. For example, we should use
the χ² metric to perform a Correspondence Analysis of the data.
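As a reminder of the standard construction (and not a prescription specific to this paper): for a contingency table with relative frequencies f_ij, row margins f_i. and column margins f_.j, analysing the row profiles with the χ² metric amounts, up to the row weights f_i., to an ordinary PCA of the transformed patterns

    x̃_ij = f_ij / ( f_i. √f_.j ) .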


Let c_i, 1 ≤ i ≤ q, be the q eigenvectors of C_xx corresponding to the q
non-zero eigenvalues λ_i; then, for all i, C_xx c_i = λ_i c_i, and thus
relations (2) and (3) follow by differentiating with respect to t.

Denoting u_i = x^T c_i, and multiplying (3) by c_i^T, we obtain the differential
equation (4).
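To fix ideas, here is the chain of relations these references presumably correspond to, under the running-mean form of C_xx assumed above; the authors' normalization may differ by a factor of t:

    dC_xx/dt c_i + C_xx dc_i/dt = (dλ_i/dt) c_i + λ_i dc_i/dt        (cf. (2))
    (1/t)(u_i x − λ_i c_i) + C_xx dc_i/dt = (dλ_i/dt) c_i + λ_i dc_i/dt        (cf. (3))
    dλ_i/dt = (u_i² − λ_i)/t        (cf. (4))

The last line follows by multiplying (3) on the left by c_i^T and using c_i^T C_xx = λ_i c_i^T and c_i^T c_i = 1.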

We can now summarize our results:
Let x = x(t) ∈ R^n be a stochastic vector. Let C_xx be the covariance matrix of
x, and q the number of its non-zero eigenvalues. Let λ_i be one of these
eigenvalues, related to the normalized eigenvector c_i. Denoting u_i = x^T c_i,
if the multiplicity of all the λ_i is 1, the learning law for each eigenvalue
and associated eigenvector is given by (5).

Without loss of generality, we may consider normalized eigenvectors, and
combining relations (5) and (3), we obtain (6).
A NEURAL NETWORK IMPLEMENTATION FOR PCA


Network definition

The image space Im A of a matrix operator A is the space spanned by the
eigenvectors of A associated with its non-zero eigenvalues. Thus the term
C_xx dc_i/dt is a linear combination of the vectors c_i, and it clearly appears
from (6) that dc_i/dt itself is a linear combination of the vectors c_i and of
the data pattern x, so that q values α_ij and one value β_i can be found for
which

    dc_i/dt = Σ_{j=1}^{q} α_ij c_j + β_i x .        (7)

On the other hand, noting that the eigenvectors of a symmetric matrix form an
orthogonal base of the matrix image, x may be decomposed as follows:

    x = Σ_{i=1}^{q} u_i c_i + x̃ ,        (8)

where x̃ is the orthogonal projection of x on the space orthogonal to Im C_xx.
We can now evaluate the α_ij and β_i by replacing terms in equation (6) using
relations (7) and (8), and observing that all the vectors involved are, by
definition, linearly independent, thus leading to the solutions (9), assuming
λ_i ≠ λ_j for i ≠ j. This is a necessary condition for the existence of a
unique solution to the problem. This condition is normally fulfilled for real
world data, and in exceptional cases some "noise" may be introduced on the
eigenvalues to overcome the problem, as will be seen - and quantified - later.
Moreover, it should be noticed that the term 1/(λ_i − λ_j) appears in the
expression of the rounding error propagated while computing the eigenvectors of
a matrix, and it is generally an essential term in the convergence properties of
classical extraction methods, as discussed in (3).
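For concreteness, the standard first-order perturbation result - presumably what the solutions (9) amount to, under the running-mean normalization assumed earlier - reads:

    β_i = u_i / (t λ_i),   α_ii = − β_i u_i,   α_ij = u_i u_j / ( t (λ_i − λ_j) ) − β_i u_j   (j ≠ i),

so that dc_i/dt = (1/t) [ Σ_{j≠i} ( u_i u_j / (λ_i − λ_j) ) c_j + (u_i/λ_i) x̃ ]; the condition α_ii = − β_i u_i simply keeps c_i normalized.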

We can now exhibit a neural network model able to extract the eigenvalues and
eigenvectors of the covariance matrix of a vector data flow. A diagram of the
model is shown in fig. 1.

An input vector x ∈ R^n generates an output vector u ∈ R^q, where q is the
number of born neurons, q ≤ n. At an instant k, no more than k neurons may be
born. At each learning iteration - i.e. for each input x - an interaction vector
e_i is created for every born neuron cell. This interaction vector will be used
by the learning algorithm of the neuron, and represents an inhibition/excitation
interaction between cells.

Figure 1 Neural network model

The learning algorithm

A neuron i is defined by:
- two state parameters, namely a synaptic weight vector c_i and an information
rate λ_i;
- an input/output transfer function u_i = c_i^T x, where the (u_i) are the
components of the vector u;
- a learning law depending on the current input, output and state of the
network, by means of the interaction vectors e_i.


The interaction vector e_i for neuron i is defined from the solutions (9).

The learning law is derived from relations (9) as a first order approximation in
a Taylor expansion of c_i. Thus we have the new algorithm:

Initialisation:
    q ← 0.

For data pattern x at iteration k, the following rules are applied on all
existent neurons:

    λ_i ← ( (k−1) λ_i + u_i² ) / k ,
    c_i ← c_i + ( u_i / (k λ_i) ) ( x − e_i ) ;

if x ≠ Σ_{i=1}^{q} u_i c_i, a neuron q is created:

    q ← q + 1 ,
    c_q ← x̃ / ||x̃|| ,
    λ_q ← ||x̃||² / k ,        (10)

where x̃ = x − Σ_{i=1}^{q} u_i c_i.

The values λ_i are the eigenvalues, and the vectors c_i the associated
eigenvectors, of C_xx.
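The following sketch implements the update rules above in plain NumPy, with the interaction vector written out explicitly; the closed form used for e_i (a lateral combination of the other neurons weighted by λ_j/(λ_j − λ_i)) is our reading of relations (9) under the running-mean covariance assumption, and not necessarily the authors' exact formulation.

    import numpy as np

    def single_pass_pca(patterns, birth_threshold=1e-10):
        """Single pass over the data; returns (eigenvalues, eigenvectors as rows)
        of the running-mean covariance, following the update rules (10)."""
        patterns = np.asarray(patterns, dtype=float)
        C = []      # synaptic weight vectors c_i of the born neurons
        lam = []    # information rates lambda_i
        for k, x in enumerate(patterns, start=1):
            u = [c @ x for c in C]                              # outputs u_i = c_i^T x
            x_res = x - sum(ui * ci for ui, ci in zip(u, C))    # component outside Im C_xx
            new_C, new_lam = [], []
            for i, (ci, li) in enumerate(zip(C, lam)):
                # interaction vector e_i: our reconstruction of the lateral term in (9);
                # the 1/(lambda_j - lambda_i) factors are the instability source discussed below
                e_i = u[i] * ci + sum(
                    (lam[j] / (lam[j] - li)) * u[j] * C[j]
                    for j in range(len(C)) if j != i)
                new_C.append(ci + (u[i] / (k * li)) * (x - e_i))
                new_lam.append(((k - 1) * li + u[i] ** 2) / k)
            C, lam = new_C, new_lam
            if x_res @ x_res > birth_threshold:                 # birth of a new neuron
                C.append(x_res / np.linalg.norm(x_res))
                lam.append((x_res @ x_res) / k)
        return np.array(lam), np.array(C)

Checking the result against a batch eigendecomposition (e.g. numpy.linalg.eigh of the empirical matrix C_xx) reproduces the kind of comparison reported in Table 1.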

At an iteration k, the synaptic weight vectors - resp. the information rates -
of the neurons must correspond to the eigenvectors - resp. the eigenvalues - of
the covariance matrix of the learned data. To satisfy these requirements, the
matrix formed by the eigenvectors must be unitary and the information values
positive. This can be written:

    c_i^T c_j = 1 if i = j, 0 otherwise;   ∀i, λ_i > 0.        (11)

In that case, the matrix C_xx can be computed as follows:

    C_xx = Σ_{i=1}^{q} λ_i c_i c_i^T .        (12)

Thus, regardless of the convergence properties, which will be discussed later,
the stability properties will strongly rely on the actual validity of
requirements (11), as shown by practical computations.

Instantaneous instability of the algorithm is due to the first order
approximation made during the computation of the (c_i), so that the above
requirements can only be respected stochastically. The largest variation vector
||Δc||max of the (c_i) is a good measure of the instability generated when
learning a new data pattern; ||Δc||max can be deduced from the learning law
(10), giving relation (13). An important value of ||Δc||max would lead to
angular and normalization distortions, and thus to instability. From equation
(13), two possible reasons of instability clearly appear:
- an eigenvalue λ_i is small, corresponding to the weak amount of inertia of the
related axis at instant k, and
- two eigenvalues are nearly equal, corresponding to some isotropy in the data
distribution.
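A plausible explicit form of (13), read directly off the update of c_i in (10) as reconstructed above, is

    ||Δc||max = max_i ( |u_i| / (k λ_i) ) ||x − e_i|| ,

which indeed grows without bound when some λ_i is small, and also when two eigenvalues are close, since e_i contains terms in λ_j/(λ_j − λ_i).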

We tested the following methods in order to overcome these instabilities.

Adding "noise" to the eigenvalues. A modification of the computed eigenvalues,
the eigenvectors remaining unchanged, is equivalent to a modification ΔC_xx of
the matrix C_xx; from equation (12) we obtain, for small changes Δλ_i of the
eigenvalues,

    ΔC_xx = Σ_{i=1}^{q} Δλ_i c_i c_i^T .

The euclidean norm of this variation matrix is a good measure of the induced
error. This norm is the largest eigenvalue of ΔC_xx, thus

    ||ΔC_xx|| = max_i |Δλ_i| .

Cancelling the birth of a neuron. If the value ||x̃||² inducing the birth of a
new neuron is too small, the related eigenvalue would later be a source of
instabilities. A simple threshold may be used to decide the opportunity of
creating a neuron. Work on adaptive thresholds is in progress.

Repeating the learning of a pattern. For important values of ||Δc_i||, data may
be learned repeatedly; in this case, the iteration step is adapted.

In real computations, regions of dangerous instability appear, particularly when
the number of born neurons reaches the dimension of the data space, as shown in
figure 2. We can also see that the instability of the network dramatically
increases with the number of neurons.


Figure 2 log ||Δc||max as a function of the number of learned data, for ten
neurons (+) and four neurons (o)

Algorithm convergence and results

We tested the algorithm on data corresponding to the evolution of the French
budget over the last hundred years. Bouroche and Saporta (9) perform a PCA on
these data using classical methods. The results in tables 1, 2 and 3 were
obtained using the previously described stabilization methods.

The first series of results was obtained with a threshold leading to the birth
of ten neurons, which is the actual dimensionality of the data space. In the
second series, the threshold has been changed, and only four neurons have grown.

The results of the ten-neuron learning pass are virtually the same as those
obtained with the classical method, as can be seen in tables 1, 2 and 3. The
four-neuron simulation gives slightly different results, but in practice this
would be negligible for many data analysis applications. The comparison of the
results in tables 1, 2, 3 and figure 2 shows the opposite evolutions of the
model's stability and convergence.

TABLE 1 - Eigenvalues after complete learning.

     i    Classic algorithm    Ten neurons    Four neurons
     1         4.98                5.05           4.97
     2         2.05                2.02           1.86
     3         1.29                1.32           0.97
     4         0.99                1.01           0.89
     5         0.71                0.69            -
     6         0.56                0.56            -
     7         0.20                0.20            -
     8         0.12                0.12            -
     9         0.06                0.06            -
    10         0.03                0.03            -

TABLE 2 - Components of the first eigenvector after complete learning
(11 components for each of the three columns: classic algorithm, ten neurons,
four neurons).

TABLE 3 - Components of the second eigenvector after complete learning
(same layout as table 2).

A NEURAL REGRESSION MODEL

Consider a (n,k) matrix X where each column x_j is a data vector. Consider a
(p,k) matrix Y where each column y_j is a desired response to the entry x_j.
The goal of regression is to find a matrix M, the best approximate solution in
the sense of least squares to

    M X ≈ Y .
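In matrix terms, the least squares solution is the classical pseudo-inverse one - recalled here only for orientation, since the recursion below never forms X⁺ explicitly:

    M = Y X⁺ ,

the minimum-norm solution of the least squares problem M X ≈ Y.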

Kohonen's algorithm

Kohonen (8) derives a recursive solution from Greville's theorem related to
pseudo-inverses. Denoting

    Φ = I − X X⁺ ,
    Ψ = (X X^T)⁺ ,
    h = Φ x ,
    g = Ψ x ,

the algorithm reads:

Initialisation: Φ ← I, Ψ ← 0, M ← 0.

For each vector x:
    if h = 0,
        p = g (1 + g^T x)^{-1} ,
        Φ ← Φ ,
        Ψ ← Ψ − g p^T ;
    otherwise,
        p = h (h^T h)^{-1} ,
        Φ ← Φ − h p^T ,
        Ψ ← Ψ − g p^T − p g^T + (1 + x^T g) p p^T ;
    M ← M + (y − M x) p^T .        (14)
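As a working illustration, (14) can be transcribed directly; the tolerance used to decide whether h is zero is ours, and choosing it is precisely the difficulty discussed next.

    import numpy as np

    def kohonen_regression(xs, ys, tol=1e-8):
        """Recursive least squares M ~ Y X^+ via Greville's theorem, one (x, y) pair at a time."""
        n, p = xs[0].shape[0], ys[0].shape[0]
        Phi = np.eye(n)            # I - X X^+  (projector on the complement of the data space)
        Psi = np.zeros((n, n))     # (X X^T)^+
        M = np.zeros((p, n))
        for x, y in zip(xs, ys):
            h, g = Phi @ x, Psi @ x
            if np.linalg.norm(h) <= tol:          # x already lies in the data space
                p_vec = g / (1.0 + g @ x)
                Psi = Psi - np.outer(g, p_vec)
            else:                                 # x brings a new direction
                p_vec = h / (h @ h)
                Phi = Phi - np.outer(h, p_vec)
                Psi = (Psi - np.outer(g, p_vec) - np.outer(p_vec, g)
                       + (1.0 + x @ g) * np.outer(p_vec, p_vec))
            M = M + np.outer(y - M @ x, p_vec)    # last line of (14)
        return M

In exact arithmetic M converges to Y X⁺ once all columns have been presented.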


Deriving the new algorithm

The previous algorithm is highly unstable. This is due to the difficulty in
deciding whether the vector h is zero or not, and also to numerical problems
while computing the matrix Ψ. The first problem is not only computational, since
it is illusory to take account of a weak amount of information without inducing
instabilities. Our new algorithm for regression aims at overcoming the second
problem, using the previous results on PCA.

Using the notations introduced for PCA, relation (15) expresses Ψ = (X X^T)⁺ in
terms of the eigenvalues and eigenvectors of C_xx. Moreover, X X⁺ is the
orthogonal projection on the data space; the latter is also the space spanned by
the orthonormal eigenvectors (c_i) of C_xx, thus

    X X⁺ x = Σ_{j=1}^{q} u_j c_j .        (16)

From relations (15) and (16), and noticing that the spectral decomposition of
C_xx⁺ is

    C_xx⁺ = Σ_{i=1}^{q} (1/λ_i) c_i c_i^T ,

we are now able to compute the vectors p and g as functions of the eigenvectors,
eigenvalues and outputs calculated by a PCA network, thus leading to a new
neural network model.

Network definition

The new regression analyser model is made of two neural networks. The first one
performs a PCA and is used to compute a gain control vector p for the second
network, which calculates the regression matrix M.

Figure 3 Regression Neural network model

The learning algorithm

The corresponding algorithm is:

Initialisation: M ← 0.

For each vector x:
- teach the data pattern x to the PCA network, using algorithm (10);
- compute the gain vector p from the quantities x̃, u_i and λ_i made available
by the PCA network (if the pattern has created a new neuron, p is carried by x̃;
otherwise p is a combination of the c_i weighted by u_i/λ_i);
- M ← M + (y − M x) p^T .
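A sketch of the gain computation, under the same assumptions as before (running-mean covariance, hence Ψ = (1/k) C_xx⁺); the explicit formulas are our translation of (14) through (15) and (16), not a verbatim copy of the authors' expressions.

    import numpy as np

    def regression_gain(x, C, lam, k, birth_threshold=1e-10):
        """Gain vector p of the second network, from the state (C, lam) of the PCA
        network after x has been learned at iteration k (C: eigenvectors as rows)."""
        u = C @ x                                   # outputs of the PCA network
        x_res = x - C.T @ u                         # h = (I - X X^+) x, cf. (16)
        if x_res @ x_res > birth_threshold:         # x brings a new direction
            return x_res / (x_res @ x_res)
        g = C.T @ (u / lam) / k                     # g = Psi x, assumed Psi = (1/k) C_xx^+
        return g / (1.0 + g @ x)

    def teach_regression(M, x, y, p):
        """Update of the regression matrix, identical to the last line of (14)."""
        return M + np.outer(y - M @ x, p)

The birth threshold plays the same stabilizing role here as in the PCA network.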

The stability properties highly depend on the eigenvalues of C_xx, as can be
seen in the expression of the gain vector p. The stabilization methods developed
for the PCA network, especially the threshold method, apply straightforwardly to
the above algorithm.

CONCLUSION

Our PCA neural model has proven its flexibility and reliability. The precision
of the computation of the eigenvectors and eigenvalues can be adapted to
particular requirements as needed. Our regression analysis neural model improves
reliability compared to previous methods, because the origin of the
instabilities clearly appears during the computations.

We believe that our neural network model can be extended so as to include other
data analysis methods, such as classification or non-linear and local data
analysis. Furthermore, it is worthwhile to notice that the neural network
approach allows a link between two separate fields of data analysis, namely PCA
and regression analysis.

REFERENCES

1. Turner W. A., 1988, "Information aids for technological decision-making: new
data-processing and interrogation techniques for full-text patent databases",
Conference proceedings "RIAO 88", MIT, US.

2. Lelu A., 1988, "Browsing through Image Databases via Data Analysis and Neural
Networks", Conference proceedings "RIAO 88", MIT, US.

3. Lascaux P., Théodor R., 1987, "Analyse numérique matricielle appliquée à
l'art de l'ingénieur", Masson, Paris, France.

4. Benzécri J. P., 1969, "Approximation stochastique dans une algèbre normée non
commutative", Bull. Soc. Math. France, 97.

5. Oja E., 1982, "A Simplified Neuron Model as a Principal Component Analyzer",
J. Math. Biol.

6. Lelu A., Rosenblatt D., 1986, "Représentation et parcours d'un espace
documentaire. Analyse de données, réseaux neuronaux et bases d'images", Cahiers
de l'Analyse des Données, vol. XI.

7. Gallinari P., Fogelman-Soulié F., 1988, "Progressive design of MLP
architectures", Conference proceedings "Neuro-Nîmes 88", Nîmes, France.

8. Kohonen T., 1984, "Self-Organization and Associative Memory",
Springer-Verlag, New York, US.

9. Bouroche J.-M., Saporta G., 1980, "L'analyse des données", PUF, Paris,
France.
