

LEARNING IN A SINGLE PASS: A NEURAL MODEL FOR PRINCIPAL COMPONENTS ANALYSIS AND LINEAR REGRESSION


D. Rosenblatt (*), A. Lelu (**), A. Georgel (**)

(*) Concept Technologies, FRANCE
(**) INIST-CNRS, FRANCE

ABSTRACT

We describe a neural data analyser. We first prove that the factors of principal
components analysis (PCA) - i.e. the eigenvectors of the data covariance matrix -
can be computed according to a recursive algorithm. We then derive a neural
network extracting the factors of a vector data flow. This is achieved by a
learning law involving Hebbian reinforcement and lateral interaction between
neurons. A realistic implementation is then discussed. We then show how the
former model can be used to implement learning gain control on a neural network
achieving linear regression analysis. An important feature of our model is that
it obtains the exact solutions to the problems of PCA and regression in a single
learning pass over the data patterns.
INTRODUCTION
Data analysis is attracting increasing interest in neural network research, on
account of possible applications in many fields, such as socio-economic
analysis, prospective analysis [Turner (1)] and document querying systems
[Lelu (2)]. But the domains involved often require that problems like
intermediate storage space, execution time and incremental learning be solved
beforehand. For instance, a documentary querying system may store millions of
documents with thousands of descriptors (keywords), and therefore it is
imperative that the learning of a new data pattern does not imply a
recalculation involving all the data already stored.
Well-tried classical algorithms rapidly find their limits with respect to the
previous problems. Even stochastic iterative algorithms used in advanced data
analysis, such as Jacobi's, Lanczos's or Arnoldi's [Lascaux and Théodor (3)],
are not actually suited for large amounts of data because they require heavy
matrix calculations. Furthermore, they do not solve the problem of incremental
learning. In the factorial analysis field, the stochastic approximation
algorithm described by Benzécri (4) implies a smaller amount of calculation and
storage, but needs several passes over the data before a "good" convergence is
reached, leading to an unsatisfactory solution to the problem of incremental
learning.
Neural networks appear to be a natural framework for solving these problems.
Nevertheless, a major effort is required in formalizing learning algorithms. In
the PCA domain, Oja (5) describes a stochastic learning neuron extracting the
first principal component from an input vector sequence. In (6), Lelu and
Rosenblatt derive a single-pass learning algorithm extracting an approximate
value of the first principal component.

Gallinari and Fogelman-Soulié (7) suggest an algorithm extracting a new factor
at every complete pass over the data. Nevertheless, none of these algorithms
takes into account all the desired criteria.
In the regression analysis domain, Kohonen (8) describes an algorithm, derived
from Greville's algorithm for pseudo-inverse calculation, extracting the
regression axis between two sets of vector patterns. This algorithm is
theoretically an exact solution to the regression problem, but leads to
practical computation instabilities, as noted by Kohonen.
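For reference, the kind of stochastic rule referred to above fits in a few lines. The following is a minimal sketch of Oja's single-neuron rule (5) - not of the algorithm derived in this paper - with an arbitrary decreasing learning rate chosen purely for illustration.

    import numpy as np

    def oja_first_component(patterns, eta0=0.1):
        """Stochastic estimate of the first principal component (Oja's rule)."""
        patterns = np.asarray(patterns, dtype=float)
        rng = np.random.default_rng(0)
        w = rng.normal(size=patterns.shape[1])
        w /= np.linalg.norm(w)
        for k, x in enumerate(patterns, start=1):
            eta = eta0 / k                 # illustrative step-size schedule
            u = w @ x                      # neuron output
            w += eta * u * (x - u * w)     # Hebbian term with implicit normalization
        return w / np.linalg.norm(w)

Such a rule is iterative and approximate - it typically needs many presentations of the data - which is precisely what the single-pass algorithm derived below avoids.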

In this paper we derive a recursive algorithm achieving a complete PCA, using
the mathematical properties of the covariance matrix of the learned data
patterns and of its eigenvectors and eigenvalues. These eigenvalues and
eigenvectors can be extracted in a single pass over the data, and their
properties, such as orthogonality, normalization, etc., are rigorously and
constantly preserved, provided some realistic conditions on the data patterns
are assumed. We then interpret this algorithm as the learning law of a linear
response neural network, where each neuron represents one of the eigenvectors.
Initialization, stability and convergence of the network are then discussed and
solutions are provided for limiting cases. Great emphasis is put upon
constructing a realistic version of the algorithm for real world
implementations. Experimental results are then discussed.
After that, we give a brief description of Kohonen's regression algorithm, and
then derive a neural model achieving PCA and regression simultaneously. This
model is formed of two networks, the first one achieving PCA, the second one
using the extracted factors to perform gain control. Previous results on
stability and convergence of PCA can then be successfully applied to this new
model.
A NEW ALGORITHM FOR PCA LEARNING
Principal Components Analysis is based on the extraction of factors - i.e. the
eigenvectors of the covariance matrix of the data patterns.
In the case of continuous data varying with time t, all of which are equally
weighted, the covariance matrix can be written as a time average over the data
patterns observed up to t (1).
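A concrete form, consistent with the discrete algorithm given later and assumed in the illustrative derivations and sketches that follow, is the running mean of the outer products (whether the patterns are centred beforehand is a preprocessing choice):

    C_xx(t) = (1/t) ∫_0^t x(τ) x(τ)^T dτ ,   so that   dC_xx/dt = (x x^T − C_xx) / t .        (cf. (1))

An unnormalized integral would serve as well and would only rescale the eigenvalues λ_i.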

The hypothesis of equally weighted data, and even the use of other metrics, can
be achieved by a transformation of the data patterns. For example, we should use
the χ² metric to perform a Correspondence Analysis of the data.
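As a reminder of the standard construction (and not a prescription specific to this paper): for a contingency table with relative frequencies f_ij, row margins f_i. and column margins f_.j, analysing the row profiles with the χ² metric amounts, up to the row weights f_i., to an ordinary PCA of the transformed patterns

    x̃_ij = f_ij / ( f_i. √f_.j ) .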


Let c_i, 1 ≤ i ≤ q, be the q eigenvectors of C_xx corresponding to the q
non-zero eigenvalues λ_i; then, for all i, C_xx c_i = λ_i c_i, and thus
relations (2) and (3) follow by differentiating with respect to t.

Denoting u_i = x^T c_i, and multiplying (3) by c_i^T, we obtain the differential
equation (4).
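To fix ideas, here is the chain of relations these references presumably correspond to, under the running-mean form of C_xx assumed above; the authors' normalization may differ by a factor of t:

    dC_xx/dt c_i + C_xx dc_i/dt = (dλ_i/dt) c_i + λ_i dc_i/dt        (cf. (2))
    (1/t)(u_i x − λ_i c_i) + C_xx dc_i/dt = (dλ_i/dt) c_i + λ_i dc_i/dt        (cf. (3))
    dλ_i/dt = (u_i² − λ_i)/t        (cf. (4))

The last line follows by multiplying (3) on the left by c_i^T and using c_i^T C_xx = λ_i c_i^T and c_i^T c_i = 1.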

We can now summarize our results:
Let x = x(t) ∈ R^n be a stochastic vector. Let C_xx be the covariance matrix of
x, and q the number of its non-zero eigenvalues. Let λ_i be one of these
eigenvalues, related to the normalized eigenvector c_i. Denoting u_i = x^T c_i,
if the multiplicity of all the λ_i is 1, the learning law for each eigenvalue
and associated eigenvector is given by (5).

Without loss of generality, we may consider normalized eigenvectors, and
combining relations (5) and (3), we obtain (6).
A NEURAL NETWORK IMPLEMENTATION FOR PCA


Network definition

The image space Im A of a matrix operator A is the space spanned by the
eigenvectors of A associated with its non-zero eigenvalues. Thus the term
C_xx dc_i/dt is a linear combination of the vectors c_i, and it clearly appears
from (6) that dc_i/dt itself is a linear combination of the vectors c_i and of
the data pattern x, so that q values α_ij and one value β_i can be found for
which

    dc_i/dt = Σ_{j=1}^{q} α_ij c_j + β_i x .        (7)

On the other hand, noting that the eigenvectors of a symmetric matrix form an
orthogonal base of the matrix image, x may be decomposed as follows:

    x = Σ_{i=1}^{q} u_i c_i + x̃ ,        (8)

where x̃ is the orthogonal projection of x on the space orthogonal to Im C_xx.
We can now evaluate the α_ij and β_i by replacing terms in equation (6) using
relations (7) and (8), and observing that all the vectors involved are, by
definition, linearly independent, thus leading to the solutions (9), assuming
λ_i ≠ λ_j for i ≠ j. This is a necessary condition for the existence of a
unique solution to the problem. This condition is normally fulfilled for real
world data, and in exceptional cases some "noise" may be introduced on the
eigenvalues to overcome the problem, as will be seen - and quantified - later.
Moreover, it should be noticed that the term 1/(λ_i − λ_j) appears in the
expression of the rounding error propagated while computing the eigenvectors of
a matrix, and it is generally an essential term in the convergence properties of
classical extraction methods, as discussed in (3).
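For concreteness, the standard first-order perturbation result - presumably what the solutions (9) amount to, under the running-mean normalization assumed earlier - reads:

    β_i = u_i / (t λ_i),   α_ii = − β_i u_i,   α_ij = u_i u_j / ( t (λ_i − λ_j) ) − β_i u_j   (j ≠ i),

so that dc_i/dt = (1/t) [ Σ_{j≠i} ( u_i u_j / (λ_i − λ_j) ) c_j + (u_i/λ_i) x̃ ]; the condition α_ii = − β_i u_i simply keeps c_i normalized.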

We can now exhibit a neural network model able to extract the eigenvalues and
eigenvectors of the covariance matrix of a vector data flow. A diagram of the
model is shown in fig. 1.

An input vector x ∈ R^n generates an output vector u ∈ R^q, where q is the
number of born neurons, q ≤ n. At an instant k, no more than k neurons may be
born. At each learning iteration - i.e. for each input x - an interaction vector
e_i is created for every born neuron cell. This interaction vector will be used
by the learning algorithm of the neuron, and represents an inhibition/excitation
interaction between cells.

Figure 1 Neural network model

The learning algorithm

A neuron i is defined by:
- two state parameters, namely a synaptic weight vector c_i and an information
rate λ_i;
- an input/output transfer function u_i = c_i^T x, where the (u_i) are the
components of the vector u;
- a learning law depending on the current input, output and state of the
network, by means of the interaction vectors e_i.


The interaction vector e_i for neuron i is defined from the solutions (9).

The learning law is derived from relations (9) as a first order approximation in
a Taylor expansion of c_i. Thus we have the new algorithm:

Initialisation:
    q ← 0.

For data pattern x at iteration k, the following rules are applied on all
existent neurons:

    λ_i ← ( (k−1) λ_i + u_i² ) / k ,
    c_i ← c_i + ( u_i / (k λ_i) ) ( x − e_i ) ;

if x ≠ Σ_{i=1}^{q} u_i c_i, a neuron q is created:

    q ← q + 1 ,
    c_q ← x̃ / ||x̃|| ,
    λ_q ← ||x̃||² / k ,        (10)

where x̃ = x − Σ_{i=1}^{q} u_i c_i.

The values λ_i are the eigenvalues, and the vectors c_i the associated
eigenvectors, of C_xx.
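The following sketch implements the update rules above in plain NumPy, with the interaction vector written out explicitly; the closed form used for e_i (a lateral combination of the other neurons weighted by λ_j/(λ_j − λ_i)) is our reading of relations (9) under the running-mean covariance assumption, and not necessarily the authors' exact formulation.

    import numpy as np

    def single_pass_pca(patterns, birth_threshold=1e-10):
        """Single pass over the data; returns (eigenvalues, eigenvectors as rows)
        of the running-mean covariance, following the update rules (10)."""
        patterns = np.asarray(patterns, dtype=float)
        C = []      # synaptic weight vectors c_i of the born neurons
        lam = []    # information rates lambda_i
        for k, x in enumerate(patterns, start=1):
            u = [c @ x for c in C]                              # outputs u_i = c_i^T x
            x_res = x - sum(ui * ci for ui, ci in zip(u, C))    # component outside Im C_xx
            new_C, new_lam = [], []
            for i, (ci, li) in enumerate(zip(C, lam)):
                # interaction vector e_i: our reconstruction of the lateral term in (9);
                # the 1/(lambda_j - lambda_i) factors are the instability source discussed below
                e_i = u[i] * ci + sum(
                    (lam[j] / (lam[j] - li)) * u[j] * C[j]
                    for j in range(len(C)) if j != i)
                new_C.append(ci + (u[i] / (k * li)) * (x - e_i))
                new_lam.append(((k - 1) * li + u[i] ** 2) / k)
            C, lam = new_C, new_lam
            if x_res @ x_res > birth_threshold:                 # birth of a new neuron
                C.append(x_res / np.linalg.norm(x_res))
                lam.append((x_res @ x_res) / k)
        return np.array(lam), np.array(C)

Checking the result against a batch eigendecomposition (e.g. numpy.linalg.eigh of the empirical matrix C_xx) reproduces the kind of comparison reported in Table 1.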

At an iteration k, the synaptic weight vectors - resp. the information rates -
of the neurons must correspond to the eigenvectors - resp. the eigenvalues - of
the covariance matrix of the learned data. To satisfy these requirements, the
matrix formed by the eigenvectors must be unitary and the information values
positive. This can be written:

    c_i^T c_j = 1 if i = j, 0 otherwise;   ∀i, λ_i > 0.        (11)

In that case, the matrix C_xx can be computed as follows:

    C_xx = Σ_{i=1}^{q} λ_i c_i c_i^T .        (12)

Thus, regardless of the convergence properties, which will be discussed later,
the stability properties will strongly rely on the actual validity of
requirements (11), as shown by practical computations.

Instantaneous instability of the algorithm is due to the first order
approximation made during the computation of the (c_i), so that the above
requirements can only be respected stochastically. The largest variation vector
||Δc||max of the (c_i) is a good measure of the instability generated when
learning a new data pattern; ||Δc||max can be deduced from the learning law
(10), giving relation (13). An important value of ||Δc||max would lead to
angular and normalization distortions, and thus to instability. From equation
(13), two possible reasons of instability clearly appear:
- an eigenvalue λ_i is small, corresponding to the weak amount of inertia of the
related axis at instant k, and
- two eigenvalues are nearly equal, corresponding to some isotropy in the data
distribution.
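A plausible explicit form of (13), read directly off the update of c_i in (10) as reconstructed above, is

    ||Δc||max = max_i ( |u_i| / (k λ_i) ) ||x − e_i|| ,

which indeed grows without bound when some λ_i is small, and also when two eigenvalues are close, since e_i contains terms in λ_j/(λ_j − λ_i).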

We tested the following methods in order to overcome these instabilities.

Adding "noise" to the eigenvalues. A modification of the computed eigenvalues,
the eigenvectors remaining unchanged, is equivalent to a modification ΔC_xx of
the matrix C_xx; from equation (12) we obtain, for small changes Δλ_i of the
eigenvalues,

    ΔC_xx = Σ_{i=1}^{q} Δλ_i c_i c_i^T .

The euclidean norm of this variation matrix is a good measure of the induced
error. This norm is the largest eigenvalue of ΔC_xx, thus

    ||ΔC_xx|| = max_i |Δλ_i| .

Cancelling the birth of a neuron. If the value ||x̃||² inducing the birth of a
new neuron is too small, the related eigenvalue would later be a source of
instabilities. A simple threshold may be used to decide the opportunity of
creating a neuron. Work on adaptive thresholds is in progress.

Repeating the learning of a pattern. For important values of ||Δc_i||, data may
be learned repeatedly; in this case, the iteration step is adapted.

In real computations, regions of dangerous instability appear, particularly when
the number of born neurons reaches the dimension of the data space, as shown in
figure 2. We can also see that the instability of the network dramatically
increases with the number of neurons.


Figure 2 log ||Δc||max as a function of the number of learned data, for ten
neurons (+) and four neurons (o)

Algorithm convergence and results

We tested the algorithm on data corresponding to the evolution of the French
budget over the last hundred years. Bouroche and Saporta (9) perform a PCA on
these data using classical methods. The results in tables 1, 2 and 3 were
obtained using the previously described stabilization methods.

The first series of results was obtained with a threshold leading to the birth
of ten neurons, which is the actual dimensionality of the data space. In the
second series, the threshold has been changed, and only four neurons have grown.

The results of the ten-neuron learning pass are virtually the same as those
obtained with the classical method, as can be seen in tables 1, 2 and 3. The
four-neuron simulation gives slightly different results, but in practice this
would be negligible for many data analysis applications. The comparison of the
results in tables 1, 2, 3 and figure 2 shows the opposite evolutions of the
model's stability and convergence.

TABLE 1 - Eigenvalues after complete learning.

     i    Classic algorithm    Ten neurons    Four neurons
     1         4.98                5.05           4.97
     2         2.05                2.02           1.86
     3         1.29                1.32           0.97
     4         0.99                1.01           0.89
     5         0.71                0.69            -
     6         0.56                0.56            -
     7         0.20                0.20            -
     8         0.12                0.12            -
     9         0.06                0.06            -
    10         0.03                0.03            -

TABLE 2 - Components of the first eigenvector after complete learning
(11 components for each of the three columns: classic algorithm, ten neurons,
four neurons).

TABLE 3 - Components of the second eigenvector after complete learning
(same layout as table 2).

A NEURAL REGRESSION MODEL

Consider a (n,k) matrix X where each column x_j is a data vector. Consider a
(p,k) matrix Y where each column y_j is a desired response to the entry x_j.
The goal of regression is to find a matrix M, the best approximate solution in
the sense of least squares to

    M X ≈ Y .
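In matrix terms, the least squares solution is the classical pseudo-inverse one - recalled here only for orientation, since the recursion below never forms X⁺ explicitly:

    M = Y X⁺ ,

the minimum-norm solution of the least squares problem M X ≈ Y.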

Kohonen's algorithm

Kohonen (8) derives a recursive solution from Greville's theorem related to
pseudo-inverses. Denoting

    Φ = I − X X⁺ ,
    Ψ = (X X^T)⁺ ,
    h = Φ x ,
    g = Ψ x ,

the algorithm reads:

Initialisation: Φ ← I, Ψ ← 0, M ← 0.

For each vector x:
    if h = 0,
        p = g (1 + g^T x)^{-1} ,
        Φ ← Φ ,
        Ψ ← Ψ − g p^T ;
    otherwise,
        p = h (h^T h)^{-1} ,
        Φ ← Φ − h p^T ,
        Ψ ← Ψ − g p^T − p g^T + (1 + x^T g) p p^T ;
    M ← M + (y − M x) p^T .        (14)
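As a working illustration, (14) can be transcribed directly; the tolerance used to decide whether h is zero is ours, and choosing it is precisely the difficulty discussed next.

    import numpy as np

    def kohonen_regression(xs, ys, tol=1e-8):
        """Recursive least squares M ~ Y X^+ via Greville's theorem, one (x, y) pair at a time."""
        n, p = xs[0].shape[0], ys[0].shape[0]
        Phi = np.eye(n)            # I - X X^+  (projector on the complement of the data space)
        Psi = np.zeros((n, n))     # (X X^T)^+
        M = np.zeros((p, n))
        for x, y in zip(xs, ys):
            h, g = Phi @ x, Psi @ x
            if np.linalg.norm(h) <= tol:          # x already lies in the data space
                p_vec = g / (1.0 + g @ x)
                Psi = Psi - np.outer(g, p_vec)
            else:                                 # x brings a new direction
                p_vec = h / (h @ h)
                Phi = Phi - np.outer(h, p_vec)
                Psi = (Psi - np.outer(g, p_vec) - np.outer(p_vec, g)
                       + (1.0 + x @ g) * np.outer(p_vec, p_vec))
            M = M + np.outer(y - M @ x, p_vec)    # last line of (14)
        return M

In exact arithmetic M converges to Y X⁺ once all columns have been presented.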


Deriving the new algorithm

The previous algorithm is highly unstable. This is due to the difficulty in
deciding whether the vector h is zero or not, and also to numerical problems
while computing the matrix Ψ. The first problem is not only computational, since
it is illusory to take account of a weak amount of information without inducing
instabilities. Our new algorithm for regression aims at overcoming the second
problem, using the previous results on PCA.

Using the notations introduced for PCA, relation (15) expresses Ψ = (X X^T)⁺ in
terms of the eigenvalues and eigenvectors of C_xx. Moreover, X X⁺ is the
orthogonal projection on the data space; the latter is also the space spanned by
the orthonormal eigenvectors (c_i) of C_xx, thus

    X X⁺ x = Σ_{j=1}^{q} u_j c_j .        (16)

From relations (15) and (16), and noticing that the spectral decomposition of
C_xx⁺ is

    C_xx⁺ = Σ_{i=1}^{q} (1/λ_i) c_i c_i^T ,

we are now able to compute the vectors p and g as functions of the eigenvectors,
eigenvalues and outputs calculated by a PCA network, thus leading to a new
neural network model.

Network definition

The new regression analyser model is made of two neural networks. The first one
performs a PCA and is used to compute a gain control vector p for the second
network, which calculates the regression matrix M.

Figure 3 Regression Neural network model

The learning algorithm

The corresponding algorithm is:

Initialisation: M ← 0.

For each vector x:
- teach the data pattern x to the PCA network, using algorithm (10);
- compute the gain vector p from the quantities x̃, u_i and λ_i made available
by the PCA network (if the pattern has created a new neuron, p is carried by x̃;
otherwise p is a combination of the c_i weighted by u_i/λ_i);
- M ← M + (y − M x) p^T .
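A sketch of the gain computation, under the same assumptions as before (running-mean covariance, hence Ψ = (1/k) C_xx⁺); the explicit formulas are our translation of (14) through (15) and (16), not a verbatim copy of the authors' expressions.

    import numpy as np

    def regression_gain(x, C, lam, k, birth_threshold=1e-10):
        """Gain vector p of the second network, from the state (C, lam) of the PCA
        network after x has been learned at iteration k (C: eigenvectors as rows)."""
        u = C @ x                                   # outputs of the PCA network
        x_res = x - C.T @ u                         # h = (I - X X^+) x, cf. (16)
        if x_res @ x_res > birth_threshold:         # x brings a new direction
            return x_res / (x_res @ x_res)
        g = C.T @ (u / lam) / k                     # g = Psi x, assumed Psi = (1/k) C_xx^+
        return g / (1.0 + g @ x)

    def teach_regression(M, x, y, p):
        """Update of the regression matrix, identical to the last line of (14)."""
        return M + np.outer(y - M @ x, p)

The birth threshold plays the same stabilizing role here as in the PCA network.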

The stability properties highly depend on the eigenvalues of C_xx, as can be
seen in the expression of the gain vector p. The stabilization methods developed
for the PCA network, especially the threshold method, apply straightforwardly to
the above algorithm.

CONCLUSION

Our PCA neural model has proven its flexibility and reliability. The precision
of the computation of the eigenvectors and eigenvalues can be adapted to
particular requirements as needed. Our regression analysis neural model improves
reliability compared to previous methods, because the origin of the
instabilities clearly appears during the computations.

We believe that our neural network model can be extended so as to include other
data analysis methods, such as classification or non-linear and local data
analysis. Furthermore, it is worthwhile to notice that the neural network
approach allows a link between two separate fields of data analysis, namely PCA
and regression analysis.

REFERENCES

1. Turner W. A., 1988, "Information aids for technological decision-making: new
data-processing and interrogation techniques for full-text patent databases",
Conference proceedings "RIAO 88", MIT, US.

2. Lelu A., 1988, "Browsing through Image Databases via Data Analysis and Neural
Networks", Conference proceedings "RIAO 88", MIT, US.

3. Lascaux P., Théodor R., 1987, "Analyse numérique matricielle appliquée à
l'art de l'ingénieur", Masson, Paris, France.

4. Benzécri J. P., 1969, "Approximation stochastique dans une algèbre normée non
commutative", Bull. Soc. Math. France, 97.

5. Oja E., 1982, "A Simplified Neuron Model as a Principal Component Analyzer",
J. Math. Biol.

6. Lelu A., Rosenblatt D., 1986, "Représentation et parcours d'un espace
documentaire. Analyse de données, réseaux neuronaux et bases d'images", Cahiers
de l'Analyse des Données, vol. XI.

7. Gallinari P., Fogelman-Soulié F., 1988, "Progressive design of MLP
architectures", Conference proceedings "Neuro-Nîmes 88", Nîmes, France.

8. Kohonen T., 1984, "Self-Organization and Associative Memory",
Springer-Verlag, New York, US.

9. Bouroche J.-M., Saporta G., 1980, "L'analyse des données", PUF, Paris,
France.
