
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
The Dynamics of Learning Vector Quantization
Rijksuniversiteit Groningen
Mathematics and Computing Science
Michael Biehl, Anarta Ghosh
TU Clausthal-Zellerfeld
Institute of Computing Science
Barbara Hammer
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Introduction
prototype-based learning from example data: representation, classification
Vector Quantization (VQ)
Learning Vector Quantization (LVQ)

The dynamics of learning
a model situation: randomized data
learning algorithms for VQ and LVQ
analysis and comparison: dynamics, success of learning

Summary
Outlook
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping of similar data in clusters
assignment of a feature vector ξ to the closest prototype w
(similarity or distance measure, e.g. Euclidean distance)
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
unsupervised competitive learning
initialize K prototype vectors
present a single example
identify the closest prototype, i.e. the so-called winner
move the winner even closer towards the example

intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function ...
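To make the procedure concrete, here is a minimal Python sketch of a single winner-takes-all step (the function name wta_step and the use of NumPy are our own choices, not part of the original slides):

import numpy as np

def wta_step(prototypes, xi, eta):
    """One unsupervised competitive-learning step: move the closest
    prototype (the 'winner') towards the presented example xi."""
    dists = np.sum((prototypes - xi) ** 2, axis=1)   # squared Euclidean distances
    winner = np.argmin(dists)                        # index of the closest prototype
    prototypes[winner] += eta * (xi - prototypes[winner])
    return prototypes, winner

# usage: K=3 prototypes in N=2 dimensions, one random example
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2))
w, j = wta_step(w, rng.normal(size=2), eta=0.1)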
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
quantization error

H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} d\!\left( \vec{w}_j, \vec{\xi}^{\,\mu} \right) \prod_{k \neq j} \Theta\!\left( d_k^{\mu} - d_j^{\mu} \right), \qquad d_j^{\mu} = \left( \vec{\xi}^{\,\mu} - \vec{w}_j \right)^2

prototypes w_j (j = 1, ..., K), data ξ^μ (μ = 1, ..., P); here: (squared) Euclidean distance
the product of Θ-functions is non-zero only if w_j is the winner, i.e. the prototype closest to ξ^μ

aim: faithful representation (in general: clustering)
the result depends on
- the number of prototype vectors
- the distance measure / metric used
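The quantization error defined above can be evaluated directly; a short sketch, again assuming the data and prototypes are NumPy arrays (function name ours):

import numpy as np

def quantization_error(prototypes, data):
    """H_VQ: sum over all examples of the squared Euclidean distance
    to the closest (winning) prototype."""
    # pairwise squared distances, shape (P, K)
    d = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return np.sum(np.min(d, axis=1))   # only the winner contributes per example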
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Learning Vector Quantization (LVQ)
aim:
classification of data
learning from examples
Learning: choice of prototypes according to example data
example situation: 3 classes, 3 prototypes
classification: assignment of a vector ξ to the class of the closest prototype w
aim: generalization ability, i.e. correct classification of novel data after training
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
prominent example [Kohonen]: LVQ 2.1
- initialize prototype vectors (for the different classes)
- present a single example
- identify the closest correct and the closest wrong prototype
- move the corresponding winners towards / away from the example, respectively
known convergence / stability problems, e.g. for infrequent classes
mostly: heuristically motivated variations of competitive learning
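A minimal sketch of one LVQ 2.1 step as described above, for an arbitrary set of labelled prototypes (a hedged illustration, not Kohonen's reference implementation; names ours):

import numpy as np

def lvq21_step(prototypes, labels, xi, xi_label, eta):
    """Move the closest prototype with the correct label towards xi and
    the closest prototype with a wrong label away from xi."""
    d = np.sum((prototypes - xi) ** 2, axis=1)
    correct = np.where(labels == xi_label)[0]
    wrong = np.where(labels != xi_label)[0]
    j_c = correct[np.argmin(d[correct])]   # closest correct prototype
    j_w = wrong[np.argmin(d[wrong])]       # closest wrong prototype
    prototypes[j_c] += eta * (xi - prototypes[j_c])
    prototypes[j_w] -= eta * (xi - prototypes[j_w])
    return prototypes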
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ algorithms ...
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - . . .
- appear plausible, intuitive, flexible
- are fast, easy to implement
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
illustration: microscopic images of (pig) semen cells after freezing
and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
healthy cells
damaged cells
prototypes obtained by LVQ1
illustration: microscopic images of (pig) semen cells after freezing
and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ algorithms ...
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
dynamics, convergence properties,
performance w.r.t. generalization, etc.
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim: - contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
model situation: two clusters of N-dimensional data
random vectors ξ ∈ R^N generated according to a mixture of two Gaussians:

P(\vec{\xi}) = \sum_{\sigma=\pm 1} p_\sigma P(\vec{\xi}\,|\,\sigma), \qquad P(\vec{\xi}\,|\,\sigma) = (2\pi)^{-N/2} \exp\!\left[ -\tfrac{1}{2} \left( \vec{\xi} - \ell \vec{B}_\sigma \right)^2 \right]

orthonormal center vectors: B_+, B_- ∈ R^N with (B_±)² = 1 and B_+ · B_- = 0
prior weights of the classes: p_+, p_- with p_+ + p_- = 1
separation ℓ between the cluster centers ℓB_+ (weight p_+) and ℓB_- (weight p_-)
independent components: ⟨ξ_j⟩ = ℓ B_{σ,j} and ⟨ξ_j²⟩ − ⟨ξ_j⟩² = 1, hence ⟨ξ · ξ⟩ = ℓ² + N
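A short sketch of how data from this model could be generated, assuming NumPy and writing the separation ℓ as ell (the function name and the particular choice of B_± along the first two coordinate axes are ours):

import numpy as np

def sample_cluster_data(P, N, ell, p_plus, rng):
    """Draw P examples from the binary mixture of isotropic Gaussians with
    unit variance, centers ell*B_plus and ell*B_minus, priors p_plus, 1-p_plus."""
    B_plus = np.zeros(N); B_plus[0] = 1.0      # orthonormal center vectors
    B_minus = np.zeros(N); B_minus[1] = 1.0
    sigma = np.where(rng.random(P) < p_plus, 1, -1)           # class labels +/-1
    centers = np.where(sigma[:, None] == 1, B_plus, B_minus)  # B_sigma per example
    xi = ell * centers + rng.normal(size=(P, N))
    return xi, sigma

xi, sigma = sample_cluster_data(P=400, N=200, ell=1.0, p_plus=0.6,
                                rng=np.random.default_rng(1))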
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
high-dimensional data (formally: N → ∞)
example: 400 data points ξ^μ ∈ R^N with N = 200, ℓ = 1, p_+ = 0.6

[scatter plot: projections y_± = B_± · ξ^μ into the plane of the center vectors B_+, B_- (240 examples of cluster +, 160 of cluster −)]
[scatter plot: projections x_{1,2} = w_{1,2} · ξ^μ onto two independent random directions w_{1,2}]

Note: the model serves to study the typical behavior of LVQ algorithms; it is not meant as a basis for density-estimation based classification.
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
dynamics of on-line training
sequence of independent random data: ξ^μ, μ = 1, 2, 3, ..., drawn according to P(ξ)

update of prototype vectors:

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, f_s\!\left[ d_+^{\mu}, d_-^{\mu}, \sigma^{\mu}, \ldots \right] \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right), \qquad d_s^{\mu} = \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)^2, \quad s = \pm 1

η : learning rate, step size
f_s[...] : modulation function, controls competition, direction of the update etc.
(ξ^μ − w_s^{μ-1}) : change of the prototype towards or away from the current data

above examples:
- unsupervised Vector Quantization ("The Winner Takes It All", classes irrelevant/unknown):  f_s = Θ(d_{-s} − d_s)
- Learning Vector Quantization 2.1 (here: two prototypes, no explicit competition):  f_s = s σ^μ, i.e. +1 if the class label σ^μ of the example agrees with the prototype label s (class correct), −1 otherwise (class wrong)
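In code, the generic update with the two modulation functions quoted above might look as follows; this is a sketch for the two-prototype setting of the model, with names of our own choosing:

import numpy as np

def online_step(w, xi, sigma, eta, rule):
    """One on-line step for two prototypes w[+1], w[-1] (stored as a dict).
    rule='VQ'   : f_s = Theta(d_{-s} - d_s)   (winner takes all)
    rule='LVQ21': f_s = s * sigma             (no explicit competition)"""
    N = xi.size
    d = {s: np.sum((xi - w[s]) ** 2) for s in (+1, -1)}
    for s in (+1, -1):
        if rule == 'VQ':
            f = 1.0 if d[-s] > d[s] else 0.0
        elif rule == 'LVQ21':
            f = float(s * sigma)
        else:
            raise ValueError("unknown rule")
        w[s] = w[s] + (eta / N) * f * (xi - w[s])
    return w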
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: 2N prototype components → 7 order parameters)

R_{sτ} = w_s · B_τ : projections of the prototypes into the (B_+, B_-)-plane
Q_{st} = w_s · w_t : lengths and relative position of the prototypes,   s, t, τ ∈ {+1, −1}

the generic update w_s^μ = w_s^{μ-1} + (η/N) f_s[d_+^μ, d_-^μ, σ^μ, ...] (ξ^μ − w_s^{μ-1}) yields recursions:

R_{s\tau}^{\mu} = R_{s\tau}^{\mu-1} + \frac{\eta}{N}\, f_s \left( y_\tau^{\mu} - R_{s\tau}^{\mu-1} \right)

Q_{st}^{\mu} = Q_{st}^{\mu-1} + \frac{\eta}{N} \left[ f_s \left( x_t^{\mu} - Q_{st}^{\mu-1} \right) + f_t \left( x_s^{\mu} - Q_{st}^{\mu-1} \right) \right] + \frac{\eta^2}{N^2}\, f_s f_t \; \vec{\xi}^{\,\mu} \cdot \vec{\xi}^{\,\mu}

the random vector ξ^μ enters only through its projections
x_s = w_s^{μ-1} · ξ^μ   and   y_τ = B_τ · ξ^μ
and through the distances d_s^μ = ξ^μ · ξ^μ − 2 x_s + Q_{ss}^{μ-1}
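On the simulation side, these order parameters are simply measured from the current prototype configuration; a minimal sketch (names ours):

import numpy as np

def order_parameters(w, B):
    """R[(s,tau)] = w_s . B_tau and Q[(s,t)] = w_s . w_t for prototypes
    w = {+1: ..., -1: ...} and center vectors B = {+1: ..., -1: ...}."""
    R = {(s, t): float(np.dot(w[s], B[t])) for s in (+1, -1) for t in (+1, -1)}
    Q = {(s, t): float(np.dot(w[s], w[t])) for s in (+1, -1) for t in (+1, -1)}
    return R, Q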

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
written out, the projections read
x_s = w_s^{μ-1} · ξ^μ = Σ_{j=1}^{N} w_{s,j} ξ_j^μ,   y_τ = B_τ · ξ^μ = Σ_{j=1}^{N} B_{τ,j} ξ_j^μ

in the thermodynamic limit N → ∞, for a random vector ξ drawn according to P(ξ|σ), the quantities x_s and y_τ become correlated Gaussian random variables, completely specified in terms of first and second moments (superscripts μ−1 omitted):

⟨x_s⟩ = ℓ R_{sσ},   ⟨y_τ⟩ = ℓ δ_{τσ}
⟨x_s x_t⟩ − ⟨x_s⟩⟨x_t⟩ = Q_{st}
⟨x_s y_τ⟩ − ⟨x_s⟩⟨y_τ⟩ = R_{sτ}
⟨y_ρ y_τ⟩ − ⟨y_ρ⟩⟨y_τ⟩ = δ_{ρτ}   (= 1 if ρ = τ, 0 else)

2. average over the current example
averaging over the current example, ⟨···⟩ = Σ_{σ=±1} p_σ ⟨···⟩_σ, yields recursions that are closed in {R_{sτ}, Q_{st}}
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
3. self-averaging properties
the characteristic quantities R_{sτ}, Q_{st}
- depend on the random sequence of example data
- but their variance vanishes with N (here: ∝ 1/N)
→ the learning dynamics is completely described in terms of the averages

4. continuous learning time
α = μ / N : number of examples, i.e. learning steps, per degree of freedom
the averaged recursions become a set of coupled ordinary differential equations
→ evolution of the projections R_{sτ}(α), Q_{st}(α)
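Written out, the resulting system of ODEs takes the generic form below; this is our reconstruction from the recursions above, with ⟨···⟩ denoting the average over the Gaussian statistics of x_s, y_τ and over both clusters, and using ξ · ξ ≈ N for large N:

\frac{dR_{s\tau}}{d\alpha} = \eta \,\bigl\langle f_s \,( y_\tau - R_{s\tau} ) \bigr\rangle, \qquad
\frac{dQ_{st}}{d\alpha} = \eta \,\bigl\langle f_s \,( x_t - Q_{st} ) + f_t \,( x_s - Q_{st} ) \bigr\rangle + \eta^2 \,\bigl\langle f_s f_t \bigr\rangle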
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
5. learning curve
generalization error ε_g(α) after training with μ = αN examples

probability for misclassification of a novel example:

\epsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-
= p_+ \, \Phi\!\left( \frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}} \right) + p_- \, \Phi\!\left( \frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}} \right)

(Φ: cumulative distribution function of a standard normal; a short code sketch follows below)

investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions:
- time-dependent learning rate η(α)
- variational optimization w.r.t. the modulation function f_s[...], i.e. maximize −dε_g/dα
- ...
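For given order parameters, the formula above is straightforward to evaluate; a sketch using SciPy's normal CDF (function name ours, ℓ passed as ell, R and Q stored as dictionaries keyed by (±1, ±1)):

from math import sqrt
from scipy.stats import norm

def generalization_error(R, Q, ell, p_plus):
    """epsilon_g from the order parameters R[(s,tau)], Q[(s,t)],
    following the formula on this slide."""
    denom = 2.0 * sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    arg_plus = (Q[(1, 1)] - Q[(-1, -1)] - 2.0 * ell * (R[(1, 1)] - R[(-1, 1)])) / denom
    arg_minus = (Q[(-1, -1)] - Q[(1, 1)] - 2.0 * ell * (R[(-1, -1)] - R[(1, -1)])) / denom
    return p_plus * norm.cdf(arg_plus) + (1.0 - p_plus) * norm.cdf(arg_minus)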
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
optimal classification with minimal generalization error
in the model situation (equal variances of the clusters): separation of the classes by the plane with
p_+ P(ξ | +1) = p_- P(ξ | −1)

[sketch: cluster centers ℓB_+ (weight p_+) and ℓB_- (weight p_- > p_+) with the optimal decision boundary]

[plot: excess error / minimal ε_g (0 ... 0.5) as a function of the prior weight p_+ (0 ... 1), for separations ℓ = 0, 1, 2]
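As a sanity check, the minimal error of this decision rule can be estimated by Monte Carlo; a rough sketch (names ours), using the fact that for equal cluster variances the rule reduces to comparing ℓ(B_+ − B_-) · ξ with ln(p_-/p_+):

import numpy as np

def bayes_error_mc(ell, p_plus, P=200000, seed=0):
    """Monte Carlo estimate of the minimal generalization error: assign xi to
    class +1 iff p_plus * P(xi|+1) > p_minus * P(xi|-1)."""
    rng = np.random.default_rng(seed)
    p_minus = 1.0 - p_plus
    sigma = np.where(rng.random(P) < p_plus, 1, -1)
    # only the two projections B_+ . xi and B_- . xi matter for the decision
    u = ell * (sigma == 1) + rng.normal(size=P)    # B_+ . xi
    v = ell * (sigma == -1) + rng.normal(size=P)   # B_- . xi
    predicted = np.where(ell * (u - v) > np.log(p_minus / p_plus), 1, -1)
    return np.mean(predicted != sigma)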

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ 2.1: update the closest correct and the closest wrong prototype,

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, s\,\sigma^{\mu} \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right), \qquad s = \pm 1

(analytical) integration for w_s(0) = 0, with the prior weights parameterized as p_σ = (1 + σm)/2, m > 0:

R_{++}(\alpha) = \frac{\ell\,(1+m)}{2m}\left(1 - e^{-m\eta\alpha}\right), \qquad R_{+-}(\alpha) = -\frac{\ell\,(1-m)}{2m}\left(1 - e^{-m\eta\alpha}\right)
R_{-+}(\alpha) = \frac{\ell\,(1+m)}{2m}\left(1 - e^{+m\eta\alpha}\right), \qquad R_{--}(\alpha) = -\frac{\ell\,(1-m)}{2m}\left(1 - e^{+m\eta\alpha}\right)
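For completeness, the ODEs behind this analytical result, in our reconstruction (averaging f_s = sσ^μ over the mixture and writing m = p_+ − p_-):

\frac{dR_{s\tau}}{d\alpha} = \eta\, s \left( \ell\, \tau\, p_\tau - m\, R_{s\tau} \right),
\qquad \text{with solution for } R_{s\tau}(0)=0: \quad
R_{s\tau}(\alpha) = \frac{\ell\, \tau\, p_\tau}{m} \left( 1 - e^{-\,s\, m\, \eta\, \alpha} \right)

so that the prototype with label s = −1 (the weaker class for m > 0) is repelled and diverges exponentially, reproducing the four expressions above.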
[Seo, Obermayer]: LVQ 2.1 corresponds to a cost function based on likelihood ratios

for m > 0, the projections R_{++}, R_{+-} and Q_{++} remain finite, while R_{-+}, R_{--}, Q_{+-} and Q_{--} diverge with increasing α

[plot: order parameters vs. α, theory and simulation (N=100), p_+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
problem: instability of the algorithm due to the repulsion of wrong prototypes
trivial classification for α → ∞: ε_g → min{p_+, p_-}, i.e. all data end up assigned to the more frequent class

[sketch: clusters with weights p_- and p_+ > p_-; the prototype of the weaker class is repelled]

strategies:
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, but the system remains unstable
- Robust Soft Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function;
  limiting case "learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  → slow learning, poor generalization
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
The winner takes it all
I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership of the example

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) s\,\sigma^{\mu} \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)

[plots: order parameters R_{sτ}(α) and Q_{st}(α), numerical integration for w_s(0) = 0; theory and simulation (N = 200), p_+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 independent runs]

[plot: trajectories of the prototypes w_+, w_- in the (B_+, B_-)-plane for α = 20, 40, ..., 140; dotted: optimal decision boundary, solid: asymptotic position]
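For reference, the LVQ1 modulation function implied by this update, as a tiny self-contained sketch (names ours):

def f_lvq1(s, sigma, d_plus, d_minus):
    """LVQ1 modulation: the winner is moved towards (label correct, +1)
    or away from (label wrong, -1) the example; the loser is not updated."""
    d_s, d_other = (d_plus, d_minus) if s == 1 else (d_minus, d_plus)
    is_winner = d_s < d_other
    return s * sigma if is_winner else 0.0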
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
learning curve ε_g(α)   (p_+ = 0.2, ℓ = 1.2)

[plot: ε_g(α) for learning rates η = 2.0, 0.4, 0.2; ε_g between ≈ 0.14 and ≈ 0.26 for α up to 300]

- role of the learning rate: the stationary generalization error ε_g(α → ∞) grows linearly with η
- this suggests a variable, time-dependent rate η(α) !?
- well-defined asymptotics for η → 0, ηα → ∞ (the ODEs are linear in η)

[plot: min. ε_g(α → ∞) and the limit η → 0, ηα → ∞: suboptimal]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
The winner takes it all
II) LVQ+ (only positive steps, no repulsion): the winner is updated only if its label is correct

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) \delta_{s,\sigma^{\mu}} \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)

LVQ+ is equivalent to VQ within the classes (w_s is updated only from examples of class s)

[plot: asymptotic configuration of w_+, w_- relative to B_+, B_- for p_+ = 0.2, ℓ = 1.2, η = 1.2; it is symmetric about (B_+ + B_-)/2]

hence the classification scheme and the achieved generalization error are independent of the prior weights p_± (and optimal for p_± = 1/2)
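Correspondingly, a sketch of the LVQ+ modulation function (names ours):

def f_lvq_plus(s, sigma, d_plus, d_minus):
    """LVQ+ modulation: update the winner only if its label is correct;
    there is no repulsive step for wrongly labelled winners."""
    d_s, d_other = (d_plus, d_minus) if s == 1 else (d_minus, d_plus)
    return 1.0 if (d_s < d_other and s == sigma) else 0.0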
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
asymptotic generalization error in the limit η → 0, ηα → ∞, as a function of the prior weight p_+:

- LVQ 2.1 : trivial assignment to the more frequent class, ε_g = min{p_+, p_-}
- LVQ 1   : here close to the optimal classification
- LVQ+    : min-max solution, p_±-independent classification

[plot: ε_g vs. p_+ for LVQ 2.1, LVQ 1 and LVQ+, compared with the optimal classification]

[plot: learning curves ε_g(α) for LVQ 1 and LVQ+ (p_+ = 0.2, ℓ = 1.0, η = 1.0)]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Vector Quantization
competitive learning: the class membership is unknown, or identical for all data; only the winner w_s is updated

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)

[plot: learning curves ε_g(α) for VQ, LVQ+ and LVQ 1, numerical integration for w_s(0) ≈ 0 (p_+ = 0.2, ℓ = 1.0, η = 1.2)]

[plot: order parameters R_{++}, R_{+-}, R_{-+}, R_{--} vs. α (0 ... 300)]

the system is invariant under exchange of the prototypes → weakly repulsive fixed points
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
interpretations:
- VQ, unsupervised learning from unlabelled data
- LVQ with two prototypes of the same class (identical labels)
- LVQ with different classes, but the labels are not used in training

[plot: asymptotic ε_g (η → 0, ηα → ∞) as a function of p_+]

for p_+ ≈ 0, p_- ≈ 1 : low quantization error, but high generalization error ε_g

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
work in progress, outlook
regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
model: different cluster variances, more clusters/prototypes
optimized procedures: learning rate schedules,
variational approach / density estimation / Bayes optimal on-line
several classes and prototypes

Summary
prototype-based learning
Vector Quantization and Learning Vector Quantization
a model scenario: two clusters, two prototypes
dynamics of online training
comparison of algorithms:
LVQ 2.1.: instability, trivial (stationary) classification
LVQ 1 : close to optimal asymptotic generalization
LVQ + : min-max solution w.r.t. asymptotic generalization
VQ : symmetry breaking, representation
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Perspectives
Self-Organizing Maps (SOM)
(many) N-dim. prototypes form a (low) d-dimensional grid
representation of data in a topology preserving map
neighborhood preserving SOM ↔ distance-based Neural Gas
Generalized Relevance LVQ [Hammer & Villmann]
adaptive metrics, e.g. the weighted distance measure

d_\lambda(\vec{w}, \vec{\xi}) = \sum_{i=1}^{N} \lambda_i \left( \xi_i - w_i \right)^2

with relevance factors λ_i adapted during training
applications
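A one-line sketch of such a relevance-weighted distance, assuming NumPy (function name ours):

import numpy as np

def weighted_distance(w, xi, lam):
    """Adaptive (relevance-weighted) squared distance:
    d_lambda(w, xi) = sum_i lam[i] * (xi[i] - w[i])**2, lam >= 0."""
    return float(np.sum(lam * (xi - w) ** 2))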
