
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
The Dynamics of Learning Vector Quantization
Rijksuniversiteit Groningen
Mathematics and Computing Science
Michael Biehl, Anarta Ghosh
TU Clausthal-Zellerfeld
Institute of Computing Science
Barbara Hammer
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Introduction
prototype-based learning from example data: representation, classification
Vector Quantization (VQ)
Learning Vector Quantization (LVQ)

The dynamics of learning
a model situation: randomized data
learning algorithms for VQ and LVQ
analysis and comparison: dynamics, success of learning

Summary
Outlook
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping of similar data in clusters
assignment of a feature vector ξ to the closest prototype w
(similarity or distance measure, e.g. Euclidean distance)
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
unsupervised competitive learning
initialize K prototype vectors
present a single example
identify the closest prototype, i.e. the so-called winner
move the winner even closer towards the example

intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function ...
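To make the procedure concrete, here is a minimal Python sketch of a single winner-takes-all step (the function name wta_step and the use of NumPy are our own choices, not part of the original slides):

import numpy as np

def wta_step(prototypes, xi, eta):
    """One unsupervised competitive-learning step: move the closest
    prototype (the 'winner') towards the presented example xi."""
    dists = np.sum((prototypes - xi) ** 2, axis=1)   # squared Euclidean distances
    winner = np.argmin(dists)                        # index of the closest prototype
    prototypes[winner] += eta * (xi - prototypes[winner])
    return prototypes, winner

# usage: K=3 prototypes in N=2 dimensions, one random example
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2))
w, j = wta_step(w, rng.normal(size=2), eta=0.1)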
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
quantization error

H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} d\!\left( \vec{w}_j, \vec{\xi}^{\,\mu} \right) \prod_{k \neq j} \Theta\!\left( d_k^{\mu} - d_j^{\mu} \right), \qquad d_j^{\mu} = \left( \vec{\xi}^{\,\mu} - \vec{w}_j \right)^2

prototypes w_j (j = 1, ..., K), data ξ^μ (μ = 1, ..., P); here: (squared) Euclidean distance
the product of Θ-functions is non-zero only if w_j is the winner, i.e. the prototype closest to ξ^μ

aim: faithful representation (in general: clustering)
the result depends on
- the number of prototype vectors
- the distance measure / metric used
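The quantization error defined above can be evaluated directly; a short sketch, again assuming the data and prototypes are NumPy arrays (function name ours):

import numpy as np

def quantization_error(prototypes, data):
    """H_VQ: sum over all examples of the squared Euclidean distance
    to the closest (winning) prototype."""
    # pairwise squared distances, shape (P, K)
    d = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return np.sum(np.min(d, axis=1))   # only the winner contributes per example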
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Learning Vector Quantization (LVQ)
aim:
classification of data
learning from examples
Learning: choice of prototypes according to example data
example situation: 3 classes, 3 prototypes
classification: assignment of a vector ξ to the class of the closest prototype w
aim: generalization ability, i.e. correct classification of novel data after training
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
prominent example [Kohonen]: LVQ 2.1
- initialize prototype vectors (for the different classes)
- present a single example
- identify the closest correct and the closest wrong prototype
- move the corresponding winners towards / away from the example, respectively
known convergence / stability problems, e.g. for infrequent classes
mostly: heuristically motivated variations of competitive learning
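A minimal sketch of one LVQ 2.1 step as described above, for an arbitrary set of labelled prototypes (a hedged illustration, not Kohonen's reference implementation; names ours):

import numpy as np

def lvq21_step(prototypes, labels, xi, xi_label, eta):
    """Move the closest prototype with the correct label towards xi and
    the closest prototype with a wrong label away from xi."""
    d = np.sum((prototypes - xi) ** 2, axis=1)
    correct = np.where(labels == xi_label)[0]
    wrong = np.where(labels != xi_label)[0]
    j_c = correct[np.argmin(d[correct])]   # closest correct prototype
    j_w = wrong[np.argmin(d[wrong])]       # closest wrong prototype
    prototypes[j_c] += eta * (xi - prototypes[j_c])
    prototypes[j_w] -= eta * (xi - prototypes[j_w])
    return prototypes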
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ algorithms ...
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - . . .
- appear plausible, intuitive, flexible
- are fast, easy to implement
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
illustration: microscopic images of (pig) semen cells after freezing
and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
healthy cells
damaged cells
prototypes obtained by LVQ1
illustration: microscopic images of (pig) semen cells after freezing
and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ algorithms ...
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
dynamics, convergence properties,
performance w.r.t. generalization, etc.
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim: - contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
model situation: two clusters of N-dimensional data
random vectors ξ ∈ R^N generated according to a mixture of two Gaussians:

P(\vec{\xi}) = \sum_{\sigma=\pm 1} p_\sigma P(\vec{\xi}\,|\,\sigma), \qquad P(\vec{\xi}\,|\,\sigma) = (2\pi)^{-N/2} \exp\!\left[ -\tfrac{1}{2} \left( \vec{\xi} - \ell \vec{B}_\sigma \right)^2 \right]

orthonormal center vectors: B_+, B_- ∈ R^N with (B_±)² = 1 and B_+ · B_- = 0
prior weights of the classes: p_+, p_- with p_+ + p_- = 1
separation ℓ between the cluster centers ℓB_+ (weight p_+) and ℓB_- (weight p_-)
independent components: ⟨ξ_j⟩ = ℓ B_{σ,j} and ⟨ξ_j²⟩ − ⟨ξ_j⟩² = 1, hence ⟨ξ · ξ⟩ = ℓ² + N
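A short sketch of how data from this model could be generated, assuming NumPy and writing the separation ℓ as ell (the function name and the particular choice of B_± along the first two coordinate axes are ours):

import numpy as np

def sample_cluster_data(P, N, ell, p_plus, rng):
    """Draw P examples from the binary mixture of isotropic Gaussians with
    unit variance, centers ell*B_plus and ell*B_minus, priors p_plus, 1-p_plus."""
    B_plus = np.zeros(N); B_plus[0] = 1.0      # orthonormal center vectors
    B_minus = np.zeros(N); B_minus[1] = 1.0
    sigma = np.where(rng.random(P) < p_plus, 1, -1)           # class labels +/-1
    centers = np.where(sigma[:, None] == 1, B_plus, B_minus)  # B_sigma per example
    xi = ell * centers + rng.normal(size=(P, N))
    return xi, sigma

xi, sigma = sample_cluster_data(P=400, N=200, ell=1.0, p_plus=0.6,
                                rng=np.random.default_rng(1))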
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
high-dimensional data (formally: N → ∞)
example: 400 data points ξ^μ ∈ R^N with N = 200, ℓ = 1, p_+ = 0.6

[scatter plot: projections y_± = B_± · ξ^μ into the plane of the center vectors B_+, B_- (240 examples of cluster +, 160 of cluster −)]
[scatter plot: projections x_{1,2} = w_{1,2} · ξ^μ onto two independent random directions w_{1,2}]

Note: the model serves to study the typical behavior of LVQ algorithms; it is not meant as a basis for density-estimation based classification.
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
dynamics of on-line training
sequence of independent random data: ξ^μ, μ = 1, 2, 3, ..., drawn according to P(ξ)

update of prototype vectors:

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, f_s\!\left[ d_+^{\mu}, d_-^{\mu}, \sigma^{\mu}, \ldots \right] \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right), \qquad d_s^{\mu} = \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)^2, \quad s = \pm 1

η : learning rate, step size
f_s[...] : modulation function, controls competition, direction of the update etc.
(ξ^μ − w_s^{μ-1}) : change of the prototype towards or away from the current data

above examples:
- unsupervised Vector Quantization ("The Winner Takes It All", classes irrelevant/unknown):  f_s = Θ(d_{-s} − d_s)
- Learning Vector Quantization 2.1 (here: two prototypes, no explicit competition):  f_s = s σ^μ, i.e. +1 if the class label σ^μ of the example agrees with the prototype label s (class correct), −1 otherwise (class wrong)
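In code, the generic update with the two modulation functions quoted above might look as follows; this is a sketch for the two-prototype setting of the model, with names of our own choosing:

import numpy as np

def online_step(w, xi, sigma, eta, rule):
    """One on-line step for two prototypes w[+1], w[-1] (stored as a dict).
    rule='VQ'   : f_s = Theta(d_{-s} - d_s)   (winner takes all)
    rule='LVQ21': f_s = s * sigma             (no explicit competition)"""
    N = xi.size
    d = {s: np.sum((xi - w[s]) ** 2) for s in (+1, -1)}
    for s in (+1, -1):
        if rule == 'VQ':
            f = 1.0 if d[-s] > d[s] else 0.0
        elif rule == 'LVQ21':
            f = float(s * sigma)
        else:
            raise ValueError("unknown rule")
        w[s] = w[s] + (eta / N) * f * (xi - w[s])
    return w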
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: 2N prototype components → 7 order parameters)

R_{sτ} = w_s · B_τ : projections of the prototypes into the (B_+, B_-)-plane
Q_{st} = w_s · w_t : lengths and relative position of the prototypes,   s, t, τ ∈ {+1, −1}

the generic update w_s^μ = w_s^{μ-1} + (η/N) f_s[d_+^μ, d_-^μ, σ^μ, ...] (ξ^μ − w_s^{μ-1}) yields recursions:

R_{s\tau}^{\mu} = R_{s\tau}^{\mu-1} + \frac{\eta}{N}\, f_s \left( y_\tau^{\mu} - R_{s\tau}^{\mu-1} \right)

Q_{st}^{\mu} = Q_{st}^{\mu-1} + \frac{\eta}{N} \left[ f_s \left( x_t^{\mu} - Q_{st}^{\mu-1} \right) + f_t \left( x_s^{\mu} - Q_{st}^{\mu-1} \right) \right] + \frac{\eta^2}{N^2}\, f_s f_t \; \vec{\xi}^{\,\mu} \cdot \vec{\xi}^{\,\mu}

the random vector ξ^μ enters only through its projections
x_s = w_s^{μ-1} · ξ^μ   and   y_τ = B_τ · ξ^μ
and through the distances d_s^μ = ξ^μ · ξ^μ − 2 x_s + Q_{ss}^{μ-1}
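On the simulation side, these order parameters are simply measured from the current prototype configuration; a minimal sketch (names ours):

import numpy as np

def order_parameters(w, B):
    """R[(s,tau)] = w_s . B_tau and Q[(s,t)] = w_s . w_t for prototypes
    w = {+1: ..., -1: ...} and center vectors B = {+1: ..., -1: ...}."""
    R = {(s, t): float(np.dot(w[s], B[t])) for s in (+1, -1) for t in (+1, -1)}
    Q = {(s, t): float(np.dot(w[s], w[t])) for s in (+1, -1) for t in (+1, -1)}
    return R, Q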

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
written out, the projections read
x_s = w_s^{μ-1} · ξ^μ = Σ_{j=1}^{N} w_{s,j} ξ_j^μ,   y_τ = B_τ · ξ^μ = Σ_{j=1}^{N} B_{τ,j} ξ_j^μ

in the thermodynamic limit N → ∞, for a random vector ξ drawn according to P(ξ|σ), the quantities x_s and y_τ become correlated Gaussian random variables, completely specified in terms of first and second moments (superscripts μ−1 omitted):

⟨x_s⟩ = ℓ R_{sσ},   ⟨y_τ⟩ = ℓ δ_{τσ}
⟨x_s x_t⟩ − ⟨x_s⟩⟨x_t⟩ = Q_{st}
⟨x_s y_τ⟩ − ⟨x_s⟩⟨y_τ⟩ = R_{sτ}
⟨y_ρ y_τ⟩ − ⟨y_ρ⟩⟨y_τ⟩ = δ_{ρτ}   (= 1 if ρ = τ, 0 else)

2. average over the current example
averaging over the current example, ⟨···⟩ = Σ_{σ=±1} p_σ ⟨···⟩_σ, yields recursions that are closed in {R_{sτ}, Q_{st}}
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
3. self-averaging properties
the characteristic quantities R_{sτ}, Q_{st}
- depend on the random sequence of example data
- but their variance vanishes with N (here: ∝ 1/N)
→ the learning dynamics is completely described in terms of the averages

4. continuous learning time
α = μ / N : number of examples, i.e. learning steps, per degree of freedom
the averaged recursions become a set of coupled ordinary differential equations
→ evolution of the projections R_{sτ}(α), Q_{st}(α)
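Written out, the resulting system of ODEs takes the generic form below; this is our reconstruction from the recursions above, with ⟨···⟩ denoting the average over the Gaussian statistics of x_s, y_τ and over both clusters, and using ξ · ξ ≈ N for large N:

\frac{dR_{s\tau}}{d\alpha} = \eta \,\bigl\langle f_s \,( y_\tau - R_{s\tau} ) \bigr\rangle, \qquad
\frac{dQ_{st}}{d\alpha} = \eta \,\bigl\langle f_s \,( x_t - Q_{st} ) + f_t \,( x_s - Q_{st} ) \bigr\rangle + \eta^2 \,\bigl\langle f_s f_t \bigr\rangle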
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
5. learning curve
generalization error ε_g(α) after training with μ = αN examples

probability for misclassification of a novel example:

\epsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-
= p_+ \, \Phi\!\left( \frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}} \right) + p_- \, \Phi\!\left( \frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}} \right)

(Φ: cumulative distribution function of a standard normal; a short code sketch follows below)

investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions:
- time-dependent learning rate η(α)
- variational optimization w.r.t. the modulation function f_s[...], i.e. maximize −dε_g/dα
- ...
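For given order parameters, the formula above is straightforward to evaluate; a sketch using SciPy's normal CDF (function name ours, ℓ passed as ell, R and Q stored as dictionaries keyed by (±1, ±1)):

from math import sqrt
from scipy.stats import norm

def generalization_error(R, Q, ell, p_plus):
    """epsilon_g from the order parameters R[(s,tau)], Q[(s,t)],
    following the formula on this slide."""
    denom = 2.0 * sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    arg_plus = (Q[(1, 1)] - Q[(-1, -1)] - 2.0 * ell * (R[(1, 1)] - R[(-1, 1)])) / denom
    arg_minus = (Q[(-1, -1)] - Q[(1, 1)] - 2.0 * ell * (R[(-1, -1)] - R[(1, -1)])) / denom
    return p_plus * norm.cdf(arg_plus) + (1.0 - p_plus) * norm.cdf(arg_minus)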
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
optimal classification with minimal generalization error
in the model situation (equal variances of the clusters): separation of the classes by the plane with
p_+ P(ξ | +1) = p_- P(ξ | −1)

[sketch: cluster centers ℓB_+ (weight p_+) and ℓB_- (weight p_- > p_+) with the optimal decision boundary]

[plot: excess error / minimal ε_g (0 ... 0.5) as a function of the prior weight p_+ (0 ... 1), for separations ℓ = 0, 1, 2]
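As a sanity check, the minimal error of this decision rule can be estimated by Monte Carlo; a rough sketch (names ours), using the fact that for equal cluster variances the rule reduces to comparing ℓ(B_+ − B_-) · ξ with ln(p_-/p_+):

import numpy as np

def bayes_error_mc(ell, p_plus, P=200000, seed=0):
    """Monte Carlo estimate of the minimal generalization error: assign xi to
    class +1 iff p_plus * P(xi|+1) > p_minus * P(xi|-1)."""
    rng = np.random.default_rng(seed)
    p_minus = 1.0 - p_plus
    sigma = np.where(rng.random(P) < p_plus, 1, -1)
    # only the two projections B_+ . xi and B_- . xi matter for the decision
    u = ell * (sigma == 1) + rng.normal(size=P)    # B_+ . xi
    v = ell * (sigma == -1) + rng.normal(size=P)   # B_- . xi
    predicted = np.where(ell * (u - v) > np.log(p_minus / p_plus), 1, -1)
    return np.mean(predicted != sigma)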

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ 2.1: update the closest correct and the closest wrong prototype,

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, s\,\sigma^{\mu} \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right), \qquad s = \pm 1

(analytical) integration for w_s(0) = 0, with the prior weights parameterized as p_σ = (1 + σm)/2, m > 0:

R_{++}(\alpha) = \frac{\ell\,(1+m)}{2m}\left(1 - e^{-m\eta\alpha}\right), \qquad R_{+-}(\alpha) = -\frac{\ell\,(1-m)}{2m}\left(1 - e^{-m\eta\alpha}\right)
R_{-+}(\alpha) = \frac{\ell\,(1+m)}{2m}\left(1 - e^{+m\eta\alpha}\right), \qquad R_{--}(\alpha) = -\frac{\ell\,(1-m)}{2m}\left(1 - e^{+m\eta\alpha}\right)
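For completeness, the ODEs behind this analytical result, in our reconstruction (averaging f_s = sσ^μ over the mixture and writing m = p_+ − p_-):

\frac{dR_{s\tau}}{d\alpha} = \eta\, s \left( \ell\, \tau\, p_\tau - m\, R_{s\tau} \right),
\qquad \text{with solution for } R_{s\tau}(0)=0: \quad
R_{s\tau}(\alpha) = \frac{\ell\, \tau\, p_\tau}{m} \left( 1 - e^{-\,s\, m\, \eta\, \alpha} \right)

so that the prototype with label s = −1 (the weaker class for m > 0) is repelled and diverges exponentially, reproducing the four expressions above.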
[Seo, Obermayer]: LVQ 2.1 corresponds to a cost function based on likelihood ratios

for m > 0, the projections R_{++}, R_{+-} and Q_{++} remain finite, while R_{-+}, R_{--}, Q_{+-} and Q_{--} diverge with increasing α

[plot: order parameters vs. α, theory and simulation (N=100), p_+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
problem: instability of the algorithm due to the repulsion of wrong prototypes
trivial classification for α → ∞: ε_g → min{p_+, p_-}, i.e. all data end up assigned to the more frequent class

[sketch: clusters with weights p_- and p_+ > p_-; the prototype of the weaker class is repelled]

strategies:
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, but the system remains unstable
- Robust Soft Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function;
  limiting case "learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  → slow learning, poor generalization
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
The winner takes it all
I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership of the example

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) s\,\sigma^{\mu} \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)

[plots: order parameters R_{sτ}(α) and Q_{st}(α), numerical integration for w_s(0) = 0; theory and simulation (N = 200), p_+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 independent runs]

[plot: trajectories of the prototypes w_+, w_- in the (B_+, B_-)-plane for α = 20, 40, ..., 140; dotted: optimal decision boundary, solid: asymptotic position]
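For reference, the LVQ1 modulation function implied by this update, as a tiny self-contained sketch (names ours):

def f_lvq1(s, sigma, d_plus, d_minus):
    """LVQ1 modulation: the winner is moved towards (label correct, +1)
    or away from (label wrong, -1) the example; the loser is not updated."""
    d_s, d_other = (d_plus, d_minus) if s == 1 else (d_minus, d_plus)
    is_winner = d_s < d_other
    return s * sigma if is_winner else 0.0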
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
learning curve ε_g(α)   (p_+ = 0.2, ℓ = 1.2)

[plot: ε_g(α) for learning rates η = 2.0, 0.4, 0.2; ε_g between ≈ 0.14 and ≈ 0.26 for α up to 300]

- role of the learning rate: the stationary generalization error ε_g(α → ∞) grows linearly with η
- this suggests a variable, time-dependent rate η(α) !?
- well-defined asymptotics for η → 0, ηα → ∞ (the ODEs are linear in η)

[plot: min. ε_g(α → ∞) and the limit η → 0, ηα → ∞: suboptimal]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
The winner takes it all
II) LVQ+ (only positive steps, no repulsion): the winner is updated only if its label is correct

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) \delta_{s,\sigma^{\mu}} \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)

LVQ+ is equivalent to VQ within the classes (w_s is updated only from examples of class s)

[plot: asymptotic configuration of w_+, w_- relative to B_+, B_- for p_+ = 0.2, ℓ = 1.2, η = 1.2; it is symmetric about (B_+ + B_-)/2]

hence the classification scheme and the achieved generalization error are independent of the prior weights p_± (and optimal for p_± = 1/2)
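Correspondingly, a sketch of the LVQ+ modulation function (names ours):

def f_lvq_plus(s, sigma, d_plus, d_minus):
    """LVQ+ modulation: update the winner only if its label is correct;
    there is no repulsive step for wrongly labelled winners."""
    d_s, d_other = (d_plus, d_minus) if s == 1 else (d_minus, d_plus)
    return 1.0 if (d_s < d_other and s == sigma) else 0.0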
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
asymptotic generalization error in the limit η → 0, ηα → ∞, as a function of the prior weight p_+:

- LVQ 2.1 : trivial assignment to the more frequent class, ε_g = min{p_+, p_-}
- LVQ 1   : here close to the optimal classification
- LVQ+    : min-max solution, p_±-independent classification

[plot: ε_g vs. p_+ for LVQ 2.1, LVQ 1 and LVQ+, compared with the optimal classification]

[plot: learning curves ε_g(α) for LVQ 1 and LVQ+ (p_+ = 0.2, ℓ = 1.0, η = 1.0)]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Vector Quantization
competitive learning: the class membership is unknown, or identical for all data; only the winner w_s is updated

\vec{w}_s^{\,\mu} = \vec{w}_s^{\,\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) \left( \vec{\xi}^{\,\mu} - \vec{w}_s^{\,\mu-1} \right)

[plot: learning curves ε_g(α) for VQ, LVQ+ and LVQ 1, numerical integration for w_s(0) ≈ 0 (p_+ = 0.2, ℓ = 1.0, η = 1.2)]

[plot: order parameters R_{++}, R_{+-}, R_{-+}, R_{--} vs. α (0 ... 300)]

the system is invariant under exchange of the prototypes → weakly repulsive fixed points
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
interpretations:
- VQ, unsupervised learning from unlabelled data
- LVQ with two prototypes of the same class (identical labels)
- LVQ with different classes, but the labels are not used in training

[plot: asymptotic ε_g (η → 0, ηα → ∞) as a function of p_+]

for p_+ ≈ 0, p_- ≈ 1 : low quantization error, but high generalization error ε_g

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
work in progress, outlook
regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
model: different cluster variances, more clusters/prototypes
optimized procedures: learning rate schedules,
variational approach / density estimation / Bayes optimal on-line
several classes and prototypes

Summary
prototype-based learning
Vector Quantization and Learning Vector Quantization
a model scenario: two clusters, two prototypes
dynamics of online training
comparison of algorithms:
LVQ 2.1.: instability, trivial (stationary) classification
LVQ 1 : close to optimal asymptotic generalization
LVQ + : min-max solution w.r.t. asymptotic generalization
VQ : symmetry breaking, representation
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Perspectives
Self-Organizing Maps (SOM)
(many) N-dim. prototypes form a (low) d-dimensional grid
representation of data in a topology preserving map
neighborhood preserving SOM ↔ distance-based Neural Gas
Generalized Relevance LVQ [Hammer & Villmann]
adaptive metrics, e.g. the weighted distance measure

d_\lambda(\vec{w}, \vec{\xi}) = \sum_{i=1}^{N} \lambda_i \left( \xi_i - w_i \right)^2

with relevance factors λ_i adapted during training
applications
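A one-line sketch of such a relevance-weighted distance, assuming NumPy (function name ours):

import numpy as np

def weighted_distance(w, xi, lam):
    """Adaptive (relevance-weighted) squared distance:
    d_lambda(w, xi) = sum_i lam[i] * (xi[i] - w[i])**2, lam >= 0."""
    return float(np.sum(lam * (xi - w) ** 2))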
