4 Retro Propagation

Neural networks for learning
ESSEC, 28 June 2002

Artificial neural networks

Gradient backpropagation

S. Canu,
PSI laboratory, INSA de Rouen
Information Systems for the Environment team

asi.insa-rouen.fr/~scanu
History
1936: the Turing machine
1943: the formal neuron (McCulloch & Pitts)
1948: automata networks (von Neumann)
1949: the first learning rule (Hebb)
1958-62: the perceptron (Rosenblatt)
1960: the Adaline (Widrow & Hoff)
1969: Perceptrons (Minsky & Papert)
the limits of the perceptron
more complex architectures are needed,
but how can they be trained? Nobody knows!
1974: backpropagation (Werbos)
no success!?!?
History (continued)
1986: backpropagation (Rumelhart & McClelland)
new neural network architectures
applications:
- handwriting recognition
- speech recognition/synthesis
- vision (image processing)

1990: the information society
new applications
- information retrieval/filtering on the Web
- information extraction / technology watch
- multimedia (indexing, …)
- data mining
need to combine different models
Outline
Reminders:
Stochastic least squares
Gradient algorithms
Multilayer perceptron

Principle of backpropagation

Backpropagation algorithms
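Stochastic least squares
ADALINE (Widrow & Hoff 1960)
$$J(D) = \sum_{i=1}^{n}\big(D(x_i) - y_i\big)^2 \quad\text{and}\quad D(x) = w'x + b$$
$$J(w,b) = \sum_{i=1}^{n}\big(w'x_i + b - y_i\big)^2 = \sum_{i=1}^{n}\big(W X_i - y_i\big)^2 \quad\text{with } W = (w', b),\ X_i = (x_i', 1)'$$

Iterative gradient algorithm
With a transfer function φ:
$$J(W) = \sum_{i=1}^{n}\big(\varphi(W X_i) - y_i\big)^2$$
$$\Delta = \frac{\partial J}{\partial W}(W) = 2\sum_{i=1}^{n}\big(\varphi(W X_i) - y_i\big)\,\varphi'(W X_i)\,X_i = 0$$
Solving this equation directly is impossible (!), hence an iterative method:
W_init
Repeat
W_new = W_old - ρ Δ   (ρ > 0: step size)
until no example is misclassified or the cost stops changing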
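Below is a minimal Python sketch of this iterative method. It assumes the linear ADALINE case φ(a) = a with a single scalar input. The function name `adaline_gd`, the step size ρ = 0.02, the tolerance and the toy data are illustrative choices and do not come from the slides. A regression task has no "misclassified" examples, so only the second stopping test (the cost stops changing) is used.

```python
def adaline_gd(xs, ys, rho=0.02, tol=1e-12, max_iter=10_000):
    """Iterative gradient descent on J(w, b) = sum_i (w*x_i + b - y_i)^2 (scalar inputs)."""
    w, b = 0.0, 0.0                            # W_init
    cost_old = float("inf")
    for _ in range(max_iter):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        cost = sum(r * r for r in residuals)
        if abs(cost_old - cost) < tol:         # stop when the cost no longer changes
            break
        cost_old = cost
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs))   # dJ/dw
        grad_b = 2.0 * sum(residuals)                              # dJ/db
        w -= rho * grad_w                      # W_new = W_old - rho * Delta
        b -= rho * grad_b
    return w, b
```

On points generated by y = 2x + 1, for example `adaline_gd([0, 1, 2, 3], [1, 3, 5, 7])`, the iterates approach w = 2 and b = 1. A step size that is too large for this data (here above roughly 0.06) would make the iterates diverge instead.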
Gradient algorithm
The gradient is orthogonal to the iso-cost lines (a Taylor-expansion argument).
[Figure: illustration in the (w1, w2) plane: iso-cost lines J(W) = constant around the cost minimum, with the gradient direction J'(W).]
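The Taylor argument fits in one line. It uses a first-order expansion, and h, a symbol introduced here, denotes a small displacement:

```latex
J(W + h) \approx J(W) + h' \, \frac{\partial J}{\partial W}(W)
\quad\text{and}\quad J(W + h) = J(W) \text{ along an iso-cost line}
\;\Longrightarrow\; h' \, \frac{\partial J}{\partial W}(W) = 0
```

Every direction h tangent to an iso-cost line is therefore orthogonal to the gradient, and -∂J/∂W points in the direction of steepest local decrease of the cost.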

Gradient algorithm
Illustration in the (w, J(w)) plane: gradient descent
[Figure: the cost J(w) as a function of w, its minimum, and the gradient direction J'(W).]
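The gradient:
$$\frac{\partial J(W)}{\partial W} = 2\sum_{i=1}^{n}\big(\varphi(W x_i) - y_i\big)\,\varphi'(W x_i)\,x_i$$
3 solutions:
Linear approximation (Adaline)
$$\frac{\partial J(W)}{\partial W} = 2\sum_{i=1}^{n}\big(W x_i - y_i\big)\,x_i$$
Perceptron: φ' = 1
$$\frac{\partial J(W)}{\partial W} = 2\sum_{i=1}^{n}\big(\varphi(W x_i) - y_i\big)\,x_i$$
Formal neuron: φ is replaced by a differentiable function,
e.g. φ(x) = tanh(x), a sigmoid function
$$\frac{\partial J(W)}{\partial W} = 2\sum_{i=1}^{n}\big(\varphi(W x_i) - y_i\big)\,\varphi'(W x_i)\,x_i$$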
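The following small Python sketch compares the three solutions for one neuron with φ = tanh. The choice of tanh is an assumption; any differentiable φ would do. The helper names `gradients` and `cost` are hypothetical. Only the third gradient is the exact derivative of J. The first two are the approximations listed on the slide.

```python
import math


def gradients(W, xs, ys):
    """The three gradients of J(W) = sum_i (phi(W x_i) - y_i)^2 for one neuron, phi = tanh.

    W and each x_i are lists of floats; x_i already contains the constant 1 for the bias.
    """
    n = len(W)
    g_adaline = [0.0] * n      # linear approximation: phi(a) replaced by a
    g_perceptron = [0.0] * n   # perceptron: phi' taken equal to 1
    g_neuron = [0.0] * n       # formal neuron: exact gradient, phi' = 1 - tanh^2
    for x, y in zip(xs, ys):
        a = sum(wk * xk for wk, xk in zip(W, x))
        out = math.tanh(a)
        for k in range(n):
            g_adaline[k] += 2.0 * (a - y) * x[k]
            g_perceptron[k] += 2.0 * (out - y) * x[k]
            g_neuron[k] += 2.0 * (out - y) * (1.0 - out * out) * x[k]
    return g_adaline, g_perceptron, g_neuron


def cost(W, xs, ys):
    """J(W) for the tanh neuron."""
    return sum((math.tanh(sum(wk * xk for wk, xk in zip(W, x))) - y) ** 2
               for x, y in zip(xs, ys))
```

A central finite difference of `cost` is expected to match only the third gradient. The other two are the cheaper approximations behind the Adaline and perceptron learning rules.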
Multilayer perceptron
Feedforward network
(1986)

Transfer function tanh(.) (except for the output layer, which is linear)
Usual (supervised) learning method:
gradient backpropagation
[Figure: feedforward network with inputs x_1, …, x_i, …, x_{n0} and output y.]
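Notation
Bias: handled as an extra input x_0 = 1
$$x^{(1)}_j = f\Big(\sum_{i=1}^{n_0} w_{ji}\,x_i + b_j\Big) = f\Big(\sum_{i=0}^{n_0} w_{ji}\,x_i\Big) \quad\text{with } x_0 = 1,\ w_{j0} = b_j$$
the same holds for every layer (e.g. an MLP with one hidden layer):
W_1 = [w_ji], i = 1:n_0, j = 1:n_1
W_2 = [w_kj], j = 1:n_1, k = 1:n_2
[Figure: MLP with one hidden layer; the inputs x_i (and x_0 = 1) reach the hidden units through the weights w_ji, and the hidden outputs (and x^(1)_0 = 1) reach the outputs y_k through the weights w_kj.]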
Propagation
The network outputs are computed by propagating the values of x from layer to layer:
$$a^{(1)}_j = \sum_{i=0}^{n_0} w_{ji}\,x_i, \qquad x^{(1)}_j = f\big(a^{(1)}_j\big)$$
$$a^{(2)}_k = \sum_{j=0}^{n_1} w_{kj}\,x^{(1)}_j, \qquad y_k = g\big(a^{(2)}_k\big)$$
with i = 1:n_0, j = 1:n_1, k = 1:n_2, and x_0 = x^{(1)}_0 = 1 for the biases
[Figure: inputs x_i, weights w_ji, hidden outputs x^(1)_j, weights w_kj, outputs y_k.]
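Propagation algorithm
function y = propag(x, W1, W2)
% x: n-by-n0 matrix (one example per row)
% W1: (n0+1)-by-n1, W2: (n1+1)-by-n2; the last row holds the bias weights
n  = size(x, 1);            % number of examples
a1 = [x ones(n,1)] * W1;    % hidden pre-activations a(1)
x1 = tanh(a1);              % hidden outputs x(1) = f(a(1))
a2 = [x1 ones(n,1)] * W2;   % output pre-activations a(2)
y  = a2;                    % linear output layer: g(a) = a
end

Vectorized over the examples
(if x is a matrix, it works!)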
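The same forward pass can be transcribed into Python with plain lists. This is a sketch, not the author's code. As in the MATLAB version, the example is a row vector multiplied on the left, so W1[i][j] links input i to hidden unit j. To match the worked example later in the slides, this sketch handles a single example and has no bias terms.

```python
import math


def propag(x, W1, W2):
    """Forward pass for one example: y = tanh(x W1) W2 (linear output, no biases).

    Layout as in the Matlab code: W1[i][j] links input i to hidden unit j,
    W2[j][k] links hidden unit j to output k.
    """
    n0, n1, n2 = len(x), len(W1[0]), len(W2[0])
    a1 = [sum(x[i] * W1[i][j] for i in range(n0)) for j in range(n1)]   # a(1)
    x1 = [math.tanh(a) for a in a1]                                      # x(1) = f(a(1))
    a2 = [sum(x1[j] * W2[j][k] for j in range(n1)) for k in range(n2)]  # a(2)
    return a2                                                            # y = g(a(2)) = a(2)
```

For example, `propag([0.5, 1.0], [[0.5, 0.5], [0.5, 0.5]], [[1.0, 1.0], [1.0, 1.0]])` returns approximately [1.2703, 1.2703]. Each hidden unit receives 0.75, and tanh(0.75) ≈ 0.6351.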
Error computation
Cost function:
an example x = [x_1 ... x_{n0}] is presented (with y^des the desired output)
the corresponding output y = [y_1 ... y_{n2}] is computed

error:
$$e_k = y_k - y^{des}_k$$
cost associated with the example:
$$J_{(\text{example})} = \frac{1}{2}\sum_{k=1}^{n_2} e_k^2$$
global cost:
$$J = \sum_{l=1}^{n} J_{(\text{example } l)}$$
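Here is a direct transcription of these two costs. It is a sketch, and the function names are hypothetical.

```python
def example_cost(y, y_des):
    """Errors e_k = y_k - y_des_k and cost J(example) = 1/2 * sum_k e_k^2."""
    errors = [yk - dk for yk, dk in zip(y, y_des)]
    return 0.5 * sum(e * e for e in errors), errors


def global_cost(outputs, targets):
    """J = sum over the examples l of J(example l)."""
    return sum(example_cost(y, d)[0] for y, d in zip(outputs, targets))
```

Take the output y = [1.2703, 1.2703] and the target [0.5, 1]. The errors are [0.7703, 0.2703] and the example cost is about 0.3332.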
Gradient computation
w_ji and w_kj are updated with a delta rule:
$$\Delta w = -\rho\,\frac{\partial J}{\partial w}$$
Problem: computing ∂J/∂w_ji and ∂J/∂w_kj, for the cost
$$J = \frac{1}{2}\sum_{k=1}^{n_2}\big(y_k - y^{des}_k\big)^2$$
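Output layer
Computing ∂J/∂w_kj for a fixed example, with
$$a^{(2)}_k = \sum_{j=0}^{n_1} w_{kj}\,x^{(1)}_j, \qquad y_k = g\big(a^{(2)}_k\big):$$
$$\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial y_k}\,\frac{\partial y_k}{\partial a^{(2)}_k}\,\frac{\partial a^{(2)}_k}{\partial w_{kj}} = \big(y_k - y^{des}_k\big)\,g'\big(a^{(2)}_k\big)\,x^{(1)}_j$$
let
$$\mathrm{Err}_k = \frac{\partial J}{\partial a^{(2)}_k} = \big(y_k - y^{des}_k\big)\,g'\big(a^{(2)}_k\big)$$
then
$$\frac{\partial J}{\partial w_{kj}} = \mathrm{Err}_k\,x^{(1)}_j$$
[Figure: hidden output x^(1)_j connected by the weight w_kj to the output y_k; j = 1:n1, k = 1:n2.]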
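Hidden layer
Computing ∂J/∂w_ji for a fixed example, with
$$a^{(1)}_j = \sum_{i=0}^{n_0} w_{ji}\,x_i, \qquad x^{(1)}_j = f\big(a^{(1)}_j\big):$$
$$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial x^{(1)}_j}\,\frac{\partial x^{(1)}_j}{\partial a^{(1)}_j}\,\frac{\partial a^{(1)}_j}{\partial w_{ji}} = \frac{\partial J}{\partial x^{(1)}_j}\,f'\big(a^{(1)}_j\big)\,x_i$$
x^(1)_j influences J through every output:
$$\frac{\partial J}{\partial x^{(1)}_j} = \sum_{k=1}^{n_2}\frac{\partial J}{\partial a^{(2)}_k}\,\frac{\partial a^{(2)}_k}{\partial x^{(1)}_j} = \sum_{k=1}^{n_2}\mathrm{Err}_k\,w_{kj}$$
hence
$$\mathrm{Err}_j = \frac{\partial J}{\partial a^{(1)}_j} = \Big(\sum_{k=1}^{n_2}\mathrm{Err}_k\,w_{kj}\Big)\,f'\big(a^{(1)}_j\big), \qquad \frac{\partial J}{\partial w_{ji}} = \mathrm{Err}_j\,x_i$$
[Figure: input x_i connected by the weight w_ji to hidden unit j; i = 1:n0, j = 1:n1.]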
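Both formulas, ∂J/∂w_kj = Err_k x^(1)_j and ∂J/∂w_ji = Err_j x_i, can be checked against central finite differences. The sketch below makes three assumptions: f = tanh, a linear output (g' = 1) and no biases. The weights are indexed as in the notation, so W1[j][i] = w_ji and W2[k][j] = w_kj. All function names are illustrative.

```python
import math


def forward(x, W1, W2):
    """x1_j = tanh(sum_i w_ji x_i), y_k = sum_j w_kj x1_j (linear output, no biases)."""
    x1 = [math.tanh(sum(W1[j][i] * x[i] for i in range(len(x)))) for j in range(len(W1))]
    y = [sum(W2[k][j] * x1[j] for j in range(len(x1))) for k in range(len(W2))]
    return x1, y


def cost(x, y_des, W1, W2):
    """J = 1/2 sum_k (y_k - y_des_k)^2 for one example."""
    _, y = forward(x, W1, W2)
    return 0.5 * sum((yk - dk) ** 2 for yk, dk in zip(y, y_des))


def backprop(x, y_des, W1, W2):
    """Analytic gradients: dJ/dw_kj = Err_k x1_j and dJ/dw_ji = Err_j x_i."""
    x1, y = forward(x, W1, W2)
    err_k = [yk - dk for yk, dk in zip(y, y_des)]                 # g' = 1
    err_j = [sum(err_k[k] * W2[k][j] for k in range(len(W2))) * (1.0 - x1[j] ** 2)
             for j in range(len(x1))]                             # f' = 1 - tanh^2
    g2 = [[err_k[k] * x1[j] for j in range(len(x1))] for k in range(len(W2))]
    g1 = [[err_j[j] * x[i] for i in range(len(x))] for j in range(len(W1))]
    return g1, g2


def max_gap(x, y_des, W1, W2, eps=1e-6):
    """Largest difference between the analytic and central finite-difference gradients."""
    g1, g2 = backprop(x, y_des, W1, W2)
    gap = 0.0
    for W, G in ((W1, g1), (W2, g2)):
        for r in range(len(W)):
            for c in range(len(W[r])):
                old = W[r][c]
                W[r][c] = old + eps
                j_plus = cost(x, y_des, W1, W2)
                W[r][c] = old - eps
                j_minus = cost(x, y_des, W1, W2)
                W[r][c] = old                                     # restore the weight
                gap = max(gap, abs((j_plus - j_minus) / (2 * eps) - G[r][c]))
    return gap
```

On a small 2-2-2 network with arbitrarily chosen weights, the largest gap should be tiny, of the order of the finite-difference error. That is what one expects if the two derivations are right.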
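Backpropagation algorithm
function [GradW1, GradW2] = retropropag(x, yd, W1, W2)
% x: n-by-n0 inputs, yd: n-by-n2 desired outputs (one example per row)
n  = size(x, 1);
a1 = [x ones(n,1)] * W1;   x1 = tanh(a1);     % forward pass
a2 = [x1 ones(n,1)] * W2;  y  = a2;           % linear output layer

ERRk   = -(yd - y);                            % Err_k = (y - yd) .* g'(a2), with g' = 1
GradW2 = [x1 ones(n,1)]' * ERRk;
ERRj   = (ERRk * W2(1:end-1,:)') .* (1 - x1.*x1);  % bias row dropped; tanh' = 1 - x1.^2
GradW1 = [x ones(n,1)]' * ERRj;
end

% weight update (pas1, pas2: learning rates)
W1 = W1 - pas1 * GradW1;
W2 = W2 - pas2 * GradW2;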
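The next sketch checks the numbers of the worked example that follows. It is a Python transcription of `retropropag` for one example, with plain lists, no biases and the same x*W layout as the MATLAB code. It assumes a linear output layer, so Err_k = y_k - y_k^des with g' = 1, which is the value behind the example's ERRk = [0.7703 0.2703]. The helper `step` is hypothetical.

```python
import math


def retropropag(x, yd, W1, W2):
    """One-example backpropagation (no biases, linear output).

    W1[i][j] links input i to hidden unit j, W2[j][k] links hidden unit j
    to output k. Returns (GradW1, GradW2, y).
    """
    n0, n1, n2 = len(x), len(W2), len(W2[0])
    a1 = [sum(x[i] * W1[i][j] for i in range(n0)) for j in range(n1)]
    x1 = [math.tanh(a) for a in a1]
    y = [sum(x1[j] * W2[j][k] for j in range(n1)) for k in range(n2)]   # linear output

    err_k = [y[k] - yd[k] for k in range(n2)]                           # ERRk (g' = 1)
    grad_w2 = [[x1[j] * err_k[k] for k in range(n2)] for j in range(n1)]
    err_j = [sum(W2[j][k] * err_k[k] for k in range(n2)) * (1 - x1[j] ** 2)
             for j in range(n1)]                                        # ERRj
    grad_w1 = [[x[i] * err_j[j] for j in range(n1)] for i in range(n0)]
    return grad_w1, grad_w2, y


def step(W, G, rho):
    """Gradient step W <- W - rho * G, element by element."""
    return [[w - rho * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]
```

The example uses x = yd = [0.5, 1], W1 filled with 0.5, W2 filled with 1, and a learning rate of 0.5 for both layers. With these values, one call reproduces GradW2 ≈ [0.4893 0.1717 ; 0.4893 0.1717] and GradW1 ≈ [0.3104 0.3104 ; 0.6208 0.6208]. After one update the outputs become approximately [0.5242, 0.6344].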
Example 1/4
x = [0.5 1], y^des = [0.5 1]
W1 = [0.5 0.5 ; 0.5 0.5] (no bias)
W2 = [1 1 ; 1 1]
n0 = n1 = n2 = 2
[Figure: the 2-2-2 network; inputs x1 = 0.5, x2 = 1; outputs y1 = y2 = 1.2703.]
a^(1) = [0.75 0.75]
x^(1) = [0.6351 0.6351]
a^(2) = [1.2703 1.2703]
y = [1.2703 1.2703]

[Figure: the same network; inputs x1 = 0.5, x2 = 1; output errors err1 = 0.7703, err2 = 0.2703.]
ERRk = [0.7703 0.2703]
GradW2 = [0.4893 0.1717 ; 0.4893 0.1717]
ERRj = [0.6208 0.6208]
GradW1 =[0.3104 0.3104 ; 0.6208 0.6208]
Update of W1 and W2 (the numbers below correspond to a learning rate of 0.5 for both layers)

New propagation, and so on:

[Figure: after the update, the same inputs x1 = 0.5, x2 = 1 give the outputs y1 = 0.5242, y2 = 0.6344.]
w1 =[0.3448 0.3448 ; 0.1896 0.1896]
w2 =[0.7554 0.9142 ; 0.7554 0.9142]
y = [0.5242 0.6344]
[Figure: evolution of y1 and y2 over iterations 0 to 15 (vertical axis from 0.5 to 1.3).]
Batch / sequential gradient
There are 2 ways of applying the backpropagation algorithm:

batch:
the weights are updated after all the examples have been presented
computation and storage become heavy when there are many examples

sequential (on-line, stochastic):
the weights are updated after each example
the examples must be drawn at random
convergence is harder to control

Fewer than 5000 examples:
Matlab
More than 5000 examples:
SNNS, SN, or C code
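The difference between the two modes can be sketched on the simplest possible model, a least-squares line fit y = w x + b. This model is an assumption standing in for the network, and the name `fit_line`, the step size and the data are illustrative. The batch mode sums the gradient over all examples before making one update. The sequential mode updates after each example and visits the examples in a random order.

```python
import random


def fit_line(xs, ys, mode, rho, epochs, seed=0):
    """Least-squares fit of y = w*x + b, J = sum_i (w*x_i + b - y_i)^2.

    mode="batch": one update per epoch, gradient summed over all the examples.
    mode="sequential": one update per example, examples drawn in a random order.
    """
    rng = random.Random(seed)
    w = b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        if mode == "batch":
            gw = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in order)
            gb = sum(2.0 * (w * xs[i] + b - ys[i]) for i in order)
            w, b = w - rho * gw, b - rho * gb
        else:
            rng.shuffle(order)                        # draw the examples at random
            for i in order:
                r = w * xs[i] + b - ys[i]             # residual of example i
                w, b = w - rho * 2.0 * r * xs[i], b - rho * 2.0 * r
    return w, b
```

On noise-free data generated by y = 2x + 1, both modes end close to w = 2, b = 1. With noisy data and a constant step, the sequential mode would instead keep fluctuating around the solution unless the step is decreased. This is one source of the convergence problems mentioned above.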
Learning rate
Learning rate:
too small = slow convergence towards the solution
too large = risk of oscillations

common heuristics:
decrease the learning rate progressively
by hand
according to the shape of the error surface
approximations:
first order
Backpropagation with momentum
Delta-Bar-Delta, Rprop, ...
second order
Quickprop
Levenberg-Marquardt
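First order 1/2
Momentum (Rumelhart et al. 1986)
$$\Delta w_{ji}(t) = -\rho\,\mathrm{Err}_j\,x_i + \alpha\,\Delta w_{ji}(t-1) \quad\text{with } |\alpha| < 1$$

Delta-Bar-Delta (Jacobs 1988)
an average gradient is computed:
$$\bar\delta_{ji}(t) = (1-\theta)\,\big(\mathrm{Err}_j x_i\big)(t) + \theta\,\bar\delta_{ji}(t-1)$$
the learning rate is modified according to the direction of the current gradient relative to the average gradient (u > 0, 0 < d < 1):
$$\rho_{ji}(t+1) = \begin{cases} \rho_{ji}(t) + u & \text{if } \big(\mathrm{Err}_j x_i\big)(t)\,\bar\delta_{ji}(t-1) > 0 \quad\text{(speed up)}\\ \rho_{ji}(t)\cdot d & \text{if } \big(\mathrm{Err}_j x_i\big)(t)\,\bar\delta_{ji}(t-1) < 0 \quad\text{(slow down)}\\ \rho_{ji}(t) & \text{otherwise}\end{cases}$$
$$\Delta w_{ji}(t) = -\rho_{ji}(t)\,\mathrm{Err}_j\,x_i$$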
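The Python sketch below runs both rules on a toy quadratic cost J(w) = ½(w_1² + 10 w_2²), which stands in for the network error (an assumption). Here `grad` plays the role of Err_j x_i. The parameter values (ρ, α, u, d, θ) and the function names are illustrative, not recommendations from the slides.

```python
def momentum(grad, w, rho, alpha, steps):
    """Gradient descent with momentum: dw(t) = -rho * grad + alpha * dw(t-1), |alpha| < 1."""
    dw = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        dw = [-rho * gi + alpha * di for gi, di in zip(g, dw)]
        w = [wi + di for wi, di in zip(w, dw)]
    return w


def delta_bar_delta(grad, w, rho0, u, d, theta, steps):
    """Delta-Bar-Delta: one learning rate per weight, adapted against the averaged gradient."""
    n = len(w)
    rho = [rho0] * n
    bar = [0.0] * n                          # averaged gradient, bar_delta(t-1)
    for _ in range(steps):
        g = grad(w)
        w = [wi - ri * gi for wi, ri, gi in zip(w, rho, g)]   # dw(t) = -rho(t) * g(t)
        for i in range(n):
            if g[i] * bar[i] > 0:
                rho[i] += u                  # same direction as the average: speed up
            elif g[i] * bar[i] < 0:
                rho[i] *= d                  # direction reversed: slow down
            bar[i] = (1 - theta) * g[i] + theta * bar[i]
    return w
```

Delta-Bar-Delta increases its rates additively and decreases them multiplicatively. This asymmetry lets a rate grow cautiously while the gradient keeps its sign, and shrink quickly as soon as the weight starts to oscillate.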
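First order 2/2
Rprop (Riedmiller and Braun 1993)
the learning rate is modified according to the direction of the current gradient relative to the previous gradient
the learning rate is bounded (ρ_min ≤ ρ ≤ ρ_max)
$$\rho_{ji}(t+1) = \begin{cases} \min\big(\rho_{ji}(t)\cdot u,\ \rho_{max}\big) & \text{if } \big(\mathrm{Err}_j x_i\big)(t)\,\big(\mathrm{Err}_j x_i\big)(t-1) > 0 \quad\text{(speed up)}\\ \max\big(\rho_{ji}(t)\cdot d,\ \rho_{min}\big) & \text{if } \big(\mathrm{Err}_j x_i\big)(t)\,\big(\mathrm{Err}_j x_i\big)(t-1) < 0 \quad\text{(slow down)}\\ \rho_{ji}(t) & \text{otherwise}\end{cases}$$
a weight is modified only if it moves in the right direction, and only the sign of the gradient is used:
$$\Delta w_{ji}(t+1) = \begin{cases} -\rho_{ji}(t)\,\mathrm{sgn}\big(\big(\mathrm{Err}_j x_i\big)(t)\big) & \text{if } \big(\mathrm{Err}_j x_i\big)(t)\,\big(\mathrm{Err}_j x_i\big)(t-1) > 0\\ 0 & \text{otherwise}\end{cases}$$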
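Here is a sketch of Rprop as written on this slide. The step is adapted within the bounds ρ_min and ρ_max, and a weight moves only when the current and previous gradients agree in sign. The values u = 1.2 and d = 0.5 are the usual choices in the Rprop literature. They, the bounds, the toy cost and the function name are assumptions of this example.

```python
def rprop(grad, w, rho0=0.1, u=1.2, d=0.5, rho_min=1e-9, rho_max=1.0, steps=1000):
    """Rprop as on the slide: move by -rho*sgn(g) only if g(t) and g(t-1) share a sign."""
    n = len(w)
    w = list(w)
    rho = [rho0] * n
    g_prev = [0.0] * n
    for _ in range(steps):
        g = grad(w)
        for i in range(n):
            p = g[i] * g_prev[i]
            if p > 0:                                  # same sign: move, then speed up
                w[i] -= rho[i] * (1.0 if g[i] > 0 else -1.0)
                rho[i] = min(rho[i] * u, rho_max)
            elif p < 0:                                # sign change: slow down, no move
                rho[i] = max(rho[i] * d, rho_min)
        g_prev = g
    return w
```

Because only the sign of the gradient is used, the badly scaled coordinate of J(w) = ½(w_1² + 10 w_2²) is handled like the other one. After a sign change the weight stays put for one step. The next step then sees the same gradient twice and moves with the reduced rate.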
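Second order 1/2
Taylor expansion of the cost function:
$$J(w + h) \approx J(w) + \frac{\partial J}{\partial w}(w)'\,h + \frac{1}{2}\,h' H(w)\,h$$
H = Hessian matrix, the Hessian of the cost:
$$H(w) = \begin{pmatrix} \frac{\partial^2 J}{\partial w_1^2} & \frac{\partial^2 J}{\partial w_1 \partial w_2} & \cdots & \frac{\partial^2 J}{\partial w_1 \partial w_n} \\ \frac{\partial^2 J}{\partial w_2 \partial w_1} & \frac{\partial^2 J}{\partial w_2^2} & \cdots & \frac{\partial^2 J}{\partial w_2 \partial w_n} \\ \vdots & & \ddots & \vdots \\ \frac{\partial^2 J}{\partial w_n \partial w_1} & \frac{\partial^2 J}{\partial w_n \partial w_2} & \cdots & \frac{\partial^2 J}{\partial w_n^2} \end{pmatrix}$$
Gradient computation:
$$\frac{\partial J}{\partial w}(w + h) \approx \frac{\partial J}{\partial w}(w) + H(w)\,h$$
we look for h such that this gradient is zero:
$$\Delta w = h = -H^{-1}(w)\,\frac{\partial J}{\partial w}(w)$$
Problem: computing H^{-1}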
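On a quadratic cost the Taylor expansion is exact, so a single Newton step h = -H⁻¹ ∂J/∂w lands on the minimum. The Python sketch below illustrates this for two weights and writes out the 2×2 inverse by hand. The cost J(w) = ½ w'Aw - v'w with A = [[3, 1], [1, 2]] and v = [1, 1] is an arbitrary example.

```python
def newton_step(grad, hess, w):
    """One Newton step w + h, with h = -H^{-1} grad(w), for a model with two weights."""
    g = grad(w)
    (a, b), (c, d) = hess(w)
    det = a * d - b * c                      # H must be invertible (det != 0)
    # inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det
    h0 = -(d * g[0] - b * g[1]) / det
    h1 = -(-c * g[0] + a * g[1]) / det
    return [w[0] + h0, w[1] + h1]
```

Starting from w = [0, 0] or from w = [5, -3], one step gives w = [0.2, 0.4], the solution of A w = v. For a neural network the cost is not quadratic, so H has to be recomputed and inverted at every step, which is the costly part noted on the slide.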

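Second order 2/2
Approximating the Hessian:
Hessian = diagonal matrix
$$\Delta w(t) = -\frac{\partial J / \partial w}{\partial^2 J / \partial w^2}$$

Quickprop (Fahlman 1988)
H is not computed: the second derivative is estimated from two successive gradients
$$\frac{\partial^2 J}{\partial w^2} \approx \frac{\frac{\partial J}{\partial w}\big(w(t)\big) - \frac{\partial J}{\partial w}\big(w(t-1)\big)}{\Delta w(t-1)}$$
$$\Delta w(t) = \Delta w(t-1)\,\frac{\frac{\partial J}{\partial w}\big(w(t)\big)}{\frac{\partial J}{\partial w}\big(w(t-1)\big) - \frac{\partial J}{\partial w}\big(w(t)\big)}$$

Other methods compute (part of) the second-order information:
conjugate gradient methods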
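Below is a one-weight Python sketch of the Quickprop update. It is a simplification: practical Quickprop implementations add safeguards, such as limiting how much the step may grow, and those are omitted here. The cost, the starting point and the size of the first step are illustrative assumptions.

```python
def quickprop(dJ, w, rho=0.1, steps=10, tol=1e-12):
    """Quickprop on a single weight: one gradient step, then secant (diagonal Newton) steps."""
    g_prev = dJ(w)
    dw = -rho * g_prev                    # ordinary gradient step to start
    w += dw
    for _ in range(steps):
        g = dJ(w)
        if abs(g) < tol or g == g_prev:   # at the minimum, or the secant is undefined
            break
        dw = dw * g / (g_prev - g)        # dw(t) = dw(t-1) g(t) / (g(t-1) - g(t))
        w += dw
        g_prev = g
    return w
```

Take J(w) = 2(w - 3)², i.e. `dJ = lambda w: 4 * (w - 3)`, and start at w = 0 with ρ = 0.1. The first gradient step gives w = 1.2 and the secant step then lands on w = 3. On a quadratic, the two-gradient estimate of ∂²J/∂w² is exact.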
Conclusion
Backpropagation is a gradient method:

we have an optimization problem to solve...
... and any method is fair game!

The problem is convex when the cost is quadratic in the weights (a linear model such as ADALINE); for a multilayer perceptron, even with a quadratic cost, it is a nonlinear and in general non-convex optimization problem.

Software: Matlab (small problems), SN (large problems)
