$$E = \frac{1}{2}(d_i - o_i)^2 = \frac{1}{2}\left(d_i - f(w_i^t x)\right)^2$$

$$\nabla E = -(d_i - o_i)\, f'(w_i^t x)\, x$$
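Example 2

$$x = \begin{bmatrix} 0.982 \\ 0.5 \\ 1 \end{bmatrix}, \quad w = \begin{bmatrix} 2 \\ 4 \\ -3.93 \end{bmatrix}, \quad d = t = 1, \quad \eta = 0.1$$

First forward pass:
net = w^t x = 2*0.982 + 4*0.5 - 3.93*1 = 0.034 (taken as 0.04 below)
o = 1/(1+exp(-0.04)) = 0.51
Error = 1 - 0.51 = 0.49

Weight update, with δ = (d - o)(1 - o)o:
δ = (1 - 0.51)(1 - 0.51)(0.51) = 0.1225
Δw1 = η * δ * x1 = 0.1 * 0.1225 * 0.982 = 0.012
w1,new = w1,old + 0.012 = 2 + 0.012 = 2.012
w2,new = 4 + 0.1 * 0.1225 * 0.5 = 4.0061
w3,new = -3.93 + 0.1 * 0.1225 * 1 = -3.9178

Second forward pass:
net = 2.012*0.982 + 4.0061*0.5 - 3.9178*1 = 0.061
o = 1/(1+exp(-0.061)) = 0.5152
Error = 1 - 0.5152 = 0.4848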
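The arithmetic of Example 2 can be reproduced with a short script. This is a minimal sketch, assuming a single sigmoid unit trained on the squared error; the function names `sigmoid` and `delta_rule_step` are illustrative and do not come from the slides.

```python
import math

def sigmoid(net):
    """Logistic activation o = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def delta_rule_step(w, x, d, eta):
    """One gradient step on E = 0.5 * (d - o)**2 for a single sigmoid unit.

    Returns the updated weights and the output computed BEFORE the update.
    """
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = sigmoid(net)
    delta = (d - o) * o * (1.0 - o)  # (d - o) * f'(net)
    w_new = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w_new, o

# Values taken from Example 2: weights, inputs, target and learning rate.
w = [2.0, 4.0, -3.93]
x = [0.982, 0.5, 1.0]
w_new, o_before = delta_rule_step(w, x, d=1.0, eta=0.1)
# A second call is used only to read the output produced by the new weights.
_, o_after = delta_rule_step(w_new, x, d=1.0, eta=0.1)

print("output before update:", round(o_before, 4))
print("updated weights:", [round(wi, 4) for wi in w_new])
print("output after update:", round(o_after, 4))
```

The script does not round intermediate values, so its results agree with the hand calculation only to about three decimal places.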
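By the chain rule, with $E = \frac{1}{2}\sum_i (d_i - o_i)^2$, $o_i = 1/(1 + e^{-x_i})$ and $x_i = \sum_j w_{ij} a_j$ (the desired output $d_i$ is also written $t_i$):

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_i}\,\frac{\partial o_i}{\partial x_i}\,\frac{\partial x_i}{\partial w_{ij}}$$

$$\frac{\partial E}{\partial o_i} = \frac{1}{2}\cdot 2\,(d_i - o_i)(-1) = (o_i - d_i)$$

$$\frac{\partial o_i}{\partial x_i} = \frac{\partial}{\partial x_i}\left[\frac{1}{1 + e^{-x_i}}\right] = -\frac{1}{(1 + e^{-x_i})^2}\,(-e^{-x_i}) = \frac{e^{-x_i}}{(1 + e^{-x_i})^2}$$

$$= \frac{(1 + e^{-x_i}) - 1}{1 + e^{-x_i}} \cdot \frac{1}{1 + e^{-x_i}} = \left[1 - \frac{1}{1 + e^{-x_i}}\right]\left[\frac{1}{1 + e^{-x_i}}\right] = (1 - o_i)\, o_i$$

$$\frac{\partial x_i}{\partial w_{ij}} = a_j$$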
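Substituting the three factors into the chain rule:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_i}\,\frac{\partial o_i}{\partial x_i}\,\frac{\partial x_i}{\partial w_{ij}} = (o_i - d_i)\,(1 - o_i)\, o_i\, a_j$$

$$\Delta w_{ij} = -\eta\,\frac{\partial E}{\partial w_{ij}}$$

(where η is an arbitrary learning rate)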
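A quick way to check a chain-rule result like this one is to compare it against a finite-difference estimate. The sketch below assumes a single output unit (so the index i is dropped) and reuses the numbers from Example 2 as test data; `analytic_grad` and `numeric_grad` are illustrative names, not part of the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def error(w, a, d):
    """E = 0.5 * (d - o)**2 with o = sigmoid(sum_j w_j * a_j)."""
    o = sigmoid(sum(wj * aj for wj, aj in zip(w, a)))
    return 0.5 * (d - o) ** 2

def analytic_grad(w, a, d):
    """dE/dw_j = (o - d) * o * (1 - o) * a_j, the chain-rule product."""
    o = sigmoid(sum(wj * aj for wj, aj in zip(w, a)))
    return [(o - d) * o * (1.0 - o) * aj for aj in a]

def numeric_grad(w, a, d, h=1e-6):
    """Central-difference estimate of dE/dw_j, one weight at a time."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += h
        w_minus = list(w)
        w_minus[j] -= h
        grad.append((error(w_plus, a, d) - error(w_minus, a, d)) / (2.0 * h))
    return grad

w = [2.0, 4.0, -3.93]
a = [0.982, 0.5, 1.0]
g_analytic = analytic_grad(w, a, d=1.0)
g_numeric = numeric_grad(w, a, d=1.0)
max_diff = max(abs(p - q) for p, q in zip(g_analytic, g_numeric))
print("analytic:", g_analytic)
print("numeric: ", g_numeric)
print("largest difference:", max_diff)
```

Every component of the gradient is negative here because the output is below the target and all inputs are positive, so the update Δw = -η ∂E/∂w increases each weight.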
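Taylor Series Expansion

$$F(x) = F(x^*) + \left.\frac{d}{dx}F(x)\right|_{x = x^*}(x - x^*) + \frac{1}{2}\left.\frac{d^2}{dx^2}F(x)\right|_{x = x^*}(x - x^*)^2 + \cdots + \frac{1}{n!}\left.\frac{d^n}{dx^n}F(x)\right|_{x = x^*}(x - x^*)^n + \cdots$$

Demo: nnd8ts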
Example
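$$F(x) = e^{-x}$$

Taylor series of F(x) about x* = 0:

$$F(x) = e^{-x} = e^{-0} - e^{-0}(x - 0) + \frac{1}{2}e^{-0}(x - 0)^2 - \frac{1}{6}e^{-0}(x - 0)^3 + \cdots$$

$$F(x) = 1 - x + \frac{1}{2}x^2 - \frac{1}{6}x^3 + \cdots$$

Taylor series approximations:

$$F(x) \approx F_0(x) = 1$$

$$F(x) \approx F_1(x) = 1 - x$$

$$F(x) \approx F_2(x) = 1 - x + \frac{1}{2}x^2$$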
[Figure: Plot of Approximations. Curves F0(x), F1(x) and F2(x) for x from -2 to 2.]
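The quality of these approximations can be tabulated numerically. The following is a small sketch, assuming the expansion point x* = 0 used in the example; `taylor_exp_neg` is an illustrative helper name.

```python
import math

def taylor_exp_neg(x, order):
    """Partial sum of the Taylor series of exp(-x) about x* = 0.

    order=0 gives F0(x) = 1, order=1 gives F1(x) = 1 - x,
    order=2 gives F2(x) = 1 - x + x**2 / 2, and so on.
    """
    return sum((-x) ** k / math.factorial(k) for k in range(order + 1))

for x in (0.0, 0.5, 1.0, 2.0):
    exact = math.exp(-x)
    approximations = [taylor_exp_neg(x, n) for n in (0, 1, 2)]
    errors = [abs(value - exact) for value in approximations]
    print(x, exact, approximations, errors)
```

Near the expansion point each extra term reduces the error; far from it (for example at x = 2) the low-order approximations are poor.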
Vector Case
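$$F(\mathbf{x}) = F(x_1, x_2, \ldots, x_n)$$

$$\begin{aligned}
F(\mathbf{x}) = F(\mathbf{x}^*) &+ \left.\frac{\partial}{\partial x_1}F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*}(x_1 - x_1^*) + \left.\frac{\partial}{\partial x_2}F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*}(x_2 - x_2^*) \\
&+ \cdots + \left.\frac{\partial}{\partial x_n}F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*}(x_n - x_n^*) + \frac{1}{2}\left.\frac{\partial^2}{\partial x_1^2}F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*}(x_1 - x_1^*)^2 \\
&+ \frac{1}{2}\left.\frac{\partial^2}{\partial x_1 \partial x_2}F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*}(x_1 - x_1^*)(x_2 - x_2^*) + \cdots
\end{aligned}$$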
Matrix Form
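$$F(\mathbf{x}) = F(\mathbf{x}^*) + \left.\nabla F(\mathbf{x})^T\right|_{\mathbf{x} = \mathbf{x}^*}(\mathbf{x} - \mathbf{x}^*) + \frac{1}{2}(\mathbf{x} - \mathbf{x}^*)^T \left.\nabla^2 F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*}(\mathbf{x} - \mathbf{x}^*) + \cdots$$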
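Gradient and Hessian

Gradient:

$$\nabla F(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial}{\partial x_1}F(\mathbf{x}) \\ \dfrac{\partial}{\partial x_2}F(\mathbf{x}) \\ \vdots \\ \dfrac{\partial}{\partial x_n}F(\mathbf{x}) \end{bmatrix}$$

Hessian:

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix}
\dfrac{\partial^2}{\partial x_1^2}F(\mathbf{x}) & \dfrac{\partial^2}{\partial x_1 \partial x_2}F(\mathbf{x}) & \cdots & \dfrac{\partial^2}{\partial x_1 \partial x_n}F(\mathbf{x}) \\
\dfrac{\partial^2}{\partial x_2 \partial x_1}F(\mathbf{x}) & \dfrac{\partial^2}{\partial x_2^2}F(\mathbf{x}) & \cdots & \dfrac{\partial^2}{\partial x_2 \partial x_n}F(\mathbf{x}) \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2}{\partial x_n \partial x_1}F(\mathbf{x}) & \dfrac{\partial^2}{\partial x_n \partial x_2}F(\mathbf{x}) & \cdots & \dfrac{\partial^2}{\partial x_n^2}F(\mathbf{x})
\end{bmatrix}$$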
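When derivatives are awkward to obtain by hand, both the gradient and the Hessian can be estimated with central differences. The sketch below is one possible way to do this, not a method taken from the slides; the test function is the quadratic F(x) = x1² + 2 x1 x2 + 2 x2² used in the directional derivative example, and the helper names and step sizes are assumptions.

```python
def F(x):
    x1, x2 = x
    return x1 ** 2 + 2.0 * x1 * x2 + 2.0 * x2 ** 2

def shifted(x, steps):
    """Copy of x with x[i] increased by `amount` for each (i, amount) pair."""
    y = list(x)
    for i, amount in steps:
        y[i] += amount
    return y

def numeric_gradient(f, x, h=1e-5):
    """Central-difference estimate of each partial derivative of f at x."""
    return [(f(shifted(x, [(i, h)])) - f(shifted(x, [(i, -h)]))) / (2.0 * h)
            for i in range(len(x))]

def numeric_hessian(f, x, h=1e-4):
    """Central-difference estimate of every second partial derivative."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (f(shifted(x, [(i, h), (j, h)]))
                       - f(shifted(x, [(i, h), (j, -h)]))
                       - f(shifted(x, [(i, -h), (j, h)]))
                       + f(shifted(x, [(i, -h), (j, -h)]))) / (4.0 * h * h)
    return H

x_star = [0.5, 0.0]
grad = numeric_gradient(F, x_star)
hess = numeric_hessian(F, x_star)
print("gradient:", grad)
print("Hessian:", hess)
```

For this function the estimates should be close to the gradient [1, 1] and the constant Hessian [[2, 2], [2, 4]].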
Directional Derivatives
First derivative (slope) of F(x) along the x_i axis:

$$\frac{\partial F(\mathbf{x})}{\partial x_i}$$

Second derivative (curvature) of F(x) along the x_i axis:

$$\frac{\partial^2 F(\mathbf{x})}{\partial x_i^2}$$

First derivative (slope) of F(x) along vector p:

$$\frac{\mathbf{p}^T \nabla F(\mathbf{x})}{\|\mathbf{p}\|}$$

Second derivative (curvature) of F(x) along vector p:

$$\frac{\mathbf{p}^T \nabla^2 F(\mathbf{x})\, \mathbf{p}}{\|\mathbf{p}\|^2}$$
Example
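$$F(\mathbf{x}) = x_1^2 + 2x_1x_2 + 2x_2^2$$

$$\mathbf{x}^* = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}, \quad \mathbf{p} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$\left.\nabla F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}^*} = \left.\begin{bmatrix} \dfrac{\partial}{\partial x_1}F(\mathbf{x}) \\ \dfrac{\partial}{\partial x_2}F(\mathbf{x}) \end{bmatrix}\right|_{\mathbf{x} = \mathbf{x}^*} = \left.\begin{bmatrix} 2x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}\right|_{\mathbf{x} = \mathbf{x}^*} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$\frac{\mathbf{p}^T \nabla F(\mathbf{x})}{\|\mathbf{p}\|} = \frac{\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\left\|\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right\|} = \frac{0}{\sqrt{2}} = 0$$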
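The same calculation can be scripted for any direction p. This is a minimal sketch for the example function, with the gradient and Hessian entered by hand; `directional_slope` and `directional_curvature` are illustrative names.

```python
import math

def gradient(x):
    """Gradient of F(x) = x1**2 + 2*x1*x2 + 2*x2**2."""
    x1, x2 = x
    return [2.0 * x1 + 2.0 * x2, 2.0 * x1 + 4.0 * x2]

# The Hessian of this quadratic is constant.
HESSIAN = [[2.0, 2.0], [2.0, 4.0]]

def directional_slope(g, p):
    """First derivative along p: p^T g / ||p||."""
    norm = math.sqrt(sum(pi * pi for pi in p))
    return sum(pi * gi for pi, gi in zip(p, g)) / norm

def directional_curvature(H, p):
    """Second derivative along p: p^T H p / ||p||^2."""
    n = len(p)
    Hp = [sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]
    return sum(p[i] * Hp[i] for i in range(n)) / sum(pi * pi for pi in p)

g = gradient([0.5, 0.0])
slope_example = directional_slope(g, [1.0, -1.0])
slope_along_gradient = directional_slope(g, g)
curvature_example = directional_curvature(HESSIAN, [1.0, -1.0])
print(slope_example, slope_along_gradient, curvature_example)
```

The slope is zero along p = [1, -1] because that direction is orthogonal to the gradient, and it is largest (equal to the gradient's length, about 1.41) along the gradient itself.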
[Figure: Plots of directional derivatives of F(x). Contour plot and surface plot over x1 and x2, each from -2 to 2.]

Demo: nnd8dd
Minima
Strong Minimum
The point x* is a strong minimum of F(x) if a scalar δ > 0 exists, such that F(x*) < F(x* + Δx) for all Δx such that δ > ||Δx|| > 0.
Global Minimum
The point x* is a unique global minimum of F(x) if F(x*) < F(x* + Δx) for all Δx ≠ 0.
Weak Minimum
The point x* is a weak minimum of F(x) if it is not a strong minimum, and a scalar δ > 0 exists, such that F(x*) ≤ F(x* + Δx) for all Δx such that δ > ||Δx|| > 0.
Scalar Example
$$F(x) = 3x^4 - 7x^2 - \frac{1}{2}x + 6$$

[Figure: plot of F(x) for x from -2 to 2, with a strong maximum, a strong minimum and the global minimum marked.]
Quadratic Functions
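$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{d}^T \mathbf{x} + c \qquad \text{(symmetric } A\text{)}$$

$$\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d}$$

$$\nabla^2 F(\mathbf{x}) = A$$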
• If the eigenvalues of the Hessian matrix are all positive,
the function will have a single strong minimum.
• If the eigenvalues are all negative, the function will
have a single strong maximum.
• If some eigenvalues are positive and other eigenvalues
are negative, the function will have a single saddle
point.
• If the eigenvalues are all nonnegative, but some
eigenvalues are zero, then the function will either have
a weak minimum or will have no stationary point.
• If the eigenvalues are all nonpositive, but some
eigenvalues are zero, then the function will either have
a weak maximum or will have no stationary point.
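The rules above translate directly into a small classifier. The sketch below handles only symmetric 2x2 Hessians, for which the eigenvalues have a closed form; the function names and the tolerance used to decide when an eigenvalue counts as zero are assumptions made for illustration.

```python
import math

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius

def classify(a, b, c, tol=1e-12):
    """Name the case from the eigenvalue rules for a quadratic with Hessian [[a, b], [b, c]]."""
    lo, hi = eigenvalues_sym2(a, b, c)
    if lo > tol:
        return "strong minimum"
    if hi < -tol:
        return "strong maximum"
    if lo < -tol and hi > tol:
        return "saddle point"
    if hi > tol:  # smallest eigenvalue is (numerically) zero
        return "weak minimum or no stationary point"
    if lo < -tol:  # largest eigenvalue is (numerically) zero
        return "weak maximum or no stationary point"
    return "all eigenvalues zero (not covered by the rules)"

print(classify(2.0, 2.0, 4.0))
print(classify(1.0, 0.0, -1.0))
print(classify(1.0, 1.0, 1.0))
```

Whether a semidefinite case has a weak extremum or no stationary point at all depends on the linear term d, which this sketch does not examine.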
Stationary point nature summary
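x^T A x        λ_i            Definiteness of H    Nature of x*
> 0            > 0            Positive d.          Minimum
> 0 or < 0     mixed signs    Indefinite           Saddlepoint
≤ 0            ≤ 0            Negative semi-d.     Ridge
< 0            < 0            Negative d.          Maximum

(Conditions on x^T A x hold for all x ≠ 0; conditions on λ_i hold for all eigenvalues.)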
Steepest Descent
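$$F(\mathbf{x}) = x_1^2 + 2x_1x_2 + 2x_2^2 + x_1$$

$$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \quad \alpha = 0.1$$

$$\nabla F(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial}{\partial x_1}F(\mathbf{x}) \\ \dfrac{\partial}{\partial x_2}F(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \quad \mathbf{g}_0 = \left.\nabla F(\mathbf{x})\right|_{\mathbf{x} = \mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

[Figure: plot over the range -2 to 2 on both axes.]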
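The iteration for this example is easy to run by hand or in code. The following is a minimal sketch using the function, starting point and learning rate 0.1 given above; `steepest_descent` is an illustrative name, and the number of steps (200) is an arbitrary choice.

```python
def gradient(x):
    """Gradient of F(x) = x1**2 + 2*x1*x2 + 2*x2**2 + x1."""
    x1, x2 = x
    return [2.0 * x1 + 2.0 * x2 + 1.0, 2.0 * x1 + 4.0 * x2]

def steepest_descent(x0, alpha, steps):
    """Repeat x <- x - alpha * gradient(x) and return every iterate."""
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        g = gradient(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        path.append(list(x))
    return path

path = steepest_descent([0.5, 0.5], alpha=0.1, steps=200)
print("x1 =", path[1])
print("x2 =", path[2])
print("x200 =", path[-1])
```

The first step moves from [0.5, 0.5] to [0.2, 0.2], since the initial gradient is [3, 3]. The iterates approach the point [-1, 0.5], where the gradient is zero.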
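For a quadratic function the gradient is

$$\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d}$$

so steepest descent with a constant learning rate α becomes

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \mathbf{g}_k = \mathbf{x}_k - \alpha(A\mathbf{x}_k + \mathbf{d}) \quad\Longrightarrow\quad \mathbf{x}_{k+1} = [I - \alpha A]\,\mathbf{x}_k - \alpha \mathbf{d}$$

Stability is determined by the eigenvalues of the matrix [I - αA]. If λ_i is an eigenvalue of A with eigenvector z_i, then

$$[I - \alpha A]\,\mathbf{z}_i = \mathbf{z}_i - \alpha A \mathbf{z}_i = \mathbf{z}_i - \alpha \lambda_i \mathbf{z}_i = (1 - \alpha \lambda_i)\,\mathbf{z}_i$$

Stability Requirement (assuming all λ_i > 0):

$$|1 - \alpha \lambda_i| < 1 \quad\Longrightarrow\quad \alpha < \frac{2}{\lambda_i} \quad\Longrightarrow\quad \alpha < \frac{2}{\lambda_{max}}$$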
Example
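$$A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \quad \lambda_1 = 0.764,\ \mathbf{z}_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}, \quad \lambda_2 = 5.24,\ \mathbf{z}_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}$$

$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38$$

[Figure: two panels, one for α = 0.37 and one for α = 0.39; both axes run from -2 to 2.]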
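The effect of crossing the stability limit can be checked by running the iteration with learning rates just below and just above 0.38. This sketch assumes the quadratic from the steepest descent example, F(x) = x1² + 2 x1 x2 + 2 x2² + x1, whose Hessian is the matrix A above; the starting point [0.5, 0.5], the 200 steps and the helper name `distance_to_minimum` are choices made for illustration.

```python
import math

def gradient(x):
    """Gradient A x + d with A = [[2, 2], [2, 4]] and d = [1, 0]."""
    x1, x2 = x
    return [2.0 * x1 + 2.0 * x2 + 1.0, 2.0 * x1 + 4.0 * x2]

def distance_to_minimum(alpha, steps=200, x0=(0.5, 0.5)):
    """Run steepest descent and return the final distance to the minimum.

    The minimum is at (-1, 0.5), the point where the gradient is zero.
    """
    x = list(x0)
    for _ in range(steps):
        g = gradient(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return math.hypot(x[0] + 1.0, x[1] - 0.5)

# Larger root of lambda**2 - 6*lambda + 4 = 0, the characteristic equation of A.
lambda_max = 3.0 + math.sqrt(5.0)
alpha_limit = 2.0 / lambda_max

print("stability limit:", alpha_limit)
print("alpha = 0.37 ->", distance_to_minimum(0.37))
print("alpha = 0.39 ->", distance_to_minimum(0.39))
```

With a learning rate of 0.37 the distance shrinks toward zero, while with 0.39 the component along the eigenvector of the largest eigenvalue grows at every step and the iteration diverges.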