
Back Propagation: Variations

[Figure: A multilayer feedforward network. Layer 0 (input) distributes the inputs x'1, x'2, ..., x'i to the layers above; the hidden layers up to Layer m contain Neuron #1 through Neuron #k; the output layer, Layer m+1, contains Neuron #1 through Neuron #p and produces the outputs Y1(m+1), ..., Yp(m+1).]
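As a point of reference for the notation in the figure, a minimal sketch of the forward pass through such a network is given below; the sigmoid activation and the layer sizes are illustrative assumptions, not part of the original figure.

import numpy as np

def forward(x, weights):
    # weights[m] maps the outputs of layer m to the net inputs of layer m+1
    y = x                                   # layer 0: the input vector x'1 ... x'i
    for W in weights:
        y = 1.0 / (1.0 + np.exp(-W @ y))    # each neuron applies a sigmoid to its weighted sum
    return y                                # final layer m+1: outputs Y1(m+1) ... Yp(m+1)

# Example: 4 inputs, one hidden layer of 3 neurons, 2 output neurons
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
print(forward(np.array([1.0, 0.5, -0.5, 0.2]), weights))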

BP- Improvements
Second order derivatives (Parker, 1982)

Dynamic range modification (Stornetta and Huberman, 1987)
F(x) = -1/2 + 1/(1 + e^-x)

Meta learning (Jacobs, 1987; Hagiwara, 1990)

Selective updates (Huang and Huang, 1990)
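The dynamic range modification simply shifts the logistic function so that its outputs are centred on zero, lying in (-1/2, +1/2) rather than (0, 1). A minimal sketch of this shifted activation is given below; the function names are mine, for illustration only.

import numpy as np

def standard_sigmoid(x):
    # Standard logistic activation, outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def shifted_sigmoid(x):
    # Dynamic-range-modified activation, outputs in (-1/2, +1/2)
    return -0.5 + 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 7)
print(standard_sigmoid(x))   # values centred on 0.5
print(shifted_sigmoid(x))    # same shape, centred on 0.0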

BP- Improvements (Cont.)


Use of momentum weight change (Rumelhart, 1986)
Δw_kmi(t+1) = η · δ_km · x_i(t) + μ · Δw_kmi(t)

Exponential smoothing (Sejnowski and Rosenberg, 1987)
Δw_kmi(t+1) = (1 - α) · δ_km · x_i(t) + α · Δw_kmi(t)
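A compact sketch of the two update rules for a single weight follows; the numeric values of the learning rate η, momentum μ, and smoothing factor α are illustrative assumptions.

def momentum_update(dw_prev, delta_k, x_i, eta=0.2, mu=0.9):
    # Rumelhart (1986): new weight change = gradient term + a fraction of the previous change
    return eta * delta_k * x_i + mu * dw_prev

def smoothed_update(dw_prev, delta_k, x_i, alpha=0.9):
    # Sejnowski and Rosenberg (1987): exponentially smoothed average of successive changes
    return (1.0 - alpha) * delta_k * x_i + alpha * dw_prev

# Example: the weight change carries momentum even as the error signal delta_k shrinks
dw = 0.0
for delta_k in (0.5, 0.3, 0.1):
    dw = momentum_update(dw, delta_k, x_i=1.0)
    print(dw)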

BP- Improvements (Cont.)


Accelerating the BP algorithm (Kothari,
Klinkhachorn, and Nutter, 1991)

Gradual increase in learning accuracy


Without incurring the disadvantages of increased network size or more complex neurons, or otherwise violating the parallel structure of computation

Gradual increase in learning accuracy


Temporal instability
Absence of a true direction of descent
void Acc_BackProp(struct Network *N, struct Train_Set *T)
{
    Assume_coarse_error();                 /* begin with a coarse error tolerance ε */
    while (ε > Eventual_Accuracy) {        /* until the final accuracy is reached */
        while (not_all_trained) {
            Present_Next_Pattern();
            while (!Trained)
                Train_Pattern();
        }
        Increase_Accuracy(ε -= Step);      /* tighten the tolerance for the next sweep */
    }
}
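The same idea in a self-contained Python sketch, using a single sigmoid unit trained by the delta rule purely for illustration; the tolerances, step size, and learning rate are assumptions, not values from the paper.

import numpy as np

def train_with_gia(X, T, w, eta=0.5, coarse_tol=0.4, final_tol=0.1, step=0.1):
    # Gradual increase in accuracy: train to a coarse tolerance first, then tighten it
    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
    tol = coarse_tol
    while tol >= final_tol:
        trained = False
        while not trained:
            trained = True
            for x, t in zip(X, T):                       # one sweep over the training set
                y = sigmoid(w @ x)
                err = t - y
                if abs(err) > tol:                       # pattern not yet within tolerance
                    w += eta * err * y * (1.0 - y) * x   # delta-rule update for a single unit
                    trained = False
        tol -= step                                      # increase the required accuracy
    return w

# Example: learn logical AND (third input is a bias)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1], dtype=float)
print(train_with_gia(X, T, np.zeros(3)))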

Training with gradual increase in accuracy


[Figure: Each exemplar 1, 2, 3, ..., M suggests its own direction of descent; the direction of steepest descent for the overall error is shown alongside them.]

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 60000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a 4-bit 1's complementor (graph has been curtailed to show detail)

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 300000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a 3-to-8 decoder

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 200000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for the XOR problem

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 120000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a simple shape recognizer

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 50000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a 3-bit rotate register

Error vs. Training Passes

Problem (network size)                             BP             BPGIA          BP+Mom.        BPGIA+Mom.
1's complement (4x8x4)                             9.7 (134922)   6.6 (92567)    2.2 (25574)    1.0 (11863)
3-to-8 decoder (3x8x8)                             5.4 (347634)   4.2 (268833)   1.1 (61366)    1.0 (53796)
XOR (2x2x1)                                        4.5 (211093)   1.8 (88207)    2.5 (107337)   1.0 (45916)
Rotate register (3x6x3)                            4.3 (72477)    2.0 (33909)    1.1 (15929)    1.0 (14987)
Square/circle/triangle differentiation (16x20x1)   2.3 (71253)    1.3 (33909)    6.11 (145363)  1.0 (25163)

(Number of training passes in parentheses)

Training with gradual increase in accuracy


On average, doubles the convergence rate of back propagation, or of back propagation with a momentum weight change, without requiring additional or more complex neurons

Nonsaturating Activation Functions


For some applications, where saturation of the activation function is not especially beneficial, a nonsaturating activation function may be used. One suitable example is F(x) = log(1 + x) for x > 0 and F(x) = -log(1 - x) for x < 0, with derivative F'(x) = 1/(1 + x) for x > 0 and F'(x) = 1/(1 - x) for x < 0.
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
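A brief sketch of this logarithmic activation and its derivative follows; the function names are illustrative, and the vectorised form using |x| is simply an equivalent way of writing the two-branch definition above.

import numpy as np

def log_activation(x):
    # Nonsaturating activation: log(1 + x) for x > 0, -log(1 - x) for x < 0
    return np.sign(x) * np.log(1.0 + np.abs(x))

def log_activation_deriv(x):
    # Derivative: 1/(1 + x) for x > 0, 1/(1 - x) for x < 0
    return 1.0 / (1.0 + np.abs(x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(log_activation(x))         # grows without bound instead of flattening near +/-1
print(log_activation_deriv(x))   # decays slowly rather than vanishing as a saturating sigmoid's derivative does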

Nonsaturating Activation Functions


Example: BP for the XOR problem

                                           Logarithmic   Bipolar sigmoid
Standard bipolar XOR                       144 epochs    387 epochs
Modified bipolar XOR (targets +.8 / -.8)    77 epochs    264 epochs

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

Nonsaturating Activation Functions


Example: Product of sine functions (continuous single output)

Y = sin(2·x1) · sin(2·x2)
Trained for 5000 epochs to a mean squared error of 0.024, learning rate = 0.05


Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
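A small sketch of how such a training set could be generated is given below; the input range and grid resolution are assumptions for illustration, not values from the original example.

import numpy as np

# Sample input pairs (x1, x2) on a grid and compute the continuous target Y = sin(2*x1) * sin(2*x2)
x1, x2 = np.meshgrid(np.linspace(0.0, np.pi, 20), np.linspace(0.0, np.pi, 20))
inputs = np.column_stack([x1.ravel(), x2.ravel()])                 # shape (400, 2)
targets = np.sin(2.0 * inputs[:, 0]) * np.sin(2.0 * inputs[:, 1])  # one continuous output per input pair
print(inputs.shape, targets.min(), targets.max())                  # targets lie in [-1, 1]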

Strictly Local Backpropagation


Standard BP
Requires sharing of information among processors (a violation of accepted theories on the functioning of biological neurons), and so lacks biological plausibility

Strictly Local BP (Fausett, 1990)


Alleviates the standard BP deficiency

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

Strictly Local BP Architecture

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

Strictly Local BP Architecture


Cortical unit
Sums its inputs and sends the resulting value as a signal to the next unit above it

Synaptic units
Receive a single input signal, apply an activation function to the input, multiply the result by a weight, and send the result to a single unit above

Thalamic unit
Compares the computed output with the target value. If they do not match, the thalamic unit sends an error signal to the output synaptic unit below it

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
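A rough sketch of the forward pass through one cortical unit of this architecture is shown below, assuming one synaptic unit per incoming connection; the function names and the tanh activation are my own illustrative choices, not Fausett's.

import numpy as np

def synaptic_units(x, weights, activation=np.tanh):
    # Each synaptic unit receives a single input, applies the activation, then multiplies by its weight
    return weights * activation(x)

def cortical_unit(synaptic_signals):
    # A cortical unit simply sums the signals arriving from the synaptic units below it
    return np.sum(synaptic_signals)

# One cortical unit fed by three synaptic units
x = np.array([0.5, -1.0, 2.0])   # signals from the layer below
w = np.array([0.1, 0.4, -0.3])   # one weight per synaptic unit
print(cortical_unit(synaptic_units(x, w)))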

BP vs. Strictly Local BP

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

BP vs. Strictly Local BP

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
