
Back Propagation: Variations

[Figure: A multilayer feedforward network. Layer 0 (input) distributes the inputs x'1, x'2, ..., x'i to the layers above; the hidden layers up to Layer m contain Neuron #1 through Neuron #k; the output layer, Layer m+1, contains Neuron #1 through Neuron #p and produces the outputs Y1(m+1), ..., Yp(m+1).]
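As a point of reference for the notation in the figure, a minimal sketch of the forward pass through such a network is given below; the sigmoid activation and the layer sizes are illustrative assumptions, not part of the original figure.

import numpy as np

def forward(x, weights):
    # weights[m] maps the outputs of layer m to the net inputs of layer m+1
    y = x                                   # layer 0: the input vector x'1 ... x'i
    for W in weights:
        y = 1.0 / (1.0 + np.exp(-W @ y))    # each neuron applies a sigmoid to its weighted sum
    return y                                # final layer m+1: outputs Y1(m+1) ... Yp(m+1)

# Example: 4 inputs, one hidden layer of 3 neurons, 2 output neurons
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
print(forward(np.array([1.0, 0.5, -0.5, 0.2]), weights))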

BP- Improvements
Second order derivatives (Parker, 1982)

Dynamic range modification (Stornetta and Huberman, 1987)
F(x) = -1/2 + 1/(1 + e^-x)

Meta learning (Jacobs, 1987; Hagiwara, 1990)

Selective updates (Huang and Huang, 1990)
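The dynamic range modification simply shifts the logistic function so that its outputs are centred on zero, lying in (-1/2, +1/2) rather than (0, 1). A minimal sketch of this shifted activation is given below; the function names are mine, for illustration only.

import numpy as np

def standard_sigmoid(x):
    # Standard logistic activation, outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def shifted_sigmoid(x):
    # Dynamic-range-modified activation, outputs in (-1/2, +1/2)
    return -0.5 + 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 7)
print(standard_sigmoid(x))   # values centred on 0.5
print(shifted_sigmoid(x))    # same shape, centred on 0.0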

BP- Improvements (Cont.)


Use of momentum weight change (Rumelhart, 1986)
Δw_kmi(t+1) = η · δ_km · x_i(t) + μ · Δw_kmi(t)

Exponential smoothing (Sejnowski and Rosenberg, 1987)
Δw_kmi(t+1) = (1 - α) · δ_km · x_i(t) + α · Δw_kmi(t)
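A compact sketch of the two update rules for a single weight follows; the numeric values of the learning rate η, momentum μ, and smoothing factor α are illustrative assumptions.

def momentum_update(dw_prev, delta_k, x_i, eta=0.2, mu=0.9):
    # Rumelhart (1986): new weight change = gradient term + a fraction of the previous change
    return eta * delta_k * x_i + mu * dw_prev

def smoothed_update(dw_prev, delta_k, x_i, alpha=0.9):
    # Sejnowski and Rosenberg (1987): exponentially smoothed average of successive changes
    return (1.0 - alpha) * delta_k * x_i + alpha * dw_prev

# Example: the weight change carries momentum even as the error signal delta_k shrinks
dw = 0.0
for delta_k in (0.5, 0.3, 0.1):
    dw = momentum_update(dw, delta_k, x_i=1.0)
    print(dw)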

BP- Improvements (Cont.)


Accelerating the BP algorithm (Kothari,
Klinkhachorn, and Nutter, 1991)

Gradual increase in learning accuracy


Without incurring the disadvantages of increased network size or more complex neurons, or otherwise violating the parallel structure of computation

Gradual increase in learning accuracy


Temporal instability
Absence of a true direction of descent
void Acc_BackProp(struct Network *N, struct Train_Set *T)
{
    Assume_coarse_error();                 /* begin with a coarse error tolerance ε */
    while (ε > Eventual_Accuracy) {        /* until the final accuracy is reached */
        while (not_all_trained) {
            Present_Next_Pattern();
            while (!Trained)
                Train_Pattern();
        }
        Increase_Accuracy(ε -= Step);      /* tighten the tolerance for the next sweep */
    }
}
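The same idea in a self-contained Python sketch, using a single sigmoid unit trained by the delta rule purely for illustration; the tolerances, step size, and learning rate are assumptions, not values from the paper.

import numpy as np

def train_with_gia(X, T, w, eta=0.5, coarse_tol=0.4, final_tol=0.1, step=0.1):
    # Gradual increase in accuracy: train to a coarse tolerance first, then tighten it
    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
    tol = coarse_tol
    while tol >= final_tol:
        trained = False
        while not trained:
            trained = True
            for x, t in zip(X, T):                       # one sweep over the training set
                y = sigmoid(w @ x)
                err = t - y
                if abs(err) > tol:                       # pattern not yet within tolerance
                    w += eta * err * y * (1.0 - y) * x   # delta-rule update for a single unit
                    trained = False
        tol -= step                                      # increase the required accuracy
    return w

# Example: learn logical AND (third input is a bias)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1], dtype=float)
print(train_with_gia(X, T, np.zeros(3)))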

Training with gradual increase in accuracy


[Figure: Each exemplar 1, 2, 3, ..., M suggests its own direction of descent; the direction of steepest descent for the overall error is shown alongside them.]

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 60000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a 4-bit 1's complementor (graph has been curtailed to show detail)

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 300000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a 3-to-8 decoder

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 200000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for the XOR problem

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 120000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a simple shape recognizer

Error vs. Training Passes

[Plot: overall error versus training passes (0 to 50000) for BP, BPGIA, BP+Mom, and BPGIA+Mom.]

Minimization of the error for a 3-bit rotate register

Error vs. Training Passes

Problem (network size)                             BP             BPGIA          BP+Mom.        BPGIA+Mom.
1's complement (4x8x4)                             9.7 (134922)   6.6 (92567)    2.2 (25574)    1.0 (11863)
3-to-8 decoder (3x8x8)                             5.4 (347634)   4.2 (268833)   1.1 (61366)    1.0 (53796)
XOR (2x2x1)                                        4.5 (211093)   1.8 (88207)    2.5 (107337)   1.0 (45916)
Rotate register (3x6x3)                            4.3 (72477)    2.0 (33909)    1.1 (15929)    1.0 (14987)
Square/circle/triangle differentiation (16x20x1)   2.3 (71253)    1.3 (33909)    6.11 (145363)  1.0 (25163)

(Number of training passes in parentheses)

Training with gradual increase in accuracy


On average, doubles the convergence rate of back propagation, or of back propagation with a momentum weight change, without requiring additional or more complex neurons

Nonsaturating Activation Functions


For some applications, where saturation of the activation function is not especially beneficial, a nonsaturating activation function may be used. One suitable example is F(x) = log(1 + x) for x > 0 and F(x) = -log(1 - x) for x < 0, with derivative F'(x) = 1/(1 + x) for x > 0 and F'(x) = 1/(1 - x) for x < 0.
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
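A brief sketch of this logarithmic activation and its derivative follows; the function names are illustrative, and the vectorised form using |x| is simply an equivalent way of writing the two-branch definition above.

import numpy as np

def log_activation(x):
    # Nonsaturating activation: log(1 + x) for x > 0, -log(1 - x) for x < 0
    return np.sign(x) * np.log(1.0 + np.abs(x))

def log_activation_deriv(x):
    # Derivative: 1/(1 + x) for x > 0, 1/(1 - x) for x < 0
    return 1.0 / (1.0 + np.abs(x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(log_activation(x))         # grows without bound instead of flattening near +/-1
print(log_activation_deriv(x))   # decays slowly rather than vanishing as a saturating sigmoid's derivative does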

Nonsaturating Activation Functions


Example: BP for the XOR problem

                                           Logarithmic   Bipolar sigmoid
Standard bipolar XOR                       144 epochs    387 epochs
Modified bipolar XOR (targets +.8 / -.8)    77 epochs    264 epochs

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

Nonsaturating Activation Functions


Example: Product of sine functions (continuous single output)

Y = sin(2·x1) · sin(2·x2)
Trained for 5000 epochs to a mean squared error of 0.024, learning rate = 0.05


Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
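A small sketch of how such a training set could be generated is given below; the input range and grid resolution are assumptions for illustration, not values from the original example.

import numpy as np

# Sample input pairs (x1, x2) on a grid and compute the continuous target Y = sin(2*x1) * sin(2*x2)
x1, x2 = np.meshgrid(np.linspace(0.0, np.pi, 20), np.linspace(0.0, np.pi, 20))
inputs = np.column_stack([x1.ravel(), x2.ravel()])                 # shape (400, 2)
targets = np.sin(2.0 * inputs[:, 0]) * np.sin(2.0 * inputs[:, 1])  # one continuous output per input pair
print(inputs.shape, targets.min(), targets.max())                  # targets lie in [-1, 1]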

Strictly Local Backpropagation


Standard BP
Requires sharing of information among processors (a violation of accepted theories on the functioning of biological neurons), and so lacks biological plausibility

Strictly Local BP (Fausett, 1990)


Alleviates the standard BP deficiency

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

Strictly Local BP Architecture

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

Strictly Local BP Architecture


Cortical unit
Sums its inputs and sends the resulting value as a signal to the next unit above it

Synaptic units
Receive a single input signal, apply an activation function to the input, multiply the result by a weight, and send the result to a single unit above

Thalamic unit
Compares the computed output with the target value. If they do not match, the thalamic unit sends an error signal to the output synaptic unit below it

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
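A rough sketch of the forward pass through one cortical unit of this architecture is shown below, assuming one synaptic unit per incoming connection; the function names and the tanh activation are my own illustrative choices, not Fausett's.

import numpy as np

def synaptic_units(x, weights, activation=np.tanh):
    # Each synaptic unit receives a single input, applies the activation, then multiplies by its weight
    return weights * activation(x)

def cortical_unit(synaptic_signals):
    # A cortical unit simply sums the signals arriving from the synaptic units below it
    return np.sum(synaptic_signals)

# One cortical unit fed by three synaptic units
x = np.array([0.5, -1.0, 2.0])   # signals from the layer below
w = np.array([0.1, 0.4, -0.3])   # one weight per synaptic unit
print(cortical_unit(synaptic_units(x, w)))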

BP vs. Strictly Local BP

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall

BP vs. Strictly Local BP

Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
