Overcoming the Linear Separability Limitation

The linear separability limitation of single-layer networks
can be overcome by adding more layers.
Multilayer networks can perform more general tasks.

Perceptron Training Algorithm

The training method used can be summarized as follows:
1. Apply an input pattern and calculate the output.
2.
   a) If the output is correct, go to step 1.
   b) If the output is incorrect and is zero, add each input to
      its corresponding weight; or
   c) If the output is incorrect and is one, subtract each
      input from its corresponding weight.
3. Go to step 1.
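
For illustration, a minimal Python sketch of this training rule is given below. The function names, the threshold activation, and the AND example are assumptions added for clarity, not part of the original notes.

import numpy as np

def train_perceptron(patterns, targets, epochs=100):
    """Perceptron training: add or subtract the input pattern from the
    weights whenever the thresholded output is wrong (steps 2b and 2c)."""
    w = np.zeros(patterns.shape[1])
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            out = 1 if np.dot(w, x) > 0 else 0   # step (threshold) output
            if out == t:
                continue                         # step 2a: output correct
            elif out == 0:
                w = w + x                        # step 2b: output 0, should be 1
            else:
                w = w - x                        # step 2c: output 1, should be 0
    return w

# Example: learn the logical AND function (a linearly separable problem)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])  # last column acts as a bias input
T = np.array([0, 0, 0, 1])
print(train_perceptron(X, T))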

THE DELTA RULE

The delta rule is an important generalization of the perceptron
training algorithm.
The perceptron training algorithm is generalized by
introducing a term
δ = (T - A)
T = Target Output
A = Actual Output
If
δ = 0, apply step 2a
δ > 0, apply step 2b
δ < 0, apply step 2c

In any of these cases, the perceptron training algorithm
is satisfied if δ is multiplied by the value of each input xi
and this product is added to the corresponding weight. A
learning rate coefficient η is multiplied with the δ·xi
product to allow control of the average size of the weight changes.

δ = (T - A)
Δi = η δ xi
wi(n+1) = wi(n) + Δi
Where
Δi = the correction associated with the i-th input xi
wi(n+1) = the value of weight i after adjustment
wi(n) = the value of weight i before adjustment
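
As a sketch only, the delta rule update could be written in Python as follows; the variable names and the learning rate value are assumptions.

import numpy as np

def delta_rule_update(w, x, target, actual, eta=0.1):
    """One delta-rule adjustment: w_i(n+1) = w_i(n) + eta * (T - A) * x_i."""
    delta = target - actual          # δ = (T - A)
    correction = eta * delta * x     # Δ_i = η δ x_i
    return w + correction

# Example: one adjustment for a three-input neuron
w = np.array([0.5, -0.3, 0.8])
x = np.array([1.0, 0.0, 1.0])
print(delta_rule_update(w, x, target=1, actual=0))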

Problems with the Perceptron Training Algorithm

It is difficult to determine whether the input sets are
linearly separable or not.
In real-world situations the inputs are often time
varying and may be separable at one time and not at
another.
The number of steps required is not properly
defined.
There is no proof that the perceptron algorithms are
faster than simply changing the values.

Module 2
Back propagation: Training Algorithm - Applications - Network Configurations - Network Paralysis - Local
Minima - Temporal Instability.

INTRODUCTION
The expansion of ANNs was in eclipse due to the lack of
algorithms for training multilayer ANNs.

Back propagation is a systematic method of training
multilayer ANNs.

The back propagation algorithm dramatically
expanded the range of problems that can be solved using
ANNs.

BACK PROPAGATION
Back propagation is a systematic method for training
multilayer artificial neural networks.

Overcoming the Linear Separability Limitation

The linear separability limitation of single-layer perceptron
networks can be overcome by adding more layers.
Multilayer networks can perform more general tasks.
The multilayer perceptron, trained by the BACK
PROPAGATION algorithm, is the most widely used
NN.

Three Layer Neural Network

[Figure: a three-layer feedforward network. Inputs p1, p2, p3 (a vector of R inputs)
feed the first layer; each layer has neurons with net inputs n, biases b, and outputs
a1, a2, a3, and the outputs of one layer are the inputs to the next.
The original slide notes that the notations in this figure are incorrect.]

Back Propagation Training Algorithm

Network Configuration

[Figure: an artificial neuron with inputs X1, X2, X3, X4 and weights
W1,1, W1,2, W1,3, W1,4. The weighted sum NET = XW is passed through
the activation function to produce OUT.]

The generally used activation function for NNs using
the Back Propagation Algorithm is the Sigmoid Function.
The Sigmoid Function gives a nonlinear gain for the Artificial
Neuron.

Why is the Sigmoid Function used in Back Propagation?

BP requires a function that is differentiable everywhere.
The Sigmoid function has the additional advantage of
providing a form of automatic gain control.
A multilayer network will have more representational
power with nonlinear functions.

Sigmoid Function
OUT = 1 / (1 + exp(-NET))

d(OUT)/d(NET) = OUT(1 - OUT)
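
A short Python sketch of the sigmoid and its derivative (the function names are assumptions for illustration):

import numpy as np

def sigmoid(net):
    """Binary sigmoid activation: OUT = 1 / (1 + exp(-NET))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(out):
    """Derivative expressed through the output: d(OUT)/d(NET) = OUT * (1 - OUT)."""
    return out * (1.0 - out)

net = np.array([-2.0, 0.0, 2.0])
out = sigmoid(net)
print(out, sigmoid_derivative(out))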

Multilayer Back Propagation Network

[Figure: a multilayer back propagation network. Inputs p1, p2, p3 (a vector of R inputs)
feed a hidden layer of neurons with biases; the hidden-layer outputs feed the output
layer, which produces OUT1, OUT2, OUT3. Each output is compared with its target
(Target 1, Target 2, Target 3) to form the error used in training.]

OVERVIEW OF TRAINING

OBJECTIVE OF TRAINING
TRAINING PAIR
TRAINING SET
TRAINING STEPS

The Steps Required

1. Select a training pair from the training
set; apply the input to the network input.
2. Calculate the output of the network.
3. Calculate the error between the network output
and the target.
4. Adjust the weights of the network in a way that
minimises the error.
5. Repeat steps 1 to 4 for each vector in the training
set, until the error for the entire set is acceptably
low.

Forward Pass
Steps 1 and 2 constitute the forward pass.
Signals propagate from input to output.
NET = XW
OUT = F(XW)
Reverse Pass
Steps 3 and 4 constitute the reverse pass.
Weights in the OUTPUT LAYER are adjusted with the
modified delta rule.
Training is more complicated in the HIDDEN LAYERS,
as their outputs have no target for comparison.
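
A minimal Python sketch of the forward pass for one layer (the layer sizes, names, and random initial weights are assumptions):

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward_layer(x, w, b):
    """Forward pass of one layer: NET = XW + bias, OUT = F(NET)."""
    return sigmoid(x @ w + b)

# Example: 3 inputs -> 2 hidden neurons -> 1 output neuron
rng = np.random.default_rng(1)
x = np.array([0.5, -1.0, 0.25])
hidden = forward_layer(x, rng.normal(size=(3, 2)), np.zeros(2))
out = forward_layer(hidden, rng.normal(size=(2, 1)), np.zeros(1))
print(out)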

Adjusting Weights of the Output Layer

The training process is as follows.
Consider the weight between neuron p in the hidden
layer j and neuron q in the output layer k.
The OUTPUT of the neuron in layer k is
subtracted from the target value to produce the error
signal.
This is multiplied by the derivative of the activation
function calculated for layer k:

δ = OUT(1 - OUT)(Target - OUT)

Adjusting Weights of the Output Layer (Contd.)

[Figure: the weight adjustment path for W_qp. The error signal δ, the output of
source neuron p, and the training rate η are multiplied together to give Δw_qp,
which is added to W_qp(n) to give W_qp(n+1).]

Then δ is multiplied by OUT from the source neuron p in hidden layer j.
This product is multiplied by the learning rate η
(typically the learning rate is taken as a value
between 0.01 and 1.0).
This result is added to the weight.
An identical process is done for each weight
proceeding from a neuron in the hidden layer to the
output layer.

The following equations illustrate this calculation:

Δw_qp = η δ_q,k OUT_p,j

w_qp(n+1) = w_qp(n) + Δw_qp

w_qp(n) = the value of the weight from neuron p in the
hidden layer to neuron q in the output layer during the
nth iteration
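
As an illustration, a Python sketch of this output-layer update under the binary sigmoid assumption (the names and layer sizes are made up for the example):

import numpy as np

def output_layer_update(w, out_hidden, out_k, target, eta=0.5):
    """Modified delta rule for the output layer:
    delta = OUT(1 - OUT)(Target - OUT); Δw_qp = eta * delta_q * OUT_p."""
    delta = out_k * (1.0 - out_k) * (target - out_k)
    return w + eta * np.outer(out_hidden, delta), delta

# Example: two hidden neurons feeding one output neuron
w = np.array([[0.3], [-0.1]])
out_hidden = np.array([0.6, 0.4])
w_new, delta = output_layer_update(w, out_hidden, out_k=np.array([0.55]), target=np.array([1.0]))
print(w_new, delta)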

Adjusting Weights of the Hidden Layer

Hidden layers have no target vectors, so the training process
described above is not used for them.
Back propagation trains the hidden layers by propagating
the output error back through the network layer by layer,
adjusting the weights at each layer.
The same equations as in the previous case can be
utilized here also, i.e.

Δw_qp = η δ_q,k OUT_p,j

w_qp(n+1) = w_qp(n) + Δw_qp

How is δ generated for hidden layers?

First, δ is calculated for each neuron in the output
layer.
It is used to adjust the weights feeding into the
output layer.
Then δ is propagated back through the same
weights to generate a δ for each neuron in the first
hidden layer.
Each hidden-layer δ is calculated by summing all the
weighted δs coming from the output layer.
These δs are used for adjusting the weights of this
hidden layer.
Then these δs are propagated back to all the preceding
layers in a similar way.

δ_p,j = OUT_p,j (1 - OUT_p,j) Σ_q (δ_q,k w_qp,k)
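
A Python sketch of this back-propagated δ computation (the variable names and the example numbers are assumptions):

import numpy as np

def hidden_layer_deltas(out_hidden, w_out, delta_out):
    """delta_p,j = OUT_p,j * (1 - OUT_p,j) * sum_q(delta_q,k * w_qp,k)."""
    back_error = w_out @ delta_out                      # Σ_q δ_q,k w_qp,k for each hidden neuron p
    return out_hidden * (1.0 - out_hidden) * back_error

# Example: two hidden neurons feeding three output neurons
out_hidden = np.array([0.6, 0.4])
w_out = np.array([[0.2, -0.5, 0.1],    # weights from hidden neuron 1 to the three outputs
                  [0.7,  0.3, -0.2]])  # weights from hidden neuron 2 to the three outputs
delta_out = np.array([0.05, -0.02, 0.10])
print(hidden_layer_deltas(out_hidden, w_out, delta_out))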

[Figure: back propagation of δ. The δs of the output layer (δ_1,k, δ_2,k, ..., δ_q,k)
are passed back through the weights w_1p, w_2p, ..., w_qp to neuron p in the hidden
layer, and from there on to the previous layer.]

Derivation of the Learning Rule for Back Propagation

Assumptions and notations:
y_k     = the output of the kth output neuron, y_k = f(y_in,k)
y_in,k  = the net input to neuron k
E       = squared error, E = 0.5 (Target - OUT)^2
z_j     = the output of the jth hidden-layer neuron
δ       = the portion of the error correction applied to a weight
j       = index over the hidden layer
k       = index over the output layer

Output Layer

E = 0.5 (Target - OUT)^2
E = 0.5 (t_k - y_k)^2

∂E/∂w_jk = -(t_k - y_k) ∂y_k/∂w_jk
         = -(t_k - y_k) [∂f(y_in,k)/∂y_in,k] [∂y_in,k/∂w_jk]
         = -(t_k - y_k) f'(y_in,k) z_j

Let δ_k = (t_k - y_k) f'(y_in,k)
        = (t_k - y_k) d(OUT)/d(NET)

Usually the activation function for the BP network is
either the binary sigmoid function (range [0,1]) or the
bipolar sigmoid function (range [-1,1]). For the binary
sigmoid, the above equation for δ becomes

δ_k = (t_k - y_k) f'(y_in,k)
    = (t_k - y_k) OUT (1 - OUT)

For the Hidden Layer

∂E/∂w_ji = -Σ_k (t_k - y_k) ∂y_k/∂w_ji
         = -Σ_k (t_k - y_k) [∂f(y_in,k)/∂y_in,k] [∂y_in,k/∂w_ji]
         = -Σ_k (t_k - y_k) f'(y_in,k) [∂y_in,k/∂z_j] [∂z_j/∂w_ji]
         = -Σ_k δ_k w_jk [∂f(z_in,j)/∂z_in,j] [∂z_in,j/∂w_ji]
         = -Σ_k δ_k w_jk f'(z_in,j) x_i

Let δ_j = [Σ_k δ_k w_jk] f'(z_in,j)

Now consider the first case (output layer):

Δw_jk = -η ∂E/∂w_jk = η δ_k z_j

Now consider the second case (hidden layer):

Δw_ji = -η ∂E/∂w_ji = η δ_j x_i

Bias Adjustments (for the Hidden and Output Layers)

b_m(k+1) = b_m(k) + η δ_m
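
To tie the derivation together, here is a hedged Python sketch of one complete back propagation step for a single-hidden-layer network with binary sigmoid units; the network size, initial weights, and learning rate are assumptions for illustration only.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, eta=0.3):
    """One training step using the update rules derived above."""
    # Forward pass
    z = sigmoid(x @ w_hidden + b_hidden)          # hidden outputs z_j
    y = sigmoid(z @ w_out + b_out)                # network outputs y_k

    # Output-layer deltas: δ_k = (t_k - y_k) f'(y_in,k)
    delta_k = (target - y) * y * (1.0 - y)
    # Hidden-layer deltas: δ_j = (Σ_k δ_k w_jk) f'(z_in,j)
    delta_j = (w_out @ delta_k) * z * (1.0 - z)

    # Weight and bias updates: Δw = η δ (input to that layer), Δb = η δ
    w_out += eta * np.outer(z, delta_k)
    b_out += eta * delta_k
    w_hidden += eta * np.outer(x, delta_j)
    b_hidden += eta * delta_j
    return y

# Example: a 2-2-1 network with arbitrary initial weights
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(2, 2)), np.zeros(2)
w_out, b_out = rng.normal(size=(2, 1)), np.zeros(1)
print(backprop_step(np.array([0.0, 1.0]), np.array([1.0]), w_hidden, b_hidden, w_out, b_out))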
Example: Find the equation for the change in weight given by the
back propagation algorithm when the activation function
used is the tan-sigmoid function.

OUT = a = (e^n - e^-n) / (e^n + e^-n)

d(OUT)/d(n) = 1 - a^2

Hence

δ = (t_k - a)(1 - a^2)
Δw = η (t_k - a)(1 - a^2) OUT_(k-1)

where OUT_(k-1) is the output of the previous layer (the input to the weight).

Example: For the network shown, the initial weights and
biases are chosen to be w1(0) = -1, b1(0) = 1, w2(0) = -2, b2(0) = 1.
An input-target pair is given as (p = -1, T = 1).
Perform the back propagation algorithm for one iteration.
F = tan-sigmoid function.

[Figure: a two-layer network with a single neuron per layer. Input p passes
through weight w1 and bias b1 to give n1 and output a1; a1 passes through
weight w2 and bias b2 to give n2 and output a2.]
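
A hedged Python sketch of this single iteration follows; the learning rate is not given in the problem, so the value below (eta = 0.1) is an assumption.

import numpy as np

# Initial weights and biases from the problem statement
w1, b1, w2, b2 = -1.0, 1.0, -2.0, 1.0
p, T, eta = -1.0, 1.0, 0.1              # eta is an assumed learning rate

# Forward pass (tan-sigmoid activation)
a1 = np.tanh(w1 * p + b1)               # first-layer output
a2 = np.tanh(w2 * a1 + b2)              # network output

# Backward pass: the tan-sigmoid derivative is (1 - a^2)
delta2 = (T - a2) * (1.0 - a2 ** 2)     # output-layer delta
delta1 = delta2 * w2 * (1.0 - a1 ** 2)  # first-layer delta

# Weight and bias updates
w2 += eta * delta2 * a1
b2 += eta * delta2
w1 += eta * delta1 * p
b1 += eta * delta1
print(w1, b1, w2, b2)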

Example: For the neural network shown in the figure, with the
given initial data, determine the new weights after applying the
sample (0,0) once. Assume the learning rate is 0.3 and
the activation function for the hidden layer and the
output layer is 1 / (1 + e^-x).

[Figure: a 2-2-1 network with inputs x1 and x2, hidden neurons z1 and z2,
and output neuron y1. The initial weight values shown are 0.1, 0.1, 0.1,
-0.2, 0.15 and 0.2.]

Applications of the Back Propagation Algorithm

Short-term load forecasting
Image processing
Online motor fault detection
Power system stability

Network Paralysis

During BP training the weights can become very large.
This forces all or most of the neurons to operate at
large values of OUT, where the derivative of the
activation function is very small.
The error sent back for training is proportional to
this derivative, so it is also small.
Hence the training process can come to a virtual
standstill (this is called Network Paralysis).
It is commonly avoided by reducing the step size.

Local Minima
The back propagation algorithm employs a type of
gradient descent method.
The error surface of a complex network is highly
convoluted, full of hills, valleys, folds, etc.
The network can get trapped in a local minimum
(a shallow valley) when there is a much deeper
minimum nearby. (This problem is known as the
local minima problem.)
It can be avoided by statistical training methods.
Wasserman proposed a combined statistical and gradient
descent method.

Temporal Instability

The human brain has the ability to retain existing knowledge
while recording new data.
Conventional ANNs have failed to solve this stability
problem.
Learning a new pattern may erase or modify what was
learned before.
In a BPNN a new set of applied inputs may badly
change the existing weights, so complete retraining is
required.
In real-world problems the NN is exposed to a
continuously changing environment.
A BPNN may then learn nothing, because of the continuous
change in the input patterns, never arriving at satisfactory
settings.
