Overcoming the Linear Separability Limitation

The linear separability limitation of single-layer networks
can be overcome by adding more layers.
Multilayer networks can perform more general tasks.

Perceptron Training Algorithm

The training method used can be summarized as follows:
1. Apply an input pattern and calculate the output.
2.
   a) If the output is correct, go to step 1.
   b) If the output is incorrect and is zero, add each input to
      its corresponding weight; or
   c) If the output is incorrect and is one, subtract each
      input from its corresponding weight.
3. Go to step 1.
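
For illustration, a minimal Python sketch of this training rule is given below. The function names, the threshold activation, and the AND example are assumptions added for clarity, not part of the original notes.

import numpy as np

def train_perceptron(patterns, targets, epochs=100):
    """Perceptron training: add or subtract the input pattern from the
    weights whenever the thresholded output is wrong (steps 2b and 2c)."""
    w = np.zeros(patterns.shape[1])
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            out = 1 if np.dot(w, x) > 0 else 0   # step (threshold) output
            if out == t:
                continue                         # step 2a: output correct
            elif out == 0:
                w = w + x                        # step 2b: output 0, should be 1
            else:
                w = w - x                        # step 2c: output 1, should be 0
    return w

# Example: learn the logical AND function (a linearly separable problem)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])  # last column acts as a bias input
T = np.array([0, 0, 0, 1])
print(train_perceptron(X, T))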

THE DELTA RULE

The delta rule is an important generalization of the perceptron
training algorithm.
The perceptron training algorithm is generalized by
introducing a term
δ = (T - A)
T = Target Output
A = Actual Output
If
δ = 0, apply step 2a
δ > 0, apply step 2b
δ < 0, apply step 2c

In any of these cases, the perceptron training algorithm
is satisfied if δ is multiplied by the value of each input xi
and this product is added to the corresponding weight. A
learning rate coefficient η is multiplied with the δ·xi
product to allow control of the average size of the weight changes.

δ = (T - A)
Δi = η δ xi
wi(n+1) = wi(n) + Δi
Where
Δi = the correction associated with the i-th input xi
wi(n+1) = the value of weight i after adjustment
wi(n) = the value of weight i before adjustment
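
As a sketch only, the delta rule update could be written in Python as follows; the variable names and the learning rate value are assumptions.

import numpy as np

def delta_rule_update(w, x, target, actual, eta=0.1):
    """One delta-rule adjustment: w_i(n+1) = w_i(n) + eta * (T - A) * x_i."""
    delta = target - actual          # δ = (T - A)
    correction = eta * delta * x     # Δ_i = η δ x_i
    return w + correction

# Example: one adjustment for a three-input neuron
w = np.array([0.5, -0.3, 0.8])
x = np.array([1.0, 0.0, 1.0])
print(delta_rule_update(w, x, target=1, actual=0))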

Problems with the Perceptron Training Algorithm

It is difficult to determine whether the input sets are
linearly separable or not.
In real-world situations the inputs are often time
varying and may be separable at one time and not at
another.
The number of steps required is not properly
defined.
There is no proof that the perceptron algorithms are
faster than simply changing the values.

Module 2
Back propagation: Training Algorithm - Applications - Network Configurations - Network Paralysis - Local
Minima - Temporal Instability.

INTRODUCTION
The expansion of ANNs was in eclipse due to the lack of
algorithms for training multilayer ANNs.

Back propagation is a systematic method of training
multilayer ANNs.

The back propagation algorithm dramatically
expanded the range of problems that can be solved using
ANNs.

BACK PROPAGATION
Back propagation is a systematic method for training
multilayer artificial neural networks.

Overcoming the Linear Separability Limitation

The linear separability limitation of single-layer perceptron
networks can be overcome by adding more layers.
Multilayer networks can perform more general tasks.
The multilayer perceptron, trained by the BACK
PROPAGATION algorithm, is the most widely used
NN.

Three Layer Neural Network

[Figure: a three-layer feedforward network. Inputs p1, p2, p3 (a vector of R inputs)
feed the first layer; each layer has neurons with net inputs n, biases b, and outputs
a1, a2, a3, and the outputs of one layer are the inputs to the next.
The original slide notes that the notations in this figure are incorrect.]

Back Propagation Training Algorithm

Network Configuration

[Figure: an artificial neuron with inputs X1, X2, X3, X4 and weights
W1,1, W1,2, W1,3, W1,4. The weighted sum NET = XW is passed through
the activation function to produce OUT.]

The generally used activation function for NNs using
the Back Propagation Algorithm is the Sigmoid Function.
The Sigmoid Function gives a nonlinear gain for the Artificial
Neuron.

Why is the Sigmoid Function used in Back Propagation?

BP requires a function that is differentiable everywhere.
The Sigmoid function has the additional advantage of
providing a form of automatic gain control.
A multilayer network will have more representational
power with nonlinear functions.

Sigmoid Function
OUT = 1 / (1 + exp(-NET))

d(OUT)/d(NET) = OUT(1 - OUT)
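
A short Python sketch of the sigmoid and its derivative (the function names are assumptions for illustration):

import numpy as np

def sigmoid(net):
    """Binary sigmoid activation: OUT = 1 / (1 + exp(-NET))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(out):
    """Derivative expressed through the output: d(OUT)/d(NET) = OUT * (1 - OUT)."""
    return out * (1.0 - out)

net = np.array([-2.0, 0.0, 2.0])
out = sigmoid(net)
print(out, sigmoid_derivative(out))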

Multilayer Back Propagation Network

[Figure: a multilayer back propagation network. Inputs p1, p2, p3 (a vector of R inputs)
feed a hidden layer of neurons with biases; the hidden-layer outputs feed the output
layer, which produces OUT1, OUT2, OUT3. Each output is compared with its target
(Target 1, Target 2, Target 3) to form the error used in training.]

OVERVIEW OF TRAINING

OBJECTIVE OF TRAINING
TRAINING PAIR
TRAINING SET
TRAINING STEPS

The Steps Required

1. Select a training pair from the training
set; apply the input to the network input.
2. Calculate the output of the network.
3. Calculate the error between the network output
and the target.
4. Adjust the weights of the network in a way that
minimises the error.
5. Repeat steps 1 to 4 for each vector in the training
set, until the error for the entire set is acceptably
low.

Forward Pass
Steps 1 and 2 constitute the forward pass.
Signals propagate from input to output.
NET = XW
OUT = F(XW)
Reverse Pass
Steps 3 and 4 constitute the reverse pass.
Weights in the OUTPUT LAYER are adjusted with the
modified delta rule.
Training is more complicated in the HIDDEN LAYERS,
as their outputs have no target for comparison.
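
A minimal Python sketch of the forward pass for one layer (the layer sizes, names, and random initial weights are assumptions):

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward_layer(x, w, b):
    """Forward pass of one layer: NET = XW + bias, OUT = F(NET)."""
    return sigmoid(x @ w + b)

# Example: 3 inputs -> 2 hidden neurons -> 1 output neuron
rng = np.random.default_rng(1)
x = np.array([0.5, -1.0, 0.25])
hidden = forward_layer(x, rng.normal(size=(3, 2)), np.zeros(2))
out = forward_layer(hidden, rng.normal(size=(2, 1)), np.zeros(1))
print(out)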

Adjusting Weights of the Output Layer

The training process is as follows.
Consider the weight between neuron p in the hidden
layer j and neuron q in the output layer k.
The OUTPUT of the neuron in layer k is
subtracted from the target value to produce the error
signal.
This is multiplied by the derivative of the activation
function calculated for layer k:

δ = OUT(1 - OUT)(Target - OUT)

Adjusting Weights of the Output Layer (Contd.)

[Figure: the weight adjustment path for W_qp. The error signal δ, the output of
source neuron p, and the training rate η are multiplied together to give Δw_qp,
which is added to W_qp(n) to give W_qp(n+1).]

Then δ is multiplied by OUT from the source neuron p in hidden layer j.
This product is multiplied by the learning rate η
(typically the learning rate is taken as a value
between 0.01 and 1.0).
This result is added to the weight.
An identical process is done for each weight
proceeding from a neuron in the hidden layer to the
output layer.

The following equations illustrate this calculation:

Δw_qp = η δ_q,k OUT_p,j

w_qp(n+1) = w_qp(n) + Δw_qp

w_qp(n) = the value of the weight from neuron p in the
hidden layer to neuron q in the output layer during the
nth iteration
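
As an illustration, a Python sketch of this output-layer update under the binary sigmoid assumption (the names and layer sizes are made up for the example):

import numpy as np

def output_layer_update(w, out_hidden, out_k, target, eta=0.5):
    """Modified delta rule for the output layer:
    delta = OUT(1 - OUT)(Target - OUT); Δw_qp = eta * delta_q * OUT_p."""
    delta = out_k * (1.0 - out_k) * (target - out_k)
    return w + eta * np.outer(out_hidden, delta), delta

# Example: two hidden neurons feeding one output neuron
w = np.array([[0.3], [-0.1]])
out_hidden = np.array([0.6, 0.4])
w_new, delta = output_layer_update(w, out_hidden, out_k=np.array([0.55]), target=np.array([1.0]))
print(w_new, delta)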

Adjusting Weights of the Hidden Layer

Hidden layers have no target vectors, so the training process
described above is not used for them.
Back propagation trains the hidden layers by propagating
the output error back through the network layer by layer,
adjusting the weights at each layer.
The same equations as in the previous case can be
utilized here also, i.e.

Δw_qp = η δ_q,k OUT_p,j

w_qp(n+1) = w_qp(n) + Δw_qp

How is δ generated for hidden layers?

First, δ is calculated for each neuron in the output
layer.
It is used to adjust the weights feeding into the
output layer.
Then δ is propagated back through the same
weights to generate a δ for each neuron in the first
hidden layer.
Each hidden-layer δ is calculated by summing all the
weighted δs coming from the output layer.
These δs are used for adjusting the weights of this
hidden layer.
Then these δs are propagated back to all the preceding
layers in a similar way.

δ_p,j = OUT_p,j (1 - OUT_p,j) Σ_q (δ_q,k w_qp,k)
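
A Python sketch of this back-propagated δ computation (the variable names and the example numbers are assumptions):

import numpy as np

def hidden_layer_deltas(out_hidden, w_out, delta_out):
    """delta_p,j = OUT_p,j * (1 - OUT_p,j) * sum_q(delta_q,k * w_qp,k)."""
    back_error = w_out @ delta_out                      # Σ_q δ_q,k w_qp,k for each hidden neuron p
    return out_hidden * (1.0 - out_hidden) * back_error

# Example: two hidden neurons feeding three output neurons
out_hidden = np.array([0.6, 0.4])
w_out = np.array([[0.2, -0.5, 0.1],    # weights from hidden neuron 1 to the three outputs
                  [0.7,  0.3, -0.2]])  # weights from hidden neuron 2 to the three outputs
delta_out = np.array([0.05, -0.02, 0.10])
print(hidden_layer_deltas(out_hidden, w_out, delta_out))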

[Figure: back propagation of δ. The δs of the output layer (δ_1,k, δ_2,k, ..., δ_q,k)
are passed back through the weights w_1p, w_2p, ..., w_qp to neuron p in the hidden
layer, and from there on to the previous layer.]

Derivation of the Learning Rule for Back Propagation

Assumptions and notations:
y_k     = the output of the kth output neuron, y_k = f(y_in,k)
y_in,k  = the net input to neuron k
E       = squared error, E = 0.5 (Target - OUT)^2
z_j     = the output of the jth hidden-layer neuron
δ       = the portion of the error correction applied to a weight
j       = index over the hidden layer
k       = index over the output layer

Output Layer

E = 0.5 (Target - OUT)^2
E = 0.5 (t_k - y_k)^2

∂E/∂w_jk = -(t_k - y_k) ∂y_k/∂w_jk
         = -(t_k - y_k) [∂f(y_in,k)/∂y_in,k] [∂y_in,k/∂w_jk]
         = -(t_k - y_k) f'(y_in,k) z_j

Let δ_k = (t_k - y_k) f'(y_in,k)
        = (t_k - y_k) d(OUT)/d(NET)

Usually the activation function for the BP network is
either the binary sigmoid function (range [0,1]) or the
bipolar sigmoid function (range [-1,1]). For the binary
sigmoid, the above equation for δ becomes

δ_k = (t_k - y_k) f'(y_in,k)
    = (t_k - y_k) OUT (1 - OUT)

For the Hidden Layer

∂E/∂w_ji = -Σ_k (t_k - y_k) ∂y_k/∂w_ji
         = -Σ_k (t_k - y_k) [∂f(y_in,k)/∂y_in,k] [∂y_in,k/∂w_ji]
         = -Σ_k (t_k - y_k) f'(y_in,k) [∂y_in,k/∂z_j] [∂z_j/∂w_ji]
         = -Σ_k δ_k w_jk [∂f(z_in,j)/∂z_in,j] [∂z_in,j/∂w_ji]
         = -Σ_k δ_k w_jk f'(z_in,j) x_i

Let δ_j = [Σ_k δ_k w_jk] f'(z_in,j)

Now consider the first case (output layer):

Δw_jk = -η ∂E/∂w_jk = η δ_k z_j

Now consider the second case (hidden layer):

Δw_ji = -η ∂E/∂w_ji = η δ_j x_i

Bias Adjustments (for the Hidden and Output Layers)

b_m(k+1) = b_m(k) + η δ_m
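
To tie the derivation together, here is a hedged Python sketch of one complete back propagation step for a single-hidden-layer network with binary sigmoid units; the network size, initial weights, and learning rate are assumptions for illustration only.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, eta=0.3):
    """One training step using the update rules derived above."""
    # Forward pass
    z = sigmoid(x @ w_hidden + b_hidden)          # hidden outputs z_j
    y = sigmoid(z @ w_out + b_out)                # network outputs y_k

    # Output-layer deltas: δ_k = (t_k - y_k) f'(y_in,k)
    delta_k = (target - y) * y * (1.0 - y)
    # Hidden-layer deltas: δ_j = (Σ_k δ_k w_jk) f'(z_in,j)
    delta_j = (w_out @ delta_k) * z * (1.0 - z)

    # Weight and bias updates: Δw = η δ (input to that layer), Δb = η δ
    w_out += eta * np.outer(z, delta_k)
    b_out += eta * delta_k
    w_hidden += eta * np.outer(x, delta_j)
    b_hidden += eta * delta_j
    return y

# Example: a 2-2-1 network with arbitrary initial weights
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(2, 2)), np.zeros(2)
w_out, b_out = rng.normal(size=(2, 1)), np.zeros(1)
print(backprop_step(np.array([0.0, 1.0]), np.array([1.0]), w_hidden, b_hidden, w_out, b_out))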
Example: Find the equation for the change in weight given by the
back propagation algorithm when the activation function
used is the tan-sigmoid function.

OUT = a = (e^n - e^-n) / (e^n + e^-n)

d(OUT)/d(n) = 1 - a^2

Hence

δ = (t_k - a)(1 - a^2)
Δw = η (t_k - a)(1 - a^2) OUT_(k-1)

where OUT_(k-1) is the output of the previous layer (the input to the weight).

Example: For the network shown, the initial weights and
biases are chosen to be w1(0) = -1, b1(0) = 1, w2(0) = -2, b2(0) = 1.
An input-target pair is given as (p = -1, T = 1).
Perform the back propagation algorithm for one iteration.
F = tan-sigmoid function.

[Figure: a two-layer network with a single neuron per layer. Input p passes
through weight w1 and bias b1 to give n1 and output a1; a1 passes through
weight w2 and bias b2 to give n2 and output a2.]
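
A hedged Python sketch of this single iteration follows; the learning rate is not given in the problem, so the value below (eta = 0.1) is an assumption.

import numpy as np

# Initial weights and biases from the problem statement
w1, b1, w2, b2 = -1.0, 1.0, -2.0, 1.0
p, T, eta = -1.0, 1.0, 0.1              # eta is an assumed learning rate

# Forward pass (tan-sigmoid activation)
a1 = np.tanh(w1 * p + b1)               # first-layer output
a2 = np.tanh(w2 * a1 + b2)              # network output

# Backward pass: the tan-sigmoid derivative is (1 - a^2)
delta2 = (T - a2) * (1.0 - a2 ** 2)     # output-layer delta
delta1 = delta2 * w2 * (1.0 - a1 ** 2)  # first-layer delta

# Weight and bias updates
w2 += eta * delta2 * a1
b2 += eta * delta2
w1 += eta * delta1 * p
b1 += eta * delta1
print(w1, b1, w2, b2)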

Example: For the neural network shown in the figure, with the
given initial data, determine the new weights after applying the
sample (0,0) once. Assume the learning rate is 0.3 and
the activation function for the hidden layer and the
output layer is 1 / (1 + e^-x).

[Figure: a 2-2-1 network with inputs x1 and x2, hidden neurons z1 and z2,
and output neuron y1. The initial weight values shown are 0.1, 0.1, 0.1,
-0.2, 0.15 and 0.2.]

Applications of the Back Propagation Algorithm

Short-term load forecasting
Image processing
Online motor fault detection
Power system stability

Network Paralysis

During BP training the weights can become very large.
This forces all or most of the neurons to operate at
large values of OUT, where the derivative of the
activation function is very small.
The error sent back for training is proportional to
this derivative, so it is also small.
Hence the training process can come to a virtual
standstill (this is called Network Paralysis).
It is commonly avoided by reducing the step size.

Local Minima
The back propagation algorithm employs a type of
gradient descent method.
The error surface of a complex network is highly
convoluted, full of hills, valleys, folds, etc.
The network can get trapped in a local minimum
(a shallow valley) when there is a much deeper
minimum nearby. (This problem is known as the
local minima problem.)
It can be avoided by statistical training methods.
Wasserman proposed a combined statistical and gradient
descent method.

Temporal Instability

The human brain has the ability to retain existing knowledge
while recording new data.
Conventional ANNs have failed to solve this stability
problem.
Learning a new pattern may erase or modify what was
learned before.
In a BPNN a new set of applied inputs may badly
change the existing weights, so complete retraining is
required.
In real-world problems the NN is exposed to a
continuously changing environment.
A BPNN may then learn nothing, because of the continuous
change in the input patterns, never arriving at satisfactory
settings.
