Input Vector

P = [p1; p2; p3]   (R x 1 column vector, here R = 3)

Weight Matrix

W = [w1,1 w1,2 ... w1,R;  w2,1 w2,2 ... w2,R;  ... ;  wS,1 wS,2 ... wS,R]   (S x R)
Multiple Layer Neural Networks
[Figure: a layer of S neurons with inputs p1, p2, p3, weight matrix W, biases b1, b2, b3, net inputs n1, n2, n3, and outputs a1, a2, a3.]

In abbreviated notation, the layer computes

a = f(Wp + b)

where p is R x 1, W is S x R, b is S x 1, n = Wp + b is S x 1, and a is S x 1.

Layer of S Neurons - Abbreviated Notation
In abbreviated notation, a two-layer network computes

a1 = f1(W1 p + b1)    W1: S1 x R,  b1, n1, a1: S1 x 1
a2 = f2(W2 a1 + b2)   W2: S2 x S1, b2, n2, a2: S2 x 1

Abbreviated Representation - Two-Layer Network
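The two-layer computation above can be sketched in code. This is a minimal illustration, not part of the slides; the layer sizes, the random weights, and the choice of a log-sigmoid transfer function are assumptions.

```python
import numpy as np

def logsig(n):
    # log-sigmoid transfer function (one common choice for f)
    return 1.0 / (1.0 + np.exp(-n))

# Illustrative sizes (assumed): R = 3 inputs, S1 = 4 and S2 = 2 neurons
rng = np.random.default_rng(0)
R, S1, S2 = 3, 4, 2
W1, b1 = rng.standard_normal((S1, R)), rng.standard_normal((S1, 1))
W2, b2 = rng.standard_normal((S2, S1)), rng.standard_normal((S2, 1))

p = rng.standard_normal((R, 1))   # input vector, R x 1
a1 = logsig(W1 @ p + b1)          # first-layer output, S1 x 1
a2 = logsig(W2 @ a1 + b2)         # second-layer output, S2 x 1
print(a2.shape)
```

Note how the second layer treats a1 exactly as the first layer treats p, which is why the abbreviated notation generalizes to any number of layers.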
TRAINING OF NEURAL NETWORKS
A network is trained so that a set of inputs produces the desired set of outputs.
Training is accomplished by sequentially applying input vectors while adjusting the network weights according to a predetermined procedure.
During training, the network weights gradually converge to values such that each input vector produces the desired output vector.
Types Of Training
Supervised Training.
Unsupervised Training.
Supervised Training.
Supervised training requires the pairing of each input vector with a target vector representing the desired output (a training pair).
The network is usually trained with a number of such training pairs.
An input vector is applied, the output vector is calculated, and the difference (the error) is fed back; the network weights are changed accordingly to minimize the error.
The training pairs are applied sequentially, errors are calculated, and the weights are adjusted for each vector, until the error over the entire training set is at an acceptably low level.
Unsupervised training
Supervised training methods are biologically implausible; unsupervised training methods are far more plausible.
Unsupervised training requires no target vectors for the outputs and no comparison to a predetermined ideal response.
The training set consists solely of input vectors.
The training algorithm modifies the weights to produce output vectors that are consistent: similar input vectors produce the same output.
The unsupervised method exploits the statistical properties of the input vectors.
Applying a vector from a given class to the input will produce a specific output vector, but there is no way to determine, prior to training, which output pattern will be produced by a given input vector class.
TYPES OF NETWORKS
FEED FORWARD NETWORKS
COMPETITIVE NETWORKS
RECURRENT NETWORKS
Perceptron
The perceptron is a feed-forward network. Its summing unit multiplies the input vector by a weight vector and sums the weighted inputs.
If this sum is greater than a predetermined threshold value, the output is one; otherwise it is zero (for hardlim) or -1 (for hardlims).
[Figure: artificial neuron with inputs X1, X2, X3, X4 and weights w1, w2, w3, w4; NET = XW, and OUT = F(NET) is compared against a threshold.]
Perceptron Representation
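The NET/OUT computation of the artificial neuron can be sketched as follows; the input values, weights, and threshold here are illustrative assumptions.

```python
import numpy as np

def hardlim(n):
    # hard-limit transfer function: 1 if n >= 0, else 0
    return 1 if n >= 0 else 0

# Example inputs, weights, and threshold (assumed values)
x = np.array([1.0, 0.5, -1.0, 2.0])
w = np.array([0.3, -0.2, 0.5, 0.1])
threshold = 0.2

net = x @ w                        # NET = XW (weighted sum of inputs)
out = hardlim(net - threshold)     # OUT = 1 only if NET exceeds the threshold
print(net, out)
```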
Representation & Learning
Representation refers to the ability of the network to simulate
a specified function.
Learning requires the existence of a systematic procedure for
adjusting the weights to produce that function.
Example: Representation
Can we represent an odd/even number discriminating machine by a perceptron?
A Basic Pattern Recognition Problem using
Perceptron.
[Figure: apple/orange sorter - a sensor measures each fruit and an ANN classifies it.]

The sensor output for each fruit is the vector

P = [shape; texture; weight]

P1 = [1; -1; -1]   Prototype of orange
P2 = [1; 1; -1]    Prototype of apple
Two-Input Case: Single-Neuron Perceptron

[Figure: two-input, single-neuron perceptron with inputs p1 and p2, weights w, bias b, and output a.]

a = hardlims(Wp + b)

A single-neuron perceptron can classify the input vectors into two categories.
Example 1:
Let the above two-input perceptron have w1,1 = 1 and w1,2 = 1. Then
a = hardlims([1 1]p + b)
If b = 1, then n = [1 1]p + 1 = 0 represents a boundary line.
[Figure: the boundary line n = [1 1]p + 1 = 0 in the (p1, p2) plane, crossing the axes at p1 = -1 and p2 = -1; n > 0 on one side, n < 0 on the other.]
Perceptron Decision Boundary
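Example 1 can be checked numerically; a minimal sketch, where the two test points are assumed for illustration:

```python
import numpy as np

def hardlims(n):
    # symmetric hard limit: +1 if n >= 0, else -1
    return 1 if n >= 0 else -1

W = np.array([1.0, 1.0])   # w1,1 = 1, w1,2 = 1
b = 1.0

p_above = np.array([1.0, 1.0])    # n = [1 1]p + 1 = 3 > 0
p_below = np.array([-2.0, -2.0])  # n = [1 1]p + 1 = -3 < 0
print(hardlims(W @ p_above + b), hardlims(W @ p_below + b))
```

Points on opposite sides of the boundary line receive opposite outputs, which is exactly the two-category classification described above.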
Example 2:
Let the above two-input perceptron have w1,1 = -1 and w1,2 = 1. Then
a = hardlims([-1 1]p + b)
If b = -1, then n = [-1 1]p - 1 = 0 represents a boundary line.

[Figure: the boundary line n = [-1 1]p - 1 = 0, crossing the axes at p1 = -1 and p2 = 1; n > 0 on one side, n < 0 on the other.]
The key property of the single-neuron perceptron is that it can separate the input vectors into two categories.
The two categories are determined by the equation Wp + b = 0.
A single-layer perceptron can only be used to recognize patterns which are LINEARLY SEPARABLE.
Pattern recognition Example (Contd.)
There are only two categories, hence we can use a single-neuron perceptron.
The input vector is of order 3 x 1.
Perceptron equation:

a = hardlims([w1,1 w1,2 w1,3][p1; p2; p3] + b)

Here, to implement this pattern recognition problem, we have to select a linear boundary which separates the prototype vectors (here, apple and orange).
Orange = [1; -1; -1]        Apple = [1; 1; -1]

[Figure: the prototype vectors Apple (1, 1, -1) and Orange (1, -1, -1) plotted in (p1, p2, p3) space.]
Hence the linear boundary between the outputs is the p1-p3 plane, that is, p2 = 0.
Here Wp + b = 0 is p2 = 0:

[w1,1 w1,2 w1,3][p1; p2; p3] + b = 0
[0 1 0][p1; p2; p3] + 0 = 0
Hence the weight matrix W = [0 1 0] and the bias b = 0.
Note that the weight matrix is orthogonal to the decision boundary.
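The resulting classifier can be verified against the two prototype vectors; a minimal sketch:

```python
import numpy as np

def hardlims(n):
    # symmetric hard limit: +1 if n >= 0, else -1
    return 1 if n >= 0 else -1

# Weight matrix and bias derived above: the boundary is p2 = 0
W = np.array([0.0, 1.0, 0.0])
b = 0.0

orange = np.array([1.0, -1.0, -1.0])
apple  = np.array([1.0,  1.0, -1.0])

a_orange = hardlims(W @ orange + b)   # -1: orange side of the boundary
a_apple  = hardlims(W @ apple + b)    # +1: apple side of the boundary
print(a_orange, a_apple)
```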
Is the XOR problem representable?
Example 2
Take a two-input XOR gate:

X value   Y value   Desired output   Point
0         0         0                A0
0         1         1                B0
1         0         1                B1
1         1         0                A1

[Figure: the four points A0, A1, B0, B1 plotted in the X-Y plane with a candidate decision line xw1 + yw2 = threshold; no single straight line separates the A points from the B points.]
Example 3:
Check whether the AND and OR functions are linearly separable.
Linear Separability
For some class of function the input vectors can be
separated geometrically .For two input case ,the
separator is a straight line. For three inputs it can
be done with a flat plane., cutting the resultant
three dimensional space. For four or more inputs
visualization is difficult . we can generalize to a
space of n dimensions divided by a
HYPERPLANE, which divides the space into four
or more regions.
Overcoming Linear separability Limitation
The linear-separability limitation of single-layer networks can be overcome by adding more layers.
Multilayer networks can perform more general tasks.
Perceptron training Algorithm
The training method used can be summarized as follows:
1. Apply an input pattern and calculate the output.
2. a) If the output is correct, go to step 1.
   b) If the output is incorrect and is zero, add each input to its corresponding weight; or
   c) If the output is incorrect and is one, subtract each input from its corresponding weight.
3. Go to step 1.
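The steps above can be sketched as a short program. This illustrative run trains on the AND function (a linearly separable problem); appending a constant 1 to each input so the threshold is learned like an ordinary weight is an assumption, not part of the slides.

```python
import numpy as np

def hardlim(n):
    # hard-limit output: 1 if n >= 0, else 0
    return 1 if n >= 0 else 0

# AND-function training pairs; the trailing 1 in each input row
# stands in for the bias/threshold (assumption)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])

w = np.zeros(3)
for _ in range(20):                   # repeat steps 1-3 over the training set
    for x, t in zip(X, T):
        a = hardlim(w @ x)            # step 1: apply input, compute output
        if a == 0 and t == 1:         # step 2b: incorrect and zero -> add input
            w += x
        elif a == 1 and t == 0:       # step 2c: incorrect and one -> subtract input
            w -= x

outputs = [hardlim(w @ x) for x in X]
print(outputs)   # [0, 0, 0, 1]
```

Because AND is linearly separable, the loop stops changing the weights once every training pair is classified correctly.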
THE DELTA RULE
The delta rule is an important generalization of the perceptron training algorithm.
The perceptron training algorithm is generalized by introducing a term

delta = (T - A)

T = target output
A = actual output

If delta = 0, this corresponds to step 2a; if delta > 0, to step 2b; if delta < 0, to step 2c.
In each of these cases, the perceptron training algorithm is satisfied if delta is multiplied by the value of each input xi and this product is added to the corresponding weight.
A learning-rate coefficient eta is multiplied with the delta * xi product to control the average size of the weight changes:

delta = (T - A)
Delta_i = eta * delta * xi
wi(n+1) = wi(n) + Delta_i

where
Delta_i = the correction associated with the i-th input xi
wi(n+1) = the value of weight i after adjustment
wi(n) = the value of weight i before adjustment
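A single delta-rule update can be sketched as follows; the learning rate eta and the example weights, input, and target are illustrative assumptions.

```python
import numpy as np

eta = 0.5                            # learning-rate coefficient (assumed)
w = np.array([0.2, -0.4])            # weights before adjustment (assumed)
x = np.array([1.0, 1.0])             # input vector (assumed)
T = 1.0                              # target output

A = 1.0 if w @ x >= 0 else 0.0       # actual (hard-limited) output
delta = T - A                        # delta = (T - A); here 1 - 0 = 1
w = w + eta * delta * x              # wi(n+1) = wi(n) + eta * delta * xi
print(w)
```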
Problems with Perceptron Training Algorithm
It is difficult to determine whether the input sets are linearly separable or not.
In real-world situations the inputs are often time-varying and may be separable at one time and not at another.
The number of steps required is not properly defined.
There is no proof that the perceptron algorithms are faster than simply changing the values.
ADALINE Network

n = Wp + b
a = purelin(Wp + b)

[Figure: linear neuron model and network architecture of the single-layer linear network, shown alongside the single-layer perceptron in abbreviated notation; p is R x 1, W is S x R, and b, n, a are S x 1.]
ADALINE Network
The ADALINE (Adaptive Linear Neuron) network and its learning rule, the LMS (Least Mean Square) algorithm, were proposed by Widrow and Marcian Hoff in 1960.
Both the ADALINE network and the perceptron suffer from the same inherent limitation: they can only solve linearly separable problems.
The LMS algorithm minimizes the mean square error (MSE), and therefore tries to move the decision boundaries as far from the training patterns as possible.
Single ADALINE
Set n = 0, then Wp + b = 0
specifies a decision boundary.
The ADALINE can be used to
classify objects into two
categories if they are linearly
separable.
a = purelin(n) = purelin(Wp + b) = w1,1 p1 + w1,2 p2 + b

[Figure: ADALINE decision boundary Wp + b = 0 in the (p1, p2) plane, with a > 0 on one side, a < 0 on the other, and the weight vector W normal to the boundary.]
Single ADALINE
The linear networks (ADALINE) are similar to the
perceptron, but their transfer function is linear rather
than hard-limiting.
This allows their outputs to take on any value, whereas
the perceptron output is limited to either 0 or 1.
Linear networks, like the perceptron, can only solve
linearly separable problems.
Widrow and Hoff introduced the ADALINE network.
Its learning rule is the LMS (Least Mean Square) algorithm.
The least mean square error (LMS or Widrow-Hoff)
algorithm is an example of supervised training, in which the
learning rule is provided with a set of examples of desired
network behavior:
{p1, t1}, {p2, t2}, ..., {pQ, tQ}
Mean Square Error
The LMS algorithm is an example of supervised
training.
The LMS algorithm will adjust the weights and biases of the ADALINE in order to minimize the mean square error, where the error is the difference between the target output (tq) and the network output (aq).
Collect the weights and the bias into a single vector x = [w; b] and augment the input vector as z = [p; 1]. The network output is then

a = w^T p + b = x^T z

and the mean square error to be minimized (LMS or Widrow-Hoff) is

F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]

E[.]: expected value
Mean Square Error
As each input is applied to the network, the network output
is compared to the target. The error is calculated as the
difference between the target output and the network
output.
We want to minimize the average of the sum of these errors.
The LMS algorithm adjusts the weights and biases of the
linear network so as to minimize this mean square error.
Estimate the mean square error by using the squared error
at each iteration.
The LMS algorithm, or Widrow-Hoff learning algorithm, is based on an approximate steepest-descent procedure.
Next, look at the partial derivatives of the squared error with respect to the weights and the bias:

d(e^2(k))/d(w1,i) = -2 e(k) pi(k)        d(e^2(k))/db = -2 e(k)

Here pi(k) is the ith element of the input vector at the kth iteration.
Finally, the change to the weight matrix and the bias will be

w1(k+1) = w1(k) + 2 alpha e(k) p(k)
b(k+1) = b(k) + 2 alpha e(k)

These two equations form the basis of the Widrow-Hoff (LMS) learning algorithm.
These results can be extended to the case of multiple neurons, and written in matrix form as

W(k+1) = W(k) + 2 alpha e(k) p^T(k)
b(k+1) = b(k) + 2 alpha e(k)

Here the error e and the bias b are vectors, and alpha is a learning rate (typically 0.2 to 0.6).
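The LMS update equations can be exercised in a short sketch. The learning rate alpha = 0.05, the training distribution, and the target function t = 2*p1 - p2 + 0.5 are illustrative assumptions; since the target is exactly realizable by a linear network, the weights should approach [2, -1] and the bias 0.5.

```python
import numpy as np

alpha = 0.05                          # learning rate (assumed)
rng = np.random.default_rng(1)
W = np.zeros((1, 2))
b = np.zeros((1, 1))

for _ in range(2000):
    p = rng.uniform(-1, 1, size=(2, 1))
    t = 2 * p[0, 0] - p[1, 0] + 0.5   # target the network should reproduce
    a = (W @ p + b)[0, 0]             # linear (purelin) network output
    e = t - a                         # error at this iteration
    W = W + 2 * alpha * e * p.T       # W(k+1) = W(k) + 2 alpha e(k) p(k)^T
    b = b + 2 * alpha * e             # b(k+1) = b(k) + 2 alpha e(k)

print(W, b)   # approaches [[2, -1]] and [[0.5]]
```

Each iteration uses the squared error at that step as the estimate of the mean square error, exactly as described above.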