Input Vector

P = [p1; p2; p3]   (R x 1 column vector, here R = 3)

Weight Matrix

W = [w1,1 w1,2 ... w1,R;  w2,1 w2,2 ... w2,R;  ... ;  wS,1 wS,2 ... wS,R]   (S x R)
Multiple Layer Neural Networks
[Figure: a layer of S neurons with inputs p1, p2, p3, weight matrix W, biases b1, b2, b3, net inputs n1, n2, n3, and outputs a1, a2, a3.]

In abbreviated notation, the layer computes

a = f(Wp + b)

where p is R x 1, W is S x R, b is S x 1, n = Wp + b is S x 1, and a is S x 1.

Layer of S Neurons - Abbreviated Notation
In abbreviated notation, a two-layer network computes

a1 = f1(W1 p + b1)    W1: S1 x R,  b1, n1, a1: S1 x 1
a2 = f2(W2 a1 + b2)   W2: S2 x S1, b2, n2, a2: S2 x 1

Abbreviated Representation - Two-Layer Network
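The two-layer computation above can be sketched in code. This is a minimal illustration, not part of the slides; the layer sizes, the random weights, and the choice of a log-sigmoid transfer function are assumptions.

```python
import numpy as np

def logsig(n):
    # log-sigmoid transfer function (one common choice for f)
    return 1.0 / (1.0 + np.exp(-n))

# Illustrative sizes (assumed): R = 3 inputs, S1 = 4 and S2 = 2 neurons
rng = np.random.default_rng(0)
R, S1, S2 = 3, 4, 2
W1, b1 = rng.standard_normal((S1, R)), rng.standard_normal((S1, 1))
W2, b2 = rng.standard_normal((S2, S1)), rng.standard_normal((S2, 1))

p = rng.standard_normal((R, 1))   # input vector, R x 1
a1 = logsig(W1 @ p + b1)          # first-layer output, S1 x 1
a2 = logsig(W2 @ a1 + b2)         # second-layer output, S2 x 1
print(a2.shape)
```

Note how the second layer treats a1 exactly as the first layer treats p, which is why the abbreviated notation generalizes to any number of layers.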
TRAINING OF NEURAL NETWORKS
A network is trained so that a set of inputs produces the desired set of outputs.
Training is accomplished by sequentially applying input vectors while adjusting the network weights according to a predetermined procedure.
During training, the network weights gradually converge to values such that each input vector produces the desired output vector.
Types Of Training
Supervised Training.
Unsupervised Training.
Supervised Training.
Supervised training requires the pairing of each input vector with a target vector representing the desired output (a training pair).
The network is usually trained with a number of such training pairs.
An input vector is applied, the output vector is calculated, and the difference (the error) is fed back; the network weights are changed accordingly to minimize the error.
The training pairs are applied sequentially, errors are calculated, and the weights are adjusted for each vector, until the error over the entire training set is at an acceptably low level.
Unsupervised training
Supervised training methods are biologically implausible; unsupervised training methods are far more plausible.
Unsupervised training requires no target vectors for the outputs and no comparison to a predetermined ideal response.
The training set consists solely of input vectors.
The training algorithm modifies the weights to produce output vectors that are consistent: similar input vectors produce the same output.
The unsupervised method exploits the statistical properties of the input vectors.
Applying a vector from a given class to the input will produce a specific output vector, but there is no way to determine, prior to training, which output pattern will be produced by a given input vector class.
TYPES OF NETWORKS
FEED FORWARD NETWORKS
COMPETITIVE NETWORKS
RECURRENT NETWORKS
Perceptron
The perceptron is a feed-forward network. Its summing unit multiplies the input vector by a weight vector and sums the weighted inputs.
If this sum is greater than a predetermined threshold value, the output is one; otherwise it is zero (for hardlim) or -1 (for hardlims).
[Figure: artificial neuron with inputs X1, X2, X3, X4 and weights w1, w2, w3, w4; NET = XW, and OUT = F(NET) is compared against a threshold.]
Perceptron Representation
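The NET/OUT computation of the artificial neuron can be sketched as follows; the input values, weights, and threshold here are illustrative assumptions.

```python
import numpy as np

def hardlim(n):
    # hard-limit transfer function: 1 if n >= 0, else 0
    return 1 if n >= 0 else 0

# Example inputs, weights, and threshold (assumed values)
x = np.array([1.0, 0.5, -1.0, 2.0])
w = np.array([0.3, -0.2, 0.5, 0.1])
threshold = 0.2

net = x @ w                        # NET = XW (weighted sum of inputs)
out = hardlim(net - threshold)     # OUT = 1 only if NET exceeds the threshold
print(net, out)
```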
Representation & Learning
Representation refers to the ability of the network to simulate
a specified function.
Learning requires the existence of a systematic procedure for
adjusting the weights to produce that function.
Example: Representation
Can we represent an odd/even number discriminating machine by a perceptron?
A Basic Pattern Recognition Problem using
Perceptron.
[Figure: apple/orange sorter - a sensor measures each fruit and an ANN classifies it.]

The sensor output for each fruit is the vector

P = [shape; texture; weight]

P1 = [1; -1; -1]   Prototype of orange
P2 = [1; 1; -1]    Prototype of apple
Two-Input Case: Single-Neuron Perceptron

[Figure: two-input, single-neuron perceptron with inputs p1 and p2, weights w, bias b, and output a.]

a = hardlims(Wp + b)

A single-neuron perceptron can classify the input vectors into two categories.
Example 1:
Let the above two-input perceptron have w1,1 = 1 and w1,2 = 1. Then
a = hardlims([1 1]p + b)
If b = 1, then n = [1 1]p + 1 = 0 represents a boundary line.
[Figure: the boundary line n = [1 1]p + 1 = 0 in the (p1, p2) plane, crossing the axes at p1 = -1 and p2 = -1; n > 0 on one side, n < 0 on the other.]
Perceptron Decision Boundary
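Example 1 can be checked numerically; a minimal sketch, where the two test points are assumed for illustration:

```python
import numpy as np

def hardlims(n):
    # symmetric hard limit: +1 if n >= 0, else -1
    return 1 if n >= 0 else -1

W = np.array([1.0, 1.0])   # w1,1 = 1, w1,2 = 1
b = 1.0

p_above = np.array([1.0, 1.0])    # n = [1 1]p + 1 = 3 > 0
p_below = np.array([-2.0, -2.0])  # n = [1 1]p + 1 = -3 < 0
print(hardlims(W @ p_above + b), hardlims(W @ p_below + b))
```

Points on opposite sides of the boundary line receive opposite outputs, which is exactly the two-category classification described above.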
Example 2:
Let the above two-input perceptron have w1,1 = -1 and w1,2 = 1. Then
a = hardlims([-1 1]p + b)
If b = -1, then n = [-1 1]p - 1 = 0 represents a boundary line.

[Figure: the boundary line n = [-1 1]p - 1 = 0, crossing the axes at p1 = -1 and p2 = 1; n > 0 on one side, n < 0 on the other.]
The key property of the single-neuron perceptron is that it can separate the input vectors into two categories.
The two categories are determined by the equation Wp + b = 0.
A single-layer perceptron can only be used to recognize patterns which are LINEARLY SEPARABLE.
Pattern recognition Example (Contd.)
There are only two categories, hence we can use a single-neuron perceptron.
The input vector is of order 3 x 1.
Perceptron equation:

a = hardlims([w1,1 w1,2 w1,3][p1; p2; p3] + b)

Here, to implement this pattern recognition problem, we have to select a linear boundary which separates the prototype vectors (here, apple and orange).
Orange = [1; -1; -1]        Apple = [1; 1; -1]

[Figure: the prototype vectors Apple (1, 1, -1) and Orange (1, -1, -1) plotted in (p1, p2, p3) space.]
Hence the linear boundary between the outputs is the p1-p3 plane, that is, p2 = 0.
Here Wp + b = 0 is p2 = 0:

[w1,1 w1,2 w1,3][p1; p2; p3] + b = 0
[0 1 0][p1; p2; p3] + 0 = 0
Hence the weight matrix W = [0 1 0] and the bias b = 0.
Note that the weight matrix is orthogonal to the decision boundary.
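The resulting classifier can be verified against the two prototype vectors; a minimal sketch:

```python
import numpy as np

def hardlims(n):
    # symmetric hard limit: +1 if n >= 0, else -1
    return 1 if n >= 0 else -1

# Weight matrix and bias derived above: the boundary is p2 = 0
W = np.array([0.0, 1.0, 0.0])
b = 0.0

orange = np.array([1.0, -1.0, -1.0])
apple  = np.array([1.0,  1.0, -1.0])

a_orange = hardlims(W @ orange + b)   # -1: orange side of the boundary
a_apple  = hardlims(W @ apple + b)    # +1: apple side of the boundary
print(a_orange, a_apple)
```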
Is the XOR problem representable?
Example 2
Take a two-input XOR gate:

X value   Y value   Desired output   Point
0         0         0                A0
0         1         1                B0
1         0         1                B1
1         1         0                A1

[Figure: the four points A0, A1, B0, B1 plotted in the X-Y plane with a candidate decision line xw1 + yw2 = threshold; no single straight line separates the A points from the B points.]
Example 3:
Check whether the AND and OR functions are linearly separable.
Linear Separability
For some class of function the input vectors can be
separated geometrically .For two input case ,the
separator is a straight line. For three inputs it can
be done with a flat plane., cutting the resultant
three dimensional space. For four or more inputs
visualization is difficult . we can generalize to a
space of n dimensions divided by a
HYPERPLANE, which divides the space into four
or more regions.
Overcoming Linear separability Limitation
The linear-separability limitation of single-layer networks can be overcome by adding more layers.
Multilayer networks can perform more general tasks.
Perceptron training Algorithm
The training method used can be summarized as follows:
1. Apply an input pattern and calculate the output.
2. a) If the output is correct, go to step 1.
   b) If the output is incorrect and is zero, add each input to its corresponding weight; or
   c) If the output is incorrect and is one, subtract each input from its corresponding weight.
3. Go to step 1.
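The steps above can be sketched as a short program. This illustrative run trains on the AND function (a linearly separable problem); appending a constant 1 to each input so the threshold is learned like an ordinary weight is an assumption, not part of the slides.

```python
import numpy as np

def hardlim(n):
    # hard-limit output: 1 if n >= 0, else 0
    return 1 if n >= 0 else 0

# AND-function training pairs; the trailing 1 in each input row
# stands in for the bias/threshold (assumption)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])

w = np.zeros(3)
for _ in range(20):                   # repeat steps 1-3 over the training set
    for x, t in zip(X, T):
        a = hardlim(w @ x)            # step 1: apply input, compute output
        if a == 0 and t == 1:         # step 2b: incorrect and zero -> add input
            w += x
        elif a == 1 and t == 0:       # step 2c: incorrect and one -> subtract input
            w -= x

outputs = [hardlim(w @ x) for x in X]
print(outputs)   # [0, 0, 0, 1]
```

Because AND is linearly separable, the loop stops changing the weights once every training pair is classified correctly.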
THE DELTA RULE
The delta rule is an important generalization of the perceptron training algorithm.
The perceptron training algorithm is generalized by introducing a term

delta = (T - A)

T = target output
A = actual output

If delta = 0, this corresponds to step 2a; if delta > 0, to step 2b; if delta < 0, to step 2c.
In each of these cases, the perceptron training algorithm is satisfied if delta is multiplied by the value of each input xi and this product is added to the corresponding weight.
A learning-rate coefficient eta is multiplied with the delta * xi product to control the average size of the weight changes:

delta = (T - A)
Delta_i = eta * delta * xi
wi(n+1) = wi(n) + Delta_i

where
Delta_i = the correction associated with the i-th input xi
wi(n+1) = the value of weight i after adjustment
wi(n) = the value of weight i before adjustment
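A single delta-rule update can be sketched as follows; the learning rate eta and the example weights, input, and target are illustrative assumptions.

```python
import numpy as np

eta = 0.5                            # learning-rate coefficient (assumed)
w = np.array([0.2, -0.4])            # weights before adjustment (assumed)
x = np.array([1.0, 1.0])             # input vector (assumed)
T = 1.0                              # target output

A = 1.0 if w @ x >= 0 else 0.0       # actual (hard-limited) output
delta = T - A                        # delta = (T - A); here 1 - 0 = 1
w = w + eta * delta * x              # wi(n+1) = wi(n) + eta * delta * xi
print(w)
```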
Problems with Perceptron Training Algorithm
It is difficult to determine whether the input sets are linearly separable or not.
In real-world situations the inputs are often time-varying and may be separable at one time and not at another.
The number of steps required is not properly defined.
There is no proof that the perceptron algorithms are faster than simply changing the values.
ADALINE Network

n = Wp + b
a = purelin(Wp + b)

[Figure: linear neuron model and network architecture of the single-layer linear network, shown alongside the single-layer perceptron in abbreviated notation; p is R x 1, W is S x R, and b, n, a are S x 1.]
ADALINE Network
The ADALINE (Adaptive Linear Neuron) network and its learning rule, the LMS (Least Mean Square) algorithm, were proposed by Widrow and Marcian Hoff in 1960.
Both the ADALINE network and the perceptron suffer from the same inherent limitation: they can only solve linearly separable problems.
The LMS algorithm minimizes the mean square error (MSE), and therefore tries to move the decision boundaries as far from the training patterns as possible.
Single ADALINE
Set n = 0, then Wp + b = 0
specifies a decision boundary.
The ADALINE can be used to
classify objects into two
categories if they are linearly
separable.
a = purelin(n) = purelin(Wp + b) = w1,1 p1 + w1,2 p2 + b

[Figure: ADALINE decision boundary Wp + b = 0 in the (p1, p2) plane, with a > 0 on one side, a < 0 on the other, and the weight vector W normal to the boundary.]
Single ADALINE
The linear networks (ADALINE) are similar to the
perceptron, but their transfer function is linear rather
than hard-limiting.
This allows their outputs to take on any value, whereas
the perceptron output is limited to either 0 or 1.
Linear networks, like the perceptron, can only solve
linearly separable problems.
Widrow and Hoff introduced the ADALINE network.
Its learning rule is the LMS (Least Mean Square) algorithm.
The least mean square error (LMS or Widrow-Hoff)
algorithm is an example of supervised training, in which the
learning rule is provided with a set of examples of desired
network behavior:
{p1, t1}, {p2, t2}, ..., {pQ, tQ}
Mean Square Error
The LMS algorithm is an example of supervised
training.
The LMS algorithm will adjust the weights and biases of the ADALINE in order to minimize the mean square error, where the error is the difference between the target output (tq) and the network output (aq).
Collect the weights and the bias into a single vector x = [w; b] and augment the input vector as z = [p; 1]. The network output is then

a = w^T p + b = x^T z

and the mean square error to be minimized (LMS or Widrow-Hoff) is

F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]

E[.]: expected value
Mean Square Error
As each input is applied to the network, the network output
is compared to the target. The error is calculated as the
difference between the target output and the network
output.
We want to minimize the average of the sum of these errors.
The LMS algorithm adjusts the weights and biases of the
linear network so as to minimize this mean square error.
Estimate the mean square error by using the squared error
at each iteration.
The LMS algorithm, or Widrow-Hoff learning algorithm, is based on an approximate steepest-descent procedure.
Next, look at the partial derivatives of the squared error with respect to the weights and the bias:

d(e^2(k))/d(w1,i) = -2 e(k) pi(k)        d(e^2(k))/db = -2 e(k)

Here pi(k) is the ith element of the input vector at the kth iteration.
Finally, the change to the weight matrix and the bias will be

w1(k+1) = w1(k) + 2 alpha e(k) p(k)
b(k+1) = b(k) + 2 alpha e(k)

These two equations form the basis of the Widrow-Hoff (LMS) learning algorithm.
These results can be extended to the case of multiple neurons, and written in matrix form as

W(k+1) = W(k) + 2 alpha e(k) p^T(k)
b(k+1) = b(k) + 2 alpha e(k)

Here the error e and the bias b are vectors, and alpha is a learning rate (typically 0.2 to 0.6).
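The LMS update equations can be exercised in a short sketch. The learning rate alpha = 0.05, the training distribution, and the target function t = 2*p1 - p2 + 0.5 are illustrative assumptions; since the target is exactly realizable by a linear network, the weights should approach [2, -1] and the bias 0.5.

```python
import numpy as np

alpha = 0.05                          # learning rate (assumed)
rng = np.random.default_rng(1)
W = np.zeros((1, 2))
b = np.zeros((1, 1))

for _ in range(2000):
    p = rng.uniform(-1, 1, size=(2, 1))
    t = 2 * p[0, 0] - p[1, 0] + 0.5   # target the network should reproduce
    a = (W @ p + b)[0, 0]             # linear (purelin) network output
    e = t - a                         # error at this iteration
    W = W + 2 * alpha * e * p.T       # W(k+1) = W(k) + 2 alpha e(k) p(k)^T
    b = b + 2 * alpha * e             # b(k+1) = b(k) + 2 alpha e(k)

print(W, b)   # approaches [[2, -1]] and [[0.5]]
```

Each iteration uses the squared error at that step as the estimate of the mean square error, exactly as described above.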