
Artificial Neural Networks

A system that can acquire, store, and utilize experiential knowledge.
Brain and Machine
• The Brain
– Pattern Recognition
– Association
– Complexity
– Noise Tolerance

• The Machine
– Calculation
– Precision
– Logic
Overview
• The Brain
• Brain vs. Computers
• The Perceptron
• Multi-layer networks
• Some Applications
The contrast in architecture
• The von Neumann architecture uses a single processing unit
  – Tens of millions of operations per second
  – Absolute arithmetic precision

• The brain uses many slow, unreliable processors acting in parallel
Features of the Brain

• Ten billion (10^10) neurons
• On average, several thousand connections per neuron
• Hundreds of operations per second
• Neurons die off frequently (and are never replaced)
• Compensates for problems by massive parallelism
The Structure of Neurons

[Diagram: structure of a neuron – dendrites, cell body, nucleus, axon, and synapse]
The Structure of Neurons
A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).

• Axons connect to dendrites via synapses
• Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons
The Structure of Neurons
• A neuron only fires if its input signal exceeds a certain amount (the threshold) within a short time period
• Synapses vary in strength
  – Strong connections allow a large signal to pass
  – Slight connections allow only a weak signal
  – Synapses can be either excitatory or inhibitory
The Artificial Neuron (Perceptron)
[Diagram: perceptron with inputs a0 = +1 (bias), a1, a2, …, an, weights wj0, wj1, …, wjn, summed into Sj, and output Xj = f(Sj)]
General Symbols
[Diagram: inputs x1, x2, …, xn feed the neuron's processing node through synaptic connections with multiplicative weights w1, w2, …, wn; the node computes the output O = f(W^T X)]

Multiplicative Weights
W = [w1, w2, …, wn]^T
X = [x1, x2, …, xn]^T

O = f(W^T X) = f( Σ_{i=1}^{n} wi xi )
Activation functions
Continuous model – bipolar sigmoid:

    f(net) = 2 / (1 + exp(−net)) − 1        (bipolar continuous)

Continuous model – unipolar sigmoid:

    f(net) = 1 / (1 + exp(−net))            (unipolar continuous)
Activation functions
Discrete model – bipolar hard limit:

    f(net) = sgn(net) = +1 if net > 0,  −1 if net < 0

Discrete model – unipolar hard limit:

    f(net) = +1 if net > 0,  0 if net < 0
Activation functions
• Transforms the neuron's input into its output
• Features of activation functions:
  – A squashing effect is required
  – Prevents accelerating growth of activation levels through the network
  – Simple and easy to calculate
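A minimal sketch of these activation functions and of the neuron output O = f(W^T X) in Python/NumPy (the slope parameter is assumed to be 1, as in the formulas above):

```python
import numpy as np

def bipolar_sigmoid(net):
    # Continuous bipolar: f(net) = 2 / (1 + exp(-net)) - 1, output in (-1, +1)
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def unipolar_sigmoid(net):
    # Continuous unipolar: f(net) = 1 / (1 + exp(-net)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-net))

def bipolar_hard_limit(net):
    # Discrete bipolar: sgn(net) -> +1 or -1
    return 1.0 if net > 0 else -1.0

def unipolar_hard_limit(net):
    # Discrete unipolar: step function -> 1 or 0
    return 1.0 if net > 0 else 0.0

def neuron_output(w, x, f=bipolar_sigmoid):
    # O = f(W^T X): weighted sum of the inputs followed by the activation function
    return f(np.dot(w, x))
```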
Learning
Learning in a network means finding the weights W that best approximate the training patterns.

Supervised learning
- Like classroom teaching
- A training set is needed
- The teacher estimates the negative error-gradient direction and the error is reduced accordingly

Unsupervised learning
- Like a video lecture
- No feedback about the error
- Used for clustering
Supervised/Unsupervised
[Diagram: supervised learning – the network output O is compared with the desired response d by a learning-signal generator, and the error is fed back to adjust the network. Unsupervised learning – the network receives only the input x, with no feedback signal d]
Supervised vs. Unsupervised
Training and test data sets
Training set: input & target
Function approximation vs. grading

Generalization
The network is said to generalize well when it sensibly interpolates input patterns that are new to it.

[Plot: training data points in the (X, Y) plane; a well-generalizing network interpolates sensibly at a new input x^T]
Neuro Processing
Association response – auto-association and hetero-association
• Auto-association: a distorted input pattern (e.g. a distorted square) is mapped back to the stored pattern itself (the square)
• Hetero-association: a distorted input pattern (a distorted square) is mapped to a different stored output pattern (a rhomboid)

Classification response – classification and recognition
• Classification: an input pattern is mapped to a class membership (e.g. class 1)
• Recognition: a distorted version of a stored pattern is still assigned the correct class
Learning in Neural Networks
• Learn the values of the weights from input/output pairs
• Start with random weights
• Load a training example's input
• Observe the computed output
• Modify the weights to reduce the difference from the target
• Iterate over all training examples
• Terminate when the weights stop changing OR when the error is very small
(A rough sketch of this loop is given below.)
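As a rough sketch in Python/NumPy, the loop described above might look like the following; `update_rule` and `activation` are placeholders for whichever learning rule and activation function are chosen (concrete rules are introduced later in the slides):

```python
import numpy as np

def train(weights, training_set, update_rule, activation, max_epochs=1000, tol=1e-4):
    """Generic supervised training loop: present each (input, target) pair,
    observe the computed output, adjust the weights, and stop when the
    total error is very small."""
    for _ in range(max_epochs):
        total_error = 0.0
        for x, d in training_set:
            o = activation(np.dot(weights, x))        # observe the computed output
            total_error += 0.5 * (d - o) ** 2
            weights = update_rule(weights, x, d, o)   # modify weights to reduce the difference
        if total_error < tol:                         # terminate when the error is very small
            break
    return weights
```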
Single perceptron as classifier
What is the net doing while it learns?
Consider a two-input, one-output network: o1 = f(w1x1 + w2x2)

    w1x1 + w2x2 > 0   →   class 1 : o1 = 1
    w1x1 + w2x2 < 0   →   class 2 : o1 = 0

[Diagram: the decision line w1x1 + w2x2 = 0 separates the two classes in the (x1, x2) plane; the weight vector W is perpendicular to the set of points satisfying W^T X = 0]

In practical applications the separating line may not pass through the origin. A bias input is used to shift it, and the bias weight must also be trained.
Conclusion: the classes must be linearly separable.
Example
Construct an ANN that realizes the AND gate:

    P   x1   x2   d
    1    0    0   −1
    2    0    1   −1
    3    1    0   −1
    4    1    1   +1

A single perceptron with inputs x1 and x2, weights w1 and w2, a bias input of −1 with weight w0, and a hard-limit activation (o1 = +1 or −1).
Decision boundary: w1x1 + w2x2 − w0 = 0

[Diagram: the four AND patterns in the (x1, x2) plane, with the three d = −1 points separated from (1, 1) by the boundary line]
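A small sketch of this AND perceptron with hand-picked weights; w1 = w2 = 1 and bias weight w0 = 1.5 are assumed values that place the boundary w1x1 + w2x2 − w0 = 0 between (1, 1) and the other three patterns:

```python
def and_perceptron(x1, x2, w1=1.0, w2=1.0, w0=1.5):
    # Hard-limit neuron: net = w1*x1 + w2*x2 - w0, output +1 or -1
    net = w1 * x1 + w2 * x2 - w0
    return 1 if net > 0 else -1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, and_perceptron(x1, x2))   # -1, -1, -1, +1 — matching column d above
```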
The world is not that simple…

Ex-OR gate:

    P   x1   x2   d
    1    0    0   −1
    2    0    1   +1
    3    1    0   +1
    4    1    1   −1

[Diagram: the four XOR patterns in the (x1, x2) plane – no single line can separate the two classes]

The patterns are not linearly separable.

Hidden transformation
Introduce two lines in pattern space,

    L1:  −2x1 + x2 − 1/2 = 0
    L2:   x1 − x2 − 1/2 = 0

and two hidden neurons that report on which side of each line a pattern lies:

    o1 = sgn(−2x1 + x2 − 1/2)
    o2 = sgn( x1 − x2 − 1/2)
Image space
Each pattern is mapped from pattern space to image space:

    Pattern Space    Image Space    Class
    x1   x2          o1   o2
     0    0          −1   −1          2
     0    1          +1   −1          1
     1    0          −1   +1          1
     1    1          −1   −1          2

[Diagram: the two hidden neurons – o1 with weights (−2, 1), o2 with weights (1, −1), each with a fixed −1 bias input weighted by 1/2]
Image Space
In image space the classes are now linearly separable. A single output neuron

    o3 = sgn(o1 + o2 + 1)

gives o3 > 0 for class 1 and o3 < 0 for class 2.

[Diagram: output neuron with weights 1 and 1 on o1 and o2 plus a bias term, separating the image space along the line o1 + o2 + 1 = 0]
Finally…
    Pattern Space    Image Space    o1+o2+1    o3    Class
    x1   x2          o1   o2
     0    0          −1   −1          −ve      −1      2
     0    1          +1   −1          +ve      +1      1
     1    0          −1   +1          +ve      +1      1
     1    1          −1   −1          −ve      −1      2
Two Layer Network

[Diagram: the complete two-layer XOR network – inputs x1, x2 with fixed −1 bias inputs; hidden neurons o1 (weights −2, 1, bias weight 1/2) and o2 (weights 1, −1, bias weight 1/2); output neuron o3 (weights 1, 1, bias weight −1)]
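Putting the two layers together, a minimal sketch of the XOR network using the weights from the slides:

```python
def sgn(net):
    return 1 if net > 0 else -1

def xor_network(x1, x2):
    # Hidden layer: maps pattern space to image space
    o1 = sgn(-2 * x1 + x2 - 0.5)
    o2 = sgn(x1 - x2 - 0.5)
    # Output layer: linearly separates the image space
    return sgn(o1 + o2 + 1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_network(x1, x2))   # -1, +1, +1, -1 (classes 2, 1, 1, 2)
```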
Learning Rules
General rule: the weight adjustment is proportional to the product of the input X and a learning signal r, where r = r(wi, X, di):

    Δwi = c · r(wi, X, di) · X        (c is the learning constant)

[Diagram: neuron with inputs x1, …, xn, weight vector w and output oi; a learning-signal generator combines w, X and the desired response di into the signal r, scaled by c]
1. Hebbian learning rule
In this rule the learning signal r is the neuron's own output:

    r = f(Wi^T X)

    ΔWi = c f(Wi^T X) X
    Δwij = c f(Wi^T X) xj = c oi xj        for i = 1,…,m and j = 1,…,n

– Unsupervised learning
– Initial weights are random, around zero


Perceptron learning rule
In this rule the learning signal r is the difference between the desired and the actual response:

    r = di − oi

    ΔWi = c [di − sgn(Wi^T X)] X           for i = 1,…,m
    Δwij = c [di − sgn(Wi^T X)] xj         for j = 1,…,n

– Supervised learning
– Weights can take any initial value


Delta learning rule
In this rule the learning signal r is,

    r = [di − f(Wi^T X)] f′(Wi^T X)

    ΔWi = c [di − f(Wi^T X)] f′(Wi^T X) X

where f′(Wi^T X) is the derivative of the activation function.

– Supervised training rule, valid only for continuous activation functions
– Weights can be initialized to any random values
– Derived from the least-square error between oi and di
– Can be generalized to multilayer networks
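A sketch of a single delta-rule update, assuming the bipolar continuous activation, for which f′(net) = 0.5 (1 − o²); the learning constant is an assumed value:

```python
import numpy as np

def delta_update(w, x, d, c=0.1):
    net = np.dot(w, x)
    o = 2.0 / (1.0 + np.exp(-net)) - 1.0      # bipolar continuous activation
    f_prime = 0.5 * (1.0 - o ** 2)            # derivative of the activation
    return w + c * (d - o) * f_prime * x      # delta_w = c (d - o) f'(net) x
```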
Other learning rules
Widrow–Hoff learning rule
Winner-take-all learning rule
– The neuron with the maximum response to X is found
– Only that neuron's weights are modified, moving them toward X
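A minimal sketch of a winner-take-all step (competitive, unsupervised); the learning constant is assumed and normalization details are omitted:

```python
import numpy as np

def winner_take_all_update(W, x, c=0.1):
    # W holds one weight row per neuron. The neuron with the maximum
    # response W x wins, and only its weights move toward the input x.
    winner = np.argmax(W @ x)
    W[winner] += c * (x - W[winner])
    return W
```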
Error Back-Propagation Algorithm (EBPA)
Given P training pairs {z1,d1, z2,d2, …, zP,dP}, where each input zp is (I×1) and each desired output dp is (K×1). The I-th component of every z has the fixed value −1 (bias). The hidden layer produces the output vector y (J×1), whose J-th component is also fixed at −1. The network output is o (K×1). The hidden-layer weight matrix V is (J×I) and the output-layer weight matrix W is (K×J).

1. Choose η > 0 and Emax > 0. Weights W and V are initialized to small random values. Set q ← 1, p ← 1, E ← 0.

2. Training step: a pattern is chosen (at random) from the training set, the input is presented and the outputs are computed:
       z ← zp,  d ← dp
       yj ← f(vj^T z)        for j = 1,…,J   (vj is the j-th row of V)
       ok ← f(wk^T y)        for k = 1,…,K   (wk is the k-th row of W)

3. The error is accumulated:
       E ← E + 0.5 (dk − ok)^2        for k = 1,…,K

4. The error signal vectors δo (K×1) and δy (J×1) are computed (for the bipolar sigmoid activation):
       δok = 0.5 (dk − ok)(1 − ok^2)              for k = 1,…,K
       δyj = 0.5 (1 − yj^2) Σk δok wkj            for j = 1,…,J
BPA continue…
5. Output layer weights are adjusted:
       wkj ← wkj + η δok yj        for k = 1,…,K and j = 1,…,J

6. Hidden layer weights are adjusted:
       vji ← vji + η δyj zi        for j = 1,…,J and i = 1,…,I

7. If p < P, then p ← p + 1, q ← q + 1 and go to step 2; otherwise go to step 8.

8. If E < Emax, training is terminated; otherwise set E ← 0, p ← 1 and start a new training cycle at step 2.
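A compact sketch of one EBPA sweep over all P pairs for this single-hidden-layer network with the bipolar continuous activation; η and the layer sizes are assumed, and the fixed −1 bias components of z and y follow the description above:

```python
import numpy as np

def f(net):                                   # bipolar continuous activation
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def ebpa_epoch(W, V, Z, D, eta=0.1):
    """One pass over all P training pairs. V: (J x I), W: (K x J).
    The last component of each z (and of y) is the fixed -1 bias input."""
    E = 0.0
    for z, d in zip(Z, D):
        y = f(V @ z); y[-1] = -1.0            # hidden layer output, bias component forced to -1
        o = f(W @ y)                          # output layer
        E += 0.5 * np.sum((d - o) ** 2)       # accumulate the error
        delta_o = 0.5 * (d - o) * (1 - o ** 2)           # output-layer error signals
        delta_y = 0.5 * (1 - y ** 2) * (W.T @ delta_o)   # hidden-layer error signals
        W += eta * np.outer(delta_o, y)       # adjust output-layer weights
        V += eta * np.outer(delta_y, z)       # adjust hidden-layer weights
    return W, V, E
```

Training would repeat such sweeps, resetting E to zero each time, until E falls below Emax.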


OCR for 8x10 characters

[Diagram: sample characters on 8×10 pixel grids]

• NNs are able to generalise
• Learning involves generating a partitioning of the input space
• For a single-layer network the input space must be linearly separable
• What is the dimension of this input space?
• How many points are in the input space?
• This network is binary (uses binary values); networks may also be continuous
ALVINN
Drives at 70 mph on a public highway

[Diagram: the ALVINN network – 30×32 pixels as inputs, 30×32 weights into each of the 4 hidden units, and 30 output units for steering]
Stock market prediction

• “Technical trading” refers to trading based solely on known statistical parameters, e.g. previous price
• Neural networks have been used to attempt to predict changes in prices
• Difficult to assess success, since companies using these techniques are reluctant to disclose information
Best References
Introduction to Artificial Neural Systems – Jacek M. Zurada
Neural Networks: A Comprehensive Foundation – Simon Haykin
MATLAB Neural Network Toolbox
Stuttgart University ANN simulator (SNNS)
