
Artificial Neural Networks

A system that can acquire, store, and utilize experiential knowledge.
Brain and Machine
• The Brain
– Pattern Recognition
– Association
– Complexity
– Noise Tolerance

• The Machine
– Calculation
– Precision
– Logic
Overview
• The Brain
• Brain vs. Computers
• The Perceptron
• Multi-layer networks
• Some Applications
The contrast in architecture
• The von Neumann architecture uses a single processing unit
  – Tens of millions of operations per second
  – Absolute arithmetic precision

• The brain uses many slow, unreliable processors acting in parallel
Features of the Brain

• Ten billion (10^10) neurons
• On average, several thousand connections per neuron
• Hundreds of operations per second
• Neurons die off frequently (and are never replaced)
• Compensates for problems by massive parallelism
The Structure of Neurons

[Diagram: structure of a neuron – dendrites, cell body, nucleus, axon, and synapse]
The Structure of Neurons
A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).

• Axons connect to dendrites via synapses
• Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons
The Structure of Neurons
• A neuron only fires if its input signal exceeds a certain amount (the threshold) within a short time period
• Synapses vary in strength
  – Strong connections allow a large signal to pass
  – Slight connections allow only a weak signal
  – Synapses can be either excitatory or inhibitory
The Artificial Neuron (Perceptron)
[Diagram: perceptron with inputs a0 = +1 (bias), a1, a2, …, an, weights wj0, wj1, …, wjn, summed into Sj, and output Xj = f(Sj)]
General Symbols
[Diagram: inputs x1, x2, …, xn feed the neuron's processing node through synaptic connections with multiplicative weights w1, w2, …, wn; the node computes the output O = f(W^T X)]

Multiplicative Weights
W = [w1, w2, …, wn]^T
X = [x1, x2, …, xn]^T

O = f(W^T X) = f( Σ_{i=1}^{n} wi xi )
Activation functions
Continuous model – bipolar sigmoid:

    f(net) = 2 / (1 + exp(−net)) − 1        (bipolar continuous)

Continuous model – unipolar sigmoid:

    f(net) = 1 / (1 + exp(−net))            (unipolar continuous)
Activation functions
Discrete model – bipolar hard limit:

    f(net) = sgn(net) = +1 if net > 0,  −1 if net < 0

Discrete model – unipolar hard limit:

    f(net) = +1 if net > 0,  0 if net < 0
Activation functions
• Transforms the neuron's input into its output
• Features of activation functions:
  – A squashing effect is required
  – Prevents accelerating growth of activation levels through the network
  – Simple and easy to calculate
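A minimal sketch of these activation functions and of the neuron output O = f(W^T X) in Python/NumPy (the slope parameter is assumed to be 1, as in the formulas above):

```python
import numpy as np

def bipolar_sigmoid(net):
    # Continuous bipolar: f(net) = 2 / (1 + exp(-net)) - 1, output in (-1, +1)
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def unipolar_sigmoid(net):
    # Continuous unipolar: f(net) = 1 / (1 + exp(-net)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-net))

def bipolar_hard_limit(net):
    # Discrete bipolar: sgn(net) -> +1 or -1
    return 1.0 if net > 0 else -1.0

def unipolar_hard_limit(net):
    # Discrete unipolar: step function -> 1 or 0
    return 1.0 if net > 0 else 0.0

def neuron_output(w, x, f=bipolar_sigmoid):
    # O = f(W^T X): weighted sum of the inputs followed by the activation function
    return f(np.dot(w, x))
```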
Learning
Learning in a network means finding the weights W that best approximate the training patterns.

Supervised learning
- Like classroom teaching
- A training set is needed
- The teacher estimates the negative error-gradient direction and the error is reduced accordingly

Unsupervised learning
- Like a video lecture
- No feedback about the error
- Used for clustering
Supervised/Unsupervised
[Diagram: supervised learning – the network output O is compared with the desired response d by a learning-signal generator, and the error is fed back to adjust the network. Unsupervised learning – the network receives only the input x, with no feedback signal d]
Supervised vs. Unsupervised
Training and test data sets
Training set: input & target
Function approximation vs. grading

Generalization
The network is said to generalize well when it sensibly interpolates input patterns that are new to it.

[Plot: training data points in the (X, Y) plane; a well-generalizing network interpolates sensibly at a new input x^T]
Neuro Processing
Association response – auto-association and hetero-association
• Auto-association: a distorted input pattern (e.g. a distorted square) is mapped back to the stored pattern itself (the square)
• Hetero-association: a distorted input pattern (a distorted square) is mapped to a different stored output pattern (a rhomboid)

Classification response – classification and recognition
• Classification: an input pattern is mapped to a class membership (e.g. class 1)
• Recognition: a distorted version of a stored pattern is still assigned the correct class
Learning in Neural Networks
• Learn the values of the weights from input/output pairs
• Start with random weights
• Load a training example's input
• Observe the computed output
• Modify the weights to reduce the difference from the target
• Iterate over all training examples
• Terminate when the weights stop changing OR when the error is very small
(A rough sketch of this loop is given below.)
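As a rough sketch in Python/NumPy, the loop described above might look like the following; `update_rule` and `activation` are placeholders for whichever learning rule and activation function are chosen (concrete rules are introduced later in the slides):

```python
import numpy as np

def train(weights, training_set, update_rule, activation, max_epochs=1000, tol=1e-4):
    """Generic supervised training loop: present each (input, target) pair,
    observe the computed output, adjust the weights, and stop when the
    total error is very small."""
    for _ in range(max_epochs):
        total_error = 0.0
        for x, d in training_set:
            o = activation(np.dot(weights, x))        # observe the computed output
            total_error += 0.5 * (d - o) ** 2
            weights = update_rule(weights, x, d, o)   # modify weights to reduce the difference
        if total_error < tol:                         # terminate when the error is very small
            break
    return weights
```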
Single perceptron as classifier
What is the net doing while it learns?
Consider a two-input, one-output network: o1 = f(w1x1 + w2x2)

    w1x1 + w2x2 > 0   →   class 1 : o1 = 1
    w1x1 + w2x2 < 0   →   class 2 : o1 = 0

[Diagram: the decision line w1x1 + w2x2 = 0 separates the two classes in the (x1, x2) plane; the weight vector W is perpendicular to the set of points satisfying W^T X = 0]

In practical applications the separating line may not pass through the origin. A bias input is used to shift it, and the bias weight must also be trained.
Conclusion: the classes must be linearly separable.
Example
Construct an ANN that realizes the AND gate:

    P   x1   x2   d
    1    0    0   −1
    2    0    1   −1
    3    1    0   −1
    4    1    1   +1

A single perceptron with inputs x1 and x2, weights w1 and w2, a bias input of −1 with weight w0, and a hard-limit activation (o1 = +1 or −1).
Decision boundary: w1x1 + w2x2 − w0 = 0

[Diagram: the four AND patterns in the (x1, x2) plane, with the three d = −1 points separated from (1, 1) by the boundary line]
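A small sketch of this AND perceptron with hand-picked weights; w1 = w2 = 1 and bias weight w0 = 1.5 are assumed values that place the boundary w1x1 + w2x2 − w0 = 0 between (1, 1) and the other three patterns:

```python
def and_perceptron(x1, x2, w1=1.0, w2=1.0, w0=1.5):
    # Hard-limit neuron: net = w1*x1 + w2*x2 - w0, output +1 or -1
    net = w1 * x1 + w2 * x2 - w0
    return 1 if net > 0 else -1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, and_perceptron(x1, x2))   # -1, -1, -1, +1 — matching column d above
```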
The world is not that simple…

Ex-OR gate:

    P   x1   x2   d
    1    0    0   −1
    2    0    1   +1
    3    1    0   +1
    4    1    1   −1

[Diagram: the four XOR patterns in the (x1, x2) plane – no single line can separate the two classes]

The patterns are not linearly separable.

Hidden transformation
Introduce two lines in pattern space,

    L1:  −2x1 + x2 − 1/2 = 0
    L2:   x1 − x2 − 1/2 = 0

and two hidden neurons that report on which side of each line a pattern lies:

    o1 = sgn(−2x1 + x2 − 1/2)
    o2 = sgn( x1 − x2 − 1/2)
Image space
Each pattern is mapped from pattern space to image space:

    Pattern Space    Image Space    Class
    x1   x2          o1   o2
     0    0          −1   −1          2
     0    1          +1   −1          1
     1    0          −1   +1          1
     1    1          −1   −1          2

[Diagram: the two hidden neurons – o1 with weights (−2, 1), o2 with weights (1, −1), each with a fixed −1 bias input weighted by 1/2]
Image Space
In image space the classes are now linearly separable. A single output neuron

    o3 = sgn(o1 + o2 + 1)

gives o3 > 0 for class 1 and o3 < 0 for class 2.

[Diagram: output neuron with weights 1 and 1 on o1 and o2 plus a bias term, separating the image space along the line o1 + o2 + 1 = 0]
Finally…
    Pattern Space    Image Space    o1+o2+1    o3    Class
    x1   x2          o1   o2
     0    0          −1   −1          −ve      −1      2
     0    1          +1   −1          +ve      +1      1
     1    0          −1   +1          +ve      +1      1
     1    1          −1   −1          −ve      −1      2
Two Layer Network

[Diagram: the complete two-layer XOR network – inputs x1, x2 with fixed −1 bias inputs; hidden neurons o1 (weights −2, 1, bias weight 1/2) and o2 (weights 1, −1, bias weight 1/2); output neuron o3 (weights 1, 1, bias weight −1)]
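Putting the two layers together, a minimal sketch of the XOR network using the weights from the slides:

```python
def sgn(net):
    return 1 if net > 0 else -1

def xor_network(x1, x2):
    # Hidden layer: maps pattern space to image space
    o1 = sgn(-2 * x1 + x2 - 0.5)
    o2 = sgn(x1 - x2 - 0.5)
    # Output layer: linearly separates the image space
    return sgn(o1 + o2 + 1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_network(x1, x2))   # -1, +1, +1, -1 (classes 2, 1, 1, 2)
```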
Learning Rules
General rule: the weight adjustment is proportional to the product of the input X and a learning signal r, where r = r(wi, X, di):

    Δwi = c · r(wi, X, di) · X        (c is the learning constant)

[Diagram: neuron with inputs x1, …, xn, weight vector w and output oi; a learning-signal generator combines w, X and the desired response di into the signal r, scaled by c]
1. Hebbian learning rule
In this rule the learning signal r is the neuron's own output:

    r = f(Wi^T X)

    ΔWi = c f(Wi^T X) X
    Δwij = c f(Wi^T X) xj = c oi xj        for i = 1,…,m and j = 1,…,n

– Unsupervised learning
– Initial weights are random, around zero


Perceptron learning rule
In this rule the learning signal r is the difference between the desired and the actual response:

    r = di − oi

    ΔWi = c [di − sgn(Wi^T X)] X           for i = 1,…,m
    Δwij = c [di − sgn(Wi^T X)] xj         for j = 1,…,n

– Supervised learning
– Weights can take any initial value


Delta learning rule
In this rule the learning signal r is,

    r = [di − f(Wi^T X)] f′(Wi^T X)

    ΔWi = c [di − f(Wi^T X)] f′(Wi^T X) X

where f′(Wi^T X) is the derivative of the activation function.

– Supervised training rule, valid only for continuous activation functions
– Weights can be initialized to any random values
– Derived from the least-square error between oi and di
– Can be generalized to multilayer networks
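A sketch of a single delta-rule update, assuming the bipolar continuous activation, for which f′(net) = 0.5 (1 − o²); the learning constant is an assumed value:

```python
import numpy as np

def delta_update(w, x, d, c=0.1):
    net = np.dot(w, x)
    o = 2.0 / (1.0 + np.exp(-net)) - 1.0      # bipolar continuous activation
    f_prime = 0.5 * (1.0 - o ** 2)            # derivative of the activation
    return w + c * (d - o) * f_prime * x      # delta_w = c (d - o) f'(net) x
```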
Other learning rules
Widrow–Hoff learning rule
Winner-take-all learning rule
– The neuron with the maximum response to X is found
– Only that neuron's weights are modified, moving them toward X
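A minimal sketch of a winner-take-all step (competitive, unsupervised); the learning constant is assumed and normalization details are omitted:

```python
import numpy as np

def winner_take_all_update(W, x, c=0.1):
    # W holds one weight row per neuron. The neuron with the maximum
    # response W x wins, and only its weights move toward the input x.
    winner = np.argmax(W @ x)
    W[winner] += c * (x - W[winner])
    return W
```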
Error Back-Propagation Algorithm (EBPA)
Given P training pairs {z1,d1, z2,d2, …, zP,dP}, where each input zp is (I×1) and each desired output dp is (K×1). The I-th component of every z has the fixed value −1 (bias). The hidden layer produces the output vector y (J×1), whose J-th component is also fixed at −1. The network output is o (K×1). The hidden-layer weight matrix V is (J×I) and the output-layer weight matrix W is (K×J).

1. Choose η > 0 and Emax > 0. Weights W and V are initialized to small random values. Set q ← 1, p ← 1, E ← 0.

2. Training step: a pattern is chosen (at random) from the training set, the input is presented and the outputs are computed:
       z ← zp,  d ← dp
       yj ← f(vj^T z)        for j = 1,…,J   (vj is the j-th row of V)
       ok ← f(wk^T y)        for k = 1,…,K   (wk is the k-th row of W)

3. The error is accumulated:
       E ← E + 0.5 (dk − ok)^2        for k = 1,…,K

4. The error signal vectors δo (K×1) and δy (J×1) are computed (for the bipolar sigmoid activation):
       δok = 0.5 (dk − ok)(1 − ok^2)              for k = 1,…,K
       δyj = 0.5 (1 − yj^2) Σk δok wkj            for j = 1,…,J
BPA continue…
5. Output layer weights are adjusted:
       wkj ← wkj + η δok yj        for k = 1,…,K and j = 1,…,J

6. Hidden layer weights are adjusted:
       vji ← vji + η δyj zi        for j = 1,…,J and i = 1,…,I

7. If p < P, then p ← p + 1, q ← q + 1 and go to step 2; otherwise go to step 8.

8. If E < Emax, training is terminated; otherwise set E ← 0, p ← 1 and start a new training cycle at step 2.
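A compact sketch of one EBPA sweep over all P pairs for this single-hidden-layer network with the bipolar continuous activation; η and the layer sizes are assumed, and the fixed −1 bias components of z and y follow the description above:

```python
import numpy as np

def f(net):                                   # bipolar continuous activation
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def ebpa_epoch(W, V, Z, D, eta=0.1):
    """One pass over all P training pairs. V: (J x I), W: (K x J).
    The last component of each z (and of y) is the fixed -1 bias input."""
    E = 0.0
    for z, d in zip(Z, D):
        y = f(V @ z); y[-1] = -1.0            # hidden layer output, bias component forced to -1
        o = f(W @ y)                          # output layer
        E += 0.5 * np.sum((d - o) ** 2)       # accumulate the error
        delta_o = 0.5 * (d - o) * (1 - o ** 2)           # output-layer error signals
        delta_y = 0.5 * (1 - y ** 2) * (W.T @ delta_o)   # hidden-layer error signals
        W += eta * np.outer(delta_o, y)       # adjust output-layer weights
        V += eta * np.outer(delta_y, z)       # adjust hidden-layer weights
    return W, V, E
```

Training would repeat such sweeps, resetting E to zero each time, until E falls below Emax.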


OCR for 8x10 characters

[Diagram: sample characters on 8×10 pixel grids]

• NNs are able to generalise
• Learning involves generating a partitioning of the input space
• For a single-layer network the input space must be linearly separable
• What is the dimension of this input space?
• How many points are in the input space?
• This network is binary (uses binary values); networks may also be continuous
ALVINN
Drives at 70 mph on a public highway

[Diagram: the ALVINN network – 30×32 pixels as inputs, 30×32 weights into each of the 4 hidden units, and 30 output units for steering]
Stock market prediction

• “Technical trading” refers to trading based solely on known statistical parameters, e.g. previous price
• Neural networks have been used to attempt to predict changes in prices
• Difficult to assess success, since companies using these techniques are reluctant to disclose information
Best References
Introduction to Artificial Neural Systems – Jacek M. Zurada
Neural Networks: A Comprehensive Foundation – Simon Haykin
MATLAB Neural Network Toolbox
Stuttgart University ANN simulator (SNNS)
