Learning Processes
By T. Sunil Chowdary
Learning
Learning is a process by which the free parameters of a NN are
adapted through stimulation from the environment
Sequence of Events
the NN is stimulated by an environment
it undergoes changes in its free parameters
it responds in a new way to the environment
Learning Algorithm
prescribed steps of a process to make a system learn
ways to adjust the synaptic weights of a neuron
no unique learning algorithm; rather, a kit of tools
Error-Correction Learning (I)
adjustment of synaptic weights
Minimizing a cost function, E(n), or index of performance
$E(n) = \frac{1}{2} e_k^2(n)$
Also called instantaneous value of error energy
step-by-step adjustment until
system reaches steady state; synaptic weights are stabilized
Also called the delta rule or Widrow-Hoff rule
Error-Correction Learning (II)
$\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
η : rate of learning; learning-rate parameter
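A minimal sketch of the delta rule in Python, assuming a single linear neuron and synthetic data (all names and values here are illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch of error-correction (delta rule) learning for one
# linear neuron; eta, w, x, d are illustrative names.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # input vectors x_j(n)
w_true = np.array([0.5, -1.0, 2.0])
d = x @ w_true                         # desired responses d_k(n)

eta = 0.05                             # learning-rate parameter
w = np.zeros(3)                        # free parameters to adapt
for n in range(len(x)):
    y = w @ x[n]                       # actual response y_k(n)
    e = d[n] - y                       # error signal e_k(n)
    w += eta * e * x[n]                # delta rule: dw = eta * e * x
print(w)  # approaches w_true as the weights stabilize
```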
Memory-Based Learning
criterion used for defining the local neighborhood
learning rule applied to the training examples
Nearest Neighbor Rule
Cover and Hart
Examples are independent and identically distributed
The sample size N is infinitely large
then error(NNR) < 2 × error(Bayes rule)
informally, half of the classification information is contained in the nearest neighbor
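A minimal sketch of the nearest-neighbor rule, assuming a Euclidean distance criterion and a small illustrative example set:

```python
import numpy as np

# Nearest-neighbor rule: a test vector is assigned the class of its
# closest stored example (Euclidean metric chosen as the neighborhood
# criterion here).
def nearest_neighbor(x_test, examples, labels):
    dists = np.linalg.norm(examples - x_test, axis=1)
    return labels[np.argmin(dists)]

examples = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
labels = np.array([0, 0, 1])
print(nearest_neighbor(np.array([3.5, 3.0]), examples, labels))  # -> 1
```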
Hebbian learning process
If the two neurons on either side of a synapse are activated simultaneously,
then the strength of that synapse is increased.
If two neurons on either side of a synapse are activated asynchronously,
then the strength of that synapse is weakened or eliminated.
Hebbian Learning
If the two neurons of a connection are activated
simultaneously (synchronously), then its strength is increased
asynchronously, then its strength is weakened or eliminated
Hebbian synapse
time-dependent mechanism
depends on the exact time of occurrence of the two signals
local mechanism
locally available information is used
interactive mechanism
learning is done by the interaction of two signals
conjunctional or correlational mechanism
co-occurrence of the two (presynaptic and postsynaptic) signals
Hebbian learning is found in the hippocampus
Math Model of Hebbian Modification
$\Delta w_{kj}(n) = F(y_k(n),\, x_j(n))$
Hebb’s hypothesis
$\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$
where η is the rate of learning
also called the activity product rule
repeated application of $x_j$ leads to exponential growth
Covariance hypothesis
$\Delta w_{kj}(n) = \eta\,(x_j - \bar{x})(y_k - \bar{y})$
1. $w_{kj}$ is enhanced if $x_j > \bar{x}$ and $y_k > \bar{y}$
2. $w_{kj}$ is depressed if ($x_j > \bar{x}$ and $y_k < \bar{y}$) or ($x_j < \bar{x}$ and $y_k > \bar{y}$)
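A sketch contrasting the activity product rule with the covariance hypothesis on random signals; the running means stand in for the time-averaged values $\bar{x}$ and $\bar{y}$ (all data here is illustrative):

```python
import numpy as np

# Hebb's activity-product rule vs. the covariance hypothesis.
rng = np.random.default_rng(1)
eta = 0.01
w_hebb = w_cov = 0.0
x_bar = y_bar = 0.0
for n in range(1000):
    x = rng.random()                 # presynaptic signal x_j(n)
    y = rng.random()                 # postsynaptic signal y_k(n)
    x_bar += (x - x_bar) / (n + 1)   # running mean of x_j
    y_bar += (y - y_bar) / (n + 1)   # running mean of y_k
    w_hebb += eta * y * x            # dw = eta*y*x (grows without bound)
    w_cov += eta * (x - x_bar) * (y - y_bar)  # covariance rule (bounded)
print(w_hebb, w_cov)
```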
Competitive NN
Feedforward connections are excitatory; lateral connections are inhibitory (lateral inhibition).
Competitive Learning
Output signal:
$y_k = \begin{cases} 1 & \text{if } v_k > v_j \text{ for all } j,\ j \neq k \\ 0 & \text{otherwise} \end{cases}$
$w_{kj}$ denotes the synaptic weight, with $\sum_j w_{kj} = 1$ for all $k$
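A sketch of competitive learning, assuming the standard winner-take-all update $\Delta w_{kj} = \eta\,(x_j - w_{kj})$ for the winning neuron (the update rule itself is not shown on this slide; the sum-to-one constraint is applied only at initialization for simplicity):

```python
import numpy as np

# Competitive learning: the winner (largest induced local field v_k)
# moves its weight vector toward the input pattern.
rng = np.random.default_rng(2)
X = rng.random((200, 2))             # input patterns
W = rng.random((3, 2))               # one weight row per neuron
W /= W.sum(axis=1, keepdims=True)    # sum_j w_kj = 1 for all k (init)
eta = 0.1
for x in X:
    v = W @ x                        # induced local fields v_k
    k = np.argmax(v)                 # winner: y_k = 1, others 0
    W[k] += eta * (x - W[k])         # move winner toward the input
print(W)
```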
Boltzmann Learning
Visible neurons and hidden neurons
Energy function: $E = -\frac{1}{2} \sum_{j} \sum_{k,\, k \neq j} w_{kj}\, x_k x_j$
where $j \neq k$ means there is no self-feedback
Boltzmann Learning Rule
Let $\rho^{+}_{kj}$ denote the correlation between the states of neurons j and k in the clamped condition:
$\rho^{+}_{kj} = \sum_{x_\alpha \in \mathfrak{T}} \sum_{x_\beta} P(X_\beta = x_\beta \mid X_\alpha = x_\alpha)\, x_k x_j$
Let $\rho^{-}_{kj}$ denote the correlation between the states of neurons j and k in the free-running condition:
$\rho^{-}_{kj} = \sum_{x_\alpha \in \mathfrak{T}} \sum_{x} P(X = x)\, x_k x_j$
The weights are updated as $\Delta w_{kj} = \eta\,(\rho^{+}_{kj} - \rho^{-}_{kj})$, $j \neq k$,
where η is the learning-rate parameter
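A toy sketch of the Boltzmann update, assuming the clamped and free-running correlations are estimated as plain sample averages over ±1 state vectors (a real Boltzmann machine would obtain these states from Gibbs sampling; everything below is illustrative):

```python
import numpy as np

# Boltzmann rule sketch: dW = eta * (rho_plus - rho_minus), j != k.
def boltzmann_update(W, states_clamped, states_free, eta=0.05):
    rho_plus = states_clamped.T @ states_clamped / len(states_clamped)
    rho_minus = states_free.T @ states_free / len(states_free)
    dW = eta * (rho_plus - rho_minus)
    np.fill_diagonal(dW, 0.0)        # j != k: no self-feedback
    return W + dW

rng = np.random.default_rng(3)
W = np.zeros((4, 4))
clamped = rng.choice([-1, 1], size=(500, 4))   # stand-in clamped states
free = rng.choice([-1, 1], size=(500, 4))      # stand-in free states
print(boltzmann_update(W, clamped, free))
```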
Credit-Assignment Problem
The problem of assigning credit or blame for overall outcomes
to each of the internal decisions
Two sub-problems
assignment of credit to actions
temporal credit-assignment problem : the time instant of an action
many actions are taken; which action is responsible for the outcome?
assignment of credit to internal decisions
structural credit-assignment problem : the internal structure of an action
multiple components contribute; which one did best?
Learning with a Teacher
Supervised learning
the teacher has knowledge of the environment to be learned
input and desired-output pairs are given as a training set
parameters are adjusted step-by-step based on the error signal
System performance measure
mean-square-error
sum of squared errors over the training sample
visualized as an error surface with the free parameters as coordinates
move toward a minimum point of the error surface
may not be the global minimum
use the gradient of the error surface - the direction of steepest descent (see the sketch below)
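A sketch of batch steepest descent on the mean-square-error surface of a linear model (contrast with the sample-by-sample delta rule shown earlier; the data and learning rate are illustrative):

```python
import numpy as np

# Steepest descent on the MSE error surface of a linear model.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
d = X @ np.array([1.0, -2.0])        # desired responses
w = np.zeros(2)
eta = 0.05
for _ in range(200):
    e = d - X @ w                    # errors over the training sample
    grad = -X.T @ e / len(X)         # gradient of (1/2)*mean(e^2)
    w -= eta * grad                  # step along steepest descent
print(w)  # close to [1, -2]
```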
Reinforcement Learning (Learning with a Critic)
delayed reinforcement, since the critic's evaluation may be given only at the time of the final output
[Block diagram: Environment → Critic, which converts the primary reinforcement signal from the environment into a heuristic reinforcement signal → Learning System]
Unsupervised Learning
Self-organized learning
No external teacher or critic
A task-independent measure of quality is required to learn
Network parameters are optimized with respect to the measure
competitive learning rule is a case of unsupervised learning
Learning Tasks
Pattern Association
Pattern Recognition
Function Approximation
Control
Filtering
Beamforming
Pattern Association
Associative memory is a distributed memory that learns by association
a predominant feature of human memory
$x_k \to y_k,\ k = 1, 2, \ldots, q$
Storage capacity, q
Storage phase and Recall phase
the NN is required to store a set of patterns by repeated presentation
a partial description or distorted pattern is used to retrieve the particular pattern
$x_k$ acts as a stimulus that determines the location of the memorized pattern
Autoassociation : when $x_k = y_k$
Heteroassociation : when $x_k \neq y_k$
goal: make q as large as possible while still recalling correctly
Pattern Recognition
Process whereby a received pattern is assigned to one of a
prescribed number of classes (categories)
Pattern recognition by a NN is statistical in nature
a pattern is a point in a multidimensional decision space
decision boundaries delimit the region associated with each class
decision boundaries are determined by training
[Block diagram: input pattern (m-D) → feature extraction (unsupervised) → feature vector (q-D) → classification (supervised) → output class (one of r classes)]
Function Approximation
Given a set of labeled examples $\mathfrak{T} = \{(x_i, d_i)\}_{i=1}^{N}$ drawn from $d = f(x)$,
find $F(x)$ which approximates the unknown function $f(x)$
s.t. $\| F(x) - f(x) \| < \varepsilon$ for all $x$
Supervised learning is well suited to this task
System identification: using input-output examples, train an NN model of the unknown system so that the error $e_i = d_i - y_i$ between the system output $d_i$ and the model output $y_i$ is minimized; a related task is constructing an inverse system (see the sketch below)
[Block diagram: input → Unknown system → $d_i$; input → NN model → $y_i$; $\Sigma$ forms $e_i = d_i - y_i$]
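A sketch of the function-approximation task, with a degree-5 polynomial standing in for the neural network $F(x)$ (the target function and noise level are illustrative, not from the slides):

```python
import numpy as np

# Fit F(x) to labeled examples drawn from an unknown f(x) plus noise.
rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 40)
d = np.sin(np.pi * x) + 0.05 * rng.normal(size=x.size)  # d = f(x) + eps
coeffs = np.polyfit(x, d, deg=5)     # supervised fit of F(x)
F = np.polyval(coeffs, x)
print(np.max(np.abs(F - np.sin(np.pi * x))))  # approximation error
```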
Control
Feedback control system
[Block diagram: reference signal d → $\Sigma$ → error e → Controller → plant input u → Plant → output y, with y fed back to $\Sigma$]
Error-correction learning requires the Jacobian matrix $J = \left[ \frac{\partial y_k}{\partial u_j} \right]$
indirect learning
using input-output measurements on the plant, construct an NN model
the NN model is then used to estimate the Jacobian (see the sketch after this list)
direct learning
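A sketch of the indirect-learning idea: once a model of the plant is available, its Jacobian can be estimated numerically; `plant_model` below is a hypothetical stand-in for the trained NN model:

```python
import numpy as np

# Estimate the Jacobian J_kj = dy_k/du_j of a fitted plant model by
# finite differences (the model itself is a toy stand-in).
def plant_model(u):
    return np.array([u[0] ** 2 + u[1], np.sin(u[1])])

def estimate_jacobian(model, u, h=1e-5):
    y0 = model(u)
    J = np.zeros((y0.size, u.size))
    for j in range(u.size):
        du = np.zeros_like(u)
        du[j] = h
        J[:, j] = (model(u + du) - y0) / h   # dy_k / du_j
    return J

print(estimate_jacobian(plant_model, np.array([1.0, 0.5])))
```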
Filtering
Extract information from a set of noisy data
information processing tasks
filtering
using data measured up to and including time n
smoothing
using data measured both up to and after time n
prediction
predict at $n + n_0$ ($n_0 > 0$) using data measured up to time n
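A sketch contrasting the three tasks on a noisy series using plain averaging; the window sizes and signal are illustrative:

```python
import numpy as np

# Filtering, smoothing, and prediction as different data windows.
rng = np.random.default_rng(6)
s = np.cumsum(rng.normal(size=200)) * 0.1      # underlying signal
x = s + rng.normal(scale=0.3, size=s.size)     # noisy measurements
n = 100
filtered = x[n - 4 : n + 1].mean()             # data up to and including n
smoothed = x[n - 4 : n + 5].mean()             # data before and after n
predicted = filtered                           # naive estimate for n + n0
print(filtered, smoothed, predicted, s[n])
```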
Blind source separation
[Block diagram: unknown source signals $u_1(n), \ldots, u_m(n)$ pass through an unknown mixer A (the unknown environment) to give observed signals $x_1(n), \ldots, x_m(n)$; a demixer W produces outputs $y_1(n), \ldots, y_m(n)$ that recover the sources]
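A toy sketch of the signal-flow model $x(n) = A\,u(n)$, $y(n) = W\,x(n)$; here W is set to the exact inverse of A purely to illustrate the setup, whereas a real demixer must be learned from $x(n)$ alone (e.g. by an ICA algorithm):

```python
import numpy as np

# Blind source separation setup: unknown sources, unknown mixer.
rng = np.random.default_rng(7)
u = rng.laplace(size=(3, 1000))      # unknown source signals u_i(n)
A = rng.normal(size=(3, 3))          # unknown mixing matrix
x = A @ u                            # observed signals x_i(n)
W = np.linalg.inv(A)                 # ideal demixer (illustration only)
y = W @ x                            # recovered sources y_i(n)
print(np.allclose(y, u))             # True
```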
Memory
Relatively enduring neural alterations induced by interaction
with the environment - a neurobiological definition
accessible to influence future behavior
an activity pattern is stored by a learning process
memory and learning are connected
Short-term memory
compilation of knowledge representing current state of environment
Long-term memory
knowledge stored for a long time or permanently
Associative memory
Memory is distributed
stimulus pattern and response pattern consist of data vectors
information is stored as spatial patterns of neural activities
information in the stimulus pattern determines not only the storage
location but also the address for its retrieval
although individual neurons are not reliable, the memory is
there may be interaction between the stored patterns (it is not a
simple storage of patterns), so there is a possibility of error in the
recall process
Correlation Matrix Memory
Association of key vector $x_k$ with memorized vector $y_k$:
$y_k = W(k)\, x_k,\ k = 1, 2, \ldots, q$
Total experience:
$M = \sum_{k=1}^{q} W(k)$, or recursively $M_k = M_{k-1} + W(k),\ k = 1, 2, \ldots, q$
Correlation Matrix Memory
Learning : estimate of the memory matrix
$\hat{M} = \sum_{k=1}^{q} y_k x_k^T$
called the Outer Product Rule
a generalization of Hebb's postulate of learning
In matrix form:
$\hat{M} = [\, y_1, y_2, \ldots, y_q \,] \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_q^T \end{bmatrix} = Y X^T$
Recursively: $\hat{M}_k = \hat{M}_{k-1} + y_k x_k^T,\ k = 1, 2, \ldots, q$
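A sketch of the outer product rule and recall, assuming an orthonormal key set (the standard basis here) so that recall is exact:

```python
import numpy as np

# Correlation matrix memory: store (x_k, y_k) pairs by outer products.
keys = np.eye(3)                                       # x_k, orthonormal
mems = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # y_k
M = np.zeros((2, 3))
for x_k, y_k in zip(keys, mems):
    M += np.outer(y_k, x_k)          # M_k = M_{k-1} + y_k x_k^T
print(M @ keys[1])                   # recalls y_2 = [0, 1] exactly
```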
Recall
$y = \hat{M} x_j = \sum_{k=1}^{q} y_k x_k^T x_j = \sum_{k=1}^{q} (x_k^T x_j)\, y_k$
Separating the term $k = j$:
$y = (x_j^T x_j)\, y_j + \sum_{k=1,\, k \neq j}^{q} (x_k^T x_j)\, y_k$
If each key vector is normalized to have unit energy, $x_j^T x_j = 1$, so
$y = y_j + v_j$
where the noise vector is defined as
$v_j = \sum_{k=1,\, k \neq j}^{q} (x_k^T x_j)\, y_k$
Since $\cos(x_k, x_j) = \frac{x_k^T x_j}{\| x_k \| \, \| x_j \|}$, with unit-energy keys $x_k^T x_j = \cos(x_k, x_j)$
Orthogonal set
$v_j = \sum_{k=1,\, k \neq j}^{q} \cos(x_k, x_j)\, y_k$
For an orthonormal set of key vectors,
$x_k^T x_j = \begin{cases} 1, & k = j \\ 0, & k \neq j \end{cases}$
so $v_j = 0$ and recall is perfect
Storage Capacity
limited by the rank of the memory matrix (the number of independent
columns), which is in turn limited by the dimensionality of the input
Error of Associative Memory
Real-life key patterns are neither orthogonal nor highly separated
Lower bound γ on the inner products: $x_k^T x_j \geq \gamma$ for $k \neq j$
if γ is large, the memory may fail to distinguish the stored patterns
the memory then acts as if presented with patterns it has never seen
before - termed animal logic
Adaptation
Space and time : fundamental dimensions
the spatiotemporal nature of learning
Stationary system
learned parameters are frozen to be used later
Nonstationary system
continually adapt its parameters to variations in the incoming signal
in a real-time fashion
continuous learning; learning-on-the-fly
Pseudostationary
consider stationary over a short time window
speech signal : 10~30 msec
weather forecasting : a period of minutes
Statistical Nature of Learning Process
X is a random vector consisting of a set of independent variables
Statistical Nature of Learning Process
Training sample denoted by $\mathfrak{T} = \{(x_i, d_i)\}_{i=1}^{N}$
functional relationship between X and D : $D = f(X) + \varepsilon$
where $f(\cdot)$ is a deterministic function of the random variable X
$\varepsilon$ : expectational error, a random variable; the model is called a regressive model
1. the mean value of the error is zero: $E[\varepsilon \mid x] = 0$, so $f(x) = E[D \mid X = x]$
2. the error is uncorrelated with the regression function (principle of orthogonality): $E[\varepsilon f(X)] = 0$
The symbol $E_{\mathfrak{T}}$ denotes the average operator taken over the entire training sample:
$\Xi(w) = \frac{1}{2}\, E_{\mathfrak{T}}\!\left[(d - F(x, \mathfrak{T}))^2\right]$
Minimizing Cost Function
$\Xi(w) = \frac{1}{2} \sum_{i=1}^{N} (d_i - F(x_i, w))^2$
With $E_{\mathfrak{T}}$ as the averaging operator:
$\Xi(w) = \frac{1}{2}\, E_{\mathfrak{T}}\!\left[(d - F(x, \mathfrak{T}))^2\right]$
$\Xi(w) = \frac{1}{2}\, E_{\mathfrak{T}}[\varepsilon^2] + \frac{1}{2}\, E_{\mathfrak{T}}\!\left[(f(x) - F(x, \mathfrak{T}))^2\right]$
The first term, the variance of the expectational error ε, does not depend on w; only the second term, the squared distance between the regression function $f(x)$ and the approximating function $F(x, \mathfrak{T})$, can be reduced by learning.
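A sketch of why the decomposition holds, assuming the regressive-model properties stated earlier; the cross term vanishes by the principle of orthogonality:

```latex
% Expanding d = f(x) + eps inside the cost functional.
\begin{align*}
\Xi(w) &= \tfrac{1}{2}\,E_{\mathfrak{T}}\!\left[(f(x) + \varepsilon - F(x,\mathfrak{T}))^{2}\right] \\
       &= \tfrac{1}{2}\,E_{\mathfrak{T}}[\varepsilon^{2}]
        + E_{\mathfrak{T}}\!\left[\varepsilon\,(f(x) - F(x,\mathfrak{T}))\right]
        + \tfrac{1}{2}\,E_{\mathfrak{T}}\!\left[(f(x) - F(x,\mathfrak{T}))^{2}\right] \\
       &= \tfrac{1}{2}\,E_{\mathfrak{T}}[\varepsilon^{2}]
        + \tfrac{1}{2}\,E_{\mathfrak{T}}\!\left[(f(x) - F(x,\mathfrak{T}))^{2}\right]
\end{align*}
% The cross term vanishes because E[eps | x] = 0 and the principle of
% orthogonality E[eps f(x)] = 0 hold in the regressive model above.
```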