
www.StudentRockStars.com

Learning Processes

By T. Sunil Chowdary
Learning
 Learning is a process by which the free parameters of a NN are
adapted through stimulation from the environment
 Sequence of Events
 stimulated by an environment

 undergoes changes in its free parameters
 responds in a new way to the environment

 Learning Algorithm
 prescribed steps of a process to make a system learn
 ways to adjust the synaptic weights of a neuron
 No unique learning algorithm - a kit of tools

 The Chapter covers


 five learning rules, learning paradigms, issues of learning tasks
 probabilistic and statistical aspects of learning
Error Correction Learning (I)
 Error signal, ek(n)
ek(n) = dk(n) - yk(n)
where n denotes time step
 The error signal activates a control mechanism for corrective
adjustment of the synaptic weights
 Minimizing a cost function, E(n), or index of performance
E(n) = ½ ek²(n)
 Also called instantaneous value of error energy
 step-by-step adjustment until
 system reaches steady state; synaptic weights are stabilized
 Also called the delta rule, or Widrow-Hoff rule
Error Correction Learning (II)
∆wkj(n) = ηek(n)xj(n)
 η : rate of learning; learning-rate parameter
wkj(n+1) = wkj(n) + ∆wkj(n)
wkj(n) = z⁻¹[wkj(n+1)]
 z⁻¹ is the unit-delay operator

 the adjustment is proportional to the product of the error signal
and the input signal
 error-correction learning is local
 The learning rate η determines stability and convergence
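The update rule above can be sketched as a short Python loop (illustrative only; the two-input linear neuron and the target d = 2x1 − x2 are hypothetical):

```python
import numpy as np

def delta_rule_step(w, x, d, eta):
    """One error-correction step: e(n) = d(n) - y(n), Δw_kj(n) = η e_k(n) x_j(n)."""
    y = w @ x              # linear neuron output y(n)
    e = d - y              # error signal e(n)
    return w + eta * e * x, e

# train on samples from a hypothetical environment with d = 2 x1 - x2
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(500):
    x = rng.uniform(-1.0, 1.0, 2)
    d = 2.0 * x[0] - x[1]
    w, e = delta_rule_step(w, x, d, eta=0.5)
# w converges toward the environment's weights (2, -1)
```

A small η gives slow but stable convergence; too large an η destabilizes the loop.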
Memory-based Learning
 Past experiences are stored as a memory of correctly
classified input-output examples
 retrieve and analyze “local neighborhood”
 Essential Ingredients
 Criterion used for defining the local neighborhood
 Learning rule applied to the training examples

 Nearest Neighbor Rule (NNR)
 the vector X'N ∈ { X1, X2, …, XN } is the nearest neighbor of Xtest if
mini d(Xi, Xtest) = d(X'N, Xtest)
 the class of X'N is assigned to Xtest

Nearest Neighbor Rule
 Cover and Hart
 Examples are independent and identically distributed
 The sample size N is infinitely large
then error(NNR) < 2 × error(Bayes rule)
 Half of the classification information is contained in the nearest neighbor

K-nearest Neighbor Rule
 a variant of the NNR
 select the k nearest neighbors of Xtest and use a majority vote
 acts like an averaging device
 discriminates against outliers

 Radial-basis function network is a memory-based classifier
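The k-nearest-neighbor rule can be sketched in Python (illustrative; the two toy clusters are hypothetical data):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, labels, x_test, k=3):
    """k-nearest-neighbor rule: majority vote among the k stored
    examples closest to x_test in Euclidean distance."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]        # majority vote

# hypothetical training set: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1],
              [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y = np.array([0, 0, 0, 1, 1, 1])
```

The majority vote averages over the local neighborhood, which is what makes the rule robust against a single outlying example.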

Hebbian learning process
 If two neurons on either side of a synapse connection are activated simultaneously,
then the strength of that synapse is increased.
 If two neurons on either side of a synapse are activated asynchronously,
then the strength of that synapse is weakened or eliminated.

Hebbian Learning
 If two neurons of a connection are activated
 simultaneously(synchronously), then its strength is increased
 asynchronously, then the strength is weakened or eliminated

 Hebbian synapse
 time-dependent
 depends on the exact time of occurrence of the two signals
 local
 locally available information is used
 interactive mechanism
 learning is done by the interaction of two signals
 conjunctional or correlational mechanism
 co-occurrence of two signals (presynaptic & postsynaptic)
 Hebbian learning is found in the hippocampus
Math Model of Hebbian Modification
∆wkj(n) = F(yk(n), xj(n))
 Hebb’s hypothesis
∆wkj(n) = ηyk(n)xj(n)
where η is the rate of learning
 also called the activity product rule
 repeated application of xj leads to exponential growth of wkj

 Covariance hypothesis
∆wkj(n) = η(xj − x̄)(yk − ȳ)
1. wkj is enhanced if xj > x̄ and yk > ȳ
2. wkj is depressed if (xj > x̄ and yk < ȳ) or (xj < x̄ and yk > ȳ)
where x̄ and ȳ are time averages
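Both hypotheses reduce to one-line updates, sketched here in Python (illustrative only; the single-output neuron and the numeric values are hypothetical):

```python
import numpy as np

def hebb_update(w, x, y, eta):
    """Activity product rule: Δw_kj = η y_k x_j."""
    return w + eta * np.outer(y, x)

def covariance_update(w, x, y, x_bar, y_bar, eta):
    """Covariance hypothesis: Δw_kj = η (y_k - ȳ_k)(x_j - x̄_j)."""
    return w + eta * np.outer(y - y_bar, x - x_bar)

# hypothetical single-output neuron with two inputs
w = np.zeros((1, 2))
w = hebb_update(w, x=np.array([1.0, 0.5]), y=np.array([2.0]), eta=0.1)
```

Unlike the activity product rule, the covariance rule can also depress a weight when one signal is below its time average, which prevents unbounded growth.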


Competitive Learning
 Output neurons of the NN compete to become active
 Only a single neuron is active at any one time
 a salient feature for pattern classification
 Basic Elements
 A set of neurons that are all the same except for their synaptic weight distribution
 responds differently to a given set of input patterns
 A limit on the strength of each neuron
 A mechanism to compete to respond to a given input
 winner-takes-all
 Neurons learn to respond to specialized conditions
 become feature detectors

Competitive NN

Figure: a layer of source input nodes feeds a single layer of output neurons; the feedforward connections are excitatory, while the lateral connections are inhibitory (lateral inhibition).
www.StudentRockStars.com
Competitive Learning
Output signal
yk = 1 if vk > vj for all j, j ≠ k; 0 otherwise
 wkj denotes the synaptic weight, with ∑j wkj = 1 for all k

 Competitive Learning Rule
∆wkj = η(xj − wkj) if neuron k wins the competition
∆wkj = 0 if neuron k loses the competition
 If a neuron does not respond to a particular input, no learning
takes place
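The winner-takes-all rule can be sketched in Python (illustrative; the two-neuron network and the two input patterns are hypothetical). With the inputs and weight rows each summing to 1, the update preserves the constraint ∑j wkj = 1:

```python
import numpy as np

def competitive_step(W, x, eta):
    """Winner-takes-all: the neuron with the largest induced local field
    v_k = Σ_j w_kj x_j moves its weight vector toward the input;
    losing neurons do not learn."""
    k = int(np.argmax(W @ x))
    W[k] += eta * (x - W[k])      # Δw_kj = η (x_j - w_kj) for the winner only
    return k

# hypothetical two-neuron network clustering two input patterns
W = np.array([[0.6, 0.4], [0.4, 0.6]])
for _ in range(100):
    competitive_step(W, np.array([1.0, 0.0]), eta=0.2)
    competitive_step(W, np.array([0.0, 1.0]), eta=0.2)
# each weight vector converges to one cluster center: neurons become feature detectors
```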
Competitive Learning
 x has some constant Euclidean length and ∑j w²kj = 1 for all k
 performs clustering through competitive learning

Boltzmann Learning
 Rooted in statistical mechanics
 Boltzmann Machine: a NN based on Boltzmann learning
 The neurons constitute a recurrent structure
 operate in a binary manner: “on”: +1 and “off”: -1

 Visible neurons and hidden neurons
 energy function
E = −½ ∑j ∑k, j≠k wkj xk xj
 j ≠ k means no self-feedback

 The goal of Boltzmann learning is to maximize the likelihood function
 a set of synaptic weights is called a model of the environment if it
leads to the same probability distribution of the states of the visible units
Boltzmann Machine Operation
 choose a neuron k at random, then flip the state of the
neuron from state xk to state −xk with probability
P(xk → −xk) = 1 / (1 + exp(−∆Ek / T))
where ∆Ek is the energy change resulting from such a flip and T is the pseudo-temperature
 If this rule is applied repeatedly, the machine reaches thermal
equilibrium.
 Two modes of operation
 Clamped condition : visible neurons are clamped onto specific states
 Free-running condition: all neurons are allowed to operate freely
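The flip rule can be sketched in Python (an illustrative sketch, not from the slides; the 2-neuron weight matrix is hypothetical, and symmetric weights with zero self-feedback are assumed):

```python
import numpy as np

def flip_probability(x, W, k, T):
    """P(x_k -> -x_k) = 1 / (1 + exp(-ΔE_k / T)).
    Here ΔE_k is taken as the energy *decrease* caused by flipping x_k,
    derived from E = -1/2 Σ w_kj x_k x_j with symmetric W and w_kk = 0."""
    delta_E = -2.0 * x[k] * (W[k] @ x)
    return 1.0 / (1.0 + np.exp(-delta_E / T))

# hypothetical 2-neuron machine with symmetric weight w_01 = w_10 = 1
W = np.array([[0.0, 1.0], [1.0, 0.0]])
```

Flips that lower the energy are accepted with probability above 1/2; as T grows, every flip probability approaches 1/2 and the dynamics become a random walk.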

Boltzmann Learning Rule
 Let ρ+kj denote the correlation between the states of
neurons j and k in the clamped condition
ρ+kj = ∑xα∈ℑ ∑xβ P(Xβ = xβ | Xα = xα) xk xj
 Let ρ−kj denote the correlation between the states of
neurons j and k in the free-running condition
ρ−kj = ∑x P(X = x) xk xj
 Boltzmann Learning Rule (Hinton and Sejnowski, 1986)
∆wkj = η(ρ+kj − ρ−kj), j ≠ k
where η is a learning-rate parameter
Credit-Assignment Problem
 The problem of assigning credit or blame for overall outcomes
to each of the internal decisions
 Two sub-problems
 assignment of credit to actions

 temporal credit-assignment problem: time instances of actions
 many actions are taken; which action is responsible for the outcome
 assignment of credit to internal decisions
 structural credit-assignment problem: internal structure of actions
 multiple components contribute; which one does best

 The CAP occurs in error-correction learning
 in multiple-layer feed-forward NNs
 How do we assign credit or blame for the actions of hidden neurons?

Learning with a Teacher
 Supervised learning
 Teacher has knowledge of environment to learn
 input and desired output pairs are given as a training set
 Parameters are adjusted based on an error signal step-by-step
 System performance measure
 mean-square error
 sum of squared errors over the training sample
 visualized as an error surface with the parameters as coordinates
 Move toward a minimum point of the error surface
 may not be a global minimum
 use the gradient of the error surface - the direction of steepest descent

 Good for pattern recognition and function approximation


Learning without a Teacher
 Reinforcement learning
 No teacher to provide a direct (desired) response at each step
 example: good/bad, win/lose
 must solve the temporal credit-assignment problem
 since the critic's feedback may be given only at the time of the final output

Figure: the environment sends primary reinforcement to the critic, which converts it into heuristic reinforcement for the learning system.
Unsupervised Learning
 Self-organized learning
 No external teacher or critic
 Task-independent measure of quality is required to learn
 Network parameters are optimized with respect to the measure

 competitive learning rule is a case of unsupervised learning

Learning Tasks
 Pattern Association
 Pattern Recognition
 Function Approximation
 Control
 Filtering
 Beamforming

Pattern Association
 Associative memory is a distributed memory that learns by
association
 a predominant feature of human memory
 xk → yk, k = 1, 2, …, q


 Storage capacity, q
 Storage phase and Recall phase
 The NN is required to store a set of patterns by repeated presentation
 a partial description or distorted pattern is used to retrieve the
particular pattern
 xk acts as a stimulus that determines the location of the memorized pattern
 Autoassociation: when xk = yk
 Heteroassociation: when xk ≠ yk
 Make q as large as possible while still recalling correctly
Pattern Recognition
 Process whereby a received pattern is assigned to one of
prescribed classes (categories)
 Pattern recognition by a NN is statistical in nature
 a pattern is a point in a multidimensional decision space
 decision boundaries - regions associated with each class
 decision boundaries are determined by training

Figure: an input pattern (m-D) passes through unsupervised feature extraction to a feature vector (q-D), then through supervised classification to one of r output classes.
Function Approximation
Given a set of labeled examples ℑ = {(xi, di)}, i = 1, …, N,
drawn from d = f(x),
find F(x) which approximates the unknown function f(x)
s.t. ||F(x) − f(x)|| < ε for all x
 Supervised learning is good for this task
 System identification
 construct an inverse system
Figure: the input drives both the unknown system (output di) and the NN model (output yi); the error ei = di − yi adjusts the model.
Control
 Feedback control system
Figure: the error e = d − y between the reference signal d and the plant output y drives the controller, whose output u drives the plant.

 search for the Jacobian matrix J = [∂yk/∂uj]
 for error-correction learning
 indirect learning
 using input-output measurements on the plant, construct a NN
model
 the NN model is used to estimate the Jacobian
 direct learning

Filtering
 Extract information from a set of noisy data
 information processing tasks
 filtering
 using data measured up to and including time n

 smoothing
 using data measured up to and after time n
 prediction
 predict at n+n0 ( n0 > 0) using data measured up to time n

 Cocktail party problem
 focus on one speaker in a noisy environment
 it is believed that a preattentive, preconscious analysis is involved

Blind source separation

Figure: in an unknown environment, source signals u1(n)…um(n) pass through an unknown mixer A to produce the observations x1(n)…xm(n); a demixer W recovers the outputs y1(n)…ym(n).
Memory
 Relatively enduring neural alterations induced by interaction
with the environment - a neurobiological definition
 accessible to influence future behavior
 activity patterns are stored by the learning process
 memory and learning are connected
 Short-term memory
 compilation of knowledge representing the current state of the environment
 Long-term memory
 knowledge stored for a long time or permanently

Associative memory
 Memory is distributed
 stimulus pattern and response pattern consist of data vectors
 information is stored as spatial patterns of neural activities
 the stimulus pattern determines not only the storage
location but also the address for its retrieval
 although neurons are not reliable, memory is
 there may be interaction between the stored patterns (not a
simple storage of patterns), so there is a possibility of error in the
recall process

Correlation Matrix Memory
 Association of key vector xk with memorized vector yk
yk = W(k)xk, k = 1, 2, 3, …, q
 Total experience
M = ∑ W(k), summed over k = 1, …, q
 recursively: Mk = Mk−1 + W(k), k = 1, 2, …, q

 Adding W(k) to Mk−1 loses the distinct identity of the mixture of
contributions that form Mk
 but information about the stimuli is not lost

Correlation Matrix memory
 Learning: estimate of the memory matrix
M̂ = ∑ yk xkT, summed over k = 1, …, q
 called the Outer Product Rule
 a generalization of Hebb's postulate of learning
 in matrix form: M̂ = [y1, y2, …, yq] [x1, x2, …, xq]T = YXT
 recursively: M̂k = M̂k−1 + yk xkT, k = 1, 2, …, q
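The outer product rule and recall can be sketched in Python (illustrative; the three orthonormal keys and memorized vectors are hypothetical):

```python
import numpy as np

def build_memory(X, Y):
    """Outer product rule: M = Σ_k y_k x_k^T = Y X^T, with the key vectors x_k
    and memorized vectors y_k stored as the columns of X and Y."""
    return Y @ X.T

def recall(M, x):
    """Recall y = M x; exact when the keys form an orthonormal set."""
    return M @ x

# hypothetical example: orthonormal keys (the standard basis vectors)
X = np.eye(3)
Y = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0]])
M = build_memory(X, Y)
```

Because the keys here are orthonormal, each recall returns the memorized vector with zero noise; with non-orthogonal keys a crosstalk term would be added.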

Recall
y = M̂ xj, where xj is the key applied for recall
y = ∑ yk xkT xj = ∑ (xkT xj) yk, summed over k = 1, …, q
= (xjT xj) yj + ∑k≠j (xkT xj) yk
 if each key vector is normalized to unit length, xjT xj = 1, so
y = yj + vj
 vj is the noise vector: vj = ∑k=1, k≠j (xkT xj) yk
 with unit-length keys, cos(xk, xj) = xkT xj / (||xk|| ||xj||) = xkT xj
www.StudentRockStars.com
Orthogonal set
vj = ∑k=1, k≠j cos(xk, xj) yk
 for an orthonormal set, cos(xk, xj) = 0 for k ≠ j
xkT xj = 1 if k = j; 0 if k ≠ j
 so the noise vector vanishes and recall is perfect
 Storage Capacity
 rank: the number of independent columns
 limited by the dimensionality of the input space

Error of Associative Memory
 Real-life keys are neither orthogonal nor highly separated
 Lower bound γ
xkT xj ≥ γ for k ≠ j
 if γ is large, the memory may fail to distinguish patterns
 the memory acts as if presented with patterns it has never seen before
 termed animal logic

Adaptation
 Space and time : fundamental dimension
 spatiotemporal nature of learning
 Stationary system
 learned parameters are frozen to be used later
 Nonstationary system
 continually adapts its parameters to variations in the incoming signal
in a real-time fashion
 continuous learning; learning-on-the-fly

 Pseudostationary
 considered stationary over a short time window
 speech signal: 10~30 msec
 weather forecasting: a period of minutes
Statistical Nature of Learning Process

 X is a random vector consisting of a set of independent variables
 D is a random scalar representing the dependent variable
Statistical Nature of Learning Process

 Training sample: ℑ = {(xi, di)}, i = 1, …, n
 Functional relationship between X and D: D = f(X) + ε
 called the regressive model
 f(X): a deterministic function of the random variable X
 ε: the expectational error, a random variable
 The mean value of the error is zero: E[ε | x] = 0
 hence f(x) = E[D | X = x]
 The error is uncorrelated with the regression function (principle of orthogonality):
E[ε f(X)] = 0
 The cost function is equal to the average of the 1/2 squared difference
between the desired response d and the actual response of the neural network:
Ξ(w) = ½ ∑ (di − F(xi, w))², summed over i = 1, …, N
 With Eℑ denoting the average operator taken over the entire training sample:
Ξ(w) = ½ Eℑ[(d − F(x, ℑ))²]
Minimizing Cost Function

Ξ(w) = ½ ∑ (di − F(xi, w))², summed over i = 1, …, N
Ξ(w) = ½ Eℑ[(d − F(x, ℑ))²], where Eℑ is the averaging operator
Ξ(w) = ½ Eℑ[ε²] + ½ Eℑ[(f(x) − F(x, ℑ))²]
 A natural measure of effectiveness:
Lav(f(x), F(x, w)) = Eℑ[(f(x) − F(x, ℑ))²]
= Eℑ[(E[D | X = x] − F(x, ℑ))²]
 since f(x) = E[D | X = x]
Bias/Variance Dilemma
E[D | X = x] − F(x, ℑ) = (E[D | X = x] − Eℑ[F(x, ℑ)]) + (Eℑ[F(x, ℑ)] − F(x, ℑ))
Bias: B(w) = Eℑ[F(x, ℑ)] − E[D | X = x]
Variance: V(w) = Eℑ[(F(x, ℑ) − Eℑ[F(x, ℑ)])²]
 Bias
 represents the inability of the NN to approximate the regression function
 Variance
 reflects the inadequacy of the information contained in the training set ℑ
 We can reduce both bias and variance only with large
training samples
 purposely introduce “harmless” bias to reduce variance
 designed bias - a constrained network incorporates prior knowledge