RECURRENT NETWORKS

Introduction
Suppose we want to predict tomorrow's stock market average y(t+1) based on today's economic indicators x_i(t)
Now suppose we want the prediction to be based on both today's and yesterday's economic indicators, x_i(t) & x_i(t-1)
In this case we can simply increase the number of inputs
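A minimal sketch of what increasing the number of inputs looks like for a fixed window. The helper name `lagged_inputs` and the numbers are made up for illustration; each row simply concatenates the indicators for t, t-1, ..., t-lags.

```python
def lagged_inputs(series, lags):
    """Build one input row per time step t >= lags.

    series : list of indicator vectors, series[t] = [x_1(t), x_2(t), ...]
    lags   : how many past days to append (1 means today and yesterday)
    """
    rows = []
    for t in range(lags, len(series)):
        row = []
        for k in range(lags + 1):
            row.extend(series[t - k])  # x(t), then x(t-1), ...
        rows.append(row)
    return rows

# Two indicators observed over four days (illustrative values only).
indicators = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
rows = lagged_inputs(indicators, lags=1)
```

The input size grows with the window, which is why a fixed window does not scale to arbitrary history lengths.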

RECURRENT NETWORKS
Introduction
However, if we wish the network to consider an arbitrary-sized window of time in the past (e.g. sometimes two days,
sometimes 20 days), then a different solution is required
Recurrent networks provide one such solution
They are used to learn sequential or time-varying patterns,
in which the current value of the pattern is dependent upon
its past history

RECURRENT NETWORKS
Architecture
A specific group of units receives feedback signals from the previous time step
These units are known as context units
The weights on the feedback connections to the context units are fixed (e.g. 1)

RECURRENT NETWORKS
Architecture
Each context unit has feedforward connections to its own hidden unit & to all other hidden units
Number of context units = number of hidden units
Each hidden unit has a feedback connection to exactly one context unit

RECURRENT NETWORKS
Training of Weights
At time t, the activations of the context units are the
activations (output signals) of the hidden units at the
previous time step
The weights from the context units to the hidden units are
trained in exactly the same manner as the weights from the
input units to the hidden units
Thus, at any time step, the training algorithm is the same as
for standard backpropagation
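The forward pass described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the slides: sigmoid activations, no bias terms, and hypothetical weight names (`W_in`, `W_ctx`, `W_out`). The context is just a copy of the previous hidden activations, so the context weights sit alongside the input weights and can be trained the same way.

```python
import math

def elman_step(x, context, W_in, W_ctx, W_out):
    """One forward step of a simple recurrent network.

    x       : input activations at time t
    context : hidden activations from time t-1 (copied with fixed weight 1)
    W_in    : W_in[j][i], weight from input unit i to hidden unit j
    W_ctx   : W_ctx[j][k], weight from context unit k to hidden unit j
    W_out   : W_out[o][j], weight from hidden unit j to output unit o
    """
    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(W_in[j], x))
                + sum(w * c for w, c in zip(W_ctx[j], context)))
        for j in range(len(W_in))
    ]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W_out]
    return output, hidden  # 'hidden' becomes the context at time t+1

# Illustrative weights only; a trained net would learn these.
W_in = [[0.5, -0.5], [0.3, 0.8]]
W_ctx = [[0.1, 0.0], [0.0, 0.1]]
W_out = [[1.0, -1.0]]
context = [0.0, 0.0]
for x in [[1, 0], [0, 1], [1, 1]]:
    y, context = elman_step(x, context, W_in, W_ctx, W_out)
```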

RECURRENT NETWORKS
Utilization
After training, a recurrent net can be presented with an arbitrary-sized sequence and it will predict the next element of the sequence
It can also be trained to classify or label a sequence: we give an arbitrary-sized portion of the sequence as input and the network tries to tell us the name of the sequence

RECURRENT NETWORKS
Example: Next valid letter in a string of characters
Suppose that a string of characters is generated by a small finite-state grammar. The string begins with the symbol B and ends with the symbol E
At each decision point, either path can be taken with equal probability
Two examples of the possible strings are BPVVE & BTXXTTVPSE
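A sketch of how such strings could be generated. The grammar diagram is not reproduced in the text, so the transition table below is an assumption: it is the Reber grammar commonly used for this task, chosen because it accepts the example strings quoted in these slides. The names `REBER`, `generate_string` and `is_valid` are illustrative.

```python
import random

# Assumed transition table: state -> list of (symbol, next_state).
# State 5 is the final state, after which E is emitted.
REBER = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
}

def generate_string(rng=random):
    """Walk the grammar, taking either path with equal probability."""
    state, out = 0, ["B"]
    while state != 5:
        symbol, state = rng.choice(REBER[state])
        out.append(symbol)
    out.append("E")
    return "".join(out)

def is_valid(s):
    """Check a string against the assumed grammar."""
    if len(s) < 2 or s[0] != "B" or s[-1] != "E":
        return False
    state = 0
    for ch in s[1:-1]:
        successors = dict(REBER.get(state, []))
        if ch not in successors:
            return False
        state = successors[ch]
    return state == 5
```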

RECURRENT NETWORKS
Example: Architecture used

RECURRENT NETWORKS
Example: Training
The training patterns for the neural net consisted of 60,000
randomly generated strings
The string length ranged from 3 to 30 letters (plus the Begin
and End symbols at the beginning and end of strings)
The letters in the string are presented sequentially to the net,
one by one, respecting the order in which they appear
Each letter in the string becomes an input in turn, and its
successor is considered as the target output
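As a small illustration of this input/successor pairing, the sketch below turns one string into training pairs. The one-hot encoding and the symbol order in `ALPHABET` are assumptions made for the example; the slides do not specify the encoding.

```python
ALPHABET = "BTSXVPE"  # assumed symbol order, one unit per symbol

def one_hot(symbol):
    """Encode a symbol as a vector with a single 1."""
    return [1 if symbol == a else 0 for a in ALPHABET]

def training_pairs(string):
    """Pair each letter (input) with its successor (target)."""
    return [(one_hot(a), one_hot(b)) for a, b in zip(string, string[1:])]

pairs = training_pairs("BTXSE")  # (B,T), (T,X), (X,S), (S,E)
```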

RECURRENT NETWORKS
Example: Training
The training algorithm for the context sensitive grammar is:

RECURRENT NETWORKS
Example: Training
String
BTXSE

RECURRENT NETWORKS
Example: Testing & Utilization
After training, we hope that the net has learnt the grammar
Given a valid sequence of letters, it can be used to predict the
next valid letter in the sequence
An interesting application is determining whether a
given string is a valid string
As each symbol is presented, the net predicts the possible
valid successors of that symbol (output units with activations
of 0.3 or more are counted as valid successors)

RECURRENT NETWORKS
Example: Testing & Utilization
If the next letter in the string is not one of the predicted valid
successors, the string is rejected. If all the letters pass the
test, then the string is accepted as valid
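The acceptance test can be sketched as below. The 0.3 threshold comes from the text; everything else is hypothetical. In particular `predict` stands in for the trained net, and `toy_predict` is a hand-written stub for a trivial grammar (B, then T or P, then E), not a trained network.

```python
THRESHOLD = 0.3  # activation cut-off quoted in the text

def accepts(string, predict, threshold=THRESHOLD):
    """Accept the string only if every letter is a predicted valid successor.

    predict(prefix) is a stand-in for the trained net: it receives the symbols
    seen so far and returns {symbol: activation} for the next symbol.
    """
    for i in range(1, len(string)):
        activations = predict(string[:i])
        valid = {sym for sym, act in activations.items() if act >= threshold}
        if string[i] not in valid:
            return False  # rejected at the first unexpected letter
    return True

def toy_predict(prefix):
    """Hand-written stub: after B expect T or P, after anything else expect E."""
    if prefix.endswith("B"):
        return {"T": 0.5, "P": 0.5, "E": 0.0}
    return {"T": 0.1, "P": 0.1, "E": 0.8}
```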
The reported results for 70,000 random strings, 0.3% (210) of
which were valid, are that the net correctly rejected 99.7% of
the invalid strings and accepted 100% of the valid strings
The net also performed perfectly on a set of extremely long
strings (100 or more characters in length)

RECURRENT NETWORKS
Example: Testing & Utilization
Note that the grammar is such that we can establish context with the help of two samples
If the context is deeper than that, a more complex network may be required

RECURRENT NETWORKS
Example: Testing & Utilization
The example may be extended
For example, letters may be replaced by features composing the letters, or by words composing a sentence

RECURRENT NETWORKS
Conclusion
The architecture that we have studied is called a simple
recurrent network, as it contains a single hidden layer
It is also called an Elman network and is similar to an
architecture proposed by Jordan
It is implemented in MATLAB's Neural Network Toolbox

RECURRENT NETWORKS
References
Read Section 7.2.3, page 372 of Laurene Fausett

ART-1
Unsupervised method of clustering
Allows increase in number of clusters only if required
Developed by Carpenter & Grossberg in 1987
ART-1 is designed for clustering binary valued vectors and
ART-2 is for continuous valued inputs

ART-1
Architecture
The architecture of the computational units for ART1
consists of
F1 units (input and interface units),
F2 units (cluster units),
and a reset unit
The reset unit implements user control over the degree of
similarity of patterns placed on the same cluster

ART-1
Architecture
Each unit in the F1(a) input layer is connected only to the
corresponding unit in the F1(b) interface layer.
Each unit in the input and interface layers is connected to
the reset unit. The reset unit is connected to every F2
(cluster) unit.

ART-1
Architecture
Each unit of the interface layer is connected to each unit in
the cluster layer by two weighted pathways.
The interface unit X_i is connected to the cluster unit Y_j by
the bottom-up weight b_ij.
Similarly, unit Y_j is connected to unit X_i by the top-down
weight t_ji
The cluster layer is a competitive layer in which only the
uninhibited node with the largest net input has a non-zero
activation

ART-1
Training
Notations
n      number of components in the input vector
m      maximum number of clusters that can be formed
b_ij   bottom-up weights (from interface unit X_i to cluster unit Y_j)
t_ji   binary top-down weights (from cluster unit Y_j to interface unit X_i)
ρ      vigilance parameter
s      binary input vector (an n-tuple)
x      activation vector for interface layer (binary)
‖x‖    norm of vector x, defined as the sum of the components x_i

ART-1
Training: Competition-to-Learn Phase
A binary input vector s is presented to the input layer, and
the signals are sent to the corresponding X units. These
interface units then broadcast to the cluster layer over
connection pathways with bottom-up weights
Each cluster unit computes its net input and the units
compete for the right to be active
The unit with the largest net input sets its activation to 1; all
others have an activation of zero. Let the index of the
winning unit be J. This winning unit becomes the candidate
to learn the input pattern

ART-1
Training: Similarity Check
A signal is then sent down from cluster to interface layer
(multiplied by the top-down weights)
The interface X units remain on only if they receive non-zero signals from both the input and cluster units.
The activation vector x of the interface layer has the states of
the individual units as its components: a unit is on (1) only if
both of these signals are non-zero
The norm of the vector x, ‖x‖, gives the number of components in which the
top-down weight vector for the winning cluster unit, t_J, and
the input vector s are both 1. This quantity is sometimes
referred to as the match.

ART-1

Training: Weights Update

If the ratio of ‖x‖ to ‖s‖ is greater than or equal to the
vigilance parameter, the weights (top-down and bottom-up)
for the winning cluster unit are adjusted
The use of the ratio allows an ART1 net to respond to
relative differences. This reflects the fact that a difference of
one component in vectors with only a few non-zero
components is much more significant than a difference of one
component in vectors with many non-zero components

ART-1
Training: Search for another unit
if 1st candidate is rejected
However, if the ratio is less than the vigilance parameter, the
candidate unit is rejected, and another candidate unit must
be chosen
The winning cluster unit becomes inhibited, & it cannot be
chosen again as a candidate on this learning trial, & the
activations of the input & interface units are reset to zero
The same input vector again sends its signal to the interface
units, which again send this as the bottom-up signal to the
cluster layer, & the competition is repeated (but without the
participation of any inhibited units)

ART-1
Training: Search for another unit
if 1st candidate is rejected
The process continues until either a satisfactory match is
found (a candidate is accepted) or all units are inhibited
The action to be taken if all units are inhibited must be
specified by the user: reduce the value of the vigilance
parameter (thus allowing less similar patterns to be placed on
the same cluster), increase the number of cluster units,
or simply designate the current input pattern as an outlier
that could not be clustered

ART-1
Training: Search for another unit
if 1st candidate is rejected
At the end of each presentation of a pattern, all cluster units
are returned to inactive status and are available to
participate in the next competition

ART-1
Training

ART-1
Architecture

ART-1
Training: Example
The ART1 algorithm is used to group 4 vectors in at most 3
clusters
Vectors are: [1 1 0 0]
             [0 0 0 1]
             [1 0 0 0]
             [0 0 1 1]
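The training phases described earlier (competition, similarity check, weight update, search) can be sketched on these four vectors. The algorithm slides did not survive extraction, so several details are assumptions taken from the usual textbook formulation of ART1: initial bottom-up weights 1/(1+n), initial top-down weights 1, a learning constant L = 2, the update b_iJ = L·x_i / (L − 1 + ‖x‖), ties broken by the lowest index, and a vigilance value of 0.4. The function name `art1` is illustrative.

```python
def art1(vectors, m, rho, L=2.0, epochs=1):
    """Cluster binary vectors with a simplified ART1 procedure."""
    n = len(vectors[0])
    b = [[1.0 / (1 + n)] * m for _ in range(n)]  # b[i][j]: bottom-up weights
    t = [[1] * n for _ in range(m)]              # t[j][i]: top-down weights
    labels = [None] * len(vectors)
    for _ in range(epochs):
        for idx, s in enumerate(vectors):
            norm_s = sum(s)
            labels[idx] = None
            if norm_s == 0:
                continue                         # nothing to match against
            inhibited = set()
            while len(inhibited) < m:
                # Competition among the uninhibited cluster units.
                y = {j: sum(b[i][j] * s[i] for i in range(n))
                     for j in range(m) if j not in inhibited}
                J = max(y, key=lambda j: (y[j], -j))
                # Similarity check: x is on where both s and t_J are 1.
                x = [s[i] * t[J][i] for i in range(n)]
                norm_x = sum(x)
                if norm_x / norm_s >= rho:
                    for i in range(n):           # accept: update both weight sets
                        b[i][J] = L * x[i] / (L - 1 + norm_x)
                        t[J][i] = x[i]
                    labels[idx] = J
                    break
                inhibited.add(J)                 # reject: search for another unit
    return labels, t

vectors = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
labels, t = art1(vectors, m=3, rho=0.4)
```

With these assumed settings a single pass places vectors 1 and 3 on the first cluster and vectors 2 and 4 on the second, leaving the third cluster unused. A vector that no cluster accepts keeps the label `None`, which corresponds to treating it as an outlier.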

ART-1
Training: Example

ART-1
References
Section 5.2 of Laurene Fausett

RBF NETWORKS
Architecture
Radial Basis Function Networks consist of:
- A hidden layer of k Gaussian neurons
- A set of weights wi
For Gaussian basis functions, the activation is given as
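The formula on this slide did not survive extraction. A commonly used Gaussian form, assumed here, is phi_i(x) = exp(−‖x − c_i‖² / (2 σ_i²)), where c_i is the centre and σ_i the width of hidden neuron i. The sketch below computes the hidden activations and a single output as their weighted sum; the names `centres`, `widths` and `bias` are illustrative and not from the slides.

```python
import math

def rbf_forward(x, centres, widths, weights, bias=0.0):
    """Single-output RBF network with Gaussian hidden neurons.

    Returns (output, phi), where phi holds the hidden activations.
    """
    phi = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * s ** 2))
        for c, s in zip(centres, widths)
    ]
    output = bias + sum(w * p for w, p in zip(weights, phi))
    return output, phi

# Illustrative values only: two hidden neurons in a 2-D input space.
centres = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
weights = [2.0, -1.0]
y, phi = rbf_forward([0.0, 0.0], centres, widths, weights)
```

An input that coincides with a centre gives that neuron an activation of 1, and the activation decays with distance from the centre.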

RBF NETWORKS
Training
RBFNs are trained using a two-step process:
- Determine the k hidden neurons using unsupervised learning (Kohonen)
- Determine the weights using supervised learning (backpropagation)

RBF NETWORKS

There can be several outputs m, and each output is independent of the other outputs: the network can be
viewed as m independent networks

RBF NETWORKS

[Figure: the inputs feed a hidden layer of RBF neurons; the outputs are linear combinations of the hidden-layer activations]

RBF NETWORKS
References
Engelbrecht, Chapter 5

HOPFIELD NETWORK
Architecture

HOPFIELD NETWORK
Utilization

HOPFIELD NETWORK
Training by Hebb's Rule
Consider 2 nodes which both have activation 1, i.e. (1, 1), for the
pattern to be stored
If there is a positive weight between them, then each reinforces
the other (each one makes a positive
contribution to the activation of the other)
For (0, 1) or (1, 0) activations, a negative weight reinforces
the stored activations (the unit that is on pushes the other towards off)

HOPFIELD NETWORK
Training by Hebb's Rule

This is a shortcut for Hebb's rule

It gives a positive weight if both activations are (1, 1) or (0,
0), and a negative weight if they are (1, 0) or (0, 1)

HOPFIELD NETWORK
Training by Hebb's Rule: Bipolar activations

This is again a shortcut for Hebb's rule

The formula gives positive weights for (1, 1) or (-1, -1) activations and negative
weights for (-1, 1) or (1, -1) activations
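The formulas on these slides were not recovered. A standard form with the sign behaviour described, assumed here, is w_ij = Σ_p (2 s_i(p) − 1)(2 s_j(p) − 1) for binary patterns and w_ij = Σ_p s_i(p) s_j(p) for bipolar patterns, with w_ii = 0. A minimal sketch with illustrative function names:

```python
def hebb_weights_binary(patterns):
    """Weights for binary (0/1) patterns: map 0/1 to -1/+1, then multiply."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    W[i][j] += (2 * s[i] - 1) * (2 * s[j] - 1)
    return W

def hebb_weights_bipolar(patterns):
    """Weights for bipolar (-1/+1) patterns: plain product of activations."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += s[i] * s[j]
    return W

W_bin = hebb_weights_binary([[1, 1, 0, 0]])
W_bip = hebb_weights_bipolar([[1, 1, -1, -1]])
```

Both calls store the same pattern in two encodings and give the same weight matrix: +1 between units that agree and -1 between units that differ.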

HOPFIELD NETWORK
Storage Capacity

HOPFIELD NETWORK
References
Laurene Fausett: Section 3.4.4
Kevin Gurney: Chapters 5 & 6

BOLTZMANN MACHINE
Introduced in 1983 by Hinton & Sejnowski
The architecture of a Boltzmann machine consists of a set of
units and a set of bidirectional connections between pairs of
units
Not all units are connected; however, if two units are
connected then their connection is bidirectional and the
weights in both directions are the same, i.e. w_ij = w_ji
A unit may also have a self-connection with weight w_ii

BOLTZMANN MACHINE
Architecture
Boltzmann machine to solve the Travelling Salesman Problem (TSP)
Units are arranged in a two-dimensional array
The units within each row are fully interconnected
Similarly, the units within each column are fully interconnected

BOLTZMANN MACHINE
Architecture
Architecture for 10 city TSP

BOLTZMANN MACHINE
Architecture
The state xi of a unit Xi is either on (1) or off (0)
The objective of the neural net is to maximize the consensus
function over all the units of the net

BOLTZMANN MACHINE
Architecture
The net attempts to find this maximum by letting each unit
attempt to change its state from on to off or vice versa
The change in consensus for a change of state of unit Xi is

where x_i is the current state of X_i

The coefficient [1 − 2x_i] will be +1 if unit X_i is currently off
and -1 if it is on

BOLTZMANN MACHINE
Architecture
However, unit Xi does not necessarily change its state, even if
doing so would increase the consensus of the net
The update is done probabilistically so that the chances of the
net getting stuck in a local maximum are reduced

BOLTZMANN MACHINE
Architecture
The probability of the net accepting a change in state is

Where T is a control parameter called Temperature

Lower values of T make it more likely that the net will accept
a change of state that increases its consensus and less likely
that it will accept a change that reduces its consensus
The control parameter T is gradually reduced as the net
search proceeds
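The two formulas referred to on these slides were lost in extraction. The textbook forms, assumed here, are ΔC(i) = [1 − 2x_i][w_ii + Σ_{j≠i} w_ij x_j] for the change in consensus and A(i, T) = 1 / (1 + exp(−ΔC(i)/T)) for the acceptance probability. The sketch below evaluates both; the function names are illustrative.

```python
import math

def delta_consensus(i, x, w):
    """Change in consensus if unit i flips its state.

    x : list of 0/1 states
    w : symmetric weight matrix, w[i][i] is the self-connection
    """
    net = w[i][i] + sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * net

def acceptance_probability(delta_c, T):
    """1 / (1 + exp(-delta_c / T)), written to avoid overflow at small T."""
    z = delta_c / T
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

x = [0, 1]
w = [[2, -3], [-3, 2]]
dc0 = delta_consensus(0, x, w)  # unit 0 is off: coefficient +1
dc1 = delta_consensus(1, x, w)  # unit 1 is on: coefficient -1
```

At high T the probability is close to 0.5 whatever the sign of the change; as T falls it approaches 1 for beneficial changes and 0 for detrimental ones.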

BOLTZMANN MACHINE
Architecture
The connection pattern for the net is as follows

BOLTZMANN MACHINE
Architecture

BOLTZMANN MACHINE
Architecture
Consider the relation between the penalty weight -p and the
bonus weight b
Allowing a unit U_ij to turn on should be encouraged only if no
other units are on in that column or row (no city can be
visited twice, nor can two different cities be visited at the
same time)
If we set p > b, this objective is achieved: if a
unit turns on while another unit is already on in its row or column,
the change in consensus is negative

BOLTZMANN MACHINE
Architecture
Now let us consider the relation between the constraint weight
b and the distance weights
Let d denote the maximum distance between any two cities on
the tour
Suppose that no unit is on in column j or in row i
Allowing U_ij to turn on should be encouraged, and the
weights should be set so that the consensus increases if
it turns on

BOLTZMANN MACHINE
Architecture
The change in consensus will be b − d_i,k1 − d_i,k2
where k1 denotes the city visited at stage j-1 of the tour and k2
denotes the city visited at stage j+1 (and city i is visited at
stage j)

BOLTZMANN MACHINE
Architecture
Since d is the maximum possible distance, the minimum
change in consensus b − d_i,k1 − d_i,k2 is b − 2d
Since the change in consensus should be positive, we take
b > 2d for the net to function properly

BOLTZMANN MACHINE
Setting the weights
The weights for a Boltzmann machine are set manually
Their values are chosen such that the net will tend to make
transitions towards a maximum of the consensus function

BOLTZMANN MACHINE
Algorithm

BOLTZMANN MACHINE
Algorithm

BOLTZMANN MACHINE
Algorithm: Initial Temperature
The initial temperature should be taken large enough so that
the probability of accepting a change of state is approximately
0.5, regardless of whether the change is beneficial or
detrimental

BOLTZMANN MACHINE
Example
The Boltzmann machine was used to solve the TSP
The starting configuration had approximately half the units
on
The parameters were: T0 = 20, b = 60, & p = 70
The cooling schedule was T_new = 0.9 T_old after each epoch
An epoch consisted of each unit attempting to change its
value
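A compact sketch of such a run, using the parameters quoted above. Several details are assumptions, since the connection diagram and algorithm slides were not recovered: unit U_ij means city i at tour position j, each unit has a self-connection b, units in the same row or column are linked with weight −p, units in adjacent columns are linked with minus the distance between their cities, and the acceptance probability is 1 / (1 + exp(−ΔC/T)). The distance matrix is made up for illustration.

```python
import math
import random

def accept_prob(delta_c, T):
    """1 / (1 + exp(-delta_c / T)), written to avoid overflow at small T."""
    z = delta_c / T
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def delta_consensus(x, i, j, dist, b, p):
    """Change in consensus if unit U_ij (city i at position j) flips."""
    n = len(x)
    net = b                                                 # bonus (self-connection)
    net -= p * (sum(x[i]) - x[i][j])                        # other units on in row i
    net -= p * (sum(x[k][j] for k in range(n)) - x[i][j])   # other units on in column j
    for k in range(n):
        if k != i:                                          # cities at positions j-1 and j+1
            net -= dist[i][k] * (x[k][(j - 1) % n] + x[k][(j + 1) % n])
    return (1 - 2 * x[i][j]) * net

def boltzmann_tsp(dist, b=60.0, p=70.0, T0=20.0, cooling=0.9, epochs=60, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    x = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]  # about half on
    units = [(i, j) for i in range(n) for j in range(n)]
    T = T0
    for _ in range(epochs):
        rng.shuffle(units)
        for i, j in units:           # each unit attempts one change per epoch
            if rng.random() < accept_prob(delta_consensus(x, i, j, dist, b, p), T):
                x[i][j] = 1 - x[i][j]
        T *= cooling                 # T_new = 0.9 * T_old
    return x

def is_valid_tour(x):
    """Exactly one unit on in every row and every column."""
    n = len(x)
    return (all(sum(row) == 1 for row in x)
            and all(sum(x[i][j] for i in range(n)) == 1 for j in range(n)))

# Made-up distances for 4 cities; the largest is d = 5, so b > 2d holds.
dist = [[0, 3, 4, 5], [3, 0, 5, 4], [4, 5, 0, 3], [5, 4, 3, 0]]
state = boltzmann_tsp(dist)
```

The result can be checked with `is_valid_tour(state)`. With p > b > 2d every valid tour is a local maximum of the consensus, so runs tend to settle on one, although this sketch makes no guarantee for a given seed.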

BOLTZMANN MACHINE
Example
The experiment was repeated 100 times for different initial
configurations
Typically valid tours were produced in 10 or fewer epochs &
for all 100 configurations valid tours were found in 20 or
fewer epochs
In these experiments, it was rare for the net to change its
configuration once a valid tour was found

BOLTZMANN MACHINE
Example
Five tours of less length were

BOLTZMANN MACHINE
References
Laurene Fausett, Section 7.1.1