Lecture 1
Lecturer: (M.Sc.) Miss. Bushra K.U.
Intelligent Systems: Neural Networks, Fuzzy Systems, and Genetic Algorithms
1- Introduction:
Intelligent control is the discipline in which control algorithms are developed by emulating certain characteristics of intelligent biological systems, for example:
1) Neural Networks (NN): try to emulate the low-level biological functions of the brain to solve difficult control problems (after training).
2) Fuzzy Systems: can be designed to emulate the human deductive process, that is, the process people use to infer conclusions from what they know. They use collections of rules, called knowledge bases or rule bases, that hold a set of If-Then rules that quantify the expert's knowledge about solving a particular problem.
3) Genetic Algorithms: here the goal is to embody the principles of evolution, natural selection, and genetics from natural biological systems in a computer algorithm.
Intelligent control is now becoming a common tool in many engineering and industrial applications. Intelligent control must have the following features:
1) learning ability and adaptability
2) robustness
3) a simple control algorithm.
2- Neural Networks:
2.1 What is a Neural Network?
Artificial neural networks are computational (mathematical) models of the human brain. They are composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. A neuron is a computational element that defines the characteristics of input/output relationships.
2.2 Historical background
The most basic element of the human brain is a specific type of cell, which provides us with the abilities to remember, think, and apply previous experiences to our every action. These cells are known as neurons; each of these neurons can connect with up to 200,000 other neurons. The power of the brain comes from the number of these basic components and the multiple connections between them. In 1909, Cajal and Purkinje presented a model of the nerve cell, in which the neuron consists of four regions:
1- Cell body (soma): provides the support functions and structure of the cell.
2- Axon: a branching fiber which carries signals away from the neuron.
3- Dendrites: consist of more branching fibers which receive signals.
4- Synapses: the electrochemical contacts between neurons.
Figure: model of an artificial neuron; the inputs $x_1, \dots, x_R$ are weighted by $w_1, \dots, w_R$, summed with the bias $b$, and passed through the activation function $f(\cdot)$ to produce the output

$$a = f\left(\sum_{i=1}^{R} w_i x_i + b\right)$$
a- Hard limiter:
$$f(n) = \begin{cases} 1 & n \geq 0 \\ 0 & n < 0 \end{cases} \quad \text{(unipolar)}$$
b- Linear:
$$f(n) = n$$
c- Saturating linear:
$$f(n) = \begin{cases} a & n > b \\ kn & -b \leq n \leq b \\ -a & n < -b \end{cases} \quad \text{(bipolar)} \qquad
f(n) = \begin{cases} a & n > b \\ kn & 0 \leq n \leq b \\ 0 & n < 0 \end{cases} \quad \text{(unipolar)}$$
d- Sigmoid:
$$f(n) = \frac{1}{1 + e^{-n}} \quad \text{(unipolar)}$$
where $f(0) = 0.5$, $f(n) \to 1$ as $n \to +\infty$, and $f(n) \to 0$ as $n \to -\infty$.
$$f(n) = \frac{2}{1 + e^{-n}} - 1 \quad \text{(bipolar)}$$
where $f(0) = \frac{2}{1+1} - 1 = 0$, $f(n) \to \frac{2}{1+0} - 1 = 1$ as $n \to +\infty$, and $f(n) \to \frac{2}{1+\infty} - 1 = -1$ as $n \to -\infty$.
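These activation functions are easy to express in code. Below is a minimal Python sketch (the function names and the default parameters k, a, b are illustrative choices, not part of the notes):

```python
import math

def hard_limiter(n):
    """Unipolar hard limiter: 1 for n >= 0, else 0."""
    return 1 if n >= 0 else 0

def saturating_linear(n, k=1.0, a=1.0, b=1.0):
    """Bipolar saturating linear: slope k between -b and b, clipped at +/-a."""
    if n > b:
        return a
    if n < -b:
        return -a
    return k * n

def sigmoid(n):
    """Unipolar sigmoid: 0.5 at n = 0; -> 1 as n -> +inf, -> 0 as n -> -inf."""
    return 1.0 / (1.0 + math.exp(-n))

def bipolar_sigmoid(n):
    """Bipolar sigmoid: 0 at n = 0; -> +1 / -1 as n -> +/-inf."""
    return 2.0 / (1.0 + math.exp(-n)) - 1.0

print(sigmoid(0.0), bipolar_sigmoid(0.0))   # 0.5 0.0
```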
Example: consider a neuron with inputs $x_1 = 0.5$, $x_2 = 1$, $x_3 = -0.7$ and weights $w_1 = 0$, $w_2 = -0.3$, $w_3 = 0.6$.

Net $= x_1 w_1 + x_2 w_2 + x_3 w_3$
Net $= (0.5)(0) + (1)(-0.3) + (-0.7)(0.6) = -0.72$

1) if f is linear: $y = -0.72$
2) if f is a hard limiter (on-off): $y = -1$
3) if f is a sigmoid: $y = \frac{1}{1 + e^{0.72}} = 0.32$
4) if f is tanh: $y = \frac{e^{-0.72} - e^{0.72}}{e^{-0.72} + e^{0.72}} = -0.6169$

Now add a bias input of 1 with weight $b = 1$:

Net $= x_1 w_1 + x_2 w_2 + x_3 w_3 + 1 \cdot b$
Net $= (0.5)(0) + (1)(-0.3) + (-0.7)(0.6) + (1)(1) = 0.28$

1) if f is linear: $y = 0.28$
2) if f is a hard limiter: $y = 1$
3) if f is a sigmoid: $y = \frac{1}{1 + e^{-0.28}} = 0.569$
4) if f is tanh: $y = \frac{e^{0.28} - e^{-0.28}}{e^{0.28} + e^{-0.28}} = 0.272$
5) if f is a TLU (threshold logic unit): $y = 0.28$
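The example can be checked numerically; here is a small Python sketch using the same inputs, weights, and bias:

```python
import math

x = [0.5, 1.0, -0.7]    # inputs x1..x3
w = [0.0, -0.3, 0.6]    # weights w1..w3
b = 1.0                 # bias used in the second part of the example

net = sum(xi * wi for xi, wi in zip(x, w))   # -0.72
net_b = net + b                              # 0.28

print(1 / (1 + math.exp(-net)))      # sigmoid:    ~0.327
print(math.tanh(net))                # tanh:       ~-0.617
print(1 / (1 + math.exp(-net_b)))    # with bias:  ~0.569
print(math.tanh(net_b))              # with bias:  ~0.273
```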
Lecture 2
Lecturer:(M.Sc.) Miss. Bushra K.U.
2.4 NN classification
NNs can be classified according to their structure and their learning algorithm.

Structures:
* Feedforward networks. Examples:
  - The multi-layer perceptron (MLP) {Rumelhart and McClelland, 1986}.
  - The learning vector quantization (LVQ) network {Kohonen, 1989}.
  - The cerebellar model articulation controller (CMAC) {Albus, 1975}.
  - The group-method of data handling (GMDH) network {Hecht-Nielsen, 1990}.
* Recurrent networks. Examples:
  - The Hopfield network {Hopfield, 1982}.
  - The Elman network {Elman, 1990}.
  - The Jordan network {Jordan, 1986}.

Learning algorithms:
* Supervised learning. Example:
  - The delta-rule learning algorithm {Widrow and Hoff, 1960}.
* Unsupervised learning. Examples:
  - The Kohonen (self-organizing) learning algorithms {Kohonen, 1989}.
  - The Carpenter-Grossberg Adaptive Resonance Theory (ART) competitive learning algorithms {Carpenter and Grossberg, 1988}.
2.4.1 NN Topology
The topology of a NN describes factors such as how many interconnections there are for each neuron, that is, whether each neuron is connected to a few other neurons, to many other neurons, or to all other neurons in the network.
The term "fully interconnected" NN refers to network models in which the output of each neuron may be connected to the inputs of all the neurons (in the next layer). In a partially ("not fully") interconnected NN, on the other hand, the output of a given neuron is allowed to connect only to certain of its neighbours.
Figure: single-layer network; an input buffer layer $x_1, \dots, x_n$ feeds the output layer $y_1, \dots, y_m$ through the weights $w_{ji}$ and biases $b_j$.
The output of node j is
$$y_j(t) = f\left(\sum_{i=1}^{n} w_{ji} x_i + b_j\right)$$
the output error is
$$\delta_j = yd_j - y_j$$
and the root-mean-square error over the NOD training data is
$$RMS = \sqrt{\frac{\sum_{k=1}^{NOD} \left(yd(k) - y(k)\right)^2}{NOD}}$$
Step 6: if the error is less than the desired error then finish; otherwise go to step 3.
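A minimal Python sketch of this training loop, assuming linear output units trained with the delta rule above; the learning rate, epoch limit, and stopping threshold are illustrative choices:

```python
import math
import random

def train_single_layer(patterns, n_in, n_out, eta=0.1, epochs=1000, goal=1e-3):
    """Delta-rule training of a single-layer net with linear outputs.
    patterns: list of (x, yd) pairs with len(x) == n_in, len(yd) == n_out."""
    w = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [random.uniform(-0.5, 0.5) for _ in range(n_out)]
    for _ in range(epochs):
        sq_err = 0.0
        for x, yd in patterns:
            for j in range(n_out):
                y = sum(w[j][i] * x[i] for i in range(n_in)) + b[j]
                delta = yd[j] - y                     # error term
                sq_err += delta ** 2
                for i in range(n_in):                 # weight update
                    w[j][i] += eta * delta * x[i]
                b[j] += eta * delta
        if math.sqrt(sq_err / len(patterns)) < goal:  # RMS below desired error
            break
    return w, b
```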
Figure: flowchart of the training procedure (START, initialize weights and bias, compute the output and the error, update the weights to minimize the error, and repeat until the error is acceptable, then END).
Lecture 3
Lecturer: (M.Sc.) Miss. Bushra K. U.
Figure: multilayer perceptron; the input buffer layer $x_1, \dots, x_n$ feeds a hidden layer with outputs $O_1, \dots, O_l$ and biases $b_j$, which in turn feeds the output layer $y_1, \dots, y_m$ with biases $b_k$.
The additional layers are called hidden because they do not interact directly with the outside world.
The back-propagation (BP) algorithm trains the network with pairs of input and output patterns. The neural network first uses the input pattern to produce its own output pattern and then compares this with the desired output, or target, pattern. If there is no difference between the actual output and the target pattern, no learning takes place; otherwise, the connection weights are changed to reduce the difference. It uses a gradient algorithm to minimize the root-mean-square error (or mean squared error) between the actual output of a multilayer feed-forward perceptron and the desired output. It requires continuous, differentiable nonlinearities; the following assumes a sigmoid nonlinearity.
Step 1: initialize weights and thresholds.
The initial weights must be (1) non-zero, (2) small, (3) random values.
Step 2: present a new continuous-valued input $x_0, x_1, \dots, x_n$ along with the desired output $yd$.
Step 3: calculate the actual outputs
$$o_j = f\left(\sum_{i=1}^{n} w_{ji} x_i + b_j\right)$$
$$y_k = f\left(\sum_{j=1}^{m} w_{kj} o_j + b_k\right)$$
Step 4: calculate the error terms; for the output layer
$$\delta_k = f'_k(net_k)\,(yd_k - y_k)$$
and for the hidden layer
$$\delta_j = f'_j(net_j) \sum_{k=1}^{m} \delta_k w_{kj}$$
Step 5: adapt the weights
$$w_{ji}(new) = w_{ji}(old) + \Delta w_{ji}, \qquad \Delta w_{ji} = \eta \, \delta_j x_i$$
The root-mean-square error over the NOD training data is
$$RMS = \sqrt{\frac{\sum_{k=1}^{NOD} \left(yd(k) - y(k)\right)^2}{NOD}}$$
Step 6: if the error is less than the desired error then finish; otherwise go to step 2.
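A compact Python sketch of steps 1 to 6 for one hidden layer, assuming unipolar sigmoid units so that $f'(net) = y(1 - y)$; the layer sizes, learning rate, and epoch count are illustrative:

```python
import math
import random

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train_bp(patterns, n_in, n_hid, n_out, eta=0.5, epochs=1000):
    """One-hidden-layer back-propagation following steps 1-6 above."""
    wh = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    bh = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
    wo = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
    bo = [random.uniform(-0.5, 0.5) for _ in range(n_out)]
    for _ in range(epochs):
        for x, yd in patterns:
            # forward pass (step 3)
            o = [sigmoid(sum(wh[j][i] * x[i] for i in range(n_in)) + bh[j])
                 for j in range(n_hid)]
            y = [sigmoid(sum(wo[k][j] * o[j] for j in range(n_hid)) + bo[k])
                 for k in range(n_out)]
            # error terms (step 4): f'(net) = y(1 - y) for the unipolar sigmoid
            dk = [y[k] * (1 - y[k]) * (yd[k] - y[k]) for k in range(n_out)]
            dj = [o[j] * (1 - o[j]) * sum(dk[k] * wo[k][j] for k in range(n_out))
                  for j in range(n_hid)]
            # weight adaptation (step 5)
            for k in range(n_out):
                for j in range(n_hid):
                    wo[k][j] += eta * dk[k] * o[j]
                bo[k] += eta * dk[k]
            for j in range(n_hid):
                for i in range(n_in):
                    wh[j][i] += eta * dj[j] * x[i]
                bh[j] += eta * dj[j]
    return wh, bh, wo, bo
```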
Lecture 4 (part 1)
Lecturer: (M.Sc.) Miss. Bushra K. U.
Notes
The bias
- Some networks employ a bias unit as part of every layer except the output layer.

Figure: MLP with bias units; a fixed input of 1 feeds the hidden-layer and output-layer neurons through the bias weights $b_j$ and $b_k$.

- The sigmoid has a shape parameter: a low value of this parameter tends to make the sigmoid take on the characteristics of a TLU, and a high value results in a gently varying function.
The sigmoid function
Notes: for use in BP, the derivative of the sigmoid can be written in terms of the output $y = f(x)$:
1) if $f(x) = \frac{1}{1 + e^{-x}}$ then $f'(x) = y(1 - y)$
2) if $f(x) = \frac{2}{1 + e^{-x}} - 1$ then $f'(x) = \frac{1}{2}(1 - y^2)$
Note: before training the net, a decision has to be made on the setting of the learning rate. Theoretically, the larger the learning rate, the faster the training process goes; practically, however, the learning rate may have to be set to a small value (0.5 to 0.6) in order to prevent the training process from being trapped at a local minimum or from giving an oscillatory response.
Momentum term
The main drawback of BP is trapping in a local minimum. One way to overcome this problem, and to increase the learning rate without leading to oscillation, is to modify back-propagation by adding a momentum term:
$$w_{ji}(t + 1) = w_{ji}(t) + \eta \, \delta_j x_i + \alpha \left(w_{ji}(t) - w_{ji}(t - 1)\right)$$
where $\alpha$ is the momentum coefficient.
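For a single weight, the momentum update can be sketched as follows (the value alpha = 0.9 and the gradient steps are illustrative):

```python
def momentum_update(w, grad_step, dw_prev, alpha=0.9):
    """w(t+1) = w(t) + eta*delta*x + alpha*(w(t) - w(t-1));
    grad_step stands for eta*delta*x, dw_prev for w(t) - w(t-1)."""
    dw = grad_step + alpha * dw_prev
    return w + dw, dw

w, dw = 0.2, 0.0
for grad_step in [0.05, 0.04, 0.03]:    # illustrative gradient steps
    w, dw = momentum_update(w, grad_step, dw)
print(w)
```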
Convergence
The network does not leave a local minimum under the standard BP algorithm; therefore, special techniques should be used to get out of a local minimum, for example:
a- Change the learning rate or the momentum term.
b- Start the learning process again with different initial weights.
c- Add small random values to the weights (e.g. 10% of the values of oscillatory weights).
d- Avoid repeated or noisy data.
e- Increase the number of hidden units (e.g. by 10%).
General useful notes
a- Initialize the weights between (-0.1 to 0.1) or (-0.5 to 0.5).
b- Use a fixed or adjustable bias; the bias weight is updated by the same learning algorithm.
c- Use a variable learning rate (e.g. high at the beginning, then smaller as we approach the convergence state) or an adaptive learning rate.
d- Usually, the input patterns are scaled before presenting them to the net, to avoid saturating the neurons.
e- In modelling applications, the transfer function of the output neurons is chosen to be linear, f(y) = y, to obtain unlimited output ranges.
Lecture 5
NN classification and Hopfield network
Lecturer: (M.Sc.) Miss. Bushra K. U.
NNs can be classified according to the input type (binary or continuous) and the learning method:
* Binary input:
  - Supervised: the Hopfield NN and the Hamming NN.
  - Unsupervised: the Carpenter and Grossberg net (ART).
* Continuous input:
  - Supervised: the SLP and the MLP.
  - Unsupervised: the Kohonen NN.
Hopfield NN
At the beginning of the 1980s, Hopfield published two scientific papers about his network. This network normally accepts binary or bipolar inputs (+1 or -1). It has a single layer of neurons, each connected to all the others, giving it a recurrent structure. Hopfield networks, or associative networks, are typically used for classification. Given a distorted input vector, the Hopfield network associates it with an undistorted pattern stored in the network. This net is most appropriate when exact binary representations are possible, as with black-and-white images where the input elements are pixel values, or with ASCII text where the input values could represent the bits in the 8-bit ASCII representation of each character. The Hopfield NN is shown in the figure below.
Figure: Hopfield network with inputs $x_1, \dots, x_n$ and outputs $y_1, \dots, y_n$; the output of each node is fed back to all the other nodes.
It has N nodes containing hard-limiting nonlinearities, with binary inputs and outputs taking on the values +1 and -1; the output of each node is fed back to all other nodes via weights, as mentioned above. The net operates as follows. First, the weights are set using the given recipe from exemplar patterns for all classes. Then an unknown pattern is imposed on the net at time zero by forcing the output of the net to match the unknown pattern. The net then iterates in discrete time until the outputs no longer change.
Step 1: assign connection weights
$$w_{ij} = \begin{cases} \sum_{s=1}^{M} x_i^s x_j^s & i \neq j \\ 0 & i = j \end{cases}$$
where $w_{ij}$ is the connection weight from node i to node j, $x_i^s$ (which is either +1 or -1) is the i-th component of the training input pattern for class s, M is the number of classes, and N is the number of neurons (the number of components in the input pattern).
Step 2: initialize with the unknown input pattern
$$y_i(0) = x_i, \qquad 1 \leq i \leq N$$
Step 3: iterate until convergence
$$y_j(t + 1) = f_h\left(\sum_{i=1}^{N} w_{ij} \, y_i(t)\right), \qquad 1 \leq j \leq N$$
the function fh is the hard limiting nonlinearity as defined below. The process
is repeated until node outputs remain unchanged with further iterations. The
node outputs then represent the exemplar pattern that best matches the
unknown input.
$$f_h(x) = \begin{cases} 1 & x \geq 0 \\ -1 & x < 0 \end{cases}$$
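A minimal Python sketch of steps 1 to 3; to guarantee convergence it updates the nodes one at a time (asynchronously), whereas the notes write the update synchronously, and the example patterns are illustrative:

```python
def train_hopfield(exemplars):
    """Step 1: w_ij = sum over exemplars s of x_i^s * x_j^s, with w_ii = 0."""
    n = len(exemplars[0])
    w = [[0] * n for _ in range(n)]
    for x in exemplars:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += x[i] * x[j]
    return w

def recall_hopfield(w, x, max_sweeps=100):
    """Steps 2-3: start from the unknown pattern, iterate until stable."""
    y = list(x)
    n = len(y)
    for _ in range(max_sweeps):
        changed = False
        for j in range(n):
            s = sum(w[j][i] * y[i] for i in range(n))
            new = 1 if s >= 0 else -1      # hard-limiting nonlinearity f_h
            if new != y[j]:
                y[j], changed = new, True
        if not changed:                    # outputs no longer change
            break
    return y

A = [1, 1, 1, 1, -1, -1, -1, -1]
B = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([A, B])
print(recall_hopfield(w, [-1, 1, 1, 1, -1, -1, -1, -1]))   # recovers A
```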
Lecture 6
Lecturer: (M.Sc.) Miss. Bushra K.U.
Hamming Network
It is used when the inputs are binary valued. The net selects the winner from the stored patterns ($x^{(m)}$, m = 1, 2, ..., M) which has the least Hamming distance from the input vector.
The Hamming distance is the number of bits in the input which do not match the corresponding exemplar bits.
The Hamming network consists of two parts. The first is the lower subnet, which calculates N minus the Hamming distance to the M exemplar patterns. These matching scores range from 0 to the number of elements (N) in the input and are highest for those nodes corresponding to classes whose exemplars best match the input.
The second part is the upper subnet (MAXNET), to which the M matching scores are presented. The input is then removed and the MAXNET iterates until the output of only one node is positive (the largest one). Classification is then complete, and the selected class is the one corresponding to the node with positive output.
Hamming net algorithm
Step 1: assign connection weights and offsets
In the lower subnet:
$$w_{ji} = \frac{x_i^j}{2}, \qquad \theta_j = \frac{N}{2}, \qquad 1 \leq i \leq N, \; 1 \leq j \leq M$$
or, in matrix form,
$$W = \frac{1}{2}\begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^M \\ x_2^1 & x_2^2 & \cdots & x_2^M \\ \vdots & \vdots & & \vdots \\ x_N^1 & x_N^2 & \cdots & x_N^M \end{pmatrix}$$
In the upper subnet:
$$w_{kl} = \begin{cases} 1 & k = l \\ -\varepsilon & k \neq l \end{cases}, \qquad \varepsilon < \frac{1}{M}, \qquad 1 \leq l, k \leq M$$
where $w_{ji}$ is the connection weight from input i to node j in the lower subnet and $\theta_j$ is the offset (threshold) in that node, $w_{kl}$ is the connection weight from node k to node l in the upper subnet (all the offsets in this subnet are zero), and $x_i^j$ is element i of exemplar j.
Step 2: initialize with the unknown input pattern
$$net_j = \sum_{i=1}^{N} w_{ji} x_i + \theta_j, \qquad y_j(0) = f_t(net_j), \qquad 1 \leq j \leq M$$
where $y_j(0)$ is the output of node j in the lower subnet at time zero, $x_i$ is element i of the input vector, and $f_t$ is the threshold-logic nonlinearity.
Step 3: iterate until convergence
The output of the upper subnet is
$$y_k(t + 1) = f_t\left(\sum_{l=1}^{M} w_{kl} \, y_l(t)\right)$$
This process is repeated until convergence, after which the output of only one node remains positive.
Step 4: repeat for a new input by going to step 2.
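A minimal Python sketch of the whole algorithm, with the lower-subnet matching scores computed directly and the MAXNET iterated until one node remains positive; the choice epsilon = 1/(2M) and the example exemplars are illustrative:

```python
def hamming_classify(exemplars, x, max_iters=100):
    """Lower subnet: matching score = N - Hamming distance; upper subnet
    (MAXNET): suppress all but the largest score. Inputs are +1/-1 lists."""
    n, m = len(x), len(exemplars)
    eps = 1.0 / (2 * m)                        # must satisfy eps < 1/M
    # net_j = sum_i (x_i^j / 2) x_i + N/2  =  N - HammingDistance(x, exemplar j)
    y = [sum(e[i] * x[i] for i in range(n)) / 2 + n / 2 for e in exemplars]
    for _ in range(max_iters):
        y_new = [max(0.0, y[k] - eps * sum(y[l] for l in range(m) if l != k))
                 for k in range(m)]            # f_t is threshold logic
        if y_new == y:
            break
        y = y_new
    return max(range(m), key=lambda k: y[k])   # index of the winning class

exemplars = [[1, 1, 1, -1], [-1, -1, 1, 1]]
print(hamming_classify(exemplars, [1, 1, -1, -1]))   # 0 (closest exemplar)
```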
Figure: Hamming network. The lower subnet calculates the matching scores from the inputs $x_1, \dots, x_n$ (offsets N/2), giving the initial outputs $y_1^0, \dots, y_M^0$; the upper subnet (MAXNET) then iterates, producing $y_1^{k+1}, \dots, y_M^{k+1}$, until only the winning node remains positive. Both subnets use the threshold-logic nonlinearity $f_t(net)$.
Lecture 7
Figure (1): Kohonen SOM with two dimensional neighborhood & input vector
The basic feature of this net is the concept of excitatory learning within a neighborhood around the winning neuron, a neighborhood which slowly decreases in size with each iteration. Weights between input and output nodes are initially set to small random values, and an input is presented without a desired output.
Step 3: compute the distance $d_j$ between the input and each output node j
$$d_j = \sum_{i=1}^{N} \left(x_i(t) - w_{ji}(t)\right)^2 \qquad \text{(Euclidean distance)}$$
where $x_i(t)$ is the input to node i at time t and $w_{ji}(t)$ is the weight from input node i to output node j at time t.
Step 4: select the output node with minimum distance. Select node j* as the output node with minimum $d_j$.
Step 5: update the weights to node j* and its neighbors
Weights are updated for node j* and all nodes in the neighborhood defined by $NE_{j^*}(t)$. The new weights are
$$w_{ji}(t + 1) = w_{ji}(t) + \eta(t)\left(x_i(t) - w_{ji}(t)\right), \qquad j \in NE_{j^*}(t), \; 1 \leq i \leq N$$
The term $\eta(t)$ is a gain term ($0 < \eta(t) < 1$) that decreases in time, for example $\eta(t) = K_a e^{-t/T_a}$, where $K_a < 1$ and $T_a$ is the decay constant of the learning rate, which ranges from 1000 to 10000.
Step 6: repeat by going to step 2.
The weights eventually converge and are fixed after the gain term in step 5 is reduced to zero. In a well-trained Kohonen network, output neurons that are close to one another have similar reference vectors.
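A minimal Python sketch of steps 3 to 6 for a one-dimensional output map; the linear decay schedules used here for the gain and the neighborhood radius are illustrative simplifications of the decaying $\eta(t)$ described above:

```python
import random

def train_som(inputs, n_outputs, epochs=1000, eta0=0.5):
    """1-D Kohonen SOM: find the winning node, then pull the weights of the
    winner and its neighborhood toward the input."""
    dim = len(inputs[0])
    w = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_outputs)]
    for t in range(epochs):
        frac = 1.0 - t / epochs
        eta = eta0 * frac                       # decreasing gain term
        radius = int((n_outputs // 2) * frac)   # shrinking neighborhood
        x = random.choice(inputs)
        # Step 3: Euclidean distances; Step 4: winning node j*
        d = [sum((x[i] - w[j][i]) ** 2 for i in range(dim)) for j in range(n_outputs)]
        j_star = min(range(n_outputs), key=lambda j: d[j])
        # Step 5: update j* and all nodes in its neighborhood NE_j*(t)
        for j in range(max(0, j_star - radius), min(n_outputs, j_star + radius + 1)):
            for i in range(dim):
                w[j][i] += eta * (x[i] - w[j][i])
    return w

w = train_som([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]], n_outputs=5)
```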
Lecture 8
Lecturer:(M.Sc.) Miss. Bushra K.U.
Fuzzy System
1- Introduction:
Fuzzy systems (or controllers) can be designed to emulate the human deductive process, that is, the process people use to infer conclusions from what they know.
They use collections of rules, called knowledge bases or rule bases, that hold a set of If-Then rules that quantify the expert's knowledge about solving a problem.
Fuzzy system theory has been applied to control (robots, mobile robots, engines, motors, washing machines, vacuum cleaners, etc.), to signal processing (fuzzy filters, fuzzy signal detection), and to medical diagnosis, securities, data compression, and so on.
Figure: fuzzy control loop; the error e (and its change) between the reference input and the actual plant output is fed to the fuzzy logic controller (FLC), which drives the plant.
1- Fuzzification: converts the measured "crisp" inputs to "fuzzy" values such as Positive Big (PB) or Negative Small (NS); that is, it converts the controller inputs into information that the inference mechanism can easily use to activate and apply rules.
Figure: structure of the FLC. The fuzzification block converts the crisp error input into fuzzy sets; the inference engine applies the rules in the knowledge base to the fuzzy input and produces a fuzzy output; the defuzzification block converts this to a crisp output that drives the plant.
In a crisp set, an element u can either belong or not belong to a set A (i.e. the degree to which element u belongs to set A is either 1 or 0):
$$\mu_A(u) = \begin{cases} 1 & \text{if and only if } u \in A \\ 0 & \text{if and only if } u \notin A \end{cases}$$
In a fuzzy set, by contrast, an element can belong to the set to a degree between 0 and 1; the membership function maps each element to its grade of membership:
$$\mu_A : U \to [0, 1]$$
The basic operations on sets are:
a- Complement: $A'$
b- Union: $A \cup B$
c- Intersection: $A \cap B$
Example: suppose we have a set of five pencils located in a box. A fuzzy set of "short pencils" A can be determined as
A = {p1/0.2, p2/0.5, p3/1.0, p4/1.0, p5/0.9}
p3 and p4 are exactly short, p5 is almost short, p2 is more or less short, and p1 is almost exactly not short.
Figure: membership function of the fuzzy set "short" versus pencil length (membership 1 for short lengths, falling toward 0 near 20).
The hedge "very" squares the membership grades:
$$very\ u = u^2, \qquad very(very\ u) = (u^2)^2$$
Example: the composite term very old can be obtained from the term old as very old = old².
Example: consider the fuzzy set of short pencils
A = {pencil1/0.2, pencil2/0.5, pencil3/1.0, pencil4/1.0, pencil5/0.9}
Then the fuzzy set of very short pencils can be determined as
B = {pencil1/0.04, pencil2/0.25, pencil3/1.0, pencil4/1.0, pencil5/0.81}
Note: more or less (very u) = u, since $\sqrt{u^2} = u$.
2- Intersection
$$\mu_{A \cap B}(u) = \min\left(\mu_A(u), \mu_B(u)\right)$$
3- Union
$$\mu_{A \cup B}(u) = \max\left(\mu_A(u), \mu_B(u)\right)$$
Figure: common membership function shapes: triangular, trapezoidal, Gaussian, and quadratic.
3- Complement (NOT)
$$\mu_{\bar{A}}(u) = 1 - \mu_A(u)$$
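These operations and the "very" hedge are straightforward to state in code. A minimal Python sketch using the pencil example, with membership grades stored in dictionaries (an illustrative representation):

```python
def fuzzy_and(mu_a, mu_b):
    """Intersection: min of the membership grades."""
    return {u: min(mu_a[u], mu_b[u]) for u in mu_a}

def fuzzy_or(mu_a, mu_b):
    """Union: max of the membership grades."""
    return {u: max(mu_a[u], mu_b[u]) for u in mu_a}

def fuzzy_not(mu_a):
    """Complement: 1 - membership grade."""
    return {u: 1.0 - mu_a[u] for u in mu_a}

def very(mu_a):
    """Hedge 'very': square the membership grades."""
    return {u: mu_a[u] ** 2 for u in mu_a}

short = {"p1": 0.2, "p2": 0.5, "p3": 1.0, "p4": 1.0, "p5": 0.9}
print(very(short))       # grades squared: p1 -> ~0.04, p2 -> 0.25, ..., p5 -> ~0.81
print(fuzzy_not(short))  # p1 -> 0.8, p2 -> 0.5, p3 -> 0.0, p4 -> 0.0, p5 -> ~0.1
```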
A typical fuzzy rule has the form: IF (a is A) AND (b is B) THEN (c is C),
where A, B, and C are linguistic terms of fuzzy sets, which represent the linguistic labels associated with the fuzzy sets and specify their meaning (for example: SMALL, MEDIUM, LARGE, HIGH, TALL, SHORT, and FAST).
a and b are the variables that represent the inputs of the fuzzy controller; c is the output variable.
(a is A) AND (b is B) is called the antecedent and describes a condition.
(c is C) is called the consequent and describes a conclusion.
Since the first defuzzification method (centre of gravity/area) is the best-known method (one of the most popular defuzzification methods), it will be introduced mathematically. The crisp output using the centre of gravity/area is given by:
$$U^* = \frac{\sum_{i=1}^{n} u_i \, \mu(u_i)}{\sum_{i=1}^{n} \mu(u_i)}$$
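A minimal Python sketch of this formula over a discretized output universe; the sample values $u_i$ and $\mu(u_i)$ are illustrative:

```python
def centre_of_gravity(us, mus):
    """Crisp output U* = sum(u_i * mu(u_i)) / sum(mu(u_i))."""
    return sum(u * m for u, m in zip(us, mus)) / sum(mus)

us  = [0.0, 1.0, 2.0, 3.0, 4.0]     # discretized output universe
mus = [0.0, 0.3, 0.8, 0.5, 0.1]     # membership grades of the output fuzzy set
print(centre_of_gravity(us, mus))   # ~2.24
```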