
Artificial neural networks:

There is an increasing demand for greater machine intelligence and ingenuity, and a revolutionary theory is required to create brain-like machines. A biological neural network exists in the human brain: the better we understand the brain, the better we can build thinking machines.

As information about the functions of the brain accumulated, a new technology, the artificial neural network, evolved.

An artificial neural network comprises many neurons, interconnected in ways that cast them into identifiable topologies. A topology may be multilayer feedforward, multilayer competitive, bilayer feedforward, or monolayer hetero-feedback. Typically three kinds of layer, namely input, hidden, and output, exist in each topology.

To model a single neuron, the McCulloch-Pitts model is used, in which no mechanism exists to compare the actual and expected output responses. This model, known as the perceptron, requires supervised learning. It is a paradigm; paradigms are characteristic artificial neural networks inspired by the biological world.

The winner-takes-all learning algorithm is used in the case of unsupervised learning. The back-propagation algorithm is widely used in feedforward multilayer neural networks having one or more hidden layers. It is an involved mathematical tool, and it differs from other algorithms in the process used to calculate the weights during the learning phase of the network.

[Figure: Feedforward MLP, with an input layer (x_1, x_2, ..., x_N), a hidden layer, and an output layer (O_1, O_2, ..., O_K).]
Back-Propagation Learning Algorithm

The back-propagation algorithm has been widely used as a learning algorithm in feedforward multilayer neural networks. The BP is applied to feedforward ANNs with one or more hidden layers, as shown in the figure above.

Based on this algorithm, the network learns a distributed associative map between
the input and output layers.

What makes this algorithm different from the others is the process by which the
weights are calculated during the learning phase of the network. In general, the
difficulty with multilayer Perceptrons is calculating the weights of the hidden layers
in an efficient way that results in the least (or zero) output error; the more hidden
layers there are, the more difficult it becomes. To update the weights, one must
calculate an error. At the output layer this error is easily measured; this is the
difference between the actual and desired (target) outputs. At the hidden layers,
however, there is no direct observation of the error; hence, some other technique
must be used to calculate an error at the hidden layers that will cause minimization
of the output error, as this is the ultimate goal.

Learning with the Back-Propagation Algorithm

The back-propagation algorithm is an involved mathematical tool; however, execution of the
training equations is based on iterative processes, and thus is easily implementable on a
computer.

During the training session of the network, a pair of patterns (X_k, T_k) is presented, where X_k is the input pattern and T_k is the target or desired pattern. The X_k pattern causes output responses at each neuron in each layer and, hence, an actual output O_k at the output layer. At the output layer, the difference between the actual and target outputs yields an error signal, which depends on the values of the weights of the neurons in each layer. This error is minimized, and during this process new values for the weights are obtained. The speed and accuracy of the learning process, that is, the process of updating the weights, also depend on a factor known as the learning rate.

Before starting the back-propagation learning process, we need the following:

- The set of training patterns (input and target)
- A value for the learning rate
- A criterion that terminates the algorithm
- A methodology for updating the weights
- The nonlinearity function (usually the sigmoid)
- Initial weight values (typically small random values)
Mathematical Analysis

Consider a feedforward network with the following parameters:

- $L$ layers and $N_l$ nodes in layer $l$,
- $w_{l,j,i}$ = weight between node $i$ of layer $l-1$ and node $j$ of layer $l$,
- $O_{l,j}(x_p)$ = actual output, for pattern $x_p$, of the $j$th node in layer $l$ (after the nonlinearity),
- $T_{L,j}(x_p)$ = expected, or target, output, for pattern $x_p$, of the $j$th node in layer $L$,
- $u_{l,j}(x_p)$ = activation, for pattern $x_p$, of node $j$ in layer $l$ (prior to the nonlinearity),
- $P$ training patterns, with $x_p$ the $p$th training pattern.

To illustrate the back-propagation learning procedure, assume that node i of the (l + 1)th layer receives signals from node j in the lth layer via the weights $w_{ij}^l$. With $N_l$ nodes in the lth layer, the output signal from node i of the (l + 1)th layer for the kth input pattern to the network is expressed by

$$O_i^{l+1}(k) = f\left(u_i^{l+1}(k)\right) = f\left(\sum_{j=1}^{N_l+1} w_{ij}^l\, O_j^l(k)\right),$$

where the threshold term has been included in the summation (as the weight on an extra input held at 1).

[Figure: Learning procedure with back-propagation at a node, between layer l and layer l + 1.]
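The forward pass through one layer can be sketched numerically. The following is a minimal illustration in Python/NumPy (rather than MATLAB); the layer sizes, weight matrix, and input values are arbitrary choices for demonstration:

```python
import numpy as np

def sigmoid(u):
    # the usual nonlinearity f(u) = 1 / (1 + exp(-u))
    return 1.0 / (1.0 + np.exp(-u))

def layer_forward(W, O_prev):
    # One layer of the forward pass: u_i^{l+1} = sum_j w_ij^l O_j^l,
    # with the threshold term folded into the summation as the weight
    # on a constant input of 1 (the extra (N_l + 1)th input).
    O_ext = np.append(O_prev, 1.0)
    u = W @ O_ext
    return u, sigmoid(u)

# Hypothetical layer: 3 inputs (plus threshold column) feeding 2 nodes
W = np.array([[0.5, -0.3, 0.8, 0.1],
              [0.2,  0.7, -0.5, -0.2]])
x = np.array([1.0, 0.5, -1.0])
u, O = layer_forward(W, x)
```

Stacking calls to `layer_forward`, one per layer, yields the full forward pass of the MLP.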
If the sigmoid function f(x) = 1 / (1 + exp(-x)) is used, its derivative is

$$f'(x) = f(x)\,\left(1 - f(x)\right).$$
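This identity is easy to verify numerically; a quick sketch comparing the analytic derivative with a central finite-difference approximation (step size chosen for illustration):

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

# Analytic derivative f'(x) = f(x)(1 - f(x)) versus a central
# finite difference at several sample points.
xs = np.linspace(-4.0, 4.0, 9)
h = 1e-6
analytic = f(xs) * (1.0 - f(xs))
numeric = (f(xs + h) - f(xs - h)) / (2.0 * h)
```

The two agree to many decimal places, which is why the sigmoid is convenient: the derivative needed during learning is obtained from the already-computed outputs at no extra cost.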

The total error, E, for the network and for all patterns K is defined as the sum of
squared differences between the actual network output and the target (or desired)
output at the output layer L:

$$E = \sum_{k=1}^{K} E_k = \frac{1}{2} \sum_{k=1}^{K} \sum_{i=1}^{N_L} \left[ T_i(k) - O_i^L(k) \right]^2$$

The goal is to evaluate a set of weights in all layers of the network that minimizes E. The learning rule is specified by setting the change in the weights proportional to the negative derivative of the error with respect to the weights:

$$\Delta w_{nm}^l \propto -\frac{\partial E_k}{\partial w_{nm}^l}.$$
To calculate the dependence of the error $E_k$ on the nmth weight of a neuron in the lth layer, we use the chain rule:

$$\frac{\partial E_k}{\partial w_{nm}^l} = \sum_{i=1}^{N_L} \frac{\partial E_k}{\partial O_i^L(k)}\, \frac{\partial O_i^L(k)}{\partial w_{nm}^l}.$$

Then

$$\frac{\partial E_k}{\partial w_{nm}^l} = -\sum_{i=1}^{N_L} \left( T_i(k) - O_i^L(k) \right) \frac{\partial O_i^L(k)}{\partial w_{nm}^l}.$$

If we introduce the sigmoid function and its derivative into the latter relationship, then for l = L − 1 (i.e., the weights of the output layer),

$$\frac{\partial E_k}{\partial w_{nm}^{L-1}} = -\left( T_n - O_n^L \right) O_n^L \left( 1 - O_n^L \right) O_m^{L-1}.$$
Thus, the procedure for adjusting the weights of the output layer is

$$\Delta w_{nm}^{L-1} = \eta \left( T_n - O_n^L \right) O_n^L \left( 1 - O_n^L \right) O_m^{L-1},$$

where $\eta$ is a proportionality factor known as the learning rate.

However, if l < L − 1, then $O_i^L$ still depends on $w_{nm}^l$, and the error dependency on the weights, again by applying the chain rule, is

$$\frac{\partial E_k}{\partial w_{nm}^l} = -\sum_{i=1}^{N_L} \left( T_i - O_i^L \right) f'\!\left( u_i^L \right) \sum_{j=1}^{N_{L-1}} w_{ij}^{L-1}\, \frac{\partial O_j^{L-1}}{\partial w_{nm}^l}.$$

Now, if l = L − 2 (i.e., weights of neurons in the last hidden layer), then the latter is expressed by

$$\frac{\partial E_k}{\partial w_{nm}^{L-2}} = -f'\!\left( u_n^{L-1} \right) \left[ \sum_{i=1}^{N_L} \left( T_i - O_i^L \right) f'\!\left( u_i^L \right) w_{in}^{L-1} \right] O_m^{L-2}.$$
Consequently, the procedure for adjusting the weights of the last hidden layer is

$$\Delta w_{nm}^{L-2} = \eta\, f'\!\left( u_n^{L-1} \right) \left[ \sum_{i=1}^{N_L} \left( T_i - O_i^L \right) f'\!\left( u_i^L \right) w_{in}^{L-1} \right] O_m^{L-2}.$$

The latter is summarized as

$$\Delta w_{ij}^l = \eta\, \delta_i^l\, O_j^{l-1},$$

where, for the weights at the output layer,

$$\delta_i^L = \left( T_i - O_i^L \right) O_i^L \left( 1 - O_i^L \right),$$

and, for the weights of the hidden layers,

$$\delta_i^l = O_i^l \left( 1 - O_i^l \right) \sum_{r=1}^{N_{l+1}} \delta_r^{l+1}\, w_{ri}^{l+1}.$$
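The delta rules can be checked against a numerical gradient of the error. The sketch below, in Python/NumPy rather than MATLAB, assumes a tiny 3-2-2 network with the thresholds omitted for brevity; all sizes, the random seed, and the target values are illustrative choices:

```python
import numpy as np

def f(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
x  = rng.normal(size=3)                    # input pattern (arbitrary)
T  = np.array([0.2, 0.9])                  # target pattern (arbitrary)
W1 = rng.normal(scale=0.5, size=(2, 3))    # hidden-layer weights
W2 = rng.normal(scale=0.5, size=(2, 2))    # output-layer weights

def error(W1, W2):
    # E_k = (1/2) sum_i (T_i - O_i^L)^2 for this single pattern
    O1 = f(W1 @ x)
    O2 = f(W2 @ O1)
    return 0.5 * np.sum((T - O2) ** 2)

O1 = f(W1 @ x)                             # hidden outputs O^{L-1}
O2 = f(W2 @ O1)                            # network outputs O^L

delta_L = (T - O2) * O2 * (1 - O2)         # output-layer deltas
delta_h = O1 * (1 - O1) * (W2.T @ delta_L) # hidden-layer deltas (back-propagated)
grad_W2 = -np.outer(delta_L, O1)           # dE/dW2 = -delta_L O1^T
grad_W1 = -np.outer(delta_h, x)            # dE/dW1 = -delta_h x^T

# Central finite-difference check of one weight in each layer
h = 1e-6
W2p, W2m = W2.copy(), W2.copy()
W2p[0, 1] += h; W2m[0, 1] -= h
num2 = (error(W1, W2p) - error(W1, W2m)) / (2 * h)
W1p, W1m = W1.copy(), W1.copy()
W1p[1, 2] += h; W1m[1, 2] -= h
num1 = (error(W1p, W2) - error(W1m, W2)) / (2 * h)
```

The back-propagated gradients match the finite differences, confirming that the hidden-layer deltas indeed minimize the output error even though no error is directly observable at the hidden layer.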
The process of computing the gradient and adjusting the weights is repeated until a minimum error is found. In practice, one adopts an algorithm termination criterion so that the algorithm does not continue this iterative process forever.

It is apparent that, for nodes in layer l, the computation of $\delta_i^l$ depends on the errors computed at layer l + 1; that is, the computation of the differences proceeds backwards.

Applications:

Before applying the algorithm, one needs to

1. Decide on the function to be performed by the network (i.e., recognition, prediction, or generalization).
2. Have a complete set of input and output training patterns.
3. Determine the number of layers in the network and the number of nodes per layer.
4. Select the nonlinearity function (typically a sigmoid) and a value for the learning rate.
5. Determine the algorithm termination criteria.


The learning algorithm can now be applied as follows:
1. Initialize all weights to small random values.
2. Choose a training pair (x(k), T(k)).
3. Calculate the actual output from each neuron in a layer, starting with the input layer and proceeding layer by layer toward the output layer L:

$$O_j^l(k) = f\left( \sum_{m=0}^{N_{l-1}} w_{jm}^l\, O_m^{l-1}(k) \right)$$

4. Compute the gradient and the difference for each input of the neuron in a layer, starting with the output layer and backtracking layer by layer toward the input.
5. Update the weights.
6. Repeat steps 2-5.
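The six steps can be put together in a short training loop. The sketch below, in Python/NumPy rather than MATLAB, trains a hypothetical 2-3-1 network on the XOR patterns; the hidden-layer size, learning rate, and epoch count are illustrative choices, not prescribed by the text:

```python
import numpy as np

def f(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(1)

# Step 2 data: four training pairs (x(k), T(k)) -- here the XOR patterns
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Step 1: small random weights; the extra column is the threshold term
W1 = rng.normal(scale=0.5, size=(3, 3))   # hidden layer: 3 nodes, 2 inputs + bias
W2 = rng.normal(scale=0.5, size=(1, 4))   # output layer: 1 node, 3 hidden + bias
eta = 0.5                                 # learning rate

def total_error():
    err = 0.0
    for x, t in zip(X, T):
        O1 = f(W1 @ np.append(x, 1.0))
        O2 = f(W2 @ np.append(O1, 1.0))
        err += 0.5 * np.sum((t - O2) ** 2)
    return err

E0 = total_error()
for epoch in range(2000):                 # step 6: repeat steps 2-5
    for x, t in zip(X, T):
        # step 3: forward pass, layer by layer
        a1 = np.append(x, 1.0)
        O1 = f(W1 @ a1)
        a2 = np.append(O1, 1.0)
        O2 = f(W2 @ a2)
        # step 4: deltas, from the output layer backwards
        d2 = (t - O2) * O2 * (1 - O2)
        d1 = O1 * (1 - O1) * (W2[:, :-1].T @ d2)
        # step 5: weight updates Delta w = eta * delta * O
        W2 += eta * np.outer(d2, a2)
        W1 += eta * np.outer(d1, a1)
E1 = total_error()
```

The total error after training is smaller than before, showing the network learning the distributed associative map between its input and output layers.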
Application: hands-free telephone

We consider the case of a mobile telephone mounted on the vehicle control panel. The motor creates a noise p that superimposes itself, with a multiplier, onto the driver's voice, which constitutes the useful signal x.

The noise p is measured by a sensor located close to the engine. The microphone of the telephone receives a part of the parasitic signal, i.e., k·p (k < 1), which is added to the useful signal x.

We send the engine noise to the input of the neuron and try to predict the noisy signal z. The neuron extracts the part of the signal correlated with the noise p, i.e., a signal close to k·p. The prediction error, the difference between the target (the noisy signal) and the output signal of the neuron, constitutes the estimate of the useful signal x.


%interference cancellation
clear all, close all
t = 0:0.0005:1;
x = sin(sin(20*t).*t*200);
% plotting of useful signal x
figure(1), plot(t, x)
axis([0 1 -1.2 1.2])
xlabel('time'), title('useful voice signal x'), grid
% parasitic signal
p = (sin(20*pi*t)/2)+(0.2*sin(100*pi*t-100));
figure(2)
plot(t, p)
axis([0 1 -1.2 1.2])
xlabel('time'), title('the engine noise p'), grid
% noisy signal
figure(3)
z = x + 0.833*p;
plot(t, z)
xlabel('time'), title('noisy signal z = x + 0.833 p'), grid
% random initialization of the weights w and bias b
w = randn(1, 1); b = randn(1, 1);
weights_w = w; bias_b = b;
% adaptation gain
eta = 0.1;
for i=1:length(t)
% the neuron output
y(i) = w * p(i) + b;
% error output
e(i) = z(i) - y(i);
[dw,db] = learnwh(p(i),e(i),eta);
%Updating the weights w and bias b matrices
w = w + dw; b = b + db;
% saving the weights and bias matrices
weights_w = [weights_w, w];
bias_b = [bias_b, b];
end
% extracted signal
figure(4), plot(t, e), axis([0, 1 -1.2 1.2])
xlabel('time'), title('extracted signal'), grid
% extraction error
figure(5), plot(t, x-e), xlabel('time'),
title('extraction error'), grid
axis([0 1 -1 1])
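For readers without MATLAB, the same interference-cancellation experiment can be sketched in Python/NumPy; the explicit Widrow-Hoff (LMS) updates below play the role of `learnwh`, and the plotting is omitted:

```python
import numpy as np

# Signals mirroring the MATLAB script: 2001 samples over 1 s
t = np.linspace(0.0, 1.0, 2001)
x = np.sin(np.sin(20 * t) * t * 200)                                  # useful voice signal
p = np.sin(20 * np.pi * t) / 2 + 0.2 * np.sin(100 * np.pi * t - 100)  # engine noise
z = x + 0.833 * p                                                     # noisy microphone signal

# Single adaptive neuron with explicit Widrow-Hoff (LMS) updates
w, b = 0.0, 0.0
eta = 0.1                      # adaptation gain
e = np.empty_like(t)
for i in range(len(t)):
    y = w * p[i] + b           # neuron output: predicted noise component
    e[i] = z[i] - y            # prediction error = recovered voice estimate
    w += eta * e[i] * p[i]     # LMS weight update
    b += eta * e[i]            # LMS bias update
```

As the neuron adapts, the weight w approaches the mixing coefficient 0.833, so the error signal e approximates the useful voice signal x: the part of z correlated with p has been cancelled.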
