
BACKPROPAGATION NETWORK

A variety of artificial neural network algorithms can be used to train a system for jobs such as classification, pattern recognition, or mapping of continuous input values to continuous output values. However, due to its versatility and relative ease of use, back-propagation (BP) is currently the most popular neural network algorithm. The essential elements of a backpropagation network are shown in Fig. 1. During the process of training, the BP algorithm changes the weight vectors and threshold weights as per the input vectors. The output of a neuron is generally obtained by a sigmoid transfer function,

Output, O = 1/{1 + exp(-Σ W I)}        (1)

where Σ W I is the weighted sum of the neuron's inputs.

Fig. 1 Three-layered Backpropagation Net

As Fig. 2 shows, a sigmoidal transfer function is monotonic and semi-linear: it always increases, and it is roughly linear in the center but nonlinear at both extremes. The transfer function therefore acts like a broadly tuned center-pass filter, moderating raw sums with very small or very large values but passing the midrange almost unchanged. The sigmoid approaches '0' and '1' asymptotically, which guarantees that all outputs fall between '0' and '1'.
Figure 2. Sigmoidal Transfer Function (output O plotted against input I; the output rises from 0 toward 1 as the input goes from -10 to +10)
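As a hedged illustration of Equation (1) and the curve in Fig. 2, the transfer function can be sketched in a few lines of Python/NumPy (illustrative code only, not taken from the CHBPN implementation; the function and variable names are assumptions):

import numpy as np

def sigmoid(weighted_sum):
    # Sigmoidal transfer function: maps any weighted sum into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-weighted_sum))

# Near-linear around zero, saturating towards 0 and 1 at the extremes (cf. Fig. 2).
for s in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"I = {s:6.1f}  ->  O = {sigmoid(s):.4f}")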


The hidden layer is the first computational layer in the network. Each hidden-layer neuron is connected to all of the input neurons; it calculates the weighted sum of its inputs, Σ W I, and applies the transfer function to the sum so obtained to generate its output, O. The number of neurons in the hidden layer significantly influences the network's behavior. Networks with too many hidden neurons tend to memorize the training data without any extrapolation / interpolation capability, while those with too few cannot learn the problem at all. The number of hidden neurons is therefore chosen on the basis of a sensitivity analysis.

A BP network always passes the data forward through the hidden layers to the output layer. If the output of the net is not the correct one, the net backpropagates the error. The backpropagation algorithm adapts the weights by minimizing the error function using techniques like steepest descent or gradient-based approaches such as quasi-Newton, conjugate-gradient, etc. The network learns to associate specific input patterns with corresponding output patterns by adjusting its weights during the training process. Once the error is minimized, the adaptation of weights (i.e. the training process) is complete. By applying a vector to the input side of the trained neural network, one can produce a corresponding vector on the output side.

A BP network learns by example, repeatedly processing a training file that contains a series of input vectors and the correct (or target) output vector for each. Each pass through the training file is one epoch. During each epoch, the network compares the target result with the actual result, calculates the error, and modifies the network's weights to minimize the error. Through this process, called supervised training, the network learns to associate input patterns with the correct output patterns. The accumulated change in the weights represents what the network has learned, and saving the trained weights preserves the learned solution.

This section provides a brief overview of the calculations BP performs during training. As an example, the steps involved for a 3-layer network using a generalized delta rule with a momentum factor can be summarized as follows. Assign initial values to the primary weight matrices Wkh (input to hidden layer) and Whi (hidden to output layer). Then calculate the output vector for each pattern of the input vector (Inp)k, k = 1, km:
Ih = Σ_{k=1}^{km} Wkh (Inp)k        (2)

Oh = 1/{1 + exp(-Ih)}        (3)

Ii = Σ_{h=1}^{hm} Whi Oh        (4)

Oi = 1/{1 + exp(-Ii)}        (5)


where subscript h denotes a hidden neuron and hm is the total number of hidden neurons. Next, calculate the error for the output layer (if the output so obtained does not match the desired output vector Fi, i = 1, n):

δi = Oi - Fi        (6)

Calculate the sum of the squares of the errors over all the output neurons, Σ_{i=1}^{n} (δi)², and accumulate the total error, E, over all the patterns, T:

E = Σ_{t=1}^{T} Σ_{i=1}^{n} (δi)²        (7)
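A minimal sketch of Equations (2)-(7) in Python/NumPy is given below (illustrative only; the array shapes, helper names and the use of NumPy are assumptions, not part of the original CHBPN code):

import numpy as np

def forward_pass(inp, W_kh, W_hi):
    # Forward pass of the 3-layer net, Eqs. (2)-(5).
    #   inp  : input pattern (Inp)_k,     shape (km,)
    #   W_kh : input-to-hidden weights,   shape (hm, km)
    #   W_hi : hidden-to-output weights,  shape (n, hm)
    I_h = W_kh @ inp                      # Eq. (2): weighted sums at the hidden layer
    O_h = 1.0 / (1.0 + np.exp(-I_h))      # Eq. (3): hidden-layer outputs
    I_i = W_hi @ O_h                      # Eq. (4): weighted sums at the output layer
    O_i = 1.0 / (1.0 + np.exp(-I_i))      # Eq. (5): output-layer outputs
    return O_h, O_i

def total_error(patterns, W_kh, W_hi):
    # Eqs. (6)-(7): accumulate E over the T patterns as the sum of (delta_i)^2.
    E = 0.0
    for inp, F_i in patterns:             # F_i is the desired output vector
        _, O_i = forward_pass(inp, W_kh, W_hi)
        delta_i = O_i - F_i               # Eq. (6)
        E += np.sum(delta_i ** 2)
    return E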

After calculating the error for a neuron, the algorithm then adjusts that neuron's weights using the Least Mean Squared (LMS) learning rule, also called the Delta rule. The following equation updates the weights between the output and hidden layers, i.e. Wih, for output neuron i:

Wih^new = Wih^old + [η δi (∂Oi/∂Ii) Oh + α ΔWih^old]        (8)

where α ∈ (0,1) is called the momentum factor and η ∈ (0,1) is called the learning rate. As Equation (8) suggests, the vector term added to the current weight vector to adjust it during training is often called the delta vector. This learning rule is designed to adjust the network's weights so as to find the least mean squared error for the network as a whole. The minimization has an intuitive geometrical meaning: it can be shown that the mean squared error is a quadratic function of the weight vector, so plotting the mean squared error against the weight-vector components produces a bowl-shaped (hyperparaboloidal) surface. Fig. 3 shows an idealized error surface, assuming two-dimensional weight vectors for simplicity.
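As a sketch of how the update rule of Equation (8) might be coded (again an illustrative NumPy fragment; eta, alpha, their default values and the sign convention that the weight change descends the error surface are assumptions on top of the text):

import numpy as np

def update_hidden_to_output(W_ih, dW_ih_old, delta_i, O_i, O_h, eta=0.5, alpha=0.9):
    # Generalized delta rule with momentum for the hidden-to-output weights W_ih.
    #   delta_i : output-layer errors (O_i - F_i) from Eq. (6),  shape (n,)
    #   O_i     : output-layer results,                          shape (n,)
    #   O_h     : hidden-layer results,                          shape (hm,)
    dO_dI = O_i * (1.0 - O_i)                     # sigmoid derivative dO_i/dI_i
    # The added term is the "delta vector"; the minus sign makes it point down
    # the bowl-shaped error surface, since delta_i is defined here as O_i - F_i.
    dW_new = -eta * np.outer(delta_i * dO_dI, O_h) + alpha * dW_ih_old
    return W_ih + dW_new, dW_new                  # updated weights and the weight change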

Figure 3. Gradient Descent (idealized aggregate-error surface plotted against weights x and y, showing the old weight vector, the delta vector, the new weight vector, and the ideal weight vector at the bottom of the surface)

As Fig. 3 suggests, the geometrical effect of the LMS learning rule is to move the weight vectors toward values that produce the minimum mean squared error, represented by the bottom point of the bowl-shaped error surface. In reality, the error surface typically has complex ravine-like features and many local minima. The delta vector follows the locally steepest path, a little like a ball rolling downhill, so this process is sometimes called gradient descent.

So far, this discussion ignores the network's layered architecture. A typical network has at least two levels of weighted connections, one for the hidden neurons and the other for the output neurons. At first glance, it is difficult to determine how the hidden neurons contribute to the output-layer error because their target output is unknown. The key to the BP algorithm lies in assigning credit for the actual results back to the hidden layer as required. Finding the error for the output-layer neurons is straightforward: as shown in Equation (6), the error is proportional to the difference between the target result and the actual result. To find the error in the hidden layer, the error value for each output-layer neuron is sent back to the hidden layer using the same weighted connections. This backward propagation of errors gives the algorithm its name. Because the error is transmitted backward using the original weighted connections, BP in effect assumes that a hidden neuron contributes to the output-layer error in proportion to the weighted sum of the back-propagated error values. This sum is calculated using the hidden-to-output weights before the output layer updates them. The error for the hidden layer is then
δh = (∂Oh/∂Ih) Σ_{i=0}^{n} Wih δi        (9)

Update the weights between the hidden and input layers, i.e. Whk:

Whk^new = Whk^old + [η δh (Inp)k + α ΔWhk^old]        (10)
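Equations (9) and (10) might be sketched as follows (an illustrative NumPy fragment that follows the equations as written, using the sigmoid-derivative identity ∂Oh/∂Ih = Oh(1 - Oh) noted in the text below; the learning rate, momentum factor and descending sign convention are the same assumptions as in the previous sketch):

import numpy as np

def update_input_to_hidden(W_hk, dW_hk_old, W_ih, delta_i, O_h, inp, eta=0.5, alpha=0.9):
    # Back-propagate the output-layer errors and update the input-to-hidden weights W_hk.
    #   W_ih    : hidden-to-output weights used to send the errors back, shape (n, hm)
    #   delta_i : output-layer errors from Eq. (6),                      shape (n,)
    #   O_h     : hidden-layer results,                                  shape (hm,)
    #   inp     : input pattern (Inp)_k,                                 shape (km,)
    back_sum = W_ih.T @ delta_i                  # weighted sum of the back-propagated errors
    delta_h = O_h * (1.0 - O_h) * back_sum       # Eq. (9), with dO_h/dI_h = O_h (1 - O_h)
    dW_new = -eta * np.outer(delta_h, inp) + alpha * dW_hk_old   # Eq. (10) with momentum
    return W_hk + dW_new, dW_new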

It may be noted that for a sigmoid function the derivative ∂Olayer/∂Ilayer = Olayer (1 - Olayer). The derivative term serves to moderate the sum using the value of the hidden neuron's result. This moderation is necessary partly because a strong connection sometimes transmits a weak result. If a hidden neuron's result approaches zero, for instance, then that neuron probably did not contribute much to the output error, even if the neuron is strongly weighted. Allowing for the hidden-layer output when computing the hidden-layer error reduces the risk of blaming a neuron unfairly. The derivative term also contributes to the network's stability. If there is more than one hidden layer, then the equation for the weight change is given by
ΔWh(L)h(L-1)^new = η [ (∂Oh(L)/∂Ih(L)) Oh(L-1) Σ_{h(L+1)=0}^{hm(L+1)} Wh(L+1)h(L)^new δh(L+1) ] + α ΔWh(L)h(L-1)^old        (11)

where subscript L is the layer number and L+1 is the layer nearest to the output layer. Go on to the next pattern of the input vector, (Inp)k, k = 1, km, and repeat the process for all the patterns, T. As indicated earlier, sets of such patterns can be created from data generated either experimentally or by numerical simulation. Training is over once an error criterion based on the total error, E, is met. After the training is over (i.e. the adaptation of the weight matrices is frozen), the artificial neural network is validated using fresh experimental/simulated input vectors (Inp)k, k = 1, km. The weight matrices of the validated neural network are preserved for use in the final application.

Because the transfer function is an "S"-shaped curve, its derivative is a bell-shaped curve. The sigmoid's derivative has large values in the middle range and small values toward both extremes. This shape assures that large changes in weights do not occur when the sum of the backpropagated errors approaches a very large or very small value. Each hidden neuron calculates its own error and then adjusts the input-to-hidden connection weights using the LMS learning rule described in Equation (8). The network is then ready for the next input pattern.

To summarize, the algorithm follows this sequence of steps during training (a compact code sketch of this loop is given after the list):

1. Receives an input pattern at the unweighted input layer.
2. Calculates the hidden-layer weighted sums and applies the transfer function to the sums, producing the hidden-layer result.
3. Transmits the hidden-layer result to the output layer.
4. Calculates the output-layer weighted sums and applies the transfer function to the sums, producing the output-layer result.
5. Compares the actual output-layer result with the target result, calculating an output-layer error for each neuron.
6. Transmits the output-layer errors to the hidden layer (back-propagation).
7. Calculates the hidden-layer error using the weighted sum of the back-propagated error vector, moderated by the hidden-layer result.
8. Updates the output-layer and hidden-layer weights using the LMS learning rule.
9. Receives the next pattern.
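Putting the nine steps together, a compact training loop might look like the sketch below (illustrative Python/NumPy only; the layer sizes, toy training patterns, learning rate, momentum factor, error criterion and epoch limit are all made-up values, and the sign convention again descends the error gradient):

import numpy as np

rng = np.random.default_rng(0)
km, hm, n = 2, 4, 1                      # numbers of input, hidden and output neurons (assumed)
eta, alpha = 0.5, 0.8                    # learning rate and momentum factor (assumed)
tol, max_epochs = 1e-3, 5000             # error criterion on E and epoch limit (assumed)

# Toy "training file": T patterns of (input vector, desired output vector).
patterns = [(np.array([0.0, 1.0]), np.array([0.8])),
            (np.array([1.0, 0.0]), np.array([0.2]))]

W_kh = rng.uniform(-0.5, 0.5, (hm, km))  # input-to-hidden weight matrix
W_hi = rng.uniform(-0.5, 0.5, (n, hm))   # hidden-to-output weight matrix
dW_kh = np.zeros_like(W_kh)              # previous weight changes, for the momentum terms
dW_hi = np.zeros_like(W_hi)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(max_epochs):          # one epoch = one pass through the training file
    E = 0.0
    for inp, F_i in patterns:
        O_h = sigmoid(W_kh @ inp)        # steps 1-2: forward pass to the hidden layer
        O_i = sigmoid(W_hi @ O_h)        # steps 3-4: forward pass to the output layer
        delta_i = O_i - F_i              # step 5: output-layer errors, Eq. (6)
        E += np.sum(delta_i ** 2)        # accumulate the total error, Eq. (7)
        delta_h = O_h * (1.0 - O_h) * (W_hi.T @ delta_i)   # steps 6-7: hidden errors, Eq. (9)
        # Step 8: weight updates with momentum, Eqs. (8) and (10).
        dW_hi = -eta * np.outer(delta_i * O_i * (1.0 - O_i), O_h) + alpha * dW_hi
        dW_kh = -eta * np.outer(delta_h, inp) + alpha * dW_kh
        W_hi += dW_hi
        W_kh += dW_kh
    if E < tol:                          # training stops once the error criterion is met
        break

# The frozen W_kh and W_hi would then be validated on fresh input patterns.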

