
NEURAL NETWORK LEARNING WITHOUT BACKPROPAGATION

Presented by Shujon Naha

INTRODUCTION
The popular EBP (error backpropagation) algorithm is relatively simple and can handle problems with an essentially unlimited number of patterns. Because of its simplicity, it was also relatively easy to adapt the EBP algorithm to more efficient neural network architectures in which connections across layers are allowed. However, the EBP algorithm can be up to 1000 times slower than more advanced second-order algorithms.

INTRODUCTION(CONTD.)

Because EBP uses only first-order gradient information, no major speed improvements can be expected from it. The very efficient second-order Levenberg-Marquardt (LM) algorithm was adopted for neural network training by Hagan and Menhaj. The LM algorithm uses significantly more parameters describing the error surface than just the gradient elements used in the EBP algorithm. As a consequence, the LM algorithm is not only fast but can also train neural networks for which the EBP algorithm has difficulty converging.

THE LEVENBERG-MARQUARDT ALGORITHM

The Levenberg-Marquardt (LM) algorithm is one of the most widely used optimization algorithms. It outperforms simple gradient descent and conjugate gradient methods on a wide variety of problems.

THE LEVENBERG-MARQUARDT ALGORITHM(CONTD.)

The problem for which the LM algorithm provides a solution is called nonlinear least-squares minimization. This implies that the function to be minimized is a sum of squared residuals of the following special form:
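A sketch of that standard nonlinear least-squares form (the residuals r_j are the individual errors; the symbol names are ours, not taken from the slides):

```latex
% Sum-of-squares cost minimized by the LM algorithm
f(\mathbf{x}) = \frac{1}{2}\sum_{j=1}^{m} r_j^{2}(\mathbf{x})
```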

LM AS A BLEND OF GRADIENT DESCENT AND GAUSS-NEWTON ITERATION

Vanilla gradient descent is the simplest, most intuitive technique for finding minima of a function. Parameter updating is performed by adding the negative of the scaled gradient at each step, i.e.
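In the usual notation (the step-size symbol λ is our choice here), the update reads:

```latex
% Steepest-descent step with fixed step size lambda
\mathbf{x}_{i+1} = \mathbf{x}_i - \lambda\,\nabla f(\mathbf{x}_i)
```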

LM AS A BLEND OF GRADIENT DESCENT AND GAUSS-NEWTON ITERATION(CONTD.)

Simple gradient descent suffers from various convergence problems. Logically, we would like to take large steps down the gradient at locations where the gradient is small (the slope is gentle) and, conversely, take small steps when the gradient is large, so as not to rattle out of the minima.

LM AS A BLEND OF GRADIENT DESCENT AND GAUSS-NEWTON ITERATION(CONTD.)

Another issue is that the curvature of the error surface may not be the same in all directions. For example, if there is a long and narrow valley in the error surface, the component of the gradient in the direction that points along the base of the valley is very small, while the component along the valley walls is quite large. This results in motion more in the direction of the walls, even though we have to move a long distance along the base and only a small distance along the walls. This situation can be improved upon by using curvature as well as gradient information, namely second derivatives. One way to do this is to use Newton's method to solve the equation ∇f(x) = 0.
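For reference, the Newton step that uses this curvature information (H is the Hessian of f; this is the standard form, not taken from the slides):

```latex
% Newton step: rescales the gradient by the local curvature
\mathbf{x}_{i+1} = \mathbf{x}_i - \mathbf{H}^{-1}\,\nabla f(\mathbf{x}_i)
```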

DISADVANTAGES OF LM

1) The LM algorithm cannot be used for problems with many training patterns, because the Jacobian matrix becomes prohibitively large.
2) The LM algorithm requires the inversion of a quasi-Hessian matrix of size n_w × n_w in every iteration, where n_w is the number of weights. Because of this matrix inversion in every iteration, the speed advantage of the LM algorithm over the EBP algorithm becomes less evident as the network size increases.
3) The Hagan and Menhaj LM algorithm was developed only for multilayer perceptron (MLP) neural networks. Therefore, more powerful neural network architectures, such as fully connected cascade (FCC) or bridged multilayer perceptron architectures, cannot be trained.

DISADVANTAGES OF LM(CONTD.)

4) In implementing the LM algorithm, Hagan and Menhaj calculated the elements of the Jacobian matrix using basically the same routines as in the EBP algorithm. The difference is that the error-backpropagation process (for Jacobian matrix computation) must be carried out not only for every pattern but also for every output separately.

PROPOSAL

In this paper, limitations 3) and 4) of the Hagan and Menhaj LM algorithm are addressed: the proposed method of computation allows the training of networks with arbitrarily connected neurons. This way, feed-forward neural network architectures more complex than MLPs can be efficiently trained. A further advantage of the proposed algorithm is that the learning process requires only forward computations, without any backward computations; in many cases this also reduces the computation time. In order to preserve the generalization abilities of neural networks, the size of the networks should be as small as possible. The proposed algorithm partially addresses this problem because it allows the training of smaller networks with arbitrarily connected neurons.

DEFINITION OF BASIC CONCEPTS IN NEURAL NETWORK TRAINING

Let us consider neuron j with n_i inputs, as shown below. If neuron j is in the first layer, all its inputs are connected to the inputs of the network; otherwise, its inputs can be connected to the outputs of other neurons, or to the network's inputs if connections across layers are allowed.
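The referenced figure is not reproduced here; in the usual notation (assumed, not from the slides), neuron j computes:

```latex
% y_{j,i}: i-th input signal of neuron j; w_{j,i}: its weight; w_{j,0}: biasing weight
net_j = \sum_{i=1}^{n_i} w_{j,i}\, y_{j,i} + w_{j,0},
\qquad
y_j = f_j(net_j)
```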

DEFINITION OF BASIC CONCEPTS IN NEURAL NETWORK TRAINING(CONTD.)

GRADIENT VECTOR AND JACOBIAN MATRIX COMPUTATION

For every pattern, in the EBP algorithm only one backpropagation process is needed, while in second-order algorithms the backpropagation process has to be repeated for every output separately in order to obtain the consecutive rows of the Jacobian matrix.

GRADIENT VECTOR AND JACOBIAN MATRIX COMPUTATION


From the following figure, it can be noticed that, for every pattern p, there are n_o rows of the Jacobian matrix, where n_o is the number of network outputs. The number of columns is equal to the number of weights in the network, and the total number of rows is equal to n_p × n_o.
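In other words, with n_p patterns, n_o outputs, and n_w weights, the Jacobian assumed here has the shape below, with e_{p,m} denoting the error of output m for pattern p (notation assumed):

```latex
% One block of n_o rows per training pattern; one column per weight
\mathbf{J} \in \mathbb{R}^{(n_p \cdot n_o)\times n_w},
\qquad
J_{(p,m),(j,i)} = \frac{\partial e_{p,m}}{\partial w_{j,i}}
```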

GRADIENT VECTOR AND JACOBIAN MATRIX COMPUTATION

In the EBP algorithm, the output errors are part of the δ parameters. In second-order algorithms, the δ_{m,j} parameters are calculated for each neuron j and each output m separately. Also, in the backpropagation process, the error is replaced by a unit value.

GRADIENT VECTOR AND JACOBIAN MATRIX COMPUTATION

In the EBP algorithm, the elements of the gradient vector are computed as the product of δ_j and the corresponding input signal,

where δ_j is obtained with the error-backpropagation process. In second-order algorithms, the gradient can be obtained from partial results of the Jacobian calculations,

where m indicates a network output and δ_{m,j} is given by (9).
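A sketch of the relations implied above (our reconstruction; y_{j,i} denotes the i-th input signal of neuron j and e_m the error on output m, so the paper's equations (10) and (12) may be typeset slightly differently):

```latex
% EBP: one backpropagated delta_j per neuron
g_{j,i} = \delta_j\, y_{j,i}
% Second-order: Jacobian elements per output m, gradient as their by-product
\frac{\partial e_m}{\partial w_{j,i}} = \delta_{m,j}\, y_{j,i},
\qquad
g_{j,i} = \sum_{m=1}^{n_o} \delta_{m,j}\, e_m\, y_{j,i}
```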

GRADIENT VECTOR AND JACOBIAN MATRIX COMPUTATION

The update rule of the EBP algorithm is the steepest-descent step

where n is the iteration index, w is the weight vector, α is the learning constant, and g is the gradient vector. Derived from the Newton algorithm and the steepest-descent method, the update rule of the LM algorithm is

where μ is the combination coefficient, I is the identity matrix, and J is the Jacobian matrix shown in Fig. 2.
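The two update rules in their standard form, consistent with the symbols named above (e_n is the error vector):

```latex
% EBP (steepest descent)
\mathbf{w}_{n+1} = \mathbf{w}_n - \alpha\,\mathbf{g}_n
% Levenberg-Marquardt: blends Gauss-Newton (small mu) and gradient descent (large mu)
\mathbf{w}_{n+1} = \mathbf{w}_n
  - \left(\mathbf{J}_n^{T}\mathbf{J}_n + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}_n^{T}\,\mathbf{e}_n
```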

FORWARD ONLY COMPUTATION


The proposed method is designed to improve the efficiency of Jacobian matrix computation by removing the backpropagation process. The notation δ_{k,j} is an extension of (9) and can be interpreted as the signal gain between neurons j and k; it is given by an expression (sketched below) in which k and j are the indices of neurons and F_{k,j}(y_j) is the nonlinear relationship between the output node of neuron k and the output node of neuron j. Naturally, in a feedforward network, k ≥ j. If k = j, then δ_{k,k} = s_k, where s_k is the slope of the activation function.
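One consistent reading of this definition (an interpretation on our part, chosen so that δ_{k,k} = s_k holds):

```latex
% Signal gain from the net input of neuron j to the output of neuron k
\delta_{k,j} \;=\; \frac{\partial y_k}{\partial net_j}
\;=\; \frac{\partial F_{k,j}(y_j)}{\partial y_j}\; s_j ,
\qquad
\delta_{k,k} = s_k
```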

FORWARD ONLY COMPUTATION


The δ matrix has a triangular shape, and its elements can be calculated in the forward-only process. Later, the elements of the gradient vector and of the Jacobian can be obtained using (10) and (12), where only the last rows of the δ matrix, associated with the network outputs, are used. The key issue of the proposed algorithm is the method of calculating the δ_{k,j} parameters in the forward computation process.

FORWARD ONLY COMPUTATION

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

Let us start our analysis with fully connected neural networks (Fig. 5). Any other architecture can be considered a simplification of fully connected neural networks, obtained by eliminating connections (setting weights to zero). If the feedforward principle is enforced (no feedback), fully connected neural networks must have cascade architectures.

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

For the first neuron, there is only one parameter: δ_{1,1} = s_1. For the second neuron, there are two parameters: δ_{2,2} = s_2 and δ_{2,1} = s_2 w_{1,2} s_1. The universal formula for calculating the δ_{k,j} parameters using already calculated data for previous neurons is given by (22),

where, in a feedforward network, neuron j must be located before neuron k, so k ≥ j; δ_{k,k} = s_k is the slope of the activation function of neuron k; w_{j,k} is the weight between neuron j and neuron k; and δ_{k,j} is the signal gain through weight w_{j,k} and through the other part of the network connected to w_{j,k}.
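A sketch of the recurrence (22) described above, reconstructed so that it reproduces the example δ_{2,1} = s_2 w_{1,2} s_1 (the paper's exact typesetting may differ):

```latex
% Forward-only recurrence: gains of earlier neurons are reused
\delta_{k,j} \;=\; s_k \sum_{i=j}^{k-1} w_{i,k}\,\delta_{i,j},
\qquad
\delta_{k,k} = s_k
```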

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

In order to organize the process, an n_n × n_n computation table is used for calculating the signal gains between neurons, where n_n is the number of neurons (Fig. 7). Natural indices (from 1 to n_n) are given to the neurons according to the direction of signal propagation. For the signal-gain computation, only the connections between neurons need to be considered, while the weights connected to the network inputs and the biasing weights of all neurons are used only at the end of the process. For a given pattern, a sample n_n × n_n computation table is shown in Fig. 7. The indices of rows and columns are the same as the indices of neurons.

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

The computation table consists of three parts:


1) weights between neurons in the upper triangle;
2) the vector of slopes of the activation functions on the main diagonal;
3) the δ (signal gain) matrix in the lower triangle.

Only the main-diagonal and lower-triangular elements are computed for each pattern. Initially, the elements on the main diagonal, δ_{k,k} = s_k, are known, as they are the slopes of the activation functions; the values of the signal gains δ_{k,j} are then computed using (22).

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

The computation proceeds neuron by neuron, starting with the neuron closest to the network inputs. At first, row no. 1 is calculated, and then the elements of the subsequent rows. Each row below is computed from the elements of the rows above it using (22). After completion of the forward computation process, all elements of the δ matrix, in the form of a lower triangle, are obtained.
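A minimal Python sketch of this forward-only computation of the δ table for a fully connected cascade network. It assumes the recurrence sketched above, a tanh activation, and the weight convention W[i, k] = weight from the output of neuron i to an input of neuron k; none of these details come from the slides.

```python
import numpy as np

def forward_delta_table(W, net):
    """Forward-only computation of the signal-gain (delta) table.

    W[i, k] : weight from the output of neuron i to an input of neuron k (i < k),
              i.e., the upper triangle of the computation table.
    net[k]  : net input of neuron k for the current pattern, obtained in the
              ordinary forward pass (not shown here).
    Returns a table with activation slopes on the diagonal and signal gains
    delta[k, j] in the lower triangle.
    """
    nn = len(net)                          # number of neurons
    s = 1.0 - np.tanh(net) ** 2            # slopes of tanh activations (assumed)
    delta = np.zeros((nn, nn))
    for k in range(nn):                    # neurons in signal-flow order
        delta[k, k] = s[k]                 # diagonal: slope of neuron k
        for j in range(k):                 # gains toward every earlier neuron j
            # accumulate the gain of every path entering neuron k from neuron j
            delta[k, j] = s[k] * sum(W[i, k] * delta[i, j] for i in range(j, k))
    return delta

# Hypothetical 3-neuron fully connected cascade
W = np.zeros((3, 3))
W[0, 1], W[0, 2], W[1, 2] = 0.5, -0.3, 0.8
net = np.array([0.2, -0.1, 0.4])           # net inputs from a forward pass (made up)
print(forward_delta_table(W, net))
```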

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

In the next step, elements of gradient vector and Jacobian matrix are calculated using (10) and (12).

Then, for each pattern, the three rows of the Jacobian matrix corresponding to the three outputs are calculated in one step using (10), without any additional propagation of δ.
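Continuing the sketch above, a hypothetical assembly of one pattern's Jacobian rows from the precomputed δ table. It assumes the element formula ∂e_m/∂w_{j,i} = δ_{m,j}·y_{j,i} from the earlier gradient sketch, and weights ordered neuron by neuron; the helper names are ours.

```python
def jacobian_rows(delta, inputs, output_neurons):
    """Build one pattern's Jacobian rows (one row per network output).

    delta          : table returned by forward_delta_table().
    inputs[j]      : the input signals y_{j,i} of neuron j for this pattern
                     (bias input 1.0, network inputs, and earlier neuron outputs).
    output_neurons : indices of the neurons that serve as network outputs.
    """
    rows = []
    for m in output_neurons:               # one Jacobian row per output neuron
        row = []
        for j in range(len(inputs)):       # weights of neuron j, in order
            gain = delta[m, j] if m >= j else 0.0   # no path from later neurons
            row.extend(gain * y for y in inputs[j])
        rows.append(row)
    return rows
```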

CALCULATION OF δ MATRIX FOR FCC ARCHITECTURES

The proposed method gives all the information needed to calculate both the gradient vector (12) and the Jacobian matrix (10) without the backpropagation process; instead, the δ parameters are obtained in a relatively simple forward computation [see (22)]. In order to further simplify the computation process, (22) is completed in two steps.

ALGORITHM

COMPARISON OF THE TRADITIONAL AND THE PROPOSED ALGORITHM


COMPUTATIONAL SPEED

ASCII Codes to Image Conversion

COMPUTATIONAL SPEED

Error Correction

CONCLUSION

The proposed method can be easily adapted to train arbitrarily connected neural networks, not just MLP topologies. This is very important because neural networks with connections across layers are much more powerful than the commonly used MLP architectures. The method is capable of training neural networks with a reduced number of neurons and, as a consequence, with good generalization abilities. The proposed method of computation gives the same number of training iterations and the same success rates as the Hagan and Menhaj implementation of the LM algorithm, since both methods produce identical Jacobian matrices.

REFERENCES

P. J. Werbos, "Back-propagation: Past and future," in Proc. IEEE Int. Conf. Neural Netw., vol. 1, San Diego, CA, Jul. 1988, pp. 343-353.
B. M. Wilamowski, N. J. Cotton, O. Kaynak, and G. Dundar, "Computing gradient vector and Jacobian matrix in arbitrarily connected neural networks," IEEE Trans. Ind. Electron., vol. 55, no. 10, pp. 3784-3790, Oct. 2008.
N. Ampazis and S. J. Perantonis, "Two highly efficient second-order algorithms for training feedforward networks," IEEE Trans. Neural Netw., vol. 13, no. 5, pp. 1064-1074, Sep. 2002.
C.-T. Kim and J.-J. Lee, "Training two-layered feedforward networks with variable projection method," IEEE Trans. Neural Netw., vol. 19, no. 2, pp. 371-375, Feb. 2008.
B. M. Wilamowski, "Neural network architectures and learning algorithms: How not to be frustrated with neural networks," IEEE Ind. Electron. Mag., vol. 3, no. 4, pp. 56-63, Dec. 2009.
S. Ferrari and M. Jensenius, "A constrained optimization approach to preserving prior knowledge during incremental training," IEEE Trans. Neural Netw., vol. 19, no. 6, pp. 996-1009, Jun. 2008.
Q. Song, J. C. Spall, Y. C. Soh, and J. Ni, "Robust neural network tracking controller using simultaneous perturbation stochastic approximation," IEEE Trans. Neural Netw., vol. 19, no. 5, pp. 817-835, May 2008.

REFERENCES

V. V. Phansalkar and P. S. Sastry, "Analysis of the back-propagation algorithm with momentum," IEEE Trans. Neural Netw., vol. 5, no. 3, pp. 505-506, May 1994.
M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. Int. Conf. Neural Netw., San Francisco, CA, 1993, pp. 586-591.
A. Toledo, M. Pinzolas, J. J. Ibarrola, and G. Lera, "Improvement of the neighborhood based Levenberg-Marquardt algorithm by local adaptation of the learning coefficient," IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 988-992, Jul. 2005.
J.-M. Wu, "Multilayer Potts perceptrons with Levenberg-Marquardt learning," IEEE Trans. Neural Netw., vol. 19, no. 12, pp. 2032-2043, Dec. 2008.
M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989-993, Nov. 1994.
H. B. Demuth and M. Beale, Neural Network Toolbox: For Use with MATLAB. Natick, MA: Mathworks, 2000.
M. E. Hohil, D. Liu, and S. H. Smith, "Solving the N-bit parity problem using neural networks," Neural Netw., vol. 12, no. 9, pp. 1321-1323, Nov. 1999.
B. M. Wilamowski, D. Hunter, and A. Malinowski, "Solving parity-N problems with feedforward neural networks," in Proc. IEEE IJCNN, Piscataway, NJ: IEEE Press, 2003, pp. 2546-2551.
B. M. Wilamowski and H. Yu, "Improved computation for Levenberg-Marquardt training," IEEE Trans. Neural Netw., vol. 21, no. 6, pp. 930-937, Jun. 2010.
