Abstract
This paper discusses the slow convergence of the gradient-descent backpropagation algorithm and proposes alleviating it with a combination of Nesterov's Accelerated Gradient method (Nesterov, 1983) and parallel coordinate descent. The resulting accelerated-gradient method, called BOOM (for "boosting with momentum"), was developed at Google Research and has been applied to large-scale data sets at Google.
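The momentum scheme named in the abstract can be sketched as follows. This is a generic illustration of Nesterov's update, not the BOOM algorithm itself; the test function, learning rate, and momentum coefficient are illustrative choices, not values from the paper:

```python
import numpy as np

def nesterov_gd(grad, x0, lr=0.01, mu=0.9, steps=200):
    """Nesterov's accelerated gradient: evaluate the gradient at the
    look-ahead point x + mu*v instead of at x itself."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x + mu * v          # peek ahead along the momentum
        v = mu * v - lr * grad(lookahead)
        x = x + v
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3)
xmin = nesterov_gd(lambda x: 2 * (x - 3), x0=[0.0])
```

The look-ahead gradient evaluation is what distinguishes Nesterov's method from classical (heavy-ball) momentum.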
Background on Neural Networks and the Backpropagation Algorithm
Divide et impera, Latin for "divide and conquer," is a strategy that has proven powerful in politics and economics. Neural networks apply the same principle to computation by decomposing a complex problem into simpler parts that can be solved efficiently. One such model is the artificial neural network, inspired by the functioning of neurons in the animal nervous system. A biological neuron receives impulses through its dendrites, which carry them from various sources toward the cell body. If a certain threshold is exceeded, the neuron fires and the impulse is carried on to other neurons and eventually to the spinal cord or the brain.
A neural network is organized in layers of interconnected nodes. The input layer receives a pattern and communicates it to one or more hidden layers, where the actual processing is done; every connection has a weight associated with it. After processing, the result is passed to the output layer.
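The layer-by-layer computation just described can be sketched in Python. This is a hypothetical 2-2-1 network with illustrative random weights (the MATLAB listing that follows implements the same idea for XOR):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_hidden, W_out, bias=1.0):
    """One forward pass through a 2-2-1 network: each node computes a
    weighted sum of its inputs plus a bias term, then squashes the
    result through the sigmoid."""
    h = sigmoid(W_hidden @ np.append(bias, x))   # hidden layer (2 nodes)
    y = sigmoid(W_out @ np.append(bias, h))      # output layer (1 node)
    return y

# Illustrative small random weights, one per connection
rng = np.random.default_rng(0)
W_hidden = rng.uniform(-1, 1, size=(2, 3))  # 2 hidden nodes x (bias + 2 inputs)
W_out = rng.uniform(-1, 1, size=(1, 3))     # 1 output node x (bias + 2 hidden)
y = forward([0, 1], W_hidden, W_out)
```

Because the sigmoid maps any weighted sum into (0, 1), the output can be read as a soft, differentiable version of the biological firing threshold.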
clc
% sigma below denotes the logistic function 1/(1+exp(-x));
% it must be defined in a separate sigma.m file
% XOR input for x1 and x2
input = [0 0; 0 1; 1 0; 1 1];
% Desired output of XOR
output = [0;1;1;0];
% Bias initialization
bias = [1 1 1];
% Learning coefficient
coeff = 0.8;
% Number of learning iterations
iterations = 5000;
% Initialize weights randomly in [-1, 1] (no semicolon, so they print)
weights = 2.*rand(3,3) - 1
err = zeros(iterations,1);
for i = 1:iterations
    out = zeros(4,1);
    numIn = length(input(:,1));
    for j = 1:numIn
        % Hidden layer
        H1 = bias(1,1)*weights(1,1) + input(j,1)*weights(1,2) + input(j,2)*weights(1,3);
        % Send data through the sigmoid function
        x2(1) = sigma(H1);
        H2 = bias(1,2)*weights(2,1) + input(j,1)*weights(2,2) + input(j,2)*weights(2,3);
        x2(2) = sigma(H2);
        % Output layer
        x3_1 = bias(1,3)*weights(3,1) + x2(1)*weights(3,2) + x2(2)*weights(3,3);
        out(j) = sigma(x3_1);
        % Accumulate the absolute error for this iteration
        err(i) = err(i) + abs(output(j) - out(j));
        % Delta for the output layer:
        % delta = actual*(1 - actual)*(desired - actual)
        delta3_1 = out(j)*(1-out(j))*(output(j)-out(j));
        % Propagate the delta backwards into the hidden layer
        delta2_1 = x2(1)*(1-x2(1))*weights(3,2)*delta3_1;
        delta2_2 = x2(2)*(1-x2(2))*weights(3,3)*delta3_1;
        % Update the weights: delta weight = coeff*x*delta
        for k = 1:3
            if k == 1 % bias weights
                weights(1,k) = weights(1,k) + coeff*bias(1,1)*delta2_1;
                weights(2,k) = weights(2,k) + coeff*bias(1,2)*delta2_2;
                weights(3,k) = weights(3,k) + coeff*bias(1,3)*delta3_1;
            else % input weights (k = 2 or 3)
                weights(1,k) = weights(1,k) + coeff*input(j,k-1)*delta2_1;
                weights(2,k) = weights(2,k) + coeff*input(j,k-1)*delta2_2;
                weights(3,k) = weights(3,k) + coeff*x2(k-1)*delta3_1;
            end
        end
    end
end
weights
out
figure(1);
plot(err)
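The delta expressions in the listing rely on the identity sigma'(x) = sigma(x)*(1 - sigma(x)) for the sigmoid, which is why terms like out(j)*(1-out(j)) appear in the update rule. A quick numerical sanity check of that identity (Python, illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Analytic derivative used by the backpropagation deltas."""
    s = sigmoid(x)
    return s * (1 - s)

# Central finite difference at an arbitrary point
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
```

The two values agree to within the finite-difference error, confirming the form of the delta terms.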
MATLAB results
Initial weights =
0.8251 0.2134 -0.4771
0.2768 -0.7028 0.5780
-0.1327 -0.3299 0.4744
Final weights =
2.8090 -6.8838 -7.5744
9.1216 -6.7884 -5.5076
-4.0778 -9.1860 8.7378
Final Output =
0.0179
0.9873
0.9767
0.0235