Abstract—Although an n-th order cross-entropy (nCE) error function resolves the incorrect saturation problem of the conventional error backpropagation (EBP) algorithm, the performance of multilayer perceptrons (MLPs) using the nCE function depends heavily on the order of nCE. In this paper, we propose an adaptive learning rate to make the MLP performance insensitive to the order of nCE. Additionally, we propose to limit the error signal values at output nodes for stable learning with the adaptive learning rate. The effectiveness of the proposed method is demonstrated in a handwritten digit recognition task.
I. INTRODUCTION
Multilayer perceptrons (MLPs) use the error backpropagation (EBP) algorithm for training[1]. Training is usually done by iteratively updating the weights according to the error signal, which is the negative gradient of the mean-squared error (MSE) function. In the output layer, the error signal is the difference between the desired and actual output values of the MLP multiplied by the slope of the sigmoid activation function. The error signal is then back-propagated to the hidden layers.
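The EBP update described above can be sketched as follows. This is a minimal illustration for a single-hidden-layer MLP with a bipolar sigmoid (tanh) activation; the function and variable names are ours, not from the paper.

```python
import numpy as np

def ebp_step(x, t, W1, W2, eta=0.1):
    """One EBP weight update for a one-hidden-layer MLP with tanh units."""
    # Forward pass.
    h = np.tanh(W1 @ x)          # hidden activations
    o = np.tanh(W2 @ h)          # actual output values
    # Output-layer error signal: (desired - actual) times the sigmoid slope.
    delta_o = (t - o) * (1.0 - o**2)
    # Back-propagate the error signal to the hidden layer.
    delta_h = (W2.T @ delta_o) * (1.0 - h**2)
    # Update weights in proportion to the error signal (gradient descent on MSE).
    W2 = W2 + eta * np.outer(delta_o, h)
    W1 = W1 + eta * np.outer(delta_h, x)
    return W1, W2, o
```

Iterating this step on a training pattern drives the output toward the desired extreme values.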
In pattern recognition applications, the desired output value of an MLP is one of the two extreme values of the sigmoid function. If the weighted sum to an output node is near the wrong extreme value, we say the node is "incorrectly saturated[2]."
When an output node is incorrectly saturated, the amount of weight change is small due to the small gradient of the sigmoid activation function, and the error remains nearly unchanged[2], [3], [4], [5]. This incorrect saturation problem is a major reason for the slow learning speed of the EBP algorithm.
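The effect can be checked numerically. Below is a sketch using a tanh output unit, whose slope is 1 − o²; the particular weighted-sum values are illustrative.

```python
import numpy as np

# Error signal at an output node: (desired - actual) times the sigmoid slope.
def output_error_signal(net, t):
    o = np.tanh(net)                 # actual output for weighted sum `net`
    return (t - o) * (1.0 - o**2)    # slope of tanh is 1 - o^2

# Desired value t = +1, but the weighted sum sits near the WRONG extreme:
saturated = output_error_signal(-5.0, 1.0)     # incorrectly saturated node
unsaturated = output_error_signal(-0.5, 1.0)   # unsaturated node
```

Although the raw error |t − o| is nearly maximal for the incorrectly saturated node, its error signal is orders of magnitude weaker than that of the unsaturated node, so the weights barely move.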
In order to resolve this problem, an n-th order cross-entropy (nCE) error function was proposed so that the error signal is strong for incorrectly saturated output nodes and weak for correctly saturated output nodes[6]. In addition to resolving the incorrect saturation problem, the nCE function prevents overspecialization of MLPs for training patterns. The generalization performance of the trained MLPs, however, depends heavily on the order of nCE.

II. n-TH ORDER CROSS-ENTROPY ERROR FUNCTION
Authorized licensed use limited to: KOLEJ UNIVERSITI TEKNOLOGI TUN HUSSEIN ONN. Downloaded on July 30, 2009 at 23:29 from IEEE Xplore. Restrictions apply.
The weights are updated in proportion to the back-propagated error signal. At the output layer, the error signal is

δ_j = (t_j − o_j) f'(ŷ_j),    (3)

where t_j and o_j are the desired and actual values of output node j and f'(·) is the slope of the sigmoid activation function. Here, δ_j is the error signal and η is the learning rate; the error signal is back-propagated to the hidden layers l, where 1 ≤ l ≤ L − 1.

In the above EBP algorithm, an output node whose value is at the extreme opposite to t_j cannot make an error signal strong enough to adjust the weights significantly[2], [5], as shown in Fig. 1. This incorrect saturation retards the search for a minimum on the error surface.

To resolve the incorrect saturation problem, a strong error signal is necessary for incorrectly saturated output nodes, as in the cross-entropy (CE) method[5]. For correctly saturated output nodes, a weak error signal should be generated so that the weight update associated with one training pattern scarcely perturbs the weights trained for all training patterns. The weak error signal is also necessary to prevent overspecialization of learning for training patterns, as in the classification figure of merit method[8].

In this sense, Oh proposed an n-th order cross-entropy (nCE) error function[6], where t_j = ±1 and n = 1, 2, .... Let δ_j^{(n)}(s) denote the corresponding error signal at output node j in the s-th epoch.

III. ADAPTIVE LEARNING RATE WITH LIMITED ERROR SIGNAL

To resolve the afore-mentioned problems, we propose an adaptive learning rate at each epoch s as

η(s) = η √( E{(t_j(s) − o_j(s))²} / E{(δ_j^{(n)}(s))²} ).    (4)

Here, E{(t_j(s) − o_j(s))²} and E{(δ_j^{(n)}(s))²} are the expected values considering all output nodes in the s-th epoch. Then, the expected intensity of η(s)δ_j^{(n)}(s) is
η √( E{(t_j(s) − o_j(s))²} ).

For stable learning with the adaptive learning rate, the error signal at each output node is limited as

δ̄_j^{(n)}(s) = sgn(δ_j^{(n)}(s)) · min{ |δ_j^{(n)}(s)|, θ(s) },    (9)

where θ(s) is a thresholding value and

sgn(x) = +1, if x ≥ 0; −1, otherwise.
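The two ingredients of this section can be sketched as below. This assumes the adaptive learning rate scales η by the ratio of the RMS output error to the RMS error-signal intensity, as in Eq. (4); the threshold θ and all names are illustrative, not from the paper.

```python
import numpy as np

def adaptive_eta(eta, targets, outputs, deltas):
    """Adaptive learning rate for epoch s: scale eta so that the expected
    intensity of eta(s)*delta tracks eta times the RMS output error."""
    num = np.mean((targets - outputs) ** 2)   # E{(t_j(s) - o_j(s))^2}
    den = np.mean(deltas ** 2)                # E{(delta_j(s))^2}
    return eta * np.sqrt(num / den)

def limit_error_signal(deltas, theta):
    """Limit each output-node error signal to [-theta, theta],
    preserving its sign (sgn(x) = +1 if x >= 0, else -1)."""
    sgn = np.where(deltas >= 0.0, 1.0, -1.0)
    return sgn * np.minimum(np.abs(deltas), theta)
```

With this η(s), the RMS intensity of η(s)δ_j^{(n)}(s) equals η √(E{(t_j(s) − o_j(s))²}), independent of the order n, which is what removes the performance variation with the order of nCE.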
IV. SIMULATION
Fig. 3 shows the misclassification ratio for the untrained 2,213 test patterns. The misclassification ratio for the test patterns with n = 1 shows poor generalization, since the CE method makes the MLP too specialized for the training patterns[6]. As n increases up to 5, we get better results for the test patterns, since the weak error signal near the desired value prevents overspecialization for the training patterns. With n ≥ 6, however, a very weak error signal near the desired value retards learning, and the curve decreases very slowly. From these results, we can take n = 3 or 4 as an optimum order of nCE from the viewpoints of training speed and generalization performance.
To remove the performance variation with the order of nCE, we adopt the proposed method and plot the simulation results in Figs. 4 and 5. Comparing Fig. 4 with Fig. 2, which shows the misclassification ratio for the training patterns, we find that the proposed method successfully reduces the dependency of learning speed on the order of nCE. Fig. 5 shows the simulation results for the test patterns. The curve with n = 1 shows poor generalization performance, since this curve corresponds to the CE method with a fixed learning rate. With n ≥ 2, the curves show a better classification ratio for the test patterns.
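The misclassification ratio plotted in Figs. 2-5 can be computed as follows (a sketch; the function name is ours, and the label convention of the paper may differ):

```python
import numpy as np

def misclassification_ratio(outputs, labels):
    """Fraction of patterns whose largest output node does not match the label.

    outputs: (num_patterns, num_classes) array of MLP output values
    labels:  (num_patterns,) array of correct class indices
    """
    predicted = np.argmax(outputs, axis=1)
    return np.mean(predicted != labels)
```

This is evaluated once per epoch on the training set (Figs. 2 and 4) and on the held-out test set (Figs. 3 and 5).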
Fig. 4. Misclassification ratio for training patterns with the adaptive learning rate and the limited error signal.
The limited error signal in Eq. (9) is distributed over an interval with nearly zero mean; thus, the limiting has little effect on the learning of MLPs with the nCE error function.

Fig. 5. Misclassification ratio for test patterns with the adaptive learning rate and the limited error signal.

V. CONCLUSION

REFERENCES

[1] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, 1986.
[3] J. R. Chen and P. Mars, Stepsize variation methods for accelerating the backpropagation algorithm, Proc. IJCNN, Washington, DC, USA, Jan. 15-19, 1990, vol. I, pp. 601-604, 1990.
[4] A. Rezgui and N. Tepedelenlioglu, The effect of the slope of the activation function on the back propagation algorithm, Proc. IJCNN, Washington, DC, USA, Jan. 15-19, 1990, vol. I, pp. 707-710, 1990.
[5] A. van Ooyen and B. Nienhuis, Improving the convergence of the back-propagation algorithm, Neural Networks, vol. 5, pp. 465-471, 1992.
[6] S.-H. Oh, Improving the error backpropagation algorithm with a modified error function, IEEE Trans. Neural Networks, vol. 8, no. 3, pp. 799-803, 1997.
[7] J. J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 550-554, May 1994.
[8] J. B. Hampshire II and A. H. Waibel, A novel objective function for improved phoneme recognition using time-delay neural networks, IEEE Trans. Neural Networks, vol. 1, pp. 216-228, June 1990.