
Neural Comput & Applic (2009) 18:377–380

DOI 10.1007/s00521-008-0188-0

ORIGINAL ARTICLE

Classification error of multilayer perceptron neural networks


Lihua Feng · Weihu Hong

Received: 24 May 2007 / Accepted: 10 April 2008 / Published online: 1 May 2008
© Springer-Verlag London Limited 2008

Abstract  In subject classification, artificial neural networks (ANNs) are efficient and objective classification methods, and they have therefore been applied successfully to numerous classification fields. Sometimes, however, the resulting classifications do not match the real world and are subject to errors. These problems are caused by the nature of ANNs. We discuss them here for multilayer perceptron neural networks. Studying these problems helps us to better understand their classification behavior.

Keywords  Artificial neural networks · Multilayer perceptron neural networks · Subject · Class · Classification · Error

L. Feng (corresponding author)
Department of Geography, Zhejiang Normal University,
No. 688 Yingbin Road, Jinhua 321004, China
e-mail: fenglh@zjnu.cn

W. Hong
Department of Mathematics, Clayton State University,
Morrow, GA 30260, USA

1 Introduction

In subject classification, artificial neural networks (ANNs for short) are efficient and objective classification methods. Since the 1990s, hundreds of papers have been published on different types of classification, such as the classification of economic systems, remote sensing images, communication signals, rocket engine failures, energy levels, Chinese medicine, fingerprints, environmental qualities, water sources, rock stability, and ship noise [1–4]. However, some examples indicate that there are still problems with some ANN classifications. In this paper, these problems are discussed for multilayer perceptron neural networks (MLPNNs) [5, 6].

2 Method and fundamental theory of multilayer perceptron neural networks

MLPNNs are modeled on the human brain and consist of a number of artificial neurons. MLPNNs are self-organizing, adaptive, capable of learning, and fault tolerant [7], and they can be used to imitate human brain activities [8, 9]. They have a parallel distributed information-processing structure in which nonlinear regression is applied to input/output mappings. Without employing any mathematical model, MLPNNs learn from past experience and process nonlinear, noisy, or imprecise data through simulation, memorization and correlation. Subject classification is performed by a self-adaptive method of pattern recognition [10].

The major MLPNN learning algorithms include Hebb's rule, the Delta rule, Kohonen's learning law, and the BP algorithm [6, 11]. The BP (back-propagation) algorithm was developed by Rumelhart et al. of the PDP Group in 1985; it realized the concept of multilayer networks proposed by Minsky. A typical MLPNN consists of an input layer, an output layer and at least one hidden layer. Figure 1 shows the topological structure.

The neural network trained with the error back-propagation algorithm is called a BP network [12]; its learning process includes two phases: a forward phase and a backward phase. In the forward phase, the signal propagates forward layer by layer through the sigmoid function f(x) = 1/(1 + e^{-x}). The status of a neural cell at one layer influences only the status of the neural cells at the next layer. If the output signal is not as desired, the weights of the neural cells at each layer are changed while, at the same time, the errors associated with the output neurons are propagated backward. The error is calculated recursively until the selected error criterion is satisfied.
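To make the forward phase concrete, the following minimal Python sketch (illustrative only, not taken from the paper) defines the sigmoid activation and propagates one input pattern through a single layer; the layer size and the random values are arbitrary assumptions.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid activation f(x) = 1 / (1 + exp(-x)) used in the forward phase
    return 1.0 / (1.0 + np.exp(-x))

def forward_layer(y_prev, W, theta):
    # Status of one layer: y_j = f(sum_i W_ij * y_i + theta_j); it depends
    # only on the previous layer, as described above.
    return sigmoid(y_prev @ W + theta)

# Arbitrary example: an 8-attribute input feeding a layer of 12 neurons
rng = np.random.default_rng(0)
x = rng.random(8)                     # one input pattern
W = rng.uniform(-1.0, 1.0, (8, 12))   # weights initialized in [-1, 1]
theta = rng.uniform(-1.0, 1.0, 12)    # thresholds
print(forward_layer(x, W, theta))     # 12 activations, each in (0, 1)
```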


Fig. 1  The topological structure of a simple three-layer feed-forward neural network

Assume an $m$-layer neural network. Let $y_j^m$ be the output of the $j$th node at the $m$th layer, while $y_j^0 = x_j$ is the input of the $j$th node. Let $W_{ij}^m$ be the weight between $y_i^{m-1}$ and $y_j^m$, and let $\theta_j^m$ be the threshold of the $j$th node at the $m$th layer. BP network training can be described as follows [13]:

1. Set all the weights and thresholds to random numbers between $-1$ and $1$.
2. Choose a pair $(x^k, T^k)$ from the training data, present it to the input layer ($m = 0$), and let
   $y_i^0 = x_i^k$ for all nodes $i$,   (1)
   where $k$ indexes the selected training pair.
3. The signal propagates forward through the network by Eq. (2), which calculates the output $y_j^m$ of the $j$th node from the first layer to the last:
   $y_j^m = F(s_j^m) = F\left(\sum_i W_{ij}^m y_i^{m-1} + \theta_j^m\right)$,   (2)
   where $F(s)$ is the sigmoid function.
4. Calculate the error of each node at the output layer:
   $\delta_j^m = y_j^m (1 - y_j^m)(T_j^k - y_j^m)$.   (3)
   The error is the difference between the actual output and the desired value.
5. Calculate the error of the previous layer:
   $\delta_j^{m-1} = F'(s_j^{m-1}) \sum_i W_{ij}^m \delta_i^m$.   (4)
   The error is calculated through backward propagation ($m = m, m-1, \ldots, 1$).
6. Update the weights and thresholds and propagate them backward, layer by layer:
   $W_{ij}^m(t+1) = W_{ij}^m(t) + \eta \delta_j^m y_i^{m-1} + \alpha\left[W_{ij}^m(t) - W_{ij}^m(t-1)\right]$,   (5)
   $\theta_j^m(t+1) = \theta_j^m(t) + \eta \delta_j^m + \alpha\left[\theta_j^m(t) - \theta_j^m(t-1)\right]$,   (6)
   where $t$ is the iteration index, $\eta$ is the learning-rate parameter ($\eta \in (0, 1)$), and $\alpha$ is the momentum constant ($\alpha \in (0, 1)$).
7. Go back to step 2 and repeat steps 2–7 until the selected error criterion is satisfied:
   $E = \sum_k \sum_j \left(T_j^k - y_j^m\right)^2$.   (7)

After training, all the weights and thresholds are determined and the network can start classifying.

For simplicity, Table 1 lists five typical samples S (corresponding approximately to the criteria of five classes), each with eight attributes $x_i$. The criteria of Table 1 are used as the five learning samples. For these input variables, the desired outputs of the BP network are 0.1, 0.3, 0.5, 0.7, and 0.9. The network therefore has 8 input nodes, 1 output node, and 12 hidden nodes; thus, the topological structure of the BP network is (8, 12, 1).

To speed up the calculation, the original data should be normalized:
$x_i' = (x_i - x_{\min}) / (x_{\max} - x_{\min})$,   (8)
where $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the attributes, respectively. Hence, $x_i'$ lies within $[0, 1]$.

The normalized data are used as the input of the BP network, and the training parameters are selected for the training and learning process. Here, the learning-rate parameter $\eta$ is 0.6 and the momentum parameter $\alpha$ is 0.5. After 100,000 training iterations, the total network error is E = 0.0002; thus, the network reduces the error to the desired threshold. The corresponding outputs of the network are 0.11, 0.30, 0.50, 0.70, and 0.89. Using the midpoints between adjacent outputs as demarcation points, we obtain the five classes shown in Table 2.

Table 1  Five typical samples S, each with eight attributes x_i

Samples  x1  x2  x3  x4  x5  x6  x7  x8
S1        2   2   2   2   2   2   2   2
S2        4   4   4   4   4   4   4   4
S3        6   6   6   6   6   6   6   6
S4        8   8   8   8   8   8   8   8
S5       10  10  10  10  10  10  10  10
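As a rough check of the procedure above, the following Python sketch re-implements the training loop for an (8, 12, 1) network on the five Table 1 samples. It is an illustrative reconstruction, not the authors' code: the function names, the random initialization, the stopping threshold and the epoch limit are assumptions, and the converged values may differ slightly from those reported.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_bp(X, T, n_hidden=12, eta=0.6, alpha=0.5, max_iters=100_000, tol=2e-4, seed=0):
    """Illustrative BP training for an (n_in, n_hidden, 1) network, Eqs. (2)-(7)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Step 1: weights and thresholds initialized in [-1, 1]
    W1 = rng.uniform(-1, 1, (n_in, n_hidden)); th1 = rng.uniform(-1, 1, n_hidden)
    W2 = rng.uniform(-1, 1, (n_hidden, 1));    th2 = rng.uniform(-1, 1, 1)
    dW1 = np.zeros_like(W1); dth1 = np.zeros_like(th1)
    dW2 = np.zeros_like(W2); dth2 = np.zeros_like(th2)
    for it in range(max_iters):
        E = 0.0
        for x, t in zip(X, T):
            # Steps 2-3: forward pass, Eq. (2)
            y1 = sigmoid(x @ W1 + th1)
            y2 = sigmoid(y1 @ W2 + th2)
            # Step 4: output-layer error, Eq. (3)
            d2 = y2 * (1 - y2) * (t - y2)
            # Step 5: hidden-layer error, Eq. (4); F'(s) = y(1 - y) for the sigmoid
            d1 = y1 * (1 - y1) * (W2 @ d2)
            # Step 6: weight/threshold updates with momentum, Eqs. (5)-(6)
            dW2 = eta * np.outer(y1, d2) + alpha * dW2; W2 += dW2
            dth2 = eta * d2 + alpha * dth2;             th2 += dth2
            dW1 = eta * np.outer(x, d1) + alpha * dW1;  W1 += dW1
            dth1 = eta * d1 + alpha * dth1;             th1 += dth1
            E += float(np.sum((t - y2) ** 2))           # Eq. (7)
        if E < tol:                                      # step 7: stopping criterion
            break
    return W1, th1, W2, th2

# Table 1 samples, normalized with Eq. (8) over the attribute range [2, 10]
S = np.array([[2]*8, [4]*8, [6]*8, [8]*8, [10]*8], dtype=float)
X = (S - S.min()) / (S.max() - S.min())
T = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
W1, th1, W2, th2 = train_bp(X, T)
outputs = sigmoid(sigmoid(X @ W1 + th1) @ W2 + th2).ravel()
print(np.round(outputs, 2))   # roughly 0.1, 0.3, 0.5, 0.7, 0.9 after training
```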


Table 2  The five classes based on the outputs

Number              1          2             3             4             5
Desired outputs     0.1        0.3           0.5           0.7           0.9
Network outputs     0.11       0.30          0.50          0.70          0.89
Demarcation points  [0, 0.20]  (0.20, 0.40]  (0.40, 0.60]  (0.60, 0.80]  (0.80, 1.00]
Classes             I          II            III           IV            V
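Given Table 2, assigning a class to a network output is a simple interval lookup. A possible helper (the names and structure are illustrative, not from the paper) is sketched below.

```python
# Illustrative mapping from a network output to a class using the
# demarcation points of Table 2.
DEMARCATIONS = [0.20, 0.40, 0.60, 0.80]     # upper bounds of classes I-IV
CLASS_LABELS = ["I", "II", "III", "IV", "V"]

def classify(output: float) -> str:
    for bound, label in zip(DEMARCATIONS, CLASS_LABELS):
        if output <= bound:
            return label
    return CLASS_LABELS[-1]                  # (0.80, 1.00] -> class V

print(classify(0.46))   # III
print(classify(0.74))   # IV
```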

Since the trained network has simulated and memorized the relationship between the input and output variables, it can be used for subject classification: a set of data is presented to the well-trained network as input, and the resulting output is compared with the classification criteria to determine which class the input belongs to.

3 The classification problem

Calculations with real data show that several problems arise when using a well-trained MLPNN (an illustrative way to reproduce such experiments is sketched after this list):

1. A change in the positions of the attributes in the input data can change the classification. Let s1 and s2 be two samples with the same attributes:
   s1 = (4, 4, 4, 4, 10, 10, 10, 10)
   s2 = (10, 10, 10, 10, 4, 4, 4, 4)
   The only difference between s1 and s2 is that the positions of x1–x4 and x5–x8 are exchanged. The outputs of the BP network, however, are y1 = 0.46 (class III) and y2 = 0.74 (class IV). The results indicate that, for the same subject, changing the positions of its attributes changes the classification.
2. For samples with large attribute values, the BP network loses its ability to classify. Let s3 be a sample with some large attributes:
   s3 = (2, 2, 2, 100, 100, 100, 100, 100)
   In this sample, five attributes are as large as 100. The output of the BP network is y3 = 0.29 (class II); clearly, this classification is not true.
3. A different number of hidden-layer nodes gives a different classification. At present, there is no theoretical justification for the number of nodes in the hidden layer [14]; thus, in practice the choice is somewhat arbitrary. The outputs of the network show that different numbers of hidden nodes correspond to different classifications. For example, with
   s4 = (2, 2, 2, 2, 8, 8, 8, 8)
   when the number of hidden nodes is n = 4, y4 = 0.48, which belongs to class III, while when n = 10, y4 = 0.31, which belongs to class II.
4. A change of the learning rate changes the classification. The bigger the learning rate η is, the faster the weights are updated. Generally speaking, η can be large as long as the calculation remains stable [14]. The results show, however, that when η changes, the classification sometimes changes. For the previous s1, when η = 0.1, y1 = 0.44, which belongs to class III; when η = 0.000001, y1 = 0.62, which belongs to class IV.
5. A change of the momentum parameter changes the classification. In practice, the momentum parameter α is adjusted from experience [14]. The results show that when α changes, the classification changes accordingly. For the same s2, when α = 0.8, y2 = 0.84, which belongs to class V; when α = 0.1, y2 = 0.73, which belongs to class IV.
6. A different number of training iterations gives a different classification. The outputs of the BP network show that when the number of training iterations T differs, the classification differs. For instance, let s5 be:
   s5 = (5, 5, 5, 5, 5, 5, 5, 5)
   When T = 10,000, y5 = 0.37, which belongs to class II; when T = 100,000, y5 = 0.44, which belongs to class III.
7. Different error criteria give different classifications. At present there is no common rule for the stopping criterion; it is chosen from experience. The calculations show that with different criteria the classification can differ. For example, for the sample s5, when E < 0.05, y5 = 0.47, which belongs to class III; when E < 0.005, y5 = 0.38, which belongs to class II.
8. It is difficult for the BP network to discriminate sharply. For instance, let s7 and s8 be:
   s7 = (6, 4, 4, 4, 4, 4, 4, 4)
   s8 = (60, 4, 4, 4, 4, 4, 4, 4)
   From general knowledge, s8 should belong to a higher class than s7. The outputs of the BP network, however, show that they belong to the same class: y7 = 0.37 (class II) and y8 = 0.37 (class II). The discrimination ability of the BP network is therefore limited.
9. A larger attribute value can lead to a lower class. For instance, let s9 and s10 be two different samples:
   s9 = (10, 5, 2, 2, 2, 2, 2, 2)
   s10 = (30, 5, 2, 2, 2, 2, 2, 2)
   In s9, x1 = 10, while in s10, x1 = 30, so the class of s10 should be higher than that of s9. But the outputs of the BP network are y9 = 0.25 (class II) and y10 = 0.16 (class I).
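Experiments of this kind are easy to reproduce once a trained network is available. The sketch below reuses the hypothetical sigmoid, train_bp and classify helpers from the earlier listings to probe problem 1 (sensitivity to attribute positions); because the result depends on the random initialization and training settings, the outputs need not match the exact values quoted above.

```python
import numpy as np
# Assumes the sigmoid/train_bp/classify sketches above have been run, so the
# trained parameters W1, th1, W2, th2 are in scope.

S_MIN, S_MAX = 2.0, 10.0    # attribute range of the Table 1 training samples

def predict(sample):
    # Normalize with the training bounds (Eq. 8), then run the forward pass.
    x = (np.asarray(sample, dtype=float) - S_MIN) / (S_MAX - S_MIN)
    return sigmoid(sigmoid(x @ W1 + th1) @ W2 + th2).item()

# Problem 1: the same attribute values in different positions (s1 vs s2).
for s in [(4, 4, 4, 4, 10, 10, 10, 10), (10, 10, 10, 10, 4, 4, 4, 4)]:
    y = predict(s)
    print(s, round(y, 2), classify(y))   # the two classes may differ
```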


4 Conclusions

The advantage of MLPNNs is that there is no need to devise a mathematical model in order to perform a specific task. MLPNNs process information through interconnected processing elements (neurons) and, by nonlinear fitting, realize the mapping between input and output. They have the advantages of self-learning, self-organization, self-adaptation, and fault tolerance, and have therefore been applied successfully to pattern recognition and system identification. However, in our experience, MLPNNs do not always comply with real situations, and they can sometimes produce large errors on test samples. These problems are caused by the nature of MLPNNs. Studying them helps us to better understand MLPNN classification and to find ways to improve its performance.

Acknowledgments  This work was supported by the National Natural Science Foundation of China (No. 40771044) and the Zhejiang Provincial Science and Technology Foundation of China (No. 2006C23066). We would like to thank the editor and reviewers for their comments, which improved the article.

References

1. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–137
2. Minsky M, Papert SA (1987) Perceptrons, expanded edition: an introduction to computational geometry. MIT Press, Cambridge
3. Zhang ZH, Lu YC, Zhang P (1999) Discovering classification rules by using the neural networks. Chin J Comput 22(1):108–112
4. Li Q, Wang ZZ (2000) Remote sensing information classification based on artificial neural network and knowledge. Acta Autom Sin 26(2):233–239
5. Callan R (1994) The essence of neural networks. Prentice Hall, Englewood Cliffs
6. Coppin B (2004) Artificial intelligence illuminated. Jones and Bartlett Publishers, Sudbury
7. Kohonen T (2000) Self-organizing maps. Springer, New York
8. Chen TP (1994) Approximation problems in system identification with neural networks. Sci China (Ser A) 24(1):1–7
9. Lippmann RP (1987) An introduction to computing with neural nets. IEEE ASSP Mag 4(2):4–22
10. Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, New York
11. Hebb DO (1949) The organization of behavior: a neuropsychological theory. Lawrence Erlbaum Associates, New Jersey
12. Gurney K (1997) An introduction to neural networks. UCL Press, London
13. Zhou JC, Zhou QS, Han PY (1993) Artificial neural network—the implementation of the 6th computer. Publishing House of Science Popularization, Beijing, pp 47–51
14. Xu LN (1999) Neural network control. Publishing House of Harbin Industry University, Harbin, pp 15–16

