
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 32, NO. 1, FEBRUARY 2002

System-Level Training of Neural Networks for Counting White Blood Cells

Nipon Theera-Umpon and Paul D. Gader

Manuscript received May 1, 2001; revised February 6, 2002. This paper was recommended by Associate Editor M. Embrechts. N. Theera-Umpon is with the Department of Electrical Engineering, Chiang Mai University, Chiang Mai 50200 Thailand (e-mail: DrNipon@chiangmai.ac.th). P. D. Gader is with the Computer and Information Science and Engineering Department, University of Florida, Gainesville, FL 32611 USA (e-mail: pgader@cise.ufl.edu). Publisher Item Identifier S 1094-6977(02)04676-X.

Abstract—Neural networks (NNs) that are trained to perform classification may not perform as well when used as modules in a larger system. In this correspondence, we introduce a novel, system-level method for training NNs with application to counting white blood cells. The idea is to phrase the objective function in terms of total count error rather than the traditional class-coding approach, because the goal of this particular recognition system is to accurately count white blood cells of each class, not to classify them. An objective function that represents the sum of the squared counting errors (SSCE) is defined. A batch-mode training scheme based on back-propagation and gradient descent is derived. Sigma and crisp counts are used to evaluate the counting performance. The testing results show that the network trained to minimize SSCE performs better at counting than a classification network with the same structure, even though both are trained for a comparable number of iterations. This result is consistent with Marr's principle of least commitment.

Index Terms—Counting error objective function, neural networks (NNs), sigma count, white blood cell counting.

I. INTRODUCTION

Relative counts of different classes of white blood cells in bone marrow aid in the diagnosis of diseases such as leukemia. According to the myelocytic or granulocytic series [1], [2], white blood cells in human bone marrow are classified into six discrete classes in accordance with their ages, namely, Myeloblast, Promyelocyte, Myelocyte, Metamyelocyte, Band, and PMN, ordered from youngest to oldest. As a white blood cell becomes older, several features change. For example, its size becomes smaller, its nucleus shape changes from round to segmented, and its texture becomes coarser. Sample images of all six cell classes are shown in Fig. 1.

Fig. 1. Cell samples in the myelocytic series: (a) Myeloblast; (b) Promyelocyte; (c) Myelocyte; (d) Metamyelocyte; (e) Band; and (f) PMN.

In an effort to relieve human experts of the tedious and time-consuming task of counting white blood cells in bone marrow or peripheral blood, many automated techniques have been proposed [3]–[10]. Although some commercial products have been introduced for peripheral blood, the procedure has not been automated for bone marrow due to the complexity of the images.

White blood cells are categorized by age (a continuous variable) into discrete classes. The standard approach is to segment, extract features, classify each white blood cell into one of the classes, and then count the number assigned to each class. Generally, background subtraction, histogram manipulation, thresholding, and cell modeling are utilized in segmentation [3]–[6]. Bayes classifiers, learning vector quantization (LVQ), and multilayer perceptrons have all been used in classification [3], [7]. Performance results vary significantly in the literature, depending both on the data set and on the methodology for scoring the results. Beksaç et al. [3] achieved a 61% classification rate using neural networks (NNs) in a 16-class problem consisting of cells in peripheral blood and bone marrow. Sohn [7] achieved up to a 78% classification rate using NNs with tenfold cross validation in a six-class problem using manually segmented cells from bone marrow. It is also shown in [7] that the classification performance of multilayer perceptrons is similar to that of Bayes classifiers and better than that of LVQ. One major difficulty in automating blood cell counting is that there is significant ambiguity between classes, since we attempt to discretize a continuous variable using subjective methods. Experts working in pathology laboratories can produce counts that vary by as much as 15%. Even the same expert can produce different counts on different days. Given that the assignments of blood cells to classes by experts are somewhat ambiguous, the traditional assignment of desired outputs in the strict class-coded mode is problematic (class-coded outputs are 1 for the correct class and 0 for all others). Similar-looking samples may be placed into different classes. Therefore, by class-coding the outputs, we may require that very similar patterns be mapped to quite different outputs. This can create conflict in training. Requiring that cells be strictly divided into groups is not necessary. The end requirement is that we produce an accurate count reflecting the percentage of blood cells of each type.

We define an objective function that more accurately reflects the goal of the automated system: the sum of the squared counting errors (SSCE). That is, we attempt to train a NN by minimizing the error between the total number of cells in a class estimated by the network and that estimated by an expert. Desired outputs are not included at the individual cell



level; only at the class level. There is a desired number of cells per class, but the network may achieve that count in a variety of ways. The testing results show that, although yielding lower classification rates, the network trained to minimize SSCE performs better at counting than a classification network with the same structure, even though both are trained for a comparable number of iterations.

This result is consistent with other results we have achieved in our experiments in the field of handwriting recognition. In particular, we have shown repeatedly that, in handwritten word recognition, high character recognition rates are not good indicators of high word recognition rates in standard, lexicon-driven handwritten word recognition systems [11], [12]. This led us to investigate the issue of word-level training, in which the character-level NNs are trained using word-level objective functions. In that case, as in the case discussed in this correspondence, the results indicate that system-level training is more appropriate for training NNs to perform as system components.

We point out that the objective functions and design methodologies in the blood cell problem and the handwriting problem are quite different, although the guiding principles are the same. The principle is that system-level objective functions lead to recognition systems that perform the ultimate task better than intermediate-level objective functions. These results can be interpreted as special cases of the principle of least commitment of Marr [13], which states that decisions should be delayed as long as possible. In this case, the decision concerns setting the desired outputs of NNs. By training them to count rather than classify, we allow more flexibility in producing outputs for each particular input sample, as long as the sum of the outputs over the entire training set is close to the desired count.

In Section II, we describe the data. In Section III, we define the counting objective function and discuss training. We describe experiments and evaluation measures in Section IV. In Section V, we provide results, and in Section VI we conclude.

II. DATA DESCRIPTION

The bone marrow images used in the experiments were collected at the University of Missouri Ellis-Fischel Cancer Center. Multicell images were captured from slides by an Olympus BX50 microscope, a B/W CCD camera, and a digitizer (8 bits/pixel, PDI IMAXX) at 600× magnification. Individual cells were detected and cropped manually to form single-cell images. Each single-cell image was classified by Dr. C. William Caldwell, Professor of Pathology and Director of the Pathology Laboratory at the center.

The data set contains 526 single-cell images: 33 Myeloblasts, 61 Promyelocytes, 77 Myelocytes, 93 Metamyelocytes, 128 Bands, and 134 PMNs. Cell segmentation is still an area of research. To decouple the effects of segmentation errors from the recognition issues to be compared (class-coded versus minimum count error objective functions), each single-cell image was segmented manually into three regions—nucleus, cytoplasm, and background—with gray scales of 0, 176, and 255, respectively.

Ten features are extracted from each single-cell image. These are the best features found in [7]. Six features are extracted from the nucleus shape, namely, circularity, elongation, thickness variance, and Fourier descriptors 3, 12, and 15. The remaining features are texture features, namely, light number of patches in the nucleus, energy of the cytoplasm region, and correlation and variance in a cell. These features were extracted from each single-cell image without any preprocessing.

III. SYSTEM LEVEL TRAINING

The method of training feed-forward classification networks with the back-propagation algorithm and class-coded outputs is well known [14]. The class-coded objective of the classification network seeks to minimize classification error (albeit indirectly). We propose a new training scheme, running in batch mode, to achieve the minimum counting error.

We need to establish notation. Let $x_n = [x_{1,n}, x_{2,n}, \ldots, x_{P,n}]$ denote an input feature vector, where $P$ is the number of features, and let $Q$ denote the number of classes. Let the training set be $\{x_n\}$, $n = 1, 2, \ldots, N$. We assume a standard feed-forward classification network with one output node for each class. If $x_n$ is an input to the network, we let $o_{q,n}$ denote the output value of the $q$th output node. Let the number of cells assigned to the $q$th class by an expert be $c_{\mathrm{exp},q}$. We define the sigma count of the $q$th class to be

$$c_{\mathrm{sigma},q} = \sum_{n=1}^{N} o_{q,n}, \qquad q = 1, 2, \ldots, Q. \quad (1)$$

The sigma count is a key notion in this approach. If the outputs are all 0 and 1, then the sigma count for a given class is just the number of cells that are assigned an output value of 1 for that class. If we set the outputs from a given input sample $x_n$ to 0 or 1 by the rule

$$c_{\mathrm{crisp},q,n} = \begin{cases} 1, & \text{if } o_{q,n} \ge o_{k,n} \text{ for } k = 1, \ldots, Q \\ 0, & \text{else} \end{cases} \quad (2)$$

then the quantity

$$c_{\mathrm{crisp},q} = \sum_{n=1}^{N} c_{\mathrm{crisp},q,n}, \qquad q = 1, 2, \ldots, Q \quad (3)$$

is called the crisp count.

With the sigma count, we do not require that outputs are all 0 and 1. Therefore, each cell can contribute partially to each class. For example, a cell that is an "old" Promyelocyte and a "young" Myelocyte may have an output of 0.5 for both classes, thereby reflecting the continuous aging of cells more accurately.

The SSCE objective function is defined to be the sum of the squares of the differences between the sigma counts and the expert counts. There are no cell-level desired outputs:

$$E = \frac{1}{2} \sum_{q=1}^{Q} \left[ \sum_{n=1}^{N} o_{q,n} - c_{\mathrm{exp},q} \right]^2 = \frac{1}{2} \sum_{q=1}^{Q} \left[ c_{\mathrm{sigma},q} - c_{\mathrm{exp},q} \right]^2. \quad (4)$$

The SSCE objective function in (4) fits this particular problem exactly: minimizing it drives the network's count estimate (the sigma count) for each class as close as possible to the expert's count.

Consider a feed-forward network with three layers of neurons and two layers of weights. The network training rule is derived using gradient descent. The notation used is shown in Fig. 2.

Fig. 2. Notation used to derive training rule.
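The counting quantities in (1)–(4) can be sketched directly in a few lines. The following is a minimal illustration, not the authors' code; the four-sample, three-class output matrix and the expert counts are invented for the example.

```python
# Sketch of the counting quantities in (1)-(4). The 4-sample, 3-class
# output matrix and the expert counts below are invented illustration
# values, not data from the paper.

def sigma_counts(outputs):
    # c_sigma,q = sum_n o_{q,n}  -- eq. (1): column sums of the outputs.
    num_classes = len(outputs[0])
    return [sum(row[q] for row in outputs) for q in range(num_classes)]

def crisp_counts(outputs):
    # Eqs. (2)-(3): each sample contributes 1 to its maximum-output
    # class (ties go to the first maximum here).
    num_classes = len(outputs[0])
    counts = [0] * num_classes
    for row in outputs:
        counts[row.index(max(row))] += 1
    return counts

def ssce(outputs, expert_counts):
    # E = (1/2) sum_q (c_sigma,q - c_exp,q)^2  -- eq. (4).
    return 0.5 * sum((s - c) ** 2
                     for s, c in zip(sigma_counts(outputs), expert_counts))

outputs = [
    [0.9, 0.1, 0.0],   # confidently class 1
    [0.5, 0.5, 0.0],   # ambiguous between classes 1 and 2
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
expert = [2, 1, 1]     # expert's count per class

print(sigma_counts(outputs))   # ~ [1.5, 1.6, 0.9]
print(crisp_counts(outputs))   # [2, 1, 1]
print(ssce(outputs, expert))   # ~ 0.31
```

Note how the ambiguous second sample contributes 0.5 to two sigma counts; this partial contribution is exactly the flexibility the sigma count is meant to allow.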


The derivative of $E$ with respect to a weight $w_{ji}$ in any layer is

$$\frac{\partial E}{\partial w_{ji}} = \sum_{q=1}^{Q} \left[ c_{\mathrm{sigma},q} - c_{\mathrm{exp},q} \right] \sum_{n=1}^{N} \frac{\partial o_{q,n}}{\partial w_{ji}}. \quad (5)$$

If $w_{ji}$ is in the output layer, then

$$\frac{\partial o_{q,n}}{\partial w_{ji}} = \frac{\partial o_{q,n}}{\partial v_{j,n}} \frac{\partial v_{j,n}}{\partial w_{ji}} \quad (6)$$

$$\frac{\partial o_{q,n}}{\partial v_{j,n}} = \begin{cases} \varphi'_j(v_{j,n}), & \text{if } j = q \\ 0, & \text{else} \end{cases} \quad (7)$$

$$\frac{\partial v_{j,n}}{\partial w_{ji}} = y_{i,n}. \quad (8)$$

Substituting (7) and (8) into (6) yields

$$\frac{\partial o_{q,n}}{\partial w_{ji}} = \begin{cases} \varphi'_j(v_{j,n})\, y_{i,n}, & \text{if } j = q \\ 0, & \text{else.} \end{cases} \quad (9)$$

Substituting (9) into (5) yields (only the $q = j$ term survives)

$$\frac{\partial E}{\partial w_{ji}} = \left[ c_{\mathrm{sigma},j} - c_{\mathrm{exp},j} \right] \sum_{n=1}^{N} \varphi'_j(v_{j,n})\, y_{i,n}. \quad (10)$$

Consider the $j$th neuron in the hidden layer:

$$\frac{\partial o_{q,n}}{\partial w_{ji}} = \frac{\partial o_{q,n}}{\partial y_{j,n}} \frac{\partial y_{j,n}}{\partial v_{j,n}} \frac{\partial v_{j,n}}{\partial w_{ji}} \quad (11)$$

$$\frac{\partial o_{q,n}}{\partial y_{j,n}} = w_{qj} \quad (12)$$

$$\frac{\partial y_{j,n}}{\partial v_{j,n}} = \varphi'_j(v_{j,n}) \quad (13)$$

$$\frac{\partial v_{j,n}}{\partial w_{ji}} = y_{i,n}. \quad (14)$$

Substituting (12)–(14) into (11) yields

$$\frac{\partial o_{q,n}}{\partial w_{ji}} = w_{qj}\, \varphi'_j(v_{j,n})\, y_{i,n}. \quad (15)$$

Substituting (15) into (5) yields

$$\frac{\partial E}{\partial w_{ji}} = \sum_{q=1}^{Q} w_{qj} \left[ c_{\mathrm{sigma},q} - c_{\mathrm{exp},q} \right] \sum_{n=1}^{N} \varphi'_j(v_{j,n})\, y_{i,n}. \quad (16)$$

Thus, the weight update at the $(m+1)$th iteration is

$$w_{ji}(m+1) = w_{ji}(m) - \eta \frac{\partial E(m)}{\partial w_{ji}(m)} \quad (17)$$

where $\partial E / \partial w_{ji}$ is calculated from (10) for the output layer or (16) for the hidden layer, and $\eta$ is a learning rate.

IV. EXPERIMENTAL FRAMEWORK

A. Evaluation Measures

The counting rate is used as an evaluation measure in the experiments. It is defined by

$$\text{Counting rate} = \left( 1 - \frac{\sum_{q=1}^{6} \left| c_{\mathrm{exp},q} - c_{\mathrm{alg},q} \right|}{\sum_{q=1}^{6} c_{\mathrm{exp},q}} \right) \times 100\% \quad (18)$$

where $c_{\mathrm{exp},q}$ and $c_{\mathrm{alg},q}$ are the expert's and the algorithm's counts of the number of cells in class $q$, respectively. The notation $c_{\mathrm{alg},q}$ can refer to either $c_{\mathrm{crisp},q}$ or $c_{\mathrm{sigma},q}$. We also measure the average classification rate, defined as the number of cells correctly assigned to the classes (according to the expert) divided by the total number of cells.

We measure the classification rates and algorithm counts for a standard classification net and a net trained to minimize counting error, on both training and test sets. Thus, there are 12 percentage values that we measure: percVal = {crClTr, crClTe, ccClTr, ccClTe, scClTr, scClTe, crCoTr, crCoTe, ccCoTr, ccCoTe, scCoTr, scCoTe}, where

cr classification rate;
cc crisp count rate;
sc sigma count rate;
Cl classification NN;
Co counting NN;
Tr training set;
Te test set.

Generally, when n-fold cross validation is applied to a classification problem, we sum the numbers of correct classifications over the n folds and divide by the total number of data points to obtain the overall classification rate. However, this method should not be applied to the counting performance measure. If the summation of count estimates in each class over the n folds is used, then an underestimated count in one fold can compensate for an overestimated count in another fold. The compensation yields an artificially high overall counting rate while, in fact, both the underestimation and the overestimation yield lower counting rates in their respective folds. For example, if in twofold cross validation we overestimate by ten in one fold and underestimate by ten in the other fold, then summing the errors would produce an error of zero, which is obviously incorrect. Therefore, the counting rate in each fold is calculated first, and then the average counting rate over the n folds is used as the overall counting rate. This method is more pessimistic, but more accurate, and it is the method that we use.

For clarity, we provide an example. Consider a fourfold cross validation experiment in which the actual counts for each class are all 100, so the correct count for each class in each fold is 25. Assume that the count estimates are as shown in Table I.

TABLE I. COUNT ESTIMATES

If we sum the estimates over the folds, then the estimated numbers of cells per class would be 100, 90, 100, 110, 95, and 105, respectively, which looks quite accurate. Indeed, this yields a counting rate of (1 − 30/600) × 100% = 95.0%, which is misleading, since the counts are inaccurate. However, by computing the error over each fold first and averaging, we obtain a counting rate of (80.0 + 80.0 + 66.7 + 66.7)/4 = 73.4%, which is more accurate.

B. Algorithm Descriptions

We used fourfold cross validation in the experiments because formal training and test sets are not available for this data set. More specifically, we randomly divided the data in each class into four groups with approximately equal numbers of data points. For example, the 33 Myeloblasts are divided into four groups of 8, 8, 8, and 9 data points. In each fold, the data points in three groups (about 75% of the entire data) are used as a training set, and the data points in the remaining group (about 25% of the entire data) are used as a test set. Hence, we have four folds of data. The training and test sets in each fold are independent. Moreover, the experiment using the data in each fold is done independently. Hence, cross validation is used here for separating the data set into several groups of training and test sets, not for avoiding overfitting [15].
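The fold-averaging pitfall described in Section IV-A can be sketched as follows. The two-fold, two-class numbers here are invented (they mirror the over/underestimate-by-ten example in the text, not the exact values of Table I).

```python
# Sketch of per-fold vs. pooled counting rates from Section IV-A.
# The two-fold, two-class numbers are invented: one class is over-
# estimated by ten cells in fold 1 and underestimated by ten in fold 2.

def counting_rate(expert, estimate):
    # Eq. (18): 100% minus the total absolute count error relative to
    # the expert's total count.
    total_err = sum(abs(e - a) for e, a in zip(expert, estimate))
    return (1 - total_err / sum(expert)) * 100.0

expert = [50, 50]                        # per-fold expert counts
fold_estimates = [[60, 50], [40, 50]]    # fold 1 over, fold 2 under

# Honest measure: compute the rate per fold, then average over folds.
per_fold = [counting_rate(expert, est) for est in fold_estimates]
avg_rate = sum(per_fold) / len(per_fold)

# Misleading measure: pool the estimates over folds first, so the
# +10 and -10 errors cancel.
pooled_expert = [2 * c for c in expert]
pooled_est = [sum(col) for col in zip(*fold_estimates)]
pooled_rate = counting_rate(pooled_expert, pooled_est)

print(per_fold)      # ~ [90.0, 90.0]
print(avg_rate)      # ~ 90.0
print(pooled_rate)   # 100.0 -- the cancellation hides the errors
```

The pooled rate reports a perfect count even though every fold is off by ten cells, which is exactly why the paper averages per-fold rates instead.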
The counting and classification percentages of all four folds were averaged at the end of the cross validation. To remove the effects of network initialization, we performed the cross validation ten times and averaged the counting and classification percentages at the end. The counting network should not be trained from initial random weights because, without any prior information, it can trivially achieve the goal of producing the correct counts by outputting $1/N_q$ for each class, which is what it does. Therefore, a classification network is trained as the initialization for both the classification and counting networks, as indicated in the following algorithm:

For (k = 1 to 10)                          /* experiment #1 to 10 */
  For (i = 1 to 4)                         /* fold #1 to 4 */
    Initialize initClassNet;
    Train initClassNet using the Levenberg-Marquardt (LM) algorithm for ten epochs;
    initEp = 10;
    While (initEp <= 50)
      Initialize classNet and countNet with initClassNet;
      Train classNet using gradient descent for 50 epochs;
      Train countNet using gradient descent for 50 epochs;
      Test classNet & countNet on training & test sets;
      Calculate percVal[k][i][initEp];
      If (initEp < 50)
        Train initClassNet using the LM algorithm for five epochs;
      End If (initEp < 50);
      initEp = initEp + 5;
    End While (initEp <= 50);
  End For (i = 1 to 4);
  Average evaluation measures in percVal[k] over 4 folds;
End For (k = 1 to 10);
Average evaluation measures in percVal over ten experiments.

Fig. 3. Average (over 40 networks) classification rates on training set.

Fig. 4. Average (over 40 networks) classification rates on test set.

TABLE II. SAMPLE CONFUSION MATRIX FROM CLASSIFICATION NETWORK ON TRAINING SET

A classification network is initially trained for ten epochs using the LM algorithm [16]. An iterative loop is then entered in which a refined classification network and the counting network are each trained using gradient descent for the same number of epochs. Their classification and counting rates on the training and test sets are recorded on each pass through the loop using the variable percVal defined in the previous section. It is worth noting that we use the LM algorithm to train the initial classification networks by virtue of its fast convergence. The training of the final classification and counting networks in each fold uses gradient descent. An appropriate number of parameters in a network, and when training should stop, are difficult to determine [17], [18]. In the experiments, the training of a classification network is terminated when the sum of the squared errors is smaller than $10^{-6}$. Similarly, the training of a counting network is terminated when the SSCE is smaller than $10^{-6}$. As mentioned in the algorithm, the maximum number of epochs used in training both networks is 50. Overall, we trained and tested 360 classification networks and 360 counting networks (four folds, ten experiments, and nine different initial epochs).
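The training protocol above can be rendered as a skeleton that shows the control flow and bookkeeping. Here `train_lm`, `train_gd`, and `evaluate` are hypothetical stubs standing in for Levenberg-Marquardt training, gradient-descent training, and the percVal measurements; a "network" is reduced to an epoch counter purely for illustration.

```python
# Skeleton of the cross-validation training protocol. train_lm,
# train_gd, and evaluate are hypothetical stubs, not the actual
# network code; a "network" here is just an integer epoch counter.

def train_lm(net, epochs):
    # Placeholder for Levenberg-Marquardt training.
    return net + epochs

def train_gd(net, epochs):
    # Placeholder for gradient-descent training (class-coded or SSCE).
    return net + epochs

def evaluate(net):
    # Placeholder for measuring the 12 percVal rates.
    return {"scCoTe": float(net)}

perc_val = {}                                    # percVal[k, i, initEp]
for k in range(1, 11):                           # experiments 1..10
    for i in range(1, 5):                        # folds 1..4
        init_net = 0                             # fresh initialization
        init_net = train_lm(init_net, 10)        # ten initial LM epochs
        init_ep = 10
        while init_ep <= 50:
            class_net = train_gd(init_net, 50)   # refined classifier
            count_net = train_gd(init_net, 50)   # counting network
            perc_val[(k, i, init_ep)] = evaluate(count_net)
            if init_ep < 50:
                init_net = train_lm(init_net, 5) # five more LM epochs
            init_ep += 5

# 10 experiments x 4 folds x 9 initial-epoch settings = 360 runs
# for each network type, matching the count stated in the text.
print(len(perc_val))  # 360
```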
V. EXPERIMENTAL RESULTS

In the experiments, we use a network with one hidden layer containing 12 hidden neurons. This is the best architecture found in [7] for classification, where it achieved a 78% classification rate using tenfold cross validation. The learning parameters are set as follows: $\eta_{\mathrm{cnt}} = 10^{-6}$, $\eta_{\mathrm{class}} = 10^{-2}$; the activation function is the sigmoid. The classification network is trained to output 1 at the output node corresponding to


TABLE III. SAMPLE CONFUSION MATRIX FROM COUNTING NETWORK ON TRAINING SET

TABLE IV. SAMPLE CONFUSION MATRIX FROM CLASSIFICATION NETWORK ON TEST SET

TABLE V. SAMPLE CONFUSION MATRIX FROM COUNTING NETWORK ON TEST SET

Fig. 5. Average (over 40 networks) crisp counting rates on training set.

Fig. 6. Average (over 40 networks) sigma counting rates on training set.

Fig. 7. Average (over 40 networks) crisp counting rates on test set.

the actual class, and 0s at the other output nodes. In order to remove the effects of different initializations, each final rate is averaged over four folds and over ten experiments. We believe that this approach strengthens the conclusions drawn from the experiments comparing the networks' performances [19].

Figs. 3 and 4 depict the overall classification performance on the training and test sets, respectively. It is worth noting that we cannot show all confusion matrices here: there is one confusion matrix for each cross validation, which leads to a total of 360 confusion matrices for training and testing the classification and counting networks over ten independent experiments and nine different initial epochs. Tables II–V are samples of confusion matrices from an experiment with ten initial epochs in fourfold cross validation. Each element in a confusion matrix is the sum of the four elements from the four folds at that location. Therefore, if we sum the elements in a confusion matrix of the test set along a row, we obtain the total number of cells in that class. If we do so in a confusion matrix of the training set, the result is triple the total number of cells in that class. The plots shown in Figs. 3 and 4


should provide enough information on the overall classification performance of both networks. The plots of the overall crisp and sigma counting performances of both sets of networks are shown in Figs. 5–8.

Fig. 8. Average (over 40 networks) sigma counting rates on test set.

VI. DISCUSSION AND CONCLUSION

As we expected, Fig. 3 shows that the overall classification performance of the classification network is clearly better than that of the counting network, especially on the training set. From Fig. 4, on the test set, the classification network has better overall classification performance when the number of epochs used in the network initialization is small; the two have comparable performance when the number of epochs is large.

As shown in Figs. 5–8, the counting network has better counting performance for both the crisp and sigma counts, on both the training and test sets. The crisp count performed better than the sigma count. These results were achieved using extensive experimentation.

The reason that the crisp count yields better counting performance than the sigma count is that, for each input vector, the network assigns a count of one to the class with the maximum output. Therefore, the total number of cells from the crisp count is identical to the total of the true counts. On the other hand, the total number of cells from the sigma count generally differs from the total of the true counts. Hence, the sigma count is more likely to yield a worse counting rate than the crisp count.

In this particular problem, the main goal is to achieve accurate counts. Classification is only an indirect tool toward that goal. In this correspondence, we have shown that forming the objective function in terms of the main, system-level goal can increase the overall performance of NNs as modules in larger systems. This is a consistent and important theme in the design and implementation of recognition system applications.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their useful remarks.

REFERENCES

[1] L. W. Diggs, D. Sturm, and A. Bell, The Morphology of Human Blood Cells. Abbott Park, IL: Abbott Laboratories, 1985.
[2] V. Minnich, Immature Cells in the Granulocytic, Monocytic, and Lymphocytic Series. Chicago, IL: Amer. Soc. Clin. Pathol., 1982.
[3] M. Beksaç, M. S. Beksaç, V. B. Tipi, H. A. Duru, M. U. Karakas, and A. Nur Çakar, "An artificial intelligent diagnostic system on differential recognition of hematopoietic cells from microscopic images," Cytometry, vol. 30, pp. 145–150, 1997.
[4] H. Harms, H. Aus, M. Haucke, and U. Gunzer, "Segmentation of stained blood cell images measured at high scanning density with high magnification and high numerical aperture optics," Cytometry, vol. 7, pp. 522–531, 1986.
[5] J. Park and J. Keller, "Fuzzy patch label relaxation in bone marrow cell segmentation," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Orlando, FL, 1997, pp. 1133–1138.
[6] S. S. S. Poon, R. K. Ward, and B. Palcic, "Automated image detection and segmentation in blood smears," Cytometry, vol. 13, pp. 766–774, 1992.
[7] S. Sohn, "Bone marrow white blood cell classification," M.S. thesis, Univ. Missouri, Columbia, 1999.
[8] N. Theera-Umpon, "Morphological granulometric estimation with random primitives and applications to blood cell counting," Ph.D. dissertation, Univ. Missouri, Columbia, 2000.
[9] N. Theera-Umpon and P. D. Gader, "Counting white blood cells using morphological granulometries," J. Electron. Imag., vol. 9, no. 2, pp. 170–177, 2000.
[10] N. Theera-Umpon, E. R. Dougherty, and P. D. Gader, "Non-homothetic granulometric mixing theory with application to blood cell counting," Pattern Recognit., vol. 34, no. 12, pp. 2547–2560, 2001.
[11] J. Chiang and P. D. Gader, "Hybrid fuzzy-neural systems in handwritten word recognition," IEEE Trans. Fuzzy Syst., vol. 5, pp. 497–510, Nov. 1997.
[12] P. D. Gader, J. M. Keller, R. Krishnapuram, J. H. Chiang, and M. Mohamed, "Neural and fuzzy methods in handwriting recognition," IEEE Comput., vol. 30, pp. 79–86, Feb. 1997.
[13] D. Marr, Vision. San Francisco, CA: Freeman, 1982.
[14] S. Haykin, Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice-Hall, 1999.
[15] N. Morgan and H. Bourlard, "Generalization and parameter estimation in feedforward nets: Some experiments," in Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1990, pp. 630–637.
[16] H. Demuth and M. Beale, Neural Network Toolbox: For Use With MATLAB. Natick, MA: Mathworks, 1998.
[17] S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Comput., vol. 4, pp. 1–58, 1992.
[18] L. Prechelt, "Early stopping—But when?," in Neural Networks: Tricks of the Trade, G. B. Orr and K.-R. Müller, Eds. Berlin, Germany: Springer-Verlag, 1998, pp. 55–69.
[19] F. Provost, T. Fawcett, and R. Kohavi, "The case against accuracy estimation for comparing induction algorithms," in Proc. 15th Int. Conf. Machine Learning, Madison, WI, July 1998, pp. 445–453.

