Abstract—Tuberculosis (TB) is an infectious disease that remains prevalent in low- and middle-income countries. There are many physical examination methods for tuberculosis detection, but the most effective is visual examination using microscopes, including fluorescent microscopy and bright field microscopy. According to previous research, the method based on fluorescent microscopes yields on average 10% higher sensitivity than bright field microscopy. In this paper, we present a TB detection method based on Random Forest using fluorescent microscopic images. We have conducted experiments on three types of classifiers, namely Random Forest (RF), linear SVM (LinSVM), and Cross-Validation SVM (CVSVM). The experimental results show that Random Forest achieves better performance for TB segmentation and detection in fluorescent images than the other two methods.

Keywords-Tuberculosis bacteria; Fluorescent microscopy; Random Forest; Image processing

I. INTRODUCTION
Tuberculosis (TB) is one of the long-standing infectious diseases in developing countries and is caused by the bacillus Mycobacterium tuberculosis (M. tuberculosis) [1]. In practice, physical examinations play a significant role in the detection of tuberculosis bacteria [2]. In particular, visual examination using a microscope has been recommended by the WHO as the preliminary and basic diagnostic technique for tuberculosis detection, as it is comparatively inexpensive, easy to perform and sufficiently accurate [3]. In clinical application, there are two main methods of imaging samples that contain tuberculosis bacteria: fluorescent microscopy (FM) and bright field microscopy [4]. The two methods differ in their light sources and in their staining materials. As reported in a previous study, FM is on average 10% more sensitive than bright field microscopy, even though it is more expensive [5]. Therefore, many researchers and doctors prefer fluorescence microscopy to bright field microscopy.

In 1998, Veropoulos et al. published their work on the automated identification of tubercle bacilli [6]. In the first stage of their work, they introduced their data preparation and image capture setup. The second stage is image processing and analysis: first, the Canny operator is applied for edge detection and the resulting objects are filtered by size; then, shape descriptors based on Fourier descriptors are extracted from the boundary of each object. For recognition, the authors compared three different classification algorithms and, after analyzing their sensitivity and specificity, concluded that the multi-layer neural network had the best performance.

In 2001, Manuel Forero-Vargas et al. proposed a method based on fuzzy rules and phase-only correlation techniques for automatic tuberculosis detection [7]. They described a segmentation method based on phase correlation, Canny edge detection and morphological operations on the green channel. Finally, threshold segmentation was used to eliminate some negative objects. Once the image was segmented, an identification step using area and eccentricity determined which of the objects were true bacilli. The following year, to characterize the clusters, they presented a method based on Gaussian Mixture Models (GMM) [8]; for the final classification stage, a minimum-error Bayesian classifier was adopted to detect TB bacilli.

In 2012, a group of researchers from the University of California, Berkeley proposed a method for TB diagnosis using fluorescence images from a mobile microscope, CellScope [9]. CellScope is a highly portable device developed by the Fletcher Lab at UC Berkeley. After analyzing previous related work, the authors applied two methods in parallel to identify TB-object candidates: a white top-hat transform and template matching with a Gaussian kernel. After feature calculation, a 102-dimensional feature vector was obtained to represent each potential object. For the final classification stage, they adopted a discriminative approach based on logistic regression and support vector machines. They evaluated their algorithm at the object level and reported a performance of 89.2% ± 2.1%.

In 2014, Selen Ayas and Murat Ekinci published a novel approach based on Random Forest (RF) for TB detection under a bright field microscope [10]. The dataset they used, comprising 116 images, was collected from various patients. To characterize the features of each class, each object was resized and centrally translated. Afterwards, RF-based identification was used to learn the difference between bacilli and non-bacilli objects. As the classifier, the authors adopted Random Forest, which is an ensemble learning method. Experimental analysis has shown that the RF-based learning method has achieved a much better
Within each cell, we bin the gradients into 8 orientation bins, resulting in a total of 80 gradient-based features.

Figure 3. Feature matrix construction for training.

C. Classification of TB-object Candidates
For the classification of TB objects, a classifier based on Random Forest is adopted. Random Forest is one of the most widely used techniques in machine learning. Its principle is that an ensemble of decision trees is built during training, and the outputs of the trees are combined into a decision score (by voting for classification, or by averaging for regression).

1) Training Random Forest
We extract the features of each object with the method described above, so that each object is represented by a feature vector. We then combine these vectors into one feature matrix, which is labeled as the training sample set, as shown in Fig. 3.
After feature extraction, the feature matrix is passed to the Random Forest, which constructs the decision forest and predicts the class of each object to be tested. Random Forest was introduced by Leo Breiman and Adele Cutler [14], [15].

Figure 4. Training process based on Random Forest after feature extraction.

The classification of Random Forest works as follows: the random tree classifiers are trained on the input feature vectors, which are extracted from the feature matrix and represent the features of each object. When a feature vector is processed by every tree of the forest, a score is calculated by voting among the trees, and this score determines the class label of the feature vector. At each node, a new subgroup is generated, as shown in Fig. 4. In our case, we only need to decide whether the test object is TB or not; therefore, the label of each group is either 0 or 1.

III. EXPERIMENTAL RESULTS AND DISCUSSION
A. Dataset
A database of fluorescent microscopic images was constructed and used to evaluate the proposed method. The experimental setup was built at Sunny Optics and consists of a personal computer, a fluorescent microscope and a digital CMOS camera. Sample slides were illuminated using a fluorescent microscope designed and manufactured by Ningbo Sunny Instruments Co., Ltd. A CMOS camera with a resolution of 1440 × 1920 was attached for image acquisition. Applying our object segmentation procedure to TB-positive images, we retain 769 positive and 1664 negative objects. Sample positive and negative objects are shown in Fig. 5.

Figure 5. Objects after segmentation. Left: positive candidates. Right: negative candidates.

B. Performance Metrics
Following previous work on performance measurement, we calculate the sensitivity, specificity and accuracy from the numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN), as shown in Fig. 6. Sensitivity measures the proportion of actual positive objects that are correctly identified, and specificity measures the proportion of actual negative objects that are correctly identified. These measures are given as follows [16]:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (3)$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

C. Object-level evaluation of different classifiers via two categories of features
We also consider three types of discriminative classifiers: Random Forest (RF), linear SVM (LinSVM) and cross-validation SVM (CVSVM). To test the performance of our method with different features, we perform systematic ablation studies by dividing the features into two subsets: MGB (Hu moments, geometric and binary shape properties) and HoG features. We train RF classifiers on each feature subset as well as on a combined feature set, in which we concatenate the MGB and HoG features to obtain a 96-dimensional feature vector.
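The per-cell HoG binning and the MGB + HoG concatenation described above can be illustrated with the minimal sketch below. Only the 8 orientation bins per cell, the 80 HoG features and the 96-dimensional combined vector come from the text. The cell size, the unsigned 0 to 180 degree orientation range, central-difference gradients, magnitude-weighted voting without interpolation or block normalization, and the 16-element MGB vector (inferred from 96 - 80) are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cell_orientation_histograms(img: Sequence[Sequence[float]], cell_h: int,
                                cell_w: int, n_bins: int = 8) -> List[float]:
    """Return one n_bins-bin gradient-orientation histogram per cell, concatenated.

    Assumptions (not from the paper): central-difference gradients, border
    pixels skipped, unsigned orientations in [0, 180) degrees, and each pixel
    adding its gradient magnitude to a single bin (no interpolation and no
    block normalization).
    """
    rows, cols = len(img), len(img[0])
    n_cells_y, n_cells_x = rows // cell_h, cols // cell_w
    hist = [[0.0] * n_bins for _ in range(n_cells_y * n_cells_x)]
    bin_width = 180.0 / n_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            cy, cx = y // cell_h, x // cell_w
            if cy >= n_cells_y or cx >= n_cells_x:
                continue  # pixel falls outside the last complete cell
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle // bin_width), n_bins - 1)  # guard against 180.0
            hist[cy * n_cells_x + cx][b] += math.hypot(gx, gy)
    return [value for cell in hist for value in cell]


def combine_features(mgb: Sequence[float], hog: Sequence[float]) -> List[float]:
    """Concatenate the shape (MGB) features and the HoG features."""
    return list(mgb) + list(hog)


# Hypothetical 20 x 40 object patch split into 2 x 5 cells of 10 x 8 pixels:
# 10 cells x 8 bins = 80 HoG features, plus 16 placeholder MGB features = 96.
patch = [[float((x * y) % 7) for x in range(40)] for y in range(20)]
hog = cell_orientation_histograms(patch, cell_h=10, cell_w=8)
features = combine_features([0.0] * 16, hog)
print(len(hog), len(features))  # 80 96
```

Any cell grid with 10 cells reproduces the 80-feature count; the grid actually used is not recoverable from this section.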
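The forest voting step of Section II.C and the sensitivity, specificity and accuracy of Eqs. (2) to (4) can be sketched as follows. This is a minimal illustration assuming hard 0/1 votes from each tree, with ties resolved towards the TB class; the function names and toy data are hypothetical. Some libraries (for example scikit-learn's RandomForestClassifier) average per-tree class probabilities instead of counting hard votes.

```python
from typing import Dict, List, Sequence, Tuple


def forest_vote(tree_votes: Sequence[int]) -> Tuple[float, int]:
    """Combine the 0/1 votes of all trees for one object.

    The score is the fraction of trees voting TB (label 1); the label is the
    majority vote, with ties resolved towards TB (an arbitrary choice here).
    """
    score = sum(tree_votes) / len(tree_votes)
    return score, int(score >= 0.5)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN and TN for binary labels (1 = TB, 0 = non-TB)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def performance(counts: Dict[str, int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy as in Eqs. (2) to (4)."""
    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy example: five trees vote on each of four test objects.
votes_per_object: List[List[int]] = [[1, 1, 0, 1, 1], [0, 0, 1, 0, 0],
                                     [1, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
y_true = [1, 0, 1, 0]
y_pred = [forest_vote(v)[1] for v in votes_per_object]
print(performance(confusion_counts(y_true, y_pred)))
# {'sensitivity': 0.5, 'specificity': 1.0, 'accuracy': 0.75}
```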
To test this further, we also extend the experiments to LinSVM and CVSVM. The parameters of CVSVM were selected by 10-fold cross-validation. Table 1 lists the estimated optimum parameters and the performance measures obtained when these parameters were used.

To compare computational efficiency, we also measured the training and testing times for the three training/testing splits. As shown in Table 2, both training and testing take more time as the number of training samples increases. RF has the second-highest training time in the 25%/75% and 50%/50% splits (and is slightly faster than LinSVM in the 75%/25% split), and its training time is roughly two orders of magnitude lower than that of CVSVM (about 70 to 190 times lower across the three splits), while its classification performance is the best of the three classifiers.

Table 1. Classification parameters of CVSVM
Training/testing split   25%/75%   50%/50%   75%/25%
Optimum C                312.5     312.5     312.5
Optimum P                0         0         0
Optimum gamma            0.00225   0.00225   0.03375
Sensitivity              0.817     0.879     0.868
Specificity              0.886     0.928     0.957
Accuracy                 0.852     0.903     0.913

Table 2. Training and testing time across different numbers of training/testing samples (unit: ms)

Training/testing split   25%/75%          50%/50%          75%/25%
                         Train    Test    Train    Test    Train    Test
RF                       508      13.6    1176.2   21      1875.2   24.2
LinSVM                   234      13.6    530      23      1972.2   27.6
CVSVM                    36270    14.2    163373   23.8    360680   27.8
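The optimum CVSVM values in Table 1 (C = 312.5; gamma = 0.00225 and 0.03375) lie on geometric grids of the form start · step^k: 0.1 · 5^5 = 312.5, 10^-5 · 15^2 = 0.00225 and 10^-5 · 15^3 = 0.03375. The sketch below shows a grid search with 10-fold cross-validation over such grids. The grid bounds are assumptions (to our understanding they match the default SVM grids used by OpenCV's SVM auto-training, but the text does not name the library), the contiguous unshuffled folds are a simplification, and `cv_score` is a hypothetical callback that would train an SVM on nine folds, validate on the tenth, and return the mean accuracy over all ten rounds.

```python
import math
from typing import Callable, List, Optional, Tuple


def log_grid(start: float, stop: float, step: float) -> List[float]:
    """Return start * step**k for k = 0, 1, 2, ... while the value is below stop."""
    values, v = [], start
    while v < stop:
        values.append(v)
        v *= step
    return values


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1.

    Real use would shuffle (and possibly stratify) the samples first.
    """
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(cv_score: Callable[[float, float], float],
                c_grid: List[float], gamma_grid: List[float]) -> Tuple[float, float, float]:
    """Exhaustively evaluate every (C, gamma) pair and return (C, gamma, score) of the best."""
    best: Optional[Tuple[float, float, float]] = None
    for c in c_grid:
        for gamma in gamma_grid:
            score = cv_score(c, gamma)
            if best is None or score > best[2]:
                best = (c, gamma, score)
    assert best is not None, "parameter grids must not be empty"
    return best


def toy_score(c: float, gamma: float) -> float:
    # Hypothetical stand-in for a real cross-validated accuracy; it peaks at
    # C = 312.5 and gamma = 0.00225 purely for demonstration.
    return -abs(math.log(c / 312.5)) - abs(math.log(gamma / 0.00225))


c_grid = log_grid(0.1, 500.0, 5.0)      # 0.1, 0.5, 2.5, 12.5, 62.5, 312.5
gamma_grid = log_grid(1e-5, 0.6, 15.0)  # about 1e-5, 1.5e-4, 2.25e-3, 0.03375, 0.50625
folds = k_fold_indices(1000, k=10)      # hypothetical training set of 1000 objects
best_c, best_gamma, _ = grid_search(toy_score, c_grid, gamma_grid)
```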
As shown in Fig. 11 and Fig. 12, for both LinSVM and CVSVM the specificity and accuracy in the 75%/25% split are higher than in the other two splits. Sensitivity follows the same trend except for CVSVM, whose sensitivity in Table 1 is slightly higher in the 50%/50% split (0.879) than in the 75%/25% split (0.868).

An array of test objects sorted by ascending Random Forest confidence score is shown in Fig. 13. As expected, the objects with the highest RF confidence scores exhibit the characteristic rod-like morphology of TB bacilli, whereas objects with low confidence scores are peculiar-looking and irregular.
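A display like the one described for Fig. 13 only needs a per-object RF confidence score (for example the fraction of trees voting TB) and a sort. The minimal sketch below assumes such scores are already available; the object identifiers, scores and helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_by_confidence(object_ids: Sequence[str],
                       scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (object id, score) pairs in ascending order of RF confidence."""
    if len(object_ids) != len(scores):
        raise ValueError("one score is needed per object")
    return sorted(zip(object_ids, scores), key=lambda pair: pair[1])


def extremes(ranked: List[Tuple[str, float]], k: int) -> Tuple[List[str], List[str]]:
    """Return the ids of the k least TB-like and the k most TB-like objects."""
    ids = [object_id for object_id, _ in ranked]
    return ids[:k], ids[len(ids) - k:]


ranked = rank_by_confidence(["obj_a", "obj_b", "obj_c", "obj_d"], [0.9, 0.1, 0.55, 0.3])
low, high = extremes(ranked, k=2)
print(low, high)  # ['obj_b', 'obj_d'] ['obj_c', 'obj_a']
```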
on the same dataset, namely LinSVM and CVSVM. This new TB detection method will be integrated into an automated diagnosis system for TB identification and for subsequent automatic focusing on the layer in which the image of the TB bacilli has the highest sharpness.

ACKNOWLEDGMENT
The project described here was supported by the Scientific & Technical Bureau of Ningbo under a grant for the project “The research of new 3D HDR multi-focused images for technologies and application in digital microscopes”. This work is also supported by the Scientific & Technical Bureau of Ningbo (Project Nos. 2012B10055 and 2013D10008) and by the International Doctoral Innovation Centre (IDIC) at the University of Nottingham, Ningbo, China.

REFERENCES
[1] Santiago-Mozos R, Perez-Cruz F, Madden M, et al. An automated screening system for tuberculosis[J]. IEEE Journal of Biomedical and Health Informatics, 2014, 18(3): 855-862.
[2] Osman M K, Mashor M Y, Jaafar H. Performance comparison of clustering and thresholding algorithms for tuberculosis bacilli segmentation[C]//2012 International Conference on Computer, Information and Telecommunication Systems (CITS). IEEE, 2012: 1-5.
[3] Priya E, Srinivasan S. Automated decision support system for tuberculosis digital images using evolutionary learning machines[J]. European Journal for Biomedical Informatics, 2013, 9(2): 3-8.
[4] Khutlang R, Krishnan S, Dendere R, et al. Classification of Mycobacterium tuberculosis in images of ZN-stained sputum smears[J]. IEEE Transactions on Information Technology in Biomedicine, 2010, 14(4): 949-957.
[5] Panicker R O, Soman B, Saini G, et al. A review of automatic methods based on image processing techniques for tuberculosis detection from microscopic sputum smear images[J]. Journal of Medical Systems, 2016, 40(1): 1-13.
[6] Veropoulos K, Campbell C, Learmonth G, et al. The automated identification of tubercle bacilli using image processing and neural computing techniques[M]//ICANN 98. Springer London, 1998: 797-802.
[7] Forero-Vargas M G, Sierra-Ballen E L, Alvarez-Borrego J, et al. Automatic sputum color image segmentation for tuberculosis diagnosis[C]//International Symposium on Optical Science and Technology. International Society for Optics and Photonics, 2001: 251-261.
[8] Forero-Vargas M G, Sroubek F, Alvarez-Borrego J, et al. Segmentation, autofocusing and signature extraction of tuberculosis sputum images[C]//International Symposium on Optical Science and Technology. International Society for Optics and Photonics, 2002: 171-182.
[9] Chang J, Arbeláez P, Switz N, et al. Automated tuberculosis diagnosis using fluorescence images from a mobile microscope[M]. Springer Berlin Heidelberg, 2012.
[10] Ayas S, Ekinci M. Random forest-based tuberculosis bacteria classification in images of ZN-stained sputum smear samples[J]. Signal, Image and Video Processing, 2014, 8(1): 49-61.
[11] Hu M K. Visual pattern recognition by moment invariants[J]. IRE Transactions on Information Theory, 1962, 8(2): 179-187.
[12] Zhang T Y, Suen C Y. A fast parallel algorithm for thinning digital patterns[J]. Communications of the ACM, 1984, 27(3): 236-239.
[13] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005). IEEE, 2005, 1: 886-893.
[14] Breiman L. Random forests[J]. Machine Learning, 2001, 45(1): 5-32.
[15] Cutler D R, Edwards Jr T C, Beard K H, et al. Random forests for classification in ecology[J]. Ecology, 2007, 88(11): 2783-2792.
[16] Volpato V, Alshomrani B, Pollastri G. Accurate ab initio and template-based prediction of short intrinsically-disordered regions by bidirectional recurrent neural networks trained on large-scale datasets[J]. International Journal of Molecular Sciences, 2015.