2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI 2016)

Tuberculosis Bacteria Detection based on Random Forest using Fluorescent Images

Chi Zheng, Jingxin Liu
School of Computer Science, University of Nottingham, Ningbo, China

Guoping Qiu
School of Computer Science, University of Nottingham, UK & Ningbo, China

Abstract: Tuberculosis (TB) is an infectious disease that is prevalent in low- and middle-income countries. There are many physical examination methods for tuberculosis detection, but the most effective is visual examination under a microscope, using either fluorescent microscopy or bright field microscopy. According to previous research, fluorescent microscopy is on average 10% more sensitive than bright field microscopy. In this paper, we present a TB detection method based on Random Forest using fluorescent microscopic images. We conducted experiments with three types of classifiers: Random Forest (RF), linear SVM (LinSVM), and Cross-Validation SVM (CVSVM). The experimental results show that Random Forest achieves better performance for TB segmentation and detection on fluorescent images than the other two methods.

Keywords: Tuberculosis bacteria; Fluorescent microscopy; Random Forest; Image processing

978-1-5090-3710-0/16/$31.00 ©2016 IEEE

I. INTRODUCTION

Tuberculosis (TB) is one of the long-standing infectious diseases in developing countries and is caused by the bacillus Mycobacterium tuberculosis (M. tuberculosis) [1]. In practice, physical examinations play a significant role in tuberculosis bacteria detection [2]. Visual examination using a microscope has been suggested by the WHO as the preliminary and basic diagnostic technique for tuberculosis detection, as it is comparatively inexpensive, easy to perform and sufficiently accurate [3]. In clinical practice, there are two main methods of imaging samples that contain tuberculosis bacteria: fluorescent microscopy (FM) and bright field microscopy [4]. The two methods differ in their light sources and staining materials. As reported in a previous study, FM is on average 10% more sensitive than bright field microscopy, even though it is more expensive [5]. Therefore, many researchers and doctors prefer fluorescence microscopy to bright field microscopy.

In 1998, Veropoulos et al. published their work on the automated identification of tubercle bacilli [6]. In the first stage, they described their data preparation and image capture setup. The second stage is image processing and analysis: the Canny operator is applied for edge detection and the objects are then filtered by size. Finally, shape descriptors based on Fourier descriptors are extracted from the boundary of each object. For recognition, the authors compared three classification algorithms and, after analyzing sensitivity and specificity, concluded that the multi-layer neural network performed best.

In 2001, Manuel Forero-Vargas et al. proposed a method based on fuzzy rules and phase-only correlation techniques for automatic tuberculosis detection [7]. They described a segmentation method based on phase correlation, Canny edge detection and morphological operations on the green channel. Finally, threshold segmentation was used to eliminate some negative objects. Once the image was segmented, an identification process using area and eccentricity was performed to determine which objects were true bacilli. The following year, in order to characterize the clusters, they presented a method based on Gaussian Mixture Models (GMM) [8]. In the final classification stage, a minimum-error Bayesian classifier was adapted to detect TB bacilli.

In 2012, a group of scientists from the University of California at Berkeley proposed a method for TB diagnosis using fluorescence images from a mobile microscope, CellScope [9]. CellScope is a highly portable device developed by the Fletcher Lab at UC Berkeley. After analyzing previous related work, the authors applied two methods in parallel to identify TB-object candidates: a white top-hat transform and template matching with a Gaussian kernel. After feature calculation, a 102-dimensional feature vector was obtained to represent each candidate object. For the final classification stage, they adopted a discriminative approach based on logistic regression and support vector machines. They evaluated their algorithm at the object level and achieved a performance of 89.2% ± 2.1%.

In 2014, Selen Ayas and Murat Ekinci published a novel approach based on Random Forest (RF) for TB detection under a bright field microscope [10]. Their dataset of 116 images was collected from various patients. To characterize the features of each class, each object was resized and centrally translated. RF-based identification then learns the difference between bacilli and non-bacilli objects. As the classifier, the authors adopted Random Forest, which is an ensemble learning method. Experimental analysis showed that the RF-based learning method achieved a much better
result than other conventional classification approaches such as ANN and SVM.

Based on the above literature review, segmentation and identification play a significant role in automatic TB detection. Most researchers adopt color-based segmentation, and the selected features are commonly based on the appearance of the object. However, different classifiers have been adopted for different situations.

II. METHODOLOGY

Given the opportunity presented by our cooperation with Sunny Optics, we aim to develop an automated TB detection system for fluorescent microscopes. The first and most important part is to detect TB precisely. Inspired by the related literature, we structured our algorithm in three stages: (1) TB-object candidate segmentation, (2) feature representation of TB-object candidates and (3) discriminative classification of the TB-object candidates.

A. TB-Object Candidate Segmentation

In this step, the goal is to identify any bright object that is potentially a TB bacterium. We observed that the TB candidates appear in green with high intensity, whereas the background has a low and continuous intensity. We therefore adopt the flood fill algorithm to eliminate the background and extract the candidates. After analyzing the overall structure of the system, we decided to let users select the starting point and the target color by mouse click. Once the selection is recorded, the background is erased and the candidates are obtained for subsequent feature extraction.

B. Feature Representation of TB-Object Candidate

In the next stage, we extract the features of each TB-object candidate using Hu moment invariants, geometric and binary shape properties, and histograms of oriented gradients (HoG) features. The Hu moment and HoG features are extracted from the gray-scale images, whereas the geometric and binary shape properties are calculated from binary images. We calculate 7 Hu moment features, 9 geometric and binary shape properties and 80 HoG features, so each object is represented by a feature vector of 96 dimensions.

1) Hu Moment Invariants
Hu moment invariants provide a succinct object-level representation that is invariant to rotation, translation and scaling [11].

2) Geometric and Binary Shape Properties
We also include 9 geometric and binary shape properties to describe the shape of the TB-object candidates: area, convex area, eccentricity, equivalent diameter, extent, ratio between major and minor axis lengths, perimeter, solidity and the Euclidean distance from point to fitting line.

a) Euclidean distance from point to fitting line
Since TB-object candidates are rod shaped, we use the average Euclidean distance from point to fitting line as another shape feature. It is calculated in four steps: (1) thresholding the image into a binary image; (2) thinning the binary image to a one-pixel-wide curve; (3) fitting a straight line to the curve; (4) calculating the average distance from the points on the curve to the line. For thinning, we applied the method of Zhang and Suen, published in 1984 [12]. As shown in Fig. 1, the thinned curve of a positive TB-object candidate is well fitted by a line and is itself almost straight. Finally, the average Euclidean distance (ED) is calculated as follows:

$$ED = \frac{1}{N}\sum_{i=1}^{N} d_i \qquad (1)$$

where $d_i$ is the Euclidean distance from the $i$-th point on the thinned curve to the fitting line and $N$ is the number of points on the curve.

The average distance from the points on the thinned curve to the fitting line (long axis) is far smaller for positive candidates than for negative candidates, as shown in Fig. 2. Therefore, we use this Euclidean distance as our last geometric feature to classify the TB-object candidates.

Figure 1. Euclidean distances of the positive candidates. There are 5 steps to calculate this distance: (a) the original TB-object images, (b) binary images after thresholding, (c) the binary images with skeleton, (d) fitting the skeleton to a straight line (long axis), (e) average Euclidean distance from the points on the skeleton to the fitting line.

Figure 2. Euclidean distances of the negative candidates.

3) Histograms of Oriented Gradients
In the last stage, we incorporate histograms of oriented gradients, a feature descriptor used for object detection [13]. In our case, we extract HoG features from each patch at two scales, with one 48 × 48 pixel cell containing the entire patch and nine 16 × 16 pixel cells.
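The two-scale HoG layout described in this section (one 48 × 48 cell plus nine 16 × 16 cells, with 8 orientation bins per cell, giving 10 × 8 = 80 values) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names `cell_histogram` and `two_scale_hog` are hypothetical, and the choices of central-difference gradients, unsigned orientations, magnitude-weighted votes and no block normalization are assumptions that the paper does not specify.

```python
import math


def cell_histogram(patch, r0, c0, size, n_bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations
    for the square cell with top-left corner (r0, c0)."""
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            # Central differences, clamped at the patch border (assumption).
            gx = patch[r][min(c + 1, cols - 1)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, rows - 1)][c] - patch[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation folded into [0, pi) (assumption).
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    return hist


def two_scale_hog(patch, n_bins=8):
    """Concatenate one 48x48 cell with the nine 16x16 cells that tile it."""
    assert len(patch) == 48 and all(len(row) == 48 for row in patch)
    features = cell_histogram(patch, 0, 0, 48, n_bins)
    for r0 in range(0, 48, 16):
        for c0 in range(0, 48, 16):
            features.extend(cell_histogram(patch, r0, c0, 16, n_bins))
    return features


# Usage: a synthetic horizontal intensity ramp stands in for a gray-scale patch.
ramp = [[float(c) for c in range(48)] for _ in range(48)]
print(len(two_scale_hog(ramp)))  # 80
```

Because the nine 16 × 16 cells tile the 48 × 48 patch exactly, the fine scale captures where along the object the gradients lie, while the single coarse cell summarizes the overall orientation of the candidate.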

Within each cell, we bin the gradients into 8 orientation bins, resulting in a total of 80 gradient-based features.

C. Classification of TB-object Candidates

For the classification of TB-object candidates, a classifier based on Random Forest is adopted. Random Forest is one of the most widely used techniques in machine learning. Its principle is that a multilayer structure of decision trees is built during training and a decision score is output for classification or regression.

1) Training Random Forest
We extract the features of each object with the method described above, so that each object is represented by a feature vector. We then combine the vectors into one feature matrix, which is labeled and used as the training sample, as shown in Fig. 3.

Figure 3. Feature matrix construction for training.

After feature extraction, the feature matrix is passed to Random Forest to construct the decision forest and predict the class of each object under test. Random Forest was introduced by Leo Breiman and Adele Cutler [14] [15].

Figure 4. Training process based on Random Forest after feature extraction.

Classification with Random Forest works as follows: the random trees classifier is trained on the input feature vectors, which are taken from the feature matrix and represent the features of each object. A test feature vector is processed by every tree of the forest, and a score is calculated by voting among the trees, which determines the class label of the feature vector. At each node a new subgroup is generated, as shown in Fig. 4. In our case, we only need to decide whether the test object is TB or not, so the label of each group is 0 or 1.

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Dataset

A database of fluorescent microscopic images was constructed and used to evaluate the proposed method. The experimental setup, built at Sunny Optics, consists of a personal computer, a fluorescent microscope and a digital CMOS camera. Sample slides were illuminated using a fluorescent microscope designed and manufactured by Ningbo Sunny Instruments Co., Ltd. A CMOS camera with a resolution of 1440 × 1920 was attached for image acquisition. Applying our object segmentation procedure to TB-positive images, we retain 769 positive and 1664 negative objects. Sample positive and negative objects are shown in Fig. 5.

Figure 5. Objects after segmentation. Left: positive candidates. Right: negative candidates.

B. Performance Metrics

Following previous work on performance measurement, we calculate the sensitivity, specificity and accuracy based on the numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN), as shown in Fig. 6. Sensitivity measures the proportion of actual positive objects that are correctly identified, and specificity measures the proportion of actual negative objects that are correctly identified. These measures are given as follows [16]:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (3)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (4)$$

C. Object-level evaluation of different classifiers via two categories of features

We consider three types of discriminative classifiers: Random Forest (RF), linear SVM (LinSVM) and cross-validation SVM (CVSVM). To test the performance of our method with different features, we perform systematic ablation studies by dividing the features into two subsets: MGB (Hu
moments, geometric and binary shape properties) and HoG features. We train an RF classifier on each feature subset as well as on a combined feature set, in which we concatenate the MGB (Hu moments, Geometric and Binary shape properties) and HoG features to obtain a 96-dimensional feature vector.

Figure 6. Performance measurement based on the fraction between actual objects and classifier-predicted objects.

Figure 7. Object-level average accuracy of three classifiers via different feature combinations.

Figure 8. Object-level average sensitivity of three classifiers via different feature combinations.

Figure 9. Object-level average specificity of three classifiers via different feature combinations.

For the RF and CVSVM, the HoG feature subset is substantially more discriminative than the MGB feature subset. With the linear SVM, we see that the MGB and HoG features provide complementary information. Combining both MGB and HoG features enables us to achieve the highest object-level performance, higher than with MGB only or HoG only. The method based on Random Forest achieves higher performance (average accuracy, average sensitivity and average specificity) than the methods based on SVM.

D. Object-level evaluation of Random Forest classification via three training/testing sample proportions

We also consider algorithm performance at different training/testing sample proportions: 25%/75%, 50%/50% and 75%/25%. For each proportion, we randomly divide the labeled samples into training and testing groups five times and calculate the performance measures each time. Based on the average sensitivity, specificity and accuracy, we conjecture that the more training samples are given to Random Forest, the more accurate the result, as shown in Fig. 10.

Figure 10. Object-level test set AP across different training-testing fractions in Random Forest classification.
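The evaluation protocol used in this section (sensitivity, specificity and accuracy computed from TP, FP, FN and TN, averaged over five random training/testing splits at a given proportion) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: all function names are hypothetical, the data are synthetic, and `midpoint_threshold` is a deliberately trivial stand-in classifier used only so the sketch runs without external libraries. It is not a Random Forest.

```python
import random


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = TB, 0 = non-TB)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy.
    Assumes the test set contains at least one positive and one negative."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def repeated_split_evaluation(samples, labels, train_fraction, fit_predict,
                              repeats=5, seed=0):
    """Average the metrics over `repeats` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_train = int(round(train_fraction * len(samples)))
    totals = {"sensitivity": 0.0, "specificity": 0.0, "accuracy": 0.0}
    for _ in range(repeats):
        rng.shuffle(indices)
        train_idx, test_idx = indices[:n_train], indices[n_train:]
        y_pred = fit_predict([samples[i] for i in train_idx],
                             [labels[i] for i in train_idx],
                             [samples[i] for i in test_idx])
        result = metrics([labels[i] for i in test_idx], y_pred)
        for key in totals:
            totals[key] += result[key]
    return {key: value / repeats for key, value in totals.items()}


def midpoint_threshold(train_x, train_y, test_x):
    """Hypothetical stand-in classifier on a single scalar feature:
    threshold halfway between the two class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return [1 if x > threshold else 0 for x in test_x]


# Synthetic, well-separated scalar features (not the paper's data).
samples = [i / 10 for i in range(20)] + [5 + i / 10 for i in range(20)]
labels = [0] * 20 + [1] * 20
print(repeated_split_evaluation(samples, labels, 0.5, midpoint_threshold))
```

Passing the classifier as a `fit_predict` callable keeps the splitting and averaging logic independent of the model, so the same loop could be reused for each of the three classifiers and each of the three proportions.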

To test this conjecture, we extend the experiments to LinSVM and CVSVM. The parameters of CVSVM were estimated using 10-fold cross-validation. The estimated parameters and the resulting performance measures are listed in Table 1.

Table 1. Classification parameters of CVSVM

Optimum parameters   25%/75%   50%/50%   75%/25%
Optimum C            312.5     312.5     312.5
Optimum P            0         0         0
Optimum gamma        0.00225   0.00225   0.03375
Sensitivity          0.817     0.879     0.868
Specificity          0.886     0.928     0.957
Accuracy             0.852     0.903     0.913

As shown in Fig. 11 and Fig. 12, for both LinSVM and CVSVM the sensitivity, specificity and accuracy measures for the 75%/25% proportion are higher than for the other two proportions.

Figure 11. Object-level test set AP across different training-testing fractions in Linear SVM classification.

Figure 12. Object-level test set AP across different training-testing fractions in Cross-Validation SVM classification.

To compare computational efficiency, we also measured the training and testing times for the three proportions. As shown in Table 2, the more training samples are used, the more time training and testing take. In addition, the training cost of RF is the second highest and roughly two orders of magnitude lower than that of CVSVM, while its performance is the best of the three classifiers.

Table 2. Training and testing time across different numbers of training/testing samples (unit: ms)

          25%/75%           50%/50%           75%/25%
          Train    Test     Train    Test     Train    Test
RF        508      13.6     1176.2   21       1875.2   24.2
LinSVM    234      13.6     530      23       1972.2   27.6
CVSVM     36270    14.2     163373   23.8     360680   27.8

An array of test objects sorted by ascending Random Forest confidence score is shown in Fig. 13. As expected, the objects with the highest RF confidence scores exhibit the characteristic rod-like morphology of TB bacilli. Objects with low confidence scores are peculiar-looking and irregular.

Figure 13. Test set objects sorted by their RF output confidence scores in ascending order (column-wise, from left to right). Red boxes correspond to objects with higher confidence scores.

IV. CONCLUSION

This paper proposed a new Random Forest (RF)-based method for tuberculosis bacilli detection using fluorescent microscopic images, together with a feature extraction method combining Hu moments, geometric and binary shape properties and HoG features. The performance of the proposed algorithm was analyzed on a dataset consisting of 768 positive objects and 1664 negative objects obtained with our fluorescent microscopic image acquisition system. For performance measurement, three of the most common measures, namely sensitivity, specificity and accuracy, were applied. To compare the results of the proposed RF-based approach, two other very popular machine learning methods were also implemented
on the same dataset: LinSVM and CVSVM. This new TB detection method will be integrated into an automated diagnosis system for TB identification and subsequent automated focusing on the layer where the image of the TB bacilli is sharpest.

ACKNOWLEDGMENT

The project described here was supported by the Scientific & Technical Bureau, Ningbo, through a grant for the project “The research of new 3D HDR multi-focused images for technologies and application in digital microscopes”. This work is also supported by the Scientific & Technical Bureau, Ningbo (Project No. 2012B10055 and 2013D10008) and by the International Doctoral Innovation Centre (IDIC) at the University of Nottingham, Ningbo, China.

REFERENCES

[1] Santiago-Mozos R, Perez-Cruz F, Madden M, et al. An Automated Screening System for Tuberculosis[J]. Biomedical and Health Informatics, IEEE Journal of, 2014, 18(3): 855-862.
[2] Osman M K, Mashor M Y, Jaafar H. Performance comparison of clustering and thresholding algorithms for tuberculosis bacilli segmentation[C]//Computer, Information and Telecommunication Systems (CITS), 2012 International Conference on. IEEE, 2012: 1-5.
[3] Priya E, Srinivasan S. Automated decision support system for tuberculosis digital images using evolutionary learning machines[J]. European Journal for Biomedical Informatics, 2013, 9(2): 3-8.
[4] Khutlang R, Krishnan S, Dendere R, et al. Classification of Mycobacterium tuberculosis in images of ZN-stained sputum smears[J]. Information Technology in Biomedicine, IEEE Transactions on, 2010, 14(4): 949-957.
[5] Panicker R O, Soman B, Saini G, et al. A Review of Automatic Methods Based on Image Processing Techniques for Tuberculosis Detection from Microscopic Sputum Smear Images[J]. Journal of medical systems, 2016, 40(1): 1-13.
[6] Veropoulos K, Campbell C, Learmonth G, et al. The automated identification of tubercle bacilli using image processing and neural computing techniques[M]//ICANN 98. Springer London, 1998: 797-802.
[7] Forero-Vargas M G, Sierra-Ballen E L, Alvarez-Borrego J, et al. Automatic sputum color image segmentation for tuberculosis diagnosis[C]//International Symposium on Optical Science and Technology. International Society for Optics and Photonics, 2001: 251-261.
[8] Forero-Vargas M G, Sroubek F, Alvarez-Borrego J, et al. Segmentation, autofocusing and signature extraction of tuberculosis sputum images[C]//International Symposium on Optical Science and Technology. International Society for Optics and Photonics, 2002: 171-182.
[9] Chang J, Arbeláez P, Switz N, et al. Automated tuberculosis diagnosis using fluorescence images from a mobile microscope[M]. Springer Berlin Heidelberg, 2012.
[10] Ayas S, Ekinci M. Random forest-based tuberculosis bacteria classification in images of ZN-stained sputum smear samples[J]. Signal, Image and Video Processing, 2014, 8(1): 49-61.
[11] Hu M K. Visual pattern recognition by moment invariants[J]. Information Theory, IRE Transactions on, 1962, 8(2): 179-187.
[12] Zhang T Y, Suen C Y. A fast parallel algorithm for thinning digital patterns[J]. Communications of the ACM, 1984, 27(3): 236-239.
[13] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, 1: 886-893.
[14] Breiman L. Random forests[J]. Machine learning, 2001, 45(1): 5-32.
[15] Cutler D R, Edwards Jr T C, Beard K H, et al. Random forests for classification in ecology[J]. Ecology, 2007, 88(11): 2783-2792.
[16] Volpato V, Alshomrani B, Pollastri G. Accurate Ab Initio and Template-Based Prediction of Short Intrinsically-Disordered Regions by Bidirectional Recurrent Neural Networks Trained on Large-Scale Datasets[J]. International Journal of Molecular Sciences, 2015.
