Medical Image Recognition, Segmentation and Parsing: Machine Learning and Multiple Object Approaches
Ebook · 1,139 pages · 11 hours


About this ebook

This book describes the technical problems and solutions for automatically recognizing and parsing a medical image into multiple objects, structures, or anatomies. It covers all the key methods, including state-of-the-art approaches based on machine learning, for recognizing or detecting, and parsing or segmenting, a cohort of anatomical structures from a medical image.

Written by top experts in Medical Imaging, this book is ideal for university researchers and industry practitioners in medical imaging who want a complete reference on key methods, algorithms and applications in medical image recognition, segmentation and parsing of multiple objects.

Learn:

  • Research challenges and problems in medical image recognition, segmentation and parsing of multiple objects
  • Methods and theories for medical image recognition, segmentation and parsing of multiple objects
  • Efficient and effective machine learning solutions based on big datasets
  • Selected applications of medical image parsing using proven algorithms
  • Provides a comprehensive overview of state-of-the-art research on medical image recognition, segmentation, and parsing of multiple objects
  • Presents efficient and effective approaches based on machine learning paradigms to leverage the anatomical context in the medical images, best exemplified by large datasets
  • Includes algorithms for recognizing and parsing of known anatomies for practical applications
Language: English
Release date: Dec 11, 2015
ISBN: 9780128026762
Author

S. Kevin Zhou

S. Kevin Zhou, PhD is dedicated to research on medical image computing, especially analysis and reconstruction, and its applications in real practice. Currently, he is a Distinguished Professor and Founding Executive Dean of the School of Biomedical Engineering, University of Science and Technology of China (USTC), and directs the Center for Medical Imaging, Robotics, Analytic Computing and Learning (MIRACLE). Dr. Zhou was a Principal Expert and a Senior R&D Director at Siemens Healthcare Research. He has been elected a fellow of AIMBE, IAMBE, IEEE, MICCAI, and NAI, and serves the MICCAI society as a board member and treasurer.


    Book preview

    Medical Image Recognition, Segmentation and Parsing - S. Kevin Zhou


    Chapter 1

    Introduction to Medical Image Recognition, Segmentation, and Parsing

    S. Kevin Zhou    Medical Imaging Technologies, Siemens Healthcare Technology Center, Princeton, NJ, USA

    Abstract

    We introduce a probabilistic formulation that unifies medical image recognition, segmentation, and parsing into one modeling framework based on a rough-to-exact shape representation. We then present schemes to decompose a highly complex problem into several simple subproblems, leading to a general-purpose computational pipeline. To build more effective models, we leverage machine learning methods, especially discriminative learning, to capture the anatomical context embedded in the medical image. We then provide a short review of commonly used discriminative methods and finally discuss a few classical segmentation algorithms for segmenting a single object.

    Keywords

    Medical image segmentation

    Medical image recognition

    Medical image parsing

    Probabilistic and simple-to-complex modeling

    Shape representation

    Rough-to-exact representation

    Machine learning

    Multiple objects

    Anatomical context

    Chapter Outline

    1.1 Introduction

    1.2 Challenges and Opportunities

    1.3 Rough-to-Exact Object Representation

    1.4 Simple-to-Complex Probabilistic Modeling

    1.4.1 Chain Rule

    1.4.2 Bayes’ Rule and the Equivalence of Probabilistic Modeling and Energy-Based Method

    1.4.3 Practical Medical Image Recognition, Segmentation, and Parsing Algorithms

    1.5 Medical Image Recognition Using Machine Learning Methods

    1.5.1 Object Detection and Context

    1.5.2 Machine Learning Methods

    1.5.2.1 Classification

    1.5.2.2 Regression

    1.6 Medical Image Segmentation Methods

    1.6.1 Simple Image Segmentation Methods

    1.6.2 Active Contour Method

    1.6.3 Variational Methods

    1.6.4 Level Set Methods

    1.6.5 Active Shape Models and Active Appearance Models

    1.6.6 Graph Cut Method

    1.7 Conclusions

    Recommended Notations

    References

    1.1 Introduction

    Medical image recognition, segmentation, and parsing are essential topics of medical image analysis. Medical image recognition is about recognizing which objects are inside a medical image. In principle, it is not necessary to detect or localize the objects for object recognition; but in practice, often it is beneficial to associate object recognition with object detection or localization. Once the object is recognized or detected using, say, a bounding box, medical image segmentation further concerns finding the exact boundary of the object in a medical image. When there are multiple objects in the images, segmentation of multiple objects becomes medical image parsing that, in the most general form, assigns semantic labels to pixels in a 2D image or voxels in a 3D volume. By grouping the pixels or voxels with the same label, segmentation is realized.

    Effective and efficient methods for medical image recognition, segmentation, and parsing bring a multitude of important clinical benefits. Below, we highlight the benefits to the imaging scanner, image reading, and advanced quantification and modeling.

    • Scanner. Because the computed tomography (CT) or magnetic resonance imaging (MRI) scanner offers many configuration possibilities or imaging protocols, it is challenging to produce consistent and reproducible images of high quality across patients, and this is only possible if the scanning is personalized with respect to a patient. High scanning throughput is also of interest for cost saving. Protecting patients from unnecessary radiation from the CT scanner is a major concern. An ideal diagnostic CT scan should be personalized to image only the target region of a given patient, no more (to reduce dose) and no less (to avoid missing information). Therefore, efficient detection of organs from a scout image enables personalized scanning at a reduced dose, saves exam time and cost, and increases the consistency and reproducibility of the exam.

    • Image reading for diagnosis, therapy, and surgery planning. During image reading, when searching for disease in a specific organ or body region, a radiologist needs to navigate the volume to the right location. Further, after a disease is found, he or she needs to report the finding. Medical image parsing enables structured reading and reporting for a streamlined work flow, thereby improving image reading outcomes in terms of accuracy, reproducibility, and efficiency. Finally, in radiation therapy, intervention procedures, and orthopedic surgery, medical image parsing is a prerequisite in the planning phase.

    • Advanced quantification and modeling. Clinical measurements such as organ volumes are important for quantitative disease diagnosis. But it is time-consuming for a physician to identify the target object especially in 3D and perform quantitative measurements without the aid of an intelligent postprocessing software system. Automatic image parsing also overcomes the difficulty in reproducing the measurement even when reading the same image for the second time. Finally, with 3D objects segmented as boundary conditions, more advanced modeling that simulates biomechanical or hemodynamical processes is feasible.

    The holy grail of a medical image parsing system is that its parsing complexity matches that of the Foundational Model of Anatomy (FMA) ontology, which is concerned with the representation of classes or types and relationships necessary for the symbolic representation of the phenotypic structure of the human body in a form that is understandable to humans and is also navigable, parsable, and interpretable by machine-based systems. As one of the largest computer-based knowledge sources in the biomedical sciences, it contains approximately 75,000 classes, over 120,000 terms, and over 2.1 million relationship instances from over 168 relationship types that link the FMA classes into a coherent symbolic model. A less complex representation is Terminologia Anatomica, the international standard of human anatomic terminology, which covers about 7500 human gross (macroscopic) anatomical structures.

    Current medical image recognition, segmentation, and parsing methods are far behind the holy grail, concerning mostly the following semantic objects:

    • Anatomical landmarks. An anatomical landmark is a distinct point in a body scan that coincides with anatomical structures, such as liver top, aortic arch, pubic symphysis, to name a few.

    • Major organs. Examples of major organs include liver, lungs, kidneys, spleen, prostate, bladder, rectum, etc.

    • Major bones. Examples of major bones include ribs, vertebrae, pelvis, femur, tibia, fibula, skull, mandible, hand and foot bones, etc.

    • Lesions, nodules, and nodes. Examples include liver and kidney lesions, lung nodules, lymph nodes, etc.

    1.2 Challenges and Opportunities

    Medical image recognition, segmentation, and parsing confront many challenges in producing results usable in clinical applications. The main challenge is that anatomical objects exhibit significant shape and appearance variations caused by a multitude of factors:

    • Sensor noise/artifact. As in any sensor, medical equipment generates noise/artifact inherent to its own physical sensor and image formation process. The extent of the artifact depends on image modality and imaging configuration. For example, while high-dose CT produces images with fewer artifacts, low-dose CT is quite noisy. Also, metal objects (such as implants) can generate a lot of artifacts in CT. In MRI scans, artifacts are generated due to inhomogeneous magnetic field, gradient nonlinearity, etc.

    • Patient difference and motion. Different patients exhibit different build forms: fat or slim, tall or short, adult or child, etc. As a result, the anatomical structures also exhibit different shapes. Also, patients undergo motions from respiration, cardiac cycle, blood and cerebrospinal fluid flow, peristalsis and swallowing, and voluntary movement, all contributing to the creation of different images, causing anatomical shape deformation.

    • Pathology, surgery, and contrast agents. Pathology can give rise to highly deformed anatomical structures or even missing ones with varying appearances and shapes. This makes statistical modeling very difficult. To better understand the pathological conditions, contrast agents are utilized to better visualize the anatomical morphology. Image appearances under different contrast phases are different. Finally, a surgical resection completely changes the shape and image appearance of anatomical object(s) in an unexpected manner.

    • Partial scan and field of view. Radiation dose is a major concern in CT. In an effort to minimize the dose, only the necessary part of the human body is imaged. This creates partial scans and narrow fields of view, in which the anatomical context is highly weakened or totally gone. As a result, landmarks or organs may be missing or only partially visible. In MRI, the scan range is often minimized for fast acquisition.

    • Soft tissue. Anatomical structures such as internal organs are soft tissues with similar properties. They (such as liver and kidney) might even touch each other, forming a very weak boundary between them. But, it is a must that the segmented organs be nonoverlapping.

    Figure 1.1 (a) shows 3D CT scans with different sources of appearance variation and Figure 1.1 (b) displays CT examples of various pathologies and conditions associated with a knee joint.

    Figure 1.1 (a) Example of CT images with different body regions, severe pathologies, contrast agents, weak contrast, etc. (b) Example of CT images with various knee pathologies and conditions. From left to right, top to bottom: Touch between femur and tibia, metallic implant inside femur, femur with major defects, osteoporosis, osteoporosis with minor femur defects, and touch between femur and patella.

    Another challenge lies in stringent accuracy, robustness, and speed requirements arising from real clinical applications. Image reading and diagnosis allow almost no room for mistakes. Despite the high accuracy and robustness requirements, the demand for speedy processing does not diminish. A speedy work flow is crucial to any radiology lab that strives for high throughput. Few radiologists or physicians can wait for hours or even minutes to obtain the analysis results.

    To build effective and efficient algorithms to tackle these challenges, one has to exploit the opportunities with leverage. There are two main opportunities:

    • Large database. There is a deluge of medical scans. Take CT scans, for example. In 2005, approximately 57 million individuals in the USA received CT exams. By 2012, the number of annual CT exams rose to over 85 million. A database this large can be expected to exhibit, with statistical significance, the appearance variations commonly found in patients.

    • Anatomical context. Unlike natural scene images, medical images manifest strong contextual information, such as a limited number of anatomical objects (say only one left ventricle), constrained and structured background, the relationship between different anatomies, strong prior information about the pose parameter, etc.

    In light of these opportunities, statistical machine learning methods that exploit such contextual information exemplified by a large number of data sets are highly desired. This whole book is dedicated to approaches based on machine learning. It also covers approaches that cope with multiple objects.

    1.3 Rough-to-Exact Object Representation

    Any intelligent system starts from a sensible knowledge representation (KR). The most fundamental role that a KR plays (Davis et al., 1993) is that it is a surrogate, a substitute for the thing itself. This leads to the so-called fidelity question: how close is the surrogate to the real thing? The only completely accurate representation of an object is the object itself. All other representations are inaccurate; they inevitably contain simplifying assumptions and possibly artifacts.

    In the literature, there are many representations that approximate a medical object or anatomical structure using different simplifying assumptions. Figure 1.2 shows a variety of shape representations commonly used in the literature.

    Figure 1.2 A graphical illustration of different shape representations using a 2D shape as an example. (a) Rigid representation: translation only, t = [tx, ty]. (b) Rigid representation: full pose θ = [tx, ty, r, sx, sy]. (c) Free-form representation: a point-based shape. (d) Free-form representation: a 2D binary mask function ϕ(x, y). (e) Free-form representation: a 2D real-valued level set function ϕ(x, y). (f) Low-dimensional parametric representation: a statistical shape model.

    • Rigid representation. The simplest representation is to translate a template to the object center t = [tx,ty,tz] as shown in Figure 1.2 (a). In other words, only the object center is considered. A complete rigid representation in Figure 1.2 (b) consists of translation, rotation, and scale parameters θ = [t,r,s]. When the scale parameter is isotropic, this reduces to a similarity transformation. An extension of rigid representation is affine representation.

    • Free-form representation. Common free-form representations, shown in Figure 1.2 (c)-(e), include a point-based shape (a set of points), a mask function ϕ(x, y, z), a level set function ϕ(x, y, z), etc.

    • Low-dimensional parametric representation. The so-called statistical shape model (SSM) (Heimann and Meinzer, 2009) shown in Figure 1.2 (f) is a common low-dimensional parametric representation based on principal component analysis (PCA) of a point-based free-form shape. Other low-dimensional parametric representations include M-rep (Pizer et al., 2003), spherical harmonics (SPHARM) (Shen et al., 2009), spherical wavelets (Nain et al., 2006), etc.

    A KR also is a medium for pragmatically efficient computation (Davis et al., 1993). Therefore, it is beneficial to adopt a hierarchical, rough-to-exact representation that gradually approximates the object itself with increasing precision, which also makes computational reasoning more amenable and efficient as shown later.

    A common rough-to-exact 3D object representation (Zheng et al., 2008; Zhou, 2010; Kohlberger et al., 2011; Wu et al., 2014) consists of a rigid part fully specified by translation, rotation, and scale parameters θ = [t, r, s]; a low-dimensional parametric part, such as the top PCA coefficients λ = [λ1:m] from the PCA shape space; and a free-form part, such as a 3D mesh, mask, or level set function ϕ.

       O = [θ, λ, ϕ]   (1.1)

    The PCA shape space characterizes a shape by a linear projection:

       S(λ) = S̄ + Σm=1:M λm Sm   (1.2)

    where S̄ is the mean shape and Sm is the mth top eigen shape. This PCA shape modeling forms the basis of the famous active shape model (ASM) (Cootes et al., 1995). In this hierarchical representation, the free-form part can be rough-to-exact too. For a 3D mesh, the mesh vertex density can be a control parameter, from sparse to dense. For a level set function, it depends on the image resolution, from coarse to fine.
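As a toy illustration of this PCA shape space, the following numpy sketch builds eigen shapes from synthetic 2D landmark shapes and reconstructs one shape from its low-dimensional coefficients λ; all shapes, sizes, and noise levels here are made up for illustration.

```python
import numpy as np

# Toy PCA statistical shape model: shapes are flattened landmark vectors
# [x1, y1, ..., x4, y4], and a shape is approximated by the mean shape plus
# a weighted sum of the top eigen shapes. All data are synthetic.
rng = np.random.default_rng(0)
base = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])   # a unit square (4 points)
shapes = base + 0.05 * rng.standard_normal((50, 8))          # 50 noisy training shapes

mean_shape = shapes.mean(axis=0)
# Eigen shapes = principal components of the centered training shapes.
_, _, vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
eigen_shapes = vt[:3]                                        # keep the top m = 3 modes

# Project one shape into the low-dimensional space and reconstruct it.
lam = eigen_shapes @ (shapes[0] - mean_shape)                # PCA coefficients
recon = mean_shape + eigen_shapes.T @ lam
```

Because the reconstruction is an orthogonal projection onto the span of the top modes, it can only move the approximation closer to the original shape than the mean shape alone.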

    1.4 Simple-to-Complex Probabilistic Modeling

    To handle a single object O from a 3D volume V, the posterior distribution P(O|V) offers the complete characterization of the object O given the volume V. Once P(O|V) is known, inferring the object can be done by taking the conditional mean, which is the minimum mean square error estimator, or conditional mode, which is the maximum a posteriori estimator, or a function of the posterior. By the same token, the posterior distribution P(O1:n|V) completely characterizes the multiple objects O1:n in a statistical sense.
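As a small numeric illustration of the two point estimators mentioned above, the sketch below evaluates the conditional mean (the MMSE estimator) and the conditional mode (the MAP estimator) of a hypothetical discrete posterior over candidate object positions; all numbers are invented.

```python
import numpy as np

# Toy discrete posterior P(O|V) over five candidate object positions.
positions = np.array([10.0, 11.0, 12.0, 13.0, 20.0])
posterior = np.array([0.05, 0.30, 0.35, 0.20, 0.10])   # sums to one

mmse_estimate = float(posterior @ positions)           # conditional mean (MMSE)
map_estimate = float(positions[np.argmax(posterior)])  # conditional mode (MAP)
# The two estimators can disagree: the low-probability outlier at 20
# pulls the conditional mean upward, while the mode ignores it.
```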

    1.4.1 Chain Rule

    When the rough-to-exact representation for a single object O is used, joint modeling of the full object is challenging and often less effective. To tackle this challenge, a common strategy is to perform simple-to-complex modeling by breaking a complex task into a few simple tasks. For each simple task, effective modeling is more feasible.

    One way is to utilize the chain rule that permits the calculation of a joint probability using conditional probabilities.

       P(O|V) = P(θ, λ, ϕ|V) = P(θ|V) P(λ|V, θ) P(ϕ|V, θ, λ)   (1.3)

    This breaks the overall task into three simpler tasks. The first task is to infer the rigid part, also known as object detection or recognition, using P(θ|V); the second task is to infer the low-dimensional parametric part using P(λ|V, θ); and the third task is to infer the free-form part using P(ϕ|V, θ, λ), solving the segmentation problem.

    In fact, for a single object O, effective modeling of its 3D pose part alone θ = [t,r,s] is difficult. The simple-to-complex modeling is applied here too.

       P(θ|V) = P(t|V) P(r|V, t) P(s|V, t, r)   (1.4)

    Marginal space learning (MSL) (Zheng et al., 2008) leverages such a strategy.
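A back-of-envelope sketch of why such a staged decomposition pays off computationally: counting pose hypotheses for a joint search versus a marginal-space search that keeps only a short list of candidates after each stage. The grid sizes and candidate count below are illustrative choices, not taken from the chapter.

```python
# Hypothesis counting for an Eq. (1.4)-style staged pose search.
n_t = 1000   # translation hypotheses (e.g., a coarse 10 x 10 x 10 grid)
n_r = 100    # rotation hypotheses per retained candidate
n_s = 100    # scale hypotheses per retained candidate
k = 50       # candidates kept after each stage

full_search = n_t * n_r * n_s          # joint search over [t, r, s]
msl_search = n_t + k * n_r + k * n_s   # staged marginal-space search
```

Under these illustrative numbers the staged search evaluates orders of magnitude fewer hypotheses than the joint search, which is the computational argument behind MSL.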

    When dealing with multiple objects O1:n, the chain rule also applies.

       P(O1:n|V) = P(O1|V) P(O2|V, O1) ⋯ P(On|V, O1:n−1)   (1.5)

    In Eq. (1.5), each conditional probability spells a simpler task, which can be further decomposed using Eqs. (1.3) and (1.4). Integrating Eqs. (1.3)–(1.5) endows a general-purpose computational pipeline as shown in Figure 1.3 (a), in which a series of simple tasks are connected.

    Figure 1.3 (a) A general-purpose computational pipeline for medical image recognition, segmentation, and parsing based on rough-to-exact object representation and simple-to-complex modeling. (b-e) Special realizations of the computational pipeline.

    1.4.2 Bayes’ Rule and the Equivalence of Probabilistic Modeling and Energy-Based Method

    According to Bayes’ rule, the posterior probability P(O|V) is proportional to the product of the likelihood P(V|O) and the prior P(O),

       P(O|V) ∝ P(V|O) P(O)   (1.6)

    Energy-based methods minimize an energy function E(O|V) that consists of a data term Edata(V, O), which relates the image V with the object O, and a prior term Eprior(O), which represents the prior belief about the object.

       E(O|V) = Edata(V, O) + Eprior(O)   (1.7)

    By letting

       P(V|O) ∝ exp(−Edata(V, O)),   P(O) ∝ exp(−Eprior(O)),

    the probabilistic model is equivalent to the energy-based method. In the previous discussion, we use the whole object O for illustration, but the derivations hold even when a partial object representation is used.
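This equivalence can be checked numerically: ranking candidate objects by posterior probability P ∝ exp(−E) picks the same winner as ranking by (lowest) energy. The energy values below are invented for illustration.

```python
import math

# Check that the posterior P(O|V) ∝ exp(-E(O|V)), with E = E_data + E_prior,
# agrees with the energy-based ranking. Energies are made-up values for three
# candidate objects.
e_data = [2.0, 0.5, 1.5]
e_prior = [0.3, 0.8, 0.1]
energy = [d + p for d, p in zip(e_data, e_prior)]

unnorm = [math.exp(-e) for e in energy]
z = sum(unnorm)                         # normalization constant
posterior = [u / z for u in unnorm]

best_by_prob = max(range(len(posterior)), key=lambda i: posterior[i])
best_by_energy = min(range(len(energy)), key=lambda i: energy[i])
```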

    When this Bayes’ rule is integrated into the chain rule, complete modeling of object appearances and prior beliefs about the object at different representation levels and using different models is provided.

    1.4.3 Practical Medical Image Recognition, Segmentation, and Parsing Algorithms

    In general, practical algorithms for medical image recognition, segmentation, and parsing are special examples of this computational pipeline. They, however, differ depending on their specialization in the following two aspects:

    • The changes to the computational architecture. Depending on the independence assumptions they make or the representation they choose, practical algorithms modify or simplify the architecture accordingly. For example, if only the detection of one object is concerned, the pipeline reduces to the one shown in Figure 1.3 (b). Figure 1.3 (c) shows the MSL pipeline (Zheng et al., 2008) for 3D rigid object detection. In Figure 1.3 (d), a complete pipeline for segmenting a single object is presented, going from detecting or recognizing the rigid part, to deformable shape segmentation, to free-form shape segmentation. Figure 1.3 (e) presents an architecture that deals with multiple objects, which is used in Kohlberger et al. (2011), Lu et al. (2012), and Wu et al. (2014). Here, conditional independence among different objects is assumed for the rigid and low-dimensional parametric parts; hence each object is processed independently. Finally, a joint free-form segmentation is applied to segment multiple shapes together.

    • The modeling choices of the conditional probabilities. Good algorithm performance needs effective modeling of the conditional probabilities. For medical image recognition or detection, machine learning methods are prevalent to leverage anatomic context embedded in the medical images. Section 1.5 defines the concept of anatomic context and briefly reviews several machine learning methods that model the anatomic context. After object detection, object segmentation follows. Section 1.6 lists a few classical image segmentation methods, each having its own modeling choice based on its particular object representation. Throughout the whole book, each book chapter will discuss its own choices of modeling, either from a general theoretic perspective or in a particular application setting.

    1.5 Medical Image Recognition Using Machine Learning Methods

    1.5.1 Object Detection and Context

    Consider the task of detecting human eyes from the three images in Figure 1.4. To detect the human eye(s) in Figure 1.4 (a) in which all different objects are juxtaposed randomly, one is likely to scrutinize the image pixels row by row, column by column till the eye is located. However, to detect the eye(s) in Figure 1.4 (c) in which a perfect human face is presented, it is effortless because the image is so structured or full of context. A medical image is the kind of image with contextual information with respect to anatomies. Such context is referred to as anatomical context. To detect the two eyes in Figure 1.4 (b), the relationship between them can be useful. Once, say, the left eye is detected, the detection of the right eye becomes less complicated.

    Figure 1.4 Three types of context: (a) unitary or local context; (b) pairwise or higher-order context; and (c) holistic or global context.

    As shown in Figure 1.4, the context can be roughly categorized into three types, namely unitary or local, pairwise or higher-order, and holistic or global context.

    • The unitary or local context refers to the local regularity surrounding a single object.

    • The pairwise or higher-order context refers to the joint regularities between two objects or among multiple objects.

    • The holistic or global context goes beyond the relationships among a cohort of objects and refers to the whole relationship between all pixels/voxels and the objects: in other words, regarding the image as a whole.

    Different detection methods basically operate with different trade-offs between offline model learning complexity and online computational complexity, depending on which context(s) they leverage and how. For example, a binary classifier that separates object instances from nonobject instances is learned to model the local context. Given a test image like Figure 1.4 (a), exhaustive scanning of the image using the learned classifier is needed to localize the object (eye). To leverage the global context, a regression function can be learned to predict the object location directly from any pixel. Given a test image like Figure 1.4 (c), the regression function is applied at a few sparsely sampled pixel locations to reach a consensus prediction about the object location. Learning a binary classifier is easier than learning a regression function, but exhaustive scanning is more computationally intensive than testing a few locations. Below, we review several modern machine learning methods for binary classification, multi-class classification, and regression. The subsequent book chapters present different recognition methods that employ machine learning.
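This trade-off can be caricatured in one dimension: a local-context "classifier" must scan every position, while a global-context "regressor" (simulated here as an oracle that predicts the offset to the object from any probed pixel) needs only a few sparse probes. Everything in this sketch is synthetic.

```python
import numpy as np

# 1D caricature of scan-based vs. regression-based detection. The "image" is
# an array whose object sits at index 70 (all values are synthetic).
signal = np.zeros(100)
obj = 70
signal[obj] = 1.0

# Local context: a classifier that only fires on the object -> exhaustive scan.
classifier_tests = 0
detected = None
for i in range(len(signal)):
    classifier_tests += 1
    if signal[i] == 1.0:
        detected = i
        break

# Global context: an oracle regressor predicts the offset to the object from
# any probed pixel -> a few sparse probes suffice for a consensus vote.
probes = [5, 40, 90]
votes = [p + (obj - p) for p in probes]          # each probe votes for a location
regressed = round(sum(votes) / len(votes))
```

The counts make the point: the scan tests many positions, while the regression route tests only `len(probes)` of them.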

    1.5.2 Machine Learning Methods

    Statistical machine learning models the statistical dependence of an unobserved variable y on an observed variable x via the posterior probability distribution P(y|x). Such a distribution can be used to predict the unobserved variable y. Modeling P(y|x) can be done in two ways, namely discriminative learning and generative learning. While generative learning models P(y|x) indirectly via the joint distribution P(x,y), discriminative learning instead directly models the posterior. Discriminative models are effective for supervised learning tasks such as classification and regression that do not necessarily require the joint distribution.

    1.5.2.1 Classification

    The goal of binary classification is to learn a function F(x) that minimizes the misclassification probability P{yF(x) < 0}, where y is the class label, + 1 for positive and − 1 for negative. There are many influential binary classification methods, such as kernel methods (Hofmann et al., 2008), ensemble methods (Polikar, 2006), and deep learning methods (Bengio, 2009). The support vector machine (SVM) (Vapnik, 1999) is a classical kernel method. Ensemble methods include boosting (Freund and Schapire, 1997; Friedman et al., 2000) and the random forest (RF) (Breiman, 2001). Deep learning methods are based on artificial neural networks (ANNs) (Bishop, 1995).

    SVM seeks a separating hyperplane with a maximum margin. As shown in Figure 1.5 (a), the hyperplane is defined by w ⋅ x + b = 0, where x is the input vector, w is the slope vector, ⋅ denotes the dot product, and b is the intercept. The max-margin hyperplane is obtained by solving the following task:

       min(w, b) ½‖w‖²  subject to  yj (w ⋅ xj + b) ≥ 1 for all j   (1.8)

    The learned decision function takes the form F(x) = Σj αj yj (xj ⋅ x) + b, where the xjs with nonzero coefficients αj are the support vectors. Often the number of support vectors is much smaller than that of the input training data. The kernel trick K(xj, x) = ϕ(xj) ⋅ ϕ(x) is widely used to model data nonlinearity, hence the name kernel method.
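A minimal sketch of evaluating such a kernel decision function with an RBF kernel; the support vectors, coefficients, and bias below are hypothetical values, not the result of actual training.

```python
import numpy as np

# Evaluating a kernel-SVM decision function F(x) = sum_j alpha_j y_j K(x_j, x) + b
# with an RBF kernel. Support vectors, alphas, and bias are hypothetical.
support_vectors = np.array([[0.0, 0.0], [2.0, 2.0]])
alphas = np.array([1.0, 1.0])
labels = np.array([-1.0, 1.0])
bias = 0.0

def rbf_kernel(a, b, gamma=0.5):
    # K(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(v):
    return sum(a * yy * rbf_kernel(sv, v)
               for a, yy, sv in zip(alphas, labels, support_vectors)) + bias
```

Points near the positive support vector receive a positive score and points near the negative one a negative score; the sign of `decision(v)` is the predicted label.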

    Figure 1.5 Binary classification methods: (a) support vector machine, (b) AdaBoosting, (c) random forest, and (d) neural network. Image courtesy of Wiki for (a, d) and of ICCV 2009 tutorial entitled Boosting and random forest for (b, c).

    An ensemble method combines multiple learners into a committee for the final decision. In boosting (Freund and Schapire, 1997; Friedman et al., 2000), the classification function is learned by minimizing an exponential loss over the training examples:

       F* = arg minF Σn exp(−yn F(xn))   (1.9)

    The classification function F(x) in boosting takes an additive form as in Figure 1.5 (b):

       Fn(x) = Fn−1(x) + αn hn(x) = Σm=1:n αm hm(x)   (1.10)

    where Fn(x) is a strong learner that is well correlated with the true classification and hm(x) is a weak learner that is only slightly correlated with the true classification (better than random guessing). This minimization is done iteratively. At the nth iteration, it selects the weak learner hn(x) that maximally reduces the loss and then adjusts the weights of the training examples, weighting misclassified examples more heavily. The posterior P(+1|x) is approximated as

       P(+1|x) ≈ exp(F(x)) / (exp(F(x)) + exp(−F(x)))   (1.11)
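The iterative reweighting scheme described above can be sketched with 1D threshold stumps on synthetic, separable data; the threshold grid and number of rounds are arbitrary illustrative choices.

```python
import numpy as np

# Discrete AdaBoost sketch with 1D threshold stumps, illustrating the additive
# form of Eq. (1.10) and the reweighting of misclassified examples.
# Data are synthetic and separable: negatives below 0.5, positives above.
x = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

w = np.ones(len(x)) / len(x)     # example weights, initially uniform
stumps = []                      # list of (threshold, alpha)
for _ in range(3):
    # Select the stump h(x) = sign(x - thr) with the lowest weighted error.
    thr, err = min(((t, np.sum(w[np.sign(x - t) != y]))
                    for t in np.arange(0.05, 1.0, 0.1)), key=lambda p: p[1])
    err = max(err, 1e-10)                      # guard against log(0)
    alpha = 0.5 * np.log((1 - err) / err)
    pred = np.sign(x - thr)
    w = w * np.exp(-alpha * y * pred)          # upweight misclassified examples
    w = w / w.sum()
    stumps.append((thr, alpha))

def strong_classifier(v):
    # F(x) = sum_m alpha_m h_m(x), the additive form of Eq. (1.10).
    return np.sign(sum(a * np.sign(v - t) for t, a in stumps))
```

On this separable toy set the combined strong classifier labels every training example correctly.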

    The RF (Breiman, 2001) classifier consists of a collection of binary classifiers as in Figure 1.5 (c), each being a decision tree casting a unit vote for the most popular class label. To learn a random decision tree, either the training examples for each decision tree are independent, identically distributed (i.i.d.) sampled from the full training set or the features used in the tree nodes are i.i.d. sampled from the full feature set or both. It is shown in Breiman (2001) that the RF accuracy is comparable to boosting with the added benefits of being relatively robust to outliers and noise and amenable to parallel implementation.

    When these ensemble methods are applied to image applications, the weak learners in boosting are associated with image features (Viola and Jones, 2001; Tu, 2005) and the decision tree in RF (Criminisi et al., 2009) uses an image feature in a tree node. Often a highly redundant feature pool is formed to cover large appearance variation in the object. Learning the weak learner or the decision tree hence becomes a feature selection process.

    An ANN consists of an interconnected group of nodes as shown in Figure 1.5 (d), each circular node representing a neuron and an arrow representing a connection from the output of one neuron to the input of another. A deep learning method concerns an ANN with multiple hidden layers. Often a neuron takes the following form: y = σ(w ⋅ x + b), where x is the input vector to the neuron, y is the output of the neuron, w is the weight vector, b is the bias term, and σ is a nonlinear function such as a sigmoid function. The final output from the ANN (say with one hidden layer and one node in the output layer) is

       F(x) = σ( Σh αh σ(wh ⋅ x + bh) + b )   (1.12)

    where wh is the weight vector for the input vector to the node h in the hidden layer, αh is the weight coefficient from the hidden node h to the output node. Typically, the weights for all neurons are learned using stochastic gradient descent. Since combining the input using weighted linear coefficients amounts to feature computation, ANN training performs feature learning.
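The forward pass just described can be sketched for a one-hidden-layer network: each hidden neuron computes σ(wh ⋅ x + bh) and the output neuron combines the hidden activations with weights αh. All weights and inputs below are arbitrary illustrative values.

```python
import numpy as np

# Forward pass of a one-hidden-layer sigmoid network with a single output node.
def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))              # sigmoid nonlinearity

x = np.array([0.5, -1.0])                        # input vector
w_hidden = np.array([[1.0, 0.5], [-0.5, 2.0]])   # one weight row per hidden neuron
b_hidden = np.array([0.0, 0.1])                  # hidden biases
alpha = np.array([0.8, -0.3])                    # hidden-to-output weights
b_out = 0.2                                      # output bias

hidden = sigma(w_hidden @ x + b_hidden)          # hidden activations
output = sigma(alpha @ hidden + b_out)           # final network output
```

Because σ maps to (0, 1), the final output can be read as a class probability.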

    The goal of multi-class classification is to classify an input x into one of J > 2 class labels. The LogitBoost algorithm (Friedman et al., 2000) fits an additive symmetric logistic model via the maximum-likelihood principle. This fitting proceeds iteratively by selecting weak learners and combining them into a strong classifier. The output of the LogitBoost algorithm is a set of J response functions {Fj(x);j = 1,…,J}, where each Fj(x) is a linear combination of a subset of weak learners:

       Fj(x) = Σn=1…Nj hj,n(x)   (1.13)

    where hj,n(x) is a weak learner and Nj is the number of weak learners. LogitBoost provides a natural way to calculate the posterior distribution of the class label:

       p(j|x) = exp(Fj(x)) / Σk=1…J exp(Fk(x))   (1.14)

    To use the LogitBoost for image classification, the weak classifiers are associated with image features. Refer to Zhou et al. (2006) for more details.
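    Computing the posterior of Eq. (1.14) from a set of response values is a softmax. The sketch below adds the standard max-subtraction trick for numerical stability, which is an implementation detail not part of the text:

```python
import math

def posterior(responses):
    """Turn LogitBoost response values {F_j(x)} into class posteriors
    via the symmetric multiple logistic transform:
    p(j|x) = exp(F_j(x)) / sum_k exp(F_k(x))."""
    m = max(responses)                      # subtract max for stability
    exps = [math.exp(f - m) for f in responses]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical responses for a 3-class problem at some input x.
p = posterior([2.0, 0.5, -1.0])
```

    The class with the largest response function receives the largest posterior, and the posteriors sum to one.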

    1.5.2.2 Regression

    Regression (Hastie et al., 2001) finds the solution to the following minimization problem:

       ĝ = arg ming Σn=1…N L(yn, g(xn)) + λ K(g)   (1.15)

    where {(xn, yn); n = 1, …, N} are training examples, L(·,·) is the loss function that penalizes the deviation of the regressor output g(x) from the true output y, λ > 0 is the regularization coefficient that controls the degree of regularization, and K(g) is the regularization term that combats overfitting. Regularization often imposes a certain smoothness constraint on the output function or reflects some prior belief about the output. There are many regression approaches (Hastie et al., 2001) in the literature; here we briefly review boosting regression and regression forest, which are often used for object detection.
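    As a concrete special case of Eq. (1.15), take the L² loss, a linear regressor g(x) = w·x, and K(g) = w² (ridge regression); in one dimension the minimizer has a closed form. The function name and data are hypothetical:

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize sum_n (y_n - w*x_n)^2 + lam * w^2.
    Setting the derivative to zero gives
    w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                   # exactly y = 2x
w_unreg = ridge_fit_1d(xs, ys, lam=0.0)   # recovers the true slope 2.0
w_reg = ridge_fit_1d(xs, ys, lam=5.0)     # regularization shrinks w toward 0
```

    Increasing λ trades data fidelity for a smaller (simpler) model, which is exactly the role K(g) plays in the general formulation.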

    As in any boosting procedure (Freund and Schapire, 1997; Friedman et al., 2000), boosting regression assumes that the regression output function g(x) takes an additive form: gt(x) = gt−1(x) + ht(x). Boosting is an iterative algorithm that leverages the additive nature of g(x). At the tth iteration, one more weak function ht(x) is added to the target function g(x) to maximally reduce the cost function as follows:

       ĥt = arg minh Σn=1…N (rt(xn) − h(xn))² + λ K(h)   (1.16)

    where rt(xn) = yn − gt−1(xn) is the residual and the L² loss function is used. To derive Eq. (1.16), the regularization term K(g) is chosen to take an additive form:

    K(gt) = K(gt−1) + K(ht). In Zhou (2010), the ridge regression principle (also known as Tikhonov regularization) is incorporated into a boosting framework to penalize overly complex models, and the image features are connected with weak learners. This leads to the image-based boosting ridge regression framework.
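    The residual-fitting loop behind Eq. (1.16) can be sketched as follows, using regression stumps as weak learners and the L² loss, and dropping the regularization term for brevity; all names are hypothetical:

```python
def fit_stump(xs, residuals):
    """Weak regressor: split at a threshold, predict the mean residual
    on each side (least-squares fit of a depth-1 tree)."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        ml = sum(left) / len(left) if left else 0.0
        mr = sum(right) / len(right) if right else 0.0
        err = sum((r - (ml if x <= t else mr)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def boost_regress(xs, ys, rounds=10):
    """g_t(x) = g_{t-1}(x) + h_t(x): each round fits a weak learner
    to the current residuals r_t(x_n) = y_n - g_{t-1}(x_n)."""
    learners = []
    predict = lambda x: sum(h(x) for h in learners)
    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return predict

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]     # y = x^2, nonlinear in x
g = boost_regress(xs, ys)
```

    Each round drives the training error down by fitting what the current model still gets wrong, which is the additive-model view of boosting.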

    Similar to RF for classification, regression forest (Breiman, 2001; Criminisi et al., 2013) is a collection of regression trees that jointly predict continuous output(s). To learn a random regression tree, either the training examples for each regression tree are sampled i.i.d. from the full training set, or the features used in the tree nodes are sampled i.i.d. from the full feature set, or both. Training the node of a regression tree is typically done by maximizing an information gain measure, reducing variance, or optimizing other splitting criteria. Unlike boosting regression, which predicts the output as a black box, the regression forest is probabilistic in nature and provides a confidence measure along with the predicted output. Figure 1.6 shows a graphical illustration of regression forest.

    Figure 1.6 Graphical illustration of the regression forest proposed in Criminisi et al. (2013). Reprinted with permission, ©2013 Elsevier.

    1.6 Medical Image Segmentation Methods

    Assuming the object is recognized or localized, the next step is to perform precise image segmentation, again using the local context between the shape and appearance. Medical image segmentation is about partitioning a medical image into multiple segments or regions, each segment or region composed of a set of pixels or voxels. Often, segments correspond to semantically meaningful anatomical objects. Here we review a few image segmentation methods for segmenting a single object. The remaining book chapters will cover methods that handle multiple objects (aka medical image parsing) and/or utilize machine learning for more effective modeling of appearance and shape.

    1.6.1 Simple Image Segmentation Methods

    Thresholding is the simplest segmentation method: it converts a gray-scale image into a binary image based on clip levels (or thresholds). The key is to choose a proper threshold value.

    Clustering often invokes the K-means algorithm, which assigns each pixel or voxel to the one of the K clusters whose center is closest. The cluster centers are then recomputed using all pixels currently belonging to each cluster. This iteration continues until convergence.
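    The assign-then-update loop is short enough to write out. The sketch below runs K-means on scalar gray values (a hypothetical two-cluster example); real use would cluster on intensities or feature vectors over the whole image:

```python
def kmeans_1d(values, centers, iters=20):
    """Assign each value to its nearest center, then recompute each center
    as the mean of its assigned values; repeat until convergence."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            k = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[k].append(v)
        new = [sum(c) / len(c) if c else centers[j]
               for j, c in enumerate(clusters)]
        if new == centers:
            break                       # assignments stable: converged
        centers = new
    return centers

# Two clusters of gray values (e.g. background vs object intensities).
centers = kmeans_1d([10, 12, 11, 200, 198, 202], [0.0, 255.0])
```

    On this data the centers converge to the two intensity modes, 11 and 200.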

    Region growing assumes that neighboring pixels (voxels) within one region share similar values. Starting from a set of seed pixels (voxels), the regions are iteratively grown, merging an unallocated neighboring pixel (voxel) into a region if it is close enough to the pixels (voxels) already in the region.
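    Region growing is essentially a breadth-first traversal from the seed. The sketch below uses the simplest membership criterion, similarity to the seed intensity, rather than to the evolving region statistics; the image and tolerance are hypothetical:

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from `seed`: repeatedly absorb 4-connected neighbors
    whose intensity is within `tol` of the seed intensity."""
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_val) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# 3 x 4 toy image: a dark region (values near 10) next to a bright one.
image = [
    [10, 11,  9, 90],
    [12, 10, 91, 92],
    [11, 90, 93, 91],
]
region = region_grow(image, seed=(0, 0), tol=5)
```

    Starting at (0, 0), the dark region of six pixels is recovered; the bright pixels fail the tolerance test and are never merged.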

    For all the previously mentioned methods, the computation can happen on a feature image rather than the original image, or the distance or similarity can be computed based on image feature values. With a proper feature choice, segmentation becomes more robust.

    1.6.2 Active Contour Method

    The active contour model or snake (Kass et al., 1988) evolves a parameterized curve v(s), s ∈ [0, 1], to minimize an energy functional that combines internal smoothness terms with an image-derived potential:

       E(v) = ∫01 [ w1(s)|v′(s)|² + w2(s)|v″(s)|² − μ|∇I(v(s))|² ] ds   (1.17)

    where μ controls the magnitude of the potential, ∇ is the gradient operator, I is the image, w1(s) controls the tension of the curve, and w2(s) controls the rigidity of the curve. The implicit assumption of the snake model is that edge defines the curve due to the use of the gradient operator. The gradient descent minimization computes the force on the snake, defined as the negative of the gradient of the energy field, which evolves the curve. Important variants of the active contour model include the gradient vector flow snake model (Xu and Prince, 1998), geodesic active contour (Caselles et al., 1997), etc.

    1.6.3 Variational Methods

    In the Mumford-Shah variational method (Mumford and Shah, 1989), in its piecewise-constant form, one seeks a curve C that partitions the image domain into an inside and an outside region, together with two constants ui and uo, by minimizing the following energy:

       E(C, ui, uo) = ∫inside(C) (I − ui)² dx + ∫outside(C) (I − uo)² dx + ν|C|   (1.18)

    where I is the image, ui and uo are the constant approximations of the image inside and outside the curve, ν > 0 is a weighting coefficient, and |C| is the length of the curve. The region homogeneity is assumed here.

    1.6.4 Level Set Methods

    A level set function ϕ is an implicit shape representation with the boundary of the shape being the zero level set of ϕ. The advantage of the level set model (Osher and Sethian, 1988) is that it allows tractable numerical computations involving curves and surfaces on a fixed Cartesian grid, and it easily follows shapes that change topology. In Chan and Vese (2001), the authors propose a model that unifies the level set and variational methods into one framework. This results in an active contour evolution that does not explicitly depend on image edges.
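    A minimal example of the implicit representation: a circle as the zero level set of a signed distance function. The helper names are hypothetical; the point of the second function is that merging two shapes needs no explicit curve bookkeeping:

```python
import math

def phi_circle(x, y, cx=0.0, cy=0.0, r=1.0):
    """Signed distance to a circle: negative inside, zero on the boundary,
    positive outside. The zero level set implicitly represents the shape."""
    return math.hypot(x - cx, y - cy) - r

def phi_union(x, y):
    # Two circles merged by a pointwise min: the topology change
    # (two components vs one) is handled implicitly by phi itself.
    return min(phi_circle(x, y, -1.5, 0.0, 1.2),
               phi_circle(x, y, 1.5, 0.0, 1.2))

inside = phi_circle(0.2, 0.3)      # negative: point lies inside
boundary = phi_circle(1.0, 0.0)    # zero: point lies on the curve
outside = phi_circle(2.0, 0.0)     # positive: point lies outside
```

    Curve evolution methods update ϕ on the fixed grid (e.g. by the Chan-Vese equations) and read off the segmentation as the sign of ϕ.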

    1.6.5 Active Shape Models and Active Appearance Models

    ASMs (Cootes et al., 1995) and active appearance models (AAMs) (Cootes et al., 2001) are two of the most popular model-based segmentation methods, in which a model is learned offline and fitted online to an unseen image. In ASM, based on a point-based representation, a shape model is learned via PCA as depicted in Eq. (1.2) in Section 1.3. To fit the model, typically a line search is first performed for every point to deform the shape to best match the image evidence, and then the deformed shape is constrained to conform to the learned statistical shape model.

    An AAM further includes the image appearance, in addition to the shape, into the statistical model. It jointly characterizes the appearance I using a linear generative model:

       s = s̄ + Qs λ,   I = Ī + Qa λ   (1.19)

    where s̄ is the mean shape, Ī is the mean appearance in a normalized patch, Qs and Qa are the basis matrices for shape and appearance, and λ is the blending coefficient vector shared by both the shape and appearance. To fit the AAM parameters, an analysis-by-synthesis approach is taken, minimizing the deviation between the appearance synthesized by the AAM and the target image. This optimization is driven by the difference between the current estimate of the appearance and the target image; often it can match new images efficiently.

    1.6.6 Graph Cut Method

    A graph G = (V,E) comprises a set V of vertices or nodes together with a set E of edges or links, each edge linking two vertices. In graph-based methods, the image grid points are often regarded as nodes in a graph and the neighboring pixels (voxels) are connected with edges. This is equivalent to a Markov random field formulation. With this, image segmentation becomes a graph cut problem that labels the nodes with different labels and hence splits the graph into subgraphs. Define L = {Lp | p ∈ I} as the binary labeling function that labels all pixels in the image I as 0 or 1. Mathematically, the graph cut problem (Boykov et al., 2001; Boykov and Funka-Lea, 2006) seeks the optimal binary function that minimizes the following energy function:

       E(L) = Σp∈I Dp(Lp) + Σ(p,q)∈N Vp,q(Lp, Lq)   (1.20)

    where Dp(Lp) is the unary data term that determines the cost of assigning the pixel p to a label Lp, N is the set of all pairs of neighboring pixels, and Vp,q is the pairwise interaction function that encourages the neighboring pixels with similar properties (such as intensity) to be assigned the same label.
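    Evaluating the energy of Eq. (1.20) for a candidate labeling is straightforward; minimizing it is the hard part, handled by min-cut/max-flow solvers (Boykov et al., 2001). The sketch below uses a Potts-style pairwise term on a hypothetical three-pixel chain:

```python
def energy(labels, unary, pairwise_weight, neighbors):
    """E(L) = sum_p D_p(L_p) + sum_{(p,q) in N} V_pq(L_p, L_q),
    with a Potts pairwise term: a fixed penalty when neighbors disagree."""
    data_term = sum(unary[p][labels[p]] for p in labels)
    smooth_term = sum(pairwise_weight
                      for p, q in neighbors if labels[p] != labels[q])
    return data_term + smooth_term

# Tiny 1 x 3 "image": pixels 0, 1, 2 in a chain.
# unary[p] = [cost of label 0, cost of label 1] at pixel p.
unary = {0: [0.0, 5.0], 1: [1.0, 3.0], 2: [5.0, 0.0]}
neighbors = [(0, 1), (1, 2)]

e_a = energy({0: 0, 1: 0, 2: 1}, unary, 1.0, neighbors)  # boundary at (1,2)
e_b = energy({0: 0, 1: 1, 2: 1}, unary, 1.0, neighbors)  # boundary at (0,1)
```

    Here labeling a (data cost 1, one label discontinuity) is cheaper than labeling b (data cost 3, one discontinuity), so a graph cut solver would prefer placing the boundary between pixels 1 and 2.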

    1.7 Conclusions

    In this chapter, we have introduced a probabilistic formulation that unifies medical image recognition, segmentation, and parsing into one framework, enabled by the use of a rough-to-exact representation and simple-to-complex modeling. A general-purpose computational pipeline then results. We have demonstrated that practical algorithms are special instances of such a computational pipeline with customized architecture and/or modeling choices. Then we have defined the concept of anatomical context and discussed the use of discriminative learning methods for recognition. We have also reviewed modern classification and regression methods. Finally, we have offered a brief review of classical image segmentation methods for segmenting a single object.

    The rest of the book provides a comprehensive review of medical image recognition and parsing, assembling a collection of generic theories for recognizing or detecting and parsing or segmenting a cohort of anatomical structures from medical images, along with a variety of specific solutions for known anatomical structures. The underlying basis of these new approaches is that, unlike conventional algorithms, they exploit the inherent anatomical context embedded in the medical images, best exemplified by annotated datasets and modern machine learning paradigms, thus offering automatic, accurate, and robust algorithms for recognition and parsing of multiple anatomical structures from medical images. The latest theories related to multiple object segmentation and parsing are addressed in depth.

    Recommended Notations

    Throughout the whole book, we utilize the following notations as in Table 1.1 unless otherwise specified. If necessary, each chapter might introduce its own set of notations.

    Table 1.1

    A List of Notations

    References

    Bengio Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009;2(1):1–127.

    Bishop C.M., et al. Neural Networks for Pattern Recognition. Oxford: Clarendon Press; 1995.

    Boykov Y., Funka-Lea G. Graph cuts and efficient ND image segmentation. Int. J. Comput. Vis. 2006;70(2):109–131.

    Boykov Y., Veksler O., Zabih R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001;23(11):1222–1239.

    Breiman L. Random forests. Mach. Learn. 2001;45(1):5–32.

    Caselles V., Kimmel R., Sapiro G. Geodesic active contours. Int. J. Comput. Vis. 1997;22(1):61–79.

    Chan T.F., Vese L.A. Active contours without edges. IEEE Trans. Image Process. 2001;10(2):266–277.

    Cootes T.F., Taylor C.J., Cooper D.H., Graham J. Active shape models—their training and application. Comput. Vis. Image Underst. 1995;61(1):38–59.

    Cootes T.F., Edwards G.J., Taylor C.J. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 2001;23(6):681–685.

    Criminisi A., Shotton J., Bucciarelli S. Decision forests with long-range spatial context for organ localization in CT volumes. In: MICCAI Workshop on Prob. Models for MIA. 2009.

    Criminisi A., Robertson D., Konukoglu E., Shotton J., Pathak S., White S., Siddiqui K. Regression forests for efficient anatomy detection and localization in computed tomography scans. Med. Image Anal. 2013;17:1293–1303.

    Davis R., Shrobe H., Szolovits P. What is a knowledge representation? AI Magazine. 1993;14(1):17.

    Freund Y., Schapire R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997;55(1):119–139.

    Friedman J., Hastie T., Tibshirani R. Additive logistic regression: a statistical view of boosting. Ann. Stat. 2000;28(2):337–407.

    Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. New York: Springer; 2001.

    Heimann T., Meinzer H.P. Statistical shape models for 3D medical image segmentation: a review. Med. Image Anal. 2009;13(4):543–563.

    Hofmann T., Schölkopf B., Smola A.J. Kernel methods in machine learning. Ann. Stat. 2008;36(3):1171–1220.

    Kass M., Witkin A., Terzopoulos D. Snakes: active contour models. Int. J. Comput. Vis. 1988;1(4):321–331.

    Kohlberger T., Sofka M., Zhang J., Birkbeck N., Wetzl J., Kaftan J., Declerck J., Zhou S.K. Automatic multi-organ segmentation using learning-based segmentation and level set optimization. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2011. Heidelberg: Springer; 2011:338–345.

    Lu C., Zheng Y., Birkbeck N., Zhang J., Kohlberger T., Tietjen C., Boettger T., Duncan J.S., Zhou S.K. Precise segmentation of multiple organs in CT volumes using learning-based approach and information theory. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2012. Heidelberg: Springer; 2012:462–469.

    Mumford D., Shah J. Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 1989;42(5):577–685.

    Nain D., Haker S., Bobick A., Tannenbaum A. Shape-driven 3D segmentation using spherical wavelets. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2006. Heidelberg: Springer; 2006:66–74.

    Osher S., Sethian J.A. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 1988;79(1):12–49.

    Pizer S.M., Fletcher P.T., Joshi S., Thall A., Chen J.Z., Fridman Y., Fritsch D.S., Gash A.G., Glotzer J.M., Jiroutek M.R., et al. Deformable M-Reps for 3D medical image segmentation. Int. J. Comput. Vis. 2003;55(2/3):85–106.

    Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 2006;6(3):21–45.

    Shen L., Farid H., McPeek M.A. Modeling three-dimensional morphological structures using spherical harmonics. Evolution. 2009;63(4):1003–1016.

    Tu Z. Probabilistic boosting-tree: learning discriminative methods for classification, recognition, and clustering. In: Proc. Int. Conf. Computer Vision. 2005;2:1589–1596.

    Vapnik V. The Nature of Statistical Learning Theory. New York: Springer; 1999.

    Viola P., Jones M. Rapid object detection using a boosted cascade of simple features. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition. 2001:511–518.

    Wu D., Sofka M., Birkbeck N., Zhou S.K. Segmentation of multiple knee bones from CT for orthopedic knee surgery planning. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2014. Heidelberg: Springer; 2014:372–380.

    Xu C., Prince J.L. Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 1998;7(3):359–369.

    Zheng Y., Barbu A., Georgescu B., Scheuering M., Comaniciu D. Four-chamber heart modeling and automatic segmentation for 3D cardiac CT volumes using marginal space learning and steerable features. IEEE Trans. Med. Imaging. 2008;27(11):1668–1681.

    Zhou S.K. Shape regression machine and efficient segmentation of left ventricle endocardium from 2D B-mode echocardiogram. Med. Image Anal. 2010;14:563–581.

    Zhou S.K., Park J.H., Georgescu B., Simopoulos C., Otsuki J., Comaniciu D. Image-based multiclass boosting and echocardiographic view classification. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition. 2006;2:1559–1565.


    a http://sig.biostr.washington.edu/projects/fm/AboutFM.html.

    b http://en.wikipedia.org/wiki/Terminologia_Anatomica.

    c http://www.oecd-ilibrary.org/social-issues-migration-health/computed-tomography-ct-exams-total_ct-exams-tottable-en.

    d In texts or equations, we always use 3D as an example unless otherwise noted; in principle, applying 3D to 2D is an easy task. However, in Figure 1.2, we use 2D as an example for graphical illustration.

    Part 1

    Automatic Recognition and Detection Algorithms

    Chapter 2

    A Survey of Anatomy Detection

    S. Kevin Zhou    Medical Imaging Technologies, Siemens Healthcare Technology Center, Princeton, NJ, USA

    Abstract

    Detecting a single anatomy or a plurality of anatomical objects, such as landmarks or organs, in a medical image is important yet challenging. An anatomy detection method has to address offline model learning complexity related to modeling the appearance of a single object or a plurality of objects and online computational complexity related to search or inference strategy. In this chapter, we present a survey of discriminative learning methods for appearance modeling as well as their corresponding search strategies and discuss how they leverage the anatomical context embedded in the medical image for more effective and more efficient detection. In particular, we elaborate approaches for detecting a single anatomy and mention several approaches for detecting multiple anatomies, which are covered in detail in subsequent chapters.

    Keywords

    Anatomy detection

    Multiple object detection

    Multiple landmark detection

    Discriminative learning

    Search strategy

    Anatomical context

    Chapter Outline

    2.1 Introduction

    2.2 Methods for Detecting an Anatomy

    2.2.1 Classification-Based Detection Methods

    2.2.1.1 Boosting detection cascade

    2.2.1.2 Probabilistic boosting tree

    2.2.1.3 Randomized decision forest

    2.2.1.4 Exhaustive search to handle pose variation

    2.2.1.5 Parallel, pyramid, and tree structures

    2.2.1.6 Network structure: Probabilistic boosting network

    2.2.1.7 Marginal space learning

    2.2.1.8 Probabilistic, hierarchical, and discriminant framework

    2.2.1.9 Multiple instance boosting to handle inaccurate annotation

    2.2.2 Regression-Based Detection Methods

    2.2.2.1 Shape regression machine

    2.2.2.2 Hough forest

    2.2.3 Classification-Based vs Regression-Based Object Detection

    2.3 Methods for Detecting Multiple Anatomies

    2.3.1 Classification-Based Methods

    2.3.1.1 Discriminative anatomical network

    2.3.1.2 Active scheduling

    2.3.1.3 Submodular detection

    2.3.1.4 Integrated detection network

    2.3.2 Regression-Based Method: Regression Forest

    2.3.3 Combining Classification and Regression: Context Integration

    2.4 Conclusions

    References

    2.1 Introduction

    Detecting a single anatomy or a plurality of anatomical objects, such as landmarks or organs, in a medical image is an important but challenging task. Here we define an anatomical landmark as a distinct point in a body scan that coincides with an anatomical structure, such as the liver top, aortic arch, or pubic symphysis, to name a few. From anatomical object detection, body regions can be determined (Liu and Zhou, 2012) to trigger subsequent, computationally intensive applications such as computer-assisted diagnosis. Anatomy detection also provides initialization to image segmentation (Rangayyan et al., 2009) and registration (Johnson and Christensen, 2002; Crum et al., 2004) and enables applications such as semantic reporting (Seifert et al., 2010), optimal organ display (Pauly et al., 2011), etc. It is challenging as it has to deal with significant appearance variations due to sensor noise, patient difference and motion, pathology, contrast agent, partial scan and narrow field of view, weak contrast between soft tissues, etc.

    General object detection is well studied in computer vision. Discriminative modeling is prevalent for modeling general object appearance. Unlike natural scene images, medical images manifest additional contextual information, such as a limited number of anatomical objects (e.g. only one left ventricle [LV]), constrained and structured background, strong prior information about the pose parameter, etc.; hence discriminative learning methods should exploit such contextual information to reduce model learning complexity. This survey will focus on such discriminative learning methods for anatomical object detection. In particular, we will discuss two major detection approaches, classification-based and regression-based, and their corresponding learning complexities. Note that this survey is not meant for detecting general objects such as faces, cars, and pedestrians in natural scenes. For surveys of general object detection, refer to Hjelmås and Low (2001), Yang et al. (2002), Enzweiler and Gavrila (2009), and Geronimo et al.
