
3D Shape Analysis for Quantification, Classification, and Retrieval

Indriyati Atmosukarto
A dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
University of Washington
2010
Program Authorized to Offer Degree: Computer Science and Engineering
University of Washington
Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by
Indriyati Atmosukarto
and have found that it is complete and satisfactory in all respects,
and that any and all revisions required by the final
examining committee have been made.
Chair of the Supervisory Committee:
Linda G. Shapiro
Reading Committee:
Linda G. Shapiro
James F. Brinkley III
Maya Gupta
Date:
In presenting this dissertation in partial fulfillment of the requirements for the doctoral
degree at the University of Washington, I agree that the Library shall make its copies
freely available for inspection. I further agree that extensive copying of this dissertation is
allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S.
Copyright Law. Requests for copying or reproduction of this dissertation may be referred
to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346,
1-800-521-0600, to whom the author has granted the right to reproduce and sell (a) copies
of the manuscript in microform and/or (b) printed copies of the manuscript made from
microform.
Signature
Date
University of Washington
Abstract
3D Shape Analysis for Quantification, Classification, and Retrieval
Indriyati Atmosukarto
Chair of the Supervisory Committee:
Professor Linda G. Shapiro
Computer Science and Engineering
Three-dimensional objects are now commonly used in a large number of applications including games, mechanical engineering, archaeology, culture, and even medicine. As a result, researchers have started to investigate the use of 3D shape descriptors that aim to encapsulate the important shape properties of the 3D objects. This thesis presents new 3D shape representation methodologies for quantification, classification, and retrieval tasks that are flexible enough to be used in general applications, yet detailed enough to be useful in medical craniofacial dysmorphology studies. The methodologies begin by computing low-level features at each point of the 3D mesh and aggregating the features into histograms over mesh neighborhoods. Two different methodologies are defined. The first methodology begins by learning the characteristics of salient point histograms for each particular application, and represents the points in a 2D spatial map based on a longitude-latitude transformation. The second methodology represents the 3D objects by using the global 2D histogram of the azimuth-elevation angles of the surface normals of the points on the 3D objects.
Four datasets, two craniofacial datasets and two general 3D object datasets, were obtained to develop and test the different shape analysis methods developed in this thesis. Each dataset has different shape characteristics that help explore the different properties of the methodologies. Experimental results on classifying the craniofacial datasets show that our methodologies achieve higher classification accuracy than medical experts and existing state-of-the-art 3D descriptors. Retrieval and classification results using the general 3D objects show that our methodologies are comparable to existing view-based and feature-based descriptors and outperform these descriptors in some cases. Our methodology can also be used to speed up the most powerful general 3D object descriptor to date.
TABLE OF CONTENTS
Page
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Chapter 2: Related Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 3D Descriptors for General Objects . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Medical Craniofacial Assessment . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 3: Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset . . . . . . . . . . . . . . . . 15
3.2 Deformational Plagiocephaly Dataset . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Heads Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 SHREC Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 4: Base Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1 Low-level Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Mid-level Feature Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 5: Learning Salient Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1 Learning Salient Points for 22q11.2 Deletion Syndrome . . . . . . . . . . . . . 26
5.2 Learning Salient Points for Deformational Plagiocephaly . . . . . . . . . . . . 28
5.3 Learning Salient Points for General 3D Objects . . . . . . . . . . . . . . . . . 30
Chapter 6: 2D Longitude-Latitude Salient Map Signature . . . . . . . . . . . . . . 34
6.1 Salient Point Pattern Projection . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2 Classification using 2D Map Signature . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Retrieval using 2D Map Signature . . . . . . . . . . . . . . . . . . . . . . . . 45
6.4 Retrieval using Salient Views . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Chapter 7: Global 2D Azimuth-Elevation Angles Histogram of Surface Normal
Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.1 3D Shape Severity Quantification and Localization for Deformational Plagio-
cephaly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 Classication of 22q11.2 Deletion Syndrome . . . . . . . . . . . . . . . . . . . 78
7.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 8: Learning 3D Shape Quantification for Craniofacial Research . . . . . . 83
8.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.2 Facial Region Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3 2D Histogram of Azimuth Elevation Angles . . . . . . . . . . . . . . . . . . . 86
8.4 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.5 Feature Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Chapter 9: Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
LIST OF FIGURES
Figure Number Page
1.1 Example of applications that use 3D objects . . . . . . . . . . . . . . . . . . . 2
2.1 Anthropometric landmarks on patient's head . . . . . . . . . . . . . . . . . . 12
3.1 Example of 3D face mesh data of children with 22q11.2 deletion syndrome. . 16
3.2 Tops of heads of children with deformational plagiocephaly. . . . . . . . . . . 17
3.3 Example of objects in the Heads dataset. . . . . . . . . . . . . . . . . . . . . 19
3.4 Example morphs from the horse class . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Example of objects in the SHREC 2008 Classication dataset . . . . . . . . . 20
4.1 Low-level feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Azimuth and elevation angle of a 3D surface normal vector. . . . . . . . . . . 24
4.3 Mid-level feature aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.1 Craniofacial anthropometric landmarks. . . . . . . . . . . . . . . . . . . . . . 27
5.2 Example of training points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Example histograms of salient and non-salient points . . . . . . . . . . . . . . 29
5.4 Salient point prediction for two faces in the 22q11.2DS dataset . . . . . . . . 29
5.5 Salient point prediction for training data in Heads dataset . . . . . . . . . . . 31
5.6 Salient point prediction for testing data in Heads dataset . . . . . . . . . . . 31
5.7 Salient point prediction for objects in SHREC 2008 dataset . . . . . . . . . . 32
6.1 Salient point patterns on 3D objects . . . . . . . . . . . . . . . . . . . . . . . 35
6.2 2D longitude-latitude signature maps . . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Classification accuracy vs training rotation angle increment. . . . . . . . . . . 42
6.4 Comparison of retrieval results . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.5 Comparison of retrieval results . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.6 Salient points resulting from clustering. . . . . . . . . . . . . . . . . . . . . . 54
6.7 Salient view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.8 Salient views vs Distinct salient views . . . . . . . . . . . . . . . . . . . . . . 56
6.9 Top 5 distinct salient views in SHREC dataset . . . . . . . . . . . . . . . . . 57
6.10 Average retrieval scores using top K salient views . . . . . . . . . . . . . . . . 59
7.1 Surface normal vectors of 3D points . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Calculation of the Flatness Scores . . . . . . . . . . . . . . . . . . . . . . . . 67
7.3 Severity localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.4 Spectrum of deformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.5 Correlation between LPFS and Expert Score . . . . . . . . . . . . . . . . . . 70
7.6 Correlation between RPFS and Expert Score . . . . . . . . . . . . . . . . . . 70
7.7 Correlation between AS and Expert Score . . . . . . . . . . . . . . . . . . . . 72
7.8 Correlation between AAS and Expert Score . . . . . . . . . . . . . . . . . . . 72
7.9 Correlation between AAS and aOCLR . . . . . . . . . . . . . . . . . . . . . . 74
7.10 ROC curve for LPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.11 ROC curve for RPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.12 ROC curve for AS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.13 ROC curve for AAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.14 Correlation between AAS and Brachycephaly score . . . . . . . . . . . . . . . 76
7.15 ROC curve for AAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.16 Projections of 2D azimuth-elevation angles to the face . . . . . . . . . . . . . 81
8.1 Overview of the quantification learning framework. . . . . . . . . . . . . . . . 83
8.2 Facial region selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3 2D histogram of selected region . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.4 Feature selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 Positional information about selected region . . . . . . . . . . . . . . . . . . . 89
8.6 Positional information about selected region with normal vector . . . . . . . . 90
8.7 Output of the genetic programming quantification approach . . . . . . . . . . 91
8.8 F-measure for training and testing dataset . . . . . . . . . . . . . . . . . . . . 94
8.9 Projection of selected histogram bins . . . . . . . . . . . . . . . . . . . . . . . 100
8.10 Tree structure for quantifying midface hypoplasia . . . . . . . . . . . . . . . . 103
8.11 Tree structure for quantifying nasal facial abnormalities . . . . . . . . . . . . 105
8.12 Tree structure for quantifying nasal facial abnormalities . . . . . . . . . . . . 106
8.13 Tree structure for quantifying oral facial abnormalities . . . . . . . . . . . . . 107
8.14 Tree structure for quantifying oral facial abnormalities . . . . . . . . . . . . . 108
8.15 Quantification score for midface hypoplasia. . . . . . . . . . . . . . . . . . . . 109
LIST OF TABLES
Table Number Page
4.1 Besl-Jain surface characterization. . . . . . . . . . . . . . . . . . . . . . . . . 23
6.1 Classification performance for 22q11.2DS. . . . . . . . . . . . . . . . . . . . . 37
6.2 Overall comparison of the various shape descriptors. . . . . . . . . . . . . . . 38
6.3 Comparison of classification accuracy for 22q11.2DS. . . . . . . . . . . . . . . 38
6.4 Plagiocephaly classification using the 254-individual dataset . . . . . . . . . . 39
6.5 Plagiocephaly classification using the 140-individual dataset . . . . . . . . . . 39
6.6 Comparison of classification accuracy for plagiocephaly. . . . . . . . . . . . . 40
6.7 Comparison of classification accuracy for the SHREC 2008 dataset. . . . . . . 43
6.8 Comparison of timing of each phase . . . . . . . . . . . . . . . . . . . . . . . 44
6.9 Pose-normalized retrieval experiment 2 . . . . . . . . . . . . . . . . . . . . . . 46
6.10 Average retrieval score comparing three pose-normalization methods. . . . . . 48
6.11 Average retrieval score using different low-level features . . . . . . . . . . . . 48
6.12 Average retrieval score using image wavelet analysis . . . . . . . . . . . . . . 49
6.13 Comparing the salient map signature best results against existing methods. . 49
6.14 Comparing retrieval score for classes in SHREC dataset . . . . . . . . . . . . 50
6.15 Average retrieval score using salient views . . . . . . . . . . . . . . . . . . . . 62
6.16 Retrieval score using maximum number of distinct views . . . . . . . . . . . . 63
6.17 Average feature extraction runtime per object. . . . . . . . . . . . . . . . . . 64
7.1 Descriptive statistics for the Left Posterior Flatness Score (LPFS) . . . . . . 71
7.2 Descriptive statistics for the Right Posterior Flatness Score (RPFS) . . . . . 73
7.3 Descriptive statistics for the Asymmetry Score (AS) . . . . . . . . . . . . . . 78
7.4 Descriptive statistics for AAS and aOCLR . . . . . . . . . . . . . . . . . . . . 79
7.5 AUC for quantifying posterior flattening . . . . . . . . . . . . . . . . . . . . . 80
7.6 Classification accuracy for plagiocephaly . . . . . . . . . . . . . . . . . . . . . 80
7.7 Classification of 22q11.2DS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.8 Classification accuracy of 22q11.2DS facial dysmorphologies . . . . . . . . . . 81
8.1 Genetic programming parameters. . . . . . . . . . . . . . . . . . . . . . . . . 92
8.2 Classification performance for nine facial anomalies using GP . . . . . . . . . 93
8.3 Classification performance using various shape descriptors . . . . . . . . . . . 95
8.4 Comparing GP to the global approaches . . . . . . . . . . . . . . . . . . . . . 96
8.5 GP mathematical expressions for midface hypoplasia . . . . . . . . . . . . . . 97
8.6 GP mathematical expressions for midface hypoplasia . . . . . . . . . . . . . . 98
8.7 Coefficients for midface hypoplasia . . . . . . . . . . . . . . . . . . . . . . . . 99
8.8 Best performing mathematical expression . . . . . . . . . . . . . . . . . . . . 101
8.9 Best performing mathematical expressions . . . . . . . . . . . . . . . . . . . . 102
8.10 Classification performance in predicting 22q11.2 Deletion Syndrome. . . . . . 104
ACKNOWLEDGMENTS
I wish to express a very deep and sincere gratitude to my advisor, Professor Linda Shapiro, without whose guidance, encouragement, and support I would not have been able to complete this PhD study. I have learned tremendously from her on how to become an excellent researcher and writer, especially one in the field of computer vision.

I am very grateful to all the members of my PhD thesis committee, Dr Maya Gupta, Dr James Brinkley, Dr Steve Seitz, and Dr Mark Ganther, for their useful feedback and comments.
I would also like to thank my collaborators at Seattle Children's Hospital Craniofacial Center: Dr Michael Cunningham, Dr Matthew Speltz, Dr Carrie Heike, and Dr Brent Collett, for providing me with the medical 3D mesh data for this dissertation, as well as for their engaging discussions and suggestions.
I owe an indescribable amount of gratitude to my parents, my sisters, and my niece for having confidence in me, always encouraging me, and cheering me up when I am down.
Finally, I reserve special thanks for my husband, David Gomulya, for being my best
friend and a great supporter, and my son, Kiran, for bringing new joy into my life.
This research was supported by the National Science Foundation under grant number
DBI-0543631.
DEDICATION
to my son
Kiran Atmosukarto Gomulya
our Ray of Light
Chapter 1
INTRODUCTION
1.1 Motivation
Advancement in technology for digital acquisition of 3D models has led to an increase in the number of 3D objects available. Three-dimensional objects are now commonly used in a number of areas such as games, mechanical design for CAD models, archaeology and cultural heritage, and medical research studies. Figure 1.1 shows some applications that use 3D objects. The widespread integration of 3D models in different fields motivates the need to be able to store, index, classify, and retrieve 3D objects automatically. However, current classification and retrieval techniques for text, 2D images, and videos cannot be directly translated and applied to 3D objects, as 3D objects have different data characteristics from other data modalities.
Classification and retrieval of 3D objects require the 3D objects to be represented in a way that captures the local and global shape characteristics of the object. This requires creating a 3D descriptor or signature that summarizes the important shape properties of the object. Unfortunately, finding a descriptor that is able to describe the important characteristics of a 3D object is not a trivial task. The descriptor should capture a good balance between the global and local shape properties of the object, so as to allow flexibility in performing different tasks. The global properties of an object capture its overall shape, while the local properties capture its details.
A specific example of the usage of 3D models in the medical field is the work that researchers at Seattle Children's Hospital Craniofacial Center (SCHCC) are pursuing. The researchers at SCHCC use CT scans and 3D surface meshes of children's heads to investigate head shape dysmorphology due to craniofacial disorders such as craniosynostosis, 22q11.2 deletion syndrome, deformational plagiocephaly, or cleft lip and palate. These researchers aspire to develop new computational techniques that can represent, quantify,
Figure 1.1: Example of applications that use 3D objects: (a) Second Life is a game that
simulates a virtual 3D world, (b) The Digital Michelangelo is a Stanford project that aims
to digitize cultural artifacts for cataloging, conservation, and restoration, (c) FoldIt! is a
computer game that uses 3D protein structures to understand how proteins fold for use in
drug developments, and (d) Plan3D is an interior design application that allows users to
incorporate 3D models in house designs.
and analyze variants of biological morphology from the 3D models acquired from stereo camera technology. The long-term objective of their research is to reveal genotype-phenotype disease associations.
This thesis investigates new methodologies for representing 3D objects that are useful in medical applications. Most existing 3D shape descriptors have only been developed and tested on general 3D object datasets, while those designed for medical purposes must usually satisfy a specific medical application and dataset. The objective of this work is to develop 3D shape representation methodologies that are flexible enough to generalize from specific medical tasks to general 3D object tasks. This work was motivated by collaborations in two research studies at SCHCC on craniofacial anatomy: 1) a study of children with 22q11.2 deletion syndrome and 2) a study of infants with deformational plagiocephaly.
22q11.2 deletion syndrome (22q11.2DS) is a genetic disease that is one of the most common multiple anomaly syndromes in humans [41]. This condition is associated with more than 180 clinical features, including over 25 dysmorphic craniofacial features. Abnormal clinical features of individuals with 22q11.2DS include asymmetric face shape, hooded eyes, bulbous nasal tip, and retrusive chin, among others. The range of variation in individual feature expression is very large. As a result, even experts have difficulty diagnosing 22q11.2DS from frontal facial photographs alone [9]. Early detection of 22q11.2DS is important, as many affected individuals are born with conotruncal cardiac anomalies, mild-to-moderate immune deficiencies, and learning disabilities, all of which can benefit from early intervention.
Deformational plagiocephaly (also known as positional plagiocephaly or non-synostotic plagiocephaly) refers to a deformation of the head characterized by a persistent flattening on one side, resulting in an asymmetric head shape and misalignment of the ears. Deformational plagiocephaly is caused by persistent pressure on the skull of a baby before or after birth. Another possible factor that can lead to deformational plagiocephaly is torticollis, a muscle tightness in the neck that limits the range of motion of the head and causes infants to look in one direction and to rest on the same spot on the back of the head. If left untreated, children with these abnormal head shape conditions may experience a number of medical issues in their lives, ranging from social problems due to abnormal appearance to delayed neurocognitive development [18, 77].
1.2 Problem Statement
Motivated by collaborations with researchers at SCHCC, this thesis develops 3D shape representation methodologies that can be used for 3D shape classification, retrieval, and quantification. The methodologies provide the flexibility to generalize to both specific medical datasets and general 3D objects. The following three general problems are tackled.
Problem 1: 3D shape quantification
Given a surface mesh S_i, which consists of n points and information regarding the connectivity of the points, the goal is to analyze and describe the shape S_i by constructing a numeric representation of mesh S_i, commonly referred to as a signature or descriptor D_i. A quantitative score may also be calculated from the obtained signature.
Problem 2: 3D shape classification
Given a database of 3D shapes S = {S_1, S_2, ..., S_N} that have been quantified and described using their respective numeric signatures D_i, 1 ≤ i ≤ N, and are pre-classified into C classes, the goal is to create an algorithm that can be used to determine to which class a new 3D object Q belongs.
Problem 3: 3D shape retrieval
Given a database of 3D shapes S = {S_1, S_2, ..., S_N} that have been quantified and described using their respective numeric signatures D_i, 1 ≤ i ≤ N, the goal is to create an algorithm that retrieves all objects in S that are similar to a query object Q based on their numeric signatures.
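To make the retrieval formulation concrete, the following minimal sketch (hypothetical names; it assumes each signature D_i is a fixed-length numeric vector and uses Euclidean distance as the dissimilarity measure) ranks database objects by the distance between their signatures and the query's signature:

```python
import numpy as np

def retrieve(query_signature, database_signatures, k=5):
    """Rank database objects by Euclidean distance between signatures.

    query_signature: 1D array, the signature of the query object Q.
    database_signatures: 2D array with one signature D_i per row.
    Returns the indices of the k most similar objects, best first.
    """
    distances = np.linalg.norm(database_signatures - query_signature, axis=1)
    return np.argsort(distances)[:k]
```

Any of the descriptors developed in later chapters could be plugged in as the signature; the distance function is likewise interchangeable.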
1.3 Thesis Outline
Chapter 2 discusses the literature related to the two main classes of research in this thesis: 3D object descriptors in the computer vision literature and craniofacial assessment in the medical literature. The datasets used to develop and test the methodology are described in Chapter 3. Chapter 4 explains the base framework for feature extraction. The method for learning the salient points of a 3D object is explained and applied to different applications in Chapter 5. Two different types of 3D object descriptors are introduced and analyzed in Chapters 6 and 7. Chapter 6 describes the 2D longitude-latitude salient map signature and investigates its application for classification and retrieval of both general 3D objects and 3D medical data. Chapter 7 covers the global 2D azimuth-elevation angles descriptor and investigates its application for classification of the deformational plagiocephaly and 22q11.2DS datasets. A learning framework for quantification using genetic programming is described in Chapter 8. Finally, Chapter 9 provides a summary and suggests possible future research directions.
Chapter 2
RELATED LITERATURE
In this chapter, two main classes of research related to the work in this thesis are
described: 3D shape descriptors for general objects from the computer vision literature
and medical studies from the craniofacial literature.
2.1 3D Descriptors for General Objects
Three-dimensional shape analysis and its application to 3D object retrieval and classification has received increased attention in the past few years. There have been several survey papers on the topic [81, 82, 26, 95, 20, 36, 13, 14, 12, 52, 69]. Starting in 2006, researchers in the area have taken the initiative to organize an annual 3D shape retrieval evaluation contest called SHREC (SHape REtrieval Contest), currently organized by the Network of Excellence AIM@SHAPE. The contest's general objective is to evaluate the effectiveness of 3D shape retrieval algorithms. Participants register for the contest before the test set is made available. The participants are given 48 hours to apply their 3D retrieval algorithms to the test set and submit their retrieval results to the organizers. The retrieval results are evaluated using measurements that relate to precision and recall. The average performance of each method over a set of queries is calculated to obtain an overall impression of the algorithm's performance. Using a common test set and queries allows a direct comparison of the different algorithms. The contest started as a single track using only the Princeton benchmark database as the test set and has evolved into a multi-track contest. The tracks now include retrieval of watertight models, CAD models, protein models, 3D face models, and partial matching. Results of the contest show that no single descriptor performs best for all kinds of retrieval and classification tasks. Each descriptor has its own strengths and weaknesses for the different queries and tasks.
There are three broad categories of 3D object representation: feature-based methods,
graph-based methods, and view-based methods.
2.1.1 Feature-based methods
Feature-based 3D object descriptors, which are the most popular, can be further categorized into: (1) global features, (2) global feature distributions, (3) spatial maps, and (4) local features. Early work on 3D object representation and its application to retrieval and classification focused more on the global feature and global feature distribution approaches. Global features computed to represent 3D objects include area, volume, and moments. Elad et al. [22] computed the moments of the object and used the vector of moment values as a descriptor for the object. Osada et al. [62] calculated a number of global shape distributions to represent 3D objects. The shape functions measured included the angle between three random points (A3), the distance between a fixed point and a random point (D1), the distance between two random points (D2), the area of the triangle between three random points (D3), and the volume between four random points on the surface (D4). Ohbuchi et al. [59] enhanced the D2 shape function by measuring not only the distance, but also the mutual orientation of the surfaces on which the pair of points is located. Zaharia et al. [96] introduced a 3D shape spectrum descriptor that computed the distribution of the shape index of the points over the whole mesh. Similar distributions were also calculated for other surface properties such as curvature. Some recent works continue to use the feature distribution approach. Mahmoudi et al. [53] computed the histogram of pairwise diffusion distances between all points, while Ion et al. [35] defined their descriptor as the histogram of the eccentricity transform, which uses the maximum geodesic distance from a point to all other points on the surface. The global feature methods are computationally efficient, as they reduce the computation space by describing the 3D object with fewer dimensions; however, these methods are not discriminative enough when the objects have small differences, as in intra-class retrieval or classification of very similar objects.
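As a concrete illustration of the shape distribution idea, the sketch below approximates the D2 function of Osada et al. [62] as a histogram of distances between random point pairs. For brevity it samples mesh vertices directly; the original samples points uniformly over the surface area, so this is a simplification, not their exact procedure.

```python
import numpy as np

def d2_distribution(vertices, n_pairs=10000, n_bins=64, seed=None):
    """Approximate the D2 shape distribution: a normalized histogram of
    Euclidean distances between random point pairs on the shape.
    vertices: (n, 3) numpy array of mesh vertex coordinates."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(vertices), size=n_pairs)
    j = rng.integers(0, len(vertices), size=n_pairs)
    dists = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    hist, _ = np.histogram(dists, bins=n_bins, density=True)
    return hist
```

Two such histograms can then be compared with any vector distance, which is what makes the distribution approach computationally cheap.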
Spatial map representations describe the 3D object by capturing and preserving the physical locations of features on it. Saupe et al. [71] described a spherical extent function by calculating the maximal extent of a shape across all rays from the origin. They compared two different kinds of representation of the function: spherical harmonics and moments. Their results showed that representing the function using spherical harmonics performed better. The spherical harmonic coefficients reconstruct an approximation of the object at different resolutions. Kazhdan et al. [39] used this idea to show that spherical harmonics can be used to transform rotation-dependent shape descriptors into rotation-independent ones without the need to pose-normalize the objects in advance. Their results showed that the application of the spherical harmonic representation improved the performance of most spherical function descriptors. Laga et al. [44, 43] uniformly sampled points on a unit sphere and used spherical wavelet transforms to represent 3D objects. Spherical wavelet descriptors are natural extensions of 3D Zernike moments and spherical harmonics; they offer better feature localization and rotation invariance, since spherical harmonic analysis has singularities at each pole of the sphere.

Wavelets are basis functions that represent a given signal at multiple resolutions. Laga investigated both second-generation wavelets, including linear and butterfly spherical wavelets with a lifting scheme, and image wavelets with spherical boundary extension rules for constructing the shape descriptor [73, 68]. He proposed three descriptors based on the spherical wavelets: using the coefficients as feature vectors, using the L1 energy of the coefficients, and using the L2 energy of the coefficients. Zhenbao et al. [51] compared their multiresolution wavelet analysis to the spherical wavelet descriptor and showed that their descriptor performed slightly better. Their method characterized the shape orientation of the object by setting six view planes and sampling the shape orientation from each of the view planes. They then performed multiresolution wavelet analysis on each of the view planes and used the wavelet coefficients of each view plane as the feature vector. Assfalg et al. [5] captured the shape of a 3D object using the curvature map of the object's surface. One of the methods developed in this thesis is quite related to this approach; however, it differs in that it does not use the curvature information directly. Lastly, Tangelder et al. [80] developed a 3D spatial map by dividing the 3D object into a 3D grid with cells of equal size and measuring the curvature property in each cell.
Recent research is beginning to focus more on the local approach to representing 3D objects, as this approach has stronger discriminative power when differentiating objects that are similar in overall shape [63]. Local features are often points that are considered to be interesting or salient on the 3D object. These points are computed in various ways. Some methods randomly select points on the surface of the object. Frome et al. [25], who developed a 3D shape context, and Johnson et al. [37], who designed spin image descriptors, both randomly selected points as their basis points. Shilane et al. [75, 76] used random points with harmonic shape descriptors at four different scales. Most other methods use local geometric properties of the 3D object, such as curvature or normals, to describe the points on the surface of the object, and define the level difference extrema as the salient points. Lee et al. [46] used mean curvature properties with a center-surround mechanism to identify the extrema as final salient points. A similar method was adopted by Li et al. [47, 48], who found reliable salient points by considering a set of extrema for a scale-space representation of a point-based input surface and used the locations of level difference extrema as the salient feature points. Unnikrishnan et al. [83] presented a multi-scale interest region detector that captures variation in shape at a point relative to the size of its neighborhood. Their method used the extrema of the mean curvature to identify the salient points. Watanabe et al. [90] used salient extrema of the principal curvatures along the curvature lines on the surface. Castellani et al. [15] proposed a new methodology for detecting and matching salient points based on measuring how much a vertex is displaced after filtering. The salient points are described using a local description based on a hidden Markov model.

Ohbuchi et al. [60] rendered multiple views of a 3D model and extracted local features from each view using the SIFT algorithm. The local features were then integrated into a histogram using a bag-of-features approach to retrieval. Novatnack et al. [58, 57] extracted corners and edges of a 3D model by first parameterizing the surface of a 3D mesh model on a 2D map and constructing a dense surface normal map. They then constructed a discrete scale space by convolving the normal map with Gaussian kernels of increasing standard deviation. The corners and edges detected at individual scales were combined into a unified representation of the 3D object. Akagunduz et al. [2] used a Gaussian pyramid at several scales to extract the surface extrema and represented the points and their relationships by a graphical model. Taati et al. [79] generated a local shape descriptor based on invariant properties extracted from the principal component space of the local neighborhood around a point. The salient points were selected based on ratios of basic dispersion properties. Other examples of local descriptors include spin images [37, 4], point signatures [17], and symbolic signatures [70]. Some efforts have also been made to combine both the local and global properties of the object. Alosaimi et al. [3] aggregated the information in a 2D histogram and concatenated the PCA coefficients of the histograms to form a single feature vector. Liu et al. [50, 49] represented a global 3D shape as the spatial configuration of a set of local features. The spatial configuration was represented by computing the distributions of the Euclidean distances between pairs of local shape clusters, represented by spin images.
2.1.2 Graph-based methods
While feature-based methods use only the geometric properties of the 3D model to define the shape of the object, graph-based methods use the topological information of the 3D object to describe its shape. The graph that is constructed shows how the different shape
components are linked together. The graph representations include model graphs, Reeb
graphs, and skeleton graphs. These methods are known to be computationally expensive
and sensitive to small topological changes. Sundar et al. [78] used the skeletal graph as a
shape descriptor to encode both geometric and topological properties of the 3D object. The
similarity measures between two objects were approximated using a greedy algorithm for
bipartite graph matching. Hilaga et al. [30] introduced the use of Reeb graphs for matching
the shapes of articulated models.
2.1.3 View-based methods
The most effective view-based shape descriptor is the LightField descriptor (LFD) developed by Chen et al. [16]. A light field around a 3D object is a 4D function that represents the radiance at a given 3D point in a given direction. Each 4D light field of a 3D object is represented as a collection of 2D images rendered from a 2D array of cameras distributed uniformly on a sphere. Their method places the light field cameras on 20 vertices of a regular dodecahedron and uses orthogonal projection to capture 10 different silhouettes of the 3D model. Ten different rotations are performed to capture a set of light field descriptors to improve robustness to rotation. The 100 rendered images are then described using Zernike moments and Fourier descriptors, which describe the region shape and contour shape of the 3D model, respectively. The retrieval of the 3D models is performed in stages, where objects that are greatly dissimilar to the query model are rejected early in the process. This is done by comparing only a subset of the light field descriptors of the query and of the database objects in the first few stages of the retrieval process. The LightField descriptor was evaluated to be one of the best-performing descriptors in the SHREC competition. Ohbuchi et al. [60] used a view-based approach similar to the LightField descriptor; however, their method extracted local features from each rendered image using the SIFT algorithm. Wang et al. [89] improved the space efficiency of the LFD descriptor by projecting a number of uniformly sampled random points along six directions to create six images that are then described using Zernike moments. They also used a two-stage retrieval method to speed up the retrieval process. Experimental results on the Princeton Shape Benchmark database showed that their method's performance was comparable to the LFD descriptor for some categories. Vajramushti et al. [84] employed a combination of a view-based depth-buffer technique and a feature-based volume descriptor for 3D matching. Their method used the voxel volume of the objects to reduce the search space for the depth-buffer comparisons. Vranic [87] evaluated a composite descriptor called DESIRE, formed using depth-buffer images, silhouettes, and ray extents of a 3D object. His results showed that DESIRE outperformed LFD in retrieving objects of some categories.
It is important to note that most of these existing 3D object descriptors were developed and tested to describe general 3D objects with high shape variability, not medical datasets, which usually have small shape variations. As shown in the analysis section, they usually do not perform very well in describing medical datasets. This thesis proposes a feature-based approach that uses a learning methodology to identify the interesting salient points on the object, discussed in Chapter 5, and creates a global spatial map of the salient point patterns, described in Chapter 6. The proposed descriptor is tested and shown to work well for general 3D objects and to outperform other methods on craniofacial medical datasets.
2.2 Medical Craniofacial Assessment
This thesis focuses on two medical research studies done in collaboration with medical researchers at Seattle Children's Hospital Craniofacial Center: deformational plagiocephaly and 22q11.2 deletion syndrome. This section describes some of the existing medical craniofacial assessment techniques used to describe the two craniofacial conditions.
2.2.1 Deformational Plagiocephaly
There are a number of different methods to assess and measure the severity of deformational plagiocephaly [54]. Early measurement techniques for plagiocephaly began by using anthropometric landmark measurements [23, 42, 40]. These landmarks identify certain points on the head that are commonly found across all head shapes. The points include the inner corners of the eyes, the outer corners of the eyes, and a point along the sagittal plane, as shown in Figure 2.1. These techniques involve taking physical manual measurements on the patient's head [28, 55], using calipers to record various measurements. In one approach, the clinician determines the area with the greatest prominence on both the right and left sides of the head and measures diagonally the distance from these sites to the back of the head. The smaller length is subtracted from the larger, resulting in an asymmetry number called the Transcranial Diameter Difference (TDD) [28]. Skulls with TDD values greater than 0.6 cm are considered severe and correlate with skulls scored 2 or more by an expert. Skulls with TDD values less than 0.6 cm are considered mildly deformed. This technique produces discrete classifications that do not truly reflect the continuous trend in severity of the shape deformation. In addition, taking manual measurements is very time consuming and intrusive, especially for young infants, and tends not to produce consistent results. Finding the area to place the calipers is still subjective and affects the overall score.
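Expressed as a computation, the TDD score and its severity threshold reduce to a few lines (illustrative function names; measurements are in centimeters):

```python
def transcranial_diameter_difference(left_diagonal_cm, right_diagonal_cm):
    """TDD [28]: subtract the smaller diagonal measurement from the larger."""
    return abs(left_diagonal_cm - right_diagonal_cm)

def tdd_severity(tdd_cm, threshold_cm=0.6):
    """Skulls with TDD above the threshold are considered severe,
    otherwise mildly deformed."""
    return "severe" if tdd_cm > threshold_cm else "mild"
```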
Another technique for measuring the severity of shape deformations due to plagiocephaly
and brachycephaly involves having clinical experts qualitatively match the shape of the
patient's skull to a set of templates. The templates contain images of skulls with varying degrees of shape deformation severity. The technique uses four templates: 1) normal skull
Figure 2.1: Anthropometric landmarks on a patient's head. These images were published by Kelly et al. [40].
shape [score 0], 2) mild shape deformation [score 1], 3) moderate shape deformation [score 2], and 4) severe shape deformation [score 3]. When assigning the severity score for a patient, a clinical expert matches the patient's skull shape to the most similar template and assigns the score corresponding to that template. This technique is currently used by practitioners using the Dynamic Orthotic Cranioplasty Band (DOC Band) helmet as a treatment method [33, 34].
Instead of taking physical measurements directly on a patient's head, some techniques take the measurements from photographs of the patient's head. This approach is less intrusive for young patients, but it is still time consuming and can be inconsistent, as technicians must manually place landmarks on the photographs. Hutchison et al. [31, 32] developed a technique called HeadsUp that involves taking a top-view digital photograph of infant heads fitted with an elastic head circumference band. The elastic band is equipped with adjustable color markers to identify landmarks such as ear and nose positions. The resulting photograph is then automatically analyzed to obtain quantitative measurements of the head shape, including cephalic index, head circumference, distance of ear to center of nose, oblique length, and ratio. Their results showed that the cephalic index (CI) and Oblique Cranial Length Ratio (OCLR) can be used for quantitative measurement of shape severity, as the numbers differ significantly between cases and controls. Although promising, the Hutchison method requires subjective decisions regarding the placement of the midline and ear landmarks and the selection of the posterior point of the OCLR lines. In addition, as the measurements are done in two dimensions, displacement of head volumes cannot really be assessed. Furthermore, placing the band on an infant can be quite challenging.
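For reference, the two ratios can be sketched as below. This assumes the commonly used definitions (cephalic index as head width over head length, and OCLR as the longer oblique cranial length over the shorter, both expressed as percentages); the exact measurement protocol of Hutchison et al. may differ in detail.

```python
def cephalic_index(head_width, head_length):
    """Cephalic index (CI): width-to-length ratio of the head, as a percentage."""
    return 100.0 * head_width / head_length

def oblique_cranial_length_ratio(diag_a, diag_b):
    """OCLR: longer oblique cranial diagonal over the shorter, as a percentage.
    Values near 100 indicate a symmetric head."""
    longer, shorter = max(diag_a, diag_b), min(diag_a, diag_b)
    return 100.0 * longer / shorter
```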
Zonenshayn et al. [98] also employed a headband with two adjustable points (nasion and
inion of the head) and used photographs of the headband shape to calculate the Cranial
Index of Symmetry (CIS). These methods require consistency in setting up the band and
placing the markers, which may lead to non-reproducible results. In addition, this is a 2D
technique, but plagiocephaly and brachycephaly are three-dimensional deformations.
Vlimmeren et al. [85] introduced a new method called plagiocephalometry to assess the
asymmetry of the skull. The method uses a thermoplastic material to mold the outline of a patient's skull. The ring is positioned around the head at the widest transverse circumference. Three landmarks for the ears and nose are marked on the ring. The ring is then copied onto paper and a transparent sheet to keep track of follow-up progress.
Measurement techniques that use full 3D head shape information can provide more detailed and accurate shape information. Plank et al. [67] used a noninvasive laser shape digitizer to obtain the 3D surface of the head. This system provides more accurate shape information, but still requires the use of markers to define an anatomical reference plane for further quadrant placement and volume calculations. Lanche et al. [45, 61] used a stereo-camera system to obtain a 3D model of the head and developed a statistical model of the asymmetry to quantify and localize the asymmetry on each patient's head. The model was obtained by first computing the asymmetry of a patient's head by deforming a symmetric ideal head template to the patient's head to obtain point correspondences between the left and right sides of the head. Principal Component Analysis was then performed on the vector of asymmetry values of all patients' heads to obtain a statistical model.
2.2.2 22q11.2 Deletion Syndrome
Similar to the assessment of deformational plagiocephaly, the assessment of 22q11.2 deletion syndrome has commonly been through physical examination combined with craniofacial anthropometric measurements. There have been very few automated methods for analyzing 22q11.2DS. Boehringer et al. [11] used Gabor wavelets to transform 2D photographs of individuals with 10 different facial dysmorphic syndromes. Their method then applied principal component analysis to describe and classify the dataset, and required landmark placement on the face.
Hammond et al. [29] used the Dense Surface Model method. Landmarks were manually placed on each 3D surface mesh and used to align the faces to a mean face. Principal component analysis was then used to describe the datasets, and the coefficients were used to classify the dataset. Neither of these two methods is fully automatic, as they require manual landmark placement.
One of the methods proposed in this thesis to represent craniofacial dysmorphologies uses 3D surface mesh models of heads without the need for markers or templates. The method uses the surface normal vectors of all the 3D points on the head and constructs a global 2D histogram of the azimuth-elevation angles of the surface normal vectors of the 3D points on the face. The proposed method is general enough to characterize different craniofacial disorders, including deformational plagiocephaly and its variations and 22q11.2DS and its different manifestations.
Chapter 3
DATASETS
This chapter will describe the four datasets that were obtained to develop and test the different shape analysis methodologies developed for this thesis. Each dataset has different characteristics that help explore the different properties of the methodologies. The 22q11.2DS dataset, introduced in Section 3.1, contains 3D face models of individuals affected and unaffected by 22q11.2 deletion syndrome. The Deformational Plagiocephaly dataset, discussed in Section 3.2, contains 3D head models of individuals affected and unaffected by deformational plagiocephaly. The Heads dataset, discussed in Section 3.3, contains head shapes of different classes of animals, including humans. These three datasets help explore the performance of the methodology on data of similar overall shape with subtle distinctions - the type of data for which the methodology was designed and developed. Section 3.4 introduces the SHREC 2008 classification benchmark dataset, which was obtained to further test the performance of the methodology on general 3D object classification, where objects in the dataset are not very similar.
3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset
The 3D face models in this dataset were collected at the Craniofacial Center of Seattle Children's Hospital using the 3dMD imaging system [1]. The 3dMD imaging system uses four camera stands, each containing three cameras. Stereo analysis yields twelve range maps that are combined using 3dMD proprietary software to yield a 3D mesh of an individual's head and a texture map of the face. The methodologies developed for this thesis use only the 3D meshes, due to human subject regulations.
An automated system developed by Wilamowska [92, 74] to align the pose of each mesh
was employed. The alignment system uses symmetry to align the yaw and roll angles and
a height differential to align the pitch angle. Although faces are not truly symmetrical,
Figure 3.1: Example of 3D face mesh data of children with 22q11.2 deletion syndrome.
the pose alignment procedure can be cast as finding the angular rotations of yaw and roll that minimize the difference between the left and right sides of the face. The pitch of the head was aligned by minimizing the difference between the height of the chin and the height of the forehead. In some cases, manual adjustments were necessary to pose-normalize the faces. Figure 3.1 shows two examples of affected individuals in the dataset.
The dataset contained 3D meshes for 189 individuals. Metadata for each 3D mesh consisted of the age, gender, and self-described ethnicity of the individual, plus a label of affected or unaffected. The dataset consisted of 53 affected individuals and 136 control individuals. The ground truth for an individual's 22q11.2DS label was determined through laboratory confirmation.

A balanced dataset was created from the original dataset. The balanced dataset consisted of 86 individuals: 43 affected and 43 unaffected with 22q11.2 deletion syndrome. Each of the 86 individuals was assessed by three craniofacial experts. Frontal and profile images of the individuals were de-identified and viewed in random order to blind the raters. The experts assigned discrete scores to a total of 18 facial features that are known to characterize 22q11.2DS (score 0 = none, 1 = moderate, 2 = severe). Nine of the facial features (midface hypoplasia, prominent nasal root, bulbous nasal tip, small nasal alae, tubular nose, small mouth, open mouth, downturned mouth, and retrusive chin) are further analyzed in Chapter 8. The experts' survey showed that all features of the nose had a higher percentage of moderate and severe expression in individuals affected with 22q11.2DS. Midface hypoplasia was observed to be moderately present in affected individuals [91].
Figure 3.2: Tops of heads of children with deformational plagiocephaly.
3.2 Deformational Plagiocephaly Dataset
The dataset for analyzing the shape dysmorphology due to deformational plagiocephaly was
obtained through a similar data acquisition pipeline as the 22q11.2DS dataset. The resulting
3D meshes are also automatically pose-normalized using the same alignment system used
to normalize the 22q11.2DS dataset [92, 74]. Figure 3.2 shows two examples of individuals
diagnosed with deformational plagiocephaly.
The original dataset consisted of 254 3D head meshes comprising 100 controls and 154 cases. Each mesh in the original dataset was assessed by two craniofacial experts, who assigned discrete severity scores based on the degree of deformation severity of different head areas, including the back of the head, forehead asymmetry, ear asymmetry, and whether the flattening at the back of the head was symmetric (a case of brachycephaly). In addition, each expert also noted an overall severity score. The discrete scores were category 0 for normal, 1 for mild, 2 for moderate, and 3 for severe. The laterality of the flatness was indicated using negative scores to represent left-sided deformation and positive scores to represent right-sided deformation.
The work in this thesis focuses on the flattening at the back of the head, known as posterior plagiocephaly. Since there does not exist any gold standard for assessing the severity of posterior plagiocephaly, the experts' ratings were considered the gold standard in evaluating the different severity scores developed. The inter-rater agreement between the two experts was only 65%. As a result, participants were excluded if (1) the two experts assigned discrepant posterior flattening scores, or (2) the classification based on expert ratings differed from the clinical classification (case or control) assigned at the time of enrollment. The final dataset used to investigate posterior plagiocephaly consisted of 140 infants, including 50 controls (by definition in category 0 by expert rating) and 90 cases: 46 in category 1 or -1, 35 in category 2 or -2, and 9 in category 3 or -3.
3.3 Heads Dataset
For the Heads dataset, the digitized 3D objects were obtained by scanning hand-made clay toys using a Roland LPX-250 laser scanner with a maximal scanning resolution of 0.008 inches for plane scanning mode [70]. Raw data from the scanner consisted of 3D point clouds that were further processed to obtain smooth and uniformly sampled triangular meshes of 0.9-1 mm resolution. To increase the number of objects for training and testing, new objects were created by deforming the original scanned 3D models in a controlled fashion using 3D Studio Max software [8]. Global deformations of the models were generated using morphing operators such as tapering, twisting, bending, stretching, and squeezing. The parameters for each of the operators were randomly chosen from ranges that were determined empirically. Each deformed model was obtained by applying at least five different morphing operators in a random sequence.

Fifteen objects representing seven different classes were scanned. The seven classes are: cat head, dog head, human head, rabbit head, horse head, tiger head, and bear head. Each of the fifteen original objects was randomly morphed to increase the size of the dataset. A total of 250 morphed models per original object were generated. Points on the morphed models are in full correspondence with the original models from which they were constructed. Figure 3.3 shows examples of objects from each of the seven classes, while Figure 3.4 shows example morphs from the horse class.
3.4 SHREC Dataset
The SHREC dataset was selected from the Classification of Watertight Models track of the SHREC 2008 Competition [27]. The models in the track were chosen by the organizers to ensure a high level of shape variability, making the track more challenging. The models
cat dog human rabbit horse tiger bear
Figure 3.3: Example of objects in the Heads dataset.
Figure 3.4: Example morphs from the horse class. Morphs were generated by stretching, twisting, or squeezing the original object with different parameters.
in the dataset were manually classified using three different levels of categorization. At the coarse level of classification, the objects were classified according to both their shapes and semantic criteria. At the intermediate level, the classes were subdivided according to functionality and shape. At the fine level, the classes were further partitioned based on object shape. For example, at the coarse level some objects were classified into the furniture class. At the intermediate level, these same objects were further divided into tables, seats, and beds. At the fine level, the objects were classified into chairs, armchairs, stools, sofas, and benches. The intermediate level of classification was chosen for the experiments, as the fine level had too few objects per class, while the coarse level grouped too many objects that were dissimilar in shape into the same class. The dataset consists of 425 pre-classified objects. Figure 3.5 shows examples of objects in the benchmark dataset.

The four datasets were used to test the classification and retrieval methodologies developed in this thesis. The domain-independent base framework of the methodologies is described next in Chapter 4.
human animal knots airplane bottle chess teapot
Figure 3.5: Example of objects in the SHREC 2008 Classification dataset. It can be seen that the intra-class variability in this dataset is quite high, as objects in the same class have quite different shapes.
Chapter 4
BASE FRAMEWORK
The methodologies developed in this thesis are used for single 3D object classification. They do not handle objects in cluttered 3D scenes or occlusion. A surface mesh, which represents a 3D object, consists of points {p_i} on the object's surface and information regarding the connectivity of the points. The base framework of the methodology starts by rescaling the objects to fit in a fixed-size bounding box. The framework then executes two phases: low-level feature extraction (Section 4.1) and mid-level feature aggregation (Section 4.2). The low-level feature extraction starts by applying a low-level operator to every point on the surface mesh. After the first phase, every point p_i on the surface mesh will have either a single low-level feature value or a small set of low-level feature values, depending on the operator used. The second phase performs mid-level feature aggregation and computes a vector of values for a given neighborhood of every point p_i on the surface mesh. The feature aggregation results of the base framework are then used to construct the different 3D object representations [7, 6].
4.1 Low-level Feature Extraction
The low-level operators extract local properties of the surface points by computing a feature value v_i for every point p_i on the mesh surface. All low-level feature values are convolved with a Gaussian filter to reduce noise effects. Three low-level operators were implemented to test the methodology's performance: absolute Gaussian curvature, Besl-Jain curvature categorization, and azimuth-elevation of surface normal vectors. Figure 4.1(a) shows an example of the absolute Gaussian curvature values of a 3D model. Figure 4.1(b) shows the results of applying a Gaussian filter over the low-level Gaussian curvature values, while Figure 4.1(c) shows the results of applying the Gaussian filter over the low-level Besl-Jain curvature values.
(a) (b) (c)
Figure 4.1: (a) Absolute Gaussian curvature low-level feature values, (b) smoothed absolute Gaussian curvature values after convolution with the Gaussian filter, (c) smoothed Besl-Jain curvature values after convolution. Higher values are represented by cool (blue) colors, while lower values are represented by warm (red) colors.
4.1.1 Absolute Gaussian Curvature
The absolute Gaussian curvature low-level operator computes the Gaussian curvature estimation K for every point p on the surface mesh:

K(p) = 2π − Σ_{f ∈ F(p)} interior angle_f

where F(p) is the list of all the neighboring facets of point p and interior angle_f is the angle of facet f at point p. This calculation is similar to calculating the angle deficiency at point p. The contribution of each facet is weighted by the area of the facet divided by the number of points that form the facet. The operator then takes the absolute value of the Gaussian curvature as the final low-level feature value for each point.
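To make the computation concrete, the angle-deficiency estimate can be sketched as follows. This is a minimal illustration assuming a triangle mesh stored as numpy vertex and face arrays; distributing one third of each incident triangle's area to each of its vertices is one reasonable reading of the per-facet weighting described above, not necessarily the exact scheme used in this work.

import numpy as np

def abs_gaussian_curvature(verts, faces):
    # verts: (n, 3) float array of vertex positions.
    # faces: (m, 3) int array of triangle vertex indices.
    n = len(verts)
    angle_sum = np.full(n, 2.0 * np.pi)  # K(p) starts at 2*pi ...
    area = np.zeros(n)
    for tri in faces:
        p = verts[tri]
        for i in range(3):
            a, b, c = p[i], p[(i + 1) % 3], p[(i + 2) % 3]
            u, v = b - a, c - a
            cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            # ... and the interior angle at each corner is subtracted.
            angle_sum[tri[i]] -= np.arccos(np.clip(cosang, -1.0, 1.0))
        tri_area = 0.5 * np.linalg.norm(np.cross(p[1] - p[0], p[2] - p[0]))
        area[tri] += tri_area / 3.0  # each corner receives a third of the area
    return np.abs(angle_sum / np.maximum(area, 1e-12))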
4.1.2 Besl-Jain Curvature
Besl and Jain [10] suggested a surface characterization of a point p using only the signs of the mean curvature H and Gaussian curvature K. These surface characterizations result in a scalar surface feature for each point that is invariant to rotation, translation and changes in parametrization. The eight different categories are: (1) peak surface, (2) ridge surface, (3) saddle ridge surface, (4) plane surface, (5) minimal surface, (6) saddle valley, (7) valley surface, and (8) cupped surface. Table 4.1 lists the different surface categories with their respective curvature signs.
Table 4.1: Besl-Jain surface characterization.
Label Category H K
1 Peak surface H < 0 K > 0
2 Ridge surface H < 0 K = 0
3 Saddle ridge surface H < 0 K < 0
4 Plane surface H = 0 K = 0
5 Minimal surface H = 0 K < 0
6 Saddle valley H > 0 K < 0
7 Valley surface H > 0 K = 0
8 Cupped surface H > 0 K > 0
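The sign tests of Table 4.1 translate directly into a small lookup, as in the sketch below. The tolerance eps for deciding when a curvature value is treated as zero is an assumption; the thesis does not state the tolerance used.

import numpy as np

def besl_jain_label(H, K, eps=1e-6):
    # Map the signs of mean curvature H and Gaussian curvature K to the
    # eight Besl-Jain categories (1 = peak ... 8 = cupped) of Table 4.1.
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {(-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
             (0, 0): 4, (0, -1): 5,
             (1, -1): 6, (1, 0): 7, (1, 1): 8}
    # (H = 0, K > 0) cannot occur for a smooth surface, since H = 0
    # implies K <= 0; default to the plane category for robustness.
    return table.get((h, k), 4)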
4.1.3 Azimuth-Elevation Angles of Surface Normal Vectors
Given the surface normal vector n = (n_x, n_y, n_z) of a 3D point, the azimuth angle θ of n is defined as the angle between the positive x axis and the projection of n onto the xz plane, while the elevation angle φ of n is defined as the angle between the xz plane and the vector n (Figure 4.2):

θ = arctan(n_z / n_x),    φ = arctan(n_y / √(n_x² + n_z²))

where θ ∈ [−π, π] and φ ∈ [−π/2, π/2]. The azimuth-elevation low-level operator computes the azimuth and elevation values for each point on the 3D surface.
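In code, the two angles follow directly from the components of the normal; a minimal numpy sketch is shown below. atan2 is used (rather than a plain arctangent) so that the azimuth actually covers the stated [−π, π] range.

import numpy as np

def azimuth_elevation(normals):
    # normals: (n, 3) array of surface normal vectors.
    nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
    theta = np.arctan2(nz, nx)              # azimuth in [-pi, pi]
    phi = np.arctan2(ny, np.hypot(nx, nz))  # elevation in [-pi/2, pi/2]
    return theta, phi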
4.2 Mid-level Feature Aggregation
The second phase of the base framework performs mid-level feature aggregation and computes a number of values for a given neighborhood of each point p_i on the surface mesh. In this thesis, local histograms were used to aggregate the low-level feature values of each point. The histograms are computed by taking a neighborhood around each point and accumulating the low-level features in that neighborhood. The size of the neighborhood is determined by multiplying a constant c, 0 < c < 1, with the diagonal of the object's bounding box. This ensures that the size of the neighborhood is scaled according to the object size, and that the results are comparable across different objects. The value of c was determined empirically; for most experiments a value of c = 0.05 was used. Aggregating the single-valued low-level feature values results in a 1D histogram with d histogram bins for every point on the surface mesh. Aggregating the pair-valued low-level feature values (such as the azimuth-elevation angle feature values) results in a 2D histogram constructed of a × b bins, where a and b are the two different dimension sizes. Figure 4.3(a) shows an example of a 1D histogram aggregating the absolute Gaussian curvature low-level feature values from points on the nose of a 3D head object. Figure 4.3(b) shows an example of the 2D histogram aggregating the azimuth-elevation low-level feature values on a head.

Figure 4.2: Azimuth and elevation angle of a 3D surface normal vector.

Figure 4.3: (a) 1D histogram aggregating the absolute Gaussian curvature values from points on the nose of a human head, (b) 2D histogram aggregating the azimuth-elevation vector values at a point on the back of the head.
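The aggregation step can be sketched as below for the single-valued case. The neighborhood query uses a k-d tree; the bin count and value range shown are placeholders, since the actual histogram parameters depend on the low-level feature being aggregated.

import numpy as np
from scipy.spatial import cKDTree

def local_histograms(points, values, c=0.05, bins=16, vrange=(0.0, 1.0)):
    # Neighborhood radius: constant c times the bounding-box diagonal.
    diag = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    tree = cKDTree(points)
    hists = np.zeros((len(points), bins), dtype=int)
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p, c * diag)  # neighbors within the radius
        hists[i], _ = np.histogram(values[idx], bins=bins, range=vrange)
    return hists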
Once the feature extraction and aggregation are completed, a learning phase is used to learn the characteristics of salient points for classification and retrieval, as described in Chapter 5.
Chapter 5
LEARNING SALIENT POINTS
Given the base framework's ability to compute low-level feature values at each point of a 3D mesh and to aggregate these features in neighborhoods about the point, this chapter explores the use of this framework to create a representation for 3D objects. Before constructing the 3D object signature, salient or interesting points are identified on the 3D object, and the characteristics of these points are used when constructing the signatures. The identified salient points are application dependent. The framework and methodology were developed to be specifically applicable to the classification of craniofacial disorders, such as 22q11.2 deletion syndrome, discussed in Section 5.1, and deformational plagiocephaly, described in Section 5.2, but also appropriate for general use in 3D shape classification, as shown in Section 5.3.

Preliminary saliency detection using existing methods [46, 38] was not satisfactory. In some cases the detected points were not consistent and repeatable for objects within the same class. As a result, a learning approach was selected to find salient points on a 3D object. A salient point classifier is trained on a set of marked training points on the 3D objects provided by experts for a particular application. Histograms of low-level features of the training points obtained using the base framework (Chapter 4) are then used to train the classifier. For a particular application, the classifier will learn the characteristics of the salient points on the surfaces of the 3D objects from that domain. Sets of detected points will lead to salient regions in the signatures.
5.1 Learning Salient Points for 22q11.2 Deletion Syndrome
Traditionally, studies of individuals with craniofacial disorders such as 22q11.2 deletion syndrome have been performed through in-person clinical observation coupled with craniofacial anthropometric measurements derived from anatomic landmarks [24]. These landmarks are located either visually by clinicians or through palpation of the skull. Figure 5.1 shows the landmark points that are commonly used for craniofacial measurements.

Figure 5.1: Craniofacial anthropometric landmarks.
The salient point classifier was trained on a subset of the craniofacial anthropometric landmarks marked on 3D head objects. This was done so that these craniofacial landmarks would be included in the set of interesting or salient points for classification of the craniofacial disorders. The particular subset of landmarks was selected to be well-defined points that both experts and non-experts could easily identify. The training set consisted of human heads selected from the Heads database. Figure 5.2 shows an example of manually marked salient points on the training data. Histograms of low-level features obtained using the base framework were used to train a Support Vector Machine (SVM) [72, 86] classifier to learn the salient points on the 3D surface mesh. WEKA's implementation of SVM was used for all experiments [93]. A training set consisting of 75 morphs of 5 human heads was used to train the classifier to learn the characteristics of the salient points for faces in terms of the histograms of their low-level features.
Although the salient training points were selected only to be commonly used craniofacial landmark points, empirical studies determined that the classifier actually finds salient regions with a combination of high curvature and low entropy values. This result can be observed in the different histograms of salient and non-salient points in Figure 5.3. In the figure, the salient point histograms have mainly low bin counts in the bins corresponding to low curvature values and a high bin count in the last (highest) curvature bin. The non-salient point histograms have mainly medium to high bin counts in the low curvature bins and, in some cases, a high bin count in the last bin. The entropy of the salient point histograms also tends to be lower than the entropy of the non-salient point histograms. The classifier approach avoided the use of brittle thresholds.

Figure 5.2: Example of manually marked salient (blue color) and non-salient (red color) points on a human head model. The salient points include corners of the eyes, tip of the nose, corners of the nose, corners of the mouth, and chin.
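The entropy E reported with each histogram in Figure 5.3 can be computed directly from the bin counts, as in the sketch below; the logarithm base (base 2 here) is an assumption, as it is not stated explicitly in the text.

import numpy as np

def histogram_entropy(hist):
    # Shannon entropy of a bin-count histogram.
    p = np.asarray(hist, dtype=float)
    p = p / max(p.sum(), 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())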
Figure 5.4 shows results of the salient points predicted on two faces in the 22q11.2DS database, which include not just the manually marked points but other points with the same characteristics. The salient points are colored according to the assigned classifier confidence score. Non-salient points are colored in red, while salient points are colored in different shades of blue, with dark blue having the highest prediction score.
5.2 Learning Salient Points for Deformational Plagiocephaly
A similar learning-based approach was used to find salient points for 3D heads with deformational plagiocephaly. The salient point classifier for deformational plagiocephaly was trained on a set of points marked on the flat areas at the back of the head of individuals with deformational plagiocephaly. The training salient points consisted of 10 marked points on the flat areas of 10 heads with deformational plagiocephaly, while the non-salient training points were selected from 10 heads without deformational plagiocephaly. Histograms of the azimuth-elevation low-level features obtained using the base framework were used to train a Support Vector Machine (SVM) classifier to learn the salient points on the 3D heads. After training was complete, the classifier was able to label each point on a 3D head as either salient or non-salient and provide a confidence score for each decision. The same threshold, T = 0.95, was applied to the confidence scores for the salient points.

Salient point histograms: E = 0.348, E = 2.435, E = 2.79
Non-salient point histograms: E = 3.95, E = 3.877, E = 4.185

Figure 5.3: Example histograms of salient and non-salient points. The salient point histograms have a high value in the last bin, illustrating high curvature in the region, and low values in the remaining bins. The non-salient point histograms have more varied values in the curvature histogram. In addition, the entropy E of each salient point histogram (listed above) is lower than that of the non-salient point histograms.

Figure 5.4: Salient point prediction for two faces in the 22q11.2DS dataset. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.
5.3 Learning Salient Points for General 3D Objects
The salient point classifier for general 3D object classification was trained on selected objects from the Heads database using the craniofacial landmark points that were used in the 22q11.2DS application. A small training set consisting of 25 morphs of the cat head model, 25 morphs of the dog head model, and 50 morphs of human head models was used to train the classifier to learn the characteristics of salient points for general 3D object classification. Histograms of low-level features obtained using the base framework were used to train a Support Vector Machine (SVM) classifier to learn the salient points on general 3D objects. A threshold T = 0.95 was also applied to the confidence scores for the classifier's salient points. Figure 5.5 shows results of the salient points predicted on instances of the cat, dog and human head classes in the Heads database, which include, as previously mentioned, not just the manually marked points, but other points with the same characteristics. The salient points are colored according to the assigned classifier confidence score. Non-salient points are colored in red, while salient points are colored in different shades of blue, with dark blue having the highest prediction score. While the classifier was only trained on cat heads, dog heads, and human heads, it does a good job of finding salient points on the other classes of heads, and the 3D patterns produced are repeatable across objects of the same class. Figure 5.6 shows the predicted salient points on new object classes that were not included in the training phase.

The trained classifier was also tested on the SHREC 2008 Classification dataset. Experimental results show that the labeled salient points were quite satisfactory. Figure 5.7 shows the salient points predicted on a number of objects from the SHREC 2008 database. Note that on this database, which has a lot of intra-class shape variance, the salient point patterns are not consistent across all members of each class.
After learning and identifying the application-dependent salient points for the 3D objects in the dataset, the signature for each 3D object is constructed as described next in Chapter 6.
(a) (b) (c)
Figure 5.5: Salient point prediction for (a) cat head class, (b) dog head class, and (c) human head class. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.
(a) (b) (c)
Figure 5.6: Salient point prediction for (a) rabbit head class, (b) horse head class, and (c) leopard head class from the Heads database. Even though all three classes were not included in the training, the trained model was able to predict salient points across the classes.
(a) (b) (c) (d)
Figure 5.7: Salient point prediction for (a) human class, (b) bird class, (c) human hand class, and (d) bottle class from the SHREC 2008 database. Note that for classes that have a lot of intra-class shape variance, the salient point patterns are not consistent across all members of those classes, as seen in column (a).
Chapter 6
2D LONGITUDE-LATITUDE SALIENT MAP SIGNATURE
Most 3D object analysis methods require the use of a 3D descriptor or signature to describe the shape and properties of the 3D objects. This chapter describes the construction of the 3D object signature using the salient point patterns, obtained using the learning approach described in Chapter 5, mapped onto a 2D plane via a longitude-latitude transformation, described in Section 6.1. Classification of 3D objects is then performed by training a classifier using the 2D salient maps of the objects. Results of classification using the 2D salient map signature are given in Section 6.2. Retrieval of 3D objects is performed by calculating the distances between the salient map signature of the query object and the salient map signatures of all objects in the database. Results of retrieval using the 2D salient map signature are given in Section 6.3. Section 6.4 investigates how the salient point patterns are used to obtain 2D salient views for 3D object retrieval.
6.1 Salient Point Pattern Projection
Before mapping the salient point patterns obtained in Chapter 5 onto the 2D plane, the salient points are assigned a label according to the classifier confidence score assigned to the point. The classifier confidence score range is then discretized into a number of bins. For the experiments, at confidence level 0.95 and above, the confidence score range was discretized into 5 bins. Each salient point on the 3D mesh is then assigned a label based on the bin into which its confidence score falls.
To obtain the 2D longitude-latitude map signature for an object, the longitude and latitude positions of all the 3D points on the object's surface are calculated. Given any point p_i = (p_ix, p_iy, p_iz), the longitude position θ_i and latitude position φ_i of point p_i are calculated as follows:

θ_i = arctan(p_iz / p_ix),    φ_i = arctan(p_iy / √(p_ix² + p_iz²))
A 2D map of the longitude and latitude positions of all the points on the object's surface is created by discretizing the longitude and latitude values of the points into a fixed number of pixels. A pixel is labeled with the salient point label of the points that fall into that pixel. If more than one label is mapped to a pixel, the label with the highest count is used to label the pixel. Figure 6.1 shows the salient point patterns for the cat head, dog head, and human head models in the Heads database and their corresponding 2D map signatures. Figure 6.2 shows how different objects that belong to the same class will have similar 2D longitude-latitude signature maps.
(a) (b) (c)
Figure 6.1: Salient point patterns on the 3D objects of Figure 5.5 and their corresponding 2D longitude-latitude map signatures.
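A minimal sketch of the map construction is given below; the map resolution (width and height) is an assumed parameter, and majority voting resolves pixels that receive several labels, as described above.

import numpy as np

def longitude_latitude_map(points, labels, width=64, height=32):
    # points: (n, 3) mesh points; labels: (n,) integer salient-bin labels
    # (with 0 reserved for non-salient points).
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    lon = np.arctan2(z, x)               # longitude in [-pi, pi]
    lat = np.arctan2(y, np.hypot(x, z))  # latitude in [-pi/2, pi/2]
    col = np.clip(((lon + np.pi) / (2 * np.pi) * width).astype(int), 0, width - 1)
    row = np.clip(((lat + np.pi / 2) / np.pi * height).astype(int), 0, height - 1)
    votes = np.zeros((height, width, labels.max() + 1), dtype=int)
    np.add.at(votes, (row, col, labels), 1)  # count each label per pixel
    return votes.argmax(axis=2)              # majority label (0 where empty)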
To reduce noise in the 2D longitude-latitude map signature, a wavelet transformation was applied to the 2D map signatures. In the experiments, the 2D longitude-latitude map signatures were treated as 2D images and decomposed using an image-based Haar wavelet function. The wavelet function decomposes the 2D image into approximation and detail coefficients. The approximation and detail coefficients at the second level were collected and concatenated into a new feature vector with dimension d = 13 × 13 × 4. This final feature vector became the descriptor for each object in the database and was used for classification and retrieval. For most experiments, the noise reduction step was not found to improve the classification and retrieval performance, except for the SHREC dataset (Section 6.2.4).

human head, rabbit head, horse head, wildcat head
Figure 6.2: Objects that are similar and belong to the same class will have similar 2D longitude-latitude signature maps.
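Using the PyWavelets library, the two-level Haar decomposition and the concatenation into a feature vector can be sketched as follows. The d = 13 × 13 × 4 dimension quoted above implies second-level coefficient blocks of size 13 × 13, i.e., an input map of roughly 52 × 52 pixels; the sketch leaves the map size to the caller.

import numpy as np
import pywt

def wavelet_signature(salient_map):
    # Two-level Haar decomposition of the 2D map signature.
    coeffs = pywt.wavedec2(salient_map.astype(float), 'haar', level=2)
    cA2, (cH2, cV2, cD2) = coeffs[0], coeffs[1]  # level-2 approx. and details
    return np.concatenate([c.ravel() for c in (cA2, cH2, cV2, cD2)])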
6.2 Classification using 2D Map Signature
By creating a signature for each 3D object, it is now possible to perform classification of 3D objects in a given database. Several classification experiments were performed on each of the acquired datasets described in Chapter 3.
6.2.1 Classification of 22q11.2DS Dataset
The goal of this experiment was to classify each individual in the dataset as either affected or unaffected by 22q11.2DS and to measure the classification accuracy. The salient point classifier was trained on a subset of the craniofacial anthropometric landmarks marked on 3D human head models, as explained in Chapter 5. Table 6.1 shows the classification performance with two different classifiers: Adaboost and SVM. Evaluation was done using the following measures: classification accuracy, precision and recall rates, F-measure, true positive rate, and false positive rate. The classification accuracy for the higher scoring SVM classifier is 86.7%, which is higher than that obtained from a study of three human experts whose mean accuracy was 72.5% [92].
Table 6.1: Classification performance for 22q11.2DS.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.804 0.795 0.804 0.791 0.804 0.387
SVM 0.867 0.866 0.868 0.861 0.868 0.27
The classification accuracy of the map signature was compared to some of the state-of-the-art and best performing 3D object descriptors in the literature. The following existing descriptors were used for comparison: Light Field Descriptor (LFD) [16], ray-based spherical harmonics (SPH) [39], shape distribution of distances between random points (D2) [62], and absolute angle distance histogram (AAD) [59]. The Light Field Descriptor (LFD) is a view-based descriptor that extracts features from 100 2D silhouette image views and measures the distance between two 3D objects by finding the best correspondence between the sets of 2D views of the two objects. The Spherical Harmonics method calculates the maximal extent of a shape across all rays from the origin and uses spherical harmonics to represent the function. The shape function D2 represents 3D objects by calculating the global shape distribution of distances between two random points, while the AAD method enhances the D2 shape function by measuring not only the distance between two random points, but also the mutual orientation of the surfaces on which the pair of points is located.
Table 6.2 provides an overall comparison of the 2D map signature with the four existing shape descriptors. Results in Table 6.3 show that the 2D salient map signature achieves a higher classification accuracy for 22q11.2 deletion syndrome than any of these state-of-the-art methods.
Table 6.2: Overall comparison of the various shape descriptors.
Descriptor: LFD / SPH / D2 / AAD / Salient map
Type: Global view-based / Global spatial map / Global feat. dist. / Global feat. dist. / Global local feat.
Efficiency: Medium / Fast / Fast / Fast / Medium
Pose-normalization: No / No / Yes / Yes / Yes
Discriminative power for large shape diff.: High / High / Medium / Medium / Medium
Discriminative power for subtle shape diff.: Medium / Medium / Low / Low / High
Applications: General 3D / General 3D / General 3D / General 3D / General 3D and Medical 3D
Table 6.3: Comparison of classification accuracy for 22q11.2DS.
Dataset Salient map LFD SPH D2 AAD
F189 0.867 0.741 0.746 0.619 0.73
Classification of 22q11.2DS will lead to better understanding of the connection between the 22q11.2 deletion syndrome genotype and the phenotype of this syndrome. Being able to connect facial features to the genetic code will allow for understanding the etiology of craniofacial malformation and the pathogenesis of 22q11.2DS, which, in turn, will be informative of the genetic control needed for normal craniofacial development.
6.2.2 Classification of Deformational Plagiocephaly Dataset
The goal of this experiment was to classify each individual as either a control or a case affected by the plagiocephaly condition and to measure the classification accuracy. The salient points for the map signature were obtained by using the salient flat point classifier, as explained in Chapter 5. The classification experiments were performed on the Deformational Plagiocephaly Dataset introduced in Chapter 3.

Table 6.4 shows the classification accuracy of the method on the full 254-individual dataset. The groundtruth for the classification was the referring doctor's originally assigned patient status: case or control. Table 6.5 shows the classification accuracy of the method on the trimmed 140-individual dataset on which the experts agreed. The Adaboost classifier obtains an 80.3% classification accuracy on the full dataset and an improved 87.9% accuracy on the trimmed dataset.
Table 6.4: Classification performance for plagiocephaly using the full 254-individual dataset.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.803 0.805 0.803 0.804 0.803 0.208
SVM 0.787 0.787 0.787 0.787 0.787 0.233
Table 6.5: Classification performance for plagiocephaly using the trimmed 140-individual dataset.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.879 0.878 0.879 0.878 0.879 0.156
SVM 0.85 0.849 0.85 0.849 0.85 0.19
The classification accuracy of the methodology for this application was also compared to existing state-of-the-art descriptors. Table 6.6 shows that the 2D salient map signature achieves higher classification accuracy for deformational plagiocephaly than other existing methods, including the LFD descriptor and the others discussed in Chapter 2.
Table 6.6: Comparison of classification accuracy for plagiocephaly.
Dataset Salient 2D map LFD SPH D2 AAD
Full 254 dataset 0.803 0.72 0.673 0.650 0.685
Trimmed 140 dataset 0.879 0.714 0.743 0.779 0.721
Classification of this condition can be incorporated into epidemiologic research on the prevalence and long-term outcome of deformational plagiocephaly, which may eventually lead to improved clinical care for infants with deformational plagiocephaly.
6.2.3 Classification of Heads Dataset
The Heads database can be thought of as a first step toward testing the 2D salient map signature on more general shapes still in the craniofacial category, but for multiple different animals whose face shapes can be quite different.

In the first set of experiments, all objects in the Heads database were pose-normalized by rotating the heads to face the same orientation, as was the case for the medical craniofacial datasets. Classification of the 3D objects in the database was performed by training an SVM classifier on the salient point patterns of each class, using the 2D longitude-latitude map signatures of the objects in the class. The classifier was trained using the signatures of 25 objects from each of the seven classes in the database and tested with a separate test set consisting of 50 objects per class. The classifier achieved 100% classification accuracy in classifying all the pose-normalized objects in the database.

Since 3D objects may be encountered in the world at any orientation, rotation-invariant classification is desirable. The second set of experiments explored rotation invariance. To achieve rotation invariance for classification, the classifier was trained with a number of rotated versions of the 2D longitude-latitude map signature for each training object. The first experiment in this set tested the classification accuracy by training a classifier with rotated versions of the training data signatures in 45 degree increments about all three axes. This resulted in 8 × 8 × 8 = 512 rotated signatures for each object in the database. The classifier
was then tested on new objects in the same classes. Rotated versions of the testing data signatures were generated using the same rotation degree increments as in training. The classifier again achieved 100% classification accuracy when classifying objects that were rotated in this way.

In the second experiment in this set, the classification method was tested using 15 new testing instances per class that were rotated randomly. For example, a rotation of (250, 4, 187) was one of the random rotations that did not match any of the training rotations. The classifier was still able to achieve 100% classification accuracy.
The third set of experiments explored the degradation in the classification accuracy when varying the training rotation angle increment used to generate the signatures for the training data. Figure 6.3 shows the degradation in the classification accuracy as the training angle increment increases and the number of rotated training signature instances decreases. The graph shows that the classification accuracy steadily decreases as the number of rotated training signatures decreases. In addition, there is a big dip in the classification performance when the training signatures are generated at 90 degree angle increments. This is because the signatures produced at 90 degree increments are not representative of angles in between the multiples of 90 degrees. Note that the classifier is still able to achieve 91% classification accuracy with training signatures generated at 100 degree increments with only 3 × 3 × 3 = 27 rotated training signatures per training object, which is much better than the 8 × 8 × 8 = 512 signatures that were originally used.
6.2.4 Classification of SHREC Dataset
The SHREC dataset was used to challenge the 2D salient map signature on data unlike those it was designed for, and to compare it to other methodologies that were designed for more general object models and for many different classes. For this dataset, rotational invariance was a requirement. To achieve this, two different pose-normalization methods were tested. The first method, 4ContPCA, is an extension to the commonly used Principal Component Analysis (PCA) method that aligns 3D models to a canonical coordinate system. 4ContPCA extends the PCA method by taking the mesh resolution and the sizes of the triangles into consideration when aligning the models [88]. The second approach to achieve rotation-invariant classification, IncRot, was to rotate each 3D object at 100 degree increments about all three axes and generate the 2D longitude-latitude map signature of the object at each rotated pose, as was done for the Heads dataset. This resulted in a total of 3 × 3 × 3 = 27 map signatures for each object in the database. In both the 4ContPCA approach and the IncRot approach, the distance between two objects in the database was the minimum distance between the various rotated map signatures of the two objects. Since the IncRot method had better classification performance than 4ContPCA, only classification results using the IncRot pose-normalization method are reported. The map signatures were further transformed using the wavelet computation. The wavelet coefficient feature vectors were used for classification.

Figure 6.3: Classification accuracy vs. training rotation angle increment.
As with the medical datasets, results are compared against the Light Field Descriptor (LFD) [16], the ray-based spherical harmonics (SPH) [39], the shape distribution of distances between random points (D2) [62], and the absolute angle distance histogram (AAD) [59]. Since the number of objects in each class in the dataset varied greatly, creating an unbalanced dataset, machine learning algorithms such as SVM and Adaboost could not be used to classify the objects in the dataset. As a result, pairwise distance matrices between all objects in the dataset were computed. Classification performance is measured using four different commonly used statistics: (1) nearest-neighbor classification accuracy, (2) first-tier classification accuracy, (3) second-tier classification accuracy, and (4) F-measure. The first three statistics indicate the percentage of the top K nearest neighbors of a given object that belong to the same class as that object. The nearest-neighbor statistic indicates how well a nearest-neighbor classifier performs, with K = 1. The first-tier and second-tier statistics indicate the percentage of the top K matches that belong to the same class as a given object, where K = C − 1 and K = 2(C − 1), respectively, and C is the class size of the classified object. The F-measure is a composite measure of precision (P) and recall (R), where F = 2PR/(P + R). Table 6.7 shows the comparison results. For this dataset, the LFD method, which was developed to differentiate between very different shape classes rather than subtle distinctions in the shape of a common object, was the best performer.
Table 6.7: Comparison of classification accuracy for the SHREC 2008 dataset.
Method NN 1st tier 2nd tier F-Measure
AbsGaussCurv 0.569 0.285 0.375 0.246
BeslJain 0.516 0.278 0.379 0.244
LFD 0.759 0.437 0.549 0.365
SPH 0.715 0.365 0.483 0.321
D2 0.502 0.278 0.382 0.238
AAD 0.549 0.266 0.388 0.252
6.2.5 Classification Timing Studies
Timing experiments were performed to investigate the runtime performance of the 2D salient map signature. In this experiment, the runtime speed of the 2D salient map signature is compared to that of the existing Light Field Descriptor (LFD) method on all four datasets. These experiments were performed on a PC running Windows XP. The runtime of the Light Field Descriptor can be divided into two main phases: feature extraction, and feature comparison and classification. The runtime of the salient map signature methodology can be divided into five main phases: (1) low-level feature extraction, (2) mid-level feature aggregation, (3) salient point prediction, (4) signature generation, and (5) classification. Table 6.8 compares the runtime for each of the phases of the Light Field Descriptor to the 2D salient map signature. The bottleneck of the 2D salient map signature is the salient point prediction phase, where the classifier labels each point on the mesh as either salient or non-salient. Depending on the learned salient point model and the number of points on the objects in the dataset, this phase may take a long time; however, accuracy of results, not speed, is most important in the medical applications.
Table 6.8: Comparison of the timing of each phase of the Light Field Descriptor and the 2D salient map signature on the four datasets. m refers to minutes. The number of objects in each dataset is listed in brackets.

Method          Phase                        22q11.2DS (189)  Plagiocephaly (254)  Heads (105)  SHREC (425)
LFD             Feat. extraction             23.6m            47.1m                21.4m        63.3m
LFD             Feat. compare & classify     14m              17m                  13m          140m
2D salient map  Low-level feat. extraction   1.5m             3m                   1m           4m
2D salient map  Mid-level feat. aggregation  7m               7m                   3m           32m
2D salient map  Salient point prediction     51m              19m                  106m         53m
2D salient map  Signature generation         2m               2m                   2m           11m
2D salient map  Signature classification     2m               1m                   1m           2m
6.3 Retrieval using 2D Map Signature
By creating a signature for each 3D object, it is now possible to perform similarity-based retrieval using any of the objects in a dataset as a query object. Retrieval of 3D objects in the dataset is performed by calculating the distance between the 2D longitude-latitude salient map signature of a query object and the 2D longitude-latitude salient map signatures of all the objects in the dataset. The 2D salient maps are treated as vectors, and Euclidean distance is used as the distance measure, since preliminary experimental results using histogram intersection and chi-square distance as the distance measure did not show any improvement in the retrieval performance.
The retrieval performance was measured using the average normalized rank of relevant images [56]. The evaluation score for a query object q was calculated as follows:

score(q) = 1/(N · N_rel) · ( Σ_{i=1}^{N_rel} R_i − N_rel(N_rel + 1)/2 )

where N is the number of objects in the database, N_rel is the number of database objects that are relevant to the query object q (all objects in the database that have the same class label as the query object), and R_i is the rank assigned to the i-th relevant object. The evaluation score ranges from 0 to 1, where 0 is the best score, as it indicates that all relevant database objects are retrieved before all other objects in the database. A score greater than 0 indicates that some non-relevant objects are retrieved before all relevant objects.
The retrieval performance was measured over all the objects in a given dataset, using each in turn as a query object. The average retrieval score for each class was calculated by averaging the retrieval scores of all objects in the same class. A final retrieval score was calculated by averaging the retrieval scores across all classes. Experimental tests were performed on two of the datasets: the Heads dataset and the SHREC dataset.
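The scoring function is straightforward to implement; a sketch is given below. For a perfect retrieval, the relevant objects occupy ranks 1 through N_rel and the score is exactly 0.

import numpy as np

def normalized_rank_score(ranks, n_total):
    # ranks: 1-based ranks of the relevant objects in the retrieved list.
    # n_total: number of objects N in the database.
    ranks = np.asarray(ranks, dtype=float)
    n_rel = len(ranks)
    return (ranks.sum() - n_rel * (n_rel + 1) / 2.0) / (n_total * n_rel)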
6.3.1 Retrieval on Heads dataset
For the first retrieval experiment, when calculating the evaluation score for a given query object, the relevant objects were defined to be those objects that are morphed versions of the query object. This resulted in 15 relevant labels for the database. The algorithm was able to obtain a score of 0 (best) for almost all the queries, except for one of the horse head queries, which obtained a score of 0.00136054. For this horse head query, a morphed version of a different horse head was returned at rank 6, before the last morphed version of the query horse. Since the two horses had different labels, the retrieval score was not zero.

The second retrieval experiment had a similar setup to the first experiment. The difference was in categorizing which objects were to be considered relevant to the query object. For this experiment, the relevant objects were all objects in the same general class: human, cat, dog, horse, rabbit, wildcat, and bear heads. This resulted in a total of 7 relevant labels instead of 15. All the scores are extremely low, with the exception of the wildcat head. Table 6.9 shows the mean and standard deviation of the evaluation scores for each class, using each object in the database as a query.
class mean stddev
cat head 0 0
dog head 0 0
human head 0 0
rabbit head 0 0
horse head 0.064 0.069
wildcat head 0.263 0.046
bear head 0 0
Table 6.9: Pose-normalized retrieval experiment 2: the mean and standard deviation of the
evaluation scores for all 7 head classes. The objects in the experiments were pose normalized,
and the relevant objects were all objects that belonged to the same general class.
6.3.2 Retrieval on SHREC dataset
A number of experiments were performed on the SHREC database. The first experiment was designed to investigate how pose-normalization of the 3D objects affects the construction of the 2D map signature and the retrieval performance of the map signatures. In this experiment, three different pose-normalization methods were compared: 4ContPCA, IncRot, and Manual. The first method, 4ContPCA, is an extension to the commonly used Principal Component Analysis (PCA) method that aligns 3D models to a canonical coordinate system. 4ContPCA extends the PCA method by taking the mesh resolution and the sizes of the triangles into consideration when aligning the models [88]. Since each of the principal axes has two possible directions (positive and negative), there are 8 possible orientation configurations, but only 4 are valid based on the definition of a proper coordinate system. The 2D longitude-latitude salient map signature of an object is generated at all four configurations. The retrieval score is calculated using each of the 4 map signatures as a query. The final map signature of an object is the map signature that obtained the lowest retrieval score. In the second approach to achieve rotation-invariant retrieval, IncRot, each 3D object is rotated at 100 degree increments about all three axes and the 2D longitude-latitude map signature of the object is generated at each rotated pose. This resulted in a total of 3 × 3 × 3 = 27 map signatures for each object in the database. Similar to the 4ContPCA approach, the final map signature of an object is the map signature with the lowest retrieval score out of all the rotated map signatures. Lastly, as a baseline against which to compare the retrieval performance of the two previous pose-normalization methods, for the Manual method all objects in the SHREC database were manually pose-normalized so that objects in the same class have the same orientation.
The retrieval score using each object in the database as a query object was calculated, and the average retrieval score over all classes in the dataset was computed. Table 6.10 compares the average retrieval scores using the three different pose-normalization methods. For this experiment, absolute Gaussian curvature is used as the low-level feature in the base framework. Looking at the retrieval performance scores, it can be observed that the IncRot pose-normalization method has better retrieval performance than the 4ContPCA method. Surprisingly, the IncRot pose-normalization method resulted in better retrieval results than those of objects that were manually pose-normalized (Manual). This is because some classes, such as statues and knots, contained objects of very different shapes or very abstract shapes that made it hard to determine a canonical pose for the objects in these classes. The incremental rotation method is therefore used to pose-normalize the objects in the dataset for all subsequent experiments.
Table 6.10: Average retrieval score comparing three pose-normalization methods.
4ContPCA IncRot Manual
0.281 0.249 0.271
The second experiment compared the retrieval performance of the 2D longitude-latitude map signature when constructed using two different low-level features: absolute Gaussian curvature (AbsGaussCurv) and Besl-Jain curvature (BeslJain). The retrieval performances of AbsGaussCurv and BeslJain were compared to variants of the signature in which the interesting points predicted using the low-level feature and the full projections of all the points of the 3D objects (AllPts) are used together and concatenated into one feature vector. The additional projections work like multiple full views of the object and are useful for this database, where the objects can be very dissimilar. Table 6.11 gives the average retrieval score when using the different low-level features and their variants to generate a final map signature. The results show that concatenating the full projection information to the original map signature (AllPts) improves the performance of the original map signature in both cases, and that the Besl-Jain operator outperformed the absolute Gaussian curvature.
Table 6.11: Average retrieval score of map signatures generated with different low-level features and their variants.
AbsGaussCurv  BeslJain  AbsGaussCurv+AllPts  BeslJain+AllPts
0.249         0.189     0.203                0.183
Comparing Table 6.11 and Table 6.12, it can be observed that transforming the 2D longitude-latitude map signatures using Haar wavelets improved the average retrieval score of each of the map signatures generated using the different low-level features.
Table 6.12: Average retrieval score of map signatures transformed using image wavelet analysis.
AbsGaussCurv wavelet  BeslJain wavelet  AbsGaussCurv+AllPts wavelet  BeslJain+AllPts wavelet
0.226                 0.162             0.148                        0.1438
Table 6.13: Comparing the best salient map signature results against existing methods.
Salient map LFD SPH D2 AAD
0.144 0.097 0.120 0.361 0.349
The best results were compared against some of the state-of-the-art and best performing methods in the SHREC competition: Light Field Descriptor (LFD) [16], ray-based spherical harmonics (SPH) [39], shape distribution of distances between random points (D2) [62], and absolute angle distance histogram (AAD) [59]. The results in Table 6.13 show that the 2D salient map signature performs better than the global feature methods, D2 and AAD, and similar to, though slightly lower than, the feature-based spherical harmonics descriptor (SPH) and the view-based Light Field Descriptor (LFD). LFD is the best performer on the SHREC database; however, Table 6.14 shows that it is not the best performing descriptor for all classes. The 2D salient map signature has retrieval scores comparable to LFD and SPH in many of the classes, and it performs better in classes such as airplanes, pipes, spirals, articulated scissors, glasses, vases, tables, articulated eyeglasses, and birds. It is noted that the 2D salient map signature is more efficient than LFD, which renders 100 projections of each 3D object and computes 2D shape descriptors for each separate rendering.
Figures 6.4 and 6.5 show examples of retrieval results for two different queries from the SHREC 2008 dataset on which the 2D salient map signature outperforms LFD. In each figure, the query is shown as the first result in a bold bounding box, and the next few retrieval results are shown in order of decreasing similarity. The corresponding 2D salient map of each retrieved object is shown under the object.
Table 6.14: Comparing the retrieval score for each class in the SHREC database using different descriptors: salient map signature, LFD, SPH, D2, and AAD. The best (lowest) score for each class is in boldface.
no class # objects Salient map LFD SPH D2 AAD
1 human-diff-pose 15 0.3105 0.0872 0.1095 0.3677 0.4154
2 monster 11 0.1687 0.1693 0.1536 0.4403 0.4344
3 dinosaur 6 0.2585 0.169 0.1494 0.4561 0.259
4 4-legged-animal 25 0.2274 0.1858 0.1782 0.4267 0.4413
5 hourglass 2 0.0029 0.0006 0.0006 0.43 0.2453
6 chess-pieces 7 0.2084 0.0854 0.0934 0.4260 0.3655
7 statues-1 19 0.2865 0.2497 0.2775 0.4694 0.4454
8 statues-2 1 0.0 0.0 0.0 0.1129 0.0141
9 bed-post 2 0.0182 0.0082 0.0435 0.1365 0.4529
10 statues-3 1 0.0 0.0 0.0 0.1388 0.0259
11 knot 13 0.0372 0.0026 0.0348 0.3933 0.4694
12 torus 18 0.4275 0.1607 0.3138 0.469 0.4479
13 airplane 19 0.0072 0.0539 0.0347 0.488 0.4495
14 heli 5 0.2006 0.1576 0.0516 0.4115 0.4114
15 missile 9 0.3137 0.2406 0.1747 0.4946 0.5004
16 spaceship 1 0.0 0.0 0.0 0.3435 0.0
17 square-pipe 12 0.0017 0.0169 0.0348 0.4941 0.3178
18 rounded-pipe 15 0.1959 0.1841 0.3113 0.4972 0.3946
19 spiral 13 0.3339 0.3719 0.442 0.4397 0.4667
20 articulated scissors 16 0.0004 0.0049 0.087 0.4443 0.4245
21 CAD-1 1 0.0 0.0 0.0 0.0565 0.0212
22 CAD-2 1 0.0 0.0 0.0 0.0212 0.0565
23 CAD-3 1 0.0 0.0 0.0 0.0706 0.0518
24 CAD-4 1 0.0 0.0 0.0 0.1576 0.0353
25 CAD-5 1 0.0 0.0 0.0 0.0471 0.16
26 glass 7 0.1901 0.2452 0.4263 0.4544 0.4609
27 bottle 17 0.2647 0.0811 0.2187 0.4858 0.4003
28 teapot 4 0.2746 0.0154 0.0341 0.3571 0.48
29 mug 17 0.1122 0.0044 0.0163 0.432 0.4405
30 vase 14 0.1214 0.1492 0.2455 0.4269 0.4972
31 table 4 0.349 0.1525 0.1468 0.3478 0.5235
32 chairs 28 0.2124 0.123 0.1295 0.4766 0.4522
33 tables 16 0.1774 0.183 0.3138 0.494 0.4438
34 articulated-hands 18 0.2325 0.1462 0.14 0.4684 0.4926
35 articulated-eyeglasses 13 0.0184 0.1559 0.1431 0.3492 0.4506
36 starfish 19 0.2947 0.1017 0.1105 0.488 0.4727
37 dolphin 23 0.1718 0.0534 0.0703 0.4613 0.3929
38 bird 17 0.1218 0.2106 0.2094 0.4233 0.4564
39 butterfly 2 0.0682 0.0094 0.0024 0.1835 0.3782
2D salient map retrieval results (distances): 0, 3440.14, 4818.08, 5039.07, 5112.73, 5307.74, 5313.80, 5520.91
LFD retrieval results (distances): 0, 6145, 8072, 8673, 8741, 8972, 9045, 9435

Figure 6.4: Comparison of the salient map signature retrieval results and the results retrieved by the LFD descriptor. The query object is the first object (marked by a bounding box), followed by the retrieval results sorted in increasing order of distance. The 2D salient map of each retrieved object is shown under the object.
Figure 6.4 shows that, given an eyeglasses query object, the first 8 returned objects belong to the same class as the query when the retrieval is performed using the 2D salient map signature. However, the retrieval results using LFD include some false positives, such as fish and a bottle object. This is because the 2D salient maps of objects in the same class (eyeglasses vs. eyeglasses) are similar to each other, while the 2D salient maps of different objects (eyeglasses vs. airplane, eyeglasses vs. fish, eyeglasses vs. bottle) are different. In addition, the 2D salient map captures more detail on the objects compared to the silhouette images that LFD uses. A similar scenario can be seen in Figure 6.5 when the query object is an airplane. The first 15 returned objects using the 2D salient map signature as a shape descriptor belong to the same class as the query object, but the retrieval results using the LFD descriptor include some false positives, such as bird objects, which look the same in silhouette but whose interest points are different.
2D salient map retrieval results (distances): 0, 5469.14, 5633.13, 5873.31, 5943.47, 6049.2, 6120.0, 6126.64, 6218.81, 6242.29, 6247.5, 6252.7, 6283.82, 6429.59, 6454.81, 6529.94
LFD retrieval results (distances): 0, 6809, 7510, 8158, 8282, 8374, 8476, 8488, 8489, 8490, 8674, 8835, 8846, 8868, 8981, 9014

Figure 6.5: Comparison of the salient map signature retrieval results and the results retrieved by the LFD descriptor. The query object is the first object (marked by a bounding box), followed by the retrieval results sorted in increasing order of distance.

One of the strengths of the 2D longitude-latitude salient map methodology and signature is the flexibility of the base framework, which allows the extraction of any low-level feature from the 3D object. This produces a methodology that is general enough to be applied to any application. Experimental results show that the same base framework and methodology can be applied to both classification and retrieval tasks, and it has been tested on both general objects and medical craniofacial data.
6.4 Retrieval using Salient Views
This section presents a method for selecting salient 2D views to describe 3D objects for the purpose of retrieval. The views are selected using the salient point patterns obtained via the learning approach discussed in Chapter 5, by choosing views with multiple salient points on the silhouette of the object. Silhouette-based similarity measures from [16] are then used to calculate the similarity between two 3D objects.
6.4.1 Clustering Salient Points
The salient points identified by the learning approach (discussed in Chapter 5) are quite dense and form regions. A clustering algorithm was applied to reduce the number of salient points and to produce a sparser placement of the salient points. The algorithm selects high-confidence salient points that are also sufficiently distant from each other, following a greedy approach. Salient points are sorted in decreasing order of classifier confidence score. Starting with the salient point with the highest classifier confidence score, the clustering algorithm calculates the distance from each salient point to all existing clusters and accepts the point if the distance is greater than a neighborhood radius threshold. For the experiments, the radius threshold was set at 5. Figure 6.6 shows the selected salient points on the cat, dog, and human head objects from Figure 5.5. It can be seen that objects from the same class (head classes in the figure) are marked with salient points in similar locations, thus illustrating the repeatability of the salient point learning and clustering method.
Figure 6.6: Salient points resulting from clustering.
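The greedy selection just described is compact enough to state directly; the sketch below assumes Euclidean distances between mesh points and uses the radius threshold of 5 from the experiments.

import numpy as np

def cluster_salient_points(points, scores, radius=5.0):
    # points: (n, 3) positions; scores: (n,) classifier confidence scores.
    order = np.argsort(-scores)  # visit points by decreasing confidence
    kept = []
    for i in order:
        if all(np.linalg.norm(points[i] - points[j]) > radius for j in kept):
            kept.append(i)
    return np.array(kept, dtype=int)  # indices of the cluster representatives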
6.4.2 Selecting Salient Views
The methodology presented in this section is intended to improve on the Light Field Descriptor [16], using its concept of similarity. Chen et al. [16] argue that if two 3D models are similar, the models will also look similar from most viewing angles. Their method extracts light fields rendered from cameras on a sphere. A light field of a 3D model is represented by a collection of 2D images. The cameras of the light fields are distributed uniformly, positioned on the vertices of a regular dodecahedron. The similarity between two 3D models is then measured by summing up the similarity from all corresponding images generated from a set of light fields.

To improve efficiency, the light field cameras are positioned at the 20 uniformly distributed vertices of a regular dodecahedron. Silhouette images at the different views are produced by turning off the lights in the rendered views. Ten different light fields are extracted for a 3D model. Since the silhouettes projected from two opposite vertices of the dodecahedron are identical, each light field generates ten different 2D silhouette images. The similarity between two 3D models is calculated by summing up the similarity from all corresponding silhouettes. To find the best correspondence between two silhouette images, the camera position is rotated, resulting in 60 different rotations for each camera system. In total, the similarity between two 3D models is calculated by comparing 10 × 10 × 60 different silhouette image rotations between the two models. Each silhouette image is efficiently represented by extracting the Zernike moments and the Fourier coefficients from the image. The Zernike moments describe the region shape, while the Fourier coefficients describe the contour shape of the object in the image. There are 35 coefficients for the Zernike moment descriptor and 10 coefficients for the Fourier descriptor.
Like the Light Field Descriptor, the proposed method uses rendered 2D silhouette images as views to build the descriptor that describes the 3D object. However, unlike LFD, which extracts features from 100 2D views, the method selects only salient views. The hypothesis is that the salient views are the views that are discernible and most useful in describing the 3D object. Since the 2D views used to describe the 3D objects are silhouette images, some of the salient points present on the 3D object must appear on the contour of the 3D object (Figure 6.7).
(a) (b)
Figure 6.7: (a) Salient points must appear on the contour of the 3D object for a 2D view to be considered a salient view. The contour salient points are colored in green, while the non-contour salient points are in red. (b) Silhouette image of the salient view in (a).
A salient point p = (p_x, p_y, p_z) is defined as a contour salient point if its surface normal vector v = (v_x, v_y, v_z) is perpendicular to the camera view point c = (c_x, c_y, c_z). The perpendicularity is determined by calculating the dot product of the surface normal vector v and the camera view point c. A salient point p is labeled as a contour salient point if |v · c| ≤ T, where T is the perpendicularity threshold. For the experiments, the value T = 0.10 was used. This value ensures that the angle between the surface normal vector and the camera view point is between 84° and 90°.
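Ranking candidate views by their number of contour salient points can then be sketched as follows. The sketch applies only the perpendicularity test; the visibility check implied by the method (a contour point must lie on the visible side of the object) is omitted for brevity.

import numpy as np

def rank_salient_views(view_dirs, salient_normals, T=0.10):
    # view_dirs: (k, 3) unit camera view directions.
    # salient_normals: (m, 3) unit normals at the salient points.
    dots = np.abs(salient_normals @ view_dirs.T)  # (m, k) values of |v . c|
    counts = (dots <= T).sum(axis=0)              # contour salient points per view
    return np.argsort(-counts)                    # view indices, best first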
For each possible camera view point (100 view points in total), the algorithm accumulates the number of contour salient points that are visible from that view point. The 100 view points are then sorted based on the number of contour salient points visible in the view. The algorithm selects the final top K salient views used to construct the descriptor for a 3D model. Empirical experiments were performed to test different values of K and investigate the respective retrieval accuracy.
A more restrictive variant of the algorithm selects the top K distinct salient views. In this variant, after sorting the 100 views based on the number of contour salient points visible in the view, the algorithm uses a greedy approach to select only the distinct views. The algorithm starts by selecting the first salient view, which has the largest number of visible contour salient points. It then iteratively checks whether the next top salient view is too similar to the already selected views. The similarity is measured by calculating the dot product between the two views, and views whose dot product with an existing distinct view is greater than a threshold P are discarded. In the experiments, the value P = 0.98 was used. Figure 6.8(a) shows the top 5 salient views, while Figure 6.8(b) shows the top 5 distinct salient views for a human object. It can be seen in the figure that the top 5 distinct salient views more completely capture the shape characteristics of the object. Figure 6.9 shows the top 5 distinct salient views for different classes in the SHREC database.
Figure 6.8: (a) Top 5 salient views for a human query object. (b) Top 5 distinct salient views for the same human query object. The distinct salient views capture more information regarding the object's shape.
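The distinctness filter is a second greedy pass over the ranked views; a sketch (continuing the naming of the previous sketch) is shown below.

import numpy as np

def select_distinct_views(ranked_views, view_dirs, k=5, P=0.98):
    # ranked_views: view indices sorted by decreasing contour-point count,
    # e.g., the output of rank_salient_views above.
    chosen = []
    for v in ranked_views:
        if all(np.dot(view_dirs[v], view_dirs[c]) <= P for c in chosen):
            chosen.append(v)
            if len(chosen) == k:
                break
    return chosen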
6.4.3 Experimental Results
Experimental tests were performed on the SHREC dataset. The retrieval performance was
calculated using the average normalized rank of relevant results [56] discussed in Section 6.3.
Figure 6.9: Top 5 distinct salient views of animal class (top row), bird class (middle row),
and chair class (bottom row) from the SHREC database.
A number of experiments were performed to evaluate the performance of the proposed descriptor and its variants. The first experiment explored the retrieval accuracy of the proposed descriptor, showing the effect of varying the number of top salient views used to construct the descriptors for the 3D objects in the dataset. As shown in Figure 6.10, the retrieval performance improved as the number of salient views used to construct the descriptor increased. Using the top 100 salient views is equivalent to the existing LFD method. For the absolute Gaussian curvature feature (blue line graph), LFD with 100 views had the best retrieval score at 0.097; however, reducing the number of views by half to the top 50 salient views only increased the retrieval score to 0.114. For the Besl-Jain curvature feature (pink line graph), the trend was similar, with a smaller decrease in performance as the number of views was reduced.
In the second experiment, the algorithm selected the top distinct salient views. Table 6.15 shows the average retrieval scores across all classes in the dataset as the number of views and the number of distinct views were varied. Comparing the results, it can be seen that the retrieval scores for the top K distinct views were always lower (better) than those for the top K views. For example, using the top 5 distinct salient views achieved an average retrieval score of 0.138, compared to a score of 0.157 using the top 5 salient views. In fact, using the top 5 distinct salient views achieved a retrieval score similar to using the top 20 salient views, and using the top 10 distinct salient views produced a retrieval score similar to using the top 50 salient views. Each object in the dataset has its own number of distinct salient views. The average number of distinct salient views over all the objects in the dataset was 12.38 views. Executing the retrieval with the maximum number of distinct salient views for each query object achieved an average retrieval score similar to that of the retrieval performed using the top 70 salient views.
The third experiment compared the retrieval score when using the maximum number of
distinct salient views to the retrieval score of the existing LFD method. Table 6.16 shows
the average retrieval score for each class using the maximum number of distinct salient
views and the LFD method. Over the entire database, the average retrieval score for the
maximum number of distinct salient views was 0.121 while the average score for LFD was
0.098. To better understand the retrieval scores, a few retrieval scenarios are presented.
Suppose that the number of relevant objects to a given query is N
rel
and that the total
number of objects in the database is N = 30; the retrieval score is dependent on the rank
of the N
rel
relevant objects in the retrieved list. The same retrieval score can be achieved
in two dierent scenarios. When N
rel
= 10 a retrieval score of 0.2 is attained when 3
of the relevant objects are at the end of the retrieved list, while the same score value is
obtained in the case of N
rel
= 5 when only 1 of the relevant objects is at the end of the
list. This shows that the wrong retrieval for classes with small N
rel
value are more heavily
penalized, since there are fewer relevant objects to retrieve. In Table 6.16 it can be seen
that for classes with small N
rel
values (N
rel
< 10), the average class retrieval scores using
the maximum number of distinct views are small and similar to retrieval using LFD (scores
< 0.2), indicating that the relevant objects are retrieved at the beginning of the list. For
classes with bigger N
rel
values, the retrieval scores for most classes are < 0.3 indicating that
in most cases the relevant objects are retrieved before the middle of the list. The worst
performing class for both methods is the spiral class with a score of 0.338 using maximum
distinct salient views and 0.372 using LFD; this most probably is due to the high shape
59
variability in the class. The retrieval score using the salient views method is quite similar
to the retrieval score of LFD with only small dierences in the score values suggesting that
the retrievals slightly dier in the ranks of the retrieved relevant objects, with most relevant
objects retrieved before the middle of the list. However, the salient views method greatly
reduces the computation time for descriptor computation.
Figure 6.10: Average retrieval scores across all classes in the database as the number of top salient views used to construct the descriptor is varied. Learning of the salient points used two different low-level features: absolute Gaussian curvature and Besl-Jain curvature.
The last experiment investigated the runtime performance of the salient views methodology and compared its runtime speed to that of the existing LFD method. These experiments were performed on a PC running Windows Server 2008 with dual Intel Xeon processors at 2 GHz and 16 GB RAM. The runtime of the salient views method can be divided into three parts: (1) salient view selection, (2) feature extraction, and (3) feature matching. The salient view selection phase selects the views in which contour salient points are present; on average, this phase takes about 0.2 s per object. The feature matching phase compares two 3D objects and calculates the distance between them; on average, it takes about 0.1 s per object. The feature extraction phase is the bottleneck of the complete process. The phase begins with a setup step that reads and normalizes the 3D objects. Then, the 2D silhouette views are rendered and the descriptor is constructed from the rendered views. Table 6.17 shows the difference in the feature extraction runtime for one 3D object between the salient views method and the existing LFD method. The results show that feature extraction using the selected salient views provides a 15-fold speedup compared to using all 100 views for the LFD method.
6.5 Summary
This chapter described the construction of the 2D longitude-latitude salient map signature and discussed experimental results of using the salient map signature for classification and retrieval on various 3D datasets. The salient map signature is constructed by projecting the salient point patterns, obtained using the learning approach described previously in Chapter 5, onto a 2D plane via a longitude-latitude transformation.
The constructed salient map signature was used to perform classification and retrieval tasks on four datasets. Classification experiments on the 22q11.2DS dataset showed that the salient map signature achieved higher classification accuracy than both the experts and four existing state-of-the-art 3D object descriptors. Results on the Deformational Plagiocephaly dataset also showed that the salient map signature performed better than the existing 3D object descriptors. Experiments were also performed to explore the degradation in classifying the Heads dataset when using rotated versions of the salient map signature. Results showed that the classification accuracy steadily decreased as the number of rotated training map signatures decreased. Lastly, comparing the classification accuracy of the signature to four existing 3D object descriptors showed that the Light Field Descriptor method was the best performer when classifying the SHREC dataset, which contains very different shape classes rather than subtly distinct shapes.
The salient map signature was also used to perform shape similarity-based retrieval experiments. Retrieval experiments were performed by constructing the salient map signature using two different low-level features: absolute Gaussian curvature and Besl-Jain curvature. The results showed that using the Besl-Jain operator to construct the signature achieved a better retrieval score than using the absolute Gaussian curvature. Comparing the best retrieval results of the salient map signature to existing 3D object descriptors showed that the 2D salient map signature achieved better retrieval results than the global feature methods, and performed similarly to feature-based descriptors such as spherical harmonics and LFD.
Finally, the salient point patterns obtained using the learning approach were also used to select salient 2D views to describe 3D objects. Experimental tests evaluated the salient 2D views for shape-based retrieval. Using the maximum number of distinct salient views achieved a slightly worse average retrieval score than the existing LFD descriptor; however, the salient views method greatly reduces the descriptor computation time, achieving a 15-fold speedup over the LFD method, which requires 100 views.
Table 6.15: Average retrieval scores across all classes as the number of top salient views and
top distinct salient views are varied. Absolute Gaussian curvature was used as the low-level
feature in the base framework. The average maximum number of distinct salient views is
12.38, hence there is no score available for K > 13 when using the top K distinct views.
K    Score (top K)    Score (top K distinct)
1 0.207 0.207
2 0.186 0.174
3 0.172 0.163
4 0.162 0.151
5 0.157 0.138
6 0.155 0.134
7 0.152 0.131
8 0.152 0.129
9 0.146 0.127
10 0.143 0.128
11 0.137 0.127
12 0.134 0.121
20 0.126 -
30 0.121 -
40 0.119 -
50 0.114 -
60 0.121 -
70 0.124 -
80 0.110 -
90 0.105 -
100 0.098 -
Table 6.16: Retrieval score for each class using the maximum number of distinct views
versus using all 100 views (LFD).
No   Class   # Objects   Avg # distinct salient views   Max distinct salient views score   LFD score
1 human-di-pose 15 12.33 0.113 0.087
2 monster 11 12.14 0.196 0.169
3 dinosaur 6 12.33 0.185 0.169
4 4-legged-animal 25 12.24 0.274 0.186
5 hourglass 2 11.5 0.005 0.001
6 chess-pieces 7 12.14 0.085 0.085
7 statues-1 19 12.16 0.267 0.250
8 statues-2 1 13 0.000 0.000
9 bed-post 2 12 0.124 0.008
10 statues-3 1 12 0.000 0.000
11 knot 13 12 0.006 0.003
12 torus 18 11.77 0.194 0.161
13 airplane 19 12.42 0.101 0.054
14 heli 5 11.6 0.204 0.158
15 missile 9 12 0.306 0.241
16 spaceship 1 13 0.000 0.000
17 square-pipe 12 12.31 0.026 0.017
18 rounded-pipe 15 11.8 0.221 0.184
19 spiral 13 12.46 0.338 0.372
20 articulated-scissors 16 12.06 0.027 0.005
21 CAD-1 1 12 0.000 0.000
22 CAD-2 1 12 0.000 0.000
23 CAD-3 1 13 0.000 0.000
24 CAD-4 1 12 0.000 0.000
25 CAD-5 1 11 0.000 0.000
26 glass 7 11.86 0.144 0.245
27 bottle 17 12.12 0.093 0.081
28 teapot 4 11.5 0.075 0.015
29 mug 17 12.06 0.035 0.004
30 vase 14 12.21 0.166 0.149
31 table 4 11.5 0.099 0.153
32 chairs 28 12.04 0.173 0.123
33 tables 16 11.88 0.254 0.183
34 articulated-hands 18 11.94 0.226 0.146
35 articulated-eyeglasses 13 12 0.161 0.156
36 starfish 19 12.26 0.158 0.102
37 dolphin 23 12.35 0.071 0.053
38 bird 17 12.12 0.239 0.211
39 butterfly 2 12 0.166 0.009
Mean 12.38 0.121 0.098
Table 6.17: Average feature extraction runtime per object.
Method Setup View rendering Descriptor construction Total time
Max distinct views 0.467s 0.05s 0.077s 0.601s
LFD 100 views 0.396s 4.278s 4.567s 9.247s
Chapter 7
GLOBAL 2D AZIMUTH-ELEVATION ANGLES HISTOGRAM OF
SURFACE NORMAL VECTORS
This chapter describes the construction of a second type of 3D object signature: a global 2D histogram of the azimuth-elevation angles of the surface normal vectors of points on the 3D object. The azimuth-elevation angles of surface normal vectors were introduced as one of the low-level features in Chapter 4. In this chapter, the global 2D histogram descriptor is tested on two different applications. The first application, described in Section 7.1, uses the signature to describe the 3D head shape deformation associated with deformational plagiocephaly (DP) and to define severity scores that quantify the head shape deformation. In the second application, discussed in Section 7.2, the global 2D histogram is used to describe and classify individuals in the 22q11.2 Deletion Syndrome dataset as either affected or unaffected by the syndrome.
7.1 3D Shape Severity Quantification and Localization for Deformational Plagiocephaly
Most existing assessment techniques for deformational plagiocephaly are very subjective, resulting in a lack of standard severity quantification for the condition. The techniques tend to rely on either landmarks or clinical expert opinions to classify the degree of severity of the condition into a small discrete score range. However, there is considerable disagreement among experts. Automatically produced continuous scores would not only standardize the quantification process but also allow researchers in the field to investigate other issues, such as the correlation between shape severity and cognitive outcome. The goal of the work in this section is to construct a shape signature that describes the 3D head shape deformation associated with deformational plagiocephaly and to define severity scores that quantify the shape deformation.
Figure 7.1: (a) Surface normal vectors of points that lie on a flat surface tend to have similar azimuth-elevation angles and will create a peak in the 2D angle histogram. (b) Surface normal vectors of points that lie on a more rounded surface have more varying angles and hence will be spread out over the histogram bins.
7.1.1 3D Shape Descriptor
The 3D shape descriptor designed for deformational plagiocephaly uses the azimuth-elevation angles of surface normal vectors (first introduced in Chapter 3). The azimuth-elevation angles of the surface normal vectors are computed for every 3D point on the head data. The elevation angles span 180 degrees, ranging from -90° to 90°, while the azimuth angles span 360 degrees. The computed angles are then discretized into a small number of bins. In this work, the number of bins was set to 12 for each angle, yielding a 12 x 12 histogram. The value in each bin of the histogram is the count of how many points of the mesh had that particular combination of azimuth and elevation angles. The assumption behind the construction of the descriptor is that the surface normal vectors of 3D points lying on flat surfaces of the 3D head meshes will have a much more homogeneous set of angles than the surface normal vectors of 3D points that lie on rounder surfaces (Figure 7.1). Thus, flat parts of the head will tend to produce high-valued bins, or peaks, in the 2D histogram. In comparison, the surface normal vectors of points that lie on a rounded surface will have many different angles and hence will be distributed over multiple histogram bins. The hypothesis is that the bin values can be used to differentiate plagiocephaly from more typical head shapes.
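A sketch of the histogram construction, assuming unit surface normals with the z-axis as the elevation reference and counts normalized by the number of points (names and conventions are illustrative):

import numpy as np

def azimuth_elevation_histogram(normals, n_bins=12):
    """normals: (N, 3) unit surface normal vectors (x, y, z).
    Returns an (n_bins, n_bins) histogram normalized by the point count."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    azimuth = np.arctan2(y, x)                    # in [-180°, 180°]
    elevation = np.arcsin(np.clip(z, -1.0, 1.0))  # in [-90°, 90°]
    hist, _, _ = np.histogram2d(
        azimuth, elevation, bins=n_bins,
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return hist / len(normals)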
Figure 7.2: The Left Posterior Flatness Score is computed by summing the values of the
bins highlighted in red, while the Right Posterior Flatness Score is computed by summing
the values of the bins highlighted in green.
7.1.2 Shape Severity Scores
The severity scores for deformational plagiocephaly are defined for the left and right sides of the back of the head (posterior flattening) using selected bins of the 2D histogram. The Left Posterior Flatness Score (LPFS) is the sum of the histogram bins that correspond to the combination of azimuth angles ranging from -90° to -30° and elevation angles ranging from -15° to 45°, while the Right Posterior Flatness Score (RPFS) is the sum of the histogram bins corresponding to the combination of azimuth angles ranging from -150° to -90° and elevation angles ranging from -15° to 45° (Figure 7.2).
The Asymmetry Score (AS) is the difference between the RPFS and the LPFS. The AS quantifies the degree of asymmetry and also indicates which side is flatter, with negative AS values indicating that the left side is flatter (LPFS > RPFS). The absolute value of the asymmetry score, the Absolute Asymmetry Score (AAS), allows the measurement to be compared to the existing OCLR measurement described in Chapter 2. For the experimental tests, the OCLR measurements were approximated by taking a top-view snapshot of the 3D head mesh data and measuring the cross-diagonal lengths of the head contour in the resulting snapshots. These approximate measurements will be referred to as aOCLR.
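Under the same assumptions as the sketch above (a 12 x 12 histogram normalized by the point count, azimuth bins of 30° and elevation bins of 15°, so the angle windows align exactly with bin edges), the four scores can be sketched as follows:

import numpy as np

def severity_scores(H):
    """H: (12, 12) normalized azimuth-elevation histogram, azimuth rows
    covering [-180°, 180°] and elevation columns covering [-90°, 90°]."""
    az_edges = np.linspace(-180, 180, 13)   # azimuth bin edges, degrees
    el_edges = np.linspace(-90, 90, 13)     # elevation bin edges, degrees

    def bin_sum(az_lo, az_hi, el_lo, el_hi):
        ai = (az_edges[:-1] >= az_lo) & (az_edges[1:] <= az_hi)
        ei = (el_edges[:-1] >= el_lo) & (el_edges[1:] <= el_hi)
        return H[np.ix_(ai, ei)].sum()

    lpfs = bin_sum(-90, -30, -15, 45)       # left posterior flatness
    rpfs = bin_sum(-150, -90, -15, 45)      # right posterior flatness
    as_score = rpfs - lpfs                  # negative: left side is flatter
    return lpfs, rpfs, as_score, abs(as_score)

Each score sums 8 bins, matching the 16 relevant bins used in the severity score computations.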
Figure 7.3: Back view of the head showing the points whose surface normal vector angles correspond to the selected bins' azimuth-elevation angle combinations, highlighted in the 2D histogram.
7.1.3 Shape Severity Localization
The 2D histogram can also be used to indicate the specific location of any posterior flattening. This is done by identifying points at which the surface normal vectors' azimuth and elevation angles correspond to the 16 relevant bins used in the severity score computations. Points whose azimuth-elevation angle combinations fall into one of these relevant bins are marked and subsequently displayed on a color map (Figure 7.3). High bin values are represented by warm colors (red, orange, yellow), while low bin values correspond to cool colors (blue, cyan, green). A representative non-deformational plagiocephaly control participant with an expert score of zero has all bins colored in cool colors, i.e., no angle combination is markedly more prevalent than the others. In DP cases with right and left posterior flatness, the increasing prevalence of red, orange, and yellow indicates increasing severity of DP (Figure 7.4).
7.1.4 Experimental Results
Data analysis was performed using a Receiver Operating Characteristic (ROC) curve. For all possible diagnostic threshold values, the ROC curve plots the sensitivity (the percentage of cases correctly identified) against one minus the specificity (the percentage of non-DP head
Figure 7.4: (a) Mesh surface depictions of seven skulls representative of possible deformational plagiocephaly severity scores from expert clinician ratings. (b) Relevant bins of the 2D histogram of azimuth-elevation angles of surface normal vectors on the 3D head mesh models. These bins are used to calculate the various deformation severity indices. As the severity of posterior flatness increases on one side of the head, the peak in the 2D histogram becomes more prominent, as shown by the warmer colors (red, yellow, green). (c) The last row shows the localization of the posterior flatness, where the flat areas are colored in a similar shade to their corresponding histogram bins.
shapes correctly identified). To estimate overall accuracy, the area under the ROC curve (AUC) is computed. A perfect diagnostic test yields an AUC of 1. A threshold value is automatically computed for each score, such that the threshold maximizes the combination of sensitivity and specificity for distinguishing head shape characteristics such as left posterior flattening, right posterior flattening, and head asymmetry. The severity scores were developed and tested on the 140 participants in the Deformational Plagiocephaly dataset discussed in Chapter 3.2.
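The exact threshold criterion is not spelled out here; the sketch below uses Youden's J statistic (sensitivity + specificity - 1), a standard choice, with scikit-learn's ROC utilities:

import numpy as np
from sklearn.metrics import roc_curve, auc

def evaluate_score(y_true, score):
    """AUC plus the threshold maximizing sensitivity + specificity - 1.
    y_true: 1 = DP case, 0 = non-DP control; score: e.g. LPFS, RPFS, AAS."""
    fpr, tpr, thresholds = roc_curve(y_true, score)
    best = np.argmax(tpr - fpr)                  # Youden's J statistic
    return auc(fpr, tpr), thresholds[best], tpr[best], 1.0 - fpr[best]

# Returns (AUC, threshold, sensitivity, specificity); the text reports,
# e.g., a threshold of 0.15 for the LPFS and RPFS and 0.0352 for the AAS.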
DP cases with left posterior flattening had a higher mean LPFS, ranging from 0.159 to 0.194 (depending on the expert severity rating), while non-DP controls and DP cases with right posterior flattening had a lower mean LPFS, ranging from 0.111 to 0.127 (Table 7.1 and Figure 7.5). In contrast, DP cases with right posterior flattening had a mean RPFS ranging from 0.171 to 0.184, while non-DP controls and DP cases with left posterior flattening had a lower mean RPFS, ranging from 0.115 to 0.144, as shown in Table 7.2 and Figure 7.6.
DP cases with left posterior flattening had a mean AS ranging from -0.079 to -0.015, while DP cases with right posterior flattening had a mean AS ranging from 0.048 to 0.069 (Table 7.3
Figure 7.5: Correlation between the Left Posterior Flatness Score (LPFS) and the expert score. The optimal threshold at 0.15 (thick line) distinguishes the cases with left posterior flattening (enclosed in box) from the rest of the individuals in the dataset.
Figure 7.6: Correlation between the Right Posterior Flatness Score (RPFS) and the expert score. The optimal threshold at 0.15 (thick line) distinguishes the cases with right posterior flattening (enclosed in box) from the rest of the individuals in the dataset.
Table 7.1: Descriptive statistics for the Left Posterior Flatness Score (LPFS).
Patient group                              Expert score   Mean    Standard deviation
Non-DP control                             0              0.127   0.014
DP cases with left posterior flattening    -1             0.159   0.018
                                           -2             0.182   0.025
                                           -3             0.194   0.040
DP cases with right posterior flattening   1              0.123   0.013
                                           2              0.116   0.014
                                           3              0.111   0.008
and Figure 7.7). The non-DP control group had a slightly positive mean AS of 0.012.
The distribution of the AAS for non-DP controls had a mean of 0.016 and standard deviation of 0.012 (Table 7.4), while DP cases had a higher mean AAS ranging from 0.042 to 0.073 (260-450% of the non-DP control group mean). The mean aOCLR score was 103.5 for non-DP controls and ranged between 105.2 and 114.8 for DP cases (102-111% of the control group mean), depending on the assigned expert scores for these DP cases.
The graphs indicate that there is no single threshold for any of the indices that perfectly distinguishes DP cases from non-DP controls (Figures 7.5-7.8). Nevertheless, the automatically set threshold of 0.15 for the LPFS and RPFS distinguished most DP cases with left and right posterior flattening, respectively (enclosed in a box in Figures 7.5 and 7.6), from most non-DP controls and participants with flattening on the opposite side. Excluding the non-DP control participants, an AS threshold of zero produced a relatively clear distinction between DP cases with left and right posterior flattening (Figure 7.7). Setting the AAS threshold to 0.0352 provided a reasonable, though imperfect, classification of non-DP control participants (expert score = 0) versus DP case participants (expert score > 0) (Figure 7.8). The AAS correlated with the aOCLR; for both measures, there was overlap in the range of scores between non-DP controls and DP cases who were given expert ratings of mild or moderate DP (Figure 7.9). All DP cases with expert ratings of severe DP were
Figure 7.7: Correlation between the Asymmetry Score (AS) and the expert score. A threshold at value = 0 produces a clear distinction between cases with left posterior flattening (enclosed in the box in the lower left quadrant) and cases with right posterior flattening (enclosed in the box in the upper quadrant).
Figure 7.8: Correlation between the Absolute Asymmetry Score (AAS) and the expert score. Setting the threshold at 0.0352 (thick line) provides a reasonable classification of non-DP control participants versus DP case participants.
Table 7.2: Descriptive statistics for the Right Posterior Flatness Score (RPFS).
Patient group                              Expert score   Mean    Standard deviation
Non-DP control                             0              0.139   0.018
DP cases with left posterior flattening    -1             0.144   0.017
                                           -2             0.127   0.015
                                           -3             0.115   0.016
DP cases with right posterior flattening   1              0.171   0.023
                                           2              0.184   0.020
                                           3              0.181   0.020
above the diagnostic threshold for both measures (0.035 for AAS and 106 for aOCLR).
The LPFS and RPFS had relatively high accuracy in distinguishing DP cases from non-DP controls (Figures 7.10 and 7.11), but they are not directly comparable to the aOCLR. The AAS produced a more accurate classification (AUC = 90.9%) than the aOCLR (AUC = 78.6%), as shown in Figure 7.13. The AS demonstrated very high accuracy in distinguishing DP cases with left posterior flattening from those with right posterior flattening (Figure 7.12).
Preliminary experiments were performed using the newly developed severity scores (LPFS, RPFS, AAS, and AS) to classify brachycephaly. A consistent brachycephaly dataset was created, similar to the plagiocephaly dataset (Section 3.2), by excluding participants with discrepant brachycephaly expert scores. The final brachycephaly dataset consisted of 129 individuals; 79 individuals in the dataset were labeled 0 by the experts, while 50 individuals had labels >= 1, indicating the presence of brachycephaly to some degree. The area under the curve using the Absolute Asymmetry Score was 0.882. The graph in Figure 7.14 shows the correlation between the Absolute Asymmetry Score and the experts' brachycephaly scores, while Figure 7.15 illustrates the ROC curve when using the Absolute Asymmetry Score to classify cases with brachycephaly.
Additional classification experiments were also performed by treating the global 12 x
Figure 7.9: Correlation between the Absolute Asymmetry Score and the approximate Oblique Cranial Length Ratio for quantifying posterior flattening.
Figure 7.10: Receiver Operating Characteristic (ROC) curve using the Left Posterior Flatness Score (LPFS) for classification of cases with left posterior flattening versus other individuals in the dataset. The sensitivity and specificity at the operating point that maximizes their combination (marked on the graph) are 96.6% and 95.8%, respectively.
Figure 7.11: Receiver Operating Characteristic (ROC) curve using the Right Posterior Flatness Score (RPFS) for classification of cases with right posterior flattening versus other individuals in the dataset. The sensitivity and specificity at the operating point that maximizes their combination (marked on the graph) are 91.9% and 86.4%, respectively.
Figure 7.12: Receiver Operating Characteristic (ROC) curve using the Asymmetry Score (AS) for classification of patients with left posterior flattening versus patients with right posterior flattening. The sensitivity and specificity at the operating point that maximizes their combination (marked on the graph) are 100% and 98.5%, respectively.
Figure 7.13: Receiver Operating Characteristic (ROC) curves for classification of patients with posterior flattening versus non-DP controls using the Absolute Asymmetry Score (AAS) and the approximate Oblique Cranial Length Ratio (aOCLR). The performance of the AAS is better than that of the aOCLR. The sensitivity and specificity at the operating point that maximizes their combination for the AAS (marked on the graph) are 96% and 80%, respectively.
Figure 7.14: Correlation between the Absolute Asymmetry Score and the experts' brachycephaly score.
Figure 7.15: Receiver Operating Characteristic (ROC) curve using the Absolute Asymmetry Score for classification of cases with brachycephaly versus unaffected individuals.
Table 7.3: Descriptive statistics for the Asymmetry Score (AS).
Patient group                              Expert score   Mean     Standard deviation
Non-DP control                             0              0.012    0.016
DP cases with left posterior flattening    -1             -0.015   0.018
                                           -2             -0.055   0.020
                                           -3             -0.079   0.026
DP cases with right posterior flattening   1              0.048    0.024
                                           2              0.068    0.022
                                           3              0.069    0.015
12 2D azimuth-elevation angle histogram as a feature vector and using an SVM classifier to classify the different deformational plagiocephaly conditions. Consistent datasets were created for each of the conditions by excluding individuals with discrepant expert scores for that particular condition. The five consistent datasets are: (1) posterior plagiocephaly, consisting of 140 individuals (90 cases and 50 unaffected); (2) brachycephaly, consisting of 129 individuals (50 cases and 79 unaffected); (3) forehead asymmetry (129 individuals); (4) ear asymmetry (116 individuals); and (5) overall severity (188 individuals). Table 7.6 shows the classification accuracy for each of the deformational plagiocephaly conditions using the global 2D histogram of azimuth-elevation angles and an SVM classifier.
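A sketch of this experimental setup (the SVM kernel and cross-validation protocol are assumptions, since they are not specified here):

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def histogram_svm_accuracy(X, y, folds=10):
    """X: (n_participants, 144) flattened 12 x 12 histograms; y: binary
    condition labels. Returns mean cross-validated accuracy."""
    return cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=folds).mean()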
7.2 Classification of 22q11.2 Deletion Syndrome
The global 2D azimuth-elevation angle histogram of surface normal vectors was also tested on the 22q11.2 Deletion Syndrome dataset, described in Chapter 3.1. In this section, a number of global 2D histograms were constructed using different numbers of bins to investigate the effect of the number of bins on the classification performance.
Figure 7.16 shows the difference between individuals with and without midface hypoplasia, which is one of the facial abnormalities associated with 22q11.2 Deletion Syndrome. The figure shows that the projection of the 2D azimuth-elevation angle histograms onto the
Table 7.4: Descriptive statistics for the Absolute Asymmetry Score (AAS) and approximate Oblique Cranial Length Ratio (aOCLR) measurements.
Patient group    Absolute expert score   Method   Mean      Standard deviation
Non-DP control   0                       AAS      0.016     0.012
                                         aOCLR    103.566   2.474
DP cases         1                       AAS      0.042     0.024
                                         aOCLR    105.218   3.259
                 2                       AAS      0.064     0.022
                                         aOCLR    109.135   3.298
                 3                       AAS      0.073     0.020
                                         aOCLR    114.809   3.247
face reveals some discriminating patterns between individuals with and without midface hypoplasia, most noticeably in the cheek area, which is the facial region most affected by midface hypoplasia.
Table 7.7 shows the classification accuracy for 22q11.2DS using global 2D histograms constructed with various numbers of bins. Note that the classification accuracy with a 24 x 24 global 2D histogram of azimuth-elevation angles is higher than the experts' median rate [94]. Table 7.8 shows the classification accuracy for the nine different facial dysmorphologies associated with 22q11.2DS using the global 2D histogram method constructed with different numbers of bins. The oral features have slightly higher classification accuracy, but this could also be due to the fact that the datasets for the oral features are slightly unbalanced. In the case of the open mouth feature, for example, 19 individuals are labeled with the feature and 67 as unaffected. The experts' survey results showed that all features of the nose were found to have a higher percentage of moderate and severe expression in individuals affected by 22q11.2DS [91].
Table 7.5: Area under the curve and corresponding 95% confidence intervals computed from Receiver Operating Characteristic curves for quantifying posterior flattening.
Score    AUC      95% CI
LPFS     0.9745   0.93 - 1.02
RPFS     0.9185   0.87 - 0.97
AAS      0.9091   0.86 - 0.96
aOCLR    0.7861   0.71 - 0.86
AS       0.9956   0.98 - 1.01
Table 7.6: Classification accuracy for the various plagiocephaly conditions using an SVM classifier.
                     Posterior plagiocephaly   Brachycephaly   Forehead asymmetry   Ear asymmetry   Overall severity
Using 2D histogram   0.793                     0.868           0.674                0.603           0.766
Table 7.7: Classification of 22q11.2DS using the global 2D histogram method constructed with different numbers of bins.
                8 x 8   16 x 16   24 x 24   32 x 32   Experts' median
Whole 2D hist   0.651   0.569     0.79      0.684     0.68
7.3 Summary
This chapter described the construction of a second type of 3D object signature, in the form of a global 2D histogram of the azimuth-elevation angles of the surface normal vectors of points on the 3D object. The constructed global 2D histogram signature was used to describe the 3D head shape deformation associated with deformational plagiocephaly. Four severity scores were defined to quantify the head shape deformation: the Left Posterior Flatness Score (LPFS), Right Posterior Flatness Score (RPFS), Asymmetry Score (AS), and Absolute Asymmetry Score (AAS). Classification experiments showed that the LPFS and RPFS had relatively high
Figure 7.16: Projections of the 2D azimuth-elevation angle bins onto the face. The projections show some discriminating patterns between individuals with and without midface hypoplasia.
Table 7.8: Classification accuracy for the various facial dysmorphologies associated with 22q11.2DS using the global 2D histogram of azimuth-elevation angles constructed with different numbers of bins.
                       8 x 8   16 x 16   24 x 24   32 x 32
Midface Hypoplasia     0.639   0.744     0.697     0.651
Tubular Nose           0.709   0.593     0.581     0.663
Bulbous Nasal Tip      0.593   0.581     0.581     0.639
Prominent Nasal Root   0.547   0.639     0.616     0.658
Small Nasal Alae       0.561   0.675     0.571     0.560
Retrusive Chin         0.526   0.674     0.560     0.546
Open Mouth             0.875   0.799     0.844     0.683
Small Mouth            0.671   0.526     0.752     0.585
Downturned Mouth       0.613   0.539     0.553     0.630
accuracy in distinguishing DP cases from non-DP controls. The results also showed that the AAS provided better sensitivity and specificity in discriminating plagiocephalic from typical head shapes than the 2D measurement provided by a close approximation of the existing OCLR measurement. The AS showed high classification accuracy in distinguishing DP cases with left posterior flattening from those with right posterior flattening. These new scores provide a continuous measurement of head shape, as opposed to the current discrete approach, and could be used to examine associations between deformational plagiocephaly and neurodevelopmental outcomes or to assess changes in head shape over time with and without treatments such as helmet therapy.
Similar classification experiments were also performed to test the performance of the global 2D histogram signature in classifying brachycephaly. The results showed that the severity scores were also able to achieve good accuracy in classifying brachycephaly. In addition, the global 2D histogram signature was used as a feature vector to classify the different deformational plagiocephaly conditions, such as ear and forehead asymmetry, as well as 22q11.2 deletion syndrome and its multiple different manifestations. Experimental results showed that classification using the global 2D histogram feature vector achieves higher accuracy in classifying 22q11.2DS than the experts' median rate.
In this chapter, all bins of the histogram are weighted equally, and the histogram is used as a feature vector for classification. This is not necessarily the best way to proceed. In Chapter 8, the bins form the leaves of a mathematical expression tree whose evaluation leads to a single numeric feature for classification. Genetic programming is used to learn the best expression trees for multiple different 3D shape quantification applications corresponding to the different dysmorphologies associated with 22q11.2 deletion syndrome.
Chapter 8
LEARNING 3D SHAPE QUANTIFICATION FOR CRANIOFACIAL
RESEARCH
This chapter describes the use of genetic programming for learning 3D shape quantifications and investigates its application to analyzing craniofacial disorders. For this work, the analysis focuses on 22q11.2 deletion syndrome (22q11.2DS). The goal is to quantify the different shape variations that manifest in the different facial abnormalities of individuals with 22q11.2DS. Figure 8.1 shows an overview of the quantification learning framework. The framework begins with the selection of the facial region that is most pertinent to a given facial abnormality (Section 8.2). Features in the form of a 2D histogram of azimuth-elevation angles of surface normals are extracted from the selected facial region (Section 8.3). The framework continues by selecting features from the selected facial region using Adaboost (Section 8.4). Genetic programming is then used to combine the selected features and produce the quantification of a given facial abnormality (Section 8.5). Experimental results of the different quantification schemes are discussed in Section 8.6.
Figure 8.1: Overview of the quantification learning framework.
8.1 Related Work
Genetic programming is a method that follows the theory of evolution by evolving individual computer programs using a survival-of-the-fittest approach. The approach uses genetic operators, such as mutation and crossover, to generate new individuals that form a population. Starting with a random population, every individual in the population is evaluated. Individuals that perform best according to a given fitness measure reproduce with one another to produce a new population. After a number of iterations, the best individual from all the populations is selected as the final solution.
There has been some work on using genetic programming to improve image descriptors for 2D object recognition tasks. Perez et al. [64, 65, 66] used genetic programming to evolve a 2D SIFT image descriptor and construct a new operator for object recognition. Their results show that the newly constructed operators improved the overall performance of SIFT.
Torres et al. [19] presented a genetic programming (GP) framework for content-based image retrieval (CBIR). To improve the effectiveness of the CBIR system, their approach used genetic programming to combine similarity values obtained from a number of image descriptors to create a more effective fused similarity function. Their results show that the GP framework is able to find better similarity functions than the ones obtained from individual descriptors. Dos Santos [21] presented a new relevance feedback method for content-based image retrieval that uses a genetic programming approach to learn user preferences and combine region similarity values in a query session.
Zhang et al. [97] improved the efficiency and effectiveness of the genetic programming approach for object detection tasks by breaking the GP search into two phases. The first phase applies GP to a selected subset of the training data with a simplified fitness function. The second phase is then initialized with the resulting individual from the first phase and uses the full set of training data with a complete fitness function to construct the final detection programs. They compared their two-phase GP approach to the basic GP approach and a neural network approach using the same set of features, and their results show that the two-phase GP approach achieved the best detection performance.
The approach described in this chapter uses genetic programming to learn the best quantification for various 3D facial abnormalities observed in craniofacial disorders.
8.2 Facial Region Selection
Abnormal clinical features in individuals with 22q11.2DS include asymmetric face shape, hooded eyes, bulbous nasal tip, and retrusive chin, among others. The range of variation in individual feature expression is very large. To study the different facial abnormalities, focus is placed on the facial region that is most pertinent to a given facial abnormality. Nine different facial abnormalities are analyzed in this chapter, covering three different areas of the face: midface, nose, and mouth. The nose region is extracted using a trapezium bounding box that covers the nose area of the face (Figure 8.2(a)) and is used to investigate four nasal abnormalities: bulbous nasal tip, tubular nose, prominent nasal root, and small nasal alae. The mouth region is extracted using a rectangular bounding box that covers the mouth area of the face (Figure 8.2(b)) and is used to investigate four oral abnormalities: retrusive chin, open mouth, small mouth, and downturned mouth. The midface region is extracted using a rectangular bounding box that covers the middle portion of the face (Figure 8.2(c)) and is used to study midface hypoplasia in individuals in the 22q11.2DS dataset. The facial regions are selected manually, as the general area of each abnormality is known.
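A sketch of the rectangular-region extraction; the box corners are chosen manually per abnormality, and the names here are hypothetical:

import numpy as np

def points_in_box(points, lo, hi):
    """points: (N, 3) mesh vertices; lo, hi: (3,) opposite box corners.
    Returns the vertices falling inside the axis-aligned bounding box."""
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask]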
Figure 8.2: Different facial regions are selected (highlighted in pink) depending on the facial abnormality being studied: (a) nose, (b) mouth, and (c) midface.
8.3 2D Histogram of Azimuth-Elevation Angles
Once the facial region pertinent to a given facial abnormality has been selected, features are extracted from the selected region. The methodology for representing the various 3D facial shape variations uses the 2D histogram of the azimuth and elevation angles of the surface normal vectors of the 3D points in the selected region (introduced in Chapter 3). The surface normals at each point in the selected region are calculated, and each surface normal vector is represented by its azimuth angle and elevation angle. The computed angles are then clustered into bins to produce a 2D histogram. For the experimental results in this chapter, an 8 x 8 2D histogram was used.
Figure 8.3(a) shows the selected midface region of an individual in the dataset, while Figure 8.3(b) shows the constructed 2D histogram of the selected midface region displayed on a color map. High histogram bin values are represented by warm colors (red, orange, yellow), while low bin values correspond to cool colors (blue, cyan, green).
Figure 8.3: (a) Selected midface facial region of an individual in the dataset. (b) The constructed 2D histogram of the azimuth and elevation angles of the surface normals of the points in the selected facial region. High histogram bin values are represented by warm colors (red, orange, yellow), while low bin values correspond to cool colors (blue, cyan, green).
The vector of histogram bin values is treated as a feature vector and used for classification. Results of using all the bin values of the 2D histogram of the selected facial region for the various facial abnormalities are given in Section 8.6. Rather than using a linear combination of all the histogram bins, the approach presented in this chapter determines the bins that are most important and most discriminative in classifying the different facial abnormalities. It then uses genetic programming to find the best way to combine the discriminative histogram bin values in order to generate a quantification for each of the different facial abnormalities.
8.4 Feature Selection
To determine the histogram bins that are most discriminative in classifying and quantifying a given facial abnormality, Adaboost learning was used to select the histogram bins that optimized the classification performance. The Adaboost algorithm obtains a strong classifier by combining a set of weak classifiers with different weights so as to minimize the classification error. For the experiments, the WEKA [93] implementation of Adaboost learning was used, and the decision stump was selected as the weak classifier, as it produced high classification performance. At most ten of the most discriminative histogram bins were selected, and different bins are selected for each of the facial abnormalities. The values of the selected bins are concatenated into a feature vector that can be used to classify the dataset according to the different facial abnormalities. Results of using the selected bin values as a feature vector for classifying the different facial abnormalities are given in Section 8.6. Figure 8.4(a) shows the selected bin values of the 2D histogram in Figure 8.3(b) highlighted in red. Note that both low and high-valued bins were selected.
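A rough equivalent of this selection step using scikit-learn rather than WEKA (an assumption; feature importances serve as a proxy for the stump-selection order):

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def select_bins(X, y, n_bins=10):
    """X: (n_subjects, 64) flattened 8 x 8 histograms; y: binary labels.
    The default base estimator is a depth-1 tree, i.e. a decision stump."""
    clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
    return np.argsort(clf.feature_importances_)[::-1][:n_bins]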
Positional information about which points in the selected region contribute to the selected bins can be projected back onto the face. Figure 8.4(b) shows the projection of all of the selected bins in Figure 8.4(a) back onto the face. Interestingly, though midface hypoplasia occurs on both sides of the face, the algorithm selected bins mostly from the right side of the face. A possible reason for this is asymmetry in the face: the right side of the face of the individuals in the dataset is more dominant than the left side. Figure 8.5 shows the projection of the selected bins back onto the face for each of the various facial abnormalities. The projection shows that areas specific to each facial abnormality are selected as the important and discriminative features; for example, in the case of prominent nasal root, the root base of the nose is selected as an important feature. This result also shows that there might exist some correlation between the features; for
example, in the case of open mouth, the selected features include part of the chin and the top of the mouth; this could be because when the mouth is open, the shape of the chin and the top of the mouth change as well. In Figure 8.6, the selected bins are colored in different random colors and the surface normal vectors are displayed as arrows. Notice that individuals labeled with midface hypoplasia (top row) have a big patch of the same color on the cheek, indicating that most points on the cheek surface have similar surface normal angles due to the lack of curvature in the area, which follows the definition of midface hypoplasia. The cheeks of individuals labeled as not having midface hypoplasia (bottom row) have many different colors, indicating that the surface normal angles change due to curvature in the cheek area.
Figure 8.4: (a) Adaboost learning selects the most discriminative histogram bins for classifying midface hypoplasia (highlighted in red). (b) Positional information about which points in the selected region contribute to the selected bins for classifying midface hypoplasia (highlighted in pink).
8.5 Feature Combination
The goal of the work in this section is to quantify the different shape variations that manifest in the different facial abnormalities. Preliminary experiments that combined the selected bin values using weighted linear methods, such as linear regression and neural networks, obtained less than satisfactory classification performance. The F-measures for classifying midface hypoplasia using linear regression and neural networks were less than 0.67. These results highlighted the fact that the best combination of the selected feature values is possibly
Bulbous nasal tip    Tubular nose    Prominent nasal root    Small nasal alae
Downturned mouth    Small mouth    Open mouth    Retrusive chin
Figure 8.5: Positional information about which points in the selected region contribute to the selected bins for classifying the different facial abnormalities.
non-linear. In this work, a genetic programming methodology was used to combine the values of the selected discriminative histogram bins to produce mathematical expressions that quantify the shape variation of the different facial abnormalities.
As introduced in Section 8.1, genetic programming follows the theory of evolution to evolve individual computer programs. The approach adopted in this chapter uses genetic programming to evolve a mathematical expression over the selected histogram bins that quantifies the shape variation of the different facial abnormalities. The genetic programming approach starts by assigning the measure of performance of an individual, commonly known as the fitness test. In this work, the F-measure was used as the fitness function to measure an individual's performance for a given facial abnormality. The F-measure is commonly used in information retrieval, as it gives a good balance between precision and recall. It is defined as

F(prec, rec) = (2 x prec x rec) / (prec + rec)        (8.1)

where prec and rec are the precision and recall metrics at a given threshold. The final precision and recall metrics used to obtain the F-measure are calculated at the threshold
Two individuals who have midface hypoplasia.
Two individuals who do not have midface hypoplasia.
Figure 8.6: Positional information about which points in the selected region contribute to the selected bins. Points belonging to the same bin are highlighted using the same random color. The surface normal vectors are displayed as arrows.
that maximizes the area under the Receiver Operating Characteristic (ROC) curve. The best individual in the population is the individual with the maximum F-measure value.
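A sketch of this fitness evaluation; the use of the Youden-optimal ROC threshold is an interpretation of the wording above, and the function name is illustrative:

import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_curve

def fitness(y_true, scores):
    """F-measure of a candidate expression's scores, computed at the
    ROC-optimal (Youden) threshold."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    t = thresholds[np.argmax(tpr - fpr)]
    y_pred = (scores >= t).astype(int)
    prec = precision_score(y_true, y_pred, zero_division=0)
    rec = recall_score(y_true, y_pred, zero_division=0)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)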
The genes of an individual program in genetic programming form a tree-like structure with two different types of genes: functions and terminals. The terminals are the leaves of the tree, while the functions are the tree nodes that have children. The genes and the function set selected affect the final outcome of the solution. In this approach, the terminal set consists of the selected histogram bin values X_i. A number of different function sets were analyzed in the experimental tests. The genetic programming evolves the individuals through a set number of iterations and selects the individual with the maximum F-measure. The approach then produces the tree structure of the best individual, which can be translated into a mathematical expression. Figure 8.7 shows an example of the final tree structure of the mathematical expression quantifying the midface hypoplasia facial abnormality shown in Figure 8.3.
Figure 8.7: The output of the genetic programming quantification approach is a tree structure that defines the mathematical expression quantifying a given facial abnormality. In this example, the tree structure quantifies midface hypoplasia. The equivalent mathematical expression can be written as X6 + (X7 + ((max(X7, X6) - sin(X8)) + (X6 + X6))), where X_i is the feature value of bin i.
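A minimal sketch of how such an expression tree can be represented and evaluated (the encoding is illustrative, not the implementation used in this work):

import math

# node = ('X', i) for terminal bin i, or (op, child, ...) for a function node
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "min": min,
    "max": max,
    "sin": math.sin,
}

def evaluate(node, X):
    if node[0] == "X":
        return X[node[1]]
    op, *children = node
    return FUNCTIONS[op](*(evaluate(c, X) for c in children))

# The max(X7, X6) - sin(X8) subexpression of Figure 8.7:
tree = ("-", ("max", ("X", 7), ("X", 6)), ("sin", ("X", 8)))
print(evaluate(tree, [0.0] * 10))   # bins indexed 1..9 as in the text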
8.6 Experimental Results
A number of experiments were performed to measure the performance of the different facial region descriptors in classifying individuals with respect to a given facial abnormality. The experiments were performed on the balanced 22q11.2 Deletion Syndrome dataset (discussed in Section 3.1). The dataset consists of 86 individuals, each of whom was assessed by three craniofacial experts who assigned scores to their facial features. The groundtruth for each individual was set to be a binary label (1 for affected and 0 for unaffected) and was computed as the union of the experts' scores: the label is 1 if any one of the experts marked the individual as being affected by a given facial abnormality, and 0 otherwise. The goal of each experiment is to classify each individual in the dataset as either affected or unaffected by the given facial abnormality. Two different classifiers were tested: Adaboost and SVM. Results for the best performing classifier for each facial abnormality are reported. Table 8.1 specifies the genetic programming parameters used during the experimental tests. Preliminary experiments showed that increasing the number of iterations used to evolve the trees did not improve the results; hence the number of iterations was set to 50.
Table 8.1: Genetic programming parameters.
Parameters                        Value
# iterations                      50
# generations                     50
Population size (# individuals)   50
Genetic operators                 cross-over, mutation
Population initialization         ramped half-and-half
Selection for reproduction        roulette
The objective of the first experimental test was to analyze the classification performance of the genetic programming quantification method using different function sets. In this experiment, four different combinations of functions are compared:
(1) Combo1 = {+, -, x, min, max},
(2) Combo2 = {+, -, x, min, max, sqrt, log2, log10},
(3) Combo3 = {+, -, x, min, max, 2x, 5x, 10x, 20x, 50x, 100x}, and
(4) Combo4 = {+, -, x, min, max, sqrt, log2, log10, 2x, 5x, 10x, 20x, 50x, 100x}.
The last two
function sets were chosen to introduce weighting into the mathematical expressions. Each GP function set combination was run a total of 10 times. Table 8.2 shows the best F-measure out of the ten runs for each of the different GP function set combinations. The best performing GP function set for each of the facial abnormalities is highlighted in bold. The results show that for most of the facial abnormalities, the simpler function sets, Combo1 and Combo2, which do not include the weighted operations, perform better than the weighted GP function sets.
Table 8.2: Classification performance in terms of F-measure (best F-measure of 10 runs) for the nine facial anomalies using different combinations of genetic programming (GP) functions. The GP functions for the various combinations are listed in the table legend.
Facial anomaly         Combo1 (a)   Combo2 (b)   Combo3 (c)   Combo4 (d)
Midface Hypoplasia     0.8393       0.8364       0.8527       0.80
Tubular Nose           0.8571       0.875        0.8667       0.8813
Bulbous Nasal Tip      0.8545       0.8099       0.8103       0.7544
Prominent Nasal Root   0.8667       0.8430       0.8571       0.8335
Small Nasal Alae       0.8846       0.8454       0.8454       0.8571
Retrusive Chin         0.7952       0.8000       0.7342       0.7586
Open Mouth             0.9444       0.9714       0.9189       0.9189
Small Mouth            0.6849       0.7568       0.6829       0.7750
Downturned Mouth       0.8000       0.7797       0.8000       0.8000
(a) Combo1 GP functions: {+, -, x, min, max}
(b) Combo2 GP functions: {+, -, x, min, max, sqrt, log2, log10}
(c) Combo3 GP functions: {+, -, x, min, max, 2x, 5x, 10x, 20x, 50x, 100x}
(d) Combo4 GP functions: {+, -, x, min, max, sqrt, log2, log10, 2x, 5x, 10x, 20x, 50x, 100x}
The F-measures in Table 8.2 were obtained using the whole dataset for both training and testing the genetic programming. A set of experiments was performed to ensure that the best tree constructed for each facial abnormality was not overfitting the
Figure 8.8: The best F-measure for classifying midface hypoplasia when using different percentages of the dataset for testing.
data. The experiments were conducted by holding out a certain percentage of the dataset for cross-validation testing, using the Combo1 function set for classifying midface hypoplasia. The graph in Figure 8.8 shows the best F-measures out of 10 runs for both the training and testing data when using different percentages of the dataset for testing. The graph shows that the F-measure on the testing set degrades slowly as the size of the testing dataset increases, indicating that the genetic programming approach generalizes and does not overfit the training set.
The second experiment examined the classification performance of three different facial shape descriptors: (1) the 2D histogram of the azimuth-elevation angles of points in the selected region (discussed in Section 8.3), (2) the selected 2D histogram bin values used as a feature vector (discussed in Section 8.4), and (3) the genetic programming quantification results (discussed in Section 8.5). Table 8.3 shows the performance of the various facial shape descriptors in classifying the nine different facial abnormalities. The results show that for all facial abnormalities, using the genetic programming approach to quantify and then classify the facial abnormalities performs best. Using the selected 2D histogram bin values performs worse than genetic programming but better than using the full 2D histogram of all the points in the selected facial region.
Table 8.3: Classification performance, in terms of F-measure, using the various facial shape descriptors to classify the nine different facial abnormalities.
Facial abnormality Region Histogram Selected Bins GP
Midface hypoplasia 0.697 0.721 0.853
Tubular nose 0.701 0.776 0.881
Bulbous nasal tip 0.617 0.641 0.855
Prominent nasal root 0.704 0.748 0.867
Small nasal alae 0.733 0.801 0.8846
Retrusive chin 0.658 0.713 0.8000
Open mouth 0.875 0.889 0.9714
Small mouth 0.694 0.725 0.7750
Downturned mouth 0.506 0.613 0.8000
In the third experiment, the region-based genetic programming approach to quantifying and classifying the different facial abnormalities was compared to the two global approaches to representing the face and the various facial features discussed in Chapters 6 and 7. The first global approach represents the whole face using a global saliency map. The second global approach represents the face using the 2D histogram of azimuth-elevation angles of the surface normals of points on the whole face. Table 8.4 shows the results. It can be seen that using genetic programming quantification to represent the facial regions achieves a much higher classification performance than the global approaches.
In an effort to reveal which of the selected histogram bins are more important in describing a given facial abnormality, a simple GP function set consisting of only {+, -} was used to generate the GP mathematical expressions for quantifying the facial abnormalities.
Table 8.4: Comparing the genetic programming quantification approach to the global approaches: the 2D longitude-latitude salient map signature and the 2D histogram of azimuth-elevation angles of surface normal vectors on the whole face (instead of on only a selected region).
Facial abnormality GP Saliency Map Global 2D Hist
Midface hypoplasia 0.853 0.674 0.744
Tubular nose 0.881 0.628 0.709
Bulbous nasal tip 0.855 0.616 0.639
Prominent nasal root 0.867 0.663 0.658
Small nasal alae 0.8846 0.779 0.675
Retrusive chin 0.8000 0.628 0.674
Open mouth 0.9714 0.707 0.875
Small mouth 0.7750 0.581 0.752
Downturned mouth 0.8000 0.566 0.630
When possible, the mathematical expressions were simplified. Tables 8.5 and 8.6 list the mathematical expressions and the simplified mathematical expressions for the ten iterations quantifying midface hypoplasia, while Table 8.7 contains the coefficient of each selected histogram bin in the simplified mathematical expressions of Tables 8.5 and 8.6. Looking at the coefficients across the ten iterations, it can be seen that in all iterations the 6th and 7th selected histogram bins, X6 and X7, have higher positive weights in the mathematical expressions, indicating that they are possibly more important in quantifying midface hypoplasia. Figure 8.9 shows the projection of the selected histogram bins X6 and X7 back onto the face. Comparing Figures 8.9 and 8.4, it can be seen that the quantification favors the cheek and part of the nose area over the area under the mouth and the top of the nose for describing midface hypoplasia.
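The simplification step can be automated; a brief sketch with sympy (an assumption, as the text does not state how the expressions were simplified), using iteration 10's expression:

import sympy as sp

X = sp.symbols("X1:10")                     # X1 .. X9

# Iteration 10's raw expression from Table 8.6: ((X7) - (X8))
expr = sp.expand(X[6] - X[7])               # X is 0-indexed: X[6] is X7
coeffs = [expr.coeff(x) for x in X]
print(expr)                                 # X7 - X8
print(coeffs)                               # [0, 0, 0, 0, 0, 0, 1, -1, 0]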
Table 8.5: GP mathematical expressions and simplified mathematical expressions using the function set {+, -} for ten iterations in quantifying midface hypoplasia (iterations 1 to 5).
Iteration   Mathematical expression   Simplified mathematical expression
1 ((((X9)+(X4))-(X2))+((X7)+((X6)+((X4)-(X3))))) -X2-X3+2X4+X6+X7+X9
2 ((((X7)-(X3))-((((X1)-(X2))+((X1)-(X8)))-
(((X7)-(X3))-((X2)+(X5)))))-((((X7)+((X5)+(X2)))
+(X7))+((((X2)+(X5))+((X8)-(X3)))-(((X7)+(X4))+((X1)+(X7)))))) -X1-2X2-X3+X4-3X5+2X7
3 (((((((X6)-((X2)+(X1)))-((X7)-(X7)))-(X5))-
((X2)+(X5)))-(X4))-((((X1)+(((X5)+(X9))-(X4)))-(X7))-(X7))) -2X1-2X2-3X5+X6+2X7-X9
4 ((X7)+((((X7)+((X7)+((X7)-((((X4)+(X7))-
((X7)+(X6)))+(X2)))))+(X7))-(X8))) -X2-X4+X6+5X7-X8
5 (((((X6)-(X2))+(X7))+((((X7)-((X9)+(X9)))+
(((X9)+(X6))-(X2)))+(X9)))+(((X7)-((X1)+(X7)))-(X9))) -X1-2X2+2X6+2X7-X9
Table 8.6: GP mathematical expressions and simplified mathematical expressions using the function set {+, -} for ten iterations in quantifying midface hypoplasia (iterations 6 to 10).
Iteration   Mathematical expression   Simplified mathematical expression
6 ((((X4)+(X3))+((((X2)+(X7))-((X2)-(X8)))-
((((X2)+(X3))-((X9)-(X5)))-(X6))))-(((X8)+((X2)+(X3)))+
(((X5)+(X5))-(X8)))) -2X2 -X3+X4-3X5+X6+X7+X8+X9
7 (((((X4)+(X1))+(((X4)-(X2))-((X1)-(X1))))+
((((X7)+(X3))-(X4))-(X1)))-((((X4)+((X1)+(X5)))
-((X9)+((X6)+(X6))))+(((X6)+((X1)-(X6)))-(((X8)-(X5))-(X2))))) -2X1-2X2+X3-2X5+2X6+X7+X8+X9
8 (((((X6)-((X9)+(X7)))+(((X6)-(X9))+((X6)-
((X1)-(X7)))))+ ((((((X7)-(X9))-(X7))+(X7))+(X4))+
(((X6)-((X2)-(X7)))-(X7))))+(X6)) -X1-X2+X4+5X6+X7-3X9
9 ((((((X7)+(X4))+((((X6)-(X8))-(X5))-((X5)+((X9)+(X9)))))-(((X4)-(X3))
+((X1)-(X6))))+((X8)-(X5)))+(((((X7)-(X8))+((X3)-(X2)))+(((((X7)-((X7)
+(X1)))+((X3)-(X2)))+(((X7)-(((X6)+(X7))-((X3)+(X7))))- ((X5)-(X8))))-
(((((X7)+(X4))+((X9)-(X6)))-((X2)-(X3)))-(X8))))-
((X7)-(((X6)+(X7))-((X3)+(X8)))))) -2X1 -X2 +2X3 -X4 -4X5 +3X6 +2X7 -3X9
10 ((X7)-(X8)) X7 - X8
Table 8.7: Coefficients of the selected histogram bins for quantifying midface hypoplasia using function set {+, −}.
Iteration Simplified expression X1 X2 X3 X4 X5 X6 X7 X8 X9 F-measure
1 -X2-X3+2X4+X6+X7+X9 0 -1 -1 2 0 1 1 0 1 0.8288
2 -X1-2X2-X3+X4-3X5+2X7 -1 -2 -1 1 -3 0 2 0 0 0.8364
3 -2X1-2X2-3X5+X6+2X7-X9 -2 -2 0 0 -3 1 2 0 -1 0.8571
4 -X2-X4+X6+5X7-X8 0 -1 0 -1 0 1 5 -1 0 0.8214
5 -X1-2X2+2X6+2X7-X9 -1 -2 0 0 0 2 2 0 -1 0.8364
6 -2X2 -X3+X4-3X5+X6+X7+X8+X9 0 -2 -1 1 -3 1 1 1 1 0.8571
7 -2X1-2X2+X3-2X5+2X6+X7+X8+X9 -2 -2 1 0 -2 2 1 1 1 0.8571
8 -X1-X2+X4+5X6+X7-3X9 -1 -1 0 1 0 5 1 0 -3 0.8462
9 -2X1 -X2 +2X3 -X4 -4X5 +3X6 +2X7 -3X9 -2 -1 2 -1 -4 3 2 0 -3 0.8624
10 X7 - X8 0 0 0 0 0 0 1 -1 0 0.7928
Figure 8.9: Projection of selected histogram bins X6 and X7 back onto the face. The
simplified quantification mathematical expression shows that the two bins are more
important in quantifying midface hypoplasia.
Tables 8.8 and 8.9 list the best performing mathematical expressions produced by the
genetic programming approach for each facial abnormality, while Figures 8.10, 8.11, 8.12,
8.13, and 8.14 show the corresponding tree structures for the best mathematical expressions
quantifying the facial abnormalities. The mathematical expressions are evaluated to
produce a score that quantifies the facial abnormality of an individual. Figure 8.15
highlights the GP quantification score spectrum for midface hypoplasia. The figure shows
examples of the GP quantification scores for midface hypoplasia together with the experts'
scores for these individuals. Notice the progression of the different colors on the cheeks of
individuals with increasing midface hypoplasia score. The results also show the correlation
between the produced GP quantification scores and the experts' scores: in the example, as
the sum of the expert scores increases, the GP quantification score increases as well.
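As a quick numerical check of that trend, the snippet below (an illustration using only the six example individuals displayed in Figure 8.15) computes the Pearson correlation between the expert score sums and the GP scores; it comes out around 0.98 on this small sample.

```python
# Correlation between expert score sums and GP quantification scores for the
# six example individuals shown in Figure 8.15 (illustration only).
expert_sums = [0, 0, 1, 2, 3, 5]
gp_scores = [0.121641, 0.112806, 0.355436, 0.395905, 0.458822, 0.744366]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(round(pearson(expert_sums, gp_scores), 3))  # approximately 0.978
```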
Table 8.8: Best performing mathematical expressions produced by the genetic programming approach for each facial abnormality.
Facial abnormality Mathematical expression
Midface hypoplasia plus(plus(minus(X7,X7),plus(X6,plus(plus(minus(plus(X6,X6),X7),
minus(X7,X2)),X7))),plus(plus(minus(X9,mytimes_5(X9)),X7),X7))
Tubular nose minus(cos(X8),minus(minus(plus(min(min(minus(minus(plus
(min(min(mysqrt(X7),plus(cos(X5),X3)),X5),X6),times(mysqrt
(cos(min(X5,X5))),minus(cos(X6),minus(cos(X8),cos(X8))))),cos(cos(X5)))
,plus(cos(X5),X3)),X5),X6),times(mysqrt(minus(X5,cos(sin(X5))))
,minus(cos(X8),minus(cos(X8),cos(X8))))),cos(X7)))
Bulbous nasal tip minus(min(times(minus(min(X5,X2),minus(X3,max(X3,X3)))
,max(plus(max(X3,X7),max(X3,X3)),minus(X3,min(X3,X8))))
,plus(max(times(X2,X5),min(X2,minus(X2,X3))),minus(max(X3,X7),X5)))
,min(min(minus(max(X2,X8),times(X2,X4)),minus(max(X3,X7),X5)),X3))
Prominent nasal root minus(max(minus(min(X7,X1),times(X3,X5)),
min(min(max(X8,X8),X4),max(X4,X3))),times(times(X3,X5),X5))
Small nasal alae max(max(minus(times(times(min(min(plus(X6,min(X7,X7))
,times(X6,X3)),X7),times(min(X1,times(min(X3,X2),X6)),X2)),times(X7,X5))
,minus(X5,X3)),X2),plus(plus(minus(max(times(max(X5,X5),X1)
,times(minus(minus(times(X2,minus(X4,plus(max(X1,X2),X5))),X2),X2)
,plus(X2,X1))),max(X7,X4)),times(X7,times(X7,X6))),max(times(max(X1,X7),X6)
,times(minus(X4,X1),max(min(X7,X2),X1)))))
Table 8.9: Best performing mathematical expressions produced by the genetic programming approach for each facial abnormality (continued).
Facial abnormality Mathematical expression
Retrusive chin times(plus(minus(max(mylog(mylog(X3)),X7),mylog10(X2))
,mysqrt(plus(mylog10(min(cos(X4),sin(X5))),mylog10(X6)))),mysqrt(X1))
Open mouth plus(plus(cos(X6),plus(cos(X6),plus(X1,X3))),plus(X3,X3))
Small mouth mytimes_2(mylog10(mytimes_2(plus(mytimes_100(minus(X3,
mytimes_5(plus(mylog2(plus(X3,X5)),plus(max(mytimes_100(mytimes_2(X5)),X4)
,X4))))),plus(X3,plus(mytimes_100(minus(X4,mytimes_5(plus(X1,mylog10
(mysqrt(mytimes_2(plus(mytimes_20(X1),X2)))))))),plus(mytimes_100
(minus(X4,mytimes_5(plus(mylog2(plus(X3,mytimes_20(X4))),plus(max
(mytimes_100(mytimes_2(X1)),times(X3,mylog2(X3))),X3))))),mylog10
(mysqrt(mytimes_50(X4))))))))))
Downturned mouth times(X5,plus(X4,X1))
Figure 8.10: Tree structure of the best performing GP mathematical expression for quantifying midface hypoplasia.
The purpose of the last experiment was to analyze how the learned genetic programming
quantification performs in predicting 22q11.2 deletion syndrome for individuals in the
dataset. In this experiment, the best performing mathematical expression obtained by the
genetic programming quantification for each of the nine facial abnormalities was evaluated,
and the resulting values were concatenated into a feature vector of dimension nine. The
resulting feature vector was then used to classify the individuals as either affected or
unaffected by 22q11.2DS. The F-measure using this quantification feature vector was 0.709
with SVM and 0.721 with Adaboost. However, evolving the resulting nine-dimensional
concatenated feature vector using the genetic programming approach yielded a much
higher classification performance of 0.821.
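A hedged sketch of this prediction step follows: each best expression from Tables 8.8 and 8.9 is evaluated on an individual's selected histogram-bin values, and the scores are concatenated into the feature vector. Only two of the nine abnormalities are shown, the bin values are hypothetical, and any protected operators (mysqrt, mylog10, mytimes_k, and so on) would need definitions matching the GP primitive set, which is not spelled out here.

```python
# Evaluate best GP expressions (Tables 8.8-8.9) and build the quantification
# vector; a sketch with hypothetical inputs, not the dissertation's code.
import math

# Table 8.9, "Downturned mouth": times(X5, plus(X4, X1))
def downturned_mouth(b):
    return b["X5"] * (b["X4"] + b["X1"])

# Table 8.9, "Open mouth":
# plus(plus(cos(X6), plus(cos(X6), plus(X1, X3))), plus(X3, X3))
def open_mouth(b):
    return 2 * math.cos(b["X6"]) + b["X1"] + 3 * b["X3"]

def quantification_vector(bins, scorers):
    """One GP score per abnormality; the full vector would use all nine."""
    return [score(bins) for score in scorers]

# Hypothetical selected-bin values for one individual (illustration only).
bins = {"X%d" % i: v for i, v in enumerate(
    [0.12, 0.05, 0.30, 0.22, 0.18, 0.40, 0.33, 0.10, 0.07], start=1)}
print(quantification_vector(bins, [downturned_mouth, open_mouth]))
```

The resulting vectors would then be fed to an off-the-shelf SVM or Adaboost classifier, or evolved further with GP, as compared in Table 8.10.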
Table 8.10 compares the F-measures for predicting 22q11.2 deletion syndrome for the
methods discussed in this thesis. The top three rows use the 9-dimensional vector
containing the quantifications of the nine separate abnormalities, as described above.
Results for the global saliency map (discussed in Chapter 6), which uses curvature as its
low-level feature, as a whole and with Adaboost-learning-selected bins, are shown next.
Results for the global 2D histogram of azimuth and elevation angles (with dimensions
24×24, discussed in Chapter 7), as a whole and with Adaboost-learning-selected bins, are
given next. Using genetic programming to evolve the Adaboost-learning-selected bins of
both the global saliency map and the global 2D histogram of azimuth and elevation angles
further improved the F-measures. Finally, the median score obtained by our three human
experts is given for comparison. All of the automatic results are improvements over the
median of the human experts. Of the measures tested in this thesis, evolving the
Adaboost-learning-selected bins of the global saliency map performed the best, with an
F-measure of 0.96.
Table 8.10: Classification performance in predicting 22q11.2 Deletion Syndrome.
Method F-measure
Quantification vector with SVM 0.709
Quantification vector with Adaboost 0.721
Quantification vector with GP 0.821
Global saliency map 0.764
Selected bins of global saliency map 0.90
Global 2D histogram 0.79
Selected bins of global 2D histogram 0.90
Selected bins of global saliency map with GP 0.96
Selected bins of global 2D histogram with GP 0.92
Experts' median 0.68
Bulbous nasal tip
Tubular nose
Figure 8.11: Tree structures of the best performing GP mathematical expressions for quantifying the nasal facial abnormalities.
Prominent nasal root
Small nasal alae
Figure 8.12: Tree structures of the best performing GP mathematical expressions for quantifying the nasal facial abnormalities.
Downturned mouth
Small mouth
Figure 8.13: Tree structures of the best performing GP mathematical expressions for quantifying the oral facial abnormalities.
Open mouth
Retrusive chin
Figure 8.14: Tree structures of the best performing GP mathematical expressions for quantifying the oral facial abnormalities.
Expert 1 Score Expert 2 Score Expert 3 Score Expert sum GP Score
0 0 0 0 0.121641
0 0 0 0 0.112806
1 0 0 1 0.355436
1 1 0 2 0.395905
1 1 1 3 0.458822
1 2 1 5 0.744366
Figure 8.15: Quantification score for midface hypoplasia.
Chapter 9
CONCLUSIONS
This thesis presented new 3D shape representation methodologies for quantification,
classification, and retrieval of 3D objects. The methodologies start by extracting and
aggregating low-level features in the base framework. Three different types of low-level
features were investigated: Gaussian curvature, Besl-Jain curvature, and the
azimuth-elevation angles of surface normal vectors of points on the 3D meshes. Motivated
by existing collaborations on craniofacial disorder studies, a learning approach to identify
interesting or salient points on 3D objects was developed. The classifier learns the
characteristics of interesting points based on the extracted feature values in a
neighborhood of each point. The learning methodology was tested on both general 3D
objects and medical craniofacial datasets. The salient point patterns of the 3D objects are
then mapped onto a 2D plane via a longitude-latitude transformation to produce the 2D
longitude-latitude map signature. The 2D map signature was tested on both classification
and retrieval tasks. In addition, the salient point patterns were also used to select salient
2D views to describe the 3D objects. These views reduce the computation time for
constructing the descriptor relative to the existing LFD descriptor, thus improving
retrieval efficiency. A second type of 3D object signature was presented in the form of the
global 2D histogram of azimuth-elevation angles of surface normal vectors of points on the
3D object. The global 2D histogram was tested on the two craniofacial datasets for
classification tasks. Genetic programming was used to learn 3D shape quantification and
was tested for analyzing the craniofacial disorders that manifest in 22q11.2DS.
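As a concrete illustration of the second signature, the sketch below (an assumed rendering, not the dissertation's code) converts unit surface normals into azimuth-elevation pairs and bins them on a fixed grid; a 24×24 grid was used for the craniofacial experiments. The binning ranges and normalization are assumptions.

```python
# Global 2D histogram of azimuth-elevation angles of surface normals; a
# sketch under stated assumptions about binning ranges and normalization.
import numpy as np

def azimuth_elevation_histogram(normals, bins=24):
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    azimuth = np.arctan2(n[:, 1], n[:, 0])          # angle in the x-y plane
    elevation = np.arcsin(np.clip(n[:, 2], -1, 1))  # angle from the x-y plane
    hist, _, _ = np.histogram2d(
        azimuth, elevation, bins=bins,
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return hist / hist.sum()                        # relative frequencies

# Stand-in for an N x 3 array of per-vertex mesh normals:
normals = np.random.randn(1000, 3)
print(azimuth_elevation_histogram(normals).shape)   # (24, 24)
```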
9.1 Contributions
The main contributions of this work are:
1. A general methodology for analysis of 3D shapes that can be used in multiple different
applications and is extremely well-suited for craniofacial shape classification.
2. A learning approach to the detection of interesting points on a 3D object.
3. Two different two-dimensional signatures for 3D objects: the longitude-latitude
signature, which can use any low-level feature as its basis, and the azimuth-elevation
angle histogram, which relies only on surface normals.
4. A new methodology for quantification of craniofacial disorders that employs the
azimuth-elevation histogram, boosting, and genetic programming to obtain a single
number that quantifies the degree of dysmorphology for any craniofacial abnormality.
9.2 Future Work
Results of the two craniofacial studies analyzed in this thesis open up several possible
research avenues. Researchers at SHCC working on the deformational plagiocephaly study
have begun to acquire additional 3D head mesh data from the participants in the original
dataset. These head meshes are captured when the participants are older and will be used
to investigate how their head shapes change over time and whether the head shape
changes when the participants receive additional treatment, such as helmet therapy to
correct the deformation. The new 3D shape representations developed in this thesis would
help analyze these changes. The 3D head shape descriptors could also be used to
investigate whether there exists a correlation between the change in head shape and the
change in brain shape, and how those changes affect the cognitive development of a
participant. An overall severity measurement could also be developed that encapsulates
the different head shape deformations that manifest in deformational plagiocephaly, such
as ear asymmetry, forehead asymmetry, brachycephaly, and plagiocephaly. This overall
severity measurement could then be compared to the experts' overall severity scores and
to existing two-dimensional head shape severity measurements such as the Oblique
Cranial Length Ratio (OCLR) and the Cephalic Index (CI) [32].
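For reference, both 2D measures are simple ratios; the sketch below gives their definitions as commonly stated in the plagiocephaly literature [32], with the landmark conventions (where the diagonals and axes are measured) treated as assumptions, since they vary between studies.

```python
# Common definitions of the two 2D head-shape severity measures; landmark
# conventions are assumptions and differ between studies.
def cephalic_index(head_width, head_length):
    """CI: head width as a percentage of head length."""
    return 100.0 * head_width / head_length

def oblique_cranial_length_ratio(diagonal_a, diagonal_b):
    """OCLR: longer oblique cranial diagonal over the shorter, as a percent."""
    longer, shorter = max(diagonal_a, diagonal_b), min(diagonal_a, diagonal_b)
    return 100.0 * longer / shorter
```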
Possible future work for the 22q11.2 deletion syndrome study includes using CT scans for
ground truth instead of the experts' survey. This could possibly lead to more accurate
labeling of the facial shape features. Researchers at SHCC are planning to investigate
whether these same 3D shape representations could be used to analyze other craniofacial
disorders besides deformational plagiocephaly and 22q11.2DS, such as cleft lip and palate.
It would also be interesting to see whether the facial shape deformation measurements
could be used to predict whether a child will have midface hypoplasia or any other
craniofacial dysmorphology. Many of the craniofacial researchers have suggested further
investigating the 3D head shape descriptors using simulated datasets in order to better
understand the values of the obtained quantification scores. Simulated datasets would
allow the testing to be controlled at each facial part, enabling a better understanding of
how each facial part affects the quantification scores. Lastly, a long-term future goal is to
be able to translate the obtained 3D head shape quantification into plain English that
could then be integrated into clinical practice as an aid in diagnosis.
BIBLIOGRAPHY
[1] 3dMD. http://www.3dmd.com.
[2] Erdem Akagunduz and Ilkay Ulusoy. 3D object representation using transform and scale invariant 3D features. In ICCV 3dRR Workshop, pages 1–8, 2007.
[3] F. R. Al-Osaimi, M. Bennamoun, and A. Mian. Integration of local and global geometrical cues for 3D face recognition. Pattern Recognition, 41:1030–1040, 2008.
[4] J. Assfalg, G. D'Amico, A. Del Bimbo, and P. Pala. 3D content-based retrieval with spin images. In ICME, 2004.
[5] Jürgen Assfalg, A. Del Bimbo, and P. Pala. Retrieval of 3D objects using curvature maps and weighted walkthroughs. In ICIAP, 2003.
[6] I. Atmosukarto and L. G. Shapiro. A learning approach to 3D object representation for classification. In International Workshop on Statistical + Structural and Syntactic Pattern Recognition, 2008.
[7] I. Atmosukarto and L. G. Shapiro. A salient-point signature for 3D object retrieval. In ACM Multimedia Information Retrieval, 2008.
[8] Autodesk. 3ds Max. http://www.autodesk.com, 2009.
[9] D. B. Becker, T. Pilgram, L. Marty-Grames, D. P. Govier, J. L. Marsh, and A. A. Kane. Accuracy in identification of patients with 22q11.2 deletion by likely care providers using facial photographs. Plastic and Reconstructive Surgery, 114(6):1367–1372, 2004.
[10] P. J. Besl and R. C. Jain. Three-dimensional object recognition. Computing Surveys, 17(1):75–145, 1985.
[11] S. Boehringer, T. Vollmar, C. Tasse, R. P. Wurtz, G. Gillessen-Kaesbach, B. Horsthemke, and D. Wieczorek. Syndrome identification based on 2D analysis software. Eur J Hum Gen, 14:1082–1089, 2006.
[12] B. Bustos, D. Keim, D. Saupe, and T. Schreck. Content-based 3D object retrieval. IEEE Computer Graphics and Applications, Special Issue on 3D Documents, 27(4):22–27, 2007.
[13] B. Bustos, D. Keim, D. Saupe, T. Schreck, and D. Vranic. Feature-based similarity search in 3D object databases. ACM Computing Surveys, 37(4):345–387, 2005.
[14] B. Bustos, D. Keim, D. Saupe, T. Schreck, and D. Vranic. An experimental effectiveness comparison of methods for 3D similarity search. International Journal on Digital Libraries, 6(1):39–54, 2006.
[15] U. Castellani, M. Cristani, S. Fantoni, and V. Murino. Sparse point matching by combining 3D mesh saliency with statistical descriptors. Computer Graphics Forum, 27(2):643–652, 2008.
[16] D. Chen, X. Tian, Y. Shen, and M. Ouhyoung. On visual similarity based 3D model retrieval. Computer Graphics Forum, 22(3):223–232, 2003.
[17] Chin Seng Chua and Ray Jarvis. Point signatures: A new representation for 3D object recognition. Int. J. Comput. Vision, 25(1):63–85, 1997.
[18] B. Collett, D. Breiger, D. King, M. Cunningham, and M. Speltz. Neurodevelopmental implications of deformational plagiocephaly. Journal of Developmental and Behavioral Pediatrics, 26(5):379–389, 2005.
[19] R. da S. Torres, A. X. Falcao, M. A. Goncalves, J. P. Papa, B. Zhang, W. Fan, and E. A. Fox. A genetic programming framework for content-based image retrieval. Pattern Recognition, 42:283–292, 2009.
[20] A. Del Bimbo and P. Pala. Content-based retrieval of 3D models. ACM Transactions on Multimedia Computing, Communications, and Applications, 2(1):20–43, 2006.
[21] J. A. dos Santos, C. D. Ferreira, and R. da Silva Torres. A genetic programming approach for relevance feedback in region-based image retrieval systems. In SIBGRAPI '08: Proceedings of the 2008 XXI Brazilian Symposium on Computer Graphics and Image Processing, pages 155–162, 2008.
[22] M. Elad, A. Tal, and S. Ar. Content based retrieval of VRML objects: an iterative and interactive approach. In Eurographics Workshop on Multimedia, pages 107–118, 2001.
[23] LG Farkas. Anthropometric Facial Proportions in Medicine. Charles C Thomas, 1987.
[24] LG. Farkas, MJ. Katic, and CR. Forrest. International anthropometric study of facial morphology in various ethnic groups/races. The Journal of Craniofacial Surgery, 16(4):615–646, 2005.
[25] Andrea Frome, Daniel Huber, Ravi Kolluri, Thomas Bulow, and Jitendra Malik. Recognizing objects in range data using regional point descriptors. In Proceedings of the European Conference on Computer Vision (ECCV), pages 224–237, 2004.
[26] T. Funkhouser, M. Kazhdan, P. Min, and P. Shilane. Shape-based retrieval and analysis of 3D models. Communications of the ACM, 48(6):58–64, 2005.
[27] Daniela Giorgi and Simone Marini. Shape retrieval contest 2008: Classification of watertight models. In IEEE Shape Modeling and Applications, pages 219–220, 2008.
[28] TS Glasgow, F Siddiqi, C. Ho, and PC Young. Deformational plagiocephaly: Development of an objective measure and determination of its prevalence in primary care. Journal of Craniofacial Surgery, 18(1), 2007.
[29] P. Hammond. The use of 3D face shape modeling in dysmorphology. Arch Dis Child, 92:1120–1126, 2007.
[30] M. Hilaga, Y. Shinagawa, and T. Kohmura. Topology matching for fully automatic similarity estimation of 3D shapes. In SIGGRAPH, pages 203–212, 2001.
[31] BL Hutchison, LAD Hutchison, JMD Thompson, and Ed A Mitchell. Plagiocephaly and brachycephaly in the first two years of life: A prospective cohort study. Pediatrics, 114:970–980, 2004.
[32] L. Hutchison, L. Hutchison, J. Thompson, and Ed A. Mitchell. Quantification of plagiocephaly and brachycephaly in infants using a digital photographic technique. The Cleft Palate-Craniofacial Journal, 42(5):539–547, 2005.
[33] Cranial Technologies Inc. http://www.cranialtech.com.
[34] Plagiocephaly Info. http://www.plagiocephaly.info.
[35] A. Ion, N. Artner, G. Peyre, S. Marmol, W. Kropatsch, and L. Cohen. 3D shape matching by geodesic eccentricity. In CVPR, 2008.
[36] N. Iyer, S. Jayanti, K. Lou, Y. Kalyanaraman, and K. Ramani. Three-dimensional shape searching: state-of-the-art review and future trends. Computer Aided Design, 37:509–530, 2005.
[37] Andrew Johnson and Martial Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5):433–449, May 1999.
[38] Timor Kadir and Michael Brady. Saliency, scale, and image description. International Journal of Computer Vision, 45(2):83–105, 2001.
[39] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Eurographics, pages 156–164, 2003.
[40] KM Kelly, TR Littlefield, JK Pomatto, CE Ripley, SP Beals, and EF Joganic. Importance of early recognition and treatment of deformational plagiocephaly with orthotic cranioplasty. American Cleft Palate-Craniofacial Journal, 36:127–130, 1999.
[41] L. Kobrynski and K. Sullivan. Velocardiofacial syndrome, DiGeorge syndrome: the chromosome 22q11.2 deletion syndromes. The Lancet, 370(9596):1443–1452, 2007.
[42] JC Kolar and EM Salter. Craniofacial Anthropometry: A Practical Measurement of the Head and Face for Clinical, Surgical and Research Use. Charles C. Thomas: Springfield, 1997.
[43] H. Laga and M. Nakajima. A boosting approach to content-based 3D model retrieval. In GRAPHITE, pages 227–234, 2007.
[44] Hamid Laga, Hiroki Takahashi, and Masayuki Nakajima. Spherical wavelet descriptors for content-based 3D model retrieval. In Shape Modeling and Applications, pages 15–23, 2006.
[45] S. Lanche, T. A. Darvann, H. Olafsdottir, N. V. Hermann, A. E. Van Pelt, D. Govier, M. J. Tennenbaum, S. Naidoo, P. Larsen, S. Kreiborg, R. Larsen, and A. A. Kane. A statistical model of head asymmetry in infants with deformational plagiocephaly. In Scandinavian Conference on Image Analysis, 2007.
[46] Chang Ha Lee, Amitabh Varshney, and David W. Jacobs. Mesh saliency. ACM Trans. Graph., 24(3):659–666, 2005.
[47] Xinju Li and Igor Guskov. Multi-scale features for approximate alignment of point-based surfaces. In Eurographics Symposium on Geometry Processing, 2005.
[48] Xinju Li and Igor Guskov. 3D object recognition from range images using pyramid matching. In ICCV Workshop on 3dRR, 2007.
[49] Y. Liu, H. Zha, and H. Qin. Shape topics: A compact representation and new algorithms for 3D partial shape retrieval. In CVPR, 2006.
[50] Yi Liu, Hongbin Zha, and Hong Qin. The generalized shape distributions for shape matching and analysis. In Shape Modeling and Applications, 2006.
[51] Zhenbao Liu, Jun Mitani, Yukio Fukui, and Seiichi Nishihara. Multiresolution wavelet analysis of shape orientation for 3D shape retrieval. In ACM Multimedia Information Retrieval, pages 403–410, 2008.
[52] K. Lou, N. Iyer, S. Jayanti, Y. Kalyanaraman, S. Prabhakar, and K. Ramani. Effectiveness and efficiency of three-dimensional shape retrieval. Journal of Engineering Design, 16(2):175–194, 2005.
[53] M. Mahmoudi and G. Sapiro. Three-dimensional point cloud recognition via distributions of geometric distances. In CVPR, 2008.
[54] A. McGarry, M. T. Dixon, R. J. Greig, D. Hamilton, S. Sexton, and H. Smart. Head shape measurement standards and cranial orthoses in the treatment of infants with deformational plagiocephaly: a systematic review. Dev Med Child Neuro, 2008.
[55] PA Mortenson and P Steinbok. Quantifying positional plagiocephaly: Reliability and validity of anthropometric measurements. Journal of Craniofacial Surgery, 17(3):413–419, 2006.
[56] H. Müller, S. Marchand-Maillet, and T. Pun. The truth about Corel - evaluation in image retrieval. In CIVR, 2002.
[57] John Novatnack and Ko Nishino. Scale-dependent 3D geometric features. In International Conference on Computer Vision, pages 1–8, 2007.
[58] John Novatnack, Ko Nishino, and Ali Shokoufandeh. Extracting 3D shape features in discrete scale space. In 3D Data Processing, Visualization, and Transmission, pages 946–953, 2006.
[59] R. Ohbuchi, T. Minamitani, and T. Takei. Shape-similarity search of 3D models by using enhanced shape functions. International Journal of Computer Applications in Technology, 23(2):70–85, 2005.
[60] Ryutarou Ohbuchi, Kunio Osada, Takahiko Furuya, and Tomohisa Banno. Salient local visual features for shape-based 3D model retrieval. In Shape Modeling International, pages 93–102, 2008.
[61] H. Olafsdottir, S. Lanche, T. A. Darvann, N. V. Hermann, R. Larsen, B. J. Ersboll, E. Oubel, A. F. Frangi, P. Larsen, C. A. Perlyn, G. M. Morriss-Kay, and S. Kreiborg. A point-wise quantification of asymmetry using deformation fields: Application to the study of the Crouzon mouse model. In MICCAI, 2007.
[62] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. Shape distributions. ACM Transactions on Graphics, 21:807–832, 2002.
[63] Georgios Passalis, Theoharis Theoharis, and Ioannis A. Kakadiaris. Intraclass retrieval of nonrigid 3D objects: Application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 29(2):218–229, 2007.
[64] C. B. Perez and G. Olague. Learning invariant region descriptor operators with genetic programming and the F-measure. In International Conference on Pattern Recognition, 2008.
[65] C. B. Perez and G. Olague. Evolutionary learning of local descriptor operators for object recognition. In GECCO, 2009.
[66] C. B. Perez and G. Olague. Evolving local descriptor operators through genetic programming. In EvoWorkshops, LNCS 5484, pages 414–419, 2009.
[67] LH Plank, B Giavedoni, JR Lombardo, MD Geil, and A. Reisner. Comparison of infant head shape changes in deformational plagiocephaly following treatment with a cranial remolding orthosis using a noninvasive laser shape digitizer. Journal of Craniofacial Surgery, 17(6):1084–1091, 2006.
[68] E. Praun and H. Hoppe. Spherical parametrization and remeshing. In SIGGRAPH, pages 340–349, 2003.
[69] Zheng Qin, Ji Jia, and Jun Qin. Content based 3D model retrieval: A survey. In International Workshop CBMI, pages 249–256, 2008.
[70] S. Ruiz-Correa, L. Shapiro, M. Meila, G. Berson, M. Cunningham, and R. Sze. Symbolic signatures for deformable shapes. IEEE Trans. PAMI, 28(1):75–90, 2004.
[71] D. Saupe and D. Vranic. 3D model retrieval with spherical harmonics and moments. In DAGM Symposium on Pattern Recognition, pages 392–397, 2001.
[72] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
[73] Peter Schröder and Wim Sweldens. Spherical wavelets: Efficiently representing functions on the sphere. In SIGGRAPH, pages 161–172, 1995.
[74] L. G. Shapiro, K. Wilamowska, I. Atmosukarto, J. Wu, C. L. Heike, M. Speltz, and M. Cunningham. Shape-based classification of 3D head data. In International Conference on Image Analysis and Processing, 2009.
[75] P. Shilane and T. Funkhouser. Selecting distinctive 3D shape descriptors for similarity retrieval. In Shape Modeling International, pages 18–25, 2006.
[76] P. Shilane and T. Funkhouser. Distinctive regions of 3D surfaces. ACM Transactions on Graphics, 26(2), 2007.
[77] M. Speltz, B. Collett, M. Stott-Miller, J. Starr, C. Heike, A. Wolfram-Aduan, D. King, and M. Cunningham. Case-control study of neurodevelopment in deformational plagiocephaly. Unpublished manuscript, under editorial review, 2009.
[78] H. Sundar, D. Silver, N. Gagvani, and S. Dickinson. Skeleton-based shape matching and retrieval. In Shape Modeling International, pages 130–138, 2004.
[79] B. Taati, M. Bondy, P. Jasiobedzki, and M. Greenspan. Variable dimensional local shape descriptors for object recognition in range data. In ICCV, 2007.
[80] J. Tangelder and R. Veltkamp. Polyhedral model retrieval using weighted point sets. In SMI, 2003.
[81] J. Tangelder and R. Veltkamp. A survey of content based 3D shape retrieval methods. In Shape Modeling International, pages 145–156, 2004.
[82] J. Tangelder and R. Veltkamp. A survey of content based 3D shape retrieval methods. Multimedia Tools and Applications, 39(3):441–471, 2008.
[83] Ranjith Unnikrishnan and Martial Hebert. Multi-scale interest regions from unorganized point clouds. In CVPR, 2008.
[84] N. Vajramushti, I. A. Kakadiaris, T. Theoharis, and G. Papaioannou. Efficient 3D object retrieval using depth images. In MIR, 2004.
[85] L. A. van Vlimmeren, T. Takken, L. van Adrichem, Y. van der Graaf, P. Helders, and R. Engelbert. Plagiocephalometry: a non-invasive method to quantify asymmetry of the skull; a reliability study. Eur J Pediatr, 165:149–157, 2006.
[86] Vladimir N. Vapnik. Statistical Learning Theory. John Wiley and Sons, 1998.
[87] D. Vranic. DESIRE: a composite 3D shape descriptor. In ICME, 2005.
[88] D. Vranic and D. Saupe. Tools for 3D object retrieval: Karhunen-Loeve transform and spherical harmonics. In IEEE Workshop on Multimedia Signal Processing, pages 293–298, 2001.
[89] Yuehong Wang, Rujie Liu, Takayuki Baba, Yusuke Uehara, Daiki Masumoto, and Shigemi Nagata. An images-based 3D model retrieval approach. In Advances in Multimedia Modeling, pages 90–100, 2008.
[90] Kouki Watanabe and Alexander G. Belyaev. Detection of salient curvature features on polygonal surfaces. Computer Graphics Forum, 20(3), 2001.
[91] K. Wilamowska. Shape-based Quantification and Classification of 3D Face Data for Craniofacial Research. PhD thesis, University of Washington, 2009.
[92] K. Wilamowska, L. G. Shapiro, and C. L. Heike. Classification of 3D face shape in 22q11.2 deletion syndrome. In IEEE International Symposium on Biomedical Imaging, 2009.
[93] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.
[94] J. Wu, K. Wilamowska, L. Shapiro, and C. Heike. Automatic analysis of local nasal features in 22q11.2DS affected individuals. In IEEE Engineering in Medicine and Biology, 2009.
[95] Yubin Yang, Hui Lin, and Yao Zhang. Content-based 3D model retrieval: A survey. IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, 37(6), 2007.
[96] T. Zaharia and F. Preteux. 3D shape based retrieval within the MPEG-7 framework. SPIE Applications, 4304, 2001.
[97] M. Zhang, U. Bhowan, and B. Ny. Genetic programming for object detection: A two-phase approach with an improved fitness function. Electronic Letters on Computer Vision and Image Analysis, 6(1):27–43, 2007.
[98] M. Zonenshayn, E. Kronberg, and M. Souweidane. Cranial index of symmetry: an objective semiautomated measure of plagiocephaly. J Neurosurgery (Pediatrics 5), 100:537–540, 2004.
VITA
Indriyati Atmosukarto graduated with a Doctor of Philosophy in Computer Science from
the University of Washington in 2010, advised by Prof. Linda Shapiro. She obtained a
Master of Science from the University of Washington in 2006 and a Master of Science
from the National University of Singapore in 2002. Before that, she obtained her Bachelor
of Computer Science from the National University of Singapore in 2000. She was also a
research engineer at the National University of Singapore.
Her research interests are in computer vision, computer graphics, and machine learning.
Her research work has been in applying computer vision and machine learning techniques
to analyze and quantify 3D shapes for similarity-based retrieval and classification in
general computer vision applications. She has also collaborated with medical doctors at
the Seattle Children's Hospital Craniofacial Center to quantify the 3D shape variations of
various craniofacial disorders.
She welcomes your comments to indria@u.washington.edu.