
Continuous-state Graphical Models for Object Localization, Pose Estimation and Tracking

by Leonid Sigal

B.A., Boston University, 1999

M.A., Boston University, 1999

Sc.M., Brown University, 2003

Submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Computer Science at Brown University

Providence, Rhode Island May 2008

© Copyright 2003, 2004, 2006, 2008 by Leonid Sigal

This dissertation by Leonid Sigal is accepted in its present form by the Department of Computer Science as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date ____________  Michael J. Black, Director

Recommended to the Graduate Council

Date ____________  William T. Freeman, Reader (Electrical Engineering and Computer Science), Massachusetts Institute of Technology

Date ____________  John F. Hughes, Reader (Department of Computer Science)

Date ____________  David Mumford, Reader (Department of Applied Mathematics)

Approved by the Graduate Council

Date ____________  Sheila Bonde, Dean of the Graduate School


VITA

Leonid Sigal was born on May 23, 1977 in Kiev, Ukraine. He is a Ph.D. candidate under the supervision of Michael J. Black at Brown University; he received his B.Sc. degrees in Computer Science and Mathematics from Boston University (1999), his M.A. from Boston University (1999), and his M.S. from Brown University (2003). From 1999 to 2001, he worked as a senior vision engineer at Cognex Corporation, where he developed industrial vision applications for pattern analysis and verification. In 2002, he spent a semester as a research intern at Siemens Corporate Research (SCR) working on autonomous obstacle detection and avoidance for vehicle navigation. During the summers of 2005 and 2006, he worked as a research intern at Intel Applications Research Lab (ARL) on human pose estimation and tracking. His work received the Best Paper Award at the Articulated Motion and Deformable Objects Conference in 2006 (with Michael J. Black). Leonid's research interests are primarily in computer vision and machine learning, including human motion analysis, graphical models, and probabilistic and hierarchical inference.


ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor and mentor Michael J. Black for his attentive supervision during my six years at Brown University. Michael provided me with a wealth of expertise and knowledge. He taught me how to choose and approach challenging problems and how to formulate a clear scientific argument. His enthusiasm and ability to engage students in interesting problems have served as an inspiration to me. As an advisor, Michael struck a perfect balance of providing help to me where and when it was needed, yet allowing me the freedom to grow as an independent researcher.

I am also grateful to all members of Brown's computer vision group and graduate community for helpful discussions, collaborations and a friendly environment that allowed me to conduct this research. I would specifically like to thank Stefan Roth for being a congenial colleague and for the advice that he has given me over the years, and Alexandru Balan for a variety of collaborations and for introducing me to the SCAPE model. I would also like to thank Alexandru for his entertaining personal stories and the companionship during numerous conference travels and internships. I would also like to mention other group members who have indirectly affected this thesis: Gregory Shakhnarovich, Payman Yadollahpur and Frank Wood. In addition, I want to thank Chad Jenkins and his students for introducing me to robotics and physics-based simulation; although this has no direct relation to this thesis, it has affected my overall research agenda.

The ideas developed in this thesis have also benefited from numerous external collaborations. I would like to thank Michael Isard for his early input and discussions on Particle Message Passing (PAMPAS), which is at the core of this thesis. I would also like to thank Alex Ihler and Eric Sudderth for interesting discussions on the relations between Non-parametric Belief Propagation (NBP), developed by them, and PAMPAS.

In addition, I would like to thank Dorin Comaniciu and Ying Zhu from Siemens Corporate Research (SCR) for the mentorship they provided me during my five-month internship in Princeton, NJ. As part of my internship, I was able to extend my prior work on articulated pose estimation and tracking to the generic obstacle detection domain (for autonomous vehicle navigation); this gave rise to the results presented in Chapter 4.

I would also like to thank Horst Haussecker, who hosted me as a research intern at Intel's Applications Research Laboratory (ARL) in Santa Clara, CA for two summers in 2006 and 2007. Horst provided me with an invaluable environment and the freedom to pursue research of my own personal interest. The collaborations with various people in the ARL group, in particular the Nizhny Novgorod team, have benefited this thesis in many practical aspects. Many of the experiments presented in Chapter 5 would not have been possible (or as easily attained) without their help. Hence, I would like to thank all members of the Intel Research group with whom I had a chance to interact and collaborate during my visits: Adam Seeger, Oscar Nestares, Jean-Yves Bouguet, Konstantin Rodyushkin (Nizhny Novgorod, Russia) and Alexander Kuranov (Nizhny Novgorod, Russia).

Moreover, I am grateful to members of my thesis committee, William Freeman, John Hughes (Spike) and David Mumford, for taking the time from their busy schedules to read and comment on my work. In addition, I would like to thank David Mumford, whose class on statistical modeling of shape introduced me to Belief Propagation and much of the basic statistical and mathematical formalism that I use throughout this thesis. Both his class and our outside discussions were always insightful. I would also like to thank Spike for his valuable input on modeling statistical distributions over angles.

Before coming to Brown University, my interest in the pursuit of an academic career and in computer vision as a field was shaped by a number of key people, most notably Stan Sclaroff, my former advisor at Boston University. I had a chance to work with Stan both as an undergraduate and as a master's student. I am thankful to Stan for sparking my interest in computer vision and for his continuing support and career advice over the years. I would also like to acknowledge the people in the IVC group with whom I had a chance to interact and collaborate at various points in my career: Vasillis Athitsos, Matheen Siddiqqui, John Isodoro and Romer Rosales.

On a more personal level, I would like to thank my closest friends, without whom this endeavor would not have been half as much fun: Stan Rost (a.k.a. Progressor) for always making sure that I do the right thing; Max Frenkel and Arthur Furman for making sure that I get out of the house for beer once in a while; and Natalya Ganchina, Filipp Rakevich, Irina Asipenko and Pasha Volkov for making their homes and refrigerators always open to me.

Lastly, for the success of this thesis I must, to a large extent, thank my family. My parents, Yelena and Alexander Sigal, have taught me from an early age to value education and to pursue my dreams. While they may not have always agreed with my decisions or choices, they always stood by me and supported me along the way. My sister, Marina Sigal, is always someone I can talk to about my struggles and achievements. More importantly, however, my days would not be complete without her funny stories. Finally, this thesis would not have been possible without the moral support and patience of my soon-to-be wife, Sofya Bubentsova; I want to thank her deeply for sharing with me all the successes and hardships along this lengthy but rewarding journey.


TABLE OF CONTENTS

List of Tables
List of Illustrations
List of Algorithms

1 Introduction
  1.1 Object Localization and Tracking
  1.2 Articulated Pose Estimation and Tracking
  1.3 Challenges
  1.4 Thesis Outline
  1.5 List of Related Papers

2 State of the Art
  2.1 Common Assumptions
  2.2 Humans at Different Scales
  2.3 Categorization of Approaches
  2.4 Representing the Body
    2.4.1 Kinematic Tree
    2.4.2 Scale Prismatic Model
    2.4.3 Part-based Representation
  2.5 Image Features
    2.5.1 Silhouettes
    2.5.2 Color
    2.5.3 Edges
    2.5.4 Contours
    2.5.5 Ridges
    2.5.6 Image Flow
    2.5.7 Voxels
    2.5.8 Image Descriptors
  2.6 Pose Estimation and Tracking
  2.7 Discriminative and Generative Methods
  2.8 Optimization Methods
  2.9 Number of Views
    2.9.1 Multiocular 3D Inference
    2.9.2 Monocular 3D Inference
    2.9.3 Sub-space Methods
  2.10 Quantitative Evaluation
  2.11 Generic Object Detection, Localization and Categorization
    2.11.1 Sliding Window Classifiers
    2.11.2 Part-based Models
    2.11.3 Hierarchical Composition Models

3 Graphical Models and Inference
  3.1 Graphical Model Building Blocks
    3.1.1 Exponential Family
    3.1.2 Gaussian Distribution and Properties
  3.2 Bayesian Networks
    3.2.1 Markov Chains
    3.2.2 Hidden Markov Models
    3.2.3 Generative and Discriminative Graphical Models
  3.3 Undirected Graphical Models
    3.3.1 Markov Random Fields
    3.3.2 Pair-wise Markov Random Fields
    3.3.3 Factor Graphs
  3.4 Parameter Estimation
    3.4.1 Maximum Likelihood
    3.4.2 Expectation-Maximization
    3.4.3 Parameter Estimation with Hyperpriors
  3.5 Inference
    3.5.1 Variable Elimination
    3.5.2 Belief Propagation
  3.6 Monte Carlo Methods
    3.6.1 Importance Sampling
    3.6.2 Kernel Density Estimation