
Pose estimation from one conic

correspondence
by
Snehal I. Bhayani
201211008

A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of

MASTER OF TECHNOLOGY
in
INFORMATION AND COMMUNICATION TECHNOLOGY
to

Dhirubhai Ambani Institute of Information and Communication Technology

June, 2014

Declaration
I hereby declare that
i) the thesis comprises my original work towards the degree of Master of Technology in Information and Communication Technology at Dhirubhai Ambani Institute of Information and Communication Technology and has not been submitted elsewhere for a degree,
ii) due acknowledgment has been made in the text to all the reference material used.

Snehal Bhayani

Certificate
This is to certify that the thesis work entitled "Pose estimation from one conic correspondence" has been carried out by Snehal I. Bhayani for the degree of Master of Technology in Information and Communication Technology at Dhirubhai Ambani Institute of Information and Communication Technology under my supervision.

Prof. Aditya Tatu


Thesis Supervisor

Acknowledgments
Is this what you ask,
or is this your answer?
Pray illuminate, for I have questions more and more
after you answer each one of these,
and more... ;
there is no learning save for when I stumble,
pray illuminate, for I have questions more.
Of the swathe of acknowledgments, the first one goes out to my supervisor Prof. Aditya Tatu. His insights helped me not only achieve crucial progress, but also see different and interesting perspectives on problems at various times over the duration of my thesis work. His guidance, starting from what, when and where to keep notes (in LaTeX), to what we should infer from which experiment, has been paramount in molding my work in a comprehensive manner. With his knowledge of mathematics and his sheer interest in the same, he has helped me get over my points of confusion numerous times.
I would like to acknowledge Prof. Manjunath Joshi for his guidance on camera calibration tools and approaches. I would also like to acknowledge Prabhunath sir for his instantaneous help in creating a virtual setup for camera calibration. A special thanks to my friend Haritha, whose swanky DSLR camera helped me have the best possible images of calibration patterns for days on end. I would like to thank all of my friends, colleagues and classmates, who put up with my changed self while I worked, and put up with my other self while I was not working and they were. And last but not least, a special thanks to my parents and my sister for their constant support and care all along.


Contents

Abstract
List of Principal Symbols and Acronyms
List of Tables
List of Figures
1 Introduction
    1.1 Two camera setup
        1.1.1 General assumptions
    1.2 Introduction to problem of pose estimation
    1.3 Background work
    1.4 Our contribution
    1.5 Layout of thesis
2 Epipolar geometry
    2.1 Introduction to epipolar geometry
        2.1.1 Geometric definition of homography between two images
        2.1.2 Algebraic definition of epipolar mapping
        2.1.3 Some properties of the fundamental matrix, F
        2.1.4 Question on homography generated in a one camera setup
        2.1.5 Question on homography generated in a two camera setup
    2.2 Conics
    2.3 Quadrics
    2.4 Summary
3 Geometric approach to pose estimation from one conic correspondence
    3.1 Dependence of pose on conic correspondence and vector defining the scene plane
    3.2 Conic correspondence
    3.3 Mathematical implication of the first assumption on scene plane
    3.4 Estimating R and t through geometric construction
    3.5 Experiments
        3.5.1 Experiments for geometric approach on synthetic data
        3.5.2 Experiments for geometric approach on real data
        3.5.3 Experiment of geometric approach on part real and part synthetic dataset
    3.6 Summary
4 Alternate approaches to pose estimation
    4.1 Estimating R and t through optimization
        4.1.1 Results and discussion
    4.2 Multi-stage approach to pose estimation: a comparison
        4.2.1 Optimizing the cost function
        4.2.2 Results and discussion
    4.3 Summary
5 Conclusion and future work
References
Appendix A Basics of projective geometry
    A.1 Affine Geometry
        A.1.1 Affine spaces
        A.1.2 Basis of affine spaces
        A.1.3 Affine morphism
        A.1.4 Affine subspaces
        A.1.5 Invariants of Affine morphism
    A.2 Projective Geometry
        A.2.1 Definition of a projective space
        A.2.2 Basis of a projective space
        A.2.3 Projective transformation
        A.2.4 Projective subspaces
        A.2.5 Affine completion
        A.2.6 Action of Homographies on subspaces and study of invariants
        A.2.7 Duality
        A.2.8 Homography as a perspective projection between two projective lines
        A.2.9 Homography between two planes
Appendix B Camera models and camera calibration
    B.1 Finite Camera
        B.1.1 Elements of a finite projective camera
    B.2 Infinite Camera
    B.3 Camera calibration
Appendix C Some miscellaneous proofs

Abstract
In this thesis we attempt to solve the problem of camera pose estimation from one conic correspondence by exploiting the epipolar geometry. For this we make two important assumptions which simplify the geometry further. The assumptions are: (a) the scene conic is a circle, and (b) the translation vector is contained in a known plane. These two assumptions are justified by noting that many artifacts in scenes (especially indoor scenes) contain circles which lie wholly in front of the camera. Additionally, there is a good possibility that the plane which contains the translation vector is known. Through the epipolar geometry framework, a matrix equation is defined which relates the camera pose to one conic correspondence and the normal vector defining the scene plane. Through the assumptions, we simplify the system of polynomials in such a way that the task of solving seven simultaneous polynomial equations in seven variables is transformed into the task of solving only two polynomials in two variables at a time. For this we design a geometric construction. This method gives a set of finitely many camera pose solutions. We test our propositions on synthetic datasets and suggest an observation which helps in selecting a unique solution from the finite set of pose solutions. For the synthetic dataset the solution so obtained is quite accurate, with an error of the order of $10^{-4}$, while for real datasets the solution is erroneous due to errors in the camera calibration data we have; we justify this fact through an experiment. Additionally, the formulation of the above mentioned seven equations relating the pose to the conic correspondence and the scene plane position helps us understand how the relative pose establishes point and conic correspondences between the two images. We then compare the performance of our geometric approach with the conventional way of optimizing a cost function, and show that the geometric approach gives us more accurate pose solutions.


List of Principal Symbols and Acronyms


α, β, γ : Real valued scalars.
E^n, R^n : n-dimensional vector spaces over R. The basics of projective geometry which we shall refer to are mostly written in a language that uses the notation E^n to denote a real vector space. The rest of our work shall use the usual R^n notation.
P(E^{n+1}) : n-dimensional projective space over R. As mentioned in appendix (A.2), the underlying vector space is E^{n+1}.
p : E^{n+1} \ {0_{n+1}} → P(E^{n+1}) : Canonical projection from the vector space E^{n+1} to P(E^{n+1}), as explained in section (A.2.2).
f : P(E^{n+1}) → P(E^{n+1}) : A bijective mapping of the points of P(E^{n+1}) onto itself such that the underlying map f : E^{n+1} → E^{n+1} is an isomorphism. In the literature this is termed a projective morphism or a projective transformation.
GL(n) : Set of n × n real invertible matrices.
E : The calibrated counterpart of F that relates the two images in the same way, except for the fact that the calibration matrices are known and fixed.
K : A 3 × 3 camera calibration matrix that has the intrinsic parameters of the camera, K ∈ GL(3).
H : A homography or a projective morphism over a projective plane, f : P(E^{n+1}) → P(E^{n+1}) where n = 2. This mapping is denoted by a matrix H ∈ GL(3).
cam(O, Π, K) : A camera unit with centre O, image plane Π and calibration matrix K.
F : A fundamental matrix defined in the framework of epipolar geometry, [1].
[x]_× : The skew-symmetric matrix defining the vector cross-product with x. If x = [x_1 x_2 x_3]^T, then

$[x]_\times = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}.$

I_{n×n} or I_n : An n × n identity matrix.
x_{n×1} or x_n : An n-dimensional column vector, x.
‖A‖_F : The Frobenius norm of matrix A.

List of Tables

3.1 Results of single stage geometric approach on synthetic dataset. Here R_true and t_true denote true values and R and t denote the pose solution obtained through convergence for the gradient descent scheme.
3.2 Results of single stage geometric approach on real dataset
3.3 Result of part real data for investigating the error due to erroneous calibration matrix
4.1 Result of gradient descent approach on synthetic data. Here R_init and t_init denote starting points, R_true and t_true denote true values and R and t denote the pose solution obtained through convergence for the gradient descent scheme.

List of Figures

1.1 A setup describing epipolar geometry
2.1 Epipolar geometry or the geometry of two views
2.2 Epipolar plane drawn for epipolar geometry
2.3 A one camera setup and its question
2.4 Geometric description of Poncelet's theorem, figure from [2].
2.5 Conic correspondence through a projective transformation.
2.6 A cone with its apex at origin, image from [1].
3.1 Two cones, Q1 and Q2', describing a conic correspondence.
3.2 Rigid body motion of the cone Q2' onto Q2.
3.3 A diagram describing the geometric construction.
3.4 Pose solution
3.5 First test image containing conic C1
3.6 Second test image containing conic C2
3.7 Difference between the two conics of real and synthetic datasets
A.1 Commutativity
A.2 Associativity of perspective projections
A.3 An example of a homography between two projective planes l and m due to perspectivity
C.1 Two series of circular cross-sections in a circular cone, figure from [3].

Chapter 1

Introduction
This thesis deals with one form of the pose estimation problem as defined by the computer vision community. In rudimentary terminology, this form of pose estimation can be stated as the estimation of the relative orientation, in a Euclidean coordinate system, between two camera positions from where a given scene has been imaged.
Haralick in [4] introduces four classes of pose estimation problems, as given next:
1. 2D-2D pose estimation problem: We are given two-dimensional coordinate observations from N observed images: x_1, ..., x_N. These could correspond, for example, to the observed center positions of all observed objects. We are also given the corresponding (or matching) N two-dimensional coordinate vectors from the model: y_1, ..., y_N. The rotation and translation in the 2D plane that relate these two sets of observations are to be estimated. In other words, we have to determine the rotation matrix R and the translation vector t such that the least squares error,

$e^2 = \sum_{n=1}^{N} w_n \| y_n - (R x_n + t) \|^2,$    (1.1)

is minimized. Here w_n represents the weight of the contribution to the error by the nth point correspondence.
2. 3D-3D pose estimation problem: Let N 3D-coordinate observations be given as y_1, ..., y_N, matching the corresponding 3D coordinates x_1, ..., x_N. Each observation y_n is said to be the rigid body motion of the corresponding observation x_n in R^3 space. They are related as

$y_n = R x_n + t + \eta_n, \quad \forall n,\ 1 \le n \le N,$

where η_n is an observation noise term. The pose estimation in this case is defined to be the estimation of R and t that minimizes the error

$e^2 = \sum_{n=1}^{N} w_n \| y_n - (R x_n + t) \|^2.$

A closed-form solution to this weighted least squares problem is sketched in code right after this list.
3. 2D perspective-3D pose estimation problem: Let N 3D coordinate observations be given as y_1, ..., y_N, matching the corresponding 2D coordinates x_1, ..., x_N, where x_n = [u_{n1} u_{n2}]^T. The exact relationship is given as

$u_{n1} = f\,\frac{r_1 y_n + t_1}{r_3 y_n + t_3}, \qquad u_{n2} = f\,\frac{r_2 y_n + t_2}{r_3 y_n + t_3}, \qquad t = (t_1, t_2, t_3)^T, \qquad R = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix},$    (1.2)

where f is the focal length, i.e., the distance of the image plane in front of the origin (which is the center of perspectivity), and r_1, r_2 and r_3 are the rows of the rotation matrix R. The problem of pose estimation of this kind is to estimate R and t when a set of correspondences between the 3D points and the perspective 2D points is given. This problem is termed the exterior orientation problem in the photogrammetry literature.
4. 2D perspective-2D perspective pose estimation problem: This is perhaps the most difficult of the pose estimation problems. Here we do not have the 3D world coordinates. Instead we have two images, or perspective projections, of the same object. Alternatively, one can assume the object to be moving and the perspective projection device to be fixed. A pin-hole camera model is one such theoretical device and is of interest to us. As a setup, we have a scene and its image from two distinct positions of the camera (or, as stated, we have one camera and we are taking images of a scene which has undergone a rigid body motion). Then, with point correspondences between these two perspective projections, one has to estimate the rigid body motion that the scene has undergone. This is the statement of the 2D perspective-2D perspective pose estimation problem.
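The weighted 3D-3D problem of the second class above admits a well-known closed-form solution based on the SVD of a weighted covariance matrix. Below is a minimal sketch of it; numpy and the function name pose_3d3d are our own illustrative choices (the experiments in this thesis use MATLAB), not part of Haralick's formulation.

    import numpy as np

    def pose_3d3d(x, y, w):
        # x, y: N x 3 arrays of matched 3D points; w: N positive weights.
        # Returns R, t minimizing sum_n w_n * ||y_n - (R x_n + t)||^2.
        w = w / w.sum()
        cx = (w[:, None] * x).sum(axis=0)         # weighted centroid of the x_n
        cy = (w[:, None] * y).sum(axis=0)         # weighted centroid of the y_n
        S = (w[:, None] * (x - cx)).T @ (y - cy)  # 3 x 3 weighted covariance
        U, _, Vt = np.linalg.svd(S)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
        R = Vt.T @ D @ U.T
        t = cy - R @ cx
        return R, t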

Of the four classes above, our work is about the fourth type of pose estimation problem, the 2D perspective-2D perspective pose estimation problem. This approach requires an overview of a two camera setup. Hence, before we go further, a general arrangement of the two camera setup is introduced in the next section (1.1). The mathematical spaces considered throughout this report are Euclidean spaces¹, unless specified otherwise.

1.1 Two camera setup

The purpose of introducing such an arrangement is two-fold. Firstly, it introduces the various feature artifacts which would be used for establishing correspondences between two images (like points, lines, conics, etc.) and the various mathematical relationships amongst them; secondly, the same framework sets up the idea of multiple-view geometry (termed epipolar geometry by Hartley and Zisserman in [1]).

Figure 1.1: A setup describing epipolar geometry

¹One may wonder why, although we deal with projective spaces, the spaces considered here are not projective. The reason is, as shown in section (A.2.5), that a projective space is obtained by "adding" points at infinity to an affine space (here we can consider a Euclidean space as an affine space with the origin at the point [0, 0, 0]^T). For practical purposes we assume the points we deal with are "not at infinity". Hence the projective space reduces to an affine (or Euclidean) space.

1.1.1 General assumptions

A two camera setup is depicted in figure (1.1). Here a pin-hole camera is decomposed into a projection center O (a point in R^3), an image plane Π and its calibration matrix K. This model is mainly of theoretical interest, but for our application this highly simplified model works well enough that we can ignore various practical issues in a camera model. Such a camera model shall be denoted as cam(O, Π, K). The calibration matrix houses the quantities that determine the relation between the position of a point x in the 2D image coordinate system and its position in the 3D global coordinate system of the camera. Let O_Π be the intersection of the line from O perpendicular to Π with Π. Then the matrix K gives us the distance of the plane Π from the center O and the position of the point O_Π in terms of the local coordinate system of Π. More on the structure of K can be read in appendix (B.3).
As shown in the figure, we have a pair of cameras cam(O1, Π1, K) and cam(O2, Π2, K) with their centers at points O1 and O2 in R^3. The calibration matrices are the same for both cameras. The image planes associated with cameras O1 and O2 are Π1 and Π2 respectively. Now, a quadratic curve is defined as the zero set of a second order polynomial

$Ax^2 + By^2 + Cxy + Dx + Ey + F = 0.$
This polynomial can be written in matrix form as

$\begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0.$    (1.3)

Using dual notation, we shall henceforth use the same symbol, C, for a quadratic curve and for the matrix representation of its defining polynomial. For the above defined curve, C means the matrix

$\begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix}$

and also the set of points defined by the solutions to equation (1.3). In the computer vision community such a quadratic curve is termed a conic.
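As a small illustration of the dual notation just introduced, the sketch below (numpy, our own illustrative code) builds the matrix C from the six coefficients and checks that a homogeneous point lies on the conic, i.e., that x^T C x = 0 up to a numerical tolerance.

    import numpy as np

    def conic_matrix(A, B, C, D, E, F):
        # Matrix form of Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, per equation (1.3).
        return np.array([[A,   C/2, D/2],
                         [C/2, B,   E/2],
                         [D/2, E/2, F  ]])

    Cmat = conic_matrix(1.0, 1.0, 0.0, 0.0, 0.0, -1.0)  # the unit circle
    x = np.array([1.0, 0.0, 1.0])                       # homogeneous point (1, 0)
    print(abs(x @ Cmat @ x) < 1e-12)                    # True: the point lies on the conic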
The conics in the two image planes are assumed to be C1 in Π1 and C2 in Π2. Further, let a third plane Π be oriented in R^3 space, containing a scene conic C whose images under the two cameras are C1 and C2. For a general orientation we assume that neither O1 nor O2 lies on the plane Π. This arrangement of the three planes Π, Π1 and Π2 constructs a special bijective mapping between each pair of these projective planes, known in computer vision terminology as a homography. Without getting into details, we mention that a homography is a bijective mapping between two projective planes such that projective lines are mapped to projective lines. Precise definitions and properties of homographies between two planes are described in sections (A.2.3) and (A.2.9) of appendix (A). From these definitions we note that such a mapping can be represented by a real invertible matrix H, unique up to a non-zero scalar multiple. The point mapping between two projective planes Π1 and Π2 is then defined as

$Hx = y, \quad x \in \Pi_1,\ y \in \Pi_2,$

where x and y are homogeneous representations of points of the projective planes. This matrix H shall henceforth represent such a homography between two projective planes. Then, as mentioned before, the arrangement of the three planes Π, Π1 and Π2 constructs homographies between Π and Π1, between Π and Π2, and between Π1 and Π2, as shown below:

$H_1 : \Pi \to \Pi_1, \quad H_2 : \Pi \to \Pi_2 \quad \text{and} \quad H : \Pi_1 \to \Pi_2,$

where

$H = H_2 H_1^{-1}.$
Contrary to what we assume for defining a homography above, we assume that the three planes Π, Π1 and Π2 are represented as Euclidean planes rather than projective planes. A practical application would mostly have the cameras at finite locations, and the projective point representing a finite camera center is uniquely identified by its corresponding Euclidean (or affine) counterpart. Even if there is a point at infinity in P(E^4) which is imaged to obtain a finite point in the image plane, we treat such points as special cases of parallelism. Further, the points in P(E^4) which are imaged to points at infinity on the image planes are the points lying on the principal plane², [1]. But for practical situations we don't consider those parts of the scene that lie on the same side of the image plane as the camera center. This means that points on the principal plane are not imaged, which implies that points in the scene in front of the camera will never be mapped to points at infinity on the image plane through the imaging process. To summarize, we don't need to include points at infinity in P(E^4) or in the image plane P(E^3) for our purpose.

²A principal plane of a camera is the plane parallel to the image plane and passing through the camera center.
For example, if the point is [x y z w]^T, we can safely assume w ≠ 0, and hence [x y z w]^T corresponds to [x/w y/w z/w]^T in R^3. In short, we can safely assume that the projective spaces used in a camera setup are reduced to Euclidean spaces.
The points in the planes are measured relative to their local coordinate systems, which again are assumed to be Euclidean. If one still needs a projective representation of the same point, the Euclidean coordinate system can be extended to a corresponding projective system, and a point [x y]^T in R^2 can represent the point [x y 1]^T in P(E^3). This extension is basically the process of adding a homogeneous coordinate to a point measured in a Euclidean (or affine) coordinate system³.

³This extension is along the lines of the discussion in appendix (A.2.5).
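The two conversions described above are elementary; a minimal sketch (numpy, with an arbitrary tolerance of our own choosing) is:

    import numpy as np

    def dehomogenize(p):
        # [x, y, z, w] -> [x/w, y/w, z/w]; assumes the point is not at infinity.
        assert abs(p[-1]) > 1e-12, "a point at infinity has no Euclidean counterpart"
        return p[:-1] / p[-1]

    def homogenize(p):
        # [x, y] -> [x, y, 1]; adds the homogeneous coordinate (any dimension).
        return np.append(p, 1.0)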
With the notation and representation of points in order, we can define the way the conics are related through the homographies as

$C = H_1^T C_1 H_1, \quad C = H_2^T C_2 H_2 \quad \text{and} \quad C_2 = H^{-T} C_1 H^{-1}.$    (1.4)
Thus a homography maps the points and conics in one image to those in the other image. This mapping is governed by the relative orientation between the two image planes and also that between the two camera centers. There is a basic relationship that relates this homography to the relative orientation, but before introducing it, let us introduce two matrices which are important for computer vision problems.
The homographic mapping is between points in the two images that are projections of points lying in the same plane Π. A more general relation exists between image points x in Π1 and y in Π2, irrespective of whether their corresponding scene points lie in the same plane or not. This relation has been studied, introduced and worked over many times in the literature, starting from the introduction of the essential matrix by Higgins [5]. The matrix defines the condition for a point correspondence x ↔ y:

$y^T E x = 0$    (1.5)

if and only if x and y are the images of the same scene point⁴. Another matrix, discussed and taken up subsequently by noted researchers, is termed the fundamental matrix. This matrix is the un-calibrated counterpart of the essential matrix:

$E = K^T F K.$

This means the same point correspondence is defined, but the point measurements don't need the calibration matrix to be known. A detailed explanation and treatment of both these matrices can be found in the textbook [1] by Hartley and Zisserman. These equations form the backbone of our thesis.

⁴x and y are measured in image planes, assuming that the cameras are calibrated.
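A minimal sketch of these two relations, assuming calibrated homogeneous points and an arbitrary tolerance (numpy used for illustration):

    import numpy as np

    def essential_from_fundamental(F, K):
        # E = K^T F K, the calibrated counterpart of F.
        return K.T @ F @ K

    def satisfies_correspondence(E, x, y, tol=1e-8):
        # Equation (1.5): x <-> y only if y^T E x = 0 (up to noise / round-off).
        return abs(y @ E @ x) < tol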
The relative orientation of plane Π1 with respect to plane Π2 in E^3 is assumed to be a rotation R and a translation t. These quantities are such that when a point y ∈ Π2 is rotated and translated through R and t, we get the corresponding point x ∈ Π1 as x = Ry + t. Thus, in the figure given above, if O1 is at the origin, O1 = [0 0 0]^T, then O2 = −R^T t. The points of intersection of the line O1O2 with planes Π2 and Π1 are known as the epipoles e1 and e2 of cameras 2 and 1 respectively. The essential matrix introduced above can be decomposed in terms of R and t, [5]:

$E = [t]_\times R.$

The fundamental matrix in terms of R and t is decomposed as [1]:

$F = [e_2]_\times H.$

In lemma (1) in chapter (2), we prove the following relationship between the pose parameters R, t and the variables of epipolar geometry, H and e:

$t = \lambda K^{-1} e, \qquad R = \frac{1}{\lambda}\left(K^{-1} H K + K^{-1} e v^T K\right),$    (1.6)

where R, t, e and K have their usual meanings and λ is a real scaling factor. The vector v represents the position of the scene plane.

1.2 Introduction to problem of pose estimation

With the above setup in mind, one can now define pose estimation through mathematical quantities. Considering the camera centers O1 and O2, the translation vector t is defined as

$t = O_1 - R\,O_2,$    (1.7)

where O1 and O2 are the vector representations of the points O1 and O2 in R^3. R is the rotation matrix which, upon rotation, maps the image plane Π2 to a position parallel to that of Π1. In other words, if Π1 is defined as u_1^T x + 1 = 0, x ∈ R^3, and Π2 as u_2^T x + 1 = 0, x ∈ R^3, then R can be estimated as the rotation taking the unit vector u_2/‖u_2‖ to the unit vector u_1/‖u_1‖.
Thus, pose estimation in this thesis is defined as the estimation of R and t given the two images, or the two image planes Π1 and Π2. In this thesis the assumption is that the cameras are calibrated and that the camera calibration matrix K is the same for both cameras. More assumptions will follow in later chapters, some for more accuracy and some for simplicity.
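In code, equation (1.7) and the remark O2 = −R^T t from section (1.1.1) read as follows (a numpy sketch with toy values of our own choosing):

    import numpy as np

    def translation_from_centres(O1, O2, R):
        # Equation (1.7): t = O1 - R O2.
        return O1 - R @ O2

    R = np.eye(3)                                 # toy rotation
    O1, O2 = np.zeros(3), np.array([1.0, 0.0, 0.0])
    t = translation_from_centres(O1, O2, R)
    print(np.allclose(O2, -R.T @ t))              # True when O1 is at the origin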

1.3 Background work

The history of computer vision is rich and full of brilliant insights. Its association with projective geometry is even richer. We have listed the four main classes of the pose estimation problem along the lines of a paper by Haralick, [4]. In a later paper [6], Haralick et al. work on a similar kind of problem, but exclusively in a Euclidean space, where they look at a closed form solution to pose estimation from a set of three point perspective projections. This problem is of the type defined as the 2D perspective-3D pose estimation problem in point (3). Photogrammetry deals with these problems in detail, and they also have applications in computer vision. Higgins, in [5], introduced the essential matrix of equation (1.5), primarily to tackle the problem of relative orientation, or in other words the pose estimation problem; the authors of the same paper give us an algorithm to estimate R and t from E. A comprehensive study of the fundamental matrix and its related treatment has been carried out by other researchers: Zhang in [7], and Luong and Faugeras in [8]. For the second and more crucial part of the problem, [5, 8, 7, 4, 9, 10] use point correspondences to estimate either the fundamental matrix or the essential matrix. As against this, Heyden and Kahl in [11] have used conic correspondences to estimate the fundamental matrix. The authors give a brief survey of the various features (like points, lines, curves and many more) used in the past to estimate the fundamental matrix. They also state the reasons why conic correspondences are preferred by certain researchers over the conventional point and line correspondences. The primary motivation is the fact that many man-made objects contain a curve which is either a conic or can be approximated as being formed of conics. Another reason is the property of projective transformations that any projective transformation maps a conic into another conic (also termed the projective invariance property). A projective transformation is a pointwise mapping between two projective spaces⁵. Ji et al. in [12] have used a mix of various geometric features like points, lines and conics to estimate the pose of a camera with respect to the object coordinate frame. Towards the same objective, they have considered a linear approach that combines geometric features at different levels of complexity, thus improving the stability and accuracy of the solution. The approach estimates the pose parameters from point correspondences, line correspondences and 2D ellipse-3D circle correspondences. For the circle-ellipse correspondence, they have obtained two polynomials which define two constraints on the relative pose. But the authors have assumed that the radius of the circle is known, and the property of the circle as a conic section is not used completely, as the focus is more on using as many feature correspondences as available. The same problem of 2D perspective-3D pose estimation is worked on by Wang in [13]. The approach proposed in that paper amounts to estimating the pose of the camera from a single view, under the assumption that the intrinsic camera parameters are known. The approach uses the image of an absolute conic⁶ to estimate the pose of the camera. For the minimal case, where the image of only one circle is known, the added assumption is needed that the image of the center of the 3D circle is known. But it hasn't been explicitly justified when such an assumption would hold true, though some methods of estimating it have been suggested.

⁵The definition of a projective transformation in strict mathematical terms is given in appendix (A.2.3).
⁶The absolute conic is an imaginary conic at infinity that consists of purely imaginary points. Its image is shown to depend only on the calibration matrix.

1.4 Our contribution

Our contribution primarily lies in an attempt to solve the problem from a slightly different perspective. It has motivated two different approaches to pose estimation. The first approach is based on the equation

$(R - t u^T)^T K^T C_2 K (R - t u^T) - \lambda K^T C_1 K = 0_{3 \times 3},$

where R, t, C1, C2 have their usual meanings, u ∈ R^3 is the vector⁷ defining the scene plane that contains the conic C, and λ is a scaling factor introduced to account for the homogeneous quantities C1 and C2 in the equation. The above equation is derived by combining epipolar geometry with one conic correspondence. Intuitively, this equation describes the relationship between the pose R, t and the pair of conics in correspondence, through the normal vector u of the scene plane. This constraint can be further simplified if we assume that the scene conic C, whose images C1 and C2 are known, is a circle⁸, and that the translation vector lies in a specified plane (defined by a normal vector w). These assumptions reduce the number of unknown variables in the previous equation to get

$(R - t u^T)^T K^T C_2 K (R - t u^T) - \lambda K^T C_1 K = 0_{3 \times 3}, \qquad w^T t = 0,$    (1.8)

where R, t and λ are to be estimated and C1, C2, u, K and w are known. A straightforward way to solve the above equations is to write a gradient descent algorithm via explicit calculation of gradient vectors, or to use MATLAB's inbuilt functions for optimization on a cost function modeled from equation (1.8). Unfortunately any optimization method, in general, can get stuck in a local minimum, and through experiments on synthetic datasets we have found that the algorithm does get stuck at points which are nowhere close to the true value. Such an experiment and its results are given in section (4.1) of chapter (4). A second problem is that there is no sure-shot way of figuring out how many global minima our system of polynomials has. These facts make the behavior of the algorithm strongly dependent on the starting points of the parameters. An estimate closer to the true value helps the algorithm behave nicely and converge accurately to the true solution, but with a starting point quite far off, the solution achieved upon convergence is not at all close to the true value. To get around this problem, we design a geometric construction⁹ such that one can estimate all possible pose solutions to a given problem. For this we transform the problem of estimating pose solutions through optimization of the cost function of equation (1.8) into a problem that involves finding solutions to two pairs of polynomials, with each pair depending only on two variables. The first pair consists of a degree-three and a degree-four polynomial, whereas the second pair consists of quadratic polynomials. These polynomials can be accurately solved using the symbolic computation toolbox available with MATLAB. The advantage here is that at a time we have only two polynomials in two variables to solve, which is a considerable improvement over the conventional optimization task that involves solving seven polynomials in seven variables simultaneously. This is the reason for the high accuracy our approach achieves. Further, by solving these polynomials we get the pose as a finite set of all possible solutions in the form of R and t. The process follows a geometric construction and does not need optimization, which in turn helps improve the accuracy of the results. The construction further improves our understanding of the above equation: equation (1.8) relates the image and camera coordinate systems through a conic correspondence. As a set of observations, we propose some points on how to pick one solution out of the finite set of all possible solutions obtained from this approach. We perform experiments on both real and synthetic data for this geometric approach to pose estimation. For synthetic datasets, we find that the pose solutions thus estimated are accurate to an error of the order of $10^{-4}$. Especially for datasets with a rotation matrix close to the identity matrix, the observations help us select a solution which is closest to the true values. But the observations don't hold true for datasets with rotation matrices considerably far from the identity matrix. For such cases we propose using one additional point correspondence, which is beyond the scope of this thesis. For real datasets the estimated pose solution is not accurate enough, but through a related experiment we demonstrate that the error in the pose solution is primarily due to the error in the camera calibration process.

⁷The vector u defines the plane through the plane equation x^T u + 1 = 0, x ∈ R^3.
⁸By its requirement to be a circle we mean a circle in the global coordinate system in R^3.
⁹For the time being, we consider only the Euclidean coordinate system.
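For concreteness, the straightforward optimization route discussed above can be sketched as follows. This is our own illustrative scipy/numpy version of a cost function modeled on equation (1.8) (the thesis itself used MATLAB's optimization routines); the rotation is parameterized by a rotation vector, and the planarity constraint w^T t = 0 is folded in as a penalty.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.transform import Rotation

    def cost(params, K, C1, C2, u, w):
        r, t, lam = params[:3], params[3:6], params[6]
        R = Rotation.from_rotvec(r).as_matrix()       # 3 parameters -> rotation matrix
        M = R - np.outer(t, u)
        res = M.T @ K.T @ C2 @ K @ M - lam * K.T @ C1 @ K
        return np.sum(res**2) + (w @ t)**2            # Frobenius residual + penalty

    # result = minimize(cost, x0, args=(K, C1, C2, u, w), method='BFGS')
    # As discussed above, the minimizer often converges to a spurious local
    # minimum unless the starting point x0 is already close to the true pose.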

1.5 Layout of thesis

In chapter (2) we introduce the basics of epipolar geometry. It deals with the setup of a two camera system, but from a projective geometry point of view. The prerequisites of epipolar geometry are projective, affine and Euclidean spaces, whose properties and definitions are well covered in appendix (A). The camera models are covered in appendix (B) and camera calibration in appendix (B.3). The discussion in these two appendices follows the textbooks by Hartley and Zisserman [1] and by Trucco and Verri [14]. In chapter (3) we introduce and describe in detail the geometric approach to pose estimation from one conic correspondence with the two assumptions. Along with the discussion of the algebra and geometry behind the approach, we list the experiments performed on synthetic and real data, and infer certain points of merit and demerit for the proposed approach. In chapter (4) we take up two alternate methods of pose estimation, which are solved through optimization algorithms. Their shortcomings and sample results are provided for one method, along with an interpretation for the other method. In chapter (5) we conclude the thesis, discussing the practical and theoretical difficulties encountered and a possible future line of work.


Chapter 2

Epipolar geometry

2.1 Introduction to epipolar geometry

Epipolar geometry is the geometry of two views and the underlying framework on which this thesis is built. Before one looks into the details, it is worthwhile to see why one should study it. We start with a purely Euclidean setup, already introduced in section (1.1), of a two camera system. The same setup is redrawn here, with the necessary details kept and the rest removed.

Figure 2.1: Epipolar geometry or the geometry of two views


The two cameras are cam(o, Π2, K) and cam(q, Π1, K), and the scene plane is Π. Through this scene plane we have the point mappings a ↔ a'', b ↔ b'', c ↔ c'' and d ↔ d''. The points e2 and e1 are known as the epipoles for images 1 and 2 respectively. These two points define correspondence in a geometric setup. This gives us the name epipolar geometry, meaning the geometry of two (epi) poles. In general this point correspondence between points x1 in Π1 and x2 in Π2 can be defined as

$x_1 \leftrightarrow x_2 \iff \overline{qx_1} \cap \overline{ox_2} \neq \emptyset \ \ \&\ \ p_{int} \in \Pi, \ \text{where } p_{int} = \overline{qx_1} \cap \overline{ox_2}.$    (2.1)

This is the geometric way of defining a point correspondence. One point worth noting is that the camera setup of figure (2.1) is in R^3. If the lines qx1 and ox2 are parallel, they don't intersect in a point in E^3, but they do in a point x_∞, well defined in the projective space P(E^4), which by equation (A.13) is decomposed as

$P(E^4) = E^3 \sqcup P(E^3),$

where ⊔ denotes the union of two disjoint sets. Thus the point x_∞ lies in P(E^3).
With this decomposition in mind, we can ensure that the point correspondence between two images is well defined. This way of defining a point correspondence motivates a special homography between the two images. We call it special because such a homography is constructed through the scene plane. As shown later, this mapping is part of a more general mapping between the two images through the scene. In the next section we intuitively describe this homography mapping through a scene plane, and after that we algebraically define the more general mapping through scene points.

2.1.1 Geometric definition of homography between two images

Based on the way a point correspondence between two images through a scene plane Π is described, one can infer that such a mapping is bijective. Distinct positions of Π give different mappings, unless the planes are parallel to each other. One point to note is that, given a pair of images and a scene, not every point in the first image forms a correspondence pair with a point in the second image through a homography realized through a scene plane. Only those points which are projections of points on the scene plane, in both of the image planes, form correspondence pairs through the homography mapping generated through Π. This is termed point transfer through a scene plane by Hartley and Zisserman in chapter (9) of [1]. But scene points in general (irrespective of whether they lie on the scene plane or not) also set up point correspondences between the two images. We look at this mapping in an algebraic formulation next.

2.1.2 Algebraic definition of epipolar mapping

We can model the correspondence of equation (2.1) using a fundamental matrix as well:

$x_1 \leftrightarrow x_2 \iff x_2^T F x_1 = 0, \ \text{where } x_1 \in \Pi_1 \text{ and } x_2 \in \Pi_2.$    (2.2)

Hartley and Zisserman in [1] term this representation the algebraic expression of the epipolar geometry. Given a pair of cameras, their image planes have point correspondences related through this algebraic equation. But the point mapping is not unique, which is evident from the two figures (2.1) and (2.2).

Figure 2.2: Epipolar plane drawn for epipolar geometry


From figure (2.2), we see that the points c'' and c''' in plane Π1 map to the same point c in plane Π2. In short, all points that lie on the line c''e1 are mapped to the same point in plane Π2. Thus we say that the line c''e1 corresponds to the point c. For geometric intuition one has the following definitions from [1]:
1. Epipolar plane of a point c ∈ Π2: A plane containing the line qo and the point c is known as the epipolar plane of c.
2. Epipolar line of a point c ∈ Π2: The line l in Π1 obtained by the intersection of the epipolar plane of c, as defined above, with the image plane Π1 is known as the epipolar line of c. This line is the set of all points of Π1 which can be mapped to c through the two-camera setup described above.
To conclude, each point x ∈ Π2 has a unique line l ∈ Π1 associated with it. The same epipolar plane is also seen to be the epipolar plane of all points x'' ∈ Π1 such that x'' ∈ l. With simple geometry, one can say that to every point x ∈ Π2 there is a unique associated line l in Π1. The fundamental matrix F encodes this correspondence:

$l = Fx,$    (2.3)

where l is a vector representation of the line l in P(E^3). Referring to section (A.2) of appendix (A), we say that every line l in P(E^3) corresponds to a plane through the origin in E^3, and the normal vector of this plane is denoted by l here. Hence this representation is unique up to a non-zero scalar multiple, which conforms well with the relationship given above. This is a point-line correspondence between the two images that depends solely on the relative orientation of the two cameras. It is just another perspective on the point-point correspondence of equation (2.2). The geometric description of homography we saw in the previous section is a constrained version of the present mapping, as is evident from figure (2.2). In other words, the point correspondence pairs obtained through the geometric description are a subset of the correspondence pairs obtained through the algebraic definition discussed in the present section. In summary, this section builds the framework of epipolar geometry through which two images have point mappings realized through scene points.
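With the convention of equation (2.2), the epipolar line associated with a point x1 in the first image is l = F x1, and a point x2 corresponds to x1 only if it lies on that line. A numpy sketch with an arbitrary tolerance:

    import numpy as np

    def epipolar_line(F, x1):
        # Line l = F x1: the locus of points x2 with x2^T F x1 = 0.
        return F @ x1

    def lies_on_line(x2, l, tol=1e-8):
        # x2 is on the line l exactly when x2 . l = 0.
        return abs(x2 @ l) < tol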

2.1.3 Some properties of the fundamental matrix, F

The fundamental matrix is of rank 2, unique up to a non-zero real scalar. Certain decompositions and properties of the fundamental matrix are listed below for quick reference. Detailed discussions of its properties and different interpretations can be found in [1, 8, 7]:
1. If P1 and P2 are the projection matrices¹ of the two cameras, then F = [e_2]_× P_2 P_1^+, where P_1^+ denotes the pseudo-inverse of P_1.
2. If the relative orientation and position between the two cameras are defined by a rotation R and a translation t,

$F = K^{-T} [t]_\times R K^{-1}.$    (2.4)

3. If the scene contains a plane and the point mapping through the plane is defined by the homography H,

$F = [e]_\times H,$

where e is the epipole of the image plane Π2 of the second camera and H is defined such that

$x = H x'', \quad x \in \Pi_2,\ x'' \in \Pi_1.$    (2.5)

¹A projection matrix of a camera is discussed in appendix (B.1).

The second property is helpful for an intuitive grasp of the setup. The fundamental matrix maps points from one image to the other, albeit up to a certain ambiguity. The points are specified in local coordinate systems². The decomposition, though, is specified in terms of R and t, which can be seen as external, or specified in an absolute coordinate system, as compared to the image and scene planes involved. This enables us to infer, from an algebraic point of view, how a change in R and/or t affects the point mapping. For more clarity, we can put equations (2.2) and (2.4) together:

$x^T K^{-T} [t]_\times R K^{-1} x'' = 0.$    (2.6)

²To every plane (image or object) we fix an internal Cartesian coordinate system. When we talk of the calibration matrix being fixed, we mean the coordinate system as well.
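Equation (2.6) can be verified numerically by composing F from a pose and projecting a common scene point into both cameras. The following is an illustrative numpy sketch with toy values of our own choosing:

    import numpy as np

    def skew(x):
        # The cross-product matrix [x]_x .
        return np.array([[0, -x[2], x[1]],
                         [x[2], 0, -x[0]],
                         [-x[1], x[0], 0]])

    def fundamental_from_pose(K, R, t):
        # Property 2 above: F = K^{-T} [t]_x R K^{-1}.
        Kinv = np.linalg.inv(K)
        return Kinv.T @ skew(t) @ R @ Kinv

    K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1.0]])
    th = 0.1
    R = np.array([[np.cos(th), -np.sin(th), 0],
                  [np.sin(th),  np.cos(th), 0],
                  [0, 0, 1.0]])                             # rotation about the z axis
    t = np.array([1.0, 0.2, 0.0])
    X = np.array([0.5, -0.3, 4.0, 1.0])                     # homogeneous scene point
    x1st = K @ np.hstack([np.eye(3), np.zeros((3, 1))]) @ X # image in the first camera
    x2nd = K @ np.hstack([R, t[:, None]]) @ X               # image in the second camera
    F = fundamental_from_pose(K, R, t)
    print(abs(x2nd @ F @ x1st) < 1e-6)                      # True, up to round-off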

Intuitively, equation (2.6) describes a relationship between point mappings and the relative orientation of the two cameras. Such an interpretation will be useful for the approach we have devised for pose estimation, as the aim is to estimate R and t from various feature correspondences. For a deeper insight, there are two questions related to equations (2.5) and (2.6) which need to be answered. These answers help in a better understanding of the single stage geometric approach to pose estimation taken up in chapter (3). Next we take up the two questions one by one.

2.1.4 Question on homography generated in a one camera setup

Before taking up the problem with two cameras, we consider a situation with just one camera and the scene plane Π1.

Figure 2.3: A one camera setup and its question

For a given relative orientation of the camera³ with respect to the scene plane Π1, we can have a homography H representing the mapping Π1 → Π⁴. Thus, given a relative orientation of the camera and the planes, we can construct a unique homography. This statement is well proved and discussed in depth in the textbook [1] by Hartley and Zisserman, and we accept it here without proof. The actual question is the inverse of the above statement: "For a given homography, can we orient the camera and scene plane in order to induce the given matrix?" If we have fixed coordinate systems in both planes, the given homography actually translates to a Euclidean problem. The homography thus gives us four point correspondences⁵ between the two planes Π1 and Π:

$a_i \leftrightarrow b_i, \quad a_i \in \Pi,\ b_i \in \Pi_1,\ 1 \le i \le 4.$    (2.7)

Thus the problem is about finding an orientation between the camera and the scene plane such that the point correspondences mentioned above are obtained. One can show that not every given homography (or set of four point correspondences) can be represented by an arrangement of the camera and the scene plane. It amounts to finding the right representation and at the same time reducing the number of unknowns and equations in play. Once the basic arrangement is laid out, the reason for such a constraint is explained.

³Following the discussion on cameras in section (1.1), by camera we mean a model comprising a centre O1, its image plane and the calibration matrix K, fixed as well as known.
⁴One more point to note is that we can fix any coordinate system in Π1 and Π. Thereafter, a change of coordinate system in any of the planes amounts to multiplying the obtained homography matrix by an invertible matrix of coordinate transformation. In fact the calibration matrix exists for the same reason: to transform coordinates from one coordinate system to another.
⁵The point correspondences are also assumed to have been measured in the pre-decided Euclidean coordinate system.


Such a Euclidean arrangement is illustrated in figure (2.3). Here we have the camera cam(O, Π, K). A calibrated camera means that the relation between the local coordinate system of Π and the global coordinate system is fixed. In figure (2.3), the origin of the coordinate system in Π is O_plane and the origin of the global coordinate system is O; the line O O_plane is perpendicular to Π, and the x, y axes of the global coordinate system are parallel to the x_plane, y_plane axes of the plane Π. This information fixes the orientation of the plane Π with respect to the origin O, and also the relationship of a point P = [u_plane v_plane]^T with the global coordinate system. P as defined in the global coordinate system will be P ≡ [u_plane v_plane f]^T, where f is the distance of O from the plane Π. In terms of polynomials, we can specify the same setup as the fixing of three quantities, viz. f and the distances of two arbitrary points⁶ P1, P2 from O. These constraints fix the orientation and the position of the plane Π with respect to the origin O. The calibration matrix encodes this information in the form of an upper triangular matrix K, but the equations help us understand the conditions that control image formation in a simple pin-hole camera.
With the basic setup in place, the point correspondences can now be defined as mentioned before. Given four such point correspondences as labelled in equation (2.7), we have to orient the plane Π1 relative to the camera cam(O, Π, K).

Orienting Π1 in R^3 to construct the desired homography

The way the point correspondence between Π and Π1 is defined, points a_i, i = 1, ..., 4 in plane Π are mapped to b_i = λ_i a_i in Π1, where λ_i is a scaling factor for point a_i. The points λ_i a_i then have to lie in the same plane, Π1. Further, the points b_i are measured in a local coordinate system, and hence their positions are represented by five distance constraints. In other words, five inter-point distances,

dist(b1, b2), dist(b1, b3), dist(b2, b3), dist(b3, b4) and dist(b2, b4),

are known, where dist(x, y) represents the Euclidean distance between two points x and y in R^2. Hence we have six polynomial constraints in four variables λ_i, i = 1, ..., 4. This proves the fact we stated before, that not all homography mappings can be realized by a relative orientation of the scene plane with respect to the given camera. We have an interesting result by Poncelet, [2], which further reinforces this fact; it is stated next.

⁶The two arbitrary points ought to be specified in the local coordinate system. So we can select P1 = [1 0]^T and P2 = [0 1]^T.

Figure 2.4: Geometric description of Poncelet's theorem, figure from [2].

Poncelet's theorem: A version of the famous Poncelet's theorem states: "When a planar scene is the central projection of another plane (the image plane), the planar scene and the image plane stay in perspective correspondence even if the scene plane is rotated about the line of intersection of the image and the scene planes. The center of perspectivity moves in a circle in the plane perpendicular to this line of intersection."
For our requirements we can translate the same theorem as: "Given an orientation of the scene plane and the camera (consisting of the center, image plane and the calibration matrix, with a fixed coordinate system) inducing the given homography, any further change in the relative orientation of the scene plane with respect to the camera will change the homography."
This fact is an important point towards building up the original problem, for it shows that in order to maintain the same homography in spite of a change in the orientation of the scene plane with respect to the image plane, the camera centre also needs to move with respect to the image plane (specifically, in a circle). This means that if we keep the distance of the camera centre from the image plane fixed, no two different orientations of the scene plane can give the same homography.

2.1.5 Question on homography generated in a two camera setup

Adding one more camera to the above arrangement, we have two cameras and the scene plane Π. We assume Π1 and Π2 are the image planes of the two cameras. Any orientation so formed gives us a homography H between the two images Π1 and Π2. The question we ask is the inverse (as we did for question 1 above):
"Given a homography H, can we orient the two cameras and the scene plane such that H is induced between the two image planes by point transfer through the scene plane?"
This means that, given a homography mapping H : Π1 → Π2, we can arrange the three entities so as to obtain H1 : Π → Π1, H2 : Π → Π2 and

$H = H_2 H_1^{-1}.$    (2.8)

The orientation so obtained is the pose between the two cameras, consisting of a rotation R and a translation t. An exact dependence of R and t on H, as well as on the epipole e (of image plane Π2, if H is defined as in equation (2.5)), can be derived.
Algebraically the relation is specified as:

$R = \frac{1}{\lambda}\left(K^{-1} H K + K^{-1} e v^T K\right),$    (2.9)

$t = \lambda K^{-1} e,$    (2.10)

where λ is a non-zero scalar and v is a parameter vector uniquely specifying the orientation of the scene plane Π in space. Next we derive these two equations.
The arguments follow a lemma stated in [1], which we state here.

Lemma 1. We know that a fundamental matrix is of rank two. It can be decomposed as F = [e]_× H, as we have seen earlier. Such a decomposition, given F, is not unique. The lemma says that if F has two decompositions,

$F = [e]_\times H = [e_1]_\times H_1,$

then e_1 = λe and H_1 = λ^{-1}(H + e v^T) for a non-zero scalar λ and a vector v in R^3.

Now, if we assume that the relative orientation between the two cameras with the same calibration matrix is represented by R and t, the projection matrices of the two cameras are P1 = K[I_3 | 0] and P2 = K[R | t]. A property stated in [10] says that with the projection matrices given in this form, the fundamental matrix decomposes as F = [Kt]_× K R K^{-1}. Let us assume that for the same camera setup, the point e is the epipole of the second image and the homography between the image planes of the two cameras through a scene plane is H. Hence the fundamental matrix can alternately be decomposed as F = [e]_× H. Thus we can apply lemma (1) with F having two decompositions, F = [Kt]_× K R K^{-1} = [e]_× H, to get:

$Kt = \lambda e, \qquad K R K^{-1} = (H + e v^T)/\lambda.$    (2.11)

Rearranging the terms, we get equations (2.9) and (2.10).
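Lemma (1) and this rearrangement are easy to check numerically. With λ = 1 the two decompositions coincide because [e]_× e = 0, and the translation is recovered as t = K^{-1} e. An illustrative numpy sketch (toy values our own):

    import numpy as np

    def skew(x):
        return np.array([[0, -x[2], x[1]],
                         [x[2], 0, -x[0]],
                         [-x[1], x[0], 0]])

    K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1.0]])
    th = 0.2
    R = np.array([[np.cos(th), 0, np.sin(th)],
                  [0, 1.0, 0],
                  [-np.sin(th), 0, np.cos(th)]])    # rotation about the y axis
    t = np.array([0.5, -0.1, 0.3])
    v = np.array([0.1, -0.2, 0.3])                  # scene plane parameter vector

    e = K @ t                                       # epipole, taking lambda = 1
    H = K @ R @ np.linalg.inv(K) - np.outer(e, v)   # from K R K^{-1} = (H + e v^T)/lambda
    F1 = skew(e) @ H
    F2 = skew(K @ t) @ K @ R @ np.linalg.inv(K)
    print(np.allclose(F1, F2))                      # True: the same F, two decompositions
    print(np.allclose(np.linalg.inv(K) @ e, t))     # t = lambda K^{-1} e with lambda = 1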


Using these equations as a base, we hypothesize that a given H can help us to
identify the specific R and t in form of some conclusions:
1. If we keep the relative orientation of the two cameras the same and change
the plane position, the homography, H changes alongwith upto non-zero
scalar multiple.
2. If we keep the plane position fixed with respect to the coordinate system of
the first camera, it is not possible to obtain a relative orientation of the two
cameras so that a given homography is formed. This is obvious from the
above two equations in which R, t, e and are unknowns, nine in all. But
we have twelve algebraically independent equations. Hence not for every
H is a solution R and t, guaranteed.
The same inference can be seen through the breakup of homography between the two planes 1 and 2 in form of homographies H1 and H2 . From
the discussion in question-1, we see that fixing a relative orientation between
a scene plane and the image plane, corresponding homography gets fixed.
Hence here as scene plane is fixed with respect to first camera, H1 is fixed.
Assuming that H is given to us as well, from equation (2.8) we have
H2 = HH1
and hence H2 is also fixed. Applying the discussion in question-1 again, we
see that not every homography H can be obtained through relative orientation between a scene plane and a camera. Thus for some values of H2 , we
have no possible relative orientation between two cameras, R and t possible.
3. If we have a solution R and t for a given homography, H and a given plane
position, then changing R and t would invariably change H. Two distinct
relative orientations would generate different homography through the same
scene plane.
These two equations motivate a method of estimating the pose R and t from a conic correspondence, which puts constraints on H, e and v and forms the basis of one approach to pose estimation, taken up at the end in chapter (3). But it is purely an optimization task, though there is some possibility of future work on it. This thesis focuses on a different approach that involves one defining equation instead of the two here. We can combine these two equations by eliminating e. The equation so formed is the basis of our geometric approach. This equation has been solved through an optimization tool as well, but with results that are not good enough, so we create a geometric design and estimate R and t. A discussion of this design is given in section (3.4) of chapter (3).

2.2 Conics

The epipolar geometry laid out in the previous section defines the point correspondences between two images. Such point correspondences lead to correspondences between more complex features of the images. Since the main focus of this thesis is the use of conic correspondences for pose estimation, it is worthwhile to investigate the formulation of conics, their basic properties and the mathematical definition of a conic correspondence. A conic is a second degree curve in a plane, described as the solution set of a quadratic equation:

$a x^2 + b x y + c y^2 + d x + e y + f = 0.$    (2.12)

This is the Euclidean plane equation. Its corresponding representation in P(E^3) is obtained by homogenizing equation (2.12) using a third variable, as:

$a x^2 + b x y + c y^2 + d x z + e y z + f z^2 = 0.$    (2.13)

The same equation can be encoded using a symmetric matrix:

$\begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} a & b/2 & d/2 \\ b/2 & c & e/2 \\ d/2 & e/2 & f \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0.$    (2.14)

The matrix

$C = \begin{bmatrix} a & b/2 & d/2 \\ b/2 & c & e/2 \\ d/2 & e/2 & f \end{bmatrix}$

defines the conic up to a non-zero scalar multiple.
multiple. Using dual notation, we would use C to mean both, the set of points of
the conic and its defining equation. We use this notation to classify conics by inspecting the matrix C. Ideally in a euclidean plane, we can have either degenerate
23

or non-degenerate conics:
1. A degenerate conic is one for which rank(C) is less than three. In this case the conic reduces to either a lone real point, a pair of lines (such as $x = \pm y$), or a single line counted twice (such as $x = 0$) in $P(E^3)$.


2. A non-degenerate conic is one with a full rank matrix. All versions of a non-degenerate conic, viz. parabola, hyperbola, ellipse and circle, are projectively equivalent. This means that one non-degenerate conic can be transformed into another through a projective morphism. Each of the three forms, parabola, ellipse and hyperbola, is characterized by where the line at infinity meets it: a hyperbola meets the line at infinity in two distinct points, a parabola is tangent to the line at infinity, and an ellipse does not intersect the line at infinity at all. So every projective morphism changing one non-degenerate form into another is equivalent to moving the line at infinity. This is not the case in the affine or euclidean classification.
In this thesis we focus on non-degenerate conics. As is evident from equation (2.13), there are six coefficients to estimate, unique up to a non-zero scalar multiple. Hence five distinct points are sufficient to determine a unique conic, as the sketch below illustrates.
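To make the five-point determination concrete, here is a minimal Python/NumPy sketch (our illustration, not a tool used in this thesis; the function name is ours):

```python
import numpy as np

def conic_from_five_points(pts):
    """Fit a conic ax^2 + bxy + cy^2 + dx + ey + f = 0 through five points.

    pts: (5, 2) array of (x, y) coordinates. Returns the symmetric 3x3
    matrix C of equation (2.14), defined up to a non-zero scalar multiple.
    """
    x, y = pts[:, 0], pts[:, 1]
    # Each row imposes the conic equation at one point.
    A = np.column_stack([x**2, x*y, y**2, x, y, np.ones(5)])
    # The coefficient vector spans the (generically one-dimensional) null space of A.
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d, e, f = Vt[-1]
    return np.array([[a, b/2, d/2],
                     [b/2, c, e/2],
                     [d/2, e/2, f]])

# Example: five points on the unit circle x^2 + y^2 - 1 = 0.
t = np.linspace(0, 2*np.pi, 6)[:5]
C = conic_from_five_points(np.column_stack([np.cos(t), np.sin(t)]))
```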

Figure 2.5: Conic correspondence through a projective transformation.


Projective morphism of conics:
A projective morphism transforms one conic into another. Referring to figure (2.5), we encode this relation as
$$H^TC_2H = \lambda C_1, \tag{2.15}$$
where $H = A_2A_1^{-1}$, with $A_1$ and $A_2$ the homographies of planes $\pi_1$ and $\pi_2$ with respect to the scene plane, and $\lambda$ a real scaling factor. The matrix H denotes the homography between $\pi_1$ and $\pi_2$.

Figure 2.6: A cone with its apex at the origin; image from [1].

2.3 Quadrics

A quadric is a surface defined by a quadratic polynomial in four homogeneous variables in $P(E^4)$. It is the set of points which satisfy the relation
$$\begin{bmatrix} x & y & z & w \end{bmatrix} S \begin{bmatrix} x & y & z & w \end{bmatrix}^T = 0, \tag{2.16}$$

where S is a real symmetric 4 × 4 matrix defining the solution set. Just like conics, the various classes of quadrics can be studied through the defining matrix S. Henceforth we shall denote a quadric, as a set of points, by its defining matrix S. A quadric has certain fundamental properties, which can be read from chapter 3 of the book [1]. For the sake of reference, below are the properties which we need in the coming chapters:
1. Every real symmetric matrix S corresponds to a unique quadric, up to a non-zero scalar multiple. Hence nine points in general position uniquely define a quadric.
2. If S is singular, the quadric is said to be degenerate, just as we mentioned for conics in the previous section. A cone with apex at the origin, defined as the set of points $[x\ y\ z\ w]^T$ such that
$$x^2 + y^2 = z^2,$$
is an example of a degenerate quadric which we use in this thesis. Such a cone is shown in figure (2.6).
3. The intersection of a plane with a quadric Q is a conic C. Computing the conic can be tricky because it requires a coordinate system for the plane. In lemma (2) we select one such coordinate system to compute the conics C1, C2 and C as plane sections of cones. A small computational sketch is given after this list.
4. As for conics, a projective morphism transforms a quadric into another quadric. Let us consider a projective morphism $f : P(E^4) \rightarrow P(E^4)$ as defined in appendix (A.2.3), represented by a 4 × 4 real invertible matrix H. Then the transformed set of points, represented by $Q' = H^{-T}QH^{-1}$, is also a quadric.
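As an illustration of property 3 (our sketch, not code from [1]): if a plane is parameterized by a point $p_0$ and two direction vectors $e_1$, $e_2$, substituting $p = p_0 + u\,e_1 + v\,e_2$ into the quadric equation yields a 3 × 3 conic matrix in the plane coordinates (u, v):

```python
import numpy as np

def plane_section_of_quadric(S, p0, e1, e2):
    """Conic of intersection of the quadric x^T S x = 0 (x a homogeneous 4-vector)
    with the plane {p0 + u*e1 + v*e2}. Returns the 3x3 conic matrix in (u, v)."""
    # Columns map homogeneous plane coordinates (u, v, 1) to homogeneous 3-space points.
    B = np.column_stack([np.append(e1, 0.0),
                         np.append(e2, 0.0),
                         np.append(p0, 1.0)])   # 4x3
    return B.T @ S @ B

# Example: the cone x^2 + y^2 - z^2 = 0 cut by the plane z = 1 gives the unit circle.
S = np.diag([1.0, 1.0, -1.0, 0.0])
C = plane_section_of_quadric(S, p0=np.array([0.0, 0.0, 1.0]),
                             e1=np.array([1.0, 0.0, 0.0]),
                             e2=np.array([0.0, 1.0, 0.0]))
# C is proportional to diag(1, 1, -1), i.e. u^2 + v^2 = 1.
```

This is essentially the computation carried out, with a carefully chosen orthonormal basis, in lemma (2) later in the thesis.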

2.4 Summary

In this chapter we introduce the epipolar geometry framework. We start with a two camera system and derive two important equations that encode the dependence of the relative orientation of one camera with respect to the other, R and t, on a homography mapping. These two equations lead to three inferences about how the homography changes with a change in R and t between the two cameras. We present arguments in two different ways to arrive at the same conclusion regarding this dependence, one using the geometry of Poncelet's theorem, (2.1.4), and the other through the equations (2.9) and (2.10). With the focus of this thesis primarily being on conics, we next introduce conics, their representations as symmetric matrices, and how conics are transformed through projective morphisms of $P(E^2)$ spaces. A discussion on cross-ratios of points on a conic is included for a different perspective on the group of homographies that preserve a conic; a similar argument holds for the subgroup of homographies that transform a given conic $C_1$ into a second conic $C_2$. Lastly, we give a brief introduction to quadrics, as homogeneous surfaces in $\mathbb{R}^4$ or as projective surfaces in $P(E^4)$, defining along the way a cone.
The next chapter, (3), introduces our proposed method for pose estimation. The approach suggested estimates R and t through a geometric construction on a two camera setup. We build the setup step by step and propose an algorithm for the same. With results of experiments and analysis on both real and synthetic data, we compare the performance with a conventional optimization approach to solving pose for such a setup.

Chapter 3
Geometric approach to pose estimation from one conic correspondence
In this chapter we start with two equations that relate the pose parameters R and t to the homography (H), the epipole (e) and a vector (let us denote it by u) defining the scene plane, all of which are important constructs of the epipolar geometry. Using a constraint developed from one conic correspondence (as discussed in section (2.2)), along with two assumptions which we introduce and justify in section (3.4), we obtain a matrix equation that directly encodes the dependence of R and t on the two image conics and u. The simplified equation is solved indirectly through a geometric construction, developed in section (3.4). Sample experiments for this proposed geometric approach are performed on both synthetic and real data, with the results listed in sections (3.5.1) and (3.5.2). The next two sections simplify the set of equations discussed above to give the one defining equation mentioned above.

3.1 Dependence of pose on conic correspondence and vector defining the scene plane

Let us restate the two equations, (2.9) and (2.10), that define R and t in terms of the epipolar geometry:
$$R = \frac{1}{\lambda}\left(K^{-1}HK + K^{-1}ev^TK\right), \qquad t = K^{-1}e, \tag{3.1}$$
where
1. R and t are the rotation matrix and the translation vector respectively, describing the orientation of camera $cam(O_1, \pi_1, K)$ with respect to $cam(O_2, \pi_2, K)$. The precise definition of pose, how it depends on R and t, and how it maps camera 1 to camera 2, is given in section (1.2).
2. H is the homography matrix that represents the mapping of points from image plane $\pi_1$ to image plane $\pi_2$. This mapping is through the third, scene plane, via point transfer, as described in section (2.1.1).
3. Point e is the epipole in the second image (image plane $\pi_2$). Being measured in a local coordinate system, e is unique up to a non-zero scalar multiple.
4. K is the calibration matrix of the two cameras; it too is unique up to a non-zero scalar multiple. One can see from the above equations that the non-unique representations of e and H are compensated by the scalar $\lambda$ and the non-unique representation of the K matrix.
5. Vector v defines the position of the scene plane $\pi$, uniquely up to a non-zero scalar multiple.
6. Scaling v up (or down) can be compensated by scaling t down (or up) respectively, keeping R and H the same.
Eliminating e from the two equations (2.9) and (2.10) we have
$$R - t\,\frac{v^TK}{\lambda} = \frac{K^{-1}HK}{\lambda}. \tag{3.2}$$

Let us denote the quantity $v^TK/\lambda$ by $u^T$. Further, $K^{-1}HK$ is another invertible matrix, unique up to a scalar multiple, and hence in one-to-one correspondence with the homography matrix H. One important point to note is that H describes the point mapping between the two images with the points measured in the local image coordinate systems. Hence $K^{-1}HK$ represents the same point mapping between the two images, but in another local coordinate system that has its $x$-$y$ axes parallel to those of the camera's global coordinate system. This implies that the matrix $K^{-1}HK/\lambda$ represents the same homography mapping¹. (¹This is one of the many places in this thesis where the fact that the cameras are calibrated is used to simplify the situation.) We rewrite the above equation to get
$$K(R - tu^T)K^{-1} = H/\lambda \quad \text{or} \quad R - tu^T = H', \tag{3.3}$$
where we denote $K^{-1}HK/\lambda$ by $H'$. To describe this equation in one line is to say: given the relative orientation of camera $cam(O_1, \pi_1, K)$ with respect to camera $cam(O_2, \pi_2, K)$ and the position of the scene plane in a global coordinate system, the point correspondences between points in $\pi_1$ and $\pi_2$ through the points of the scene plane are mapped through the homography H. The above equation is proved in a different way by Hartley, [1]. It can be shown with some algebra that u represents the position of the scene plane $\pi$. If $\pi$ is represented by the solution set of the equation

$$\left\{ [x\ y\ z]^T \in \mathbb{R}^3 \;\middle|\; m_1x + m_2y + m_3z + 1 = 0 \right\}, \quad m_1, m_2, m_3 \in \mathbb{R}, \tag{3.4}$$
then u is the vector $[m_1\ m_2\ m_3]^T$, uniquely defining the position of $\pi$. Henceforth we shall alternately denote the plane defined by a vector u as above by the notation $\pi_u$.

3.2 Conic correspondence

The scene plane $\pi$ contains the conic C, whose images are $C_1$ and $C_2$ in planes $\pi_1$ and $\pi_2$ respectively. These two conics are measured in the local coordinate systems. We then have the transformed conics $C_1' = K^TC_1K$ and $C_2' = K^TC_2K$ as representations of the two conics in transformed local coordinate systems whose $x$-$y$ axes are aligned to the $x$-$y$ axes of the camera's coordinate system, with origin at the point of intersection of the normal vector to $\pi$. We can use the equation of conic correspondence stated in equation (2.15),
$$H^TC_2H = \lambda C_1,$$
to form the new constraint
$$(R - tu^T)^TK^TC_2K(R - tu^T) - \lambda K^TC_1K = 0_{3\times3}. \tag{3.5}$$
This equation transforms the problem of pose estimation from a conic correspondence into a problem of estimating R, t and u from a set of five equations. Though the matrix equation contains six polynomial equations in all, its elements being unique up to a non-zero scalar multiple we have five equations; or, by introducing one more variable $\lambda$, we have six equations but the additional variable $\lambda$. As evident from equation (3.5), t and u appear in a scalar product form, so we need to estimate u only up to a scalar multiple. This argument reduces the variable set to R, t, $u = [1\ n_2\ n_3]^T$ and $\lambda$: nine parameters in all from six equations. In order to reduce the number of unknowns further, we introduce two assumptions:
1. The scene conic C is a circle in the global coordinate system.
2. The translation vector lies in a particular plane. Let us denote the plane by $\pi_w$ and its defining normal vector by w.
The first assumption is easily realized by indoor scenes, and to an extent by outdoor scenes. For example, a scene comprising household artifacts is quite likely to contain circular cross-sections in the form of bottle mouths, cups, glasses, door knobs, objects of art and craft containing circular arcs and curves, holes in walls, etc. The complete circular curves need not be visible: partially visible curves can be fitted with considerable accuracy. In most cases the circular objects in scenes are solids, more like circular discs, which implies that they are wholly in front of the camera while imaging. This means that these circles are always projected as ellipses. Many tools are available for detecting an ellipse in an image and fitting a polynomial to it; the one we use for our experiments is developed by Prasad, [15].
The second assumption is not as commonly fulfilled as the first one. But it often happens that the camera is leveled and held on a tripod stand, even as it moves. This fact can be used to estimate the plane that contains the translation vector. Hence in such cases the plane containing the translation vector is already known.
These two assumptions further reduce the number of variables in equation (3.5). In the next section, (3.3), we prove lemma (2), which says that u can be estimated as a finite set of solutions, each unique up to a non-zero multiple. For this we derive the two equations (3.16) and (3.17). This means that, of the three parameters of vector u, two are estimated through the lemma, leaving us with seven parameters and six equations. Next, the second assumption contributes one more equation. In summary, by employing the two assumptions, we have a fully determined set of seven polynomial equations in seven variables.
Rewriting the constraint equations we get
$$(R - tu^T)^TK^TC_2K(R - tu^T) - \lambda K^TC_1K = 0_{3\times3}, \qquad w^Tt = 0, \tag{3.6}$$
where u, $C_1$, $C_2$ and w are known and R, t and $\lambda$ are to be estimated. If we consider the geometry described by the above equations, we can intuitively note that all seven polynomials are algebraically independent. This means that for non-trivial cases, solution(s) exist. These equations form the backbone of the approach to pose estimation we propose next.
The conventional approach to pose estimation is a two stage task: estimating F from feature² correspondences, and then R and t from F. As mentioned in the section on background work in chapter (1), there is a lot of literature on methods for estimating F from point or conic correspondences, but most of these methods treat F as a single mathematical entity to be estimated at once. For the second stage of estimating R and t, we have an algorithm proposed by Hartley in [10], based on an SVD of the fundamental matrix (sketched below). A point worth noting here is that the estimation of R and t from F gives little insight into a two camera setup in euclidean space. Additionally, the first assumption is not encoded directly in the fundamental matrix formulation, nor in the way F is estimated from point correspondences. These are the reasons why we look for a different approach to pose estimation from a conic correspondence. The reason the assumptions affect the methods we adopt is that the first assumption, of the scene conic being a circle, puts direct constraints on the position of the scene plane. These constraints on the plane position are not evident from the treatment of F as one quantity, or even when F is estimated directly from point or conic correspondences. The idea here is to break up the fundamental matrix in such a way that we have a direct relationship among the quantities describing pose, R and t, the conic correspondence, and the scene plane position. Equation (3.6) encodes such a relationship, and it can be solved through a geometric construction with which we can estimate all possible pose solutions with substantial accuracy. Next we give a derivation of two equations that put two polynomial constraints on the plane position by employing the first assumption. And though we are looking for a geometric construction in order to list out all possible pose solutions, in section (4.1) we give an account of a way to optimize a cost function that encodes equation (3.6), so that one can register the shortcomings of such an approach, and we justify our motivation for it.

²Traditionally, features have included points, but lines, conics and curves have subsequently been used for estimating the fundamental matrix.
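For reference, the SVD-based extraction of the candidate R and t from an essential matrix E — the standard construction along the lines of the algorithm of [10] — can be sketched as follows (Python/NumPy, our illustration rather than the exact published algorithm):

```python
import numpy as np

def pose_candidates_from_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix E."""
    U, _, Vt = np.linalg.svd(E)
    # Keep proper rotations (determinant +1).
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]   # translation direction, up to sign and scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

The physically valid candidate among the four is conventionally chosen by triangulating a point and checking that it lies in front of both cameras.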

3.3 Mathematical implication of the first assumption on the scene plane

Let us consider the arrangement given in section (1.1), where only one camera, $cam(O_1, \pi_1, K)$, and the scene plane $\pi$ are considered. Let the conic $C_1 \subset \pi_1$ be known. Given this setup we claim that there are finitely many positions of the plane $\pi$, unique up to a non-zero scalar multiple, such that the scene conic C is a circle in the global coordinate system. This coordinate system is assumed to have its origin coincide with $O_1$ and its $x$-$y$ axes parallel to the $x$-$y$ axes of the local coordinate system in plane $\pi_1$. In other words, the orientation of the camera $cam(O_1, \pi_1, K)$ with respect to the global coordinate system is represented by $R = I_3$ and $t = [0\ 0\ 0]^T$.
h

Lemma 2. Let us assume be a scene plane defined by the normal vector, u = m1 m2 m3


Without loss of generality, we can assume m1 6= 0. Then we have a finite set of solutions
for variables m2 and m3 such that conic C is a circle.
Proof. Based on our assumptions, the projection matrix of camera $cam(O_1, \pi_1, K)$ is $P_1 = K[I_3\ |\ 0_3]$. From a result by Quan in [16], we know that given a projection matrix $P_1$ of a camera and a conic $C_1$ in image plane $\pi_1$, the cone containing $C_1$ is given by $P_1^TC_1P_1$. Let us denote it by Q.

$$Q = \begin{bmatrix} I_3 \\ 0_3^T \end{bmatrix} K^TC_1K \begin{bmatrix} I_3 & 0_3 \end{bmatrix}.$$
Let $K^TC_1K$ be denoted as $C_{cal}$. Then
$$Q = P_1^TC_1P_1 = \begin{bmatrix} C_{cal} & 0_3 \\ 0_3^T & 0 \end{bmatrix}.$$
Alternately, Q can also denote a set of points, defined as
$$\left\{ [x\ y\ z]^T \in \mathbb{R}^3 \;\middle|\; [x\ y\ z\ 1] \begin{bmatrix} C_{cal} & 0_3 \\ 0_3^T & 0 \end{bmatrix} [x\ y\ z\ 1]^T = 0 \right\}.$$

Then the conic C in $\pi$ ($\pi$ being the scene plane) is the intersection of $\pi$ with Q, given as the solution set of the equation
$$\begin{bmatrix} \tfrac{-1-m_2y-m_3z}{m_1} & y & z & 1 \end{bmatrix} \begin{bmatrix} C_{cal} & 0_3 \\ 0_3^T & 0 \end{bmatrix} \begin{bmatrix} \tfrac{-1-m_2y-m_3z}{m_1} & y & z & 1 \end{bmatrix}^T = 0 \tag{3.7}$$
$$= \begin{bmatrix} \tfrac{-1-m_2y-m_3z}{m_1} & y & z \end{bmatrix} C_{cal} \begin{bmatrix} \tfrac{-1-m_2y-m_3z}{m_1} & y & z \end{bmatrix}^T = 0. \tag{3.8}$$

Thus the conic C in the global coordinate system is obtained as the set of points
$$C = \left\{ [x\ y\ z]^T \in \mathbb{R}^3 \;\middle|\; x = \tfrac{-1-m_2y-m_3z}{m_1} \;\&\; \begin{bmatrix} \tfrac{-1-m_2y-m_3z}{m_1} & y & z \end{bmatrix} C_{cal} \begin{bmatrix} \tfrac{-1-m_2y-m_3z}{m_1} & y & z \end{bmatrix}^T = 0 \right\}.$$
Let us have the following notations:
$$n_1 = \frac{1}{m_1}, \qquad n_2 = \frac{m_2}{m_1}, \qquad n_3 = \frac{m_3}{m_1}, \tag{3.9}$$
$$o_3 = [-n_1\ \ 0\ \ 0]^T, \qquad o_1 = [-n_2\ \ 1\ \ 0]^T, \qquad o_2 = [-n_3\ \ 0\ \ 1]^T. \tag{3.10}$$

Then we have
$$(o_3 + y\,o_1 + z\,o_2)^T\, C_{cal}\, (o_3 + y\,o_1 + z\,o_2) = 0. \tag{3.11}$$

This can be rewritten as
$$\begin{bmatrix} y & z & 1 \end{bmatrix} \begin{bmatrix} o_1^TC_{cal}o_1 & o_1^TC_{cal}o_2 & o_1^TC_{cal}o_3 \\ o_2^TC_{cal}o_1 & o_2^TC_{cal}o_2 & o_2^TC_{cal}o_3 \\ o_3^TC_{cal}o_1 & o_3^TC_{cal}o_2 & o_3^TC_{cal}o_3 \end{bmatrix} \begin{bmatrix} y \\ z \\ 1 \end{bmatrix} = 0.$$
Now we need to represent the points $[x\ y\ z]^T$ in a local coordinate system of $\pi$. Following the definition of the plane as the zero set of equation (3.4), we select an orthonormal coordinate system that depends only on the parameters $n_i$, $i = 1, 2, 3$. We further simplify the notations:

$$M = -\frac{1 + n_2n_3 + n_3^2}{1 + n_2n_3 + n_2^2}, \qquad k_1 = \sqrt{(n_2+n_3)^2 + 2}, \qquad k_2 = \sqrt{(Mn_2+n_3)^2 + M^2 + 1},$$
$$a = \begin{bmatrix} -n_1 & 0 & 0 \end{bmatrix}^T, \quad b = \begin{bmatrix} -n_1 - \tfrac{n_2+n_3}{k_1} & \tfrac{1}{k_1} & \tfrac{1}{k_1} \end{bmatrix}^T, \quad c = \begin{bmatrix} -n_1 - \tfrac{Mn_2+n_3}{k_2} & \tfrac{M}{k_2} & \tfrac{1}{k_2} \end{bmatrix}^T. \tag{3.12}$$

The points a, b, c so parameterized lie on the plane $\pi$, and the orthogonal axes are
$$\vec{ab} = \frac{1}{k_1}\begin{bmatrix} -(n_2+n_3) & 1 & 1 \end{bmatrix}^T, \qquad \vec{ac} = \frac{1}{k_2}\begin{bmatrix} -(Mn_2+n_3) & M & 1 \end{bmatrix}^T.$$
One can easily verify that
$$\vec{ab}\cdot\vec{ac} = 0, \qquad \|\vec{ab}\| = 1, \qquad \|\vec{ac}\| = 1.$$
From this parametrization, we have the following relationship between the local coordinate representation $[u\ v]^T$ and the global coordinate representation $[x\ y\ z]^T$ of the same point:
$$y = \frac{u}{k_1} + \frac{Mv}{k_2}, \qquad z = \frac{u}{k_1} + \frac{v}{k_2}, \qquad u = k_1\frac{Mz - y}{M - 1}, \qquad v = k_2\frac{y - z}{M - 1}. \tag{3.13}$$

Substituting the values of y and z into equation (3.11) we get
$$\left(o_3 + \left(\tfrac{u}{k_1} + \tfrac{Mv}{k_2}\right)o_1 + \left(\tfrac{u}{k_1} + \tfrac{v}{k_2}\right)o_2\right)^T C_{cal} \left(o_3 + \left(\tfrac{u}{k_1} + \tfrac{Mv}{k_2}\right)o_1 + \left(\tfrac{u}{k_1} + \tfrac{v}{k_2}\right)o_2\right) = 0.$$
Rearranging the terms, we obtain a polynomial in the variables u and v:
$$\left(o_3 + \frac{u(o_1 + o_2)}{k_1} + \frac{v(Mo_1 + o_2)}{k_2}\right)^T C_{cal} \left(o_3 + \frac{u(o_1 + o_2)}{k_1} + \frac{v(Mo_1 + o_2)}{k_2}\right) = 0.$$
Rewriting, we get
$$\begin{bmatrix} u & v & 1 \end{bmatrix} \begin{bmatrix} l_1^TC_{cal}l_1 & l_1^TC_{cal}l_2 & l_1^TC_{cal}l_3 \\ l_2^TC_{cal}l_1 & l_2^TC_{cal}l_2 & l_2^TC_{cal}l_3 \\ l_3^TC_{cal}l_1 & l_3^TC_{cal}l_2 & l_3^TC_{cal}l_3 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0,$$
where $l_1 = \dfrac{o_1 + o_2}{k_1}$, $l_2 = \dfrac{Mo_1 + o_2}{k_2}$ and $l_3 = o_3$. Thus the conic C, in parametric form in the local coordinate system, is defined as
$$C = \left\{ [u\ v]^T \;\middle|\; \begin{bmatrix} u & v & 1 \end{bmatrix} \begin{bmatrix} l_1^TC_{cal}l_1 & l_1^TC_{cal}l_2 & l_1^TC_{cal}l_3 \\ l_2^TC_{cal}l_1 & l_2^TC_{cal}l_2 & l_2^TC_{cal}l_3 \\ l_3^TC_{cal}l_1 & l_3^TC_{cal}l_2 & l_3^TC_{cal}l_3 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0 \right\}, \tag{3.14}$$
and hence represented by the matrix
$$\begin{bmatrix} l_1^TC_{cal}l_1 & l_1^TC_{cal}l_2 & l_1^TC_{cal}l_3 \\ l_2^TC_{cal}l_1 & l_2^TC_{cal}l_2 & l_2^TC_{cal}l_3 \\ l_3^TC_{cal}l_1 & l_3^TC_{cal}l_2 & l_3^TC_{cal}l_3 \end{bmatrix}. \tag{3.15}$$

If we have a circle represented by the solution set of the equation
$$(u - a)^2 + (v - b)^2 = r^2,$$
its matrix representation is obtained by rewriting the equation as
$$\begin{bmatrix} u & v & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -a \\ 0 & 1 & -b \\ -a & -b & a^2 + b^2 - r^2 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0.$$

For the conic C to be a circle, its matrix, as defined in equation (3.15), has to be of the same form as given above (up to scale). Hence we have the two conditions
$$l_1^TC_{cal}l_1 - l_2^TC_{cal}l_2 = 0 \tag{3.16}$$
and
$$l_1^TC_{cal}l_2 = 0, \tag{3.17}$$
where $l_1 = \frac{[-(n_2+n_3)\ \ 1\ \ 1]^T}{\sqrt{(n_2+n_3)^2+2}}$, $l_2 = \frac{[-(Mn_2+n_3)\ \ M\ \ 1]^T}{\sqrt{(Mn_2+n_3)^2+M^2+1}}$, $C_{cal} = K^TC_1K$ and $M = -\frac{1+n_2n_3+n_3^2}{1+n_2n_3+n_2^2}$, with $n_1$, $n_2$ and $n_3$ as defined in equation (3.9).
We can solve the two equations (3.16) and (3.17) for the two variables $n_2$ and $n_3$; a numerical sketch is given below. Then the plane $\pi$ is defined by the vector
$$m_1[1\ \ n_2\ \ n_3]^T,$$
where $m_1$ can take any real value. Thus this solution gives us a series of parallel planes, each of which, upon intersection with the cone $Q_1$, gives a circle.
In summary, this lemma defines the relationship between the normal vector u that defines the scene plane $\pi$ and the conic $C_1$ in the image plane, under the assumption that the conic in the scene plane which projects onto $C_1$ in $\pi_1$ is a circle.
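To make the lemma concrete, conditions (3.16) and (3.17) can be solved numerically for $(n_2, n_3)$. The following sketch (ours, using SciPy; the random multi-start is a simple heuristic for collecting the finitely many roots) follows the definitions of M, $k_1$, $k_2$, $l_1$ and $l_2$ above:

```python
import numpy as np
from scipy.optimize import fsolve

def circle_conditions(n, Ccal):
    """Residuals of equations (3.16) and (3.17) for n = (n2, n3)."""
    n2, n3 = n
    M = -(1 + n2*n3 + n3**2) / (1 + n2*n3 + n2**2)
    k1 = np.sqrt((n2 + n3)**2 + 2)
    k2 = np.sqrt((M*n2 + n3)**2 + M**2 + 1)
    l1 = np.array([-(n2 + n3), 1.0, 1.0]) / k1
    l2 = np.array([-(M*n2 + n3), M, 1.0]) / k2
    return [l1 @ Ccal @ l1 - l2 @ Ccal @ l2,   # equation (3.16)
            l1 @ Ccal @ l2]                    # equation (3.17)

def plane_directions(Ccal, trials=50, seed=0):
    """Collect distinct (n2, n3) roots by solving from random starting points."""
    rng, roots = np.random.default_rng(seed), []
    for _ in range(trials):
        sol, info, ok, _ = fsolve(circle_conditions, rng.normal(size=2),
                                  args=(Ccal,), full_output=True)
        if ok == 1 and not any(np.allclose(sol, r, atol=1e-6) for r in roots):
            roots.append(sol)
    return roots
```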

3.4 Estimating R and t through geometric construction

One straightforward way to estimate R and t from equation (3.6) is through optimization of a certain cost function. Such a method is described in section (4.1) of chapter (4), along with a discussion of results for sample experiments we have performed. Here we consider its shortcomings, which can be observed from the results of those experiments in section (4.1.1). Optimization approaches in general have a tendency to get stuck in local minima. Additionally, it is not feasible to estimate all possible global minima for such a system of multiple polynomials. Hence we look for a solution which involves fewer variables and equations, and takes the problem of optimization almost out of the question. By almost we mean that the search space should contain fewer global minima and we should be able to analytically estimate all possible pose solutions. The result is a geometric approach in which we can estimate R and t with more accuracy and reliability. For this approach, we state the assumptions used and then construct the setup step by step, in such a way that the problem of estimating R and t is transformed into the problem of estimating the relative orientation between two circles of the same radius. We get multiple solutions, finite in number, of which all but one can be eliminated by certain observations laid out later in this section.
Clarification on notations:
We shall henceforth use the term "camera-1" to mean the camera $cam(O_1, \pi_1, K)$ and "camera-2" to mean $cam(O_2, \pi_2, K)$. If a vector u determines the linear equation defining the set of points lying on a plane, as in equation (3.4), then we alternately denote the plane, and hence the set of points, as $\pi_u$. Transformed versions of the same geometric quantity, or quantities of the same type, are suffixed by numerals or special characters defined as and when needed; e.g. two planes defined by distinct vectors $u_1$ and $u_2$ are denoted by $\pi_{u_1}$ and $\pi_{u_2}$ respectively. Additionally, if we have a scaled vector $\lambda u_i$ defining a plane ($\lambda$ being a real scalar), we alternately denote the same plane as $\pi_{\lambda u_i}$, where i can be any numeral or subscripted string; e.g. $\pi_{\lambda u_{a23}}$ denotes the plane defined by the vector $\lambda u_{a23}$.

Geometric construction
The two assumptions stated at the start of this chapter enable the geometric construction that forms the bulk of this approach:
1. The scene conic C is a circle in the global coordinate system.
2. The translation vector t lies in a given plane (let us denote it $\pi_w$), specified in the global coordinate system.
The geometry is augmented by the availability of the conic correspondence $C_1 \leftrightarrow C_2$. The calibrated counterparts of $C_1$ and $C_2$ are $C_1'$ and $C_2'$ respectively; henceforth in this section $C_1'$ means the calibrated counterpart of the conic $C_1$, defined as $C_1' = K^TC_1K$. Revisiting the arrangement of section (1.1), let us form cones $Q_1$ and $Q_2'$ through the conics $C_1'$ and $C_2'$ respectively, as shown in figure (3.1). This diagram is an extension of the two camera setup of section (1.1). The rigid body motion that defines the relative pose of cone $Q_1$ with respect to cone $Q_2'$ is considered to be R and t. $C_1'$ is the intersection of $\pi_1$ with $Q_1$, and $C_2'$ is the intersection of $\pi_2$ with $Q_2'$. C is the intersection of $\pi$ with $Q_1$ or with $Q_2'$; in fact the cones $Q_1$ and $Q_2'$ have to intersect in a planar conic. A result stated by Quan in [16] says that the two cones must intersect in a quartic curve which disintegrates into two second order planar curves, of which one is the scene conic C.
Step-1
Let us apply a rigid body motion³ to the structure formed from camera-2 (shown in yellow in the figure) and cone $Q_2'$. Camera-2 has its coordinate system defined by the origin $O_2$ and the triplet of axes $\{X_c, Y_c, Z_c\}$. In other words, we first rotate $Q_2'$ through the rotation matrix R and then translate it through the translation vector t. This motion results in the cone $Q_2$, and the circle C is transformed into the circle C'. As a result, the two cameras coincide, and so do the image planes $\pi_2$ and $\pi_1$. We see further that the circles C and C' have the same radii. This situation is shown in figure (3.2). This rigid body motion is precisely the relative pose we have to estimate; the idea lies in estimating the relative orientation between the circles C and C'.

³The rigid body motion is with reference to the coordinate system of camera-1, which consists of the origin $O_1$ and the triplet of axes $\{X_{wf}, Y_{wf}, Z_{wf}\}$.

Figure 3.1: Two cones, $Q_1$ and $Q_2'$, describing a conic correspondence.


The series of steps to follow demonstrates a geometric construction for estimating the pose once the two circles are known in $\mathbb{R}^3$. The two conics $C_1'$ and $C_2'$ are known, which give us the two cones $Q_1$ and $Q_2$ respectively. We apply lemma (2) to these two cones to get two sets of plane positions of the form $u = [m_1\ m_2\ m_3]^T$, denoted $U_1$ and $U_2$. The two sets are defined as follows: $U_1$ is the set of planes $\pi_{u_1}$ such that the intersection of $\pi_{u_1}$ with cone $Q_1$ is a circle, and $U_2$ is the set of planes $\pi_{u_2}$ such that the intersection of $\pi_{u_2}$ with cone $Q_2$ is a circle. The following property can be inferred from the proof of lemma (2):
Lemma 3. If $u_1 \in U_1$, then $\lambda u_1 \in U_1$, $\forall\lambda \in \mathbb{R} - \{0\}$. Similarly, if $u_2 \in U_2$, then $\lambda u_2 \in U_2$, $\forall\lambda \in \mathbb{R} - \{0\}$.
Proof. Let us apply lemma (2) to $C_1$ and its cone $Q_1$. Inspecting the form of the equations (3.16) and (3.17) so obtained, we see that they are homogeneous polynomials in the three variables $m_1$, $m_2$ and $m_3$. By a change of variables, we transform them into polynomials in the two variables $n_2 = m_2/m_1$ and $n_3 = m_3/m_1$. Hence scaling $u_1$ by $\lambda$ has no effect on $n_2$ and $n_3$. Similarly we can argue for the conic $C_2$ and its cone $Q_2$. Thus the result follows.

Figure 3.2: Rigid body motion of the cone $Q_2'$ onto $Q_2$.
For every plane $\pi_{u_1} \in U_1$, we can always find a plane $\pi_{u_2} \in U_2$ such that the radius of the circle of intersection of $\pi_{u_1}$ with $Q_1$ equals the radius of the circle of intersection of $\pi_{u_2}$ with $Q_2$⁴. Let us define the radius of the intersection of the plane $\pi_{u_1}$ with $Q_1$ as $r_{u_1}$, and the radius of the intersection of $\pi_{u_2}$ with $Q_2$ as $r_{u_2}$. This means for every $u_1$ in $U_1$ we have $u_2$ in $U_2$ such that $r_{u_1} = r_{u_2}$. This relationship defines a pair of planes. This pair is important, as every such pair can give a possible pose estimate, and for every plane $\pi_{u_1}$ we have two planes in $U_2$ which form such a pair, viz. $\pi_{u_2}$ and $\pi_{-u_2}$. Thus the set of all possible pairs of planes which can give us a solution is defined as
$$U_{sol} = \{(u_1, u_2) \in U_1 \times U_2 \mid r_{u_1} = r_{u_2}\}, \tag{3.18}$$
as the sketch below shows.

⁴This can be seen as follows: every cone extends to infinity, and the radius can take any positive real value by appropriately positioning the plane.
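In code, the radius $r_u$ for a candidate plane can be read off the circle matrix of the intersection conic, once that matrix is scaled so that its leading entry is one (a small sketch of ours; the input is assumed to be a scalar multiple of the circle matrix of equation (3.15) evaluated for a plane that yields a circle):

```python
import numpy as np

def circle_center_radius(S):
    """Read (a, b, r) off a matrix proportional to
    [[1, 0, -a], [0, 1, -b], [-a, -b, a^2 + b^2 - r^2]]."""
    S = S / S[0, 0]                      # fix the scale
    a, b = -S[0, 2], -S[1, 2]
    r2 = a**2 + b**2 - S[2, 2]
    return a, b, np.sqrt(r2)
```

Pairs in $U_{sol}$ are then those $(u_1, u_2)$ whose radii agree up to a numerical tolerance.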

This brings us to another interesting property, which we have proved as lemma (18) in appendix (C).
Step-2
Once we have the two sets $U_1$ and $U_2$, the set of all possible solutions to the pose estimation problem forms a subset of $U_1 \times U_2$, defined in equation (3.18). Let $(u_{11}, u_{21}) \in U_{sol}$ be one such solution pair, i.e. $r_{u_{11}} = r_{u_{21}}$. This gives us a pair of planes as defined before. Let the circles be defined by the matrices
$$C = \begin{bmatrix} 1 & 0 & -a_1 \\ 0 & 1 & -b_1 \\ -a_1 & -b_1 & a_1^2 + b_1^2 - r_1^2 \end{bmatrix} \tag{3.19}$$
and
$$C' = \begin{bmatrix} 1 & 0 & -a_2 \\ 0 & 1 & -b_2 \\ -a_2 & -b_2 & a_2^2 + b_2^2 - r_1^2 \end{bmatrix}. \tag{3.20}$$

The circles are in specific local orthonormal coordinate systems which depend solely on the plane positions in $\mathbb{R}^3$. Matrix C represents the circle of intersection of cone $Q_1$ with $\pi_{u_{11}}$, and C' represents the circle of intersection of cone $Q_2$ with $\pi_{u_{21}}$. Their radii being the same, we denote both by $r_1$. The two circles can be seen as one being a rigid body motion of the other; let the relative orientation be defined as
$$Rx + t = y, \quad x \in C,\ y \in C'.$$
Further, we know that the cones $Q_1$ and $Q_2'$ intersect in C. Hence, applying the same rigid body motion R and t to the cone $Q_2'$, we should get the cone $Q_2$, as shown in figure (3.2).
Step-3
The next step is to map circle C to C' through a rigid body motion comprising rotation R and translation t. From the two circle representations (3.19) and (3.20) we have the centers of the two circles represented as $[a_1\ b_1]^T$ for C and $[a_2\ b_2]^T$ for C'. But these center representations are in a local coordinate system, unique for each plane. Their representations in the global coordinate system are obtained through equation (3.13), as shown next. We shall denote

the global coordinate representations of the two center points as $x_{c1}$ and $x_{c2}$. Then equations (3.12), (3.9) and (3.13), with M, $k_1$, $k_2$ computed for each plane, give us
$$y_{c1} = \frac{a_1}{k_{11}} + \frac{M_1b_1}{k_{12}}, \qquad z_{c1} = \frac{a_1}{k_{11}} + \frac{b_1}{k_{12}}, \qquad x_{c1} = \frac{-1 - m_{12}y_{c1} - m_{13}z_{c1}}{m_{11}}, \tag{3.21}$$
for plane $\pi_{u_{11}}$ and
$$y_{c2} = \frac{a_2}{k_{21}} + \frac{M_2b_2}{k_{22}}, \qquad z_{c2} = \frac{a_2}{k_{21}} + \frac{b_2}{k_{22}}, \qquad x_{c2} = \frac{-1 - m_{22}y_{c2} - m_{23}z_{c2}}{m_{21}}, \tag{3.22}$$
for plane $\pi_{u_{21}}$. The plane $\pi_{u_{11}}$ is assumed to be defined by the vector $[m_{11}\ m_{12}\ m_{13}]^T$, and $\pi_{u_{21}}$ by the vector $[m_{21}\ m_{22}\ m_{23}]^T$. From equations (3.21) and (3.22), the centers of the two circles are represented in the global coordinate system as
$$x_{c1} = [x_{c1}\ y_{c1}\ z_{c1}]^T, \qquad x_{c2} = [x_{c2}\ y_{c2}\ z_{c2}]^T.$$
The primary condition in mapping circle C to C' is that the center $x_{c1}$ should be mapped to $x_{c2}$. The second condition is that the translation vector t should satisfy the assumption $w^Tt = 0$, where w is pre-specified. These two conditions lead to the next step of the geometric construction. Figure (3.3) depicts the geometric construction for estimating R and t by mapping circle C to C'. Steps four and five describe and solve this construction.
Step-4
We have a plane $\pi_{w1}$ through the point $x_{c2}$ such that $\pi_{w1} \parallel \pi_w$. Then the point
$$x_{c1rot} = Rx_{c1}$$
should lie on $\pi_{w1}$, and
$$t = x_{c2} - x_{c1rot},$$
which by construction lies in $\pi_{w1}$. The distance of the point $x_{c1rot}$ from the origin is the same as that of $x_{c1}$. Hence we have the first two equations constraining $x_{c1rot}$:
$$w_1^Tx_{c1rot} + 1 = 0, \tag{3.23}$$
$$\|x_{c1rot}\| = \|x_{c1}\|. \tag{3.24}$$

Figure 3.3: A diagram describing the geometric construction.
Let us denote the point on the perpendicular line from the origin to plane $\pi_{u_{11}}$, and also lying on $\pi_{u_{11}}$, as $p_1$; its coordinates depend on $u_{11}$ as
$$p_1 = \frac{-u_{11}}{\|u_{11}\|^2}. \tag{3.25}$$

The plane through $x_{c1rot}$ and parallel to $\pi_{u_{21}}$ is denoted $\pi_{u_{c1rot}}$ and defined by the vector
$$u_{c1rot} = \frac{-u_{21}}{x_{c1rot}^Tu_{21}}.$$
The point on the perpendicular line from the origin to plane $\pi_{u_{c1rot}}$, and also lying on $\pi_{u_{c1rot}}$, is denoted $p_2$; its coordinates depend on $u_{c1rot}$ as
$$p_2 = \frac{-u_{c1rot}}{\|u_{c1rot}\|^2}. \tag{3.26}$$

Then the distance of $p_2$ from $x_{c1rot}$ should be the same as that of $p_1$ from $x_{c1}$, giving us the polynomial equation
$$\|p_2 - x_{c1rot}\| = \|p_1 - x_{c1}\|. \tag{3.27}$$
The equations (3.23), (3.24) and (3.27) encode the solution for the parameters R and t. The point $x_{c1rot}$, obtained as a solution to the above three equations, helps us determine R via the constraints
$$Rx_{c1} = x_{c1rot}, \qquad Rp_1 = p_2. \tag{3.28}$$

Let us define
$$A = \begin{bmatrix} x_{c1} & p_1 & (x_{c1} \times p_1) \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} x_{c1rot} & p_2 & (x_{c1rot} \times p_2) \end{bmatrix}, \tag{3.29}$$
where $x_{c1} \times p_1$ is the vector cross-product of $x_{c1}$ and $p_1$, and similarly for $x_{c1rot}$ and $p_2$. Then we estimate R as
$$R = BA^{-1}, \tag{3.30}$$

with A and B both being invertible matrices, justifying the existence of R as obtained above. Now, from the way the solution for R is designed, we can ascertain the following from equations (3.24), (3.27) and (3.28):
$$\|x_{c1}\| = \|x_{c1rot}\|, \qquad \|p_1\| = \|p_2\|, \tag{3.31}$$
and the angle between the vectors $x_{c1}$ and $p_1$ is the same as the angle between the vectors $x_{c1rot}$ and $p_2$. With these facts in mind, one can easily prove that, with A and B as defined in equation (3.29),
$$A^TA = B^TB.$$
From this it is straightforward to note that the matrix $R = BA^{-1}$ obtained as in equation (3.30) is a rotation matrix. Once R is known, t is estimated as
$$t = x_{c2} - Rx_{c1}, \tag{3.32}$$
with $x_{c1rot}$ as in equation (3.28). Thus we have estimated R and t for one solution point $x_{c1rot}$ of the three equations (3.23), (3.24) and (3.27), designed for one pair of planes $(u_{11}, u_{21})$. A computational sketch of this step is given below.
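The following sketch (ours, in Python with SciPy) strings steps three and four together for one plane pair: it solves equations (3.23), (3.24) and (3.27) for $x_{c1rot}$ from a given starting guess, and then assembles R and t via equations (3.29)-(3.32). In practice one would run it from several starting guesses to collect the up-to-four solutions:

```python
import numpy as np
from scipy.optimize import fsolve

def step4_equations(x, w1, xc1, p1, u21):
    """Residuals of equations (3.23), (3.24) and (3.27) for x = xc1rot."""
    uc1rot = -u21 / (x @ u21)            # plane through x parallel to pi_{u21}
    p2 = -uc1rot / (uc1rot @ uc1rot)     # foot of perpendicular, eq. (3.26)
    return [w1 @ x + 1,                                      # (3.23)
            x @ x - xc1 @ xc1,                               # (3.24)
            (p2 - x) @ (p2 - x) - (p1 - xc1) @ (p1 - xc1)]   # (3.27)

def pose_from_plane_pair(xc1, xc2, u11, u21, w1, x0):
    """One pose solution (R, t) for a plane pair, from a starting guess x0."""
    p1 = -u11 / (u11 @ u11)                                  # eq. (3.25)
    xc1rot = fsolve(step4_equations, x0, args=(w1, xc1, p1, u21))
    uc1rot = -u21 / (xc1rot @ u21)
    p2 = -uc1rot / (uc1rot @ uc1rot)
    A = np.column_stack([xc1, p1, np.cross(xc1, p1)])        # eq. (3.29)
    B = np.column_stack([xc1rot, p2, np.cross(xc1rot, p2)])
    R = B @ np.linalg.inv(A)                                 # eq. (3.30)
    t = xc2 - R @ xc1                                        # eq. (3.32)
    return R, t
```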
Step-5
The three polynomial equations have at most four real solutions, and each solution point gives one pose solution R and t, leading to a maximum of four pose solutions for each plane pair. The above steps are then repeated for all possible plane pairs $(u_{11}, u_{21}) \in U_{sol}$. Thus we get a set of solutions R and t for all such plane pairs. In the general case there is more than one solution in this set, of which one is the true solution we desire; the rest are to be eliminated. The next section describes the non-uniqueness of solutions in this set and some thoughts on how to pick the particular solution which actually realized the camera setup.
Non-uniqueness of solution
Case-1: The three equations (3.23), (3.24) and (3.27) are polynomials of degree one, two and two respectively. One can eliminate a variable and reduce the three equations to two. Hence, by an application of Bézout's theorem on counting the intersection points of two curves, the total number of possible solutions is four for each pair of planes. So for every plane pair $(u_{11}, u_{21})$ we have at most four pose solutions.
Case-2: The second case arises from the fact that for every plane $\pi_{u_1}$ in $U_1$, we have two planes possible in $U_2$, $\pi_{u_2}$ and $\pi_{-u_2}$, such that $r_{u_1} = r_{u_2} = r_{-u_2}$. These planes have their normal vectors in opposite directions. The discussion following lemma (3) states the same fact, which in precise terms can be rewritten as: if $(u_{11}, u_{21}) \in U_{sol}$ has translation t as part of its solution, then $(u_{11}, -u_{21})$ has translation $-t$ as part of one of its solutions. So the translation vectors for all solutions to $(u_{11}, -u_{21})$ are the negative counterparts of the translations obtained as solutions to $(u_{11}, u_{21})$. The complete relationship between the pose solutions for the plane pairs $(u_{11}, u_{21})$ and $(u_{11}, -u_{21})$ can be derived from equations (3.29) and (3.30) as:
$$t_1 = -t, \qquad R_1 = B_1A^{-1}, \tag{3.33}$$
where $B_1 = -B + 2\begin{bmatrix} 0_3 & 0_3 & x_{c1rot} \times p_2 \end{bmatrix}$, so that
$$R_1 = -R + 2\begin{bmatrix} 0_3 & 0_3 & x_{c1rot} \times p_2 \end{bmatrix}A^{-1}. \tag{3.34}$$

This relationship gives us a way to estimate R and t for the pair of planes $(u_1, -u_2)$ if R and t for the pair $(u_1, u_2)$ are known.
Case-3: The third case arises from the fact that if $(u_{11}, u_{21}) \in U_{sol}$ gives us a solution R and t, then $(\lambda u_{11}, \lambda u_{21})$ gives us a solution R and $t/\lambda$. We prove this fact in lemma (19) of appendix (C).
Because of the first two cases of non-uniqueness of a pose solution, we have thirty-two pose solutions in all. Accounting for the third case as well, we cannot estimate the translation vector beyond a non-zero scaling. Cases 1 and 2 can be resolved through some point correspondences, or, as we show next, through some observations. We show next how one can eliminate all but one solution.
Solution to case-1 and case-2: These two problems can be worked out using some point correspondences. Ideally, a single point correspondence should be enough to select the true solution, but the one point correspondence we have might be realized by more than one solution for R and t. Unfortunately, to the best of our knowledge, there is no one-shot way of selecting the right discriminating point correspondence. Additionally, the main focus of this thesis being minimal correspondences, we look for other ways of fixing one solution from the set of solutions. We have tested our approach on synthetic data, designed to model a real world scenario as closely as possible; for this we have used the epipolar geometry toolbox, [17], in MATLAB. The point which is taken care of is that the circle imaged by both cameras is wholly in front of the cameras. In other words, if c is a camera center, $\pi$ is the image plane and x is a point on the circle, then c and x are points on different sides of the plane $\pi$. This eliminates sixteen of the thirty-two solutions. The procedure is outlined next.
Condition for the scene conic to lie in front of the camera:
Consider a plane pair $(u_{11}, u_{21})$ and the circles C and C' in these two planes. Let the centers of the two circles be $x_{c1}$ and $x_{c2}$, as in step-3 above. Writing the defining vectors of the two planes as $u_{11} = m_1[1\ n_2\ n_3]^T$ and $u_{21} = m_1'[1\ n_2'\ n_3']^T$, lemma (2) fixes $n_2$, $n_3$ for $u_{11}$ and $n_2'$, $n_3'$ for $u_{21}$, which means that the factors $m_1$ and $m_1'$ scale the centers $x_{c1}$ and $x_{c2}$ respectively. Hence we need the scales to be such that $m_1x_{c1}$ and $m_1'x_{c2}$ lie in front of the first camera. This condition gives a possible range for the scaling factors: either the range consists of positive real values or of negative ones, based on which we eliminate one half of the pose solutions in $U_{sol}$.
The second observation is that, for scenarios with small rotation angles, the geodesic distance of the rotation matrix from the identity matrix is least for the specific pose solution which is the best approximation to the true solution. This hypothesis has been tested extensively on synthetic datasets. The distance metric used is the geodesic metric, [18]:
$$d(R, I_3) = \sqrt{\mathrm{trace}(L^TL)}, \quad \text{where } L = \frac{\arccos\!\left(\frac{\mathrm{trace}(R)-1}{2}\right)}{2\sin\!\left(\arccos\!\left(\frac{\mathrm{trace}(R)-1}{2}\right)\right)}\,(R - R^T), \tag{3.35}$$
and R is the rotation matrix whose geodesic distance from the identity matrix, denoted $d(R, I_3)$, is to be estimated.
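In code, equation (3.35) reduces to computing $\sqrt{2}$ times the rotation angle; a small sketch (ours):

```python
import numpy as np

def geodesic_distance_from_identity(R):
    """d(R, I3) = ||log R||_F, via equation (3.35)."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return 0.0
    L = theta * (R - R.T) / (2 * np.sin(theta))   # matrix logarithm of R
    return np.sqrt(np.trace(L.T @ L))             # equals sqrt(2) * theta
```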
Case-3 solution: Once R is known, all that is left is to find the correct value of t. Hartley in [1] discusses the case of epipolar geometry under pure translation. From section (2.4) we can see that scaling t would scale F correspondingly, and thus the point mapping between the two images would stay the same. Hence we cannot estimate the scale of t simply by using point correspondences in the two images; the translation can be determined only up to a non-zero scalar multiple.

3.5 Experiments
This section contains results of experiments we have performed on synthetic as well as real data for the geometric approach to pose estimation proposed in this chapter.

3.5.1 Experiments for geometric approach on synthetic data

The synthetic dataset has been designed using the epipolar geometry toolbox, [17]. Without going into the details of the process, note that a scene circle is first chosen in $\mathbb{R}^3$. The calibration matrix K is assumed to be the identity matrix. The projection matrix $P_1$ is the same for all examples, with the first camera assumed to be at the origin of the world coordinate system. The projection matrix of the second camera, $P_2$, is chosen randomly, starting with smaller rotation angles and progressively using larger ones. One such dataset, and the solution obtained through our algorithm, is described next.

Discussion on an experiment on a synthetic dataset for the geometric approach
For this dataset, we estimate all possible thirty-two distinct pose solutions R and t. Using the points on non-uniqueness of solutions discussed previously, we select two pose solutions, shown in figure (3.4), where the true camera positions are shown in green and yellow. The first camera is shown in green. For the second camera (shown in yellow), the rotation matrix $R_{true}$ is defined through Euler angles about the coordinate axes as 8° about the z axis, 10° about the y axis and 0° about the z axis. The translation vector is set to $t_{true} = [1\ 11\ 1]^T$.

Figure 3.4: Pose solution.

Let $R_1$, $t_1$ and $R_2$, $t_2$ be the two best pose solutions selected through our algorithm. The camera for pose solution $R_1$, $t_1$ is shown in blue, and almost coincides with the true pose of the second camera; the camera for pose solution $R_2$, $t_2$ is shown in black.
The departures of the rotation matrices of these two solutions and of the true solution from the identity matrix are
$$d(R_{true}, I_3) = 0.3515, \quad d(R_1, I_3) = 0.3516 \ \& \ d(R_2, I_3) = 1.8472.$$
The distances are based on the geodesic distance between two points $R_1$ and $R_2$ in the group SO(3), [18]:
$$d(R_1, R_2) = \|\log(R_1^TR_2)\|_F,$$
where $R_1$ and $R_2$ are two rotation matrices.
Table 3.1: Results of the single stage geometric approach on synthetic datasets. Here $R_{true}$ and $t_{true}$ denote the true values, and R and t denote the pose solution selected by the geometric approach.

| Euler angles (x, y, z) | $t_{true}$ | Angle between t and $t_{true}$ (deg) | Geodesic distance of R from $R_{true}$ | Recovered t | Geodesic distance of R from $I_3$ | Is the selected solution the one with smallest geodesic distance from $I_3$? |
|---|---|---|---|---|---|---|
| 10°, 0°, 0° | [0.5, 0.1, 0.1] | 0.2428 | 2.1 × 10⁻⁴ | [0.50097, 0.1020, 0.0989] | 0.2469 | yes |
| 10°, 20°, 0° | [0.7, 0.1, 1] | 0.0028 | 7.9 × 10⁻⁶ | [0.6993, 0.0999, 1.0000] | 0.5513 | yes |
| 0°, 10°, 5° | [1, 3, 0.1] | 0.0080 | 2.3 × 10⁻⁴ | [0.9993, 2.9993, 0.1000] | 0.2760 | yes |
| 1°, 10°, 8° | [1, 11, 1] | 0.0321 | 1.3 × 10⁻⁴ | [0.9960, 11.0096, 1.0049] | 0.3154 | yes |
| 30°, 0°, 0° | [0.1, 1, 3] | 0.8615 | 0.1666 × 10⁻⁴ | [0.0914, 1.0642, 3.0429] | 0.7239 | yes |
| 1°, 30°, 80° | [0.0891, 0.0980, 0.0178] | 0.0264 | 3.9 × 10⁻⁴ | [0.0900, 0.9853, 0.1790] | 2.093 | no |
One more point to note is that we estimate the translation vector only up to a non-zero scalar multiple. Hence, for visualization purposes in figure (3.4), we scale it by the same scalar which scales the true translation of the second camera. For this case we select $R_1$, $t_1$ as the best possible pose solution, based on the observation that this solution has its rotation matrix closest to the identity matrix in the geodesic sense.
This was one experiment on a synthetic dataset for our proposed approach. We perform similar experiments on more synthetic datasets and tabulate the results in table (3.1). The results justify the observations stated in the solutions to cases two and three. For rotation angles which are small enough, the pose solution chosen with the smallest geodesic distance from the identity matrix is the best approximation to the true value. But for substantially large angles, as seen in the last row of table (3.1), the geodesic distance is quite large and definitely not the smallest among all solutions in $U_{sol}$. This poses the challenge of finding and fixing a threshold such that, for all datasets where the geodesic distance of the true rotation matrix from the identity matrix is within this threshold, the estimated rotation matrix with the smallest geodesic distance from the identity matrix also has the smallest geodesic distance from the true value.

3.5.2 Experiments for geometric approach on real data

We have used a Canon DSLR camera for the real data experimentation. We calibrate the camera using the MATLAB toolbox by Bouguet, [19]. The calibration matrix retrieved is
$$K = \begin{bmatrix} 3565.00387 & 0 & 968.51381 \\ 0 & 3559.46384 & 636.14655 \\ 0 & 0 & 1 \end{bmatrix}.$$
The captured images have dimensions 1920 × 1280 pixels. The skew is zero, the pixels being completely square. The errors in the focal lengths are $[15.04591\ 14.79244]^T$ and those in the principal point are $[12.16201\ 15.07881]^T$. The estimated distortion coefficients are
$$kc = [0.11590\ \ 1.97105\ \ 0.00693\ \ 0.00054\ \ 0.00000]^T \pm [0.01742\ \ 0.25227\ \ 0.00145\ \ 0.00138\ \ 0.00000]^T,$$
where kc(1), kc(2) and kc(5) are the radial distortion coefficients and kc(3) and kc(4) are the tangential distortion coefficients, [19] and [20]. Pixel errors in image points through reprojection are $[0.52954\ 0.61128]^T$.
In figures (3.5) and (3.6) we have two images containing conics as projections of a scene circle. The two conics $C_1$ and $C_2$ are detected up to a root mean square error of 0.0086. Table (3.2) lists the pose solution obtained for the images in figures (3.5) and (3.6). The ground truth values $R_{true}$ and $t_{true}$ are estimated through the calibration tool, [19], in the form of extrinsic parameters⁵. With errors present in the estimated calibration matrix and distortion in the camera, we expect the ground truth values thus estimated to be inaccurate. Hence the plane $\pi_w$ that contains the translation vector, assumed to be defined by the vector $w = [1\ 5\ 2.74233822]^T$, is not very accurate, which leads to errors in the estimated pose with respect to the ground truth values.

⁵In fact one can see that we have a calibration pattern in the two images along with the conics. The set of images used for camera calibration contains these two images as well.

Figure 3.5: First test image, containing conic $C_1$.
Figure 3.6: Second test image, containing conic $C_2$.
Table 3.2: Results of the single stage geometric approach on the real dataset.

| Geodesic distance of $R_{true}$ from $I_3$ | $t_{true}$ | Angle between t and $t_{true}$ (deg) | Geodesic distance of R from $R_{true}$ | Recovered t | Geodesic distance of R from $I_3$ | Is the selected solution the one with smallest geodesic distance from $I_3$? |
|---|---|---|---|---|---|---|
| 0.4810 | [743.7650, 130.3833, 508.9385] | 36.2930 | 1.0580 | [0.50097, 0.1020, 0.0989] | 0.8480 | yes |

At this point it seems quite plausible that errors in the calibration alone account for the errors in the estimated pose. In fact, we would like to point out that the best pose solution for the above real data is still quite far from the true values. To further investigate this, we carry out a related experiment with a part real, part synthetic dataset.

3.5.3 Experiment of the geometric approach on a part real, part synthetic dataset

We assume that the first image, with conic $C_1$, has been obtained through a camera with calibration matrix K by imaging the scene circle obtained as one of the two plane solutions of lemma (2) for $K^TC_1K$. Here $C_1$ is the conic in the local image coordinate system, so $K^TC_1K$ is its calibrated counterpart. Using the epipolar geometry toolbox, [17], we project the scene circle back onto the second camera's image plane, with K as its calibration matrix. The conic $C_{2syn}$ thus obtained does not coincide with the conic $C_2$ detected in the previous experiment with the real dataset. Figure (3.7) depicts the two conics, $C_2$ (shown in red) and $C_{2syn}$ (shown in green); the points are plotted in the local image coordinate system.

Table 3.3: Result on the part real dataset for investigating the error due to an erroneous calibration matrix.

| Geodesic distance of $R_{true}$ from $I_3$ | $t_{true}$ | Angle between t and $t_{true}$ (deg) | Geodesic distance of R from $R_{true}$ | Recovered t | Geodesic distance of R from $I_3$ | Is the selected solution the one with smallest geodesic distance from $I_3$? |
|---|---|---|---|---|---|---|
| 0.4810 | [743.7650, 130.3833, 508.9385] | 0.0168 | 0.0015 | [7.4437, 1.3033, 5.0907] | 0.4813 | yes |

Figure 3.7: Difference between the two conics of the real and synthetic datasets.
With this new dataset, we run our algorithm and select the best solution, tabulated in table (3.3). If we continue the assumption of the previous section, that $C_2$, with the other parameters kept the same, gives us the same pose solution, we have
$$K^TC_2K/\lambda_{syn} = K^TC_{2syn}K/\lambda, \qquad \lambda, \lambda_{syn} \neq 0,$$
from equation (3.5). Hence $C_2$ and $C_{2syn}$ would represent the same conic, which has been found not to be true (as evident from figure (3.7)). Either the calibration matrix or the ground truth values for the pose are not accurate, or the conics $C_1$ and $C_2$ have erroneous representation matrices. But the conic detection algorithm has errors of the order of $10^{-3}$, which can be considered negligible, and R and t as estimated through the toolbox, [19], give us pixel errors of the order of 0.1. This leaves the calibration matrix, which has substantial errors, of the order of 10 in each of its elements. Added to these errors is the distortion, which is not included in the calibration matrix, giving us incomplete rectification while

constructing the conic representations in the camera coordinate system. One can deduce from this discussion that the primary reason for the substantial error in the pose solution for the real dataset is the error in the calibration process we have employed through the calibration toolbox. It is worth noting, however, that on synthetic datasets our algorithm gives more accurate results than the conventional optimization process.

3.6 Summary

This chapter forms the core of the thesis. We start with the two equations derived earlier, which relate the relative pose to a conic correspondence. Based on these two equations, we devise a geometric construction in an epipolar geometry framework, simplified by two important assumptions regarding the scene conic and the plane containing the true translation vector. The geometric approach thus proposed is tested on both synthetic and real datasets. The results obtained are compared, analyzed and discussed in order to explain the performance of our proposed method. In the next chapter, (4), we consider two alternate approaches to pose estimation from one conic correspondence. These approaches differ from the geometric method taken up in this chapter in the way the pose solution is estimated: as against the geometric approach, they are based on optimization of cost functions appropriately modeled on the equations that relate the pose, R and t, to the elements of the epipolar geometry, H, e, $C_1$, $C_2$.

Chapter 4
Alternate approaches to pose estimation

This chapter describes two techniques for pose estimation which we have considered at certain points, but whose results have not been as good as those obtained with the geometric approach discussed and reported in chapter (3). The first technique is based on the same set of equations on which the geometric approach is based, which means the two assumptions defined in section (3.2) of chapter (3) also hold here; but we estimate the pose through a conventional optimization scheme instead of the geometric construction. This is described in the next section, (4.1). The second approach reported here is based on a different idea, which can be seen as loosely based on the work of Higgins, [5], Zhang, [7] and Luong, [8]: we employ optimization of a cost function modeled on one conic correspondence and one point correspondence. The optimization schemes have been either the gradient descent method, implemented through calculation of gradient vectors, or MATLAB's inbuilt methods like lsqnonlin(.). The results of both implementations are comparable; hence in section (4.1.1) we report results of experiments for the first approach through gradient descent.

4.1 Estimating R and t through optimization

The equations which define the dependence of R and t on the conics $C_1$, $C_2$ and the scene plane are
$$(R - tu^T)^TK^TC_2K(R - tu^T) - \lambda K^TC_1K = 0_{3\times3}, \qquad w^Tt = 0, \tag{4.1}$$
where u, $C_1$, $C_2$, w and K are known and R, t and $\lambda$ are to be estimated. For the sake of brevity we write $C_1' = K^TC_1K$ and $C_2' = K^TC_2K$. From this we define the following cost function:


E(Y, t, ) =k (Y tu T )T C20 (Y tu T ) C10 k2F +(w T t)2 + k Y T Y I3 k2F +(det(Y ) 1)2 .
(4.2)

This allows us to use the above lemma (2) for $C_1'$, giving us the vector u up to a scalar multiple. Hence, from equation (3.6), we can consider u a known constant, and have to estimate all elements of t. Vector w being constant, the unknown variables are Y, t and $\lambda$. The norm for matrices considered here is the Frobenius norm:
$$\|A\|_F = \sqrt{\mathrm{trace}(A^TA)}.$$

We have replaced the rotation matrix R with a real matrix Y and the additional constraints $Y^TY = I_3$ and $\det(Y) = 1$. The cost function has been optimized through the command lsqnonlin(.) in MATLAB, [21]; a sketch of an equivalent setup is given below. Results of sample experiments with this approach are listed in section (4.1.1). With a random starting point, the behavior of the algorithm is as expected for a conventional optimization technique: after a certain value of the cost function is achieved, the algorithm tends to get stuck in a local minimum, and the final value achieved upon convergence depends on the starting point. For these reasons it is practically infeasible to estimate a unique solution in the form of a global minimum of the cost function. This is evident from the results listed in table (4.1) of section (4.1.1): with a starting point close to the true value, the algorithm converges to a solution which is considerably close to the true value, but with a starting point considerably far from the true values, the point reached upon convergence is far from the true solution.
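An equivalent setup to the lsqnonlin(.) call, written here with SciPy's least_squares as a stand-in (our sketch; the variable packing and names are ours), poses the cost (4.2) as a residual vector whose squared norm is E:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, C1p, C2p, u, w):
    """Residual vector whose squared norm equals E(Y, t, lambda) of (4.2)."""
    Y = params[:9].reshape(3, 3)
    t, lam = params[9:12], params[12]
    Z = Y - np.outer(t, u)
    G = Z.T @ C2p @ Z - lam * C1p            # conic-correspondence constraint
    return np.concatenate([G.ravel(),        # gives ||G||_F^2
                           [w @ t],          # gives (w^T t)^2
                           (Y.T @ Y - np.eye(3)).ravel(),  # orthonormality
                           [np.linalg.det(Y) - 1.0]])      # det(Y) = 1

def solve_pose(C1p, C2p, u, w, Y0, t0, lam0=1.0):
    x0 = np.concatenate([Y0.ravel(), t0, [lam0]])
    res = least_squares(residuals, x0, args=(C1p, C2p, u, w))
    return res.x[:9].reshape(3, 3), res.x[9:12], res.x[12]
```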
One can perform the optimization by explicit gradient descent as well. Writing $Z = Y - tu^T$ and $G = Z^TC_2'Z - \lambda C_1'$, the gradient vectors $\frac{\partial E(Y,t,\lambda)}{\partial Y}$, $\frac{\partial E(Y,t,\lambda)}{\partial t}$ and $\frac{\partial E(Y,t,\lambda)}{\partial \lambda}$ are:
$$\frac{\partial E(Y,t,\lambda)}{\partial Y} = 4C_2'ZG + 4Y(Y^TY - I_3) + 2(\det(Y) - 1)\det(Y)Y^{-T},$$
$$\frac{\partial E(Y,t,\lambda)}{\partial t} = -4C_2'ZGu + 2(w^Tt)w,$$
$$\frac{\partial E(Y,t,\lambda)}{\partial \lambda} = 2\lambda\,\mathrm{trace}(C_1'^2) - 2\,\mathrm{trace}(C_1'Z^TC_2'Z). \tag{4.3}$$
The fully expanded forms of these expressions do not have a concise representation in matrix form, but they can be obtained using symbolic computation toolboxes like MATLAB or Maple, or derived through some tedious matrix algebra.
The derivations of the above partial derivatives are omitted, for they are quite straightforward and elementary; properties of the matrix trace are used to simplify the expressions, resulting in the above equations. The parameters Y, t and $\lambda$ can now be iteratively updated through the gradients given in equation (4.3); a bare-bones sketch of such an update is given below. We do not give the exact algorithm here, as it would be quite straightforward and not of importance. As a matter of fact, the performance is almost the same as that obtained through optimization by the lsqnonlin(.) method.
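For completeness, one fixed-step gradient-descent update using the compact gradients of equation (4.3) could look as follows (our sketch; the step size and any stopping rule are arbitrary choices left to the user):

```python
import numpy as np

def gradient_step(Y, t, lam, C1p, C2p, u, w, lr=1e-3):
    """One fixed-step gradient-descent update of (Y, t, lambda) for the cost (4.2)."""
    Z = Y - np.outer(t, u)
    G = Z.T @ C2p @ Z - lam * C1p
    dY = (4 * C2p @ Z @ G
          + 4 * Y @ (Y.T @ Y - np.eye(3))
          + 2 * (np.linalg.det(Y) - 1) * np.linalg.det(Y) * np.linalg.inv(Y).T)
    dt = -4 * C2p @ Z @ G @ u + 2 * (w @ t) * w
    dlam = 2 * lam * np.trace(C1p @ C1p) - 2 * np.trace(C1p @ Z.T @ C2p @ Z)
    return Y - lr * dY, t - lr * dt, lam - lr * dlam
```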

4.1.1 Results and discussion

For testing purposes we run the optimization on the cost function of equation (4.2) using the optimization toolbox of MATLAB; the specific function used is lsqnonlin(.), [21]. Table (4.1) lists the results of sample experiments.

Table 4.1: Results of the gradient descent approach on synthetic data. Here $R_{init}$ and $t_{init}$ denote starting points, $R_{true}$ and $t_{true}$ denote true values, and R and t denote the pose solution obtained upon convergence.

| Angle between $t_{init}$ and $t_{true}$ (deg) | $\|R_{true} - R_{init}\|_F$ | $t_{true}$ | $\|R - R_{true}\|_F$ | Recovered t | Angle between t and $t_{true}$ (deg) | Cost function upon convergence |
|---|---|---|---|---|---|---|
| 11.6901 | 2.5935 | [0.5, 0.1, 0.1] | 2.9376 | [15.6250, 0.7307, 4.7318] | 10.1486 | 0.7641 |
| 12.5894 | 2.1696 | [0.5, 0.1, 0.1] | 2.7815 | [15.8957, 0.4495, 5.0122] | 11.3576 | 0.7361 |
| 60.9918 | 1.1672 | [0.5, 0.1, 0.1] | 0.3382 | [1.2964, 1.6602, 1.5375] | 62.0119 | 0.0016 |
| 3.2769 | 1.5701 | [0.5, 0.1, 0.1] | 0.5440 | [4.4460, 0.9152, 0.9110] | 0.4041 | 0.1697 |
| 4.4206 | 1.7508 | [0.5, 0.1, 0.1] | 0.5583 | [15.8957, 0.4495, 5.0122] | 5.3466 | 0.1555 |


We test with different starting points and find similar behavior across them. We select certain starting points which are far from the ground truth values (as can be seen in the first three rows of table (4.1)) and certain points which are close to the ground truth values, as in the experiments of rows four and five. For starting points quite far from the true values, the algorithm converges to a nearby minimum which is just as far from the true value; e.g. when the starting point for t is 60° from the true value, the translation vector obtained upon convergence is 62° from the true value, yet the cost achieved upon convergence (0.0016) is small enough to be considered a global minimum. From this observation we infer that, through a purely optimization-based scheme, it is not feasible to estimate all possible pose solutions.
The function lsqnonlin(.) of optimization toolbox in MATLAB has the option of
two types of optimization algorithms inherently. One is the Levenberg Marquardt
algorithm, [22] and the other is the trust region method. These two vary in a manner which is not quite important to our problem at hand. But what is crucial is
the fact that the these algorithms dont always converge to the global minima or
even if they do, one can never ascertain fully how many distinct points of global
minima our cost function can attain. A second problem here is that the cost function which we are attempting to solve is a set of thirteen polynomials in thirteen
variables. Theoretically such a solution set has multiple solutions and through
such a pure optimization approach, it is not feasible to estimate all possible pose
solutions.
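For reference, a minimal sketch of the lsqnonlin(.) call we describe, assuming a hypothetical helper poseResiduals(.) that stacks the thirteen polynomial residuals into a vector:

    x0 = [Rinit(:); tinit; lambdaInit];    % stack the starting point into one vector
    opts = optimset('Algorithm', 'levenberg-marquardt', 'Display', 'off');
    xSol = lsqnonlin(@poseResiduals, x0, [], [], opts);   % returns a local minimizer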

4.2 Multi-stage approach to pose estimation: a comparison
Another approach to which we have given some thought is based on a two-stage dependence of R and t on point and conic correspondences. This relationship rests on the property of the fundamental matrix that defines a point correspondence between two image planes π1 and π2 as

a ↔ b ⟺ b^T F a = 0, a ∈ π1, b ∈ π2.   (4.4)

A fundamental matrix can be decomposed as F = [e]_× H, as introduced in section (2.4). Thus, given n point correspondences {a_i ↔ b_i} as defined above, one can think of minimizing the error

f(F) = Σ_{i=1}^{n} (b_i^T F a_i)².
This gives us the fundamental matrix F, from which we obtain the essential matrix E = K^T F K under the assumption that the calibration matrix K is known. Once E is known, R and t can be estimated through the relative orientation algorithm suggested by Hartley in [10]. This has led to a well-studied and successfully implemented method of pose estimation from point correspondences. Theoretically, seven point correspondences of this form are sufficient to estimate F. The points involved need to be in general position; by general position we mean that no three points lie on the same line in either of the two planes¹. Similar to these approaches, we suggest a method for estimating F from point and conic correspondences. To begin with, let us consider one point correspondence,

a ↔ b, a ∈ π1, b ∈ π2,   (4.5)

and one conic correspondence,

C1 ↔ C2, C1 ∈ π1, C2 ∈ π2.

Let us have a scene conic C in plane π, π being a scene plane. Then C1 and C2 are images of C by the two cameras on image planes π1 and π2 respectively, thus defining the above-mentioned conic correspondence. The two cameras image the same scene plane, and hence there exists a homography between the two image planes, constructed by point transfer between the two image planes through π. We have defined this point transfer in section (2.1.1). Projective invariance of conics implies that the same homography ought to transform C1 into C2. If this homography mapping is denoted by H, we have

H^T C2 H = C1.   (4.6)

This equation introduces a constraint on H in the form of a zero set of six homogeneous polynomials in nine homogeneous variables², the variables being the elements of a vector h. Let h = vec(H), where vec(.) is the usual vector operation of linear algebra that transforms an n × n matrix into the n²-dimensional vector formed by stacking up the columns of the matrix. Since the conic representations are homogeneous, equation (4.6) is thus transformed into a set of five polynomials, given next:

f : R⁹ → R⁵ : f(h) = (h^T S1 h, h^T S2 h, h^T S3 h, h^T S4 h, h^T S5 h)^T = 0_{5×1},   (4.7)
where the S_i are nine-dimensional quadrics, i.e. real symmetric 9 × 9 matrices, built from 3 × 3 blocks (0_{3×3} denoting the zero block):

S1 = [ C2, 0_{3×3}, 0_{3×3} ; 0_{3×3}, 0_{3×3}, 0_{3×3} ; 0_{3×3}, 0_{3×3}, −p1 C2 ],

S2 = [ 0_{3×3}, C2/2, 0_{3×3} ; C2/2, 0_{3×3}, 0_{3×3} ; 0_{3×3}, 0_{3×3}, −p2 C2 ],

S3 = [ 0_{3×3}, 0_{3×3}, C2/2 ; 0_{3×3}, 0_{3×3}, 0_{3×3} ; C2/2, 0_{3×3}, −p3 C2 ],

S4 = [ 0_{3×3}, 0_{3×3}, 0_{3×3} ; 0_{3×3}, C2, 0_{3×3} ; 0_{3×3}, 0_{3×3}, −p4 C2 ],

S5 = [ 0_{3×3}, 0_{3×3}, 0_{3×3} ; 0_{3×3}, 0_{3×3}, C2/2 ; 0_{3×3}, C2/2, −p5 C2 ],   (4.8)

with scalars p1, ..., p5 fixed by the entries of C1.

¹ In fact, a set of three collinear points in one plane would invariably be mapped to three collinear points in the other plane.
² The conic and homography representations are in homogeneous coordinates, due to which we estimate H up to a non-zero scalar multiple.

These matrices define the five polynomials of equation (4.7), which put five independent constraints on H. The condition that the vector h is determined only up to a non-zero scalar multiple lets us fix its scale through one more polynomial constraint on h, defined as:

f6(h) = (h^T h − 1)² = 0.   (4.9)
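As an illustration, a minimal MATLAB sketch for evaluating the constraints f(h) = 0 of equation (4.7) and f6(h) = 0 of equation (4.9) for a candidate homography H; it assumes the five 9 × 9 matrices of equation (4.8) have already been assembled into a cell array S following the block structure above:

    h = H(:);                        % h = vec(H): stack the columns of H
    fvals = zeros(6,1);
    for i = 1:5
        fvals(i) = h' * S{i} * h;    % the quadrics of equation (4.7)
    end
    fvals(6) = (h'*h - 1)^2;         % scale constraint of equation (4.9)
    % h is feasible when norm(fvals) is numerically zero.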

We saw earlier in this section that a point correspondence is defined through a fundamental matrix F as in equation (4.4), and that F can be decomposed as [e]_× H, H being the homography between π1 and π2 via the scene plane π. We can thus rewrite the point correspondence a ↔ b defined in equation (4.4) as

a ↔ b ⟺ b^T [e]_× H a = 0.

This gives us a constraint, termed by Hartley the epipolar constraint in [1], which we write as our seventh polynomial in e and H:

f7(e, H) = (b^T [e]_× H a)² = 0,   (4.10)

where e is the epipole in image plane π2. The epipole is a point in the projective plane but not at infinity. The representation of e used in equation (4.10) is unique up to a non-zero scalar multiple, so we add one more constraint on e to fix its scale, defined as the zero set of the polynomial

f8(e) = (e^T e − 1)² = 0.   (4.11)

The eight polynomials defined in equations (4.7), (4.9), (4.10) and (4.11) put eight constraints on the twelve parameters of H and e. With this accounting we need four more constraints; without them a unique pose solution is not possible. A way out is to add more point correspondences, for each point correspondence contributes one polynomial constraint, and correspondences between points in general position give algebraically independent constraint polynomials. With exactly five point correspondences we have a fully determined polynomial system. Kahl and Heyden estimate the fundamental matrix through five point correspondences and one conic correspondence in [11]. However, even when the system of polynomials we have described is fully determined, each of the polynomials is quadratic in twelve variables; hence we would not get a unique solution, and so not necessarily the correct solution, from an arbitrary starting point. Secondly, we have made certain assumptions to simplify the epipolar geometry, and our primary focus is on minimal feature correspondences. For these two reasons we do not go beyond one point correspondence here, but we note that more point correspondences can be integrated into this approach with minimal effort.

4.2.1 Optimizing the cost function

Similar to the first approach, discussed in section (4.1), we can implement this approach in two ways. One is to use the MATLAB function lsqnonlin(.) to optimize the cost function obtained as a sum of squares of the polynomials in equations (4.7), (4.9), (4.10) and (4.11):

E(e, H) = (b^T [e]_× H a)² + (h^T h − 1)² + (e^T e − 1)² + ‖H^T C2 H − C1‖²_F.   (4.12)

A second way is an algorithm that keeps in mind the specific nature of the function f of equation (4.7). Here the cost function is a sum of squares of the polynomials in equations (4.9), (4.10) and (4.11),

E(e, H) = (b^T [e]_× H a)² + (h^T h − 1)² + (e^T e − 1)².   (4.13)
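A minimal sketch of the cost of equation (4.12) as a MATLAB function handle; skew(.) is a hypothetical helper returning the 3 × 3 cross-product matrix [e]_× of a 3-vector, and a, b, C1, C2 are assumed to be in scope:

    costE = @(e, H) (b' * skew(e) * H * a)^2 ...   % epipolar term, eq. (4.10)
        + (H(:)'*H(:) - 1)^2 ...                   % scale of h, eq. (4.9)
        + (e'*e - 1)^2 ...                         % scale of e, eq. (4.11)
        + norm(H'*C2*H - C1, 'fro')^2;             % conic term, eq. (4.6)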

This algorithm primarily aims to minimize the above cost function under the condition that h satisfies f(h) = 0. As proved in lemma (17) of appendix (C), the zero set of the polynomial function f defines a fourth-order manifold in R⁹. Let this manifold be denoted M_X. Our gradient descent algorithm then confines the movement of h to M_X. In summary, we choose an initial point on M_X (say h_initial) and estimate the gradient vector of the cost function with respect to h at h_initial. We then project the estimated gradient vector onto the tangent space of M_X at h_initial, and update h along the projected vector. Such an approach minimizes the cost function E(e, H) while simultaneously imposing on h the constraint f(h) = 0, i.e. that h lies in M_X. The gradients of E(e, H) with respect to e and H (or h = vec(H)) are obtained as:

∂E(e, H)/∂e = 2[Ha]_× b b^T [Ha]_× e + 2(e^T e − 1)e,

∂E(e, H)/∂h = 2(b^T [e]_× H a) [a_1 [e]_× b ; a_2 [e]_× b ; a_3 [e]_× b] + 2(h^T h − 1)h,   (4.14)

where a = [a_1, a_2, a_3]^T. The proposed algorithm is listed below.


Initialization: Let n = 0. We set timesteps t_e for the update in e and t_h for the update in h. It is possible to let the timesteps change dynamically with the magnitude of the gradient vectors, but for the time being we take them to be constants. A threshold for the cost function value, tol_cost, is preset. The starting point for e is a random vector, while the starting point for H is chosen to lie on M_X, h_0 ∈ M_X.

Algorithm:

1. Update e as e_{n+1} = e_n − t_e ∂E(e, H)/∂e |_{e=e_n, H=H_n}.

2. Project ∂E(e, H)/∂h |_{e=e_{n+1}, H=H_n} onto the tangent space of M_X at the point h_n = vec(H_n). Let the projected vector so obtained be ∆h.

3. Compute the geodesic with starting point h_n and starting vector ∆h. The updated value of h is the endpoint of the geodesic, say h_{n+1}, and H_{n+1} is obtained as H_{n+1} = vec^{−1}(h_{n+1}). The geodesic computation has been implemented along the lines of Dedieu and Nowicki, [23].

4. If E(e_{n+1}, H_{n+1}) < tol_cost, stop; the solution is e = e_{n+1} and H = H_{n+1}. Else increment n by one and repeat steps one through four.
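A minimal MATLAB sketch of this loop, using the costE handle of equation (4.13) style and assuming hypothetical helpers gradE_e and gradE_h for the gradients of equation (4.14), projectTangent for the projection onto the tangent space of M_X, and geodesicStep for the geodesic update of step three:

    e = randn(3,1);                        % random starting point for the epipole
    h = h0;                                % starting point chosen on M_X
    te = 1e-4; th = 1e-4; tolcost = 1e-8;
    maxIter = 1e5; n = 0;
    while costE(e, reshape(h,3,3)) >= tolcost && n < maxIter
        e  = e - te * gradE_e(e, h);       % step 1: gradient update of e
        g  = gradE_h(e, h);                % gradient at (e_{n+1}, H_n)
        dh = projectTangent(g, h);         % step 2: project onto tangent space
        h  = geodesicStep(h, -th * dh);    % step 3: move along the geodesic on M_X
        n  = n + 1;
    end
    H = reshape(h, 3, 3);                  % recovered homography, h = vec(H)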

4.2.2 Results and discussion

Both of these methods estimate e and H, and hence F, from one point and one conic correspondence. Unlike the approach of section (4.1), we do away with the two assumptions that the scene conic is a circle and that the translation plane is known. As stated previously, R and t can then be found via an SVD decomposition of E = K^T F K. The problem here is that, the polynomial system being underdetermined, the solution obtained through optimization varies with the starting point and may not be the true one. We have implemented these two optimization tasks on synthetic datasets, but the results aren't promising enough to be listed here. The reason we have presented this approach nonetheless is that there is some intuition in the idea. We saw previously that M_X is a polynomial manifold and can be seen as an intersection of five quadrics in R⁹. Their defining matrices, S1, ..., S5, are 9 × 9 symmetric matrices with a special structure. This fact opens up a new way of looking at the optimization task: if the geometric structure of this quadric intersection is studied in detail, it may be possible to devise an improved optimization algorithm which gives more accurate pose solutions. Further, the intersection is a non-linear set of points in R⁹; hence an important point, we believe, is that by identifying the subsets of this intersection which are linear sets, we can simplify the optimization.

4.3 Summary

This chapter introduced two alternative ways of estimating pose from one conic correspondence. The first approach, section (4.1), makes two assumptions, under which one conic correspondence is enough for pose estimation. The second approach, section (4.2), drops the two assumptions and hence needs additional point correspondences, five in all for a fully determined system. For both approaches we design cost functions which are optimized either through MATLAB's optimization toolbox or by gradient descent with explicit computation of the gradient vectors. A point common to both approaches is that the optimization tasks fail to converge to the true solution point; for the first approach we justify this with experiments on synthetic data.


Chapter 5

Conclusion and future work


In conclusion, we note that the geometric approach for pose estimation from one conic correspondence gives accurate pose solutions, with errors of the order of 10⁻⁴. The idea rests on two important assumptions: that the scene conic is a circle, and that the translation vector lies in a known plane. Under these assumptions the geometry is highly simplified, which lets us employ the computation toolbox in MATLAB to solve the simplified set of polynomials and thereby estimate the finite set of all possible pose solutions. Next, we observe that the pose solution whose rotation matrix is closest to the identity matrix is the best approximation to the true value. This observation helps in selecting one particular pose solution as the final answer to the pose estimation problem. With experiments on synthetic data we show that the observation holds for rotation matrices close enough to the identity matrix; for larger distances it fails. This raises an important question which can form part of future work: can a threshold be computed analytically such that the observation holds whenever the distance of the rotation matrix from the identity is below it? If not, we need another way to select one solution out of the finite set of pose solutions estimated through our geometric approach. Another sure way is to use one point correspondence; but as mentioned in chapter (3), we would need a point correspondence realized by only one pose among all the solutions, and such a selection does not seem possible in general, at least to our knowledge. The search for a universal method to pick one pose solution is thus still an open problem.
Secondly, the results on the real dataset have been marred by inaccuracies in camera calibration. We haven't traced the source of the error, but we have shown that the error in the pose estimated through the geometric approach is solely due to the error in camera calibration. This suggests that with highly accurate camera calibration data, the pose obtained for such a real dataset would also be highly accurate.
Additionally, as is common with problems in computer vision, there are many assumptions at work, some important and some trivial. The important ones here have been the two assumptions stated in the opening paragraph. Though they are practically feasible for many scenes, a future line of work is to generalize the assumptions and reconstruct a similar geometry for pose estimation. The circle assumption lets one fix two of the three elements of the normal vector defining the scene plane; similarly, one could consider a general conic and attempt to simplify the epipolar geometry for it.
Lastly, we note that the geometric approach has been an alternative to the conventional optimization approach. The problem of local minima in the optimization approach is demonstrated by sample experiments on synthetic datasets, and this has been the primary motivation for considering a geometric construction for pose estimation. We still take up the optimization-based approaches in the previous chapter (4), for two reasons. Firstly, we demonstrate the above-mentioned shortcoming of optimizing a cost function modeled on the two equations that directly relate the pose R and t to the conic correspondence and the position of the scene plane. Secondly, we formulate a different cost function based on a combination of a point and a conic correspondence; this does not need the two assumptions we employed to simplify the geometry in the geometric approach and in the first alternative optimization approach. Additionally, the solution set is non-linear and, to the best of our knowledge, not convex. But the solution set contains the intersection of five quadrics in the large space R⁹, and these five quadrics have special structures which we might investigate further to simplify the optimization. A detailed study of this quadric intersection is a good line of future work for improving the optimization approach to such a cost function.
To summarize, we note that all of these approaches (the geometric approach and the two alternative approaches) are loosely based on a set of equations defining the pose R, t in terms of various elements of epipolar geometry, including point and/or conic correspondences; the pose is then estimated as a solution to these equations. There have been attempts to estimate pose from a different perspective altogether. To cite a few, Kaminski and Shashua in [24] form a Grassmannian representation of a conic (in general, a curve) in P(E⁴) for epipolar geometry. Similarly, Burdis et al. in [25] consider the problem of establishing correspondence between two curves which are images of the same 3D curve, by considering the groups of projective transformations that leave a curve invariant in a specific sense. These two approaches lend a new meaning to the epipolar geometry of curves and might be extended to the specific problem of estimating pose from a conic (or, generally, a curve) correspondence.


References

[1] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[2] P. Gurdjos, A. Crouzil, and R. Payrissat, "Another way of looking at plane-based calibration: The centre circle constraint," in Computer Vision - ECCV 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, Eds. Springer Berlin Heidelberg, 2002, vol. 2353, pp. 252–266. [Online]. Available: http://dx.doi.org/10.1007/3-540-47979-1_17
[3] Apollonius, Treatise on Conic Sections, T. L. Heath, Ed. New York: Dover, 1961.
[4] R. Haralick, H. Joo, D. Lee, S. Zhuang, V. Vaidya, and M. Kim, "Pose estimation from corresponding point data," IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 6, pp. 1426–1446, 1989.
[5] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, pp. 133–135, Sep. 1981.
[6] R. M. Haralick, C.-N. Lee, K. Ottenberg, and M. Nölle, "Review and analysis of solutions of the three point perspective pose estimation problem," International Journal of Computer Vision, vol. 13, no. 3, pp. 331–356, Dec. 1994. [Online]. Available: http://dx.doi.org/10.1007/BF02028352
[7] Z. Zhang and T. Kanade, "Determining the epipolar geometry and its uncertainty: A review," International Journal of Computer Vision, vol. 27, pp. 161–195, 1998.
[8] Q.-T. Luong and O. D. Faugeras, "The fundamental matrix: Theory, algorithms, and stability analysis," International Journal of Computer Vision, vol. 17, no. 1, pp. 43–75, Jan. 1996. [Online]. Available: http://dx.doi.org/10.1007/bf00127818
[9] R. I. Hartley, "In defense of the eight-point algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 6, pp. 580–593, Jun. 1997. [Online]. Available: http://dx.doi.org/10.1109/34.601246
[10] R. I. Hartley, "Estimation of relative camera positions for uncalibrated cameras," in Proceedings of the Second European Conference on Computer Vision, ser. ECCV '92. London, UK: Springer-Verlag, 1992, pp. 579–587. [Online]. Available: http://dl.acm.org/citation.cfm?id=645305.648678
[11] F. Kahl and A. Heyden, "Using conic correspondences in two images to estimate the epipolar geometry," in Sixth International Conference on Computer Vision, 1998, pp. 761–766.
[12] Q. Ji, M. S. Costa, R. M. Haralick, and L. G. Shapiro, "An integrated linear technique for pose estimation from different geometric features," 1999.
[13] G. Wang, Q. Wu, and Z. Ji, "Pose estimation from circle or parallel lines in a single image," in Computer Vision - ACCV 2007, ser. Lecture Notes in Computer Science, Y. Yagi, S. Kang, I. Kweon, and H. Zha, Eds. Springer Berlin Heidelberg, 2007, vol. 4844, pp. 363–372. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-76390-1_36
[14] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1998.
[15] D. K. Prasad, C. Quek, and M. K. H. Leung, "A precise ellipse fitting method for noisy data," in Proceedings of the 9th International Conference on Image Analysis and Recognition, Volume Part I, ser. ICIAR '12. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 253–260. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-31295-3_30
[16] L. Quan, "Conic reconstruction and correspondence from two views," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, 1996.
[17] G. Mariottini and D. Prattichizzo, "EGT: a toolbox for multiple view geometry and visual servoing," IEEE Robotics and Automation Magazine, vol. 3, no. 12, December 2005.
[18] D. Huynh, "Metrics for 3D rotations: Comparison and analysis," Journal of Mathematical Imaging and Vision, vol. 35, no. 2, pp. 155–164, 2009. [Online]. Available: http://dx.doi.org/10.1007/s10851-009-0161-2
[19] J. Y. Bouguet, "Camera calibration toolbox for Matlab," 2008. [Online]. Available: http://www.vision.caltech.edu/bouguetj/calib_doc/
[20] D. C. Brown, "Decentering distortion of lenses," vol. 32, no. 3, pp. 444–462, 1966.
[21] [Online]. Available: http://www.mathworks.in/help/optim/ug/lsqnonlin.html
[22] J. Moré, "The Levenberg-Marquardt algorithm: Implementation and theory," in Numerical Analysis, ser. Lecture Notes in Mathematics, G. Watson, Ed. Springer Berlin Heidelberg, 1978, vol. 630, pp. 105–116. [Online]. Available: http://dx.doi.org/10.1007/BFb0067700
[23] J.-P. Dedieu and D. Nowicki, "Symplectic methods for the approximation of the exponential map and the Newton iteration on Riemannian submanifolds," Journal of Complexity, vol. 21, no. 4, pp. 487–501, 2005.
[24] J. Y. Kaminski and A. Shashua, "Multiple view geometry of general algebraic curves."
[25] J. M. Burdis, I. A. Kogan, and H. Hong, "Object-image correspondence for algebraic curves under projections," CoRR, vol. abs/1303.3358, 2013. [Online]. Available: http://dblp.uni-trier.de/db/journals/corr/corr1303.html#abs-1303-3358
[26] J. Gallier, Geometric Methods and Applications: For Computer Science and Engineering, 2nd ed. Springer Publishing Company, Incorporated, 2011.
[27] O. Faugeras, Q.-T. Luong, and T. Papadopoulo, The Geometry of Multiple Images: The Laws That Govern the Formation of Images of a Scene and Some of Their Applications. Cambridge, MA, USA: MIT Press, 2001.
[28] A. Beutelspacher and U. Rosenbaum, Projective Geometry: From Foundations to Applications. Cambridge University Press, 1998.
[29] R. Casse, Projective Geometry: An Introduction. Oxford University Press, 2006.
[30] J. Richter-Gebert, Perspectives on Projective Geometry: A Guided Tour Through Real and Complex Geometry, 1st ed. Springer Publishing Company, Incorporated, 2011.
[31] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
[32] [Online]. Available: http://en.wikipedia.org/wiki/Bezouts_theorem

Chapter A

Basics of projective geometry

A.1 Affine Geometry

In this section we introduce the geometry of affine spaces. These discussions will
lay the foundation for projective geometry.

A.1.1 Affine spaces

An affine space is a set A together with a vector space V and a faithful and transitive group action of V¹ (with addition of vectors as the group action) on A. Explicitly, an affine space is a set of points A together with a map

l : V × A → A, (v, a) ↦ v + a,

with the following properties:

1. Left identity: ∀a ∈ A, 0 + a = a (0 being the zero vector).
2. Associativity: ∀v, w ∈ V, ∀a ∈ A, v + (w + a) = (v + w) + a.
3. Uniqueness: ∀a ∈ A, the map θ_a : V → A, v ↦ (v + a), is a bijection. (This justifies transitivity of the map l; faithfulness is seen in the fact that if two elements f, g of V are such that f + a = a and g + a = a, ∀a ∈ A, then f = g.)
The vector space V is said to underlie the affine space A and is also called a difference space. Thus we have the operator + (defined as the map l) between a point and a vector. Equivalently, we can define an affine space in another way, through results that follow from the above definition; below we consider an affine space A with underlying vector space V:
¹ A group action of a vector space V on a set X is a map V × X → X, (v, x) ↦ v·x, with associativity and the existence of an identity element in V.


Lemma 4. We can have a subtraction map defined as

Θ : A × A → V, (a, b) ↦ v, a, b ∈ A, v ∈ V,

where (v, a) ↦ v + a = b as per the definition of the + operator above. Thus we can write Θ(a, b) = b − a = v. We prove here that this map is onto V and many-one.
Proof. If for two vectors v, u in V we have Θ(a, b) = v and Θ(a, b) = u, then

Θ(a, b) = v = u ⟹ v + a = b and u + a = b
⟹ v + a = u + a
⟹ (v − u) + a = a.   (A.1)

Further, the uniqueness property says that 0 is the only vector such that 0 + a = a, ∀a ∈ A. Hence we have v − u = 0 ⟹ v = u. This proves that the map Θ : A × A → V is well defined. Also, for every vector v in V and every point a in A we can find a point b in A such that b = v + a; hence Θ(a, b) = v, b ∈ A, v ∈ V. This proves that the map is onto. And we can find distinct points a1, a2, b1, b2 in A such that v = Θ(a1, b1) = Θ(a2, b2) for at least one v in V; this proves that Θ is many-one.
Lemma 5. For three points a, b, c in A, Θ(a, b) + Θ(b, c) = Θ(a, c), where Θ is the map defined in lemma (4) above.

We shall accept this lemma without proof here.
Thus, with a definition of an affine space in place, we note that an affine space is actually a set of points together with such a vector space; we represent it as (A, V, Θ), where V is the underlying vector space, or alternatively as (X, X⃗, Θ), where X⃗ is the underlying vector space. Henceforth we will use either of these notations for affine spaces. Also, if for two points a and b in A

Θ(a, b) = v, v ∈ V,   (A.2)

then Θ(b, a) = −v.

A.1.2 Basis of affine spaces

Given an n-dimensional vector space V, a linearly independent set of vectors {m_i}_{1≤i≤n} in V forms a basis of V iff every vector w in V can be written as a linear combination of {m_i}_{1≤i≤n}, as w = Σ_{i=1}^{n} (λ_i m_i). A similar notion can be obtained in the case of affine spaces, as described next.

Let us have an affine space (A, V, Θ) and consider a linear combination of a set of points {a_i}_{1≤i≤n} in A, as

w = Σ_{i=1}^{n} (λ_i a_i),

where {λ_i}_{1≤i≤n} ⊂ R. In general a combination like this is not defined, but in the special case where Σ_{i=1}^{n} λ_i = 1 we can define this notation to mean w = Σ_{i=1}^{n} (λ_i (a_i − o)) + o for a point o in A. This is well defined, since (a_i − o) = Θ(o, a_i) ∈ V, and the expression reduces to the addition of a vector in V to a point in A. Thus for every pair of points o1 and o2 in A,

w = Σ_{i=1}^{n} (λ_i (a_i − o1)) + o1 = Σ_{i=1}^{n} (λ_i (a_i − o2)) + o2.

Hence w is the same whatever point of A we select as our center. Such a combination of points of an affine space, with the scalars selected so that their sum is one, is termed an affine combination. So we can think of Σ_{i=1}^{n} (λ_i a_i) as a valid point, and establish that affine combinations are preserved in an affine space: an affine space is closed with respect to affine combinations.
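As a quick check (our own toy example, not part of the main development), take A = R² viewed as an affine space over itself, points a_1 = (0, 0), a_2 = (2, 0) and weights λ_1 = λ_2 = 1/2; the affine combination is the midpoint regardless of the chosen center:

with o1 = (0, 0): (1/2)((0,0) − (0,0)) + (1/2)((2,0) − (0,0)) + (0,0) = (1, 0),
with o2 = (5, 7): (1/2)((0,0) − (5,7)) + (1/2)((2,0) − (5,7)) + (5,7) = (1, 0).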
This discussion leads to the definition of a basis for an affine space. Given an affine space (A, V, Θ), we define a set of points {m_i}_{1≤i≤n+1} to be a basis for the n-dimensional affine space iff, for a point m_i in A, the vectors {Θ(m_i, m_j)}_{1≤j≤n+1, j≠i} form a basis for V. If a set of n + 1 affine points has this property, we say the points are affinely independent. An affine basis thus defined for an affine space (A, V, Θ) has a property stated as the following lemma.


Lemma 6. Every point a in A has a unique representation in terms of the basis points, as Σ_{i=1}^{n+1} (λ_i m_i), with Σ_{i=1}^{n+1} λ_i = 1.
Proof. We are given a set of points {m_i}_{1≤i≤n+1} which are affinely independent. Hence, fixing a point m_i out of these n + 1 points, we get a basis {Θ(m_i, m_j)}_{1≤j≤n+1, j≠i} for the difference vector space V. By definition (A.1.1), the mapping

θ_{m_i} : a ↦ Θ(m_i, a) = v, v ∈ V,

is a bijection between A and V. Thus we have Θ(m_i, a) ∈ V for every a in A, and Θ(m_i, a) has a unique representation as a linear combination of the basis vectors,

Θ(m_i, a) = Σ_{j=1, j≠i}^{n+1} (λ_j Θ(m_i, m_j)),

where the λ_j are real scalars. Hence, using the definition of Θ from lemma (4),

a = Σ_{j=1, j≠i}^{n+1} (λ_j Θ(m_i, m_j)) + m_i = Σ_{j=1}^{n+1} (λ_j Θ(o, m_j)) + o,

for a point o ∈ A. The second equality holds for any o, by lemma (5):

Θ(m_i, m_j) = Θ(o, m_j) + Θ(m_i, o),

and hence

Σ_{j=1, j≠i}^{n+1} (λ_j Θ(m_i, m_j)) + m_i = Σ_{j=1, j≠i}^{n+1} (λ_j Θ(o, m_j)) − Θ(o, m_i) (Σ_{j=1, j≠i}^{n+1} λ_j) + m_i
= Σ_{j=1, j≠i}^{n+1} (λ_j Θ(o, m_j)) + Θ(o, m_i) (1 − Σ_{j=1, j≠i}^{n+1} λ_j) + o
= Σ_{j=1}^{n+1} (λ_j Θ(o, m_j)) + o, where λ_i = 1 − Σ_{k=1, k≠i}^{n+1} λ_k.

By our definition of a barycenter,

a = Σ_{k=1}^{n+1} (λ_k m_k),

with λ_i = 1 − Σ_{k=1, k≠i}^{n+1} λ_k. Thus a is uniquely represented as a barycenter of the family of points ((m_k, λ_k))_{1≤k≤n+1} ⊂ A.

A.1.3 Affine morphism

Given two affine spaces (X, X⃗, Θ) and (X', X⃗', Θ'), a map f : X → X' is said to be an affine morphism if and only if we can find a linear application f⃗ : X⃗ → X⃗' such that

f(b) − f(c) = f⃗(Θ(c, b)), ∀b, c ∈ X.

[Figure A.1: the commutative diagram relating f and f⃗.]

As we know that Θ is many-one, we can fix one of b and c. Thus, given a ∈ X and hence f(a) ∈ X', f is defined as an affine morphism if and only if we can find a linear f⃗ such that

∀b ∈ X, f(b) = f(a) + f⃗(Θ(a, b)), or ∀v ∈ X⃗, f(a + v) = f(a) + f⃗(v).   (A.3)
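A standard concrete instance (our illustration): on X = X' = Rⁿ, any map f(x) = Mx + b, with M an n × n matrix and b ∈ Rⁿ, is an affine morphism, since

f(x + v) = M(x + v) + b = (Mx + b) + Mv = f(x) + f⃗(v), with f⃗ = M.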

This definition brings us to a result stated by Gallier in [26], which says that a map f : X → X' is an affine morphism if and only if for every family of weighted points ((a_i, λ_i))_{i∈I} in X, with Σ_{i∈I} λ_i = 1, the affine combination (or barycenter) is preserved:

f(Σ_{i∈I} λ_i a_i) = Σ_{i∈I} (λ_i f(a_i)).   (A.4)

Here Θ : X × X → X⃗ and Θ' : X' × X' → X⃗' are as defined in lemma (4). With some elementary algebra we can show that equations (A.3) and (A.4) are equivalent conditions.

A.1.4 Affine subspaces

Gallier in [26] defines an affine subspace as follows: given an affine space (X, X⃗, Θ), U is an affine subspace of X if and only if for every family of weighted points

((a_i, λ_i))_{i∈I}, {a_i}_{1≤i≤n} ⊂ U, n the size of I,

the corresponding affine combination or barycenter Σ_{i∈I} (λ_i a_i) also lies in U. Faugeras in [27] defines an affine subspace in terms of its corresponding vector subspace: for an affine space (X, X⃗, Θ), U is a subspace of X if and only if we can find a unique vector subspace U⃗ of X⃗ such that (U, U⃗, Θ)² is also an affine space as defined in section (A.1.1).

This definition leads to many results:

² Here Θ is the same function as in the definition of the parent affine space (X, X⃗, Θ) in lemma (4), Θ : X × X → X⃗, (a, b) ↦ v for a unique v; for the subspace we replace X with U and X⃗ with U⃗, Θ : U × U → U⃗.

Alternate definition of affine subspace

Given an affine space (X, X⃗, Θ), a non-empty subset U of X is an affine subspace if and only if for every point a in U, U⃗_a = {Θ(a, x) | x ∈ U} is a subspace of X⃗, and additionally U⃗_a = U⃗_b for any b in U; we write U⃗_a = U⃗ for all a in U. Seen another way, U can be generated from U⃗_a as u = v + a, for a unique vector v in U⃗, for every u in U. Gallier in [26] names U⃗ the direction of U.

Lemma 7. Given an affine space (X, X⃗, Θ), for any family of points (a_i)_{i∈I} in X, the set V of all possible barycenters Σ_{i∈I} (λ_i a_i), over families of scalars (λ_i)_{i∈I} with Σ_{i∈I} λ_i = 1, is the smallest affine subspace containing (a_i)_{i∈I}.

Proof. We can see how this follows from the definition of affine subspaces. As seen previously, if (a_i)_{i∈I} belong to an affine subspace V, then all barycenters Σ_{i∈I} (λ_i a_i) with Σ_{i∈I} λ_i = 1 must also belong to V. Now we only have to show that V is the smallest subspace containing (a_i)_{i∈I}. This is evident, as any barycenter of a set of points which are themselves barycenters of (a_i)_{i∈I} is again a barycenter of (a_i)_{i∈I}. Hence V has to contain all of these barycenters and thus is the smallest affine subspace containing the points (a_i)_{i∈I} of X.

Thus, given any set of points S, we can denote the smallest affine subspace containing S as ⟨S⟩. We say that S spans ⟨S⟩.

A.1.5 Invariants of affine morphisms

As mentioned earlier, one of the motives for defining geometries as groups of transformations that leave certain properties invariant is that our primary interest lies in capturing the invariant properties, and thus in defining the transformations based on point correspondences. In affine geometry the three properties that remain invariant under affine morphisms are parallelism, incidence and cross-ratios, as seen in [27]. Below we consider only one invariant, parallelism: cross-ratios are defined for projective spaces in the next section, and affine spaces, being restricted versions of projective spaces, also keep them invariant, while incidence is a simple extension of parallelism.
Parallelism invariance

We can consider various figures in an affine space (X, X⃗, Θ) as affine subspaces of X. Given two affine subspaces U and V, they are defined to be parallel if and only if U⃗ ⊆ V⃗ or V⃗ ⊆ U⃗, where (U, U⃗, Θ) and (V, V⃗, Θ) are affine subspaces of (X, X⃗, Θ). With some simple algebra we can see that this condition is equivalent to saying that U is parallel to V (denoted U//V) iff

U ⊆ V or V ⊆ U or U ∩ V = ∅.

Hence the condition of parallelism invariance can be stated as:

Lemma 8. Given an affine morphism f : X → X' between two affine spaces (X, X⃗, Θ) and (X', X⃗', Θ'), if U, V are affine subspaces of X such that U//V, then the corresponding affine subspaces in the image of f, U' = f(U) and V' = f(V), satisfy U'//V'.

Proof. We have seen in the section on affine morphisms that f⃗ defines the corresponding mapping between the underlying vector spaces X⃗ and X⃗'. By the definition of parallel subspaces, U⃗ ⊆ V⃗ or V⃗ ⊆ U⃗. Let U⃗ ⊆ V⃗. Then

a' ∈ f⃗(U⃗) ⟹ a' = f⃗(a), a ∈ U⃗
⟹ a' = f⃗(a), a ∈ V⃗ (since U⃗ ⊆ V⃗)
⟹ a' ∈ f⃗(V⃗).   (A.5)

Thus a' ∈ f⃗(U⃗) ⟹ a' ∈ f⃗(V⃗), hence f⃗(U⃗) ⊆ f⃗(V⃗). Similarly we can show the other inclusion if we assume V⃗ ⊆ U⃗. All that remains is to show that f(U) and f(V) are affine subspaces of X'; for that we need to show that f⃗(U⃗) and f⃗(V⃗) are the corresponding vector spaces of f(U) and f(V). This is fairly obvious from the alternate definition of an affine morphism and that of affine subspaces. Hence, once we prove that (f(U), f⃗(U⃗), Θ') and (f(V), f⃗(V⃗), Θ') are affine subspaces of (X', X⃗', Θ'), we can say that parallel affine subspaces are transformed into parallel subspaces in the image affine space.

A.2 Projective Geometry

Now we move on to the definitions of projective geometry. Projective geometry is the most general of these geometries and hence has fewer invariants, but they are nevertheless extremely crucial.

A.2.1 Definition of a projective space

The projective space of dimension n, denoted P(E_{n+1}), is obtained by taking the quotient of an (n + 1)-dimensional vector space minus the origin, E_{n+1} \ {0}, with respect to the equivalence relationship

x ≃ x' ⟺ x = λx', x, x' ∈ E_{n+1}, λ ∈ R \ {0}.   (A.6)

Here we assume E_{n+1} is a vector space over R; in some cases we might generalize to the complex field C, as mentioned where required. One easily checks that ≃ is indeed an equivalence relation.
Many other equivalent definitions of P(E_{n+1}) are found in the literature. One might look at each equivalence class as a 1-dimensional subspace of E_{n+1}; thus P(E_{n+1}) can be seen as the set of all 1-dimensional subspaces of E_{n+1}, or equivalently the set of all lines passing through the origin of E_{n+1}. These are different ways of looking at the definition, but essentially the same structure is obtained. Alternative ways of describing a projective geometry are interesting enough not to miss; hence, just for the sake of a lateral view: a projective space is a triplet (P, L, I) such that

1. Any pair of distinct points is joined by a unique line.
2. Given any four points A, B, C and D with no three collinear, if AB intersects CD, then AC intersects BD.³
3. Every line is incident with at least three distinct points.
4. There exist three non-collinear points.

Here P is a set of points, L is a set of lines and I is an incidence structure which tells us which line is incident on which point and which point is incident on which line. From these axioms one can derive many other properties of a projective space, including its invariants; these being out of the scope of this text, we skip them. Beutelspacher in [28] and Casse in [29] give an extensive treatment of this topic.
³ This axiom leads to the much-discussed property of a projective plane that any two lines must intersect at a point.

A.2.2 Basis of a projective space

With the definition of a projective space in place, we can talk of its basis. A basis of an n-dimensional projective space P(E_{n+1}) is defined as a set of points {a_i}_{1≤i≤n+2} iff we can find vectors {v_i}_{1≤i≤n+1} such that

p(v_i) = a_i, ∀i, 1 ≤ i ≤ n+1, and p(Σ_{i=1}^{n+1} v_i) = a_{n+2}.   (A.7)

It follows, with some elementary algebra, that to specify a basis of P(E_{n+1}) we need n + 2 points. Due to the constraint on a_{n+2} in condition (A.7), if we select two sets of vectors {v'_i}_{1≤i≤n+1} and {v''_i}_{1≤i≤n+1} satisfying condition (A.7) for the same set of projective points {a_i}_{1≤i≤n+2}, then

v'_i = λ v''_i, ∀i, 1 ≤ i ≤ n+1.

Hence every point a in P(E_{n+1}) represented by one set of coordinates in one vector basis represents the same projective point in the other vector basis, provided both vector bases conform to condition (A.7) for the same set of projective points. This statement needs clarification. We normally identify a basis of a vector space by the unique set of scalars needed to identify a point. Here we have no notion of linear combinations of points in a projective space, at least until now; hence we take as a basis of our projective space a set of points which corresponds to a basis of the underlying vector space. But a problem arises because each point in a projective space corresponds to multiple vectors. So, to ensure that whichever vectors we choose for the given points, the coordinate representation of any point of the projective space remains the same, we impose the conditions given in equation (A.7): we may choose any admissible vectors for the coordinate representation, and the projective point remains the same.
From the definition we see that every projective point corresponds not to one vector but to a set of vectors positioned at the origin. Hence we can define a mapping known as the canonical projection,

p : E_{n+1} \ {0_{n+1}} → P(E_{n+1}),   (A.8)

such that every non-zero vector of E_{n+1} is mapped to a unique point in P(E_{n+1}). This mapping can be described simply: a vector (a, b, c) in E_3 is mapped to the projective point (a/c, b/c) of P(E_3) if c ≠ 0, and if c = 0 it is mapped to the point at infinity on the line passing through (0, 0) and (a, b). Thus vectors with c ≠ 0 map to points of R² and those with c = 0 map to points at infinity. This plain reasoning tells us that augmenting a Euclidean or affine space with points at infinity leads to a projective space; a more formal treatment of this augmentation follows in later sections.

A.2.3 Projective transformation

In parallel with vector spaces, we can talk of transformations of projective spaces. Let P(E_{n+1}) be an n-dimensional projective space over R with associated vector space E_{n+1}, and let P(E'_{n'+1}) be an n'-dimensional projective space over E'_{n'+1}.

A map f : P(E_{n+1}) → P(E'_{n'+1}) is defined to be a projective transformation iff we can find a linear map f⃗ : E_{n+1} → E'_{n'+1} such that

∀v ∈ E_{n+1}, f ∘ p(v) = p' ∘ f⃗(v), with P(E_{n+1}) = p(E_{n+1}) and P(E'_{n'+1}) = p'(E'_{n'+1}),   (A.9)

where p and p' are the canonical projections of the vector spaces onto the respective projective spaces, as defined in equation (A.8).

When f⃗ : E_{n+1} → E'_{n'+1} is an isomorphism (n = n'), the corresponding projective mapping f : P(E_{n+1}) → P(E'_{n'+1}) is known as a projective morphism. Some of the computer vision literature terms such a map a homography; though we use the two terms interchangeably, in this thesis we reserve the term homography for a projective morphism between two projective planes. The set of all morphisms is denoted C(E, E'). Some results follow, which we accept without proof for now.

Lemma 9. Given a mapping f : P(E_{n+1}) → P(E'_{n'+1}), there is a unique f⃗ : E_{n+1} → E'_{n'+1}, up to non-zero scalar multiplication.

Lemma 10. Given a mapping f⃗ : E_{n+1} → E'_{n'+1}, there is a unique f : P(E_{n+1}) → P(E'_{n'+1}).
In what follows we consider only the case n = n'; this thesis focuses on problems that need transformations between projective spaces of equal dimensions.

Groups of projective transformations

If the mapping f : P(E_{n+1}) → P(E'_{n+1}) is bijective, the mapping f⃗ : E_{n+1} → E'_{n+1} is one-one in the sense that for all u, v ∈ E_{n+1} with u ≠ λv for any λ ∈ R, we have f⃗(u) ≠ μ f⃗(v) for any μ ∈ R. Similarly, f⃗ is onto, as every vector in E'_{n+1} projects onto a point in P(E'_{n+1}), which has a unique corresponding point in P(E_{n+1}), which in turn is the projection of some vector in E_{n+1}.
These three results tell us that we can uniquely identify a projective mapping between two projective spaces with a unique linear mapping between their corresponding vector spaces. Further, we know that any linear mapping f⃗ : E_{n+1} → E'_{n+1} can alternately be represented as multiplication by a matrix A:

∀v ∈ E_{n+1}, f(p(v)) = p'(A v).   (A.10)
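For instance (our illustration), on the projective plane P(E_3) an invertible 3 × 3 matrix A acts on homogeneous coordinates x = (x_1, x_2, x_3)^T; since

x ≃ λx ⟹ Ax ≃ λ(Ax) for λ ≠ 0,

the map p(x) ↦ p(Ax) is well defined on equivalence classes. This is exactly how the homography H of the earlier chapters acts on homogeneous image points.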

Hence, for a homography we need the matrix A to be invertible. In other words, a homography is defined as the projective transformation whose f⃗ is an isomorphism. The set of all homographies, PLG(E_{n+1}), represented by the set of all such invertible matrices A, forms a group, with the group operation being composition of homographies:

f, g ∈ C(P(E_{n+1}), P(E_{n+1})) ⟹ f ∘ g ∈ C(P(E_{n+1}), P(E_{n+1})).

The identity element is the identity homography, defined by

∀v ∈ E_{n+1}, f⃗(v) = Av = v,

implying ∀a ∈ P(E_{n+1}), f(a) = a. Further, since the mapping f⃗ : E_{n+1} → E_{n+1}, and hence f, is bijective, an inverse homography exists such that f ∘ f^{−1} is the identity homography. Thus PLG(E_{n+1}) is a group.

Lemma 11 (First fundamental theorem of projective geometry). Let P(E_{n+1}) and P(E'_{n+1}) be two projective spaces of n dimensions, with associated vector spaces E_{n+1} and E'_{n+1}. Assume {m_i}_{1≤i≤n+2} and {m'_i}_{1≤i≤n+2} are bases of P(E_{n+1}) and P(E'_{n+1}) respectively. Then the theorem says that there is a unique homography g : P(E_{n+1}) → P(E'_{n+1}) such that g(m_i) = m'_i, ∀i, 1 ≤ i ≤ n+2.

Proof. Given the basis {m_i}_{1≤i≤n+2} of a projective space, let {m⃗_i}_{1≤i≤n+1} be vectors forming the associated basis of the underlying vector space; thus {m⃗_i}_{1≤i≤n+1} and {m⃗'_i}_{1≤i≤n+1} are bases of the vector spaces of P(E) and P(E'), as defined in subsection (A.2.2) on bases of projective spaces.

We use the canonical projection functions p : E_{n+1} \ {0} → P(E_{n+1}), p(m⃗_i) = m_i, and p' : E'_{n+1} \ {0} → P(E'_{n+1}), p'(m⃗'_i) = m'_i. From the given condition g(m_i) = m'_i, ∀i, 1 ≤ i ≤ n+2, we have

g(p(m⃗_i)) = g(m_i) = m'_i = p'(m⃗'_i) = p'(g⃗(m⃗_i)), ∀i, 1 ≤ i ≤ n+2.

Hence

g⃗(m⃗_i) = λ_i m⃗'_i, λ_i ∈ R, ∀i, 1 ≤ i ≤ n+1. Let g⃗(m⃗_{n+2}) = λ m⃗'_{n+2}.   (A.11)

But

m⃗_{n+2} = Σ_{i=1}^{n+1} m⃗_i and m⃗'_{n+2} = Σ_{i=1}^{n+1} m⃗'_i.

From equation (A.11) we get

g⃗(Σ_{i=1}^{n+1} m⃗_i) = Σ_{i=1}^{n+1} (λ_i m⃗'_i) = λ Σ_{i=1}^{n+1} m⃗'_i.

Using the fact that g⃗ is a linear function and that {m⃗_i}_{1≤i≤n+1} forms a basis of E_{n+1}, we see that {g⃗(m⃗_i)}_{1≤i≤n+1} forms a basis of E'_{n+1}. Thus we get

λ_i = λ, ∀i, 1 ≤ i ≤ n+1,
g⃗(m⃗_i) = λ m⃗'_i, ∀i, 1 ≤ i ≤ n+2, λ ∈ R \ {0}.   (A.12)

Let us consider two homographies g_1 and g_2 such that g⃗_1(m⃗_i) = λ m⃗'_i, ∀i, 1 ≤ i ≤ n+2, and g⃗_2(m⃗_i) = μ m⃗'_i, ∀i, 1 ≤ i ≤ n+2, with λ, μ ∈ R \ {0}. Then from lemma (10) we deduce that g⃗_1 = ν g⃗_2 (here ν = λ/μ), which implies that there is a unique homography associated with them. Hence g_1 = g_2.

A.2.4 Projective subspaces

Let V be a subset of a given projective space P(E_{n+1}) ≡ P(E) with associated vector space E_{n+1}. Then V is a projective subspace of P(E_{n+1}) iff we can find a vector subspace V⃗ of E_{n+1} such that P(V⃗) = V.⁴ Thus if V⃗ is an m-dimensional subspace (m ≤ n + 1) of E, then V is known as an (m − 1)-dimensional projective subspace of P(E).

Transformations of projective subspaces

A projective transformation f : P(E_{n+1}) → P(E'_{m+1}) transforms a k-dimensional projective subspace (k < n) into an l-dimensional projective subspace with l ≤ k. In other words, a plane would be transformed into either a plane, a line or a point, whereas a homography preserves dimensions: a line is transformed into a line, and a plane into a plane. Here a projective line is defined as a 1-dimensional subspace of the projective space, a plane as a 2-dimensional subspace, a point as a 0-dimensional subspace, and so forth. The reason a homography preserves dimensions is worth noting. A homography or collineation (a term used for homography in some of the literature) is a projective mapping associated with an isomorphism f⃗ : E_{n+1} → E'_{n+1}. Hence each vector of a basis of any subspace of E_{n+1} is mapped to a unique vector in E'_{n+1}, and a set of linearly independent vectors {m_i}_{1≤i≤k} is mapped to a set of linearly independent vectors {m'_i}_{1≤i≤k}. Hence a subspace of k dimensions spanned by {m_i}_{1≤i≤k} is uniquely mapped to a subspace of k dimensions spanned by {m'_i}_{1≤i≤k}.

More on subspaces

Given two subspaces U and V of P(E_{n+1}), the span of U and V, ⟨U ∪ V⟩, is the smallest projective subspace containing U ∪ V (equivalently, the intersection of all subspaces of P(E_{n+1}) containing U ∪ V). One can easily show that ⟨U ∪ V⟩ has associated vector subspace F + G of E_{n+1}, where F, G are the subspaces of E_{n+1} associated with U, V.

A.2.5 Affine completion

Generalizing affine geometry, we obtain projective geometry. Specifically, we show here the extension of an affine space to obtain a projective space. Consider an n-dimensional affine space (X, X⃗, Θ). Assuming {m_i}_{1≤i≤n+1} to be a basis of the affine space X, we can denote every point m ∈ X by taking the vector Θ(m_1, m) = (x_1, ..., x_n) and representing m by its coordinates in the given basis. Extending this coordinate representation by appending a 1, we have

m_p = p((x_1, ..., x_n, 1)) = p([Θ(m_1, m), 1]).⁵

Hence, as there is a one-one correspondence, we can represent every point a in P(E_{n+1}) not at infinity by a unique point m in X; also, for every point m in X we have a unique point a = m_p in P(E_{n+1}). Hence we can regard X ⊂ P(E_{n+1}).

⁵ p is the same canonical projection defined in equation (A.8).


Further do we see that P( En+1 ) has those points a defined as a = p(( x1 , ..., xn , 0)) =

p([ m , 0]). Hence these points lying at infinity can be seen to correspond uniquely

to projections of vectors
m X . Thus these points project onto the set
X = P( En+1 ) \ X.
Thus we have

P( En+1 ) = X t X = X t P( X ).

Recursively we can write the same as

P( En+1 ) = X t P( X ) = En + P( En ) = En + En1 + P( En1 )... .

(A.13)

In summary, an affine space can be completed by adding points (and lines, planes, etc.) at infinity to form a projective space. One more point needs clarifying: the difference between projective points and vectors. Any point in X∞ has the same representation as a vector in X⃗, but the two are to be treated differently: projective points are not vectors, and hence we cannot state that projective points are equivalent to vectors of X⃗. Thus the points not at infinity correspond to affine points, and those at infinity correspond to the projection of X⃗.
Affine and projective basis
In the previous section we performed affine completion; we now obtain a projective basis from an affine one. Let (X, X⃗, Θ) be a given n-dimensional affine space and X' = P(E_{n+1}) = X ⊔ P(X⃗) its projective completion. Further, let {m_i}_{1≤i≤n+1} be a basis for X, and correspondingly take {Θ(m_1, m_i)}_{2≤i≤n+1} to be a basis of X⃗. The extension of the basis of X is then

p_1 = p([Θ(m_1, m_2), 0]), p_2 = p([Θ(m_1, m_3), 0]), ..., p_n = p([Θ(m_1, m_{n+1}), 0]).

These points lie in X∞ and represent part of a basis of X' = P(E_{n+1}). We have one more point, p_{n+1} = p([0_n, 1]). Adding these n + 1 points together we get p_{n+2} = p([Σ_{i=2}^{n+1} Θ(m_1, m_i), 1]). Thus the set of points {p_i}_{1≤i≤n+2} forms a projective basis of X'. This is not a unique basis, but it is one with which we can always relate the affine space and its corresponding projective space.
Thus, given a point y in X with coordinate vector (Y_1, ..., Y_n) in the affine basis {m_i}_{1≤i≤n+1}, the corresponding point in the projective basis extended above is represented by the vector (Y_1, ..., Y_n, 1) in the basis {p_i}_{1≤i≤n+2} of E_{n+1}. This relation between affine and projective bases gives us the same relation between the points of the two spaces as in the previous section.
Relation between PLG(E_{n+1}) and AG(X)

Given an affine space ( X, X , ) and a projective space which is the completion of

X, P( En+1 ) = X = X t P( X ), we can see that PLG ( En+1 ) is the group of all homographies or collineations onto P( En+1 ) and AG ( X ) is the group of all invertible
affine morphisms f : X X. Then we can easily prove the next lemma:
Lemma 12. A homography can be an invertible affine morphism if and only if it leaves
the points at infinity, invariant.
We would accept this result without proof.
Results on Hyperplanes
A projective hyperplane is defined as a subspace of dimension n − 1 of a projective space of dimension n. We can also talk of vector hyperplanes, in the sense of vector subspaces of dimension n − 1 of a vector space of dimension n. Thus a projective line is a hyperplane in a 2-dimensional projective space, and a plane is a hyperplane in a 3-dimensional projective space. It follows that a projective hyperplane is obtained by projection of a vector-space hyperplane.

Consider the set H(E_{n+1}) of hyperplanes of the vector space E_{n+1} and the set of projective hyperplanes H(P(E_{n+1})); there is a one-one correspondence between H(E_{n+1}) and H(P(E_{n+1})). This follows from the definition of a projective subspace of dimension n − 1: every projective subspace is obtained by applying the canonical projection p : E_{n+1} \ {0} → P(E_{n+1}) to a vector subspace of E_{n+1}, and every projective subspace can be obtained from a unique vector subspace. Thus we have a one-one correspondence

H(E_{n+1}) ↔ H(P(E_{n+1})).   (A.14)

Any hyperplane in E_{n+1} can be represented as

Σ_{i=1}^{n+1} (h_i x_i) = 0, x ∈ E_{n+1}, x ≡ (x_1, ..., x_{n+1}), h ∈ E_{n+1}, h ≡ (h_1, ..., h_{n+1}).

The vector of coefficients h described above uniquely defines the hyperplane, up to multiplication by a non-zero scalar. Thus a hyperplane can be uniquely represented by a projective point a = p(h), a ∈ P(E_{n+1}). Vice versa, every point p(h) ∈ P(E_{n+1}) uniquely gives us a hyperplane from H(E_{n+1}). Hence, with equation (A.14),

H(E_{n+1}) ↔ H(P(E_{n+1})) ↔ P(E_{n+1});   (A.15)

e.g. the hyperplane defined by the equation x_{n+1} = 0 corresponds to the multiples of the vector h ≡ (0, ..., 0, 1) and hence to the projective point p((0, ..., 0, 1)). This section indirectly hands us the concept of duality in a projective space. Given an n-dimensional projective space P(E_{n+1}), any point uniquely corresponds to a hyperplane in the corresponding E_{n+1}, and hence to a unique hyperplane in P(E_{n+1}). Thus points and lines (a line being a hyperplane of P(E_3)) are duals of each other in P(E_3).
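As a concrete illustration (ours): in the projective plane P(E_3), the line 2x_1 + 3x_2 + x_3 = 0 has coefficient vector h ≡ (2, 3, 1), so its dual point is p((2, 3, 1)); any non-zero multiple of h, say (4, 6, 2), defines the same line and hence the same dual point.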

A.2.6 Action of homographies on subspaces and study of invariants

We know from the previous sections on affine invariants that parallelism and incidence are invariant under any affine transformation. Similarly, in projective geometry we can talk about invariants, which are the cross-ratio and incidence. It is elementary to prove that incidence is preserved by a projective transformation: all we have to show is that if, for two projective subspaces, one is a subset of the other, then the underlying vector subspaces are transformed into vector subspaces of which, again, one is a subset of the other. This can be proved using the linearity of the underlying vector space mapping. Hence we only define the cross-ratio here:

Lemma 13. Given four points a, b, c, d on P(E_2) such that the first three are distinct, denote by h_{a,b,c} the homography of P(E_2) such that h_{a,b,c}(a) = ∞, h_{a,b,c}(b) = 0 and h_{a,b,c}(c) = 1. Then the cross-ratio, denoted {a, b; c, d}, is the element h_{a,b,c}(d). This value is invariant under homographies (collineations).
Proof. Assume we have a homography f : X → X'. We can prove the result for the case where X is a one-dimensional projective space (a line); using the result that incidence is preserved, the same then holds for a projective space of any dimension. The cross-ratio of points a, b, c, d in X is given by {a, b; c, d} = h_{a,b,c}(d), where h_{a,b,c} is a homography h_{a,b,c} : X → X. Through the mapping f, the corresponding points on X' are f(a) = a', f(b) = b', f(c) = c', f(d) = d'. Hence the cross-ratio of the image points, corresponding to the cross-ratio of a, b, c and d, is

{a', b'; c', d'} = h_{a',b',c'}(d').

From the result of section (A.2.3), the composition of homographies is again a homography; hence h_{a',b',c'} ∘ f = h_{a,b,c}.⁶ Then

h_{a',b',c'}(d') = h_{a',b',c'}(f(d)) = h_{a,b,c}(d),
⟹ {a', b'; c', d'} = {a, b; c, d}.

⁶ This can be seen as follows: for a 1-dimensional projective space, only one such homography onto itself is possible up to multiplication by a non-zero scalar; likewise we get only one homography between two 1-dimensional projective spaces, again up to multiplication by a non-zero scalar; and we can find a one-one correspondence between two 1-dimensional spaces.
A.2.7 Duality

A pencil of hyperplanes⁷ is an important concept that leads to many results useful for epipolar geometry. Here we investigate some results; we accept the first two without proof, while attempting the third.

Pencil of hyperplanes

Consider a projective space P(E_{n+1}) associated with E_{n+1}, and its dual P(E*_{n+1}) associated with the dual space E*_{n+1}. Also let Δ be a line in P(E*_{n+1}), Δ ⊂ P(E*_{n+1}). We state

⁷ A pencil of hyperplanes is a set of hyperplanes where each hyperplane contains a specific subspace.

two lemmas next, which we accept without proof, though we give a short explanation along with each.

Lemma 14. A point in P(E*_{n+1}) uniquely represents a hyperplane H in P(E_{n+1}), and hence a unique hyperplane H' in E_{n+1} (refer to equation (A.15)). In fact, if the point f ∈ P(E*_{n+1}) is represented by the coordinate vector (η_1, ..., η_{n+1}) of E*_{n+1}, up to a non-zero scalar multiplication, then the corresponding hyperplane is defined by the equation h^T x = 0, where h is the vector uniquely defined as (η_1, ..., η_{n+1}), up to a non-zero scalar multiplication. This hyperplane is normally considered to lie in E_{n+1}. Thus a set of points in P(E*_{n+1}) is a set of hyperplanes in E_{n+1}, and also, by equation (A.15), a set of hyperplanes in P(E_{n+1}).

Lemma 15. There is a unique (n − 2)-dimensional subspace V of P(E_{n+1}) such that Δ = {H ∈ H(P(E_{n+1})) : V ⊂ H}. And for every x not in V, there is a unique H ∈ Δ containing V and x. Δ can be viewed either as a line in P(E*_{n+1}) or as the corresponding pencil in P(E_{n+1}) (note that there is a one-one correspondence between P(E*_{n+1}) and P(E_{n+1})).
An elaborate proof, given by Faugeras et al., can be read in [27]. This gives us an understanding of a line in the projective dual space P(E*_{n+1}) and of the corresponding pencil of hyperplanes in the projective space P(E_{n+1}). Ideally, a vector in the dual space corresponds uniquely to a vector in the vector space and vice versa; and by result (A.15), each point on a line in the projective dual space P(E*_{n+1}) corresponds to a unique hyperplane in the vector space E_{n+1}, and hence to a unique hyperplane in the projective space P(E_{n+1}). The above result adds that the hyperplanes in P(E_{n+1}) corresponding to all the points lying on such a line contain a common (n − 2)-dimensional projective subspace V.
Lemma 16. If we consider another line D in P(E_{n+1}) such that it does not intersect V, then we have a homography

λ_D : Δ → D, H ↦ H ∩ D.
Proof. We are given the map λ_D : H ↦ H ∩ D. To show that it is a homography, we consider the corresponding map between the respective vector spaces: Δ = P(F), D = P(D⃗), with F ⊂ E*_{n+1} and D⃗ ⊂ E_{n+1}. The corresponding map f⃗ : F → D⃗ is what we show to be an isomorphism. We proceed in two parts.

1. First we show that the given map is a bijection. Under the map f⃗ : F → D⃗, every vector g in F corresponds to a unique hyperplane H', defined by the equation Σ_{i=1}^{n+1} (h_i x_i) = 0, where the {h_i} are the coordinates of g in some basis {f_i} of E*_{n+1}. From the previous result, for every x in P(E_{n+1}) with x ∉ V there is a unique H ∈ Δ such that H = ⟨x, V⟩; if l is the 1-dimensional subspace of E_{n+1} with x = P(l), then the vector hyperplane of H is H⃗' = l + V⃗, where V⃗ is the vector subspace associated with V. Now for every d ∈ D, d ∉ V, there is a unique H, and hence a unique H'. Also, for every d⃗ in D⃗ we have a unique hyperplane H⃗' = span(d⃗) + V⃗, which is a unique point in Δ. Again, every hyperplane H⃗' in the pencil can be written as V⃗ + l, where l is a one-dimensional subspace of E_{n+1}; with some elementary algebra we can show that l must be a subspace of the 2-dimensional D⃗. This lets us relate each hyperplane of the pencil to a unique vector (up to scale) in D⃗, which tells us that the map

f⃗ : F → D⃗

is a bijection.
2. Next we show that this bijection is linear. Any g ∈ F can be written as g = ∑_{i=1}^{n+1} h_i f_i, where {f_i} forms a basis of E*_{n+1}; thus {h_i} are the coordinates of g. As seen in the sections on hyperplanes, any hyperplane H′ in E_{n+1} is an n-dimensional subspace. Hence it is characterized by the determinant condition

det(m_1, ..., m_n, x)_{(n+1)×(n+1)} = 0, x ∈ H′,

where {m_i} is a basis of the hyperplane H′. Writing this out, we see that it is equivalent to ∑_{i=1}^{n+1} h_i x_i = 0, where x_1, ..., x_{n+1} are the coordinates of the vector x, and the {h_i} are linear functions of the coordinates of the basis vectors {m_j}.

Let F̃ have basis {m_i}_{1≤i≤n−1}. Considering H′ = l + F̃, we can select any vector v in l ⊂ D̃ as a basis of l (l being a 1-dimensional subspace of D̃), and hence (m_1, ..., m_{n−1}, v) forms a basis of H′ in E_{n+1}. Hence the {h_i} are functions of m_1, ..., m_{n−1}, v; with all the vectors except v held constant, g = (h_1, ..., h_{n+1}) becomes a linear function of v, whose coordinates are v_1, ..., v_{n+1}. As we have n+1 variables and n+1 linear equations, and (m_1, ..., m_{n−1}, v) are linearly independent vectors, we can also write v as a linear function of h. Thus every g in F maps linearly to a v in D̃, which can be expressed as

v_j = ∑_{i=1}^{n+1} h_i α_{ij}, 1 ≤ j ≤ n+1,   (A.16)

where the α_{ij} are constant scalars dependent only on m_1, ..., m_{n−1}. Thus the transformation f : F → D̃ : g ↦ v is a linear one, as is easily verified using the definition of a linear map.

This linear transformation defines the corresponding projective mapping π_D : ∆ → D : H ↦ H ∩ D, and hence the projective mapping is a homography, or a projective morphism.
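Lemma 16 can also be checked numerically in the lowest interesting dimension. The sketch below (our own, with all vectors chosen at random) builds a pencil ∆ = P(F) of hyperplanes of P(E_4) from a 2-dimensional F ⊂ E*_4, cuts it with a line D spanned by d_1, d_2, and verifies that the induced map preserves cross-ratios, as a homography must.

import numpy as np

def cross_ratio(a, b, c, d):
    # cross-ratio of four points of a projective line, each given by
    # homogeneous coordinates (two numbers) in some fixed basis
    det = lambda p, q: p[0] * q[1] - p[1] * q[0]
    return (det(a, c) * det(b, d)) / (det(a, d) * det(b, c))

rng = np.random.default_rng(1)
g1, g2 = rng.standard_normal(4), rng.standard_normal(4)   # basis of F: pencil members are s*g1 + t*g2
d1, d2 = rng.standard_normal(4), rng.standard_normal(4)   # basis of the 2-dim subspace whose projection is D

params = [rng.standard_normal(2) for _ in range(4)]       # four members (s, t) of the pencil
hits = []
for s, t in params:
    g = s * g1 + t * g2                                   # dual vector of the hyperplane
    # its intersection with D: the point alpha*d1 + beta*d2 with g . (alpha*d1 + beta*d2) = 0
    hits.append(np.array([g @ d2, -(g @ d1)]))

print(cross_ratio(*params))    # cross-ratio of the four hyperplanes in the pencil ...
print(cross_ratio(*hits))      # ... equals that of their four intersection points on D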

A.2.8 Homography as a perspective projection between two projective lines

Consider a setting in a 2D projective space (a plane). Let us have a point o and two lines l and m not passing through o. Further, let us have four lines n_1, n_2, n_3 and n_4 passing through o. Then we can take a, b, c, d to be the points of intersection of line l with lines n_1, n_2, n_3, n_4, and a′, b′, c′, d′ to be the points of intersection of line m with lines n_1, n_2, n_3, n_4 respectively. The resulting correspondence is a projective transformation, namely a homography between the two projective lines; the proof, which is quite elementary, is omitted here. This is, however, not the only kind of homography possible between two projective lines: given 3 point correspondences between two projective lines, we can obtain a unique homography8 between them, and not all such homographies would be such that the lines aa′, bb′ and cc′ are concurrent. Thus this kind of projective transform is a special case, known as perspective projection or perspectivity. In fact, in section (A.2.3) we saw that all homographies f : P^1 → P^1 form a group, PLG(R^2), and perspective projections are elements of this group. Figure (A.2) sums up how they compose: m_1 → m_2 and m_2 → m_3 are homographies due to perspective projection, and their composition m_1 → m_3 is again a homography.

8 Of course, as defined in section (A.2.3), a homography is unique up to non-zero scalar multiplication.
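A quick numerical check of this construction (a made-up configuration, in affine coordinates): four points on line l are projected through the center o onto line m, and the cross-ratio of the four parameter values is compared on both lines.

import numpy as np

o = np.array([0.0, 5.0])                                 # center of the perspectivity
p_l, v_l = np.array([0.0, 0.0]), np.array([1.0, 0.0])    # line l: p_l + t*v_l
p_m, v_m = np.array([0.0, 2.0]), np.array([1.0, 0.3])    # line m: p_m + s*v_m

def project(q):
    # parameter s of the intersection of the ray o -> q with line m:
    # solve o + u*(q - o) = p_m + s*v_m for (u, s)
    u, s = np.linalg.solve(np.column_stack([q - o, -v_m]), p_m - o)
    return s

def cr(t1, t2, t3, t4):
    # affine cross-ratio of four collinear points given by their parameters
    return ((t1 - t3) * (t2 - t4)) / ((t1 - t4) * (t2 - t3))

ts = [-1.0, 0.5, 2.0, 3.0]                     # parameters of a, b, c, d on l
ss = [project(p_l + t * v_l) for t in ts]      # parameters of a', b', c', d' on m
print(cr(*ts), cr(*ss))                        # the two values agree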

A.2.9 Homography between two planes

Figure A.2: Associativity of perspective projections

Figure A.3: An example of a homography between two projective planes l and m due to perspectivity

Figure (A.3) shows a correspondence between two planes m and l due to a perspective projection. In this case we can extend the result for lines to see that such a perspective projection between two planes, centered at a point o, is also a homography f : P(E_3) → P(E_3), with f ∈ PLG(R^3). Thus this is a special case of a homography between two projective planes, which otherwise would have needed 4 point correspondences. In this scenario, as we know the central point o, knowing just three non-collinear point correspondences leaves the planes well defined, with the correspondences between all the other points obtained by collineating through the point o. The three points in each plane constructing the correspondences are measured in local coordinate systems in the respective planes.
A good example of why we need four points for specifying a homography between two projective planes in general can be seen as follows: if we consider only three point correspondences, the position of the center point o is uncertain. With reference to [6], we can see that given a point o, we can fix three directions and then find three points, one in each direction; this determines the position of the two planes and a homography between them. With some extension we can show that specifying a fourth point correspondence restricts this freedom, so that not every point o can act as a center of the perspective projection. A point to note is that the relative positions are to be calculated. We have proved a similar result in chapter (2), where we show that not all four point mappings can be realized by a perspective projection. Thus the set of homographies of projective planes obtained through perspective projection is a proper subset of the set of all homographies of projective planes; its structure has been studied and discussed in the literature on projective geometry, e.g. [27, 30]. This kind of homography is extremely useful for camera calibration and pose estimation, as it defines many properties governing image formation in a pin-hole camera model; a short sketch of how it is estimated from four point correspondences is given below.

One point to note is that in this appendix we have looked at homographies as invertible, bijective projective transformations between two projective spaces of equal dimensions, as in section (A.2.3) (also known as projective morphisms). But henceforth we will use the term homography only for projective morphisms between projective planes, as considered in this section.


CHAPTER B

Camera models and camera calibration

A basic camera is a projective model that maps points in P(E_4) to points in P(E_3). Skipping elementary constructions, we give a general formula that maps a point X ∈ P(E_4) to a point x ∈ P(E_3)1. Assuming the camera coordinate system to be centered at the Euclidean coordinate system,

x = KR [I | −C̃]_{3×4} X,   (B.1)

where K is a 3×3 camera calibration matrix that relates points in the 3D camera coordinate system to the 2D image coordinate system and houses the intrinsic parameters. Further, R and t = −RC̃ are the extrinsic parameters of the camera: R is the rotation matrix and t is the translation vector relating the 3D world coordinate system to the 3D camera coordinate system. The point C̃ denotes the camera center in the world coordinate system, and hence C = [C̃^T 1]^T is one of the vectors in R^4 representing it.


P = KR [I | −C̃] is the 3×4 projection matrix of the camera.
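As an illustration, the following sketch (with made-up intrinsic and pose values, not taken from any experiment in this thesis) assembles such a projection matrix and projects a world point.

import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])           # hypothetical calibration matrix
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])   # rotation about the Z axis
C = np.array([1.0, 0.5, -2.0])                  # camera center in the world frame

P = K @ R @ np.hstack([np.eye(3), -C[:, None]]) # the 3x4 projection matrix

X = np.array([0.2, -0.1, 5.0, 1.0])             # a homogeneous world point
x = P @ X
print(x[:2] / x[2])                             # its pixel coordinates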
This raises an important question: can every 3×4 real matrix represent a camera projection matrix? The answer is yes, as long as the matrix has rank 3. This question leads us to two kinds of cameras, classified based on the position of the camera center:

B.1 Finite Camera

If the leftmost 3×3 submatrix of the projection matrix P (let us denote it as M) is a non-singular matrix, we have

P = M [I | M^{−1} p_4],

where p_4 is the rightmost column of P. A camera center in the world coordinate frame is defined as a vector C such that PC = 0. For the finite camera with non-singular M, C is the point represented as

C = [(−M^{−1} p_4)^T 1]^T.

In short, a finite camera is one whose center C̃ = −M^{−1} p_4 is a finite point in the 3D world coordinate system.2

1 Owing to space restrictions, we denote a point X(a, b, c) in P(E_4) by one of its corresponding vectors (a, b, c, 1) in R^4. Henceforth we use this notation for a projective point unless specified otherwise.
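A quick numerical confirmation of the center formula (our own sketch, with a generic random P standing in for a finite camera):

import numpy as np

rng = np.random.default_rng(3)
P = rng.standard_normal((3, 4))          # generic 3x4 matrix: M is almost surely non-singular
M, p4 = P[:, :3], P[:, 3]
C = np.append(-np.linalg.solve(M, p4), 1.0)
print(P @ C)                             # ~ (0, 0, 0): C is indeed the camera center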

B.1.1 Elements of a finite projective camera

Assuming that we have a finite camera at hand, the camera projection matrix P = [M | p_4] is dissected into the following elements:

1. Column points: the leftmost 3 columns of P, namely p_1, p_2, p_3, represent the images of the 3 principal directions X, Y, Z of the world coordinate system, and p_4 represents the image of the origin of the world coordinate system. This is so since in P^3 a direction is represented by a point at infinity in that direction; for instance, the X direction is represented by the point (1, 0, 0, 0)^T, whose image is P(1, 0, 0, 0)^T = p_1.

2. Row vectors: denoting the rows of P as r_1^T, r_2^T, r_3^T, the principal plane is the plane parallel to the image plane and passing through the camera center; it consists of all points that project to image points represented by (x, y, 0)^T. For such a point, PX = [r_1^T ; r_2^T ; r_3^T] X = (x, y, 0)^T, and hence r_3^T X = 0. Thus r_3 is the vector representing the principal plane. Similarly we can see that the other two rows represent the planes which project to the X and Y axes of the image plane; they are known as axis planes.
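Both interpretations are easy to verify numerically; the sketch below (again with a generic random P, purely to illustrate the algebra) checks the column-point and principal-plane properties.

import numpy as np

rng = np.random.default_rng(4)
P = rng.standard_normal((3, 4))

# column points: the image of the point at infinity in the X direction is p1
print(P @ np.array([1.0, 0.0, 0.0, 0.0]))    # equals ...
print(P[:, 0])                               # ... the first column of P

# row vectors: a point X with r3 . X = 0 lies on the principal plane and
# projects to a point at infinity of the image, (x, y, 0)
r3 = P[2]
X0 = rng.standard_normal(4)
X = X0 - (r3 @ X0) / (r3 @ r3) * r3          # adjust X0 so that r3 . X = 0
print(P @ X)                                 # third coordinate ~ 0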

B.2 Infinite Camera

An infinite camera is one whose center is at infinity. Using the notation of the previous section, M is now a singular matrix. Applying the condition PC = 0, we get the camera center as

C = [d^T 0]^T,

where the 3-vector d spans the null space of M (so that PC = Md = 0).

2 Can we say that a point at infinity in the 3D world coordinate system is also a point at infinity in the 3D camera coordinate system?

B.3 Camera calibration

From section (B), the projection matrix of a camera model is given by:

P = KR [I | −C̃],   K = [ α  γ  u_0 ; 0  β  v_0 ; 0  0  1 ],   (B.2)

where K is the camera calibration matrix. Here α and β represent the focal length of the camera in terms of pixel dimensions in the x and y directions respectively, γ represents the skew due to distorted sensors in practical cameras, and u_0 and v_0 are the x and y coordinates of the principal point3 in the image coordinate system. Further, R is the rotation matrix and t = −RC̃ is the translation vector.
The process of camera calibration is defined as estimating these quantities. Further, P is a 3×4 matrix with 11 degrees of freedom4 and rank 3. Thus knowledge of 6 point correspondences is needed to uniquely estimate P up to non-zero scalar multiplication; in fact, since each correspondence contributes two linear equations, only 5½ correspondences are needed. Representing our image plane as P(E_3) and the world coordinate space as P(E_4), we can show that the process of imaging scene points of P(E_4) onto the image plane P(E_3) is a form of projective transform. Hartley and Zisserman in [1] and Trucco et al. in [14] give a detailed treatment of various ways of estimating P. A paper by Zhang, [31], also outlines two main kinds of methods for camera calibration:
1. Photogrammetric calibration: here the calibration is done by specifying a set of 3D-2D point correspondences between the world coordinate system and the image plane. For this, an elaborate setup and knowledge of the 3D coordinates of the model object are required.

2. Self-calibration: here more than one image of the scene is obtained using the same camera. The different images are created by a rigid motion of the camera in 3D space5. These views impose certain constraints on the internal parameters of the camera and hence can help us estimate the projection matrix without the need for an explicit calibration model.
3 The principal point is the point of intersection of the perpendicular line to the image plane from the camera center, with the image plane itself.
4 K has 5 degrees of freedom, R has 3 and C̃ has 3, thus a total of 11.
5 It can be a Euclidean or a projective space.


To this end, the algorithm discussed by Zhang in the same paper is a mixture of these two methods. The method is briefly outlined below.

A regular and simple pattern containing a grid of squares is printed on paper attached to a planar surface. Then a set of images of this pattern from varying angles is obtained. Using a corner detection algorithm, point correspondences between the pattern and each of the images are obtained. These correspondences are enough to estimate the homography between the image and the calibration pattern, and this homography puts some constraints on the values of the parameters of the projection matrix. Assuming that the pattern's paper lies on the plane defined by the equation z = 0, the correspondence between a point X on the object and the point x on the image plane is seen as

x = v = PX = A [r1 r2 r3 t]
0 = A [r1 r2 t ] Y ,

1
1
1

where R = [r1 , r2 , r3 ]. Hence the task of camera calibration is transformed into a


problem of estimating the homography, H, between the planar object and image
plane. The homography is obtained as,
H = [ h1 h2 h3 ] = A [r1 r2 t ] ,
where the homography defines point correspondences between the calibration pattern and an image as

x = HX.

This homography can be calculated by the DLT algorithm, as discussed by Hartley and Zisserman in [1]. Once H is known, certain constraints on the intrinsic matrix A are derived from the properties of the rotation matrix (the columns r_1 and r_2 are orthonormal):

h_1^T A^{−T} A^{−1} h_2 = 0,
h_1^T A^{−T} A^{−1} h_1 = h_2^T A^{−T} A^{−1} h_2,

where H = [h_1 h_2 h_3]. For a detailed explanation of these constraints one can refer to [31] by Zhang. A closed-form solution for the intrinsics is obtained, which is refined further through non-linear least squares estimation; the extrinsic parameters r_1, r_2 and t then follow from A^{−1}H.
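To make the structure of these constraints concrete, here is a bare-bones sketch of how they stack into a linear system, following Zhang [31]: with B = A^{−T} A^{−1} parametrized by its six distinct entries b = (B11, B12, B22, B13, B23, B33), each view contributes the two rows below, and b is the right null vector of the stacked matrix V. The list Hs of per-view homographies is assumed to come from the DLT step described above.

import numpy as np

def v_ij(H, i, j):
    # row vector with v_ij . b = h_i^T B h_j for symmetric B
    h_i, h_j = H[:, i], H[:, j]
    return np.array([h_i[0] * h_j[0],
                     h_i[0] * h_j[1] + h_i[1] * h_j[0],
                     h_i[1] * h_j[1],
                     h_i[2] * h_j[0] + h_i[0] * h_j[2],
                     h_i[2] * h_j[1] + h_i[1] * h_j[2],
                     h_i[2] * h_j[2]])

def calibration_system(Hs):
    rows = []
    for H in Hs:
        rows.append(v_ij(H, 0, 1))                  # h1^T B h2 = 0
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))  # h1^T B h1 = h2^T B h2
    return np.array(rows)

# with homographies from three or more views:
#   V = calibration_system(Hs)
#   b = np.linalg.svd(V)[2][-1]      # right null vector of V
# the intrinsics of A then follow from B in closed form, see [31]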


CHAPTER C

Some miscellaneous proofs


This section contains mathematical proofs of certain statements claimed in various sections of the thesis. While some proofs might look trivial and some not so trivial, a rigorous mathematical backbone rests upon airtight arguments and reasoning, and hence we aspire to lay down whatever proofs felt relevant with utmost rigor.
Lemma 17. The zero set of the function f in equation (4.7) defines the set of valid values of h. We hypothesize that this set of points

X = { h ∈ R^9 | f(h) = 0_5 }

defines an implicit manifold of dimension four. In other words, the Jacobian of f,

J_X(h) = [ h^T S_1 ; h^T S_2 ; h^T S_3 ; h^T S_4 ; h^T S_5 ]_{5×9},   (C.1)

is a matrix of rank five for all non-zero values of the vector h ∈ R^9, where the S_i are the quadrics in nine variables, i.e. 9×9 real symmetric matrices, defined in section (4.2).
Proof. Let us assume, in general, that the five row vectors are linearly dependent. Hence we have scalars λ_i, i = 1, 2, ..., 5, not all zero, such that ∑_{i=1}^{5} λ_i h^T S_i = 0. Using the definition of the S_i, i = 1, 2, ..., 5, and writing h = [h_1^T h_2^T h_3^T]^T, we can write

h^T S_1 = [ h_1^T C_2,  0_{1×3},  −p_1 h_3^T C_2 ],
h^T S_2 = [ 0_{1×3},  h_2^T C_2,  −p_2 h_3^T C_2 ],
h^T S_3 = [ h_2^T C_2 / 2,  h_1^T C_2 / 2,  −p_3 h_3^T C_2 ],
h^T S_4 = [ h_3^T C_2 / 2,  0_{1×3},  h_1^T C_2 / 2 − p_4 h_3^T C_2 ],
h^T S_5 = [ 0_{1×3},  h_3^T C_2 / 2,  h_2^T C_2 / 2 − p_5 h_3^T C_2 ].

Comparing the first three and the next three components of ∑_{i=1}^{5} λ_i h^T S_i = 0, we have

λ_1 h_1^T C_2 + λ_3 h_2^T C_2 / 2 + λ_4 h_3^T C_2 / 2 = 0,   (C.2)

λ_2 h_2^T C_2 + λ_3 h_1^T C_2 / 2 + λ_5 h_3^T C_2 / 2 = 0.   (C.3)

But H = [h_1 h_2 h_3], being a homography matrix, is invertible; this means h_1, h_2 and h_3 are linearly independent vectors. Moreover, the conics C_1 and C_2 are assumed to be non-degenerate forms, so their matrix representations are invertible. Multiplying equations (C.2) and (C.3) on the right by C_2^{−1}, we get

λ_1 h_1^T + λ_3 h_2^T / 2 + λ_4 h_3^T / 2 = 0 and λ_2 h_2^T + λ_3 h_1^T / 2 + λ_5 h_3^T / 2 = 0.

With h_1, h_2 and h_3 linearly independent, this forces λ_i = 0 for i = 1, 2, ..., 5, a contradiction. This proves that all five rows of the matrix J_X(h) are linearly independent, and hence that it has full row rank.
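The full-rank claim can also be probed numerically. The sketch below only illustrates the test itself: the true S_i are built from C_2 and the p_i of section (4.2), which are not reproduced here, so random symmetric placeholders stand in for them.

import numpy as np

rng = np.random.default_rng(5)

def random_symmetric(n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

S = [random_symmetric(9) for _ in range(5)]   # placeholder stand-ins for S_1, ..., S_5
h = rng.standard_normal(9)                    # a generic non-zero h in R^9
J = np.vstack([h @ S_i for S_i in S])         # the rows h^T S_i, shape (5, 9)
print(np.linalg.matrix_rank(J))               # 5 for generic data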
Lemma 18. Given a conic C_1′ and its cone Q_1, let U_1 be the set of all plane positions obtained through the solutions of lemma (2). If we define an equivalence relationship between planes u_1 and u_2 as

u_1 ∼ u_2 ⟺ u_1 = λ u_2, λ ∈ R − {0},

then U_1 has exactly two equivalence classes. Here, if u_1 = λ u_2 then r_{u_1} = r_{u_2}/λ, where r_{u_1} is the radius of the circle obtained through the intersection of u_1 with the cone Q_1 (see lemma 19 for this scaling).
Proof. In other words, we have to prove that we can orient the scene plane in only two non-parallel orientations such that the intersection of the plane with Q_1 is a circle. Lemma (2) estimates the plane position through two polynomials in two variables, equations (3.16) and (3.17), of degrees three and four respectively. Applying Bézout's theorem, [32], to the curves defined by these two polynomials over R, we have at most twelve distinct intersection points, and hence at most twelve distinct plane positions up to a non-zero scalar multiple. In practice, we have seen that for most cases the scene lies in front of the camera; this ensures that the circle in the scene gets projected to an ellipse. For a cone with a circular or an elliptic base there is a classical property, stated by Apollonius of Perga in his manuscript Treatise on conic sections, translated by Heath, [3]. This property states that an oblique cone formed with a circular base has two distinct series of circular sections, of which one is the series of planes parallel to the circular base. By a series of sections we mean a sequence of circles of progressively changing radii.1 For a cone with an ellipse as its base, we likewise have two distinct series of circular sections, neither of which is parallel to the base. In figure (3.3) we have two circles, one through the points D, P, E and the other through the points N, P, K, which are members of the two series of circular sections we just discussed; the parallel members of each series are omitted. This proves that U_1 has exactly two solutions for the plane positions whose sections are circles, up to a non-zero scalar multiple.

Figure C.1: Two series of circular cross-sections in a circular cone; figure from [3].
Lemma 19. Let R and t be one of the pose solutions obtained for a plane pair (u_11, u_21) in R_sol. Then we hypothesize that the pair (λu_11, λu_21) would give us R and t/λ as one of its solutions. We assume here that λ is a positive non-zero scalar.

Proof. Let the respective circles of intersection of the planes u_11 and u_21 with the cones Q_1 and Q_2 be C and C′. Following the discussion in step 3 of the geometric construction described in section (3.4), the centers in the global coordinate system are denoted as x_c1 and x_c2. For this setup, we have the points p_1, p_2 and x_c1rot (= R x_c1) as defined from equations (3.25) and (3.26). Then from equations (3.29), (3.30) and (3.32) we have a pose solution R and t.

Effect of vector scaling on circle center and radius: let us consider the matrix representation of a circle for the conic C_cal as given in equation (3.15), with center [l_1^T C_cal l_3, l_2^T C_cal l_3]^T in the local coordinate system. Scaling the vector representing the plane by a factor λ results in scaling of l_3 by 1/λ, and a subsequent scaling of the center coordinates and the radius by 1/λ. Similarly we can argue

1 By progressive change we mean either increasing or decreasing in magnitude, based on whether we are moving towards or away from the apex of the cone.


for two sets of conics and their plane solutions, as given next.

Scaling the two vectors defining u_11 and u_21 by λ, we get the planes λu_11 and λu_21; the vector defining λu_11 is [λm_11, λm_12, λm_13]^T and that for λu_21 is [λm_21, λm_22, λm_23]^T. From the definitions of the quantities k_11, k_12 and M_1 for the vector u_11, and k_21, k_22 and M_2 for the vector u_21, given by equations (3.9) and (3.12), we notice that they are not affected by the scaling. Hence from equations (3.21) and (3.22) one can infer that scaling u_11 and u_21 by λ results in scaling of the centers x_c1 and x_c2 by 1/λ. The local coordinate system is chosen to be an orthonormal set of axes; hence the radius of the circle represented in both the local and the global coordinate systems is the same, which means that the radius is also scaled by 1/λ.

Then, by application of lemma (2) to the scaled versions of the two planes, λu_11 and λu_21, with the conics C_1′ and C_2′ being the same, we have

x_c1′ = x_c1/λ and x_c2′ = x_c2/λ,

and the radius (which, as for C and C′, is the same in the local and global coordinate systems) of the two scaled circles is

r_1′ = r_1/λ.

Further, we see that x_c1rot′ = x_c1rot/λ. From equations (3.25) and (3.26) we have p_1′ = p_1/λ and p_2′ = p_2/λ. Applying equation (3.29),

A′ = (x_c1′ − p_1′)/‖x_c1′ − p_1′‖_2 = A and B′ = (x_c1rot′ − p_2′)/‖x_c1rot′ − p_2′‖_2 = B.

Then the pose solution for this pair of scaled planes is obtained through the application of equations (3.30) and (3.32) as

R′ = R,  t′ = t/λ.   (C.4)
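The scale cancellation at the heart of this proof is elementary to check numerically: assuming the normalized-difference form of A′ reconstructed above, the unit vectors entering equations (3.30) and (3.32) are unchanged when all points are divided by λ (random stand-ins for x_c1 and p_1 below).

import numpy as np

rng = np.random.default_rng(6)
xc1, p1 = rng.standard_normal(3), rng.standard_normal(3)
lam = 2.5                                     # the positive scalar of the lemma
A = (xc1 - p1) / np.linalg.norm(xc1 - p1)
A_scaled = (xc1 / lam - p1 / lam) / np.linalg.norm(xc1 / lam - p1 / lam)
print(np.allclose(A, A_scaled))               # True: the direction, and hence R, is unchanged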
