Pose estimation from one conic correspondence
by
Snehal I. Bhayani
201211008
MASTER OF TECHNOLOGY
in
INFORMATION AND COMMUNICATION TECHNOLOGY
to
Dhirubhai Ambani Institute of Information and Communication Technology
June, 2014
Declaration
I hereby declare that
i) the thesis comprises my original work towards the degree of Master of
Technology in Information and Communication Technology at Dhirubhai
Ambani Institute of Information and Communication Technology and has
not been submitted elsewhere for a degree,
ii) due acknowledgment has been made in the text to all the reference material
used.
Snehal Bhayani
Certificate
This is to certify that the thesis work entitled Pose Estimation from One Conic Correspondence
has been carried out by Snehal I. Bhayani for the degree of Master of
Technology in Information and Communication Technology at Dhirubhai Ambani
Institute of Information and Communication Technology under my supervision.
Acknowledgments
Is this what you ask,
or is this your answer?
Pray illuminate, for I have questions more and more
after you answer each one of these,
and more... ;
there is no learning save for when I stumble,
pray illuminate, for I have questions more.
Of the swathe of acknowledgments, the first one goes out to my supervisor
Prof. Aditya Tatu. His insights helped me not only achieve crucial progress, but
also gain different and interesting perspectives on problems at various times along
the duration of my thesis work. His guidance, starting from what, when and where
to keep notes (in LaTeX), to what we should infer from which experiment, has been
paramount in molding my work in a comprehensive manner. With his knowledge
of mathematics and his sheer interest in the same, he has helped me get over my
points of confusion numerous times.
I would like to acknowledge Prof. Manjunath Joshi for his guidance on camera
calibration tools and approaches. I would also like to acknowledge Prabhunath
sir for his instantaneous help in creating a virtual setup for camera calibration. A
special thanks to my friend Haritha, whose swanky DSLR camera let me capture
the best possible images of calibration patterns for days on end. I would like to
thank all of my friends, colleagues and classmates, who put up with my changed
self while I worked, and put up with my other self while I was not working and
they were. And last but not least, a special thanks to my parents and my sister
for their constant support and care all along.
Contents

Abstract — vi
List of Tables — ix
List of Figures

1 Introduction
  1.1.1 General assumptions
  1.3 Background work
  1.4 Our contribution — 10
  1.5 Layout of thesis — 11
2 Epipolar geometry — 13
  2.2 Conics — 23
  2.3 Quadrics — 25
  2.4 Summary — 26
3 Pose estimation from one conic correspondence — 28
  3.2 Conic correspondence — 30
  3.5 Experiments — 47
  3.6 Summary — 53
4 Optimization-based approaches — 54
  4.3 Summary — 62
5 Conclusion — 63
References — 66
Abstract
In this thesis we attempt to solve the problem of camera pose estimation from one
conic correspondence by exploiting the epipolar geometry. For this we make two
important assumptions which simplify the geometry further. The assumptions
are: (a) the scene conic is a circle, and (b) the translation vector is contained in a
known plane. These two assumptions are justified by noting that many artifacts
in scenes (especially indoor scenes) contain circles, which lie wholly in front of
the camera. Additionally, there is a good possibility that the plane which contains
the translation vector is known. Through the epipolar geometry framework, a matrix equation is defined which relates the camera pose to one conic
correspondence and the normal vector defining the scene plane. Through the assumptions, we simplify the system of polynomials in such a way that the task
of solving a set of seven simultaneous polynomials in seven variables
is transformed into the task of solving only two polynomials in two variables at
a time. For this we design a geometric construction. This method gives
a set of finitely many camera pose solutions. We test our propositions on
synthetic datasets and suggest an observation which helps in selecting a unique
solution from the finite set of pose solutions. For the synthetic dataset, the solution
so obtained is quite accurate, with an error of the order of 10⁻⁴; for real datasets, the solution is erroneous due to errors in the camera calibration data we have. We justify this
fact through an experiment. Additionally, the formulation of the above mentioned
seven equations relating the pose to the conic correspondence and scene plane position helps us understand how the relative pose establishes point and
conic correspondences between the two images. We then compare the performance of our geometric approach with the conventional way of optimizing a cost
function and show that the geometric approach gives us more accurate pose solutions.
E : The calibrated counterpart of F that relates the two images in the same way,
except for the fact that the calibration matrices are known and fixed.
F : The fundamental matrix relating two images.
K : The camera calibration matrix.
H : A homography or a projective morphism over a projective plane, f : P(E^{n+1}) → P(E^{n+1}),
where n = 2. This mapping is denoted by a matrix H ∈ GL(3).
cam(O, Π, K) : A camera unit with centre O, image plane Π and calibration matrix K.
[x]× : The skew-symmetric matrix of a vector. If x = [x_1 x_2 x_3]^T, then

[x]_\times = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}.
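For readers who want to check the notation numerically, here is a minimal sketch of the [x]× operator and its defining property [x]× y = x × y. It is written in Python with NumPy, an environment assumed here for illustration; the thesis itself works in MATLAB.

```python
import numpy as np

def skew(x):
    """Return the 3x3 skew-symmetric matrix [x]_x, so that skew(x) @ y == np.cross(x, y)."""
    return np.array([[0.0,  -x[2],  x[1]],
                     [x[2],  0.0,  -x[0]],
                     [-x[1], x[0],  0.0]])

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
assert np.allclose(skew(x) @ y, np.cross(x, y))   # [x]x y equals the cross product
```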
List of Tables

3.1 Results of single stage geometric approach on synthetic dataset — 51
3.3 Result of part real data for investigating the error due to erroneous calibration matrix — 56
List of Figures

3.4 Pose solution — 48
A.1 Commutativity — 74
C.1 Two series of circular cross-sections in circular cone, figure from [3] — 98
Chapter 1
Introduction
This thesis deals with one form of the pose estimation problem as defined in the
computer vision community. In rudimentary terminology, this form of pose estimation can be stated as the estimation of the relative orientation, in a euclidean
coordinate system, between two camera positions from which a given scene has been imaged.
Haralick in [4] introduces four classes of pose estimation problems, as given next:
1. 2D-2D pose estimation problem: We are given two-dimensional coordinate
observations from N observed images: x_1, ..., x_N. These could correspond,
for example, to the observed center positions of all observed objects. We are
also given the corresponding (or matching) N two-dimensional coordinate
vectors from the model: y_1, ..., y_N. The rotation and translation in the 2D plane
that relate these two sets of observations are to be estimated. In other words,
we have to determine the rotation matrix R and the translation vector t that
minimize the least squares error,

\epsilon^2 = \sum_{n=1}^{N} w_n \| y_n - (R x_n + t) \|^2,
(1.1)
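For reference, the 2D-2D problem above has a well-known closed form solution via the weighted Procrustes (Kabsch) construction. The sketch below, in Python with NumPy (an assumed environment), illustrates that standard technique; it is not taken from [4].

```python
import numpy as np

def pose_2d_2d(x, y, weights=None):
    """Minimize sum_n w_n ||y_n - (R x_n + t)||^2 over R in SO(2), t in R^2.
    x, y: (N, 2) arrays of matched 2D points; weights: optional (N,) array."""
    w = np.ones(len(x)) if weights is None else weights
    xm = (w[:, None] * x).sum(0) / w.sum()          # weighted centroids
    ym = (w[:, None] * y).sum(0) / w.sum()
    H = ((x - xm) * w[:, None]).T @ (y - ym)        # 2x2 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.linalg.det(Vt.T @ U.T)])   # guard against a reflection
    R = Vt.T @ D @ U.T
    return R, ym - R @ xm

# Recover a known planar motion from exact correspondences:
th = 0.5
Rz = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
x = np.random.default_rng(1).normal(size=(20, 2))
R, t = pose_2d_2d(x, x @ Rz.T + np.array([1.0, 2.0]))
assert np.allclose(R, Rz) and np.allclose(t, [1.0, 2.0])
```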
3. 2D perspective-3D pose estimation problem: Here we are given N three-dimensional
model points y_1, ..., y_N together with their observed 2D perspective projections. For a
model point y_n, the perspective projection (u_{n1}, u_{n2}) is given by

u_{n1} = f \, \frac{r_1 y_n + t_1}{r_3 y_n + t_3}, \quad u_{n2} = f \, \frac{r_2 y_n + t_2}{r_3 y_n + t_3}, \quad t = (t_1, t_2, t_3)^T, \quad R = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix},
(1.2)
where f is the focal length, i.e., the distance of the image plane in front of
the origin (the center of perspectivity), and r_1, r_2 and r_3 are the rows of
the rotation matrix R. The problem of pose estimation of this kind is
to estimate R and t when a set of correspondences between the 3D points
and the perspective 2D points is given. This problem is termed the exterior
orientation problem in the photogrammetry literature.
4. 2D perspective-2D perspective pose estimation problem: This is perhaps the
most difficult of the pose estimation problems. Here we do not have the 3D
world coordinates. Instead we have two images, or perspective projections,
of the same object. Alternatively, one can assume the object to be moving and the
perspective projection device to be fixed. A pin-hole camera model is one
such theoretical device and is of interest to us. As a setup, we have a scene
and its image from two distinct positions of the camera (or, as stated, we
have one camera and we take images of a scene which has undergone
a rigid body motion). Then, with point correspondences between these two
perspective projections, one has to estimate the rigid body motion that the
scene has undergone. This forms the statement of the 2D perspective-2D perspective pose estimation problem.
Of the four classes above, our work is about the fourth type of pose estimation
problem, the 2D perspective-2D perspective pose estimation problem. This approach
requires an overview of a two camera setup. Hence, before we go further, a general
arrangement of the two camera setup is introduced in the next section (1.1). The mathematical spaces considered throughout this report are euclidean spaces¹,
unless specified otherwise.
1.1 The two camera setup
The purpose of introducing such an arrangement is two-fold. Firstly, it introduces the various feature artifacts which are used for establishing correspondences between two images (like points, lines, conics etc.) and the varying
mathematical relationships amongst them; secondly, the same framework sets
up the idea of multiple-view geometry (termed epipolar geometry by Hartley
and Zisserman in [1]).
1. One may wonder why, although we are dealing with projective spaces, the spaces considered
here are not projective. The reason is, as shown in section (A.2.5), the projective space is obtained by
"adding" points at infinity to an affine space (here we can consider a euclidean space as an affine
space with origin as the point [0, 0, 0]^T). For practical purposes we assume the points we deal with
are "not at infinity". Hence the projective space is reduced to an affine (or a euclidean) space.
1.1.1
General assumptions
A two camera setup is depicted in figure (1.1). Here a pin-hole camera is decomposed into a projection center O (a point in R³), an image plane Π and its calibration matrix K. This model is mainly of theoretical interest, but for our application
we see that this highly simplified model works well enough to let us
ignore various practical issues in a camera model. Such a camera model shall be
denoted as cam(O, Π, K). The calibration matrix houses the quantities that determine
the relation between the position of a point x in the 2D image coordinate system
and its position in the 3D global coordinate system of the camera. Let
O′ be the intersection of the line from O perpendicular to Π with Π. Then the
matrix K gives us the distance of the plane Π from the center O and the position of
the point O′ in terms of the local coordinate system of Π. More on the structure
of K can be read in appendix (B.3).
As shown in the figure, we have a pair of cameras cam(O1, Π1, K) and cam(O2, Π2, K)
with their centers at points O1 and O2 in R³. The calibration matrices are the same for
both cameras. The image planes associated with cameras O1 and O2 are Π1
and Π2 respectively. Now, a quadratic curve is defined as the zero set of a second
order polynomial

Ax² + By² + Cxy + Dx + Ey + F = 0.

This polynomial can be written in matrix form as

\begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0.
(1.3)
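A small sketch (Python/NumPy, an assumed environment) that builds the matrix of equation (1.3) from the six coefficients and evaluates it at a homogeneous point:

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, as in equation (1.3)."""
    return np.array([[A,   C/2, D/2],
                     [C/2, B,   E/2],
                     [D/2, E/2, F  ]])

Cm = conic_matrix(1, 1, 0, 0, 0, -1)   # the unit circle x^2 + y^2 - 1 = 0
p = np.array([1.0, 0.0, 1.0])          # homogeneous point on the circle
assert abs(p @ Cm @ p) < 1e-12         # p^T C p = 0 for points on the conic
```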
Using dual notation, henceforth we shall use the same symbol, C, for a quadratic
curve and for the matrix representation of its defining polynomial. For the above
setup we consider a scene plane Π containing a conic C such that its images are C1 and C2 upon imaging by the two cameras.
a general orientation we assume that none of O1 or O2 lie on the plane . This arrangement of the three planes , 1 and 2 constructs a special bijective mapping
between each pair of these projective planes, known as homographies in computer
vision terminology. Without getting into details, we mention that a homography
is a bijective mapping between two projective planes such that projective lines
are mapped to projective lines. Precise definition and properties of homography
between two planes is described in sections (A.2.3) and (A.2.9) of appendix (A).
From these definitions we note that such a mapping can be represented by a real
invertible matrix, H, unique upto a non-zero scalar multiple. Then the point mapping between two projective planes 1 and 2 is defined as
Hx = y, x 1 , y 2 ,
where x and y are homogeneous representations for points of projective planes.
This matrix, H, shall henceforth represent such a homography between two projective planes. Then as mentioned before, that the arrangement of the three planes
, 1 and 2 construct homographies between planes and 1 , between and
2 , and between 1 and 2 as shown below:
H1 : 1 , H2 : 2 and H : 1 2
where
H = H21 H1 .
Contrary to what we assume for defining a homography above, we assume that
the three planes Π, Π1 and Π2 are represented as euclidean planes rather than projective planes. A practical application would mostly have the cameras at finite
locations, and the projective point representing a finite camera center is
uniquely identified by its corresponding euclidean (or affine) counterpart. Even
if there is a point at infinity in P(E⁴) which is imaged to a finite point in the
image plane, we treat those points as special cases of parallelism. Further, the
points in P(E⁴) which are imaged to points at infinity on the image planes are the
points lying on the principal plane², [1]. But for practical situations we don't consider those parts of the scene that lie on the same side of the image plane on which
the camera center lies. This means that points on the principal plane would not be
imaged, which implies that the points in the scene in front of the camera will never be mapped
to points at infinity on the image plane through the imaging process. To summarize, we don't need to include points at infinity in P(E⁴) or in the image
plane P(E³) for our purpose.

2. A principal plane of a camera is the plane parallel to the image plane and passing through the
camera center.
E.g. if the point is [x y z w]^T, we can safely assume w ≠ 0, and hence
[x y z w]^T corresponds to [x/w y/w z/w]^T in R³. In short, we can safely
assume that the projective spaces used in a camera setup are reduced to euclidean
spaces.
The points in the planes are measured relative to their local coordinate systems,
which again are assumed to be euclidean. If one still needs a projective
representation of the same point, the euclidean coordinate system can be extended
to a corresponding projective system, and a point [x y]^T in R² can represent the
point [x y 1]^T in P(E³). This extension is basically the process of adding a
homogeneous coordinate to a point measured in a euclidean (or affine) coordinate
system³.
With the notation and representation of points in order, we can define the way
the conics are related through the homographies as

C = H_1^T C_1 H_1, \quad C = H_2^T C_2 H_2 \quad \text{and} \quad C_2 = H^{-T} C_1 H^{-1}.
(1.4)
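The relations (1.4) can be checked numerically. The following sketch (Python/NumPy; the homography and the conic are made up) transforms a conic by a generic homography and verifies that the image of a point on C1 lies on C2:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 3))              # a generic (almost surely invertible) homography
C1 = np.diag([1.0, 1.0, -1.0])           # the unit circle as a conic matrix
Hi = np.linalg.inv(H)
C2 = Hi.T @ C1 @ Hi                      # equation (1.4): C2 = H^{-T} C1 H^{-1}

x = np.array([0.0, 1.0, 1.0])            # homogeneous point on C1
y = H @ x                                # its image under the homography
assert abs(y @ C2 @ y) < 1e-9            # the image lies on C2
```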
Thus a homography maps the points and conics in one image to those in
the other image. This mapping is governed by the relative orientation between
the two image planes and also that between the two camera centers. There is a
basic relationship that relates this homography to the relative orientation. But
before introducing it, let us introduce two matrices which are important for
computer vision problems.
The homographic mapping holds between those points in the two images that
are projections of points lying in the same plane Π. A more general relation exists
between all image points x in Π1 and y in Π2, irrespective of whether their
corresponding scene points lie in the same plane or not. This relation has been
studied, introduced and worked over many times in the literature, starting from
the introduction of the essential matrix by Longuet-Higgins [5]. The matrix defines the constraint

x^T E y = 0,
(1.5)

which holds if and only if x and y are the images of the same scene point⁴. Another matrix,
discussed and taken up subsequently by noted researchers, is termed the fundamental
matrix. This matrix is the un-calibrated counterpart of the essential matrix:

E = K^T F K.

This means the same point correspondence is defined, but the point measurements
don't need the calibration matrix to be known. A detailed explanation and treatment of both these matrices can be found in the textbook [1] by Hartley and Zisserman. These equations form the backbone of our thesis.
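As a sanity check of the essential matrix constraint, the following sketch (Python/NumPy; the rotation, translation and scene point are made-up values) builds E = [t]×R for a synthetic two-camera setup with the convention x = Ry + t used below, and verifies equation (1.5):

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0.0]])

R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 deg about z
t = np.array([1.0, 0.5, 0.0])

X2 = np.array([0.3, -0.2, 4.0])     # a scene point in camera-2 coordinates
X1 = R @ X2 + t                     # the same point in camera-1 coordinates
x1, x2 = X1 / X1[2], X2 / X2[2]     # normalized (calibrated) image points

E = skew(t) @ R                     # essential matrix, E = [t]x R
assert abs(x1 @ E @ x2) < 1e-12     # epipolar constraint of equation (1.5)
```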
The relative orientation of plane Π1 with respect to plane Π2 in E³ is assumed to
be a rotation R and a translation t. These quantities are such that when a point y ∈ Π2
is rotated and translated through R and t, we get the corresponding
point x ∈ Π1 as x = Ry + t. Thus, in the figure given above, if O1 is at the origin,
O1 = [0 0 0]^T, then O2 = −R^T t. The points of intersection of the line O1O2
with the planes Π2 and Π1 are known as the epipoles e1 and e2 of cameras 2 and 1 respectively. The essential matrix, as introduced above, can be decomposed in terms
of R and t, [5]:

E = [t]_\times R.

The fundamental matrix in terms of R and t is decomposed as [1]:

F = [e_2]_\times H.

In lemma (1) in chapter (3), we prove the following relationship between the pose
parameters R, t and the variables of epipolar geometry, H, e:

t = K^{-1} e, \quad R = \lambda^{-1}(K^{-1} H K + K^{-1} e v^T K),
(1.6)

where R, t, e and K have their usual meanings and λ is a real scaling factor. v
represents the position of the scene plane.

4. x and y are measured in image planes, assuming that the cameras are calibrated.
1.2 Problem definition

With the above setup in mind, one can now define pose estimation through
mathematical quantities. Considering the camera centers O1 and O2, the translation
vector t is defined as:

t = O_1 - R\,O_2,
(1.7)
1.3 Background work

The history of computer vision is rich and full of brilliant insights. Its association
with projective geometry is even richer. We have listed the four main classes
of the pose estimation problem along the lines of a paper by Haralick, [4]. In a later
paper, Haralick et al. [6] work on a similar kind of problem but exclusively
in a euclidean space, where they look at a closed form solution to pose estimation from a set of three point perspective projections. This problem is of
the type defined as the 2D perspective-3D pose estimation problem in point (3). Photogrammetry deals with these problems in detail, and they also have applications
in computer vision. Longuet-Higgins, in [5], introduced the essential matrix of equation
(1.5) primarily to tackle the problem of relative orientation, in other words the
pose estimation problem, and the same paper gives an algorithm to estimate R and t from E. A comprehensive study of the fundamental matrix
and its related treatment has been carried out by other researchers: Zhang in [7],
and Luong and Faugeras in [8]. For the second and more crucial part of the
problem, [5, 8, 7, 4, 9, 10] use point correspondences to estimate either the fundamental matrix or the essential matrix. In contrast, Heyden and Kahl in [11]
have used conic correspondences to estimate the fundamental matrix. The authors give a brief survey of the various features (like points, lines, curves and many
more) used in the past to estimate the fundamental matrix. They also state the
reasons why conic correspondences are preferred by certain researchers over
the conventional point and line correspondences. The primary motivation is the
fact that many man-made objects contain a curve which is either a conic or can
be approximated as being formed of conics. Another reason is the projective invariance property: any projective transformation maps a conic into another
conic. A projective transformation is a pointwise mapping between two projective spaces.⁵ Ji et al. in [12] have
used a mix of various geometric features like points, lines and conics to estimate
the pose of a camera with respect to the object coordinate frame. Towards the
same objective, they have considered a linear approach that combines geometric features at different levels of complexity, thus improving the stability and
accuracy of the solution. The approach estimates the pose parameters from point
correspondences, line correspondences and 2D ellipse-3D circle correspondences.
For the circle-ellipse correspondence, they obtain two polynomials which define two constraints on the relative pose. But the authors assume that the
radius of the circle is known, and the property of the circle as a conic section is
not used completely, as the focus is more on using as many feature correspondences as available. The same problem of 2D perspective-3D pose estimation is
worked on by Wang in [13]. The approach proposed in that paper amounts
to estimating the pose of the camera from a single view, under the assumption
that the intrinsic camera parameters are known. The approach uses the image
of an absolute conic⁶ to estimate the pose of the camera. For the minimal case,
where the image of only one circle is known, an added assumption is needed:
that the image of the center of the 3D circle is known. But it hasn't been explicitly justified when such an assumption would hold true, though some methods of
estimating it have been suggested.
1.4 Our contribution

Our contribution primarily lies in an attempt to solve the problem from a slightly
different perspective. It has motivated two different approaches to pose estimation. The first approach is based on the equation

(R - t u^T)^T K^T C_2 K (R - t u^T) - \lambda K^T C_1 K = 0_{3 \times 3},

where R, t, C1, C2 have their usual meanings, u ∈ R³ is the vector⁷ defining
the scene plane that contains the conic C, and λ is a scaling factor introduced to
account for the homogeneous quantities C1 and C2 in the equation. The above
equation is derived by combining epipolar geometry with one conic correspondence. Intuitively, this equation describes the relationship between the pose R, t
and the pair of conics in correspondence through the normal vector of the scene plane,
u. This constraint can be further simplified if we assume that the conic C in the scene,
whose images C1 and C2 are known, is a circle⁸, and that the translation vector lies
in a specified plane (defined by a normal vector w). These assumptions reduce the
number of unknown variables in the previous equation to get

(R - t u^T)^T K^T C_2 K (R - t u^T) - \lambda K^T C_1 K = 0_{3 \times 3},
w^T t = 0,
(1.8)
where R, t and λ are to be estimated and C1, C2, u, K and w are known. A straightforward way to solve the above equations is to write a gradient descent algorithm
via explicit calculation of gradient vectors, or to use MATLAB's inbuilt optimization functions
on a cost function modeled from equation (1.8). Unfortunately any
optimization method can, in general, get stuck in a local minimum, and through experiments on synthetic datasets we have found that the algorithm does get stuck
at points which are nowhere close to the true value. Such an experiment and its
result are given in section (4.1) of chapter (4). A second problem is that there is
no sure-shot way of figuring out how many global minima our system of
polynomials has. These facts make the behavior of the algorithm strongly dependent on the starting points of the parameters. An estimate closer to the true value
helps the algorithm behave nicely and converge accurately to the true solution,
but with a starting point quite far off, the solution achieved upon convergence is
not at all close to the true value. To get around this problem, we design
a geometric construction such that one can estimate all possible pose solutions to
a given problem. For this, we transform the problem of estimating pose solutions
through optimization of the cost function of equation (1.8) into a problem that involves
finding solutions to two pairs of polynomials, with each pair depending only on
two variables. The first pair consists of polynomials of degree three and four, whereas the second pair consists of quadratic polynomials. These polynomials
can be accurately solved using the symbolic computation toolbox available with
MATLAB. The advantage here is that at a time we have only two polynomials in
two variables to solve, which is a considerable improvement over the conventional
optimization task of solving seven polynomials in seven variables at
the same time. This is the reason for the high accuracy our approach achieves.
Further, by solving these polynomials we get the pose as a finite set of all possible
solutions in the form of R and t. The process follows a geometric construction and
does not need optimization, which in turn helps improve the accuracy of the results. The construction further improves our understanding of the above equation: equation (1.8) relates the image and camera coordinate systems through
a conic correspondence. As a set of observations, we propose some points on how
to pick one solution out of the finite set of all possible solutions obtained from
this approach. We perform experiments on both real and synthetic data for this
geometric approach to pose estimation. For synthetic datasets, we find that the
pose solutions thus estimated are accurate to an error of the order of 10⁻⁴. Especially for datasets with a rotation matrix close to the identity matrix, the observations
help us select a solution which is closest to the true values. But the observations
don't hold true for datasets with rotation matrices considerably far from the identity
matrix. For such cases, we propose using one additional point correspondence,
which is beyond the scope of this thesis. For real datasets the estimated pose solution is not accurate enough, but through a related experiment we demonstrate
that the error in the pose solution is primarily due to the error in the camera calibration
process.

7. The vector u defines the plane through the plane equation x^T u + 1 = 0, x ∈ R³.
8. By its requirement to be a circle we mean a circle in the global coordinate system in R³.
1.5 Layout of thesis
In chapter (2) we introduce the basics of epipolar geometry. It deals with the setup
of a two camera system, but from a projective geometry point of view. The prerequisites of epipolar geometry are projective, affine and euclidean spaces, whose
properties and definitions are covered in appendix (A). The camera models
are covered in appendix (B) and camera calibration in appendix (B.3). The discussion in these two appendices follows the textbooks by Hartley and Zisserman,
[1], and by Trucco and Verri, [14]. In chapter (3) we introduce and
describe in detail the geometric approach to pose estimation from one conic correspondence with the two assumptions. Along with the discussion of the algebra
and geometry behind the approach, we list the experiments performed on synthetic and real data, and infer certain points of merit and demerit for the proposed
approach. In chapter (4) we take up two alternate methods of pose estimation,
which are solved through optimization algorithms. Their shortcomings and sample results are provided for one method, along with an interpretation for the other
method. In chapter (5) we conclude the thesis, discussing the practical and
theoretical difficulties encountered and a possible future line of work.
Chapter 2
Epipolar geometry
2.1 Point correspondences

A pair of image points x_1 ∈ Π1 and x_2 ∈ Π2 forms a point correspondence,
x_1 ↔ x_2, if and only if the back-projected rays through the camera centers and the
image points meet in a common scene point p:

\overline{O_1 x_1} \cap \overline{O_2 x_2} = \{p\}.
(2.1)
This is the geometric way of defining a point correspondence. One point worth
noting is that the camera setup of figure (2.1) is in R³. If the lines \overline{O_1 x_1} and \overline{O_2 x_2} are
parallel, they don't intersect in a point in E³, but in a point x_∞, well defined in the
projective space P(E⁴), which by equation (A.13) is decomposed as

P(E⁴) = E³ ⊔ P(E³),

where ⊔ denotes the union of two disjoint sets. Thus the point x_∞ lies in P(E³).
With this decomposition in mind, we can ensure that the point correspondence
between two images is well defined. This way of defining a point correspondence
motivates a special homography between two images. We call it special because
such a homography is constructed through the scene plane. As shown
later, this mapping is a part of a more general mapping between these two images
through the scene. In the next section we intuitively describe this homography mapping through a scene plane, and after that we algebraically define the more general
mapping through scene points.
2.1.1 Homography through a scene plane
Based on the way a point correspondence between two images through a scene
plane Π is described, one can infer that such a mapping is bijective. Distinct positions of Π give different mappings, unless the planes are parallel
to each other. One point to note is that, given a pair of images and a scene, not
every point in the first image forms a correspondence pair with a point in the second
image through a homography realized through a scene plane. Only those points
which are projections, in both image planes, of points on the scene plane form
correspondence pairs through the homography generated by Π. This is termed
point transfer through a scene plane by Hartley and Zisserman in chapter (9), [1].
But scene points in general (irrespective of whether they lie on the scene plane or not)
also set up point correspondences between the two images. We look at this mapping
in an algebraic formulation next.
2.1.2 Algebraic formulation

x^T F x'' = 0.
(2.2)

Hartley and Zisserman in [1] term this representation the algebraic expression of the epipolar geometry. Given a pair of cameras, their image planes have
point correspondences related through this algebraic equation. But the point mapping is not unique, which is evident from the two figures (2.1) and (2.2):
all points that lie on the line through c'' and e_1 are mapped to the same
point c in plane Π2. Thus we say that the line \overline{c'' e_1} corresponds to the point c. For
geometric intuition one has the following definitions from [1]:
(2.3)
2.1.3 The fundamental matrix

The fundamental matrix is of rank 2 and unique up to a non-zero real scalar. Certain
decompositions and properties of this fundamental matrix are listed below for
quick reference. Detailed discussions on properties and different interpretations
can be obtained from [1, 8, 7]:

1. If P_1 and P_2 are the projection matrices of the two cameras, then F = [e_2]_\times P_2 P_1^+, where P_1^+ is the pseudo-inverse of P_1.

2. If the relative orientation and position between the two cameras are defined
by R and t, then

F = K^{-T} [t]_\times R K^{-1}.
(2.4)
3. If the scene contains a plane and the point mapping through the plane is
defined by the homography H, then

F = [e]_\times H,

where e is the epipole of the image plane Π2 of the second camera and H is
defined such that

x = Hx'', \quad x ∈ Π2, \; x'' ∈ Π1.
(2.5)

The second property is helpful for an intuitive grasp of the setup. The fundamental matrix maps points from one image to the other, albeit up to a certain
ambiguity. The points are specified in local coordinate systems². The decomposition, though, is specified in terms of R and t, which can be seen as being external,
or specified in the absolute coordinate system, as compared to the image and scene
planes involved. This enables us to infer from an algebraic point of view how
a change in R and/or t affects the change in the point mapping. For more clarity, we can put equations (2.2) and (2.4) together:

x^T K^{-T} [t]_\times R K^{-1} x'' = 0.
(2.6)
2.1.4 Homography induced by a plane

Before taking up the problem with two cameras, we consider a situation with just
one camera and the scene plane Π1. For a given relative orientation of the camera

2. To every plane (image or object) we fix an internal cartesian coordinate system. When we talk
of the calibration matrix being fixed, we mean the coordinate system as well.
and the planes, we can construct the mapping between Π1 and Π. Thus, given a relative orientation of the camera and the
planes, we can construct a unique homography. This statement is well proved and
discussed in depth in the textbook [1] by Hartley and Zisserman, and we
accept it here without proof. The actual question is the inverse of the above statement:
"For a given homography, can we orient the camera and the scene plane in order to induce the
given matrix?" If we have fixed coordinate systems in both planes, the given
homography actually translates to a euclidean problem. The homography thus
gives us four point correspondences⁵ between the two planes Π1 and Π:

a_i ↔ b_i, \quad a_i ∈ Π1, \; b_i ∈ Π, \quad 1 ≤ i ≤ 4.
(2.7)

Thus the problem is about finding an orientation between the camera and the scene
plane such that the point correspondences mentioned above are obtained. One
can show that not every given homography (or set of four point correspondences) can be
represented by an arrangement of the camera and the scene plane. It amounts to finding
the right representation and at the same time reducing the number of unknowns
and the number of equations in play. Once the basic arrangement is laid out, the
two arbitrary points ought to be specified in the local coordinate system, so we can select

P_1 = [1 \; 0]^T \quad \text{and} \quad P_2 = [0 \; 1]^T.
2.1.5 Two cameras and a scene plane

Adding one more camera to the above arrangement, we have two cameras and
the scene plane Π. We assume Π1 and Π2 are the image planes of the two cameras.
(2.8)

The orientation so obtained is the pose between the two cameras, consisting of
the rotation R and translation t. There is an exact dependence of R and t on H, as well as on
the epipole e (of image plane Π2, if H is defined as in equation (2.5)).
Algebraically the relation is specified as:

R = \lambda^{-1}(K^{-1} H K + K^{-1} e v^T K),
(2.9)

t = K^{-1} e.
(2.10)
These two equations form the basis of one approach to pose estimation, taken up at the end in chapter (3); it is
purely an optimization task, though there is some possible future work on it.
This thesis focuses on a different approach that involves one defining equation
instead of the two here. We can combine these two equations by eliminating e. The
equation so formed forms the basis of our geometric approach. This equation has
been solved through an optimization tool as well, but with results not good enough,
we create a geometric design and estimate R and t from it. Discussion of this design is
given in section (3.4) of chapter (3).
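A minimal numeric consistency check of equations (2.9) and (2.10) follows (Python/NumPy; K, R, t, v and λ below are made-up values, and the construction of H from them is an assumption reconstructed from the text's convention λR = K⁻¹HK + K⁻¹evᵀK with e = Kt):

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.3, 0.05])
v = np.array([0.0, 0.0, -0.5])           # scene-plane vector (made up)
lam = 2.0                                 # scale freedom of the homography

# Build H and e so that the decomposition holds, then recover R and t back.
H = K @ (lam * R - np.outer(t, v) @ K) @ np.linalg.inv(K)
e = K @ t

R_rec = (np.linalg.inv(K) @ H @ K + np.outer(np.linalg.inv(K) @ e, v) @ K) / lam
t_rec = np.linalg.inv(K) @ e              # equation (2.10)
assert np.allclose(R_rec, R) and np.allclose(t_rec, t)
```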
2.2 Conics
The epipolar geometry laid out in the previous section defines the point correspondences between the two images. Such point correspondences lead to correspondences between more complex features of images. The main focus of this
thesis being the use of conic correspondences for pose estimation, it is worthwhile
investigating the formulation of conics, their basic properties and the mathematical definitions for a conic correspondence. A conic is a second degree curve in a plane
described by a quadratic equation as its solution set:

ax² + bxy + cy² + dx + ey + f = 0.
(2.12)
Homogenizing with z, this becomes

ax² + bxy + cy² + dxz + eyz + fz² = 0,
(2.13)

which can be written in matrix form as

\begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} a & b/2 & d/2 \\ b/2 & c & e/2 \\ d/2 & e/2 & f \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0.
(2.14)

Conics can be classified as degenerate or non-degenerate conics:

1. A degenerate conic is one in which rank(C) is less than three. In this case
the conic reduces to either a lone real point or a set of two lines.

A conic correspondence C1 ↔ C2 under a homography H satisfies

H^T C_2 H = C_1.
(2.15)
Figure 2.6: A cone with its apex at origin, image from [1].
2.3 Quadrics
A quadric is a surface in 3D space defined as the solution set of

\begin{bmatrix} x & y & z & w \end{bmatrix} S \begin{bmatrix} x & y & z & w \end{bmatrix}^T = 0,
(2.16)

where S is a real symmetric 4 × 4 matrix defining the solution set. The above
definition tells us that the surface is a quadratic surface. Just like conics, various
classes of quadrics can be studied through the defining matrix S. Henceforth we
shall denote a quadric, as a set of points, by its defining matrix S.
A quadric has certain fundamental properties which can be read from chapter 3
of the book [1], including several which we will need in the coming chapters.
2.4 Summary
Chapter 3
Pose estimation from one conic correspondence

3.1 Pose in terms of epipolar geometry
Let us restate the two equations that define R and t, (2.9) and (2.10), in terms of
epipolar geometry:

R = \lambda^{-1}(K^{-1} H K + K^{-1} e v^T K), \quad t = K^{-1} e,
(3.1)
where

1. R and t are the rotation matrix and the translation vector respectively, describing the orientation of the camera cam(O1, Π1, K) with respect to cam(O2, Π2, K).

2. H, e and v are the homography, the epipole and the scene-plane vector of equations (2.9) and (2.10), entering through the calibrated quantities K⁻¹HK and vᵀK.

This is one of the many places in this thesis where the fact that the cameras are calibrated
is used to simplify the situation.
The point correspondences between points in Π1 and Π2 through the points of the scene plane Π are
mapped through the homography H. The above equation is proved in a different way
by Hartley, [1]. It can be shown with some algebra that u represents the position
of the scene plane Π. If Π is represented by the solution set of the equation

Π = \{ [x \; y \; z]^T ∈ R³ \; | \; m_1 x + m_2 y + m_3 z + 1 = 0 \}, \quad m_1, m_2, m_3 ∈ R,
(3.4)

then u is the vector [m_1 \; m_2 \; m_3]^T, uniquely defining the position of Π. Henceforth we shall alternately denote the plane defined by a vector u as above by
the notation Π_u.
3.2 Conic correspondence

The scene plane Π contains the conic C, whose images are C1 and C2 in the planes Π1 and
Π2 respectively. These two conics are measured in the local coordinate systems.
Then we have the transformed conics C1′ = K^T C1 K and C2′ = K^T C2 K as representations of the two conics in a transformed local coordinate system, in which the
x_plane, y_plane axes are aligned with the x, y axes of the camera's coordinate system,
and the origin O_plane is the point of intersection of the normal vector to Π with Π. We can use the
equation of conic correspondence stated in equation (2.15),

H^T C_2 H = C_1,

to form the new constraint

(R - t u^T)^T K^T C_2 K (R - t u^T) - \lambda K^T C_1 K = 0_{3 \times 3}.
(3.5)
This equation transforms the problem of pose estimation from a conic correspondence into a problem of estimating R, t and u from a set of five equations.
Though the matrix equation has six polynomial equations in all, its elements being unique up to a non-zero scalar multiple, we have five equations; or, by introducing one more variable λ, we have six equations but the additional variable λ. As
evident from equation (3.5), t and u appear in a scalar product form, so we need
to estimate u only up to a scalar multiple. This argument reduces the variable set to R,
t, u = [1 \; n_2 \; n_3]^T and λ: nine parameters in all from six equations. In order to
reduce the number of unknowns further, we introduce two assumptions:

1. The scene conic C is a circle in the global coordinate system.
2. The translation vector t lies in a known plane, defined by a normal vector w.

These give

(R - t u^T)^T K^T C_2 K (R - t u^T) - \lambda K^T C_1 K = 0_{3 \times 3},
w^T t = 0,
(3.6)

where u, C1, C2 and w are known and R, t and λ are to be estimated. If we consider the geometry described by the above equations, we can intuitively note that
all of the seven polynomials are algebraically independent. This means that for
non-trivial cases, unique solution(s) exist. These equations form the backbone of
the approach for pose estimation we propose next.
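Stated as code, the system (3.6) amounts to the 3×3 matrix constraint (symmetric, so six independent entries) stacked with the plane constraint on t. A minimal sketch in Python/NumPy, with names following the text's notation:

```python
import numpy as np

def residuals(R, t, lam, C1, C2, K, u, w):
    """Residuals of equation (3.6): the matrix constraint plus w^T t = 0.
    All inputs follow the text's notation; a zero return value means (3.6) holds."""
    Rt = R - np.outer(t, u)                               # R - t u^T
    res = Rt.T @ (K.T @ C2 @ K) @ Rt - lam * (K.T @ C1 @ K)
    return np.concatenate([res.ravel(), [w @ t]])
```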
3.3 Planes intersecting the cone in a circle

Let us consider the arrangement given in section (1.1), where only one camera
cam(O1, Π1, K) and the scene plane Π are considered. Let the conic C1 ⊂ Π1 be
known. Given this setup, we claim that there are finitely many positions of the
plane Π, unique up to a non-zero scalar multiple, such that the scene conic C
is a circle in the global coordinate system. This coordinate system is assumed
to have its origin coincide with O1 and its x-y axes parallel to the x-y axes
of the local coordinate system in the plane Π1. In other words, the orientation of the
camera cam(O1, Π1, K) with respect to the global coordinate system is represented
by R = I3 and t = [0 \; 0 \; 0]^T.

2. Traditionally, features have included points, but lines, conics and curves have been subsequently used for estimating the fundamental matrix.
The cone Q through O1 and the conic C1 is given by back-projection as

Q = P_1^T C_1 P_1 = \begin{bmatrix} I_3 \\ 0_3^T \end{bmatrix} K^T C_1 K \begin{bmatrix} I_3 & 0_3 \end{bmatrix} = \begin{bmatrix} C_{cal} & 0_3 \\ 0_3^T & 0 \end{bmatrix},

where C_{cal} = K^T C_1 K. As a point set,

Q = \left\{ [x \; y \; z]^T ∈ R³ \;\middle|\; \begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} C_{cal} & 0_3 \\ 0_3^T & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = 0 \right\}.
Then the conic C in Π (Π being the scene plane) is the intersection of Π with Q,
given as the solution set of the equation

\begin{bmatrix} \frac{-1 - m_2 y - m_3 z}{m_1} & y & z & 1 \end{bmatrix} \begin{bmatrix} C_{cal} & 0_3 \\ 0_3^T & 0 \end{bmatrix} \begin{bmatrix} \frac{-1 - m_2 y - m_3 z}{m_1} \\ y \\ z \\ 1 \end{bmatrix} = 0
(3.7)
which reduces to

\begin{bmatrix} \frac{-1 - m_2 y - m_3 z}{m_1} & y & z \end{bmatrix} C_{cal} \begin{bmatrix} \frac{-1 - m_2 y - m_3 z}{m_1} \\ y \\ z \end{bmatrix} = 0.
(3.8)
Thus the conic C in the global coordinate system is obtained as the set of points

C = \left\{ [x \; y \; z]^T ∈ R³ \;\middle|\; x = \frac{-1 - m_2 y - m_3 z}{m_1} \;\&\; \begin{bmatrix} x & y & z \end{bmatrix} C_{cal} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0 \right\}.

Define

n_1 = \frac{1}{m_1}, \quad n_2 = \frac{m_2}{m_1}, \quad n_3 = \frac{m_3}{m_1},
(3.9)
o_3 = [-n_1 \; 0 \; 0]^T, \quad o_1 = [-n_2 \; 1 \; 0]^T, \quad o_2 = [-n_3 \; 0 \; 1]^T.
(3.10)
Then we have

\begin{bmatrix} \frac{-1 - m_2 y - m_3 z}{m_1} & y & z \end{bmatrix} C_{cal} \begin{bmatrix} \frac{-1 - m_2 y - m_3 z}{m_1} \\ y \\ z \end{bmatrix} = 0.
(3.11)
Expanding equation (3.11) in the coordinates (y, z) — a point of Π being y o_1 + z o_2 + o_3 — gives the quadratic form

\begin{bmatrix} y & z & 1 \end{bmatrix} \begin{bmatrix} o_1^T C_{cal} o_1 & o_1^T C_{cal} o_2 & o_1^T C_{cal} o_3 \\ o_2^T C_{cal} o_1 & o_2^T C_{cal} o_2 & o_2^T C_{cal} o_3 \\ o_3^T C_{cal} o_1 & o_3^T C_{cal} o_2 & o_3^T C_{cal} o_3 \end{bmatrix} \begin{bmatrix} y \\ z \\ 1 \end{bmatrix} = 0.

Define

M = -\frac{1 + n_2 n_3 + n_3^2}{1 + n_2 n_3 + n_2^2}, \quad k_1 = \sqrt{(n_2 + n_3)^2 + 2}, \quad k_2 = \sqrt{(M n_2 + n_3)^2 + M^2 + 1},

a = [-n_1 \; 0 \; 0]^T, \quad b = a + \frac{1}{k_1}[-(n_2 + n_3) \; 1 \; 1]^T, \quad c = a + \frac{1}{k_2}[-(M n_2 + n_3) \; M \; 1]^T.
(3.12)
The points a, b, c so parameterized lie on the plane Π, and the orthogonal unit axes along the plane are

\vec{ab} = \frac{1}{k_1}[-(n_2 + n_3) \; 1 \; 1]^T, \quad \vec{ac} = \frac{1}{k_2}[-(M n_2 + n_3) \; M \; 1]^T,

with \vec{ab} \cdot \vec{ac} = 0 and \|\vec{ab}\| = \|\vec{ac}\| = 1.
From this parametrization, we have the following relationship between the local coordinate vector representation [u \; v]^T of a point and its global coordinate representation:

y = \frac{u}{k_1} + \frac{M v}{k_2}, \quad z = \frac{u}{k_1} + \frac{v}{k_2}, \quad u = k_1 \frac{M z - y}{M - 1}, \quad v = k_2 \frac{y - z}{M - 1}.
(3.13)
In this local coordinate system, a point of C with coordinates [u \; v]^T is the global point

o_3 + \left(\frac{u}{k_1} + \frac{M v}{k_2}\right) o_1 + \left(\frac{u}{k_1} + \frac{v}{k_2}\right) o_2 = o_3 + \frac{u (o_1 + o_2)}{k_1} + \frac{v (M o_1 + o_2)}{k_2},

so that equation (3.11) becomes

\left[ o_3 + \frac{u (o_1 + o_2)}{k_1} + \frac{v (M o_1 + o_2)}{k_2} \right]^T C_{cal} \left[ o_3 + \frac{u (o_1 + o_2)}{k_1} + \frac{v (M o_1 + o_2)}{k_2} \right] = 0.

Rewriting, we get

\begin{bmatrix} u & v & 1 \end{bmatrix} \begin{bmatrix} l_1^T C_{cal} l_1 & l_1^T C_{cal} l_2 & l_1^T C_{cal} l_3 \\ l_2^T C_{cal} l_1 & l_2^T C_{cal} l_2 & l_2^T C_{cal} l_3 \\ l_3^T C_{cal} l_1 & l_3^T C_{cal} l_2 & l_3^T C_{cal} l_3 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0,

where l_1 = \frac{o_1 + o_2}{k_1}, l_2 = \frac{M o_1 + o_2}{k_2} and l_3 = o_3. Thus the conic C in parametric form in the local coordinate system is defined as

C = \left\{ [u \; v]^T \;\middle|\; \begin{bmatrix} u & v & 1 \end{bmatrix} \begin{bmatrix} l_1^T C_{cal} l_1 & l_1^T C_{cal} l_2 & l_1^T C_{cal} l_3 \\ l_2^T C_{cal} l_1 & l_2^T C_{cal} l_2 & l_2^T C_{cal} l_3 \\ l_3^T C_{cal} l_1 & l_3^T C_{cal} l_2 & l_3^T C_{cal} l_3 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0 \right\}, \quad (3.14)

with the defining matrix of C being

\begin{bmatrix} l_1^T C_{cal} l_1 & l_1^T C_{cal} l_2 & l_1^T C_{cal} l_3 \\ l_2^T C_{cal} l_1 & l_2^T C_{cal} l_2 & l_2^T C_{cal} l_3 \\ l_3^T C_{cal} l_1 & l_3^T C_{cal} l_2 & l_3^T C_{cal} l_3 \end{bmatrix}.
(3.15)
If a circle is given by

(u - a)^2 + (v - b)^2 = r^2,

its matrix representation is obtained by rewriting the equation as

\begin{bmatrix} u & v & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -a \\ 0 & 1 & -b \\ -a & -b & a^2 + b^2 - r^2 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0.
For the conic C to be a circle, its matrix, as defined in equation (3.15), has to be
of the same form as given above. Hence we have the two conditions

l_1^T C_{cal} l_1 - l_2^T C_{cal} l_2 = 0
(3.16)

and

l_1^T C_{cal} l_2 = 0,
(3.17)

where

l_1 = \frac{[-(n_2 + n_3) \; 1 \; 1]^T}{\sqrt{(n_2 + n_3)^2 + 2}}, \quad l_2 = \frac{[-(M n_2 + n_3) \; M \; 1]^T}{\sqrt{(M n_2 + n_3)^2 + M^2 + 1}}, \quad C_{cal} = K^T C_1 K,

and M = -\frac{1 + n_2 n_3 + n_3^2}{1 + n_2 n_3 + n_2^2}, with n_1, n_2 and n_3 as defined in equation (3.9).
We can solve the two equations, (3.16) and (3.17), for the two variables n_2 and n_3. The plane Π is then
defined by the vector

m_1 [1 \; n_2 \; n_3]^T,
where m_1 can take any real value. Thus this solution gives us a series of parallel
planes, each of which, upon intersection with the cone Q1, gives us a circle.
In summary, this lemma defines the relationship between the normal vector u
that defines the scene plane and the conic C1 in the image plane, under the
assumption that the conic in the scene plane which projects onto C1 in Π1 is a
circle.
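The two conditions (3.16) and (3.17) form a small polynomial system in n_2 and n_3. The thesis solves such systems with MATLAB's symbolic toolbox; the following SymPy sketch (Python; the matrix standing in for C_cal is made up, the sign conventions follow the reconstruction above, and solving may be slow for general inputs) does the same after clearing the positive normalizers:

```python
import sympy as sp

n2, n3 = sp.symbols('n2 n3', real=True)
# Stand-in for C_cal = K^T C1 K: any symmetric 3x3 matrix (values made up).
C = sp.Matrix([[2, 0, -1], [0, 3, 0], [-1, 0, -1]])

M = -(1 + n2*n3 + n3**2) / (1 + n2*n3 + n2**2)
a = sp.Matrix([-(n2 + n3), 1, 1])        # unnormalized l1;  k1^2 = (n2+n3)^2 + 2
b = sp.Matrix([-(M*n2 + n3), M, 1])      # unnormalized l2;  k2^2 = (M*n2+n3)^2 + M^2 + 1
k1sq = (n2 + n3)**2 + 2
k2sq = (M*n2 + n3)**2 + M**2 + 1

# Clear the positive normalizers, then keep only the numerators (polynomials).
f1 = sp.together(k2sq*(a.T*C*a)[0] - k1sq*(b.T*C*b)[0])   # from (3.16)
f2 = sp.together((a.T*C*b)[0])                            # from (3.17)
sols = sp.solve([sp.fraction(f1)[0], sp.fraction(f2)[0]], [n2, n3], dict=True)
print(sols)   # finitely many (n2, n3) candidates defining the plane directions
```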
3.4 Geometric approach to pose estimation

One straightforward way to estimate R and t from equation (3.6) is through optimization of a certain cost function. Such a method is described in section (4.1)
of chapter (4), along with a discussion of results for sample experiments we
have performed. But we consider here its shortcomings, which can be observed
from the results of these experiments in section (4.1.1). Optimization approaches in
general have a tendency to get stuck in local minima. Additionally, it is not feasible
to estimate all possible global minima for such a system of multiple polynomials.
Hence we look for a solution which involves a smaller number of variables and equations, and takes the problem of optimization almost out of the question. By almost we
mean that the search space should contain fewer global minima and we should
be able to analytically estimate all possible pose solutions. The result is a geometric approach where we can estimate R and t with more accuracy and reliability.
For this approach, we state the assumptions used, and then step-by-step construct
the setup in such a way that the problem of estimating R and t is transformed
into the problem of estimating the relative orientation between two circles of the same
radius. We get multiple solutions, but finite in number, of which all but one can be
eliminated by certain observations laid out later in this section.
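For comparison, here is a minimal sketch of the optimization baseline discussed above (Python with SciPy, an assumed environment; the rotation-vector parameterization is one common choice, not necessarily the one used in chapter (4)):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def solve_by_optimization(C1, C2, K, u, w, p0=None):
    """Least-squares on the residuals of equation (3.6) over (rotvec, t, lam).
    As the text warns, the result depends heavily on the starting point p0."""
    def residuals(p):
        R = Rotation.from_rotvec(p[:3]).as_matrix()
        t, lam = p[3:6], p[6]
        Rt = R - np.outer(t, u)
        res = Rt.T @ (K.T @ C2 @ K) @ Rt - lam * (K.T @ C1 @ K)
        return np.concatenate([res.ravel(), [w @ t]])
    if p0 is None:
        p0 = np.array([0, 0, 0, 0, 0, 0, 1.0])   # lam = 1 avoids the trivial zero
    sol = least_squares(residuals, p0)
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:6], sol.x[6]
```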
Clarification on notations:
We shall henceforth use the term "camera-1" to mean the camera cam(O1, Π1, K)
and "camera-2" to mean cam(O2, Π2, K). If a vector u determines the linear equation defining the set of points lying on a plane, as in equation (3.4),
then we shall alternately denote the plane, and hence the set of points, as Π_u. Transformed versions of the same geometric quantity, or quantities of the same type,
shall be suffixed by numerals or special characters which are defined as and when
needed; e.g. two planes defined by distinct vectors u_1 and u_2 shall be denoted by
Π_{u1} and Π_{u2} respectively. Additionally, if we have a scaled vector βu_i defining
a plane (β being a real scalar), we denote the same plane alternately as Π_{βu_i},
where i can be any numeral or a sub-scripted string; e.g. Π_{βu_{a23}} denotes the plane
defined by the vector βu_{a23}, β being a real scalar.
defined the vector u a23 , being a real scalar.
Geometric construction
The two assumptions which were stated at the start of this chapter help in geometric construction forming the bulk of this approach:
1. The scene conic C is a circle in the global coordinate system.
2. The translation vector t lies in a given plane(lets denote the plane as w )
specified in the global coordinate system.
The geometry is augmented by the availability of conic correspondences C1 C2 .
The calibrated counterparts of C1 and C2 are C10 and C20 respectively. Henceforth
in this section for geometric construction we shall use C10 to mean the calibrated
counterpart of the conic C1 defined as C10 = K T C1 K. Revisiting the arrangement of
section (1.1), let us form cones Q1 and Q20 through conics C10 and C20 respectively as
shown in the figure (3.1). This diagram is basically an extension of the two camera
setup of section (1.1). The rigid body motion that defines the relative pose of cone
Q1 with respect to cone Q2 is considered to be R and t. C10 is the intersection of 1
with Q1 and C20 is the intersection of 2 with Q20 . C is the intersection of with
Q1 or with Q20 . In fact the cones Q1 and Q20 have to intersect in a planar conic. A
result is stated by Quan in [16] that the two cones must intersect in a quartic curve
which disintegrates into two second order planar curves of which one is the scene
conic C.
Step-1
Let us apply a rigid body motion to the structure formed from camera-2 (shown
in yellow in the above figure) and the cone Q2′. Camera-2 has its coordinate system
as origin O2 and the triplet of axes {Xc, Yc, Zc}. In other words, we first rotate Q2′
through the rotation matrix R and then translate it through the translation vector
t. This motion results in the cone Q2, and the circle C is transformed into the circle C′. As
a result, the two cameras in this case coincide, and so do the image
planes Π2 and Π1. We see further that the circles C and C′ have the same
radii. This situation is shown in the next figure (3.2). This rigid body motion is
precisely the relative pose we have to estimate.
(3.18)
This can be seen as follows: every cone extends to infinity, and the radius can take any positive real
value by appropriately positioning the plane.
C = \begin{bmatrix} 1 & 0 & -a_1 \\ 0 & 1 & -b_1 \\ -a_1 & -b_1 & a_1^2 + b_1^2 - r_1^2 \end{bmatrix}
(3.19)

and

C' = \begin{bmatrix} 1 & 0 & -a_2 \\ 0 & 1 & -b_2 \\ -a_2 & -b_2 & a_2^2 + b_2^2 - r_1^2 \end{bmatrix}.
(3.20)

The circles are in a specific local orthonormal coordinate system which depends solely on the plane position in R³. The matrix C represents the circle of intersection
of cone Q1 with Π_{u11}, and C′ represents the circle of intersection of cone Q2 with
Π_{u21}. Their radii being the same, we denote them by r_1. The two circles can be seen
as one being a rigid body motion of the other. Let us have the relative orientation
defined as

Rx + t = y, \quad x ∈ C, \; y ∈ C'.

Further, we know that the cones Q1 and Q2′ intersect in C. Hence, applying the same
rigid body motion R and t to the cone Q2′, we should get the cone Q2, as shown in
figure (3.2).
Step-3
The next step is to map the circle C to C′ through a rigid body motion, comprising
the rotation R and translation t, applied to C. From the representations (3.19)
and (3.20) of the two circles, the center of C is [a_1 \; b_1]^T and the center of C′ is
[a_2 \; b_2]^T. But these center representations are in local coordinate
systems, unique to each plane. Their representations in the global coordinate system
are obtained through equation (3.13), as shown in the next equation. We shall denote
the global coordinate representations of the two center points as x_{c1} and x_{c2}. Then
equations (3.12), (3.9) and (3.13) give us, for the plane Π_{u11},

y_{c1} = \frac{a_1}{k_{11}} + \frac{M_1 b_1}{k_{12}}, \quad z_{c1} = \frac{a_1}{k_{11}} + \frac{b_1}{k_{12}}, \quad x_{c1} = \frac{-1 - m_{12} y_{c1} - m_{13} z_{c1}}{m_{11}},
(3.21)

and the analogous equations for the plane Π_{u21}.
(3.22)

The plane Π_{u11} is assumed to be defined by the vector [m_{11} \; m_{12} \; m_{13}]^T
and Π_{u21} by the vector [m_{21} \; m_{22} \; m_{23}]^T. From equations (3.21) and (3.22), the centers of the two circles are represented in the global coordinate system as

x_{c1} = [x_{c1} \; y_{c1} \; z_{c1}]^T, \quad x_{c2} = [x_{c2} \; y_{c2} \; z_{c2}]^T.

The primary condition in mapping circle C to C′ is that the center x_{c1} should be mapped
to x_{c2}. The second condition is that the translation vector t should satisfy the assumption w^T t = 0, where w is pre-specified. These two conditions lead to the next step
of the geometric construction. Figure (3.3) depicts the geometric construction for estimating R and t by mapping one circle C to C′. Steps four and five next describe
and solve this construction.
Step-4
We have a plane Π_{w1} through the point x_{c2} such that Π_{w1} ∥ Π_w. Then the point

x_{c1rot} = R x_{c1} \text{ (say)}
(3.23)

should lie on Π_{w1}, and

t = x_{c2} - x_{c1rot}.
(3.24)

Let us denote the point on the perpendicular line from the origin to the plane Π_{u11}, and also
lying on Π_{u11}, as p_1, whose coordinates depend on u_{11} as

p_1 = -\frac{u_{11}}{\| u_{11} \|^2}.
(3.25)

The plane through x_{c1rot} and parallel to Π_{u21} is denoted as Π_{u_{c1rot}} and is defined by
the vector u_{c1rot} as

u_{c1rot} = -\frac{u_{21}}{x_{c1rot}^T u_{21}}.

The point on the perpendicular line from the origin to the plane Π_{u_{c1rot}}, and also lying on
Π_{u_{c1rot}}, is denoted by p_2. Its coordinates depend on u_{c1rot} as

p_2 = -\frac{u_{c1rot}}{\| u_{c1rot} \|^2}.
(3.26)
Then the distance of p_2 from x_{c1rot} should be the same as that of p_1 from x_{c1}, giving
us the following polynomial equation:
(3.27)

These equations, (3.23), (3.24) and (3.27), encode the solution for the parameters R
and t. The point x_{c1rot} obtained as a solution to the above three equations helps us in
determining R with the following constraints:

R x_{c1} = x_{c1rot}, \quad R p_1 = p_2.
(3.28)

R = B A^{-1},
(3.30)

with A and B both being invertible matrices, justifying the existence of R as obtained above. Now, from the way the solution for R is designed, we can ascertain
the following from equations (3.24), (3.27) and (3.28):
(3.31)

and that the angle between the vectors x_{c1} and p_1 is the same as the angle between
the vectors x_{c1rot} and p_2. With these facts in mind, one can easily prove, with
the matrices A and B as defined in equation (3.29), that

A^T A = B^T B.

From this it is straightforward to note that the matrix R = B A^{-1} obtained as in
equation (3.30) is a rotation matrix. Once R is known, t is estimated as

t = x_{c2} - R x_{c1},
(3.32)

with x_{c1rot} estimated from equation (3.28). Thus we have estimated R and t for
one solution point x_{c1rot} of the three equations (3.23), (3.24) and (3.27), designed
for one pair of planes (Π_{u11}, Π_{u21}).
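Since equation (3.29) did not survive in this copy, the following sketch (Python/NumPy) uses one standard choice of A and B that is consistent with the constraints (3.28) and with AᵀA = BᵀB, namely columns x_{c1}, p_1, x_{c1} × p_1 and x_{c1rot}, p_2, x_{c1rot} × p_2; the exact columns of the thesis's A and B are an assumption here.

```python
import numpy as np

def rotation_from_two_pairs(x1, p1, x2, p2):
    """Build R with R x1 = x2 and R p1 = p2 via R = B A^{-1}, as in equation (3.30).
    Assumes |x1| = |x2|, |p1| = |p2| and angle(x1, p1) = angle(x2, p2),
    i.e. A^T A = B^T B, so that the result is a rotation."""
    A = np.column_stack([x1, p1, np.cross(x1, p1)])
    B = np.column_stack([x2, p2, np.cross(x2, p2)])
    return B @ np.linalg.inv(A)

# Quick check with a known rotation (90 degrees about z):
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
x1, p1 = np.array([1.0, 0.0, 0.5]), np.array([0.0, 1.0, 0.2])
R = rotation_from_two_pairs(x1, p1, Rz @ x1, Rz @ p1)
assert np.allclose(R, Rz)
```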
Step-5
The three polynomial equations have at most four real solutions, and each solution point
gives one pose solution R and t, leading to a maximum of four pose solutions
for each plane pair. The above steps are then repeated for all possible plane pairs
(Π_{u11}, Π_{u21}) ∈ U_sol. Thus we get a set of solutions R and t obtained over all such
plane pairs. For a general case there is more than one solution in this set, of
which one is the true solution we desire and the rest are to be eliminated. The next section describes the non-uniqueness of the solutions in this set and some
thoughts on how to pick the particular solution which actually realized the camera
setup.
Non-uniqueness of solution
Case-1: The three equations (3.23), (3.24) and (3.27) are polynomials of degree one,
two and two respectively. One can eliminate a variable and reduce the three equations to two. Hence the total number of possible solutions is four for each pair
of planes, as an application of Bezout's theorem on counting the intersection points of
two curves. Hence for every plane pair (Π_{u11}, Π_{u21}) we have at most four pose
solutions possible.
Case-2: The second case arises due to the fact that for every plane Π_{u1} in U_1 we
have two planes possible in U_2, Π_{u2} and Π_{-u2}, such that r_{u1} = r_{u2} = r_{-u2}. These
planes have their normal vectors in opposite directions. The discussion following
lemma (3) states the same fact, which in precise terms can be rewritten as:
if (Π_{u11}, Π_{u21}) ∈ U_sol has translation t as part of its solution, then (Π_{u11}, Π_{-u21})
has translation -t as part of one of its solutions. So the translation vectors for
all solutions to (Π_{u11}, Π_{-u21}) are the negative counterparts of the translations obtained as
solutions to (Π_{u11}, Π_{u21}). The complete relationship between the pose solutions for
the plane pairs (Π_{u11}, Π_{u21}) and (Π_{u11}, Π_{-u21}) can be derived based on equations (3.29)
and (3.30) as:

t_1 = -t, \quad R_1 = B_1 A^{-1},
(3.33)
where B_1 = B + 2\,[\,0 \;\; 0 \;\; x_{c1rot} - p_2\,], so that

R_1 = R + 2\,[\,0 \;\; 0 \;\; x_{c1rot} - p_2\,] A^{-1}.
(3.34)

This relationship gives us a way to estimate R and t for the pair of planes (Π_{u1}, Π_{-u2})
if R and t for the pair (Π_{u1}, Π_{u2}) are known.
Case-3: The third case arises due to the fact that if (Π_{u11}, Π_{u21}) ∈ U_sol gives us a
solution R and t, then (Π_{βu11}, Π_{βu21}) gives us a solution R and t/β. We prove this
fact in lemma (19) of appendix (C).
Because of the first two cases of non-uniqueness of a pose solution, we have thirty
two pose solutions in all. Accounting for the third case as well, we can't estimate
the translation vector beyond a non-zero scaling. Cases 1 and 2 can be resolved through
some point correspondences or, as we show next, through some observations. We
show next a breakup of how one can eliminate all but one solution.
Solution to case-1 and case-2: These two problems can be worked out by using some point correspondences. Ideally a single point correspondence should
be enough to select the true solution. But the one point correspondence we have
might be realized by more than one solution for R and t, and unfortunately, to the best
of our knowledge, there is no one-shot way of selecting the right discriminating
point correspondence. Additionally, the main focus of this thesis being minimal correspondences, we look for other ways of fixing one solution from the set
of solutions. We have tested our approach on synthetic data. The data has been
designed to model a real world scenario as closely as possible; for this we have
used the epipolar geometry toolbox, [17], in MATLAB. The point that is taken
care of is that the circle which is imaged by both of the cameras is wholly in
front of the cameras. In other words, if c is a camera center, Π is the image plane
and x is a point on the circle, then c and x are points on different sides of the plane
Π. This eliminates sixteen of the thirty two solutions. The procedure is outlined next.
Condition for the scene conic to lie in front of the camera:
Consider a plane pair (Π_{u11}, Π_{u21}) and the circles C and C′ in these two planes. Let
the centers of the two circles be x_{c1} and x_{c2}, as mentioned in step-3 above. Writing the defining vectors of the two planes as u_{11} = m_1 [1 \; n_2 \; n_3]^T and
u_{21} = m_1' [1 \; n_2' \; n_3']^T, lemma (2) fixes n_2, n_3 for Π_{u11} and n_2', n_3' for Π_{u21}, which
means that the factors m_1 and m_1' scale the centers x_{c1} and x_{c2} respectively. Hence we
need the scales to be such that m_1 x_{c1} and m_1' x_{c2} lie in front of the first camera.
This condition gives a possible range for the scaling factors. Either the range consists of positive real values or negative values, and based on this we eliminate one
half of the pose solutions in U_sol.
The second observation is that for scenarios with small rotation angles, the geodesic
distance of the rotation matrix from the identity matrix is the least for the specific pose
solution which is the best approximation to the true solution. This hypothesis
has been tested extensively on synthetic datasets. The distance metric used is the
geodesic metric on the unit sphere, [18]:

d(R, I_3) = \sqrt{\mathrm{trace}(L^T L)}, \quad \text{where } L = \log(R).
3.5
Experiments
3.5.1
The synthetic dataset has been designed using the epipolar geometry toolbox [17]. Without going into the details of the process, it should be noted that a scene circle is first chosen in R3. The calibration matrix K is assumed to be the identity matrix. The projection matrix P1 is the same for all examples, with the first camera assumed to be at the origin of the world coordinate system. The projection matrix of the second camera, P2, is chosen randomly, starting from ones with smaller rotation angles and progressing to larger angles. One such dataset and the solution obtained through our algorithm are described next.
Let R1, t1 and R2, t2 be the two best pose solutions selected through our algorithm. The camera for pose solution R1, t1 is shown in blue, which almost coincides with the true pose of the second camera, and the camera for pose solution R2, t2 is shown in black. The departure of the rotation matrices of these two solutions and of the true solution from the identity matrix is

d(Rtrue, I3) = 0.3515,  d(R1, I3) = 0.3516  and  d(R2, I3) = 1.8472.
The distances are based on the geodesic distance on the unit sphere between two points R1 and R2 in the SO(3) group [18]:

d(R1, R2) = ‖log(R1ᵀ R2)‖F,

where R1 and R2 are two rotation matrices.
Table 3.1: Results of single stage geometric approach on synthetic dataset. Here Rtrue and ttrue denote the true values and R and t denote the selected pose solution.

Angles (x, y, z)   ttrue                       Recovered t                   Angle between   d(R, I3)   ‖R − Rtrue‖F    Smallest
                                                                             t and ttrue                               d(R, I3)?
10°, 0°, 0°        [0.5, 0.1, 0.1]             [0.50097, 0.1020, 0.0989]     0.2428          0.2469     2.1 × 10⁻⁴      yes
10°, 20°, 0°       [0.7, 0.1, 1]               [0.6993, 0.0999, 1.0000]      0.0028          0.5513     7.9 × 10⁻⁶      yes
0°, 10°, 5°        [1, 3, 0.1]                 [0.9993, 2.9993, 0.1000]      0.0080          0.2760     2.3 × 10⁻⁴      yes
1°, 10°, 8°        [1, 11, 1]                  [0.9960, 11.0096, 1.0049]     0.0321          0.3154     1.3 × 10⁻⁴      yes
30°, 0°, 0°        [0.1, 1, 3]                 [0.0914, 1.0642, 3.0429]      0.8615          0.7239     0.1666 × 10⁻⁴   yes
1°, 30°, 80°       [0.0891, 0.0980, 0.0178]    [0.0900, 0.9853, 0.1790]      0.0264          2.093      3.9 × 10⁻⁴      no
One more point to note is that we estimate the translation vector only up to a non-zero scalar multiple. Hence, for visualization purposes in figure (3.4), we scale it by the same scalar that scales the true translation of the second camera. For this case we select R1, t1 as the best possible pose solution, taking into consideration the observation that this solution has its rotation matrix closest to the identity matrix in the geodesic sense.
This was one experiment on a synthetic dataset for our proposed approach. We
perform similar experiments on more synthetic datasets and tabulate the results in table (3.1). The results justify the observations stated in the solutions to cases two and three. For rotation angles which are small enough, the pose solution chosen with the smallest geodesic distance from the identity matrix is the best approximation to the true value. But for substantially large angles, as seen in the last row of table (3.1), the geodesic distance is quite large and definitely not the smallest among all solutions in Usol. This poses the challenge of finding and fixing a threshold such that, for all datasets where the geodesic distance of the true rotation matrix from the identity matrix is within this threshold, the estimated rotation matrix with the smallest geodesic distance from the identity matrix is also the one with the smallest geodesic distance from the true value.
3.5.2
Real dataset
We have used a Canon DSLR camera for real data experimentation. We calibrate the camera using the MATLAB toolbox by Bouguet [19]. The calibration matrix retrieved is

K = [ 3565.00387       0          968.51381
           0       3559.46384     636.14655
           0            0              1    ].

The captured images have dimensions of 1920 × 1280 pixels. Skew is zero, the pixels being completely square. Errors in the focal lengths are [15.04591 14.79244]ᵀ and those in the principal point are [12.16201 15.07881]ᵀ. The estimated distortion coefficients are

kc = [0.11590  1.97105  0.00693  0.00054  0.00000]ᵀ ± [0.01742  0.25227  0.00145  0.00138  0.00000]ᵀ,

where kc(1), kc(2) and kc(5) are the radial distortion coefficients and kc(3) and kc(4) are the tangential distortion coefficients, [19] and [20]. Pixel errors in image points through reprojection are [0.52954 0.61128]ᵀ.
In figures (3.5) and (3.6) we have two images containing conics as projections of a scene circle. The two conics, C1 and C2, are detected up to a root mean square error of 0.0086. Table (3.2) lists the pose solution obtained for the images in figures (3.5) and (3.6). The ground truth values Rtrue and ttrue are estimated through the calibration tool [19] in the form of extrinsic parameters⁵. Since there are errors in the estimated calibration matrix and distortion in the camera, we expect the ground truth values thus estimated to be inaccurate. Hence the plane that contains the translation vector, assumed to be defined by the vector w = [1 5 2.74233822]ᵀ, is not very accurate, which leads to errors in the estimated pose with respect to the ground truth values.

⁵ In fact one can see that we have a calibration pattern in the two images along with the conics. The set of images used for camera calibration contains these two images as well.
Table 3.2: Results of single stage geometric approach on real dataset

Geodesic distance of Rtrue from I3:   0.4810
True translation vector, ttrue:       [743.7650, 130.3833, 508.9385]
Recovered translation vector, t:      [0.50097, 0.1020, 0.0989]
Angle between t and ttrue:            36.2930
Geodesic distance of R from I3:       0.8480
‖R − Rtrue‖F:                         1.0580
Is the selected solution the one with the smallest geodesic distance from I3?  yes
At this point it seems quite plausible that the errors in calibration alone account for the errors in the estimated pose. In fact we would like to point out that the best pose solution for the above real data is still quite far from the true values. To further investigate this, we carry out a related experiment with a part real and part synthetic dataset.
3.5.3
Part real and part synthetic dataset
We assume that the first image with conic C1 has been obtained through a camera with calibration matrix K by imaging the scene circle obtained as one of the two plane solutions of lemma (2) for Kᵀ C1 K. Here C1 is the conic in the local image coordinate system; hence Kᵀ C1 K is its calibrated counterpart. Using the epipolar geometry toolbox [17], we project the scene circle back onto the second camera's image plane, with K as its calibration matrix. The conic C2syn thus obtained does not coincide with the conic C2 detected in the previous experiment with the real dataset. Figure (3.7) depicts the two conics, C2 (shown in red) and C2syn (shown in green).
Table 3.3: Result of part real data for investigating the error due to erroneous calibration matrix

Geodesic distance of Rtrue from I3:   0.4810
True translation vector, ttrue:       [743.7650, 130.3833, 508.9385]
Recovered translation vector, t:      [7.4437, 1.3033, 5.0907]
Angle between t and ttrue:            0.0168
Geodesic distance of R from I3:       0.4813
‖R − Rtrue‖F:                         0.0015
Is the selected solution the one with the smallest geodesic distance from I3?  yes
Figure 3.7: Difference between the two conics of real and synthetic datasets
With this new dataset, we run our algorithm and select the best solution, tabulated in table (3.3). If we continue the assumption of the previous section that C2, with the other parameters kept the same, gives us the same pose solution, we have

Kᵀ C2 K / λ = Kᵀ C2syn K / λsyn,   λ, λsyn ≠ 0,

from equation (3.5). Hence C2 and C2syn represent the same conic, which has been found to not be true (as evident from figure (3.7)). Either the calibration matrix or the ground truth values for the pose are not accurate, or the conics C1 and C2 have erroneous representation matrices. But the conic detection algorithm has errors of the order of 10⁻³, which can be considered sufficiently negligible, and R and t as estimated through the toolbox [19] give us pixel errors of the order of 0.1. This leaves the calibration matrix, which has substantial errors, of the order of 10 in each of its elements. Added to these errors is the distortion, which is not included in the calibration matrix, giving us incomplete rectification.
3.6
Summary
This chapter forms the core of the thesis. We start with an introduction to two equations derived in section (1) which relate the relative pose to a conic correspondence. Based on these two equations, we devise a geometric construction in an epipolar geometry framework simplified by two important assumptions regarding the scene conic and the plane containing the true translation vector. The geometric approach thus proposed is tested on both synthetic and real datasets, and the results are compared, analyzed and discussed in order to explain the performance of our proposed method. In the next chapter (4) we consider two alternate approaches to pose estimation from one conic correspondence. These approaches differ from the geometric method taken up in this chapter in the way the pose solution is estimated: they are based on the optimization of cost functions appropriately modeled on the equations that relate the pose R, t to the elements of the epipolar geometry, H, e, C1, C2.
Chapter 4
4.1
The equations which define the dependence of R and t on the conics C1, C2 and the scene plane are

(R − t uᵀ)ᵀ Kᵀ C2 K (R − t uᵀ) − λ Kᵀ C1 K = 0₃ₓ₃,
wᵀ t = 0,                                              (4.1)

where u, C1, C2, w and K are known and R, t and λ are to be estimated. For the sake of brevity we write C1′ = Kᵀ C1 K and C2′ = Kᵀ C2 K. From this we define the cost function

E(Y, t, λ) = ‖(Y − t uᵀ)ᵀ C2′ (Y − t uᵀ) − λ C1′‖F² + (wᵀ t)² + ‖Yᵀ Y − I3‖F² + (det(Y) − 1)²,        (4.2)

where C1′ is known only up to a non-zero scalar multiple. Hence, from equation (3.6), we can consider u as a known constant and have to estimate all elements of t. The vector w being constant, the unknown variables are Y, t and λ. The norm for matrices considered here is the Frobenius norm:

‖A‖F = √(trace(Aᵀ A)).
We have replaced the rotation matrix R with a real matrix Y and the additional constraints Yᵀ Y = I3 and det(Y) = 1. The cost function has been optimized through the command lsqnonlin(.) in MATLAB [21]. Results of sample experiments with this approach are listed in section (4.1.1). With a random starting point, the behavior of the algorithm is as expected for a conventional optimization technique: after a certain value of the cost function is achieved, the algorithm tends to get stuck in a local minimum. Additionally, the final value achieved upon convergence depends on the starting point. For these reasons it is practically infeasible to estimate a unique solution in the form of a global minimum of the cost function. This is evident from the results listed in table (4.1) of section (4.1.1). With a starting point close to the true value, the algorithm converges to a solution which is considerably close to the true value, but with a starting point considerably far from the true values, the point reached upon convergence is far from the true solution.
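A minimal sketch of how this optimization can be set up (the variable names C1p, C2p, u, w and the starting point are assumptions; lsqnonlin minimizes the sum of squares of the stacked residuals):

    % Residuals of equation (4.2): the 13 unknowns are the 9 entries of Y,
    % the 3 entries of t and the scalar lambda.
    function r = pose_residuals(x, C1p, C2p, u, w)
        Y = reshape(x(1:9), 3, 3);
        t = x(10:12);
        lambda = x(13);
        M = (Y - t*u')' * C2p * (Y - t*u') - lambda * C1p;   % conic residual
        r = [M(:); w'*t; reshape(Y'*Y - eye(3), [], 1); det(Y) - 1];
    end

    % A starting point; the solution reached depends strongly on it.
    x0 = [reshape(eye(3), [], 1); 0.1*ones(3, 1); 1];
    xhat = lsqnonlin(@(x) pose_residuals(x, C1p, C2p, u, w), x0);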
One can also perform the optimization by explicit computation of the gradients. Writing D = (Y − t uᵀ)ᵀ C2′ (Y − t uᵀ) − λ C1′ for the residual of the first equation, the gradient vectors ∂E(Y, t, λ)/∂Y, ∂E(Y, t, λ)/∂t and ∂E(Y, t, λ)/∂λ are:

∂E(Y, t, λ)/∂Y = 4 C2′ (Y − t uᵀ) D + 4 Y (Yᵀ Y − I3) + 2 det(Y) (det(Y) − 1) Y⁻ᵀ,

∂E(Y, t, λ)/∂t = −4 C2′ (Y − t uᵀ) D u + 2 (wᵀ t) w,

∂E(Y, t, λ)/∂λ = −2 trace(D C1′).        (4.3)
The derivations of the above partial derivatives are omitted, for they are quite straightforward and elementary; properties of the matrix trace are used to simplify the expressions into the above equations. The parameters Y, t and λ can now be iteratively updated through the gradients given in equation (4.3). We do not give the exact algorithm here, as it would be quite straightforward and is not of importance here; as a matter of fact, its performance is almost the same as that obtained through optimization by the lsqnonlin(.) method. A sketch of the update loop follows.
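A sketch of the resulting iteration (the step size and iteration count are assumptions; D is the conic residual used in equation (4.3)):

    % Explicit gradient descent on E(Y, t, lambda) using equation (4.3).
    eta = 1e-4;                                % assumed fixed step size
    for k = 1:5000
        D  = (Y - t*u')' * C2p * (Y - t*u') - lam * C1p;
        gY = 4*C2p*(Y - t*u')*D + 4*Y*(Y'*Y - eye(3)) ...
             + 2*det(Y)*(det(Y) - 1)*inv(Y)';
        gt = -4*C2p*(Y - t*u')*D*u + 2*(w'*t)*w;
        gl = -2*trace(D*C1p);
        Y = Y - eta*gY;   t = t - eta*gt;   lam = lam - eta*gl;
    end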
4.1.1
For testing purposes we run the optimization on the cost function of equation (4.2) using the optimization toolbox of MATLAB. Table (4.1) lists the results of sample experiments.

Table 4.1: Result of gradient descent approach on synthetic data. Here Rinit and tinit denote starting points, Rtrue and ttrue denote true values, and R and t denote the pose solution obtained through convergence of the gradient descent scheme.
Angle between     ‖Rtrue − Rinit‖F   ttrue             Recovered t                  Angle between   Cost at        ‖R − Rtrue‖F
tinit and ttrue                                                                     t and ttrue     convergence
11.6901           2.5935             [0.5, 0.1, 0.1]   [15.6250, 0.7307, 4.7318]    10.1486         0.7641         2.9376
12.5894           2.1696             [0.5, 0.1, 0.1]   [15.8957, 0.4495, 5.0122]    11.3576         0.7361         2.7815
60.9918           1.1672             [0.5, 0.1, 0.1]   [1.2964, 1.6602, 1.5375]     62.0119         0.0016         0.3382
3.2769            1.5701             [0.5, 0.1, 0.1]   [4.4460, 0.9152, 0.9110]     0.4041          0.1697         0.5440
4.4206            1.7508             [0.5, 0.1, 0.1]   [15.8957, 0.4495, 5.0122]    5.3466          0.1555         0.5583
The function lsqnonlin(.) of the optimization toolbox in MATLAB offers two optimization algorithms: one is the Levenberg-Marquardt algorithm [22] and the other is the trust region method. These two vary in a manner which is not important to the problem at hand. What is crucial is the fact that these algorithms do not always converge to the global minimum and, even when they do, one can never fully ascertain how many distinct points of global minimum our cost function attains. A second problem is that the system we are attempting to solve is a set of thirteen polynomials in thirteen variables. Such a system in general has multiple solutions, and through a pure optimization approach it is not feasible to estimate all possible pose solutions.
4.2
Another approach which we have given some thought to is based on a two-stage dependence of R and t on point and conic correspondences. This relationship depends on the property of the fundamental matrix that defines point correspondence between the two image planes π1 and π2 as

a ↔ b  ⟹  bᵀ F a = 0,   a ∈ π1, b ∈ π2.        (4.4)
The fundamental matrix decomposes in terms of the epipole e and the homography H induced by the scene plane as

F = [e]× H,        (4.5)

while the conic correspondence relates the two image conics through the same homography,

Hᵀ C2 H = λ C1,   λ ≠ 0.        (4.6)
This equation introduces a constraint on H in the form of a zero set of six homogeneous polynomials in nine homogeneous variables², which are the elements of the vector h. Let us assume h = vec(H), where vec(.) is the usual vectorization operation of linear algebra that transforms an n × n matrix into the n²-dimensional vector formed by stacking up the columns of the matrix. Equation (4.6) is thus transformed into the following set of five polynomials:

f : R⁹ → R⁵ :  f(h) = [hᵀ S1 h,  hᵀ S2 h,  hᵀ S3 h,  hᵀ S4 h,  hᵀ S5 h]ᵀ = 0₅ₓ₁,        (4.7)

¹ In fact a set of three collinear points in one plane would invariably be mapped to three collinear points in the other plane.
² The conic and homography representations are in homogeneous coordinates, due to which we estimate H up to a non-zero scalar multiple.
where the Si are nine-dimensional quadrics, i.e. real symmetric matrices, of the form

S1 = [ C2    0₃ₓ₃  0₃ₓ₃          S2 = [ 0₃ₓ₃  0₃ₓ₃  0₃ₓ₃
       0₃ₓ₃  0₃ₓ₃  0₃ₓ₃                 0₃ₓ₃  C2    0₃ₓ₃
       0₃ₓ₃  0₃ₓ₃  p1 C2 ],             0₃ₓ₃  0₃ₓ₃  p2 C2 ],

S3 = [ 0₃ₓ₃  C2/2  0₃ₓ₃          S4 = [ 0₃ₓ₃  0₃ₓ₃  C2/2
       C2/2  0₃ₓ₃  0₃ₓ₃                 0₃ₓ₃  0₃ₓ₃  0₃ₓ₃
       0₃ₓ₃  0₃ₓ₃  p3 C2 ],             C2/2  0₃ₓ₃  p4 C2 ],

S5 = [ 0₃ₓ₃  0₃ₓ₃  0₃ₓ₃
       0₃ₓ₃  0₃ₓ₃  C2/2
       0₃ₓ₃  C2/2  p5 C2 ].        (4.8)
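As a sketch, the five matrices translate directly into MATLAB (assuming the scalars p(1), ..., p(5) and the conic C2 are available):

    % Block-structured 9x9 symmetric matrices of equation (4.8).
    Z  = zeros(3);
    Ch = C2 / 2;
    S{1} = [C2 Z  Z ;  Z  Z  Z ;  Z  Z  p(1)*C2];
    S{2} = [Z  Z  Z ;  Z  C2 Z ;  Z  Z  p(2)*C2];
    S{3} = [Z  Ch Z ;  Ch Z  Z ;  Z  Z  p(3)*C2];
    S{4} = [Z  Z  Ch;  Z  Z  Z ;  Ch Z  p(4)*C2];
    S{5} = [Z  Z  Z ;  Z  Z  Ch;  Z  Ch p(5)*C2];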
These matrices define the five polynomials of equation (4.7), which put five independent constraints on H. The condition that the vector h is unique up to non-zero scalar multiplication gives us a polynomial constraint on h defined as

f6(h) = (hᵀ h − 1)² = 0.        (4.9)

The single point correspondence of equation (4.4), combined with equation (4.5), contributes the constraint

bᵀ [e]× H a = 0,        (4.10)

and the epipole e, being likewise homogeneous, is normalized through

(eᵀ e − 1)² = 0.        (4.11)
The eight polynomials defined in equations (4.7), (4.9), (4.10) and (4.11) put eight constraints on the twelve parameters of H and e. With this accounting we would need four more constraints, without which it is not possible to get a unique pose solution. A way out is to add more point correspondences, since each point correspondence contributes one polynomial constraint, and correspondences between points in general position give algebraically independent constraint polynomials. With exactly five point correspondences, we have a fully determined polynomial
system. Kahl and Heyden estimate the fundamental matrix through five point correspondences and one conic correspondence in [11]. Now even if the system of polynomials we have described is fully determined, each of the polynomials so obtained is quadratic in twelve variables; hence we would not get a unique solution, and hence not the correct solution for an arbitrary starting point. Secondly, we have certain assumptions that simplify the epipolar geometry, and our primary focus is on minimal feature correspondences. For these two reasons we do not go beyond one point correspondence here, but we note that more point correspondences can be integrated into this approach with minimal effort.
4.2.1
Similar to the first approach discussed in section (4.1), we can implement this approach in two ways. One is to use the MATLAB function lsqnonlin(.) to optimize the cost function obtained as a sum of squares of the polynomials in equations (4.7), (4.9), (4.10) and (4.11):

E(e, H) = (bᵀ [e]× H a)² + (hᵀ h − 1)² + (eᵀ e − 1)² + ‖Hᵀ C2 H − C1‖F².        (4.12)

A second way is an algorithm keeping in mind the specific nature of the function f of equation (4.7). Here the cost function is the sum of squares of the polynomials in equations (4.9), (4.10) and (4.11),

E(e, H) = (bᵀ [e]× H a)² + (hᵀ h − 1)² + (eᵀ e − 1)².        (4.13)
This algorithm primarily aims to minimize the above cost function under the condition that h satisfies f(h) = 0. As proved in lemma (17) of appendix (C), the zero set of the polynomial function f defines a fourth-order manifold in R⁹. Let this manifold be denoted MX. Our gradient descent algorithm then confines the movement of the vector h to MX. In summary, we choose an initial point on MX (say h_initial) and estimate the gradient vector of the cost function with respect to h at h_initial; we then project the estimated gradient vector onto the tangent space of MX at h_initial and update h along this projected vector. Such an approach minimizes the cost function E(e, H) while simultaneously imposing upon h the constraint f(h) = 0, i.e. that h lies in MX. The gradient of E(e, H) with respect to e is
∂E(e, H)/∂e = 2 (bᵀ [e]× H a) ((H a) × b) + 4 (eᵀ e − 1) e.        (4.14)

The iterative scheme then proceeds as follows:

1. Update e as e_{n+1} = e_n − t_e ∂E(e, H)/∂e evaluated at e = e_n, H = H_n.

2. Project ∂E(e, H)/∂h, evaluated at e = e_{n+1}, H = H_n, onto the tangent space of MX at the point h_n = vec(H_n). Let the projected vector so obtained be δh.

3. Update h along δh, h_{n+1} = h_n − t_h δh, so as to remain on MX.
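A sketch of the projection in step 2, with S{1..5} as in the sketch of equation (4.8) above; the rows of the Jacobian of f at h are given by equation (C.1), and its null space spans the tangent space of MX. Here gradE and the step size t_h are assumptions:

    % One projected-gradient step confined to the manifold f(h) = 0.
    J = zeros(5, 9);
    for i = 1:5
        J(i, :) = 2 * h' * S{i};    % rows of the Jacobian of f at h
    end
    N  = null(J);                   % orthonormal basis of the tangent space
    g  = gradE(h);                  % Euclidean gradient of the cost at h
    dh = N * (N' * g);              % projection onto the tangent space
    h  = h - t_h * dh;              % update along the manifold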
4.2.2
Both of these methods estimate e and H, and hence F, through one point and one conic correspondence. Unlike the approach of section (4.1), we do away with the two assumptions regarding the scene conic being a circle and the translation plane being known. As stated previously, R and t can then be found via an SVD decomposition of the essential matrix E = Kᵀ F K, as sketched below. The problem here is that, this being an under-determined polynomial system, the solution obtained through optimization varies with the starting point and may not be the true one.
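A minimal sketch of this extraction (the standard SVD factorization of the essential matrix; it yields one of the four (R, t) candidates, and the cheirality check needed to pick one is omitted):

    % Pose from the essential matrix E = K' * F * K.
    [U, ~, V] = svd(E);
    W = [0 -1 0; 1 0 0; 0 0 1];
    R = U * W * V';
    if det(R) < 0
        R = -R;                     % enforce a proper rotation
    end
    t = U(:, 3);                    % translation up to sign and scale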
We have implemented these two optimization tasks on synthetic datasets, but the results are not promising enough to be listed here. The reason why we have still described this approach is that there is some intuition in the idea. We saw previously that MX is a polynomial manifold and can be seen as an intersection of five quadrics in R⁹, whose defining matrices S1, ..., S5 are 9 × 9 symmetric matrices with a special structure. This fact opens up a new way of looking at the optimization task: if the geometric structure of this intersection of quadrics is studied in detail, it may be possible to devise an improved optimization algorithm which gives more accurate pose solutions. Further, the intersection is a non-linear set of points in R⁹; a point of importance, we believe, is that by identifying the subsets of this intersection which are linear sets, we can simplify the process of optimization.
4.3
Summary
This chapter introduces two alternate ways of estimating pose from one conic correspondence. The first approach, section (4.1), retains the two assumptions, and hence one conic correspondence is enough for pose estimation; the second approach of section (4.2) does not use the two assumptions and hence needs additional point correspondences (five, for a fully determined system). For both approaches we design cost functions which are optimized either through MATLAB's optimization toolbox or through gradient descent with explicitly computed gradient vectors. A common point for the two approaches is that the optimization tasks fail to converge reliably to the true solution; for the first approach we carry out experiments on synthetic data that justify this point as well.
Chapter 5
the error in camera calibration. This suggests that if we had highly accurate camera calibration data, the pose obtained for such a real dataset would also be highly accurate.

Additionally, as is usual with problems in computer vision, there are many assumptions at work, some important and some trivial. The important ones here have been the two assumptions stated in the opening paragraph. Though they are practically feasible for many scenes, a future line of work is possible in which one generalizes the assumptions and reconstructs a similar geometry for pose estimation. The circle assumption helps one fix two of the three elements of the normal vector defining the scene plane; similarly one could consider a general conic and attempt to simplify the epipolar geometry for it.
Lastly we note that the geometric approach has been an alternative to the conventional optimization approach. The problem of local minima in the optimization approach is borne out by sample experiments on synthetic datasets, and this has been the primary motivation for considering a geometric construction for pose estimation. Still, we take up these optimization based approaches in the previous chapter (4), for a two-fold reason. Firstly, we demonstrate the above-mentioned shortcoming of optimizing a cost function modeled on the two equations that directly relate the pose R, t to the conic correspondence and the position of the scene plane. Secondly, we formulate a different cost function based on a combination of a point and a conic correspondence; this does not need the two assumptions we employed for simplifying the geometry in the geometric approach and in the first alternate optimization approach. Additionally, the solution set is non-linear and, to the best of our knowledge, not convex. But the solution set contains an intersection of five quadrics in the large space R⁹, and these five quadrics have special structures which we might be able to investigate further to simplify the optimization task. A detailed study of the quadric intersection is a good line of future work to improve upon the optimization approach for such a cost function.

To summarize, we note that all of these approaches (the geometric approach and the two alternate approaches) are loosely based on a set of equations defining the pose R, t in terms of various elements of epipolar geometry, including point and/or conic correspondences; we then estimate the pose as a solution to these equations. There have also been attempts to estimate pose from a different perspective altogether. To cite a few, Kaminski and Shashua in
References
[1] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[2] P. Gurdjos, A. Crouzil, and R. Payrissat, "Another way of looking at plane-based calibration: The centre circle constraint," in Computer Vision - ECCV 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, Eds. Springer Berlin Heidelberg, 2002, vol. 2353, pp. 252-266. [Online]. Available: http://dx.doi.org/10.1007/3-540-47979-1_17
[3] Apollonius, Treatise on Conic Sections, T. L. Heath, Ed. New York: Dover, 1961.
[4] R. Haralick, H. Joo, D. Lee, S. Zhuang, V. Vaidya, and M. Kim, "Pose estimation from corresponding point data," IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 6, pp. 1426-1446, 1989.
[5] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, pp. 133-135, Sep. 1981.
[6] R. M. Haralick, C.-N. Lee, K. Ottenberg, and M. Nölle, "Review and analysis of solutions of the three point perspective pose estimation problem," Int. J. Comput. Vision, vol. 13, no. 3, pp. 331-356, Dec. 1994. [Online]. Available: http://dx.doi.org/10.1007/BF02028352
[7] Z. Zhang and T. Kanade, "Determining the epipolar geometry and its uncertainty: A review," International Journal of Computer Vision, vol. 27, pp. 161-195, 1998.
[8] Q.-T. Luong and O. D. Faugeras, "The fundamental matrix: Theory, algorithms, and stability analysis," International Journal of Computer Vision, vol. 17, no. 1, pp. 43-75, 1996.
[11] F. Kahl and A. Heyden, "Using conic correspondences in two images to estimate the epipolar geometry," in Computer Vision, 1998. Sixth International Conference on, 1998, pp. 761-766.
[12] Q. Ji, M. S. Costa, R. M. Haralick, and L. G. Shapiro, "An integrated linear technique for pose estimation from different geometric features," 1999.
[13] G. Wang, Q. Wu, and Z. Ji, "Pose estimation from circle or parallel lines in a single image," in Computer Vision - ACCV 2007, ser. Lecture Notes in Computer Science, Y. Yagi, S. Kang, I. Kweon, and H. Zha, Eds. Springer Berlin Heidelberg, 2007, vol. 4844, pp. 363-372. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-76390-1_36
[14] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision. Upper
Saddle River, NJ, USA: Prentice Hall PTR, 1998.
[15] D. K. Prasad, C. Quek, and M. K. H. Leung, "A precise ellipse fitting method for noisy data," in Proceedings of the 9th International Conference on Image Analysis and Recognition - Volume Part I, ser. ICIAR'12. Berlin, Heidelberg: Springer-Verlag, 2012. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-31295-3_30
[16] L. Quan, "Conic reconstruction and correspondence from two views," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, 1996.
[17] G. Mariottini and D. Prattichizzo, "EGT: a toolbox for multiple view geometry and visual servoing," IEEE Robotics and Automation Magazine, vol. 12, no. 4, December 2005.
[18] D. Huynh, "Metrics for 3D rotations: Comparison and analysis," Journal of Mathematical Imaging and Vision, vol. 35, no. 2, pp. 155-164, 2009. [Online]. Available: http://dx.doi.org/10.1007/s10851-009-0161-2
[26] J. Gallier, Geometric Methods and Applications: For Computer Science and Engineering, 2nd ed. Springer, 2011.
[31] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000.
[32] [Online]. Available: http://en.wikipedia.org/wiki/Bezouts_theorem
Chapter A
A.1
Affine Geometry
In this section we introduce the geometry of affine spaces. These discussions will
lay the foundation for projective geometry.
A.1.1
Affine spaces
An affine space is a set A together with a vector space V and a faithful and transitive group action¹ of V (with addition of vectors as the group operation) on A. Explicitly, an affine space is a set of points A together with a map

l : V × A → A,  (v, a) ↦ v + a,

with the following properties:

1. Left identity: ∀a ∈ A, 0 + a = a (0 being the zero vector).

2. Associativity: ∀v, w ∈ V, ∀a ∈ A, v + (w + a) = (v + w) + a.

3. Uniqueness: ∀a ∈ A, the map θa : V → A, v ↦ v + a, is a bijection. (This justifies transitivity of the map l, and faithfulness is seen in the fact that if two elements f, g of V are such that f + a = a and g + a = a, ∀a ∈ A, then f = g.)

The vector space V is said to underlie the affine space A and is also called a difference space. Thus we have the operator + (defined as the map l) between a point and a vector. Equivalently we can define an affine space in another way, through some results that follow from the above definition, considering an affine space A with underlying vector space V:

¹ A group action of a vector space V on a set X is a map V × X → X, (v, x) ↦ v.x, with associativity and the existence of an identity element in V.
Suppose (v, a) ↦ v + a = b and (u, a) ↦ u + a = b. Then

v + a = u + a  ⟹  (v − u) + a = a.        (A.1)

Further, the uniqueness property says that 0 is the only vector such that 0 + a = a, ∀a ∈ A. Hence we have v − u = 0, i.e. v = u. This proves that the map Θ : A × A → V, assigning to a pair of points (a, b) the unique vector v with b = v + a, is well defined. Also, for every vector v in V and every point a in A we can find a point b in A such that b = v + a; hence Θ(a, b) = v, and the map Θ is onto. And we can find distinct points a1, a2, b1, b2 in A such that v = Θ(a1, b1) = Θ(a2, b2) for at least one v in V; this shows that Θ is many-one.

Lemma 5. For three points a, b, c in A, Θ(a, b) + Θ(b, c) = Θ(a, c), where Θ is the map we have defined in lemma (4) above.

We shall accept this lemma without proof here.

Thus, with the definition of an affine space in place, we can note that an affine space is actually a set of points together with such a vector space, and we represent it as (A, X, Θ), where X is the underlying vector space. Henceforth in this text we will use either of these notations for affine spaces. Also, if for two points a and b in A

Θ(a, b) = v,  v ∈ V,        (A.2)

then Θ(b, a) = −v.
A.1.2
A set of vectors {mᵢ}₁≤ᵢ≤ₙ in V forms a basis of V iff every vector w in V can be written as a linear combination w = Σᵢ₌₁ⁿ λᵢ mᵢ. A similar notion can be obtained in the case of affine spaces, as described next.

Let us have an affine space (A, V, Θ) and consider a linear combination of a set of points {aᵢ}₁≤ᵢ≤ₙ₊₁ in A,

w = Σᵢ λᵢ aᵢ,

where the λᵢ are real scalars. In general a combination like this is not defined. But in the special case where Σᵢ λᵢ = 1 we can define it as w = Σᵢ λᵢ (aᵢ − o) + o for a point o in A. This is well defined since aᵢ − o = Θ(o, aᵢ) ∈ V, so the expression is the addition of a vector of V to a point of A. Thus, for every pair of points o1 and o2 in A,

w = Σᵢ λᵢ (aᵢ − o1) + o1 = Σᵢ λᵢ (aᵢ − o2) + o2.

Hence w is the same whatever point of A we select as our center. Such a combination of points of an affine space, with the scalars selected so that their sum is one, is termed an affine combination. So we can think of Σᵢ λᵢ aᵢ as a valid point and establish the fact that affine combinations are preserved in an affine space: an affine space is closed with respect to affine combinations.
This discussion leads to the definition of a basis for an affine space. Given an affine space (A, V, Θ), we define a set of points {mᵢ}₁≤ᵢ≤ₙ₊₁ to be a basis for it iff the vectors {Θ(mᵢ, mⱼ)}₁≤ⱼ≤ₙ₊₁, ⱼ≠ᵢ form a basis for its difference vector space V. By definition (A.1.1), the mapping Θmᵢ : a ↦ Θ(mᵢ, a) ∈ V lets us write, for any point a,

Θ(mᵢ, a) = Σⱼ₌₁,ⱼ≠ᵢⁿ⁺¹ λⱼ Θ(mᵢ, mⱼ),

where the λⱼ are real scalars. Hence, using the definition of Θ of lemma (4),

a = Σⱼ≠ᵢ λⱼ Θ(mᵢ, mⱼ) + mᵢ = Σⱼ₌₁ⁿ⁺¹ λⱼ Θ(o, mⱼ) + o,

for a point o ∈ A. The second equality holds for any o by lemma (5), since Θ(mᵢ, mⱼ) = Θ(o, mⱼ) − Θ(o, mᵢ), and hence

Σⱼ≠ᵢ λⱼ Θ(mᵢ, mⱼ) + mᵢ = Σⱼ≠ᵢ λⱼ Θ(o, mⱼ) − Θ(o, mᵢ) (Σⱼ≠ᵢ λⱼ) + mᵢ
                       = Σⱼ≠ᵢ λⱼ Θ(o, mⱼ) + Θ(o, mᵢ) (1 − Σⱼ≠ᵢ λⱼ) + o
                       = Σⱼ₌₁ⁿ⁺¹ λⱼ Θ(o, mⱼ) + o,  where λᵢ = 1 − Σₖ≠ᵢ λₖ.

By our definition of a barycenter,

a = Σₖ₌₁ⁿ⁺¹ λₖ mₖ,

where λᵢ = 1 − Σₖ≠ᵢ λₖ. Thus a is uniquely represented as a barycenter of the family of points ((mₖ, λₖ))₁≤ₖ≤ₙ₊₁ ⊂ A.
A.1.3
Affine morphism
Given two affine spaces (X, X̄, Θ) and (X′, X̄′, Θ′), a map f : X → X′ is said to be an affine morphism if and only if we can find a linear map f̄ : X̄ → X̄′ such that

Θ′(f(b), f(c)) = f̄(Θ(b, c)),  ∀b, c ∈ X.

As we know that Θ is many-one, we can fix a point a and write

∀b ∈ X, f(b) = f(a) + f̄(Θ(a, b)),  or equivalently  ∀v ∈ X̄, f(a + v) = f(a) + f̄(v).        (A.3)

This definition brings us to a result stated by Gallier in [26], which says that a map f : X → X′ is an affine morphism if and only if, for every family of weighted points ((aᵢ, λᵢ))ᵢ∈I in X with Σᵢ λᵢ = 1, the affine combination (or barycenter) is preserved:

f(Σᵢ∈I λᵢ aᵢ) = Σᵢ∈I λᵢ f(aᵢ).        (A.4)

Here Θ : X × X → X̄ and Θ′ : X′ × X′ → X̄′ are as defined in lemma (4). With some elementary algebra we can show that equations (A.3) and (A.4) are equivalent conditions.
A.1.4
Affine subspaces
The definition is as for an affine space: given (X, X̄, Θ), U is a subspace of X if and only if we can find a unique vector subspace Ū of X̄ such that (U, Ū, Θ)² is also an affine space as defined in section (A.1.1). This definition leads to several results; viewed another way, U can be generated from any point a ∈ U as u = v + a for a unique v ∈ Ū.

² Θ is the same function as defined in the definition of the parent affine space (X, X̄, Θ), restricted to U.
A.1.5
As mentioned earlier, one of the motives for defining geometries as groups of transformations that leave certain properties invariant is that our primary interest lies in capturing the invariant properties, and thus in defining the transformations based on point correspondences. In affine geometry the three properties that remain invariant across affine morphisms are parallelism, incidence and cross-ratios, as seen in [27]. Below we consider only one invariant, parallelism: cross-ratios are defined for projective spaces in the next section, and affine spaces, being restricted versions of projective spaces, also keep them invariant; incidence is a very simple extension of parallelism.
Parallelism invariance
Two affine subspaces U and V of (X, X̄, Θ) are parallel (denoted U//V) iff Ū ⊆ V̄ or V̄ ⊆ Ū. With some simple algebra we can see that this condition implies that U ⊆ V or V ⊆ U or U ∩ V = ∅. The condition of parallelism invariance can then be stated as:

Lemma 8. Given an affine morphism f : X → X′ between two affine spaces (X, X̄, Θ) and (X′, X̄′, Θ′), if U, V are affine subspaces of X such that U//V, then the corresponding subspaces satisfy f(U)//f(V).

Proof. We have seen in the section on affine morphisms that f defines the corresponding linear map f̄. Let Ū ⊆ V̄. Then

a′ ∈ f̄(Ū) ⟹ a′ = f̄(a), a ∈ Ū ⟹ a′ = f̄(a), a ∈ V̄ (since Ū ⊆ V̄) ⟹ a′ ∈ f̄(V̄).        (A.5)

Thus f̄(Ū) ⊆ f̄(V̄). Similarly we can show the other direction if we assume V̄ ⊆ Ū. Hence all that we need to show is that f(U) and f(V) are affine subspaces of X′. Since f̄(Ū) and f̄(V̄) are vector subspaces of X̄′, f(U) and f(V) are affine subspaces of (X′, X̄′, Θ′), and we can say that parallel affine subspaces are transformed into parallel subspaces in the image affine space.
A.2
Projective Geometry

A.2.1
An n-dimensional projective space is obtained from the vector space Eₙ₊₁ as the quotient

P(Eₙ₊₁) = (Eₙ₊₁ \ {0}) / ≃,  where u ≃ v ⟺ u = λv for some λ ≠ 0.        (A.6)

Here we assume Eₙ₊₁ is a vector space over R; in some cases we might generalize to the complex field C, which is mentioned where required. We also see that ≃ is an equivalence relation here.

Many other equivalent definitions are found in the literature for P(Eₙ₊₁). One might look at an equivalence class as a 1-dimensional subspace of Eₙ₊₁; thus P(Eₙ₊₁) can be seen as the set of all 1-dimensional subspaces of Eₙ₊₁, or the set of all lines passing through the origin in Eₙ₊₁. These are different ways of looking at the definition, but essentially the same structure is obtained. Alternate ways of describing a projective geometry are interesting enough not to miss; hence, just for the sake of a lateral view:

A projective space is a triplet (P, L, I) such that

1. Any pair of distinct points is joined by a unique line.

2. Given any four points A, B, C and D with no three collinear, if AB intersects CD, then AC intersects BD.³

3. Every line is incident with at least three distinct points.

4. There exist three non-collinear points.

Here P is a set of points, L is a set of lines and I is an incidence structure which tells us which line is incident on which point and which point is incident on which line. From these axioms we can derive many other properties of a projective space, including its invariants; these being out of the scope of this text, we skip them. Beutelspacher in [28] and Casse in [29] give an extensive treatment of this topic.

³ This axiom leads to the much talked about property of a projective plane that any two lines must intersect at a point.
A.2.2
With the definition of a projective space in place, we can talk of its basis. A basis of an n-dimensional projective space P(Eₙ₊₁) is defined as a set of points {aᵢ}₁≤ᵢ≤ₙ₊₂ for which we can find a basis {vᵢ}₁≤ᵢ≤ₙ₊₁ of Eₙ₊₁ such that

p(vᵢ) = aᵢ, ∀i, 1 ≤ i ≤ n+1,  and  p(Σᵢ₌₁ⁿ⁺¹ vᵢ) = aₙ₊₂.        (A.7)

If we have two sets of vectors {vᵢ′}₁≤ᵢ≤ₙ₊₁ and {vᵢ″}₁≤ᵢ≤ₙ₊₁ satisfying the conditions in (A.7) for the same set of projective points {aᵢ}₁≤ᵢ≤ₙ₊₂ in P(Eₙ₊₁), then

vᵢ′ = λ vᵢ″, ∀i, 1 ≤ i ≤ n+1.

Hence every point a in P(Eₙ₊₁) represented by one set of coordinates in one vector basis would also represent the same projective point in the other vector basis; both of these vector bases ought to conform to the condition for the same set of projective points. This statement needs a clarification. We normally identify a basis of a vector space by the unique set of scalars needed to identify a point. In this case we have no notion of linear combinations of points in projective space, at least until now. Hence we consider a set of points which correspond to a basis in the corresponding vector space as a basis of our projective space. But again a problem arises because each point in a projective space can correspond to multiple vectors. So, to ensure that, whichever vectors we choose for the given points, the coordinate representation of any point in projective space remains the same, we specify the conditions given in equation (A.7). Again, we can choose any such vector for specifying a coordinate representation, but the projective point remains the same.

Now from the definition we see that every projective point maps to not one vector but a set of vectors positioned at the origin. Hence we can define a mapping known as the canonical projection

p : Eₙ₊₁ \ {0ₙ₊₁} → P(Eₙ₊₁),        (A.8)

such that every non-zero vector of Eₙ₊₁ is mapped to a unique point in P(Eₙ₊₁). This mapping can be seen in a simple manner: a vector (a, b, c) in E₃ is mapped to the projective point (a/c, b/c) in P(E₃) if c ≠ 0, and if c = 0 the point is mapped to the point at infinity on the line passing through (0, 0) and (a, b). Thus the vectors with c ≠ 0 map to points in R² and those with c = 0 map to points at infinity. This plain reasoning tells us that augmenting a Euclidean or an affine space with points at infinity leads us to a projective space. A more formal treatment of this augmentation follows in later sections.
A.2.3
Projective transformation
Given two projective spaces P(Eₙ₊₁) and P(E′ₙ′₊₁) with underlying vector spaces Eₙ₊₁ and E′ₙ′₊₁, a linear map f̄ : Eₙ₊₁ → E′ₙ′₊₁ induces a map f between the projective spaces through

∀v ∈ Eₙ₊₁,  f ∘ p(v) = p′ ∘ f̄(v),        (A.9)

where p and p′ are the canonical projections of the vector spaces onto the respective projective spaces, as defined in equation (A.8). The mapping f : P(Eₙ₊₁) → P(E′ₙ′₊₁) is known as a projective morphism. Some of the computer vision literature terms such a map a homography; though the two terms are often used interchangeably, in this thesis we reserve the term homography to mean a projective morphism between two projective planes. The set of all such morphisms is denoted by C(E, E′). Some results follow, which we accept without proof for now.

Lemma 9. Given a mapping f : P(Eₙ₊₁) → P(E′ₙ′₊₁), there is a unique f̄ : Eₙ₊₁ → E′ₙ′₊₁ up to non-zero scalar multiplication.

Lemma 10. Given a mapping f̄ : Eₙ₊₁ → E′ₙ′₊₁, there is a unique f : P(Eₙ₊₁) → P(E′ₙ′₊₁).

Next we consider only the cases where n = n′; this thesis focuses on problems that need transformations between projective spaces of equal dimensions.
If the mapping f : P(Eₙ₊₁) → P(E′ₙ₊₁) is bijective, the mapping f̄ : Eₙ₊₁ → E′ₙ₊₁ is one-one in the sense that, for u, v ∈ Eₙ₊₁ with u ≠ λv for any λ ∈ R, we have f̄(u) ≠ μ f̄(v) for any μ ∈ R. Similarly we can see that f̄ is onto, as every vector in E′ₙ₊₁ is reached. Such an invertible f̄ : Eₙ₊₁ → E′ₙ₊₁ can be alternately represented as multiplication by a matrix:

∀v ∈ Eₙ₊₁,  f̄(v) = Av = v′.        (A.10)

Theorem. Let P(Eₙ₊₁) and P(E′ₙ₊₁) be two projective spaces of n dimensions and let their associated vector spaces be Eₙ₊₁ and E′ₙ₊₁. Assume {mᵢ}₁≤ᵢ≤ₙ₊₂ and {mᵢ′}₁≤ᵢ≤ₙ₊₂ are bases of P(Eₙ₊₁) and P(E′ₙ₊₁) respectively. Then there is a unique homography g, up to a non-zero scalar multiple, such that g(mᵢ) = mᵢ′, ∀i, 1 ≤ i ≤ n+2.

Proof. Given the basis {mᵢ}₁≤ᵢ≤ₙ₊₂ of P(Eₙ₊₁), let {vᵢ}₁≤ᵢ≤ₙ₊₁ be vectors satisfying condition (A.7), and similarly {vᵢ′}₁≤ᵢ≤ₙ₊₁ for the basis {mᵢ′} of P(E′ₙ₊₁).
We use the canonical projections p : Eₙ₊₁\{0} → P(Eₙ₊₁), p(vᵢ) = mᵢ, and p′ : E′ₙ₊₁\{0} → P(E′ₙ₊₁), p′(vᵢ′) = mᵢ′. From the given condition g(mᵢ) = mᵢ′, ∀i, 1 ≤ i ≤ n+2, we have

g(p(vᵢ)) = g(mᵢ) = mᵢ′ = p′(vᵢ′) = p′(ḡ(vᵢ)), ∀i, 1 ≤ i ≤ n+2,

and hence

ḡ(vᵢ) = λᵢ vᵢ′,  λᵢ ∈ R, ∀i, 1 ≤ i ≤ n+1.        (A.11)

Let λ v′ₙ₊₂ = ḡ(vₙ₊₂). But

vₙ₊₂ = Σᵢ₌₁ⁿ⁺¹ vᵢ  and  v′ₙ₊₂ = Σᵢ₌₁ⁿ⁺¹ vᵢ′,

so that

λ Σᵢ₌₁ⁿ⁺¹ vᵢ′ = ḡ(Σᵢ₌₁ⁿ⁺¹ vᵢ) = Σᵢ₌₁ⁿ⁺¹ λᵢ vᵢ′.

Since {vᵢ′} is a basis, comparing coefficients gives λᵢ = λ for all i, and hence

ḡ(vᵢ) = λ vᵢ′, ∀i, 1 ≤ i ≤ n+2,  λ ∈ R \ {0}.        (A.12)

Now consider two homographies g1 and g2 such that ḡ1(vᵢ) = λ vᵢ′, ∀i, 1 ≤ i ≤ n+2, and ḡ2(vᵢ) = μ vᵢ′, ∀i, 1 ≤ i ≤ n+2, with λ, μ ∈ R \ {0}. Then ḡ1 = (λ/μ) ḡ2, and from lemma (10) we deduce that g1 = g2; there is thus a unique homography associated with the two bases, up to a non-zero scalar multiple.
A.2.4
Projective subspaces
Let V be a subset of a given projective space P(Eₙ₊₁) with associated vector space Eₙ₊₁. Then V is a projective subspace of P(Eₙ₊₁) iff we can find a vector subspace V̄ of Eₙ₊₁ such that P(V̄) = V⁴; if V̄ is an (m+1)-dimensional vector subspace, V is an m-dimensional projective subspace. A projective morphism maps projective subspaces to projective subspaces, since the underlying linear map f̄ : Eₙ₊₁ → E′ₙ₊₁ maps each of the vectors of a basis of any subspace of Eₙ₊₁ into the image vector subspace.
A.2.5
Affine completion
Consider an affine space X with underlying vector space X̄ and a point m ∈ X with coordinates (x1, ..., xn) in a given basis. Representing m with its coordinates, we map it to the projective point p([x1, ..., xn, 1]). Hence there is a one-one correspondence between the points of X and the projective points not at infinity; the points at infinity are those of the form p([m̄, 0]). Hence these points lying at infinity can be seen to correspond uniquely to projections of vectors m̄ ∈ X̄. Thus these points project onto the set

X∞ = P(Eₙ₊₁) \ X.

Thus we have

P(Eₙ₊₁) = X ⊔ X∞ = X ⊔ P(X̄).        (A.13)

The points at infinity have the same representation as that of a vector in X̄, but they are to be treated differently. The reason is that projective points are not vectors, and hence we cannot state that projective points are equivalent to vectors of X̄. Thus the points not at infinity correspond to affine points and those at infinity correspond to the projection of X̄.
Affine and projective basis
In the previous section we used affine completion to obtain a projective space, P(Eₙ₊₁) = X ⊔ P(X̄). Further, let {mᵢ}₁≤ᵢ≤ₙ₊₁ be a basis for X. The points pᵢ₋₁ = p([Θ(m₁, mᵢ), 0]), 2 ≤ i ≤ n+1, represent a part of the basis of P(Eₙ₊₁); we have one more point, pₙ₊₁ = p([0ₙ, 1]). Adding these n+1 points together, we get pₙ₊₂ = p([Σᵢ₌₂ⁿ⁺¹ Θ(m₁, mᵢ), 1]). Thus the set of points {pᵢ}₁≤ᵢ≤ₙ₊₂ forms a projective basis of P(Eₙ₊₁). This is not a unique basis, but it is one with which we can always relate the affine space and its corresponding projective space.

Thus we see that, given a point y in X with coordinate vector (Y1, ..., Yn) in the affine basis {mᵢ}₁≤ᵢ≤ₙ₊₁, the corresponding point in the projective basis as extended above is represented by the vector (Y1, ..., Yn, 1) in the basis {pᵢ}₁≤ᵢ≤ₙ₊₂ of Eₙ₊₁. Thus this relation of affine and projective bases gives us the same relation between points of the two spaces as given in the previous section.
Relation between PLG(Eₙ₊₁) and AG(X)
Given the affine completion of X, P(Eₙ₊₁) = X ⊔ P(X̄), we can see that PLG(Eₙ₊₁) is the group of all homographies or collineations of P(Eₙ₊₁), and AG(X) is the group of all invertible affine morphisms f : X → X. Then we can easily prove the next lemma:

Lemma 12. A homography is an invertible affine morphism if and only if it leaves the points at infinity invariant.

We accept this result without proof.
Results on hyperplanes
A projective hyperplane is defined as a subspace of a projective space of dimension n−1, where n is the dimension of the projective space. Also, we can talk of vector hyperplanes in the sense that they are vector subspaces of dimension n−1 of a vector space of dimension n. Thus a projective line is a hyperplane in a 2-dimensional space and a plane is a hyperplane in a 3-dimensional space. It follows that a projective hyperplane is obtained by projection of a vector-space hyperplane.

Consider the set H(Eₙ₊₁) of hyperplanes of the vector space Eₙ₊₁ and the set of projective hyperplanes H(P(Eₙ₊₁)); then there is a one-one correspondence between H(Eₙ₊₁) and H(P(Eₙ₊₁)). We see how this is so from the definition of a projective subspace of dimension n−1: every projective subspace is obtained by the application of the canonical projection p : Eₙ₊₁\{0} → P(Eₙ₊₁) to a vector subspace of Eₙ₊₁, and every projective subspace can be obtained from a unique vector subspace. Thus we have the one-one correspondence

H̄ ∈ H(Eₙ₊₁) ↦ H = p(H̄ \ {0}) ∈ H(P(Eₙ₊₁)),        (A.14)

H ∈ H(P(Eₙ₊₁)) ↦ H̄ = p⁻¹(H) ∪ {0} ∈ H(Eₙ₊₁).        (A.15)
A.2.6
We know that parallelism and incidence are invariant under any affine transformation, as seen in the previous sections on affine invariants. Similarly, in projective geometry we can talk about invariants, which are the cross-ratio and incidence. It is elementary to prove that incidence is preserved under a projective transformation: all we have to show is that two projective subspaces, one a subset of the other, have underlying vector subspaces that are transformed into vector subspaces in which, again, one is a subset of the other, and this can be proved using the linearity of the underlying vector-space mapping. Hence we just define the cross-ratio here:

Lemma 13. Given four points a, b, c, d on P(E₂) such that the first three points are distinct, denote by h_{a,b,c} the homography on P(E₂) such that h_{a,b,c}(a) = ∞, h_{a,b,c}(b) = 0 and h_{a,b,c}(c) = 1. Then the cross-ratio, denoted {a, b; c, d}, is the element h_{a,b,c}(d). This value is invariant to homographies or collineations.
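Concretely, if the four points are identified with affine parameters a, b, c, d on the line (a coordinate form assumed here purely for illustration), the homography of the lemma and the resulting cross-ratio read

h_{a,b,c}(x) = ((x − b)(c − a)) / ((x − a)(c − b)),
{a, b; c, d} = h_{a,b,c}(d) = ((d − b)(c − a)) / ((d − a)(c − b)),

so that indeed h_{a,b,c}(a) = ∞, h_{a,b,c}(b) = 0 and h_{a,b,c}(c) = 1.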
Proof. Assume that we have a homography f : X → X′. We can prove the result for the case where X is a one-dimensional projective space, i.e. a line; using the result that incidence is preserved, the same then holds for a projective space of any dimension. The cross-ratio of points a, b, c, d in X is given by {a, b; c, d} = h_{a,b,c}(d), where h_{a,b,c} : X → X is the homography defined above. Through the mapping f, the corresponding points on X′ are a′ = f(a), b′ = f(b), c′ = f(c), d′ = f(d), and

{a′, b′; c′, d′} = h_{a′,b′,c′}(d′).

From the result of section (A.2.3), the composition of homographies is also a homography; hence h_{a′,b′,c′} ∘ f = h_{a,b,c}⁶, and therefore

{a′, b′; c′, d′} = {a, b; c, d}.
A.2.7
Duality
Every point in the dual projective space P(E∗ₙ₊₁) corresponds to a hyperplane in P(Eₙ₊₁), which gives rise to the notion of duality. We state two lemmas next, which we accept without proof, with a short explanation for each.

Lemma 14. A point in P(E∗ₙ₊₁) uniquely represents a hyperplane H in P(Eₙ₊₁), and hence a unique hyperplane H̄ in Eₙ₊₁ (refer to equation (A.15)). In fact, if the point is f ∈ P(E∗ₙ₊₁), represented by the coordinate vector (η1, ..., ηₙ₊₁) of E∗ₙ₊₁ up to a non-zero scalar multiplication, then the corresponding hyperplane is defined by the equation hᵀx = 0, where h is a vector uniquely defined as (η1, ..., ηₙ₊₁) up to a non-zero scalar multiplication. This hyperplane is normally considered to lie in Eₙ₊₁. Thus a set of points in P(E∗ₙ₊₁) is a set of hyperplanes in Eₙ₊₁ and also, by equation (A.15), a set of hyperplanes in P(Eₙ₊₁).

Lemma 15. Given a pencil of hyperplanes⁷ Δ, there is a unique (n−2)-dimensional subspace V of P(Eₙ₊₁) such that Δ = {H ∈ H(P(Eₙ₊₁)) : V ⊆ H}, and for every x not in V there is a unique H ∈ Δ containing V and x. The pencil Δ can be seen as a line in P(E∗ₙ₊₁) (we can note the one-one correspondence between hyperplanes of P(Eₙ₊₁) and points of P(E∗ₙ₊₁)).

⁶ A homography of a one-dimensional projective space onto itself is unique up to multiplication by a non-zero scalar; also we can get only one homography between two 1-dimensional projective spaces, again up to multiplication by a non-zero scalar, and we can find a one-one correspondence between two 1-dimensional spaces.
⁷ A pencil of hyperplanes is a set of hyperplanes where each hyperplane contains a specific subspace.
An elaborate proof can be read in [27], given by Faugeras et al. This gives us an understanding of a line in the projective dual space P(E∗ₙ₊₁) and the corresponding pencil of hyperplanes in the projective space P(Eₙ₊₁). Ideally, we have a vector in the dual space uniquely corresponding to a vector in the vector space and vice versa, and by result (A.15) each of the points on the line in the projective dual space P(E∗ₙ₊₁) corresponds to a unique hyperplane in the vector space Eₙ₊₁, and hence to a unique hyperplane in the projective space P(Eₙ₊₁). The above result adds to this that, corresponding to all these points lying on a line, the hyperplanes in P(Eₙ₊₁) contain a common (n−2)-dimensional projective subspace V.

Lemma 16. If we consider a line D in P(Eₙ₊₁) such that it does not intersect V, then the map

λD : Δ → D,  H ↦ H ∩ D,

is a homography.

Proof. We are given the application λD : H ↦ H ∩ D. To show that it is a homography, we consider the corresponding map between the respective vector spaces. Writing

Δ = P(F̄),  D = P(D̄),  F̄ ⊂ E∗ₙ₊₁,  D̄ ⊂ Eₙ₊₁,

the corresponding map f̄ : F̄ → D̄ is what we show to be an isomorphism. We proceed in two parts.
1. First we show that the given application is a bijection. Considering the map f̄ : F̄ → D̄, for every vector g in F̄ we have a unique hyperplane H̄ defined by the equation Σᵢ₌₁ⁿ⁺¹ hᵢxᵢ = 0, where the hᵢ are the coordinates of g in some basis {fᵢ}.

From the previous result, we see that for every x ∈ P(Eₙ₊₁), x ∉ V, there is a unique H ∈ Δ such that H = <x, V>. If l is a 1-dimensional subspace of Eₙ₊₁ with x = P(l), then H̄ = l + V̄. Now for every d ∈ D, d ∉ V, there is a unique H and hence a unique H̄. Conversely, every hyperplane H̄ in the pencil can be written as V̄ + l, where l is a one-dimensional subspace of Eₙ₊₁, and with some elementary algebra we can show that l can be chosen to be a subspace of the 2-dimensional D̄. This lets us relate each of the hyperplanes of the pencil to a unique direction in D̄, which tells us that the application f̄ : F̄ → D̄ is a bijection.

2. For linearity, write

g ∈ F̄,  g = Σᵢ hᵢ fᵢ,

where {fᵢ} forms the basis, so that the hᵢ are the coordinates of g. The corresponding hyperplane is the zero set of Σᵢ₌₁ⁿ⁺¹ hᵢxᵢ = 0, where x1, ..., xₙ₊₁ are the coordinates of a vector x, and the hᵢ are linear functions of the coordinates of the basis vectors {mⱼ} of the hyperplane. Choosing (m1, ..., mₙ₋₁, v) as a basis of H̄ in Eₙ₊₁, with all vectors except v constant, g = (h1, ..., hₙ₊₁) becomes a linear function of v, whose coordinates are v1, ..., vₙ₊₁. As we have n+1 variables and n+1 linear equations, and (m1, ..., mₙ₋₁, v) are linearly independent vectors, we can also write v as a linear function of h. Thus every g in F̄ corresponds linearly to a vector in D̄, which establishes the isomorphism.
A.2.8
Let a, b, c and a′, b′, c′ be points on two lines such that the lines defined by the point pairs aa′, bb′ and cc′ are concurrent. This kind of projective transform is a special case known as perspective projection, or perspectivity. In fact, in section (A.2.3) we saw that all homographies f : P1 → P1 form a group PLG(R²); these perspective projections also form a group, which is a subgroup of PLG(R²). How such projections compose can be summed up from figure (A.2), where m1 → m2 and m2 → m3 are homographies due to perspective projection, and so is m1 → m3.
A.2.9
Figure (A.3) shows a correspondence between two planes, m and l, due to perspective projection. In this case we can extend the result for lines to see that such a correspondence is a homography⁸. With reference to [6], we can see that, given a point o, we can fix three directions and then find three points, one in each direction; this would determine the position of two planes and a homography between them. With some extension we can show that upon specifying a fourth point correspondence the freedom is restricted, and not every point o can act as the center of the perspective projection. A point to note is that the relative positions are to be calculated; we have proved a similar result in chapter (2), where we show that not all four-point mappings can be realized by a perspective projection. Thus the set of homographies of projective planes obtained through perspective projection forms a subgroup of the set of all homographies of projective planes. The group structure of this subgroup has been studied and discussed in the literature on projective geometry, e.g. [27, 30]. This kind of homography is extremely useful for camera calibration and pose estimation; it defines many properties governing image formation in the pin-hole camera model.

One point to note is that in this appendix we have looked at homographies as invertible, bijective projective transformations between two projective spaces of equal dimensions (also known as projective morphisms), as in section (A.2.3). Henceforth we use the term homography only for projective morphisms between projective planes, as considered in this section.

⁸ Of course, as defined in section (A.2.3), a homography is unique up to non-zero scalar multiplication.
Chapter B

A pin-hole camera maps a scene point X, expressed in homogeneous world coordinates¹, to the image point

x = PX = KR [I₃ | −C] X,        (B.1)

where K is a 3 × 3 camera calibration matrix that relates points in the 3D camera coordinate system to the 2D image coordinate system and houses the intrinsic parameters. Further, R and t = −RC are the extrinsic parameters of the camera: R is the rotation matrix and t is the translation vector relating the 3D world coordinate system to the 3D camera coordinate system. The point C denotes the camera center in the world coordinate system, and hence C̃ = [Cᵀ 1]ᵀ is one of the vectors in R⁴ representing it. P = KR[I₃ | −C] is the 3 × 4 projection matrix of the camera.

This raises an important question: can all 3 × 4 real matrices represent a camera projection matrix? The answer is yes. This question leads us to two kinds of cameras, classified based on the position of the camera center:

¹ Due to space restrictions we denote a point X(a, b, c) in P(E₄) by one of its corresponding vectors (a, b, c, 1) in R⁴; henceforth we use this notation for a projective point unless specified otherwise.

B.1
Finite Camera
Writing P = [M | p4], a finite camera is one for which M is non-singular; its center is then the point represented as

C̃ = [ −M⁻¹ p4 ;  1 ].

In short, a finite camera is one whose center C is a finite point in the 3D world coordinate system².
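As a two-line sketch (M and p4 assumed given):

    % Center of a finite camera P = [M | p4]; satisfies P * C = 0.
    C = [-M \ p4; 1];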
B.1.1
Assuming that we have a finite camera at hand, the camera projection matrix P = [M | p4] is dissected into the following elements:

1. Column points: The leftmost three columns of P, namely p1, p2, p3, represent the images of the three principal directions X, Y, Z of the world coordinate system, and p4 represents the image of the origin of the world coordinate system. This is so since in P³ a direction is represented by a point at infinity in that direction; hence the X direction is represented by the point (1, 0, 0, 0)ᵀ, whose image is P(1, 0, 0, 0)ᵀ = p1.

2. Row vectors: Denoting the rows of P by r1, r2, r3, the principal plane is the plane parallel to the image plane and passing through the camera center; all points that project to image points of the form (x, y, 0)ᵀ lie on this plane. Thus

PX = [r1ᵀ; r2ᵀ; r3ᵀ] X = [x, y, 0]ᵀ  ⟹  r3ᵀ X = 0,

and hence r3 is the row representing the principal plane. Similarly, the other two rows are the planes which project to the X and Y axes of the image plane; they are known as axis planes.
B.2
Infinite Camera
An infinite camera is one whose center is at infinity. Using the notation of the previous section, M is then a singular matrix, and applying the condition PC̃ = 0 we get the camera center as

C̃ = [ d₃ₓ₁ ;  0 ],

where d₃ₓ₁ is the null vector of M, i.e. Md = 0.²

² Can we say that a point at infinity in the 3D world coordinate system is also a point at infinity in the 3D camera coordinate system?
B.3
Camera calibration
The calibration matrix has the form

K = [ α  γ  u0
      0  β  v0
      0  0  1  ],        (B.2)

where K is the camera calibration matrix. In fact, α and β represent the focal length of the camera in terms of pixel dimensions in the x and y directions respectively, γ represents the skew due to distorted sensors in practical cameras, and u0 and v0 are the x and y coordinates of the principal point³ in the image coordinate system. Further, R is the rotation matrix and t = −RC is the translation vector.

The process of camera calibration is defined as estimating these quantities. Further, P is a 3 × 4 matrix with 11 degrees of freedom⁴ and rank 3. Thus knowledge of 6 point correspondences is needed to uniquely estimate P up to non-zero scalar multiplication; in fact only 5 and a half correspondences are needed. Representing our image plane as P(E₃) and the world coordinate space as P(E₄), we can show that the process of imaging scene points of P(E₄) onto the image plane P(E₃) is a form of projective transform. Hartley and Zisserman in [1] and Trucco et al. in [14] give a detailed treatment of various ways of estimating P. A paper by Zhang [31] also outlines two main kinds of methods for camera calibration:

1. Photogrammetric calibration: Here the calibration is done by specifying a set of 3D-2D point correspondences between the world coordinate system and the image plane. For this an elaborate setup and knowledge of the 3D coordinates of the model object are required.

2. Self-calibration: Here more than one image of the scene is obtained using the same camera, the different images being created by a rigid motion of the camera in 3D space⁵. These views impose certain constraints on the internal parameters of the camera and hence can help us estimate the projection matrix without the need for an explicit calibration model.

³ The principal point is the point of intersection of the perpendicular line to the image plane from the camera center, with the image plane itself.
⁴ K has 5 degrees of freedom, R has 3 and C has 3, thus a total of 11.
⁵ The motion can be in a Euclidean or a projective space.
To this end, the algorithm discussed by Zhang in the same paper is a mixture of these two methods. The method is briefly outlined below.

A regular and simple pattern containing a grid of squares is printed on paper attached to a planar surface. Then a set of images of this pattern from varying angles is obtained. Using a corner detection algorithm, point correspondences between the pattern and each of the images are obtained. These correspondences are enough to estimate the homography between the image and the calibration pattern. This homography puts some constraints on the values of the parameters of the projection matrix. Assuming that the pattern's paper lies on the plane defined by the equation z = 0, the correspondence between a point X on the object and the point x on the image plane is seen as
x = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = PX = A \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}.
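For reference, here is a short OpenCV sketch of this kind of planar-pattern calibration; OpenCV's calibrateCamera follows Zhang's approach of estimating per-view homographies and then the intrinsics. The board size, square size and image paths are placeholders, not the actual setup used in this work.

    import glob
    import cv2
    import numpy as np

    board = (9, 6)          # inner corners of the printed grid (assumed)
    square = 25.0           # square size in mm (assumed)

    # 3D coordinates of the grid corners on the z = 0 plane of the pattern.
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

    obj_points, img_points = [], []
    for fname in glob.glob("calib_images/*.jpg"):        # placeholder path
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Each view yields a pattern-to-image homography; together they constrain
    # the intrinsic matrix A, after which R and t are recovered for every view.
    rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("reprojection error:", rms)
    print("intrinsic matrix:\n", A)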
CHAPTER C
The matrix

J_X(h) = \begin{bmatrix} h^T S_1 \\ h^T S_2 \\ h^T S_3 \\ h^T S_4 \\ h^T S_5 \end{bmatrix}_{5 \times 9}     (C.1)

is a matrix of rank five for all nonzero values of the vector h ∈ R^9, where the S_i are the nine-dimensional quadrics, i.e. real symmetric matrices, defined in section (4.2).
Proof. Let us in general assume that the five row vectors are linearly dependent. Hence we have some scalars λ_i, i = 1, 2, ..., 5, such that \sum_{i=1}^{5} \lambda_i h^T S_i = 0 and not all
of the λ_i are zero. Using the definition of S_i, i = 1, 2, ..., 5, we can write

h^T S_1 = \begin{bmatrix} h_1^T C_2 & 0_{1 \times 3} & p_1 h_3^T C_2 \end{bmatrix}, \quad h^T S_2 = \begin{bmatrix} 0_{1 \times 3} & h_2^T C_2 & p_2 h_3^T C_2 \end{bmatrix},

h^T S_3 = \begin{bmatrix} h_2^T C_2 / 2 & h_1^T C_2 / 2 & p_3 h_3^T C_2 \end{bmatrix}, \quad h^T S_4 = \begin{bmatrix} h_3^T C_2 / 2 & 0_{1 \times 3} & h_1^T C_2 / 2 + p_4 h_3^T C_2 \end{bmatrix},

h^T S_5 = \begin{bmatrix} 0_{1 \times 3} & h_3^T C_2 / 2 & h_2^T C_2 / 2 + p_5 h_3^T C_2 \end{bmatrix}.
Thus, collecting the first three columns of \sum_{i=1}^{5} \lambda_i h^T S_i = 0, we have

\lambda_1 h_1^T C_2 + \lambda_3 h_2^T C_2 / 2 + \lambda_4 h_3^T C_2 / 2 = 0,     (C.2)

and, from the middle three columns,

\lambda_2 h_2^T C_2 + \lambda_3 h_1^T C_2 / 2 + \lambda_5 h_3^T C_2 / 2 = 0.     (C.3)
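As a quick numerical sanity check of the rank-five claim, one can stack the five rows as written above and test the rank for randomly sampled h. The values used below for C_2 and the scalars p_1, ..., p_5 are arbitrary stand-ins (in the thesis they come from the conic data of section (4.2)), so this is only a generic-h spot check, not a proof.

    import numpy as np

    rng = np.random.default_rng(1)
    C2 = rng.standard_normal((3, 3))
    C2 = C2 + C2.T                      # an arbitrary real symmetric (conic) matrix
    p = rng.standard_normal(5)          # stand-ins for the scalars p_1 ... p_5
    Z = np.zeros(3)

    def JX(h):
        """Stack the five rows h^T S_i for h in R^9, with h = [h1, h2, h3]."""
        h1, h2, h3 = h[:3], h[3:6], h[6:]
        a, b, c = h1 @ C2, h2 @ C2, h3 @ C2
        return np.vstack([
            np.hstack([a,     Z,     p[0] * c]),
            np.hstack([Z,     b,     p[1] * c]),
            np.hstack([b / 2, a / 2, p[2] * c]),
            np.hstack([c / 2, Z,     a / 2 + p[3] * c]),
            np.hstack([Z,     c / 2, b / 2 + p[4] * c]),
        ])

    for _ in range(1000):
        h = rng.standard_normal(9)
        assert np.linalg.matrix_rank(JX(h)) == 5
    print("rank 5 for all sampled h")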
Figure C.1: Two series of circular cross-sections in a circular cone, figure from [3].
base has two distinct series of circular sections, of which one series is the one with planes parallel to the circular base. By a series of sections, we mean a sequence of circles of progressively changing radii. For a cone with an ellipse as its base, we would have two distinct series of circular sections, none of which would be parallel to the base. In figure (3.3) we have two circles, one through the points D, P, E and the other through the points N, P, K, which are members of the two series of circular sections just discussed; the parallel members of each series are omitted. This proves the fact that U1 has exactly two solutions for the plane positions at which the sections are circles, up to a non-zero scalar multiple.
Lemma 19. Let us have R and t as one of the pose solutions obtained for a plane pair (u11, u21) in Rsol. Then we hypothesize that (αu11, αu21) would give us R and t/α as one of its solutions. We assume here that α is a positive non-zero scalar.
Proof. Let the respective circles of intersection of the planes u11 and u21 with the cones Q1 and Q2 be C and C′. Following the discussion in step-3 of the geometric construction described in section (3.4), the centers in the global coordinate system are denoted as xc1 and xc2. For this setup, we have the points p1, p2 and xc1rot (= R xc1) as defined in equations (3.25) and (3.26). Then from equations (3.29), (3.30) and (3.32) we have a pose solution R and t.
Effect of vector scaling on circle center and radius: Let us consider the matrix representation of a circle for the conic Ccal as given in equation (3.15), with center \begin{bmatrix} l_1^T C_{cal} l_3 & l_2^T C_{cal} l_3 \end{bmatrix}^T in the local coordinate system. Scaling the vector representing the plane by a factor α results in scaling of l_3 by 1/α, and a subsequent scaling of the center coordinates and the radius by 1/α. Similarly we can argue for the two sets of conics and their plane solutions, as given next.
Scaling the two vectors defining u11 and u21 by α, we get the planes αu11 and αu21; the vector defining the plane u11 is \begin{bmatrix} m_{11} & m_{12} & m_{13} \end{bmatrix}^T and that for u21 is \begin{bmatrix} m_{21} & m_{22} & m_{23} \end{bmatrix}^T. From the definition of the quantities k11, k12 and M1 for the vector u11, and k21, k22 and M2 for the vector u21, given by equations (3.9) and (3.12), we notice that they are not affected by the scaling. Hence from equations (3.21) and (3.22), one can infer that scaling u11 and u21 by α results in scaling of the centers xc1 and xc2 by 1/α. The local coordinate system is chosen to be an orthonormal set of axes; hence the radius of the circle represented in both the local and the global coordinate systems is the same, which means that the radius is also scaled by 1/α.

Then, by application of lemma (2) to each of the two scaled planes, αu11 and αu21, with the conics C1′ and C2′ being the same, we have the scaled centers

\bar{x}_{c1} = x_{c1}/\alpha \quad \text{and} \quad \bar{x}_{c2} = x_{c2}/\alpha,

and the radius of these two scaled circles (which, as for C and C′, is the same for both) is

\bar{r}_1 = r_1/\alpha.

Further, we see that \bar{x}_{c1rot} = x_{c1rot}/\alpha. From equations (3.25) and (3.26) we have \bar{p}_1 = p_1/\alpha and \bar{p}_2 = p_2/\alpha. Applying equation (3.29),
\bar{A} = \begin{bmatrix} \bar{x}_{c1} & \bar{p}_1 & (\bar{x}_{c1} \times \bar{p}_1) \end{bmatrix} \quad \text{and} \quad \bar{B} = \begin{bmatrix} \bar{x}_{c1rot} & \bar{p}_2 & (\bar{x}_{c1rot} \times \bar{p}_2) \end{bmatrix}.
Then the pose solution for this pair of scaled planes is obtained through the application of equations (3.30) and (3.32) as

\bar{R} = R, \quad \bar{t} = t/\alpha.     (C.4)