Abstract— This paper formulates a new pipeline for automated extrinsic calibration of multi-sensor mobile platforms. The new method can operate on any combination of cameras, navigation sensors and 3D lidars. Current methods for extrinsic calibration are either based on special markers and/or chequerboards, or they require a precise initialisation of the parameters for the calibration to converge. These two limitations prevent them from being fully automatic. The method presented in this paper removes these restrictions. By combining information extracted from both the platform's motion estimates and external observations, our approach eliminates the need for special markers and also removes the need for manual initialisation. A third advantage is that the motion-based automatic initialisation does not require overlapping fields of view between sensors. The paper also provides a method to estimate the accuracy of the resulting calibration. We illustrate the generalisation of our approach and validate its performance by showing results with two contrasting datasets. The first dataset was collected in a city with a car platform, and the second one was collected in a tree-crop farm with a Segway platform.

I. INTRODUCTION

Autonomous and remote-control platforms that utilise a large range of sensors are steadily becoming more common. Sensing redundancy is a pre-requisite for their robust operation. For the sensor observations to be combined, each sensor's position, orientation and coordinate system must be known. Automated calibration of multi-modal sensors is a non-trivial problem due to the different types of information collected by the sensors. This is aggravated by the fact that in many applications different sensors will have minimal or zero overlapping field of view. Physically measuring sensor positions is difficult and impractical due to the sensors' housing and the high accuracy required to give a precise alignment, especially in long-range sensing outdoor applications. For mobile robots working in topologically variable environments, as found in agriculture or mining, this can result in significantly degraded calibration after as little as a few hours of operation. Therefore, to achieve long-term autonomy at the service of non-expert users, calibration parameters have to be determined efficiently and dynamically updated over the lifetime of the robot.

Traditionally, sensors were manually calibrated by either placing markers in the scene or by hand-labelling control points in the sensor outputs [1], [2]. Approaches based on these techniques are impractical as they are slow, labour intensive and usually require a user with some level of technical knowledge.

In more recent years, automated methods for calibrating pairs of sensors have appeared. These include configurations such as 3D lidar to camera calibration [3], [4], [5], [6], camera to IMU calibration [7], [8] and odometry-based calibration [9]. While these systems can in theory be effectively used for calibration, they present drawbacks that limit their application in many practical scenarios. First, they are generally restricted to a single type of sensor pair; second, their success depends on the quality of the initialisation provided by the user; and third, approaches relying exclusively on externally sensed data require overlapping fields of view [10]. In addition, the majority of the methods do not provide an indication of the accuracy of the resulting calibration, a key requirement for consistent data fusion and for preventing failures in the calibration.

In this paper we propose an approach for automated extrinsic calibration of any number and combination of cameras, 3D lidars and IMU sensors present on a mobile platform. Without initialisation, we obtain estimates for the extrinsics of all the available sensors by combining the individual motion estimates provided by each of them. This part of the process is based on recent ideas for markerless hand-eye calibration using structure-from-motion techniques [11], [12]. We will show that our motion-based approach is complementary to previous approaches, providing results that can be used to initialise methods relying on external data and therefore requiring initialisation [3], [4], [5], [6]. We also show how to obtain an estimate of uncertainty for the calibration parameters, so the system can make an informed decision about which sensors may require further refinement. All of the source code used to generate these results is publicly available at [13].

The key contributions of our paper are:
• A motion-based calibration method that utilises all of the sensors' motion and their associated uncertainty.
• The use of a motion-based initialisation to constrain existing lidar-camera techniques.
• The introduction of a new motion-based lidar-camera metric.
• A calibration method that gives an estimate of the uncertainty of the resulting system.
• Evaluation in two different environments with two different platforms.

Our method can assist non-expert users in calibrating a system as it does not require an initial guess of the sensors' positions, special markers, user interaction, knowledge of coordinate systems or system-specific tuning parameters. It can also provide calibration results from data obtained during
IV. METHOD

ALGORITHM 1
1. Given n sensor readings from m sensors.
2. Convert the sensor readings into sensor motions T^{i,k-1}_{i,k}. Each transformation has the associated variance (σ^{i,k-1}_{i,k})^2.
3. Set one of the lidar or nav sensors to be the base sensor B.
4. Convert R^{i,k-1}_{i,k} to angle-axis form A^{i,k-1}_{i,k}.
For i = 1,...,m
  5. Find a coarse estimate for R^B_i by using the Kabsch algorithm to solve A^{B,k-1}_{B,k} = R^B_i A^{i,k-1}_{i,k}, weighting the elements by W_k = (max((σ^{i,k-1}_{i,k})^2) + max((σ^{B,k-1}_{B,k})^2))^{-0.5}.
  6. Label all sensor readings over 3σ from the solution as outliers.
end
7. Find an estimate for R^B_i by minimising
   Σ_{i=1..m} Σ_{j=i..m} Σ_{k=2..n} sqrt(Rerr_{ijk}^T σ^2 Rerr_{ijk}),
   Rerr_{ijk} = A^{j,k-1}_{j,k} − R^j_i A^{i,k-1}_{i,k},
   σ^2 = (σ^{j,k-1}_{j,k})^2 + R^j_i (σ^{i,k-1}_{i,k})^2 (R^j_i)^T,
   using the coarse estimate as an initial guess in the optimisation.
For i = 1,...,m
  8. Find a coarse estimate for t^B_i by solving t^B_i = (R^{i,k-1}_{i,k} − I)^{-1} (R^B_i t^{B,k-1}_{B,k} − t^{i,k-1}_{i,k}), weighting the elements by W_k = (max((σ^{i,k-1}_{i,k})^2) + max((σ^{B,k-1}_{B,k})^2))^{-0.5}.
end
9. Find an estimate for t^B_i by minimising
   Σ_{i=1..m} Σ_{j=i..m} Σ_{k=2..n} sqrt(Terr_{ijk}^T σ^2 Terr_{ijk}),
   Terr_{ijk} = (R^{i,k-1}_{i,k} − I) t^j_i + t^{i,k-1}_{i,k} − R^j_i t^{j,k-1}_{j,k},
   using the coarse estimate as an initial guess in the optimisation.
10. Bootstrap the sensor readings and re-estimate R^B_i and t^B_i to give the variance estimate (σ^B_i)^2.
11. Find all sensors that have overlapping fields of view.
12. Use sensor-specific metrics to estimate the transformation between sensors with overlapping fields of view.
13. Combine the results to give the final R^B_i, t^B_i and (σ^B_i)^2.

A. Finding the sensor motion

The approach utilised to calculate the transformations between consecutive frames depends on the sensor type. In our pipeline, three different approaches were included to work with 3D lidar sensors, navigation sensors and image sensors.

1) 3D lidar sensors: 3D lidar sensors use one or more laser range finders to generate a 3D map of the surrounding environment. To calculate the transform from one sensor scan to the next, the iterative closest point (ICP) [16] algorithm is used. A point-to-plane variant of ICP is used and the identity matrix is used as the initial guess for the transformation. To estimate the covariance, the data is bootstrapped and the ICP matching is run 100 times. In our implementation, to allow for shorter run times, the bootstrapped data was also subsampled to 5000 points.

2) Navigation sensors: This category includes sensors that provide motion estimates, such as IMUs and wheel odometry. These require little processing to convert their standard outputs into a transformation between two positions, and the covariance is usually provided as an output in the data stream.

3) Imaging sensors: This group covers sensors that produce a 2D image of the world, such as RGB and IR cameras. The set of transforms that describe the movement of the sensors is calculated, up to scale ambiguity, using a standard structure-from-motion approach. To get an estimate of the covariance of this method, we take the points the method uses to estimate the fundamental matrix and bootstrap them 100 times. The resulting sets of points are used to re-estimate this matrix and the subsequent transformation estimate.

4) Accounting for timing offset: As a typical sensor array can be formed by a range of sensors running asynchronously, the times at which readings occur can differ significantly [17]. To account for this, each of the sensors' transforms are interpolated at the times when the slowest-updating sensor obtained readings. For the translation, linear interpolation is used; for the rotation, Slerp (spherical linear interpolation) is used.

B. Estimating the inter-sensor rotations

Given that all of the sensors are rigidly mounted, the rotation of any two sensors, here labelled A and B, is given by the following equation:

R^A_B R^{B,k-1}_{B,k} = R^{A,k-1}_{A,k} R^A_B    (2)

In our implementation an angle-axis representation of the sensor rotations is used. This simplifies Equation 2: if A^{A,k-1}_{A,k} is our rotation vector, then the sensor rotations are related by

A^{B,k-1}_{B,k} = R^A_B A^{A,k-1}_{A,k}    (3)

We utilise this equation to generate a coarse estimate for the system and reject outlier points. This is done as follows.
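The weighted Kabsch step at the heart of this coarse estimate (Algorithm 1, step 5) can be sketched with SciPy's `Rotation.align_vectors`, which solves the same weighted least-squares rotation problem. This is a minimal sketch, not the paper's implementation: the angle-axis motion vectors are synthetic, the weights are uniform rather than given by Equation 4, and the 3σ outlier rejection of step 6 is omitted.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(1)

# Ground-truth inter-sensor rotation (unknown in practice).
R_true = Rotation.from_euler("xyz", [10, -20, 30], degrees=True)

# Synthetic angle-axis motion vectors of sensor i, and the base sensor's
# motions, which by Equation 3 are the same vectors rotated by R_true.
A_i = rng.normal(size=(50, 3)) * 0.2
A_B = R_true.apply(A_i) + rng.normal(0, 1e-3, (50, 3))

# Per-step weights; the paper reduces each motion's variance to a scalar
# weight via Equation 4 (uniform here for the sketch).
w = np.full(50, 1.0)

# Weighted Kabsch: align_vectors finds argmin_R sum_k w_k ||A_B - R A_i||^2.
R_est, _ = Rotation.align_vectors(A_B, A_i, weights=w)
```

With low-noise synthetic motions the recovered rotation matches the ground truth closely; in the pipeline this estimate only needs to be good enough to seed the joint optimisation of step 7.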
First, one of the sensors is arbitrarily designated the base sensor. Next, a slightly modified version of the Kabsch algorithm is used to find the transformation between this sensor and each other sensor. The Kabsch algorithm is an approach that calculates the rotation matrix between two sets of vectors with the smallest least-squared error. The Kabsch algorithm we used has been modified to give a non-equal weighting to each sensor reading. The weight assigned to the readings at each timestep is given by Equation 4:

W_k = (max((σ^{A,k-1}_{A,k})^2) + max((σ^{B,k-1}_{B,k})^2))^{-0.5}    (4)

where σ^2 is the variance. Taking the maximum values makes the variance independent of the rotation and allows it to be reduced to a single number.

Next, an outlier-rejection step is applied in which all points that are over three standard deviations away from where the obtained rotation solution would place them are rejected.

With an initial guess to the solution calculated and the outliers removed, we move on to the next step of the process, where all of the sensor rotations are optimised simultaneously.

This stage of the estimation was inspired by how SLAM approaches combine pose information. However, it differs slightly: in SLAM each pose is generally only connected to a few other poses in a sparse graph structure, whereas in our application every scan of every sensor contributes to the estimation of the system's extrinsic calibration, and therefore the graph is fully connected. This dense structure of the problem prevents many of the sparse SLAM solutions from being used. In order to keep our problem tractable we only directly compare sensor transforms made at the same timestep.

The initial estimate is used to form a rotation matrix that gives the rotation between any two of the sensors. Using this rotation matrix, the variance for the sensors is found by Equation 5. This is used to find the error for the pair of sensors, as shown in Equation 6.

σ^2 = (σ^{B,k-1}_{B,k})^2 + R^A_B (σ^{A,k-1}_{A,k})^2 (R^A_B)^T    (5)

can be calculated. It is found using a method similar to that of the rotation. By starting with

t^A_B = (R^{B,k-1}_{B,k} − I)^{-1} (R^A_B t^{A,k-1}_{A,k} − t^{B,k-1}_{B,k})    (7)

the terms can be rearranged and combined with information from other timesteps to give Equation 8:

[ R^{B,k-1}_{B,k}   − I ]          [ R^A_B t^{A,k-1}_{A,k}   − t^{B,k-1}_{B,k}   ]
[ R^{B,k}_{B,k+1}   − I ] t^A_B =  [ R^A_B t^{A,k}_{A,k+1}   − t^{B,k}_{B,k+1}   ]    (8)
[ R^{B,k+1}_{B,k+2} − I ]          [ R^A_B t^{A,k+1}_{A,k+2} − t^{B,k+1}_{B,k+2} ]
[ ...                   ]          [ ...                                          ]

The only unknown here is t^A_B. The translation estimate depends significantly on the accuracy of the rotation matrix estimates and is the parameter most sensitive to noise. This sensitivity comes from the R^{B,k-1}_{B,k} − I terms that make up the first matrix. As in our application (ground platforms) in most instances R^{B,k-1}_{B,k} ≈ I, this leads to a matrix that is ≈ 0. Dividing by this matrix reduces the SNR and therefore degrades the estimation. Consequently, the translation estimates are generally less accurate than the rotation estimates.

This formulation must be slightly modified to be used with cameras, as they only provide an estimate of their position up to a scale ambiguity. To correct for this, when a camera is used the scale at each frame is simultaneously estimated.

A rough estimate for the translation is found by utilising Equation 8, which provides translation estimates of each sensor with respect to the base sensor. The elements of this equation are first weighted by their variance using Equation 4. Once this has been performed, the overall translation is found by minimising the following error function:

TransError = sqrt(Terr_{ijk}^T σ^2 Terr_{ijk})    (9)
Terr_{ijk} = (R^{i,k-1}_{i,k} − I) t^B_i + t^{i,k-1}_{i,k} − R^B_i t^{B,k-1}_{B,k}
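Equation 8 is a stacked linear system in the single unknown t^A_B and can be solved directly by least squares. The following is a minimal synthetic sketch: NumPy's `lstsq` stands in for the paper's variance-weighted solve, and the per-step rotations are deliberately large so that R − I stays well-conditioned (on a ground platform, where R ≈ I, this system degrades exactly as described above).

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
t_true = np.array([0.5, -0.2, 0.1])  # unknown inter-sensor translation

blocks, rhs = [], []
for _ in range(20):
    # Per-step rotation of the base sensor (large, so R - I is far from 0).
    R = Rotation.from_euler("xyz", rng.uniform(-30, 30, 3), degrees=True).as_matrix()
    A = R - np.eye(3)
    blocks.append(A)
    # Right-hand side of Equation 8 (R^A_B t_A - t_B), generated here to be
    # consistent with t_true plus small measurement noise.
    rhs.append(A @ t_true + rng.normal(0, 1e-3, 3))

# Stack the 3x3 blocks into one tall system and solve for t in least squares.
A_stack = np.vstack(blocks)
b_stack = np.concatenate(rhs)
t_est, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)
```

Each timestep contributes three rows, so even a short trajectory over-determines the three unknowns; the quality of the solution is governed by how far the R − I blocks are from singular.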
E. Refining the inter-sensor transformations

One of the key benefits introduced with our approach is removing the restriction of the initialisation needed for calibration methods based on external observations. Our motion-based calibration described above is complementary to these methods. The estimates provided by our approach can be utilised to initialise convex calibration approaches [4], and also to reduce the search space of non-convex calibration approaches such as [3], [15], rendering a fully automated calibration pipeline. We will call this second stage the refinement process.

For cameras and 3D lidar scanners, methods such as Levinson's [3] or GOM [15] can be utilised for the calibration refinement. The calculation of the gradients for GOM is slightly modified in this implementation (calculated by projecting the lidar onto an image plane) to better handle the sparse nature of the velodyne lidar point cloud. Given that the transformation of the velodyne sensor's position between timesteps is known from the motion-based calibration, we introduce a third method for refinement. This method works by assuming that the points in the world will appear the same colour to a camera in two successive frames. It operates by first projecting the velodyne points onto an image at timestep k and finding the colour of the points at these positions. The same scan is then projected into the image taken at timestep k+1 and the mean difference between the colour of these points and the previous points is taken. It is assumed that this value is minimised when the sensors are perfectly calibrated.

All three of the velodyne-camera alignment methods described above give highly non-convex functions that are challenging to optimise. Given the large search space, these techniques are usually constrained to search a small area around an initial guess. In our case, however, we have an estimated variance rather than a strict region of the search space. Because of this, our implementation makes use of the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) optimisation technique. This technique randomly samples the search space using a multivariate normal distribution that is constantly updated. This optimisation strategy works well with our approach as the initial normal distribution required can be set using the variance from the estimation. This means that the optimiser only has to search a small portion of the search space and can rapidly converge to the correct solution. To give an indication of the variance of this method, a local optimiser is used to optimise the values of each scan individually, starting at the solution the CMA-ES technique provides.

F. Combining the refined results

The pairwise transformations calculated in the refinement step will not represent a consistent solution to the transformation of the sensors. That is, if there are three sensors A, B and C, then T^A_B T^B_C ≠ T^A_C. The camera-to-camera transformations also contain scale ambiguity. To correct for this and find a consistent solution, the transformations are combined. This is done by using the calculated parameters to find a consistent transformation that has the highest probability of occurring. We do this by first using the transformations to the base sensor to generate an initial guess. We then treat each of the transforms obtained as an input corrupted by normally distributed noise with the estimated variance. From this we find a consistent transformation system that maximises the pdf of the observed transforms. We use Nelder-Mead Simplex optimisation to optimise the system. As the camera-camera transforms contain scale ambiguity, only the rotation portion of these transforms is considered.

V. RESULTS

A. Experimental Setup

The method was evaluated using data collected with two different platforms, in two different types of environments: (i) the KITTI dataset, which consists of a car moving in an urban environment, and (ii) the ACFR's sensor vehicle known as "Shrimp" moving in a farm.

1) KITTI Dataset Car: The KITTI dataset is a well-known publicly available dataset obtained from a sensor vehicle driving in the city of Karlsruhe in Germany [18]. The sensor vehicle is equipped with two sets of stereo cameras, a Velodyne HDL-64E and a GPS/IMU system. This system was chosen for calibration due to the ease of availability and the excellent ground truth available due to the recalibration of its sensors before every drive. On the KITTI vehicle some sensor synchronisation occurs, as the velodyne is used to trigger the cameras, with both sets of sensors operating at 10 Hz. The GPS/IMU system gives readings at 100 Hz.

All of the results presented here were tested using drive 27 of the dataset. In this dataset the car drives through a residential neighbourhood. In this test, while the KITTI vehicle has an RTK GPS, it frequently loses its connection, resulting in jumps in the GPS location. The GPS and IMU data are also preprocessed and fused, with no access to the raw readings. Drive 27 was selected as it is one of the longest of all the drives provided in the KITTI dataset, giving 4000 consecutive frames of information.

2) ACFR's Shrimp: Shrimp is a general-purpose sensor vehicle used by the ACFR to gather data for a wide range of applications. Due to this it is equipped with an exceptionally large array of sensors. For our evaluation we used the ladybug panospheric camera and the velodyne HDL-64E lidar. On this system the cameras operate at 8.3 Hz and the lidar at 20 Hz. The dataset used is a two-minute loop driven around a shed in an almond field. This proves a very challenging dataset for calibration, as both the ICP used in matching lidar scans and the SURF matching used on the cameras frequently mismatch points on the almond trees, leading to sensor transform estimates that are significantly noisier than those obtained with the KITTI dataset.

B. Aligning Two Sensors

The first experiment run was the calibration of the KITTI car's lidar and IMU. To highlight the importance of an uncertainty estimate, the results were also compared with a least-squares approach that does not make use of the readings' variance estimates. In the experiment a set of
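The consistency optimisation of Section IV-F can be sketched as a maximum-likelihood fusion of noisy pairwise measurements, solved with Nelder-Mead as in the paper. This is a deliberately reduced sketch: only translations are fused, the three pairwise observations and their variances are invented for illustration, and the rotation and scale handling of the full method is omitted.

```python
import numpy as np
from scipy.optimize import minimize

# Invented pairwise translation observations (sensor i -> sensor j) with
# scalar variances. The set is slightly inconsistent: t_01 + t_12 != t_02.
obs = {
    (0, 1): (np.array([1.0, 0.0, 0.0]), 0.01),
    (1, 2): (np.array([0.0, 1.0, 0.0]), 0.01),
    (0, 2): (np.array([1.1, 0.9, 0.0]), 0.04),
}

def neg_log_likelihood(x):
    # x holds positions of sensors 1 and 2; sensor 0 (the base) is fixed
    # at the origin. Gaussian log-pdf up to an additive constant.
    pos = np.vstack([np.zeros(3), x[:3], x[3:]])
    nll = 0.0
    for (i, j), (t_obs, var) in obs.items():
        diff = (pos[j] - pos[i]) - t_obs
        nll += diff @ diff / var
    return nll

res = minimize(neg_log_likelihood, np.zeros(6), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8,
                        "maxiter": 20000, "maxfev": 20000})
```

The optimum pulls each sensor position toward a variance-weighted compromise between the conflicting observations, which is exactly the "consistent transformation with the highest probability" the section describes.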
[Figure: plots of the roll, pitch, yaw and x errors against the number of sensor readings (100 to 1000).]

least squared method; this is most likely due to the RTK GPS frequently losing signal, producing significant variation in the sensors' translation accuracy as the position jumps during the dataset.

The accuracy of the predicted variance of the result is more difficult to assess than that of the calibration. However, all of the errors were within a range of around 1.5 standard deviations from the actual error. This demonstrates that the predicted variance provides a consistent indication of the estimate's accuracy.

C. Refining Camera-Velodyne Calibration

The second experiment evaluates the refinement stage by using the KITTI vehicle's velodyne and camera to refine the alignment. In this experiment 500 scans were randomly selected and used to find an estimate of the calibration between the velodyne and the leftmost camera on the KITTI
[Figure 3: box plots over roll, pitch, yaw, x, y and z; categories: Starting, Levinson, GOM, Colour.]
Fig. 3. Box plot of the error in rotation in degrees and translation in metres for the three refinement approaches.

[Figure 4: box plots over roll, pitch, yaw, x, y and z; categories: Combined, Separate.]
Fig. 4. Box plot of the error in rotation in degrees and translation in metres for combining sensor readings and performing separate optimisations.
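The bootstrap variance estimates used throughout the pipeline (Algorithm 1, step 10, and the ICP and fundamental-matrix covariances of Section IV-A) all follow one generic recipe: resample the inputs with replacement, re-run the estimator, and take the spread of the results. A toy sketch with a stand-in estimator (the real pipeline re-runs ICP or the calibration itself in its place):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "sensor readings": scalar per-step measurements with noise.
readings = rng.normal(loc=0.3, scale=0.05, size=200)

def estimate(samples):
    # Stand-in for re-running the full estimator (e.g. ICP or the
    # calibration optimisation) on a resampled set of readings.
    return samples.mean()

# Bootstrap: resample with replacement, re-estimate 100 times (as the
# paper does for ICP), and take the variance of the re-estimates.
boot = np.array([
    estimate(rng.choice(readings, size=readings.size, replace=True))
    for _ in range(100)
])
var_estimate = boot.var()
```

The resulting variance feeds the weights of Equation 4 and, ultimately, the per-sensor uncertainty (σ^B_i)^2 reported by the method.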
[Figure: box plots of the roll, pitch and yaw errors.]

proach provides an estimate of the variance of the calibration. The method was validated using scans from two separate mobile platforms.

ACKNOWLEDGEMENTS

This work has been supported by the Rio Tinto Centre for Mine Automation and the Australian Centre for Field Robotics, University of Sydney.

REFERENCES

[1] A. Geiger, F. Moosmann, O. Car, and B. Schuster, "Automatic camera and range sensor calibration using a single shot," 2012 IEEE Interna-