Abstract
This article aims to present an adaptive and robust cooperative visual localization solution based on stereo vision systems. With the proposed solution, a group of unmanned vehicles, either aerial or ground, will be able to construct a large, reliable map and localize themselves precisely in this map without any user intervention. For this cooperative localization and mapping problem, a robust nonlinear H∞ filter is adapted to ensure robust pose estimation. In addition, a robust approach for feature extraction and matching, based on an adaptive scale invariant feature transform stereo-constrained algorithm, is implemented to build a large, consistent map. Finally, a validation of the proposed solution is presented and discussed using simulation and experimental data.
Keywords
Visual SLAM, data fusion, feature extraction, cooperation
Date received: 2 May 2012; accepted: 25 September 2012
Introduction
Self-localization of unmanned vehicles (UVs) is still considered a fundamental problem in autonomous vehicle navigation. The problem has been tackled in the recent past for different platforms, and a number of efficient techniques have been proposed, as presented in Cox and Wilfong,1 Toledo-Moreo et al.,2 Leonard and Durrant-Whyte,3 Toth4 and references therein.
More recently, driven by the interest of military applications and planetary exploration, researchers have turned their attention towards the localization of vehicles moving in hostile and unknown environments.5,6 In these cases, a map of the environment is not available and hence a more complex simultaneous localization and mapping (SLAM) problem must be faced and solved. The vehicles used in these applications, ranging from aerial to ground platforms, have to collect information about the environment through their sensors and, at the same time, localize themselves with respect to the map they are building.
Extensive research, employing the extended Kalman filter (EKF) and particle filters,7,8 has been reported in the literature to address several aspects of the vehicle SLAM problem.9 However, most of these works concern the case of a single vehicle exploring an environment through landmark detection.
In many applications related to the defence industry, a single sensing platform may not be sufficient to accomplish the mission, and several cooperating vehicles are required.
C-SLAM is performed when multiple vehicles share navigation and perception sensing information in order to improve their own position estimates beyond what is possible with a single vehicle. Simple collective navigation has been demonstrated in simulation using multiple cartographer vehicles that randomly explore the environment.14 Støy15 performs simple relative localization between collaborators using directional beacons. Other research works16,17 presented the challenges, in terms of localization performance, of C-SLAM and its growing uncertainties; simulation and experimental validation were conducted in this latter work to support the presented analysis. In addition, techniques based on entropy minimization18 and information theory19 were also developed in the past for the C-SLAM problem.
To date, typical SLAM approaches have used laser range sensors to build maps in two and three dimensions.20–22 Recently, interest in using cameras as the main sensors for SLAM has increased, and some authors have concentrated on building 3D maps using visual information obtained from cameras, and more specifically from stereo cameras.23,24 These approaches are usually denoted as visual SLAM. The reasons for this interest stem from:
State vector
The estimated state vector $x_k$ at time k contains the 3D vehicle position ($p^n$), velocity ($v^b$), Euler angles ($\phi$, $\theta$, $\psi$) and the N 3D feature locations ($m^n_i$) in the environment, $i = 1, \ldots, N$. In our work, the features are the keypoints detected by the scale invariant feature transform (SIFT) algorithm and represent the minima and maxima of the difference of Gaussian (DoG) image.40 The superscript n indicates that the vector is referenced in the navigation frame, and b indicates that the vector is referenced in the body frame. The body frame is the same as the vehicle frame. The navigation frame is fixed with respect to the surface of the Earth and is placed arbitrarily within the operating area of the vehicle, with the x-axis pointing north, the y-axis pointing east and the z-axis pointing down towards the centre of the Earth.
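One compact way to write the state vector described above (this grouping is ours, given here only for reference) is

$$x_k = \begin{bmatrix} (p^n_k)^T & (v^b_k)^T & \phi_k \;\; \theta_k \;\; \psi_k & (m^n_1)^T & \cdots & (m^n_N)^T \end{bmatrix}^T$$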
The problem with the navigation solution provided by an inertial navigation system (INS) (Figure 1(b)) is that it drifts with time, as in most other dead-reckoning systems. The drift of the inertial position is typically a cubic function of time: small gyro errors accumulate in the angle estimates, which in turn misrepresent the gravitational acceleration as vehicle acceleration and result in quadratic velocity and cubic position errors.
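As a rough first-order illustration of this error growth (not taken from the paper), a constant gyro bias $\epsilon_g$ propagates as

$$\delta\theta(t) \approx \epsilon_g\, t, \qquad \delta v(t) \approx g\, \epsilon_g\, \frac{t^2}{2}, \qquad \delta p(t) \approx g\, \epsilon_g\, \frac{t^3}{6}$$

so the misprojected gravity component indeed yields quadratic velocity and cubic position errors.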
Process model
The state estimate $x_k$ at time k is predicted forward in time by integrating the inertial measurement unit (IMU) readings, using an inertial-frame mechanisation of the inertial navigation equations, as
$$
\begin{bmatrix} p^n_k \\ v^b_k \\ \Psi^n_k \end{bmatrix} =
\begin{bmatrix}
p^n_{k-1} + C^n_b\, v^b_k\, \Delta t \\
v^b_{k-1} + \left( f^b_k + v^b_{k-1} \wedge w^b_k + C^{nT}_b\, g^n \right) \Delta t \\
\Psi^n_{k-1} + E^n_b\, w^b\, \Delta t
\end{bmatrix}
\qquad (1)
$$
where $f^b$ and $w^b$ are the body-frame referenced vehicle accelerations and rotation rates from the IMU, and $g^n$ is the acceleration due to gravity. $C^n_b$ and $E^n_b$ are, respectively, the direction cosine matrix and the rotation rate transformation matrix between the body and navigation frames.
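A minimal Python sketch of this prediction step is given below; the function and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ins_predict(p_n, v_b, psi_n, f_b, w_b, C_nb, E_nb, g_n, dt):
    """One IMU mechanisation step of equation (1) (illustrative sketch).

    p_n   : 3-vector, position in the navigation frame
    v_b   : 3-vector, velocity in the body frame
    psi_n : 3-vector, Euler angles (roll, pitch, yaw)
    f_b   : 3-vector, specific force measured by the IMU (body frame)
    w_b   : 3-vector, angular rates measured by the IMU (body frame)
    C_nb  : 3x3 direction cosine matrix (body -> navigation)
    E_nb  : 3x3 rotation-rate transformation matrix
    g_n   : 3-vector, gravity expressed in the navigation frame
    dt    : sampling period (s)
    """
    p_n_new = p_n + C_nb @ v_b * dt
    v_b_new = v_b + (f_b + np.cross(v_b, w_b) + C_nb.T @ g_n) * dt
    psi_new = psi_n + E_nb @ w_b * dt
    return p_n_new, v_b_new, psi_new
```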
Figure 1. UAV with IMU and stereo cameras. (a) Different frames for the UAV system and (b) INS architecture.
UAV: unmanned aerial vehicle; IMU: inertial measurement unit.
The estimation process uses the process and observation models in a nonlinear filter to estimate the state $x_k$ from the observations $y_k$. Linearizing the process and observation models about the estimated trajectory gives

$$x_{k+1} = F_k x_k + \Gamma_k w_k + \varphi_k + \Phi_k \xi_k \qquad (7)$$

$$y_k = H_k x_k + \psi_k + \zeta_k + v_k \qquad (8)$$

where

$$\xi_k = \alpha_1(\tilde{x}_{k/k}) + \alpha_2(\tilde{x}_{k/k})\, w_k \qquad (9)$$

and

$$\zeta_k = \alpha_3(\tilde{x}_{k/k-1}) \qquad (10)$$

account for the linearization errors, which are bounded as

$$\|\zeta_k\|_2^2 \leqslant \epsilon_2\, \mu_3^2\, \|\tilde{x}_{k/k}\|^2 \qquad (12)$$

Absorbing these bounded terms into the noise terms leads to

$$x_{k+1} = F_k x_k + c_w w_k + \varphi_k \qquad (13)$$

$$y_k = H_k x_k + c_v v_k + \psi_k \qquad (14)$$

where $c_w^2 = 1 + \epsilon_2 \mu_1^2 + \epsilon_2 \mu_3^2$ and $c_v^2 = c_w^2 (1 + \epsilon_1 \mu_1^2)$.

This final form, equations (13) and (14), has the same structure as the EKF,7 except that the error covariance correction of the linear H∞ filter is used, with the noise processes $w_k$ and $v_k$ scaled by $c_w$ and $c_v$.

Nonlinear H∞ filter

The final NH∞ formulation is then written in a predictor-corrector scheme.

Initialization:

$$\hat{Q}_k = c_w^2\, Q_k, \qquad \hat{R}_k = c_v^2\, R_k$$

Predictor:

$$\hat{x}_{k/k-1} = f(\hat{x}_{k-1/k-1}, u_{k-1}, 0)$$

$$P_{k/k-1} = F_{k-1} P_{k-1/k-1} F_{k-1}^T + \Gamma_{k-1} \hat{Q}_k \Gamma_{k-1}^T$$

Corrector: the correction step applies the linear H∞ error covariance update with

$$F_k = \nabla_x f(\hat{x}_{k/k}), \quad \Gamma_k = \nabla_w f(\hat{x}_{k/k}), \quad H_k = \nabla_x h(\hat{x}_{k/k-1}),$$

$$\varphi_k = f(\hat{x}_{k/k}) - F_k \hat{x}_{k/k}, \quad \psi_k = h(\hat{x}_{k/k-1}) - H_k \hat{x}_{k/k-1}$$
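To make the scaled-noise structure concrete, the sketch below shows the predictor step of such a filter in Python; the names and the reduction to a single callable f are illustrative assumptions, and the H∞ correction step is omitted.

```python
import numpy as np

def nh_inf_predict(x_corr, P_corr, f, F, G, Q, c_w):
    """Predictor step of the NH-infinity scheme (illustrative sketch only).

    x_corr : corrected state estimate x_{k-1/k-1}
    P_corr : corrected error covariance P_{k-1/k-1}
    f      : process model, called as f(x) with the control input folded in
    F, G   : Jacobians of f with respect to the state and the process noise
    Q      : nominal process noise covariance
    c_w    : robust scaling factor, so Q_hat = c_w**2 * Q
    """
    Q_hat = c_w ** 2 * Q                         # scaled process noise
    x_pred = f(x_corr)                           # x_{k/k-1} = f(x_{k-1/k-1}, u, 0)
    P_pred = F @ P_corr @ F.T + G @ Q_hat @ G.T  # P_{k/k-1}
    return x_pred, P_pred
```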
Cooperative VSLAM
In the multi-vehicle VSLAM problem proposed in this study, the estimated state vector becomes the positions, velocities and attitudes of the multiple vehicles together with the positions of the point features in the environment. In this section, we extend the single-vehicle VSLAM presented above to the multiple-vehicle case.
The cooperative state vector stacks the M vehicle states and the N feature locations, and the corresponding system matrices are block diagonal:

$$X_k = \begin{bmatrix} X_{uav1,k} \\ \vdots \\ X_{uavM,k} \\ m^n_{1,k} \\ m^n_{2,k} \\ \vdots \\ m^n_{N,k} \end{bmatrix}$$

$$F_{sys} = \begin{bmatrix} F_{uav1,k} & \cdots & 0_{9\times 9} & 0_{9\times 3N} \\ \vdots & \ddots & \vdots & \vdots \\ 0_{9\times 9} & \cdots & F_{uavM,k} & 0_{9\times 3N} \\ 0_{3N\times 9} & \cdots & 0_{3N\times 9} & I_{3N\times 3N} \end{bmatrix}, \qquad
\Gamma_{sys} = \begin{bmatrix} \Gamma_{uav1} & \cdots & 0_{9\times 6} \\ \vdots & \ddots & \vdots \\ 0_{9\times 6} & \cdots & \Gamma_{uavM} \\ 0_{3N\times 6} & \cdots & 0_{3N\times 6} \end{bmatrix}$$
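As an illustration of how these block-diagonal matrices can be assembled, the sketch below assumes nine states and six noise inputs per vehicle; the helper name and argument layout are placeholders, not from the paper.

```python
import numpy as np
from scipy.linalg import block_diag

def build_cooperative_matrices(F_uav_list, G_uav_list, N):
    """Assemble F_sys and Gamma_sys for M vehicles and N map points (sketch).

    F_uav_list : list of M 9x9 per-vehicle transition matrices
    G_uav_list : list of M 9x6 per-vehicle noise input matrices
    N          : number of 3D point features in the map
    """
    M = len(G_uav_list)
    # Vehicle blocks on the diagonal; identity block for the static features
    F_sys = block_diag(*F_uav_list, np.eye(3 * N))
    # Features receive no process noise: zero rows below the vehicle blocks
    G_sys = np.vstack([block_diag(*G_uav_list), np.zeros((3 * N, 6 * M))])
    return F_sys, G_sys
```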
Type 3:

$$\left\{ \begin{aligned} u_1 &= \frac{m^{c1}_{11} x^n_{m_i} + m^{c1}_{12} y^n_{m_i} + m^{c1}_{13} z^n_{m_i} + m^{c1}_{14}}{m^{c1}_{31} x^n_{m_i} + m^{c1}_{32} y^n_{m_i} + m^{c1}_{33} z^n_{m_i} + m^{c1}_{34}} \\[4pt] v_1 &= \frac{m^{c1}_{21} x^n_{m_i} + m^{c1}_{22} y^n_{m_i} + m^{c1}_{23} z^n_{m_i} + m^{c1}_{24}}{m^{c1}_{31} x^n_{m_i} + m^{c1}_{32} y^n_{m_i} + m^{c1}_{33} z^n_{m_i} + m^{c1}_{34}} \\[4pt] u_2 &= \frac{m^{c2}_{11} x^n_{m_i} + m^{c2}_{12} y^n_{m_i} + m^{c2}_{13} z^n_{m_i} + m^{c2}_{14}}{m^{c2}_{31} x^n_{m_i} + m^{c2}_{32} y^n_{m_i} + m^{c2}_{33} z^n_{m_i} + m^{c2}_{34}} \\[4pt] v_2 &= \frac{m^{c2}_{21} x^n_{m_i} + m^{c2}_{22} y^n_{m_i} + m^{c2}_{23} z^n_{m_i} + m^{c2}_{24}}{m^{c2}_{31} x^n_{m_i} + m^{c2}_{32} y^n_{m_i} + m^{c2}_{33} z^n_{m_i} + m^{c2}_{34}} \end{aligned} \right. \qquad (19)$$

where $[x^n_{m_i}\; y^n_{m_i}\; z^n_{m_i}]^T$ are the coordinates of the landmark $m_i$ in the navigation frame (north-east-down, NED), and $m^{c1}_{ij}$ and $m^{c2}_{ij}$ are the projection matrix components of the right and left cameras, respectively.32
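A minimal sketch of this stereo observation model, assuming the two 3×4 projection matrices are available as NumPy arrays, is given below.

```python
import numpy as np

def stereo_observation(m_n, M_c1, M_c2):
    """Project a 3D landmark into both cameras, as in equation (19) (sketch).

    m_n  : 3-vector, landmark position in the navigation (NED) frame
    M_c1 : 3x4 projection matrix of the first camera
    M_c2 : 3x4 projection matrix of the second camera
    Returns (u1, v1, u2, v2), the predicted pixel coordinates.
    """
    m_h = np.append(m_n, 1.0)            # homogeneous landmark coordinates
    uvw1 = M_c1 @ m_h                    # projection into camera 1
    uvw2 = M_c2 @ m_h                    # projection into camera 2
    u1, v1 = uvw1[0] / uvw1[2], uvw1[1] / uvw1[2]
    u2, v2 = uvw2[0] / uvw2[2], uvw2[1] / uvw2[2]
    return u1, v1, u2, v2
```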
Adaptive SIFT
Although the SIFT algorithm44 has many advantages, the detected keypoints are still not efficient enough, especially for the visual SLAM problem. The SIFT algorithm uses a constant factor k between the scales; a large value of k implies few keypoints, which is a disadvantage for the VSLAM problem, especially if these keypoints are unstable, while small values of k imply the detection of many keypoints, which is also a problem for VSLAM. Clearly, it is not straightforward to find a value of k suitable for all kinds of images.
Lowe44 proposes a value of k = 1.6, which is effective only for images with an appropriate amount of texture. For example, in low-frequency images (seas, deserts, ...), only a few features are detected. On the other hand, in high-frequency images (spatial scenes, buildings, ...), the SIFT algorithm leads to a large number of features, which is not suitable for VSLAM. As a solution to this problem, an adaptive scale representation with an adaptive factor k was introduced to obtain our adaptive scale invariant feature transform (ASIFT)32,44 that is used in this work. The scale factor is estimated based on the energy of the DoG image.
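The paper does not give the exact law relating the DoG energy to k, so the Python sketch below uses a purely hypothetical linear mapping to illustrate the idea; all thresholds and names are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_scale_factor(image, k_min=1.2, k_max=2.0, sigma=1.6, e_ref=1e-2):
    """Choose the SIFT scale factor k from the energy of a DoG image (sketch).

    Hypothetical rule: the normalised DoG energy is mapped linearly into
    [k_min, k_max], so low-texture images get a small k (more keypoints)
    and highly textured images get a large k (fewer keypoints).
    """
    img = image.astype(np.float64) / 255.0
    dog = gaussian_filter(img, 1.6 * sigma) - gaussian_filter(img, sigma)
    energy = float(np.mean(dog ** 2))            # DoG image energy
    e_norm = min(1.0, energy / e_ref)            # soft saturation at e_ref
    return k_min + e_norm * (k_max - k_min)
```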
To speed up the matching, the search for correspondences is restricted to regions of interest (ROIs) limited by the horizontal and vertical disparities (HD and VD, respectively). To find the correspondent of a feature $p_i = (x_i, y_i)$ in image $I_i$, we look in image $I_j$ within the region centred on $(x_i, y_i)$ and limited by the maximum vertical and horizontal disparities, VD and HD respectively, as shown in Figure 4. Good feature matching with a significant reduction of the computation time is obtained, as shown in Figure 5, with DistRatio = 0.1.
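A possible sketch of this ROI-constrained matching is shown below; the array layouts for keypoints and descriptors, and the interpretation of DistRatio as a nearest/second-nearest ratio test, are assumptions.

```python
import numpy as np

def match_in_roi(kp_i, desc_i, kps_j, descs_j, hd_max, vd_max, dist_ratio=0.1):
    """Find the correspondent of one keypoint inside its disparity ROI (sketch).

    kp_i    : (x, y) keypoint coordinates in image I_i
    desc_i  : its SIFT descriptor (length-128 vector)
    kps_j   : (M, 2) array of keypoint coordinates in image I_j
    descs_j : (M, 128) array of the corresponding descriptors
    """
    dx = np.abs(kps_j[:, 0] - kp_i[0])
    dy = np.abs(kps_j[:, 1] - kp_i[1])
    in_roi = np.where((dx <= hd_max) & (dy <= vd_max))[0]   # restrict the search
    if in_roi.size < 2:
        return None
    d = np.linalg.norm(descs_j[in_roi] - desc_i, axis=1)    # descriptor distances
    best, second = np.argsort(d)[:2]
    if d[best] < dist_ratio * d[second]:                    # ratio test
        return int(in_roi[best])
    return None
```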
Disparity analysis
Maximum horizontal disparity HDmax. HDmax can be estimated as follows. Assume a point M of coordinates (x, y, z) in the world frame whose projections onto the left and right image planes are $(u_l, v_l)$ and $(u_r, v_r)$, respectively; f is the focal length and b is the baseline (the distance between the two cameras). By triangle similarity, as shown in Figure 6, we get

$$\frac{f}{z} = \frac{u_l}{x} = \frac{u_r}{x - b} \qquad (20)$$

$$\frac{u_l - u_r}{f} = \frac{b}{z} \qquad (21)$$

$$d = u_l - u_r \qquad (22)$$

$$z = \frac{fb}{d} \qquad (23)$$

From equation (23), and for calibrated stereo cameras, the depth z of the point M depends only on the disparity d. From the same equation, we can conclude that the maximum disparity is linked with the minimum depth, $HD_{max} = d_{max} = \frac{fb}{z_{min}}$, where $z_{min}$ is the depth of the nearest point observed by both cameras (left and right).
Maximum vertical disparity VDmax. The estimation of the vertical disparity is based on the epipolar geometry. Assume a feature $p_l$ with coordinates $(u_l, v_l)$ in the left image and its correspondent $p_r$ with coordinates $(u_r, v_r)$ in the right image. Then, the equation of the epipolar line is given by

$$p_r^T\, F\, p_l = 0 \qquad (24)$$

Writing the epipolar line in the right image as $F p_l = [a\;\; b\;\; c]^T$, equation (24) can be rewritten as

$$v_r = A\, u_r + B \qquad (25)$$

with $A = -a/b$ and $B = -c/b$ (equations (26) to (29)). Since the horizontal disparity is bounded by $HD_{max}$, the horizontal coordinate of the correspondent satisfies

$$u_l - \frac{HD_{max}}{2} \leqslant u_r \leqslant u_l + \frac{HD_{max}}{2} \qquad (30)$$

and therefore

$$A\left(u_l - \frac{HD_{max}}{2}\right) + B \leqslant v_r \leqslant A\left(u_l + \frac{HD_{max}}{2}\right) + B \qquad (31), (32)$$

Then, we conclude that the maximum vertical disparity is given by

$$VD_{max} = A\left(u_l + \frac{HD_{max}}{2}\right) - A\left(u_l - \frac{HD_{max}}{2}\right) \qquad (33)$$

$$VD_{max} = A\; HD_{max} \qquad (34)$$
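A small numerical sketch of these two bounds is given below; the calibration values and names are placeholders. For example, with f = 700 pixels, b = 0.12 m and z_min = 1 m, this would give HDmax = 84 pixels.

```python
import numpy as np

def disparity_bounds(f, b, z_min, F, u_l, v_l):
    """Maximum horizontal and vertical disparities for one left feature (sketch).

    f, b     : focal length (pixels) and stereo baseline (m)
    z_min    : depth of the nearest point seen by both cameras (m)
    F        : 3x3 fundamental matrix of the stereo pair
    u_l, v_l : feature coordinates in the left image
    """
    hd_max = f * b / z_min                       # HDmax = f b / z_min
    a, bb, c = F @ np.array([u_l, v_l, 1.0])     # epipolar line in the right image
    A = -a / bb                                  # slope of v_r = A u_r + B
    vd_max = abs(A) * hd_max                     # equation (34)
    return hd_max, vd_max
```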
Simulation results
Figure 7. Feature extraction: (a) right image, 241 landmarks; (b) left image, 245 landmarks.
Experimental results
Experimental indoor results
Experimental validation was carried out using cooperative mobile robots (Pioneer 3-AT). Since the validation process generalizes, without loss of generality, from ground robots (3DOF) to aerial vehicles (6DOF), ground vehicles were used for the experimental validation.
Figure 10. UAV positions. Left: XYZ position of UAV1 in the navigation frame; right: XYZ position of UAV2 in the navigation frame.
UAV: unmanned aerial vehicle.
Figure 11. UAV1 and UAV2 true, INS and corrected positions.
UAV: unmanned aerial vehicle; INS: inertial navigation system.
(Figure: features uncertainties (m) of Feature 1 to Feature 6 versus time (s).)
(Figure: true, INS, single-UAV visual SLAM and multiple-UAV visual SLAM positions versus time (s).)
A better estimate of the position and orientation is obtained by the C-VSLAM algorithm than by the odometry pose for the two robots.
A comparison between experimental single VSLAM and C-VSLAM is given in Figure 17. As shown, the navigation positions (X and Y) estimated by single VSLAM are less accurate than the C-VSLAM estimates, which maintain good precision. In fact, a zoom on the first instants of the robot trajectories shows that C-VSLAM and single VSLAM remain close to each other. However, after a short travelled distance, the C-VSLAM navigation results outperform the single-vehicle results. This is mainly because sharing visual features increases the reliability and robustness of the estimation process by reducing perception uncertainties. In addition, the loop-closing effect is apparent for both single VSLAM and C-VSLAM, as can be seen in Figure 17 at time 70 s.
Conclusion
In this article, we proposed a robust approach to solve the cooperative airborne VSLAM problem based on the development of a full stereo-camera observation model. An adaptive SIFT feature extractor, followed by a fast matching step based on stereo vision constraints, was introduced to construct a large map. The robust C-VSLAM was implemented based on the NH∞ filter and compared with single-vehicle VSLAM. The proposed C-VSLAM solution was validated using simulation and experimental data, and good and very promising results were obtained. Future work will include extending this cooperative strategy to a distributed estimation framework for an experimental setup based on a set of cooperative UAVs.
Figure 16. Robot positions. Left: XY position of Robot 1 in the navigation frame; right: XY position of Robot 2 in the navigation frame.
Figure 17. XY mobile robot localization: single VSLAM vs cooperative VSLAM.
VSLAM: visual simultaneous localization and mapping.
Figure 18. Robot positions. Left: XY position of Robot 1 in the navigation frame; right: XY position of Robot 2 in the navigation frame.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
References
1. Cox IJ and Wilfong GT. Autonomous robot vehicles. New York: Springer-Verlag, 1990.
2. Toledo-Moreo R, Zamora-Izquierdo MA, Ubeda-Minarro B, et al. High-integrity IMM-EKF-based road vehicle navigation with low-cost GPS/SBAS/INS. IEEE Trans Intell Transp Syst 2007; 8(3).
3. Leonard JJ and Durrant-Whyte HF. Directed sonar sensing for mobile robot navigation. Boston: Kluwer Academic Publishers, 1992.
4. Toth C. Sensor integration in airborne mapping. IEEE Trans Instrum Meas 2002; 51(6): 1367–1373.
5. Lacroix S, Mallet A, Chatila R, et al. Rover self localization in planetary-like environments. In: 5th international symposium on artificial intelligence, robotics and automation in space, Noordwijk, The Netherlands, June 1999.
6. Olson CF. Probabilistic self-localization for mobile robots. IEEE Trans Robot Automat 2000; 16(1): 55–66.
7. Ollero A and Merino L. Control and perception techniques for aerial robotics. Annu Rev Control 2004; 28(2): 167–178.
8. Montemerlo M, Thrun S, Koller D, et al. FastSLAM: a factored solution to the simultaneous localization and mapping problem. In: Proceedings of the AAAI national conference on artificial intelligence, Edmonton, Canada, 2002.
9. Grabowski R, Navarro-Serment LE, Paredis CJJ, et al. Heterogeneous teams of modular robots for mapping and exploration. Auton Robots 2000; 8(3): 293–308.
10. Singh H, Catipovic J, Eastwood R, et al. An integrated approach to multiple AUV communications, navigation and docking. In: Proceedings of the IEEE oceanic engineering society conference, Fort Lauderdale, FL, USA, 1996, pp.59–64.
11. Kim JH, DeFilipps JM, Impert NP, et al. ATM network based integrated battle space simulation with multiple AWACS-fighter platforms. In: Proceedings of the IEEE military communications conference, vol. 1, Boston, 1998, pp.101–107.
12. Cohen W. Adaptive mapping and navigation by teams of simple robots. Robot Auton Syst 1996; 18: 411–434.
13. Nettleton EW, Durrant-Whyte HF, Gibbens PW, et al. Multiple platform localisation and map building. In: Proceedings of sensor fusion and decentralized control in robotic systems III, vol. 4196, Boston, 2000, pp.337–347.
37. […] In: IEEE international conference on robotics and automation, Pasadena, CA, USA, 2008, pp.2798–2805.
38. Nemra A and Aouf N. Robust nonlinear filtering for INS/GPS UAV localization. In: IEEE Mediterranean conference on control and automation, Ajaccio, June 2008, pp.695–702.
39. Ronnback S. Development of an INS/GPS navigation loop for an UAV. Master's Thesis, Lulea Tekniska Universitet, 2000.
40. Shaked U and Berman N. H∞ nonlinear filtering of discrete-time processes. IEEE Trans Signal Process 1995; 43: 2205–2209.
41. Petersen I and Savkin A. Robust Kalman filtering for signals and systems with large uncertainties. Boston: Birkhauser, 1999.
42. Basar T and Bernhard P. H∞ optimal control and related minimax design problems: a dynamic game approach. Boston: Birkhauser, 1991.
43. Einicke G and White L. Robust extended Kalman filtering. IEEE Trans Signal Process 1999; 47(9): 2596–2599.
44. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004; 60(2): 91–110.
45. Nemra A and Aouf N. Robust feature extraction and correspondence for UAV map building. In: IEEE Mediterranean conference on control and automation, 2009.
46. Ma Y, Soatto S, Kosecka J, et al. An invitation to 3-D vision: from images to geometric models. Interdisciplinary Applied Mathematics. New York: Springer, 2003.