Abstract
This article aims to present an adaptive and robust cooperative visual localization solution based on stereo vision systems. With the proposed solution, a group of unmanned vehicles, either aerial or ground, will be able to construct a large, reliable map and localize themselves precisely in this map without any user intervention. For this cooperative localization and mapping problem, a robust nonlinear H∞ filter is adapted to ensure robust pose estimation. In addition, a robust approach for feature extraction and matching, based on an adaptive scale invariant feature transform stereo-constrained algorithm, is implemented to build a large, consistent map. Finally, a validation of the proposed solution is presented and discussed using simulation and experimental data.
Keywords
Visual SLAM, data fusion, feature extraction, cooperation
Date received: 2 May 2012; accepted: 25 September 2012
Introduction
Self-localization of unmanned vehicles (UVs) is still considered a fundamental problem in autonomous vehicle navigation. The problem has been tackled in the recent past for different platforms, and a number of efficient techniques have been proposed, as presented in Cox and Wilfong,1 Toledo-Moreo et al.,2 Leonard and Durrant-Whyte,3 Toth4 and references therein.
More recently, driven by the interest of military applications and planetary exploration, researchers have turned their attention towards the localization of vehicles moving in hostile and unknown environments.5,6 In these cases, a map of the environment is not available and hence a more complex simultaneous localization and mapping (SLAM) problem must be faced and solved. The vehicles used in these applications, ranging from aerial to ground platforms, have to collect information about the environment through their sensors and, at the same time, localize themselves with respect to the map they are building.
Extensive research, employing the extended Kalman filter (EKF) and particle filters,7,8 has been reported in the literature to address several aspects of the vehicle SLAM problem.9 However, most of these works concern the case of a single vehicle exploring an environment through landmark detection.
In many applications related to the defence industry, a single sensing platform may not be sufficient to accomplish the mission, and several cooperating vehicles are required.
C-SLAM is performed when multiple vehicles share navigation and perception sensing information in order to improve their own position estimates beyond what is possible with a single vehicle. Simple collective navigation has been demonstrated in simulation using multiple cartographer vehicles that randomly explore the environment.14 Støy15 performs simple relative localization between collaborators using directional beacons. Other research works16,17 presented the challenges, in terms of localization performance, of C-SLAM and its growing uncertainties; simulation and experimental validation were conducted in this latter work to support the presented analysis. In addition, techniques based on entropy minimization18 and information theory19 were also developed in the past for the C-SLAM problem.
To date, typical SLAM approaches have used laser range sensors to build maps in two and three dimensions.20–22 Recently, interest in using cameras as the main sensors for SLAM has increased, and some authors have concentrated on building 3D maps using visual information obtained from cameras, and more specifically from stereo cameras.23,24 These approaches are usually denoted as visual SLAM. The reasons for this interest stem from:
State vector
The estimated state vector $x_k$ at time k contains the 3D vehicle position ($p^n$), velocity ($v^b$), Euler angles ($\phi$, $\theta$, $\psi$) and the N 3D feature locations ($m^n_i$) in the environment, $i = 1, \ldots, N$. In our work, the features are the keypoints detected by the scale invariant feature transform (SIFT) algorithm and represent the minima and maxima of the difference of Gaussian (DoG) image.40 The superscript n indicates that the vector is referenced in the navigation frame, and b indicates that the vector is referenced in the body frame. The body frame is the same as the vehicle frame. The navigation frame is fixed with respect to the surface of the Earth and is placed arbitrarily within the operating area of the vehicle, with the x-axis pointing north, the y-axis pointing east and the z-axis pointing down towards the centre of the Earth.
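One compact way to write the state vector described above (this grouping is ours, given here only for reference) is

$$x_k = \begin{bmatrix} (p^n_k)^T & (v^b_k)^T & \phi_k \;\; \theta_k \;\; \psi_k & (m^n_1)^T & \cdots & (m^n_N)^T \end{bmatrix}^T$$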
The problem with the navigation solution provided by an inertial navigation system (INS) (Figure 1(b)) is that it drifts with time, as in most other dead-reckoning systems. The drift of the inertial position is typically a cubic function of time: small gyro errors accumulate in the angle estimates, which in turn misrepresent the gravitational acceleration as vehicle acceleration and result in quadratic velocity and cubic position errors.
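As a rough first-order illustration of this error growth (not taken from the paper), a constant gyro bias $\epsilon_g$ propagates as

$$\delta\theta(t) \approx \epsilon_g\, t, \qquad \delta v(t) \approx g\, \epsilon_g\, \frac{t^2}{2}, \qquad \delta p(t) \approx g\, \epsilon_g\, \frac{t^3}{6}$$

so the misprojected gravity component indeed yields quadratic velocity and cubic position errors.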
Process model
The state estimate $x_k$ at time k is predicted forward in time by integrating the inertial measurement unit (IMU) readings, using an inertial-frame mechanisation of the inertial navigation equations, as
$$
\begin{bmatrix} p^n_k \\ v^b_k \\ \Psi^n_k \end{bmatrix} =
\begin{bmatrix}
p^n_{k-1} + C^n_b\, v^b_k\, \Delta t \\
v^b_{k-1} + \left( f^b_k + v^b_{k-1} \wedge w^b_k + C^{nT}_b\, g^n \right) \Delta t \\
\Psi^n_{k-1} + E^n_b\, w^b\, \Delta t
\end{bmatrix}
\qquad (1)
$$
where $f^b$ and $w^b$ are the body-frame referenced vehicle accelerations and rotation rates from the IMU, and $g^n$ is the acceleration due to gravity. $C^n_b$ and $E^n_b$ are, respectively, the direction cosine matrix and the rotation rate transformation matrix between the body and navigation frames.
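A minimal Python sketch of this prediction step is given below; the function and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ins_predict(p_n, v_b, psi_n, f_b, w_b, C_nb, E_nb, g_n, dt):
    """One IMU mechanisation step of equation (1) (illustrative sketch).

    p_n   : 3-vector, position in the navigation frame
    v_b   : 3-vector, velocity in the body frame
    psi_n : 3-vector, Euler angles (roll, pitch, yaw)
    f_b   : 3-vector, specific force measured by the IMU (body frame)
    w_b   : 3-vector, angular rates measured by the IMU (body frame)
    C_nb  : 3x3 direction cosine matrix (body -> navigation)
    E_nb  : 3x3 rotation-rate transformation matrix
    g_n   : 3-vector, gravity expressed in the navigation frame
    dt    : sampling period (s)
    """
    p_n_new = p_n + C_nb @ v_b * dt
    v_b_new = v_b + (f_b + np.cross(v_b, w_b) + C_nb.T @ g_n) * dt
    psi_new = psi_n + E_nb @ w_b * dt
    return p_n_new, v_b_new, psi_new
```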
Figure 1. UAV with IMU and stereo cameras. (a) Different frames for the UAV system and (b) INS architecture.
UAV: unmanned aerial vehicle; IMU: inertial measurement unit.
The estimation process uses the process and observation models in a nonlinear filter to estimate the state $x_k$ from the observations $y_k$. Linearizing the process and observation models about the estimated trajectory gives

$$x_{k+1} = F_k x_k + \Gamma_k w_k + \varphi_k + \Phi_k \xi_k \qquad (7)$$

$$y_k = H_k x_k + \psi_k + \zeta_k + v_k \qquad (8)$$

where

$$\xi_k = \alpha_1(\tilde{x}_{k/k}) + \alpha_2(\tilde{x}_{k/k})\, w_k \qquad (9)$$

and

$$\zeta_k = \alpha_3(\tilde{x}_{k/k-1}) \qquad (10)$$

account for the linearization errors, which are bounded as

$$\|\zeta_k\|_2^2 \leqslant \epsilon_2\, \mu_3^2\, \|\tilde{x}_{k/k}\|^2 \qquad (12)$$

Absorbing these bounded terms into the noise terms leads to

$$x_{k+1} = F_k x_k + c_w w_k + \varphi_k \qquad (13)$$

$$y_k = H_k x_k + c_v v_k + \psi_k \qquad (14)$$

where $c_w^2 = 1 + \epsilon_2 \mu_1^2 + \epsilon_2 \mu_3^2$ and $c_v^2 = c_w^2 (1 + \epsilon_1 \mu_1^2)$.

This final form, equations (13) and (14), has the same structure as the EKF,7 except that the error covariance correction of the linear H∞ filter is used, with the noise processes $w_k$ and $v_k$ scaled by $c_w$ and $c_v$.

Nonlinear H∞ filter

The final NH∞ formulation is then written in a predictor-corrector scheme.

Initialization:

$$\hat{Q}_k = c_w^2\, Q_k, \qquad \hat{R}_k = c_v^2\, R_k$$

Predictor:

$$\hat{x}_{k/k-1} = f(\hat{x}_{k-1/k-1}, u_{k-1}, 0)$$

$$P_{k/k-1} = F_{k-1} P_{k-1/k-1} F_{k-1}^T + \Gamma_{k-1} \hat{Q}_k \Gamma_{k-1}^T$$

Corrector: the correction step applies the linear H∞ error covariance update with

$$F_k = \nabla_x f(\hat{x}_{k/k}), \quad \Gamma_k = \nabla_w f(\hat{x}_{k/k}), \quad H_k = \nabla_x h(\hat{x}_{k/k-1}),$$

$$\varphi_k = f(\hat{x}_{k/k}) - F_k \hat{x}_{k/k}, \quad \psi_k = h(\hat{x}_{k/k-1}) - H_k \hat{x}_{k/k-1}$$
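To make the scaled-noise structure concrete, the sketch below shows the predictor step of such a filter in Python; the names and the reduction to a single callable f are illustrative assumptions, and the H∞ correction step is omitted.

```python
import numpy as np

def nh_inf_predict(x_corr, P_corr, f, F, G, Q, c_w):
    """Predictor step of the NH-infinity scheme (illustrative sketch only).

    x_corr : corrected state estimate x_{k-1/k-1}
    P_corr : corrected error covariance P_{k-1/k-1}
    f      : process model, called as f(x) with the control input folded in
    F, G   : Jacobians of f with respect to the state and the process noise
    Q      : nominal process noise covariance
    c_w    : robust scaling factor, so Q_hat = c_w**2 * Q
    """
    Q_hat = c_w ** 2 * Q                         # scaled process noise
    x_pred = f(x_corr)                           # x_{k/k-1} = f(x_{k-1/k-1}, u, 0)
    P_pred = F @ P_corr @ F.T + G @ Q_hat @ G.T  # P_{k/k-1}
    return x_pred, P_pred
```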
Cooperative VSLAM
In the multi-vehicle VSLAM problem proposed in this study, the estimated state vector becomes the positions, velocities and attitudes of the multiple vehicles together with the positions of the point features in the environment. In this section, we extend the single-vehicle VSLAM presented above to the multiple-vehicle case.
The cooperative state vector stacks the M vehicle states and the N feature locations, and the corresponding system matrices are block diagonal:

$$X_k = \begin{bmatrix} X_{uav1,k} \\ \vdots \\ X_{uavM,k} \\ m^n_{1,k} \\ m^n_{2,k} \\ \vdots \\ m^n_{N,k} \end{bmatrix}$$

$$F_{sys} = \begin{bmatrix} F_{uav1,k} & \cdots & 0_{9\times 9} & 0_{9\times 3N} \\ \vdots & \ddots & \vdots & \vdots \\ 0_{9\times 9} & \cdots & F_{uavM,k} & 0_{9\times 3N} \\ 0_{3N\times 9} & \cdots & 0_{3N\times 9} & I_{3N\times 3N} \end{bmatrix}, \qquad
\Gamma_{sys} = \begin{bmatrix} \Gamma_{uav1} & \cdots & 0_{9\times 6} \\ \vdots & \ddots & \vdots \\ 0_{9\times 6} & \cdots & \Gamma_{uavM} \\ 0_{3N\times 6} & \cdots & 0_{3N\times 6} \end{bmatrix}$$
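As an illustration of how these block-diagonal matrices can be assembled, the sketch below assumes nine states and six noise inputs per vehicle; the helper name and argument layout are placeholders, not from the paper.

```python
import numpy as np
from scipy.linalg import block_diag

def build_cooperative_matrices(F_uav_list, G_uav_list, N):
    """Assemble F_sys and Gamma_sys for M vehicles and N map points (sketch).

    F_uav_list : list of M 9x9 per-vehicle transition matrices
    G_uav_list : list of M 9x6 per-vehicle noise input matrices
    N          : number of 3D point features in the map
    """
    M = len(G_uav_list)
    # Vehicle blocks on the diagonal; identity block for the static features
    F_sys = block_diag(*F_uav_list, np.eye(3 * N))
    # Features receive no process noise: zero rows below the vehicle blocks
    G_sys = np.vstack([block_diag(*G_uav_list), np.zeros((3 * N, 6 * M))])
    return F_sys, G_sys
```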
Type 3:

$$\left\{ \begin{aligned} u_1 &= \frac{m^{c1}_{11} x^n_{m_i} + m^{c1}_{12} y^n_{m_i} + m^{c1}_{13} z^n_{m_i} + m^{c1}_{14}}{m^{c1}_{31} x^n_{m_i} + m^{c1}_{32} y^n_{m_i} + m^{c1}_{33} z^n_{m_i} + m^{c1}_{34}} \\[4pt] v_1 &= \frac{m^{c1}_{21} x^n_{m_i} + m^{c1}_{22} y^n_{m_i} + m^{c1}_{23} z^n_{m_i} + m^{c1}_{24}}{m^{c1}_{31} x^n_{m_i} + m^{c1}_{32} y^n_{m_i} + m^{c1}_{33} z^n_{m_i} + m^{c1}_{34}} \\[4pt] u_2 &= \frac{m^{c2}_{11} x^n_{m_i} + m^{c2}_{12} y^n_{m_i} + m^{c2}_{13} z^n_{m_i} + m^{c2}_{14}}{m^{c2}_{31} x^n_{m_i} + m^{c2}_{32} y^n_{m_i} + m^{c2}_{33} z^n_{m_i} + m^{c2}_{34}} \\[4pt] v_2 &= \frac{m^{c2}_{21} x^n_{m_i} + m^{c2}_{22} y^n_{m_i} + m^{c2}_{23} z^n_{m_i} + m^{c2}_{24}}{m^{c2}_{31} x^n_{m_i} + m^{c2}_{32} y^n_{m_i} + m^{c2}_{33} z^n_{m_i} + m^{c2}_{34}} \end{aligned} \right. \qquad (19)$$

where $[x^n_{m_i}\; y^n_{m_i}\; z^n_{m_i}]^T$ are the coordinates of the landmark $m_i$ in the navigation frame (north-east-down, NED), and $m^{c1}_{ij}$ and $m^{c2}_{ij}$ are the projection matrix components of the right and left cameras, respectively.32
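A minimal sketch of this stereo observation model, assuming the two 3×4 projection matrices are available as NumPy arrays, is given below.

```python
import numpy as np

def stereo_observation(m_n, M_c1, M_c2):
    """Project a 3D landmark into both cameras, as in equation (19) (sketch).

    m_n  : 3-vector, landmark position in the navigation (NED) frame
    M_c1 : 3x4 projection matrix of the first camera
    M_c2 : 3x4 projection matrix of the second camera
    Returns (u1, v1, u2, v2), the predicted pixel coordinates.
    """
    m_h = np.append(m_n, 1.0)            # homogeneous landmark coordinates
    uvw1 = M_c1 @ m_h                    # projection into camera 1
    uvw2 = M_c2 @ m_h                    # projection into camera 2
    u1, v1 = uvw1[0] / uvw1[2], uvw1[1] / uvw1[2]
    u2, v2 = uvw2[0] / uvw2[2], uvw2[1] / uvw2[2]
    return u1, v1, u2, v2
```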
Adaptive SIFT
Although the SIFT algorithm44 has many advantages, the detected keypoints are still not efficient enough, especially for the visual SLAM problem. The SIFT algorithm uses a constant factor k between the scales; a large value of k implies few keypoints, which is a disadvantage for the VSLAM problem, especially if these keypoints are unstable, while small values of k imply the detection of many keypoints, which is also a problem for VSLAM. Clearly, it is not straightforward to find a value of k suitable for all kinds of images.
Lowe44 proposes a value of k = 1.6, which is effective only for images with an appropriate amount of texture. For example, in low-frequency images (seas, deserts, ...), only a few features are detected. On the other hand, in high-frequency images (spatial scenes, buildings, ...), the SIFT algorithm leads to a large number of features, which is not suitable for VSLAM. As a solution to this problem, an adaptive scale representation with an adaptive factor k was introduced to obtain our adaptive scale invariant feature transform (ASIFT)32,44 that is used in this work. The scale factor is estimated based on the energy of the DoG image.
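The paper does not give the exact law relating the DoG energy to k, so the Python sketch below uses a purely hypothetical linear mapping to illustrate the idea; all thresholds and names are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_scale_factor(image, k_min=1.2, k_max=2.0, sigma=1.6, e_ref=1e-2):
    """Choose the SIFT scale factor k from the energy of a DoG image (sketch).

    Hypothetical rule: the normalised DoG energy is mapped linearly into
    [k_min, k_max], so low-texture images get a small k (more keypoints)
    and highly textured images get a large k (fewer keypoints).
    """
    img = image.astype(np.float64) / 255.0
    dog = gaussian_filter(img, 1.6 * sigma) - gaussian_filter(img, sigma)
    energy = float(np.mean(dog ** 2))            # DoG image energy
    e_norm = min(1.0, energy / e_ref)            # soft saturation at e_ref
    return k_min + e_norm * (k_max - k_min)
```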
To speed up the matching, the search for correspondences is restricted to regions of interest (ROIs) limited by the horizontal and vertical disparities (HD and VD, respectively). To find the correspondent of a feature $p_i = (x_i, y_i)$ in image $I_i$, we look in image $I_j$ within the region centred on $(x_i, y_i)$ and limited by the maximum vertical and horizontal disparities, VD and HD respectively, as shown in Figure 4. Good feature matching with a significant reduction of the computation time is obtained, as shown in Figure 5, with DistRatio = 0.1.
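A possible sketch of this ROI-constrained matching is shown below; the array layouts for keypoints and descriptors, and the interpretation of DistRatio as a nearest/second-nearest ratio test, are assumptions.

```python
import numpy as np

def match_in_roi(kp_i, desc_i, kps_j, descs_j, hd_max, vd_max, dist_ratio=0.1):
    """Find the correspondent of one keypoint inside its disparity ROI (sketch).

    kp_i    : (x, y) keypoint coordinates in image I_i
    desc_i  : its SIFT descriptor (length-128 vector)
    kps_j   : (M, 2) array of keypoint coordinates in image I_j
    descs_j : (M, 128) array of the corresponding descriptors
    """
    dx = np.abs(kps_j[:, 0] - kp_i[0])
    dy = np.abs(kps_j[:, 1] - kp_i[1])
    in_roi = np.where((dx <= hd_max) & (dy <= vd_max))[0]   # restrict the search
    if in_roi.size < 2:
        return None
    d = np.linalg.norm(descs_j[in_roi] - desc_i, axis=1)    # descriptor distances
    best, second = np.argsort(d)[:2]
    if d[best] < dist_ratio * d[second]:                    # ratio test
        return int(in_roi[best])
    return None
```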
Disparity analysis
Maximum horizontal disparity HDmax. HDmax can be estimated as follows. Assume a point M of coordinates (x, y, z) in the world frame whose projections onto the left and right image planes are $(u_l, v_l)$ and $(u_r, v_r)$, respectively; f is the focal length and b is the baseline (the distance between the two cameras). By triangle similarity, as shown in Figure 6, we get

$$\frac{f}{z} = \frac{u_l}{x} = \frac{u_r}{x - b} \qquad (20)$$

$$\frac{u_l - u_r}{f} = \frac{b}{z} \qquad (21)$$

$$d = u_l - u_r \qquad (22)$$

$$z = \frac{fb}{d} \qquad (23)$$

From equation (23), and for calibrated stereo cameras, the depth z of the point M depends only on the disparity d. From the same equation, we can conclude that the maximum disparity is linked with the minimum depth, $HD_{max} = d_{max} = \frac{fb}{z_{min}}$, where $z_{min}$ is the depth of the nearest point observed by both cameras (left and right).
Maximum vertical disparity VDmax. The estimation of the vertical disparity is based on the epipolar geometry. Assume a feature $p_l$ with coordinates $(u_l, v_l)$ in the left image and its correspondent $p_r$ with coordinates $(u_r, v_r)$ in the right image. Then, the equation of the epipolar line is given by

$$p_r^T\, F\, p_l = 0 \qquad (24)$$

Writing the epipolar line in the right image as $F p_l = [a\;\; b\;\; c]^T$, equation (24) can be rewritten as

$$v_r = A\, u_r + B \qquad (25)$$

with $A = -a/b$ and $B = -c/b$ (equations (26) to (29)). Since the horizontal disparity is bounded by $HD_{max}$, the horizontal coordinate of the correspondent satisfies

$$u_l - \frac{HD_{max}}{2} \leqslant u_r \leqslant u_l + \frac{HD_{max}}{2} \qquad (30)$$

and therefore

$$A\left(u_l - \frac{HD_{max}}{2}\right) + B \leqslant v_r \leqslant A\left(u_l + \frac{HD_{max}}{2}\right) + B \qquad (31), (32)$$

Then, we conclude that the maximum vertical disparity is given by

$$VD_{max} = A\left(u_l + \frac{HD_{max}}{2}\right) - A\left(u_l - \frac{HD_{max}}{2}\right) \qquad (33)$$

$$VD_{max} = A\; HD_{max} \qquad (34)$$
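A small numerical sketch of these two bounds is given below; the calibration values and names are placeholders. For example, with f = 700 pixels, b = 0.12 m and z_min = 1 m, this would give HDmax = 84 pixels.

```python
import numpy as np

def disparity_bounds(f, b, z_min, F, u_l, v_l):
    """Maximum horizontal and vertical disparities for one left feature (sketch).

    f, b     : focal length (pixels) and stereo baseline (m)
    z_min    : depth of the nearest point seen by both cameras (m)
    F        : 3x3 fundamental matrix of the stereo pair
    u_l, v_l : feature coordinates in the left image
    """
    hd_max = f * b / z_min                       # HDmax = f b / z_min
    a, bb, c = F @ np.array([u_l, v_l, 1.0])     # epipolar line in the right image
    A = -a / bb                                  # slope of v_r = A u_r + B
    vd_max = abs(A) * hd_max                     # equation (34)
    return hd_max, vd_max
```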
Simulation results
Figure 7. Feature extraction: (a) right image, 241 landmarks; (b) left image, 245 landmarks.
Experimental results
Experimental indoor results
Experimental validation was carried out using cooperative mobile robots (Pioneer 3-AT). Since the validation process generalizes, without loss of generality, from ground robots (3DOF) to aerial vehicles (6DOF), ground vehicles were used for the experimental validation.
Figure 10. UAV positions. Left: XYZ position of UAV1 in the navigation frame; right: XYZ position of UAV2 in the navigation frame.
UAV: unmanned aerial vehicle.
Figure 11. UAV1 and UAV2 true, INS and corrected positions.
UAV: unmanned aerial vehicle; INS: inertial navigation system.
(Figure: features uncertainties (m) of Feature 1 to Feature 6 versus time (s).)
(Figure: true, INS, single-UAV visual SLAM and multiple-UAV visual SLAM positions versus time (s).)
A better estimate of the position and orientation is obtained by the C-VSLAM algorithm than by the odometry pose for the two robots.
A comparison between experimental single VSLAM and C-VSLAM is given in Figure 17. As shown, the navigation positions (X and Y) estimated by single VSLAM are less accurate than the C-VSLAM estimates, which maintain good precision. In fact, a zoom on the first instants of the robot trajectories shows that C-VSLAM and single VSLAM remain close to each other. However, after a short travelled distance, the C-VSLAM navigation results outperform the single-vehicle results. This is mainly because sharing visual features increases the reliability and robustness of the estimation process by reducing perception uncertainties. In addition, the loop-closing effect is apparent for both single VSLAM and C-VSLAM, as can be seen in Figure 17 at time 70 s.
Conclusion
In this article, we proposed a robust approach to solve the cooperative airborne VSLAM problem based on the development of a full stereo-camera observation model. An adaptive SIFT feature extractor, followed by a fast matching step based on stereo vision constraints, was introduced to construct a large map. The robust C-VSLAM was implemented based on the NH∞ filter and compared with single-vehicle VSLAM. The proposed C-VSLAM solution was validated using simulation and experimental data, and good and very promising results were obtained. Future work will include extending this cooperative strategy to a distributed estimation framework for an experimental setup based on a set of cooperative UAVs.
Figure 16. Robot positions. Left: XY position of Robot 1 in the navigation frame; right: XY position of Robot 2 in the navigation frame.
Figure 17. XY mobile robot localization: single VSLAM vs cooperative VSLAM.
VSLAM: visual simultaneous localization and mapping.
Figure 18. Robot positions. Left: XY position of Robot 1 in the navigation frame; right: XY position of Robot 2 in the navigation frame.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
References
1. Cox IJ and Wilfong GT. Autonomous robot vehicles. New York: Springer-Verlag, 1990.
2. Toledo-Moreo R, Zamora-Izquierdo MA, Ubeda-Minarro B, et al. High-integrity IMM-EKF-based road vehicle navigation with low-cost GPS/SBAS/INS. IEEE Trans Intell Transp Syst 2007; 8(3).
3. Leonard JJ and Durrant-Whyte HF. Directed sonar sensing for mobile robot navigation. Boston: Kluwer Academic Publishers, 1992.
4. Toth C. Sensor integration in airborne mapping. IEEE Trans Instrum Meas 2002; 51(6): 1367–1373.
5. Lacroix S, Mallet A, Chatila R, et al. Rover self localization in planetary-like environments. In: 5th international symposium on artificial intelligence, robotics and automation in space, Noordwijk, The Netherlands, June 1999.
6. Olson CF. Probabilistic self-localization for mobile robots. IEEE Trans Robot Automat 2000; 16(1): 55–66.
7. Ollero A and Merino L. Control and perception techniques for aerial robotics. Annu Rev Control 2004; 28(2): 167–178.
8. Montemerlo M, Thrun S, Koller D, et al. FastSLAM: a factored solution to the simultaneous localization and mapping problem. In: Proceedings of the AAAI national conference on artificial intelligence, Edmonton, Canada, 2002.
9. Grabowski R, Navarro-Serment LE, Paredis CJJ, et al. Heterogeneous teams of modular robots for mapping and exploration. Auton Robots 2000; 8(3): 293–308.
10. Singh H, Catipovic J, Eastwood R, et al. An integrated approach to multiple AUV communications, navigation and docking. In: Proceedings of the IEEE oceanic engineering society conference, Fort Lauderdale, FL, USA, 1996, pp.59–64.
11. Kim JH, DeFilipps JM, Impert NP, et al. ATM network based integrated battle space simulation with multiple AWACS-fighter platforms. In: Proceedings of the IEEE military communications conference, vol. 1, Boston, 1998, pp.101–107.
12. Cohen W. Adaptive mapping and navigation by teams of simple robots. Robot Auton Syst 1996; 18: 411–434.
13. Nettleton EW, Durrant-Whyte HF, Gibbens PW, et al. Multiple platform localisation and map building. In: Proceedings of sensor fusion and decentralized control in robotic systems III, vol. 4196, Boston, 2000, pp.337–347.
37. […] In: IEEE international conference on robotics and automation, Pasadena, CA, USA, 2008, pp.2798–2805.
38. Nemra A and Aouf N. Robust nonlinear filtering for INS/GPS UAV localization. In: IEEE Mediterranean conference on control and automation, Ajaccio, June 2008, pp.695–702.
39. Ronnback S. Development of an INS/GPS navigation loop for an UAV. Master's Thesis, Lulea Tekniska Universitet, 2000.
40. Shaked U and Berman N. H∞ nonlinear filtering of discrete-time processes. IEEE Trans Signal Process 1995; 43: 2205–2209.
41. Petersen I and Savkin A. Robust Kalman filtering for signals and systems with large uncertainties. Boston: Birkhauser, 1999.
42. Basar T and Bernhard P. H∞ optimal control and related minimax design problems: a dynamic game approach. Boston: Birkhauser, 1991.
43. Einicke G and White L. Robust extended Kalman filtering. IEEE Trans Signal Process 1999; 47(9): 2596–2599.
44. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004; 60(2): 91–110.
45. Nemra A and Aouf N. Robust feature extraction and correspondence for UAV map building. In: IEEE Mediterranean conference on control and automation, 2009.
46. Ma Y, Soatto S, Kosecka J, et al. An invitation to 3-D vision: from images to geometric models. Interdisciplinary Applied Mathematics. New York: Springer, 2003.