
Low-Cost, Visual Localization and Mapping for Consumer Robotics

Paolo Pirjanian, Niklas Karlsson, Luis Goncalves*, Enrico Di Bernardo*
Evolution Robotics, Inc. and Idealab*
130 W Union Street, Pasadena, CA 90098
www.evolution.com, paolo@evolution.com
4/28/03

Abstract

One difficult problem in robotics is localization: the ability of a mobile robot to determine its position in the environment. Roboticists around the globe have been working on localization for more than 20 years; however, only in the past 4-5 years have we seen promising results. Thanks to these recent results, a robot is for the first time able to localize itself rather reliably. But these solutions work only in rather structured environments and tend to fail in realistic settings. Furthermore, systems using these solutions require a 5000USD sensor, which makes them infeasible for many consumer and commercial applications. In this work, we describe a first-of-a-kind, breakthrough technology for localization that requires only one low-cost camera (less than 50USD) and optical wheel encoders. Because of its low cost and robust performance in realistic environments, this technology is particularly well suited for use in consumer and commercial applications.

Challenges of consumer robotics


The challenges of robotics are many. Coupling these challenges with the strict price and power constraints of consumer products makes them many times more difficult. Mass consumer products are typically priced in the low hundreds of USD and rarely above a couple of thousand USD. To leave room for profit, the cost of manufacturing a product is likely to be around 30-50% of its retail price, and this includes the cost of packaging, marketing, and more. Leaving out packaging, marketing, and other costs leaves roughly 20-40% of the retail price as the budget for the hardware components of a product. For example, a product that retails at 500USD will typically have a bill-of-materials budget of about 100-200USD. Fitting the hardware requirements of a robot into this kind of budget is a challenge of its own. The implication of a small budget is that new mechanisms and technologies must be developed that work with low-cost hardware. In particular, the budget constraints require using lower-cost sensors that may not deliver the same performance as more expensive sensors. Furthermore, a computational platform that fits into such a budget will be significantly less capable than a general-purpose computer with, e.g., a 1.4GHz Pentium III, sizeable RAM, and a disk drive. Hence, a solution to localization and mapping will have to be based not only on less capable sensors but also on less computing power.

Solving the localization problem using a 5000USD Laser Range Finder (LRF) sensor is hard enough and quite impressive. Now imagine what it would take to solve this problem with a budget of about 100-200USD. To accomplish a 2 to 3 order of magnitude cost improvement, we would have to either cost-reduce the current solution or develop an entirely new solution that does not rely on costly sensor and computation components. The former approach, even if feasible, is not adequate, since LRF-based localization tends to be fragile in unstructured, realistic environments, which makes it a poor candidate for most consumer robotics applications. The latter approach implies that the 5000USD sensor should be replaced with a 10-50USD sensor and that the Pentium-class board should be replaced with a low-end embedded board of about 100-150USD. Thus the new solution to localization has to be both cost efficient and reasonably low in computational requirements. Note that the available computing capacity cannot be exclusively allocated to the localization algorithm; it must be shared with a number of other processes for the robot's other functions. In this article, we describe an approach to localization that uses a 50USD camera, which is the retail price at low quantities. The cost of the camera can feasibly be reduced to 15USD or less with large quantities and in-house manufacturing. Our vision-based approach to localization is comparable in computational burden to the existing LRF-based approaches. In the Future Work section, we outline our strategy for embedding our localization technology onto low-cost embedded platforms.

Simultaneous Localization and Mapping


Localization is the process by which a robot estimates its own position with respect to its environment. In pure localization, however, the robot must be provided with an accurate map. This is clearly a constraining element because 1) the robot cannot adapt to changes in the environment, e.g., after reorganizing furniture or remodeling, and 2) manually building a map is a labor-intensive task, which may require specific expertise and hence add to installation cost. Ideally, consumer products should be easily deployable and require no or very limited installation by skilled labor. In order to achieve adaptability and a higher degree of autonomy, the robot must build a map using its sensors and then use this map to find out where it is. This is known as Simultaneous Localization and Mapping, or SLAM for short. SLAM is widely considered one of the most difficult problems in robotics, and progress in SLAM research is bound to greatly impact the field. SLAM is a fundamental primitive for many applications of autonomous mobile robots. Once the robot is able to build a map of its surroundings and determine its own position, it will be able to navigate to desired locations in previously unknown environments to perform its tasks. For instance, a delivery robot will be able to go on a round to deliver packages, or a home robot can go to the kitchen to fetch a drink. Only within the last 4-5 years has the research community been able to come up with promising results in the search for solutions to the SLAM problem. Partially, these solutions have been enabled by the availability of better sensors, in particular the SICK Laser Range Finder. However, the SICK LRF does not deserve the entire credit for these promising results. The other major reason for this success is a class of probabilistic techniques that have proven suitable for solving SLAM. A comprehensive overview of the state of the art in SLAM techniques is given in [3] and [4].

The main idea behind probabilistic approaches to SLAM is to represent information using probability density functions. Sensor measurements and motion models for the robot are uncertain due to inaccuracies, noise, and stochastic processes caused by the robot's hardware and its interactions with unstructured, unknown environments. Probability density functions are suitable for representing the uncertainty in the measurement model as well as in the robot motion model. A sensor measurement is represented using a probability distribution over what might be the true value of the measured quantity. For example, the robot's pose estimate can be represented using a density function, which captures the uncertainty caused by, e.g., wheel slippage and other unpredictable processes. As new measurements are acquired, they are incorporated into the density function, and if the measurements are consistent, the density function accumulates more and more evidence about the robot's true pose, with high confidence/probability. Most probabilistic methods fundamentally use Bayesian estimation, which addresses the problem of recursively estimating the state of a dynamic system from uncertain measurements. Suppose the state s is the robot pose, and let p(s) represent the density function of s at some point in time. If a measurement l is acquired with a density p(l), then an improved density p(s) of s can be computed by integrating p(s|l)p(l) over all possible values of l. This, of course, requires knowledge of the conditional probability function p(s|l). Think of p(s|l) as the likelihood that the robot is located at s given that the sensor has detected l in the environment. For example, what is the likelihood that the robot is in the living room given that it sees a TV? See [3] for more details.

Some successful approaches to probabilistic localization include particle filters and multi-hypothesis tracking. In particle filtering, the above density function p(s) is approximated by a number of samples, or particles, instead of by a continuous analytical function. Furthermore, the integration of p(s|l)p(l) over all values of l is replaced by importance sampling [5]. A nice property of particle filters is that any continuous, discontinuous, or multimodal density function can be approximated with an appropriate number of particles. A particle filter combined with probabilistic models of robot motion and perception is also called a Monte Carlo Localization (MCL) algorithm. See, for example, [6] for an application of particle filters to the localization of mobile robots. Multi-hypothesis tracking typically refers to the approach where a number of Kalman filters are used to track several likely hypotheses of the robot pose. For example, if a TV is detected, then you can be fairly confident that the robot is in a room where there is a TV (since it might be a false detection of the TV, you cannot be completely certain). If you have a TV both in the living room and in the bedroom, it is not yet possible to determine in which of those two rooms the robot is located. Hence, in multi-hypothesis tracking one Kalman filter will track the robot pose assuming it is in the living room, while a second hypothesis will assume the robot is in the bedroom. As more measurement data is acquired, it typically provides enough evidence to reject all but one of the hypotheses [1].

State of the art SLAM techniques developed using such probabilistic approaches use costly range sensors such as a ring of ultrasonic sensors and/or laser range finders. These techniques rely on getting measurements from reliable structure in the environment, such as walls. They tend to fail in open areas that do not have walls within the range of the sensors, or in environments where the walls are not visible because of clutter such as chairs and other furniture. Range-based SLAM techniques work best in static environments with little or no moving objects and/or people, although they can tolerate some change in the environment. In summary, range-based SLAM techniques are suited for application areas that are not strictly cost constrained and for semi-structured environments. One approach to moving away from costly range sensors is to use cameras for localization and mapping. Recent results reported on vision-based localization [2], however, rely on a known map and are constrained to semi-structured environments. It is also unclear how these vision-based approaches deal with changes in the environment.
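To make the particle-filter idea above concrete, the following is a minimal sketch of a Monte Carlo Localization loop. It is an illustration under stated assumptions, not the implementation used in any of the cited systems: the Gaussian odometry noise, the default noise level, and the `likelihood` callback (which would score how well an observation matches the map from a hypothesized pose) are all placeholders.

```python
import math
import random

# Each particle is one pose hypothesis (x, y, theta) with a weight; together
# the weighted particles approximate the density p(s) discussed above.

def motion_update(particles, d_dist, d_theta, noise=0.05):
    """Propagate every particle through a noisy odometry motion model."""
    moved = []
    for (x, y, theta, w) in particles:
        theta_new = theta + d_theta + random.gauss(0.0, noise)
        x_new = x + (d_dist + random.gauss(0.0, noise)) * math.cos(theta_new)
        y_new = y + (d_dist + random.gauss(0.0, noise)) * math.sin(theta_new)
        moved.append((x_new, y_new, theta_new, w))
    return moved

def measurement_update(particles, likelihood):
    """Reweight particles by p(l|s): 'likelihood(pose)' scores how well the
    current observation agrees with the map from that hypothesized pose."""
    weighted = [(x, y, t, w * likelihood((x, y, t))) for (x, y, t, w) in particles]
    total = sum(w for (_, _, _, w) in weighted) or 1e-12
    return [(x, y, t, w / total) for (x, y, t, w) in weighted]

def resample(particles):
    """Importance resampling: draw particles in proportion to their weights."""
    weights = [w for (_, _, _, w) in particles]
    chosen = random.choices(particles, weights=weights, k=len(particles))
    n = len(chosen)
    return [(x, y, t, 1.0 / n) for (x, y, t, _) in chosen]
```

In use, every odometry step calls motion_update, and every observation calls measurement_update followed by resample; the weighted mean of the particles then serves as the pose estimate.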

Visual SLAM
Visual SLAM (vSLAM) provides a first-of-a-kind, cost-efficient solution to localization and mapping that is a major breakthrough for robotics in general and consumer robotics in particular. By using a low-cost camera and optical encoders as the only sensors, this solution can reduce the cost of localization by 2 to 3 orders of magnitude compared to existing range-based techniques. The breakthrough that vSLAM provides consists not only of a major cost improvement but also of a significant improvement in performance. Since it relies on visual information rather than range measurements, it can be used in cluttered environments such as furnished homes and office buildings, where range-based SLAM techniques tend to fail. Furthermore, our vSLAM algorithm is highly robust to dynamic changes in the environment caused by lighting changes, moving objects, and/or people. By combining reliable performance with low-cost requirements, vSLAM is an important, enabling, breakthrough technology for consumer robotics.

Figure 1 (Left) The view seen from the robot's current position. Note how the image is blurred due to the robot's motion. (Right) Image from a landmark that is matched with the robot's current view.

This novel approach to localization and mapping builds a visual map, which consists of a set of unique landmarks created by the robot along its path. A landmark consists of an image of the scene along with the robot's position, which serves as the position estimate of the landmark. The ith landmark is described by a tuple Li = <Ii, Si>, where Ii is the image of the landmark and Si is the position estimate of the landmark. A map, m, consists of a set of k landmarks, m = {L1, L2, ..., Lk}. In an unknown environment, the robot automatically generates new landmarks, which are maintained in a database. Initially, the position estimates of the landmarks are based on wheel odometry alone. As the robot revisits mapped areas, it uses known landmarks to estimate an improved robot pose, which is then used to improve the landmark position estimates. By using the landmarks for robot pose estimation, vSLAM can bound the error introduced by odometry drift. vSLAM continuously compares the current image, seen from the robot, with the images in the database to find matching landmarks (see Figure 1). Such a match is used to update the pose of the robot according to the relative position of the matching landmark. Furthermore, it updates the position of the matched landmark using the robot's current pose estimate. By continuously updating the positions of landmarks based on new data, vSLAM incrementally improves the map by calculating more accurate estimates of the landmark positions. An improved map results in more accurate robot pose estimates; better pose estimates contribute to better estimates of the landmark positions, and so on. Thus vSLAM continuously improves both the map and the robot's pose estimates based on newly acquired data. Our experimental results from various sites show an accuracy of about 10cm in x and y position estimates and an accuracy of about 5 degrees in absolute heading estimates.

When the robot does not see or detect any known landmarks, it takes two specific actions. First, it updates the robot's pose estimate using odometry measurements relative to the last position estimate. Wheel odometry can be quite accurate for short traversals but can introduce unbounded errors over longer travel distances. When the odometry error grows large, vSLAM updates the map with a new landmark. In this way, vSLAM gradually builds a denser map, but not denser than necessary, since landmarks are only created when no others are found. As a result, vSLAM is very adaptive to changes in its environment: if the environment changes so much that the robot no longer recognizes previous landmarks, it automatically updates the map with new ones. Outdated landmarks that are no longer recognized can easily be deleted from the map by simply keeping track of whether they were seen/matched when expected.
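The following sketch illustrates the landmark representation Li = <Ii, Si> and the decision logic just described (correct the pose when a landmark is matched, fall back on odometry otherwise, and create a new landmark once drift grows large). It is a hypothetical outline, not Evolution Robotics' code: the `match_landmark` and `compose` callbacks and the `drift_threshold` value are assumed placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

@dataclass
class Landmark:
    image: Any          # I_i: snapshot of the scene when the landmark was created
    pose: Any           # S_i: estimated robot pose at which the snapshot was taken
    last_seen: int = 0  # bookkeeping for pruning landmarks that are never matched again

@dataclass
class VisualMap:
    landmarks: List[Landmark] = field(default_factory=list)

def vslam_step(vmap: VisualMap,
               current_image: Any,
               odometry_pose: Any,
               drift: float,
               step: int,
               match_landmark: Callable[[Any, List[Landmark]], Optional[Tuple[Landmark, Any]]],
               compose: Callable[[Any, Any], Any],
               drift_threshold: float = 0.5) -> Any:
    """One iteration of the update loop described above (illustrative only).

    match_landmark(image, landmarks) returns (landmark, relative_pose) or None;
    compose(landmark_pose, relative_pose) returns the corrected robot pose.
    """
    match = match_landmark(current_image, vmap.landmarks)
    if match is not None:
        landmark, relative_pose = match
        # A known landmark is recognized: correct the robot pose from the
        # landmark position (a full system would also refine landmark.pose
        # from the corrected robot pose, closing the feedback loop).
        landmark.last_seen = step
        return compose(landmark.pose, relative_pose)
    # No landmark recognized: fall back on odometry, and create a new
    # landmark only once the accumulated odometry drift grows too large.
    if drift > drift_threshold:
        vmap.landmarks.append(Landmark(current_image, odometry_pose, step))
    return odometry_pose
```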

Figure 2 The figure shows the path and the visual map generated by vSLAM in a two-bedroom apartment. The dotted yellow path represents the actual path that the robot traversed. The red path is the one generated by vSLAM. The green path is the one generated by wheel odometry. The blue circles are the positions of the landmarks generated by vSLAM. The layout of the apartment was manually superimposed on the visual map as a point of reference; this layout was not generated by vSLAM.

Figure 2 shows the results of vSLAM from a run in a two-bedroom apartment. The robot was driven along the dotted reference path (this path is unknown to the SLAM algorithm). The vSLAM algorithm builds a map consisting of landmarks, marked with blue circles in the figure. The corrected robot path, which uses a combination of visual landmarks and odometry, provides a robust and accurate position determination for the robot, as seen by the red path in the figure. Clearly, the green path (odometry only) is incorrect since, according to this path, the robot seems to be traversing through walls and furniture. The red path (the vSLAM-corrected path), on the other hand, consistently follows the reference path.

A disadvantage of vSLAM is that it does not produce a 2D floor map of the environment as traditional approaches do. 2D Cartesian maps give a nice layout of the building, are easy for most users to interpret, and provide a good user interface. The maps that vSLAM produces consist of the landmarks (see Figure 2) and may not present information about the topology of the environment. Two drawbacks of this map representation are 1) it is difficult for users to interpret and hence provides a poor user interface, and 2) it is not suitable for most path-planning algorithms. Most path-planning algorithms use a 2D representation of the environment and generate a shortest or safest path from the robot's current position to a destination point. It is conceivable, however, to envision a useful map generated by augmenting vSLAM's visual map with thumbnails that show pictures of locations such as the kitchen, etc. This kind of map can potentially be more powerful than a plain 2D map of obstacles. Additionally, it would be rather trivial to augment vSLAM's visual map with range measurements. Since vSLAM already provides accurate robot pose estimates, it can easily support integration of range measurements obtained by the robot. Another disadvantage of not having a Cartesian map is that path planning cannot be performed using conventional algorithms such as potential field methods, Voronoi maps, etc. However, the landmarks in vSLAM's map provide rich topological information about the connectivity of adjacent landmarks, with free space in between. The information encoded in this topological representation can be used for topological path planning in a straightforward manner. This assumes that we have a local obstacle avoidance module, which will handle local obstacles not represented in the topological map.
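As an illustration of how such topological path planning could look, the sketch below runs a breadth-first search over a landmark connectivity graph. The adjacency structure is an assumption (for example, landmarks created consecutively along the robot's path, with free space between them, could be treated as connected); it is not a description of an existing vSLAM planner.

```python
from collections import deque

def shortest_landmark_route(adjacency, start_id, goal_id):
    """Breadth-first search over the landmark connectivity graph.

    adjacency maps a landmark id to the ids of landmarks reachable through
    free space. Returns the list of landmark ids to traverse, or None if the
    two landmarks are not connected.
    """
    frontier = deque([start_id])
    came_from = {start_id: None}
    while frontier:
        current = frontier.popleft()
        if current == goal_id:
            route = []
            while current is not None:
                route.append(current)
                current = came_from[current]
            return list(reversed(route))
        for neighbor in adjacency.get(current, ()):
            if neighbor not in came_from:
                came_from[neighbor] = current
                frontier.append(neighbor)
    return None

# Example: a small hand-made graph standing in for a vSLAM landmark map.
graph = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
print(shortest_landmark_route(graph, 0, 3))   # -> [0, 1, 2, 3]
```

A local obstacle avoidance module, as noted above, would still be responsible for steering between consecutive landmarks on the returned route.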

Conclusions
The vSLAM algorithm described in this work provides a cost-efficient solution to localization and mapping, which is an enabling breakthrough for robotics and in particular for consumer robotics. The cost improvement comes from utilizing a 50USD camera instead of a 5000USD laser range finder. We do acknowledge that range-based approaches to SLAM can use other range devices, such as ultrasonic sensors, that are much less expensive than a laser range finder. However, each sonar device costs on the order of 15-25USD, and these approaches rely on using a ring of 16 or more sonars, which brings the cost of sonar-based SLAM to about 250-400USD. Furthermore, the results accomplished by sonar-based SLAM are much less significant in terms of robustness and overall performance compared to SLAM using laser range finders. Due to the richness of the information contained in images compared to range data, vSLAM is likely to generalize to more complex environments than is achievable with range-based SLAM. On the other hand, vSLAM will not work in environments that have no or very few visual features. Such environments could consist of plain/white corridors and the like. This will, however, become a problem only if the corridors are so long that the robot can no longer rely on odometry until it sees a scene with features to correct its drift. It is likely that in most environments vSLAM will find adequate visual features generated by objects such as pictures on the walls, furniture, decorative items, doorways, etc.

Without having completed an exhaustive comparison of the competing methods, our experimental results show that vSLAM produces much more reliable performance than laser-based SLAM techniques, let alone sonar-based SLAM. These claims must be supported by a comparative study. At this point, however, such a study would only be of academic value, since vSLAM is the only approach that provides a cost-effective solution acceptable for most consumer robotics products. Future improvements in range-based SLAM and sensor technology may provide a superior and more cost-effective solution. However, it is hard to conceive of an alternative sensor for SLAM that is less costly than cameras based on CMOS or CCD imaging technologies.

Future Work
Although vSLAM provides a breakthrough cost reduction in the sensor requirements for performing localization, it does not address the issue of low-end, low-cost processing constraints. Due to cost and power constraints, consumer robotics products cannot afford the expense of high-end, Pentium-class onboard computing. Since vSLAM is based on image processing, it is rather computation intensive. To open the door for lower-cost applications, we are planning to optimize the algorithms to run on low-end, embedded computing platforms that may only have integer processing units and significantly lower computation power. Vision is a versatile and powerful sensor which can enable many useful capabilities for robotics and other products. In particular, vision is key to consumer robotics applications. Hence, one important challenge for the robotics community is to develop technologies for affordable processing of visual data using low-cost, low-power computing platforms.
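Running vision algorithms on an integer-only processor typically means replacing floating-point arithmetic with fixed-point arithmetic. The snippet below is a generic illustration of that idea using an assumed Q16.16 format; it is not drawn from the planned embedded port.

```python
# Illustration only: Q16.16 fixed-point arithmetic of the kind an
# integer-only embedded port might rely on.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Encode a real number as a 32-bit integer with 16 fractional bits."""
    return int(round(x * ONE))

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the raw product carries 32 fractional bits,
    so shift back down to restore the Q16.16 scale."""
    return (a * b) >> FRAC_BITS

def to_float(a: int) -> float:
    return a / ONE

# e.g. scaling an image-coordinate offset by a calibration factor:
offset = to_fixed(12.5)
scale = to_fixed(0.033)
print(to_float(fixed_mul(offset, scale)))   # approximately 0.4125
```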

Acknowledgements
The work described in this article has been conducted at Evolution Robotics, Inc. and represents proprietary intellectual property of the company. The authors would like to thank all the employees and advisors of Evolution Robotics, Inc. and Idealab! Inc. for their support. In particular, we would like to thank and acknowledge the contributions of X-Paul and Alberick.

References
1. Roumeliotis, S.I. and Bekey, G.A. Bayesian estimation and Kalman filtering: A unified framework for mobile robot localization. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 2985-2992, San Francisco, CA, 2000.

2. Wolf, J., Burgard, W. and Burkhardt, H. Robust vision-based localization for mobile robots using an image retrieval system based on invariant features. Proceedings of the 2002 IEEE International Conference on Robotics and Automation, Washington, DC, USA, May 2002, pages 359-363.

3. Thrun, S. Probabilistic algorithms in robotics. Technical Report CMU-CS-00-126, Carnegie Mellon University, Pittsburgh, PA, USA, April 2000.

4. Thrun, S. Robotic mapping: A survey. Technical Report CMU-CS-02-111, Carnegie Mellon University, Pittsburgh, PA, USA, February 2002.

5. Fox, D., Thrun, S., Burgard, W., and Dellaert, F. Particle filters for mobile robot localization. In: Doucet, A., de Freitas, N., and Gordon, N. (eds.), Sequential Monte Carlo Methods in Practice. Springer Verlag, New York, 2001, pages 401-428.

6. Fox, D., Burgard, W., and Thrun, S. Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research 11, 1999, pages 391-427.
