
engineering · technology

One Eye on the World

A group of Stanford scientists develops a new algorithm for robot vision

by Wenqi Shao

Imagine future expressways in the sky and on the ground whizzing with robots. For now you'll only find this in science fiction, as robots today are too clumsy to maneuver around obstacles at high speeds because they have trouble judging depth. A group of Stanford computer scientists led by Professor Andrew Ng, however, could make this a reality. The team has developed a novel algorithm to improve vision processing by robots.

The Vision Algorithm

The imaging algorithm, developed by graduate students Ashutosh Saxena and Sung H. Chung and Professor Ng, improves upon traditional algorithms by combining two concepts: monocular vision (seeing with a single eye) and prior knowledge (a process of supervised learning also present in humans).

The robot's "eye" is a single camera that captures a set of images from the surrounding environment. The depth from the camera to each pixel is recorded in a database called a depthmap. Cues such as texture variations, edges, object size, and haze are used to determine the depths at individual points and the relations between depths at different points.

Unlike traditional algorithms, the novel algorithm relies heavily on stored knowledge from previously encountered images. Once captured, an image is divided into smaller sections called patches. The depth of each patch is analyzed both individually and in a global-image context: each patch uses information from its four neighbors at three different size scales and from its own location in the image. The algorithm interprets the patches in the following manner: more detailed surfaces are closer; converging edges indicate greater distance; smaller objects are farther away; and haze signals greater distance. Through this process, the features in the image are used to determine 3-D depths.

Image: A depthmap for an image patch, which includes features from its immediate neighbors, its more distant neighbors (at larger scales), and its corresponding column. (Credit: Ashutosh Saxena)

Testing for Robot Vision

In an initial study, Saxena, Chung, and Ng created a depthmap database using a 3-D laser scanner to collect 425 images from a variety of environments, including campus areas, forests, and indoor spaces. This database enabled the robot to learn to judge distances as it captured new images.

In a study done by Ng's team, robots were able to judge distances in both indoor and outdoor locations with an average error of 35%, meaning that a robot could determine the distance of an object 100 feet away as if it were between 65 and 135 feet away. The highest depth error occurred in images dominated by irregular leaves and branches; however, even human performance and judgment on these images would probably be poor. The level of accuracy demonstrated by the study is sufficient for a robot refreshing its viewed images at ten frames per second and moving at 20 mph to adjust its path and avoid obstacles.

The monocular vision algorithm was implemented in an automatic robot car, measuring 2 feet by 2.5 feet by 1 foot, driving at 11 mph. In the Stanford sculpture garden, a high-density obstacle environment filled with sculptures, trees, bushes, and rocks, the robot vehicle was able to self-navigate for up to one minute before crashing. On terrain with fewer obstacles, such as a parking lot with trees, the robot was able to navigate with only camera input for approximately two to three minutes.

Image: The automatic robot car used to test the monocular vision algorithm. (Credit: Ashutosh Saxena)

Seeing the Future

While initial trials have demonstrated the success of the monocular vision algorithm, a remaining challenge is to reduce its reliance on extensive prior knowledge of the surroundings. The robot's operational time in a random outdoor environment, without prior knowledge, is approximately five seconds. Thus, although the robot would perform fairly well in Palo Alto or another familiar setting, it would perform poorly if placed in an unfamiliar environment such as the surface of Mars. Ideally, images from the internet or other outside sources could be downloaded to the robot to enhance its prior knowledge.

The monocular vision algorithm is just the beginning of exciting developments in visual processing. Saxena, Chung, and Ng hope to generalize the machine vision algorithm so that it can be applied in instruments and procedures beyond driving a robot-controlled car. Their work provides a glimpse of what the future may hold for artificial vision.

Image: Depthmap results for a varied set of environments, showing the original image (column 1), actual depthmaps (column 2), and depthmaps predicted by the models (column 3). (Credit: Ashutosh Saxena)

Wenqi Shao is a freshman double majoring in Math and Computational Science and Economics. She is also an officer in the Forum for American-Chinese Exchange at Stanford (FACES). In her free time, she enjoys tennis, piano, reading, and traveling.
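For readers curious how a supervised, patch-based depth estimator of the kind described above can work, the sketch below trains on images paired with depthmaps and predicts depth per patch. It is a drastically simplified illustration, not the team's implementation: the mean/variance features, the use of multi-scale windows in place of the actual four-neighbor features, the plain ridge regression model, and all function names are assumptions for demonstration only.

```python
import numpy as np

def patch_features(img, r, c, patch=8, scales=(1, 3, 9)):
    """Summarize the patch at (r, c) at several size scales.

    For each scale, average a (scale * patch)-sized window centred on the
    patch and record its mean intensity and variance -- a crude stand-in
    for the texture and edge cues the article describes.
    """
    feats = []
    for s in scales:
        h = s * patch // 2
        win = img[max(0, r - h):r + h + 1, max(0, c - h):c + h + 1]
        feats += [win.mean(), win.var()]
    # Row position is itself a depth cue: lower rows tend to be nearer.
    feats.append(r / img.shape[0])
    return np.array(feats)

def fit_depth_model(images, depthmaps, step=8, lam=1e-3):
    """Ridge-regress patch depth onto patch features (supervised learning)."""
    X, y = [], []
    for img, dm in zip(images, depthmaps):
        for r in range(step // 2, img.shape[0], step):
            for c in range(step // 2, img.shape[1], step):
                X.append(patch_features(img, r, c, patch=step))
                y.append(dm[r, c])
    X, y = np.array(X), np.array(y)
    # Closed-form ridge solution: (X'X + lam*I) w = X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def predict_depth(img, w, step=8):
    """Predict a coarse depthmap (one depth value per patch) for a new image."""
    rows = range(step // 2, img.shape[0], step)
    cols = range(step // 2, img.shape[1], step)
    return np.array([[patch_features(img, r, c, patch=step) @ w for c in cols]
                     for r in rows])
```

In this toy form, the laser-scanned training set plays the role of the 425-image depthmap database: the model only learns to judge distance in new images because it has seen image/depth pairs before, which is exactly the prior-knowledge dependence the article discusses.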
stanford scientific | layout design: Wenqi Shao
