119 vues

Transféré par sharukj khan

report on obstacle detection....

- Kinema Tic 1
- Funciones Biblioteca MatLab
- Referencia a Partir de Fotos
- Kyto_MethodForMeasuringStereoCameraDepthAccuracy
- Exercise
- Psychtoolbox3
- curve fitting in Matlab
- 20. Eng-SSCE Secure Communication Based on a Chaotic-P.karthik
- Strategy Insight 2013_0517 - 17 May 2013
- Hout Perception
- lec4.pdf
- Airborne LIDAR and GIS Tools in Transmission Line Re-Rating Projects
- Function Varargout
- A Survey of Machine Vision Techniques for Fruit Sorting and Grading
- controls mannual.docx
- MATLAB Desktop Keyboard Shortcuts
- mathlab7.pdf
- ex1
- Read Elapsed Time From Stopwatch - MATLAB Toc - MathWorks India
- 06147421

Vous êtes sur la page 1sur 56

Abstract

Perceiving the surroundings accurately and quickly is one of the most essential and

challenging tasks for autonomous systems such as self-driving cars. The most common sensing

systems such as RADAR and LIDAR that are used for perception on self-driving cars today give

a full 360 view to the car making it more informed about the environment than a human driver.

This article presents a methodology to employ two 360 cameras to perceive obstacles all around

the autonomous vehicle using stereo vision. Using vertical, rather than horizontal, camera

displacement allows the computation of depth information in all viewing directions, except

zenith and nadir which have the least useful information about the obstacles. The Key idea for

obstacle detection is to classify points in the 3D space based on height, width and traversable

slope relative to the neighboring points. The detected obstacle points can be mapped onto

convenient projection planes for motion planning.

CHAPTER-1

INTRODUCTION

Computer vision is one of the toughest problems in Artificial Intelligence (AI) that has been

challenging researchers and engineers for decades. This problem of extracting useful information

from images and videos finds application in a variety of fields such as robotics, remote sensing,

virtual reality, industrial automation, etc. The concept of making cars drive by themselves has

gained immense popularity today in the AI world, mainly motivated by the number of accidents

that occur due to driver errors/ negligence. Autonomous vehicles are designed to sense their

surroundings with techniques such as RADAR, LIDAR, GPS and computer vision. This array of

sensors working coherently to observe and record the surroundings constitute the perception

module of the vehicle. The next stage in the pipeline is the localization step which stitches

together the incomplete and disconnected information obtained from the sensors to identify the

position, velocity and other states of the vehicle and the obstacles (including dynamic obstacles).

The final module is the planning stage where the vehicle has to decide what it has to do given the

situation it is in. The present day research prototypes built by major players in the industry/

academia have LIDAR and RADAR as their primary perception systems. They generally provide

a very accurate full 360 view to the vehicle making it more informed about the environment than

a normal human driver. The downside to these systems is the cost involved in deploying them.

So an option is to use cameras and computer vision techniques to substitute these systems.

It has been shown in several cases that stereoscopic vision can be applied to extract useful

information about the surroundings that could assist the navigation of mobile robots [1], [2], [3].

But in most cases the field of view is limited to just the front of the robot. It is very critical that

the vision system we are targeting doesnt compromise on obtaining the 360 view provided by

LIDAR / RADAR systems. This can be achieved by employing special cameras that capture 360

by 180 image of a scene.

RELATED WORK

It is very essential for an autonomous vehicle to accurately and reliably perceive and

discriminate obstacles in the environment. To this end, many approaches have been presented for

different application areas and scenarios in past years using stereo vision or 2D/3D sensor

technologies. Each obstacle detection system is focused on a specific tessellation or clustering

strategy, hence they have been categorized into 4 main models [4]: (i) probabilistic occupancy

map, (ii) digital elevation map, (iii) scene flow segmentation and (iv) geometry-based clusters.

In probabilistic occupancy maps, the world is represented as a rigid grid of cells containing a

random variable whose outcome can be free, occupied, or undefined (not mapped) [5], [6]. The

goal here is to compute the associated joint distribution depending on a set of measurements

carried out on a certain discrete set of time moments. Digital elevation maps (DEM) is one of the

algorithms that try to detect obstacles relying on the fact that they protrude up from a dominant

ground surface. The obstacle detection algorithm proposed in [7] marks DEM cells as road or

obstacles, using the density of 3D points as a criterion. It also involves fitting a surface model to

the road surface. The scene flow segmentation or otherwise called as optical flow utilizes

temporal correlation between different frames of a scene captured by stereo cameras to classify

obstacles that are in motion [8], [9], [10]. This method thus naturally handles tracking dynamic

obstacles. Finally the geometry-based clustering involves classification based on geometric

structure of point clouds in the 3D space. The obstacle detection algorithm that will best suit this

category is [11] which is based on a search method that clusters points using a double cone

model. This algorithm became the basis for the obstacle detection module that went on the

The approach we present in this report is inspired by the method in [11]. Given our objective

of obtaining a 360 view, we present a few modifications to this algorithm that will use 360 stereo

pair to obtain obstacle information. The remainder of this report is structured as follows. Section

3 talks about the equipment setup that would best suit for this application. Section 4 describes the

disparity and depth estimation technique used. Section 5 talks about the pixel-level

implementation details of the obstacle detection algorithm. Section 6 discusses the experimental

results on a few test scenes. Section 7 outlines possible next steps from this project.

Reliably and accurately detecting obstacles is one of the core problems that need to be solved

to enable autonomous navigation for robots and vehicles. For many use cases such as micro

aerial vehicles (MAVs) or self-driving cars, obstacle detection approaches need to run in (near)

real-time so that evasive actions can be performed. At the same time, solutions to the obstacle

detection problem are often restricted by the type of vehicle and the available resources. For

example, a MAV has restricted computational capabilities and can carry only a certain payload

while car manufacturers are interested in using sensors already built into series vehicles in order

to keep self-driving cars affordable.

There essentially exist two approaches for obstacle de-tection. Active methods use sensors

such as laser scanners, time-of-flight, structured light or ultrasound to search for obstacles. In

contrast, passive methods try to detect obstacles based on passive measurements of the scene,

e.g., in camera images. They have the advantage that they work over a wide range of weather and

lighting conditions, offer a high resolution, and that cameras are cheap. At the same time, a wide

field of view can be covered using for example fisheye cameras. In this paper, we therefore

present an obstacle detection system for static objects based on camera images. We use stereo

vision techniques [1], [2], [3] to obtain a 3D model of the scene. Our main motivation is to

enable self-driving cars to detect static objects such as parked cars and signposts, determine the

amount of free space around them, and measure the distance between obstacles, e.g., to

determine the size of an empty parking spot. We therefore detect obstacles as objects obtruding

from the ground [4]. Most existing stereo vision-based techniques rely on classical forward

facing binocular stereo cameras with a relatively narrow field of view and visual (inertial)

odometry (VIO) systems to provide accurate vehicle poses. These systems are mainly targeted

for detecting objects which are in front of the car and are therefore used in standard on road

forward driving situations. For many maneuvers such as parking into a parking spot or

navigating in a narrow parking garage, a full surround view is very important. We show that for

such situations accurate obstacle detections can be obtained from a system that uses only

monocular fisheye cameras and the less accurate poses provided from the wheel odometry of the

car, if the noisy individual detections are properly fused over time. The resulting system does not

require complex VIO systems, but simply exploits information already available while running in

real-time on our test vehicle, we thus avoid any unnecessary delay a VIO system might

introduce.

This paper makes the following contributions: We describe the overall obstacle detection

system and explain each part in detail, highlighting the rationale behind our design decisions. We

demonstrate experimentally that highly precise vehicle poses are not required for accurate

obstacle detection and show that a proper fusion of individual measurements can compensate for

pose errors. Self-driving cars are currently a very active field of research and we believe that the

proposed system and our results will be of interest to a significant part of researchers working in

this field. To our knowledge, ours is the first system that uses monocular fisheye cameras and

only relies on the wheel odometry.

The paper is structured as follows: The remainder of this section discusses related work. Sec. II

provides an overview over both our vehicle setup and our obstacle detection system. Sec. III

explains the computation of the depth maps. Obstacle extraction is described in Sec. IV, while

Sec. V details how to fuse detections from multiple depth maps. Sec. VI experimentally

evaluates the proposed method.

Fig. 1. Overview over the obstacle detection system proposed in this paper.

height map [4], [9], [13], or a volumetric data structure [14] into which the depth measurements

are fused. Occupancy grids are probably the most popular scene representation as they not only

provide information about the positions of the occupied space but also about free space [15].

In the case of ground-bound vehicles, e.g., cars moving on planar surfaces, obstacles

correspond to objects obtrud-ing from the ground: [9] compute a height profile from stereo

measurements and subsequently estimate for each 2D occupancy grid cell whether it corresponds

to free space or occupied space. While [9] use a probabilistic model, [16] employ dynamic

programming on the grid cells to determine free space. Given the free space, stixels can be used

to compactly represent obstacles as boxes standing on the ground [4]. Overhanging structures

and archways can be handled by allowing multiple height map layers [13].

In this paper, we follow a similar approach and model ob-stacles as objects extruding from the

ground. While existing work requires high-precision sensors [16] or visual odometry methods to

obtain precise vehicle poses before fusing the depth data [9], [17], we show that using the less

accurate wheel odometry readily available in every car is sufficient.

The disadvantage of most stereo cameras is their limited field of view (FOV). Thus, car

manufacturers are beginning to additionally integrate wide FOV [7] or fisheye cameras [18] to

cover the entire scene surrounding the vehicle. The extremely wide FOV of fisheye cameras

enables them to also observe objects close to the vehicle, i.e., they are well-suited for obstacle

detections. In this paper, we thus use such cameras and show experimentally that we are able to

accurately detect obstacles through motion stereo. [19] use motion stereo, based on very accurate

poses from GPS/INS measurements, to generate large-scale urban reconstructions by estimating

the 3D geometry of the scene. In contrast, our approach determines which parts of the

environment are occupied by obstacles and which are free space.

As illustrated in Fig. 1, our obstacle detection framework consists of three main stages. First,

we extract a depth map for each camera mounted on the car using multi-view stereo matching on

a sequence of camera frames recorded by the moving car. The camera poses required for this step

are directly obtained from the wheel odometry and the extrinsic calibration of the cameras. No

visual odometry system is employed to refine the poses. The depth maps provide a 3D

reconstruction of the surroundings of the car but do not offer any information about which

structures are obstacles that need to be avoided and which parts correspond to free space that the

car can move through. In the second stage, we thus detect and extract both obstacles and free

space for each individual depth map. Since obstacles are objects protruding from the ground,

obstacle detection is performed in 2D.

One of the shortcomings of using wheel odometry is that the rotation of the wheels is only

discretely sampled, which leads to a slight oscillation around the true traveled distance. This is

not a problem in terms of determining the position of the car but leads to uncertainty for the

estimated depth maps due to a slightly inaccurate baseline in the stereo matching. Therefore, the

third stage fuses the obstacle detections over several camera frames to obtain a more accurate

estimation of the occupied space around the car. Depending on the use case, the fusion can be

done independently per camera or as a single fusion that combines data from multiple cameras.

We integrated our obstacle detection system into two VW Golf VI cars that are equipped with a

system of four fisheye cameras mounted on the car with minimally overlapping FOVs. The

cameras are installed in the side mirrors as well as in the front and back of the car and each have

a 185 FOV, jointly covering the whole field of view around the car. They record at a frame rate

of 12.5Hz and are triggered syn-chronously. Besides the cameras, we also utilize the wheel

odometry of the car, which provides a 3 degrees-of-freedom pose of the car on the ground plane

by measuring the rotation of the wheels. The cameras are calibrated with respect to the odometry

frame using a simultaneous localization and mapping-based calibration pipeline [20]. The cars

are the test platforms of the V-Charge project [18], whose aims are fully automated navigation

and parking. As such they are equipped with other environment perception sensors which are not

used for our system. However, since the output of our method is simply a set of occupied and

free space estimations, our results can easily be fused into a combined obstacle map together

with similar data from other sensors. All used sensors are either already built into todays cars or

close-to-market. For the processing of the sensor data and to perform the navigational tasks

required for automated driving, the cars have a cluster of 6 PCs installed in the trunk.

Fig. 2. Results of our depth map computation procedure. (Left) image of the reference view, the

depth maps computed (middle) without ground direction plane-sweep and (right) the complete

depth map computation procedure including the ground directions. The depth maps are coloured

with respect to height above ground instead of depth (color pattern repeating every 0:5 meters),

to illustrate the quality of the ground. Without the ground direction plane-sweep, the ground is

reconstructed rather bumpy (cf. middle). In contrast, our complete procedure obtains a much

smoother reconstruction. motion blur due to the close proximity to the car. To reduce noise on

the areas where the ground plane is visible in the images, we introduce a two stage approach

which first checks if the ground plane matches sufficiently well in the images and only

reconstructs other structures if a matching patch is unlikely to belong to the ground plane.

Standard binocular stereo matching takes two stereo rec-tified images as an input and

computes a depth map for one of the two images. In our case, images are recorded with a

continuous frame rate. This means we have the option to match more than one image to a single

reference image to increase the quality of the computed depth maps by increasing the baseline.

For general camera configurations, it is not possible to stereo rectify a set of more than two

images. To avoid this problem, plane-sweeping stereo [1] is usually used since it does not require

any rectification. The main idea of plane-sweeping is to match a set of images to a reference

image by projecting them onto a plane hypothesis and then back to the reference image. The

images warped through this procedure and the reference image will then be compared using an

image dissimilarity measure, which is evaluated over a small matching window. If the tested

plane hypothesis is close to the true depth of a pixel in a reference image, the corresponding

dissimilarity value will be low. Testing many plane hypotheses and taking the depth induced by

the best matching plane for each pixel then produces a depth map for the reference image.

Originally, plane-sweeping has been proposed for pinhole images, where the images are warped

into the reference view through a planar homography. In order to cover a wider field of view

with each camera, and thus enable obstacle detection around the car, our setup uses fish eye

cameras. While this increases the complexity of the warping process slightly, depth maps can

still be computed in real-time on a graphics processing unit (GPU) [3].

If the normal direction of the plane hypothesis is not well aligned with the actual surface

direction, the warped image will be locally distorted with respect to the reference image, which

increases the dissimilarity score even for matching patches. This can be overcome by aligning

the sweeping plane directions with the predominant directions in the scene [2]. We therefore

sweep planes in two directions; Since we are interested in detecting obstacles on the ground, one

sweep direction uses planes parallel to the ground plane while the others are fronto-parallel to the

camera. The extrinsic camera calibration, which gives us the height of the cameras with respect

to the surface the car drives on, provides a good initial estimate of the ground plane. Hence, we

only need to test very few planes for the ground direction; we used 10 in our experiments. For

the fronto-parallel sweep, we space the planes inversely proportional to the depth, resulting in an

even sampling in disparity space. For each pixel in the reference image and each plane, we

obtain one dissimilarity score from each image that is matched against the reference image. To

obtain one single cost per plane for each pixel in the reference image, we simply average these

dissimilarity scores. Notice, this means we do no utilize any occlusion handling (c.f. [3]).

Occlusion handling could slightly improve the quality of the depth maps around occlusion

boundaries but this small gain comes with a set of disadvantages. Firstly, occlusion handling

slows down the depth map computation process. Secondly, doing occlusion handling the way it

was initially proposed in [21] would mean that we would compute the depth map for at least one

frame before the newest one available. This would lead to an unnecessary delay of the data. As

we could not observe any significant gains with it and since it comes with the aforementioned

problems, we refrained from using occlusion handling in our experiments. The depth map is

finally extracted as the depth of the plane with minimal cost, utilizing a winner takes all strategy.

The last step of our depth map computation procedure is to combine the potentially incomplete

depth map computed for each sweeping direction. For this, we need to define a bit more formally

the output of the plane-sweeping algorithm. For the sweeps in the ground and fronto-parallel

direction, we obtain a depth map Zg(x; y) and Zf (x; y), respectively. The computation of the

depth maps provides for each pixel the uniqueness of this match. Let C ;2(x; y) be the cost of the

second-best matching plane for pixel (x; y). The uniqueness ratio is then defined as the ratio C

(x; y)=C ;2(x; y). Both of these additional values can be used to filter the depth maps: A low

matching cost implies that the image patches match well. A large uniqueness ratio indicates

ambiguities, e.g., uniformly textured regions or repeating patterns will result in a ratio close to 1.

We use two sets of thresholds for the two sweeping direction T Cg , TUg and TCf , TUf , respectively.

Let

dissimilarity cost and the uniqueness ratio for the ground-parallel and fronto-parallel sweeps,

respectively. The two depth maps are merged as

uses matches on the ground plane. In our experiments, we found this behavior to be desirable as

it leads to a smoother reconstruction of the ground which is important to avoid false positive

obstacle detections.

For our experiments, we use zero mean normalized cross correlation (ZNCC) scores over a 9 9

pixel window for measuring the image similarity. The matching cost C (x; y) are computed as the

negative ZNCC score normalized to the interval [0; 1], i.e., a matching cost of 1 corresponds to a

ZNCC score of 1 and a matching cost of 0 corresponds to a ZNCC score of 1. There are 10

planes used for sweeps parallel to the ground and 50 planes for the fronto-parallel direcion. The

filtering thresholds are set to TUg := 0:9925, TUf := 0:98, TCg := 0:18 and TCf := 0:17. The depth

maps are computed on a resolution of 640 400 pixels and we match two images against the

reference image, meaning 3 images contribute to every depth map. Fig 2 shows the results of our

depth maps computation procedure. The benefit of the additional ground direction sweeping is

mostly visible in the form of a smoother ground plane.

The output of the previous section is a depth map for each camera. In this section, we describe

how to extract obstacles from a single depth map. Following [4], [9], we make the simplifying

assumption that the car is moving on a plane, i.e., every object not lying on the plane is a

potential obstacle, and perform obstacle detection in 2D.

Obstacle detection consists of three steps. Given a depth map, we first create a 2D occupancy

grid that encodes which cells contain depth measurements. From this grid, we then extract

obstacles from the occupied cells and refine their positions to avoid discretization artifacts.

Finally, we estimate the uncertainty of each detection, which will be used in Sec. V to fuse

detections from multiple depth maps.

We define the local coordinate frame of the car to coincide with the vehicle odometry frame. It

has its origin on the ground plane at the point where the middle of the rear axle orthogonally

projects to the ground plane. The x-axis is pointing towards the front of the car, the y-axis to the

left and the z-axis is pointing vertically away from the ground (cf. Fig. 3). Due to this choice of

coordinate system, we can directly infer the position of the ground for each camera from the

known extrinsic calibration of the cars camera system.

For each camera, we define a separate occupancy grid that covers the relevant part of the area

observed by the camera.

Fig. 3. Illustration of the coordinate systems used in the paper. The x-,

y-, and z-axes are colored in red, green, and blue, respectively

The grid is placed in the ground plane, i.e., coincides with the x-y plane of the vehicle frame. The

cameras are mounted such that their x-axes are parallel to the ground. Thus, the x-and y-axes of

the occupancy grid are chosen to be parallel to the projections of the x- and z-axes of the camera

onto the ground plane. As illustrated in Fig. 3, the origin of the occupancy grid is the projection

of the camera center codo, expressed in the vehicle frame, to the x-y plane. We express

coordinates in the occupancy map by an (angle, disparity) pair (d; ), where disparity d = 1=y grid

corresponds to the inverse depth of the measurement.

We use disparities instead of the original y-coordinates since the depth maps are already

computed based on dispari-ties by spacing the planes inversely-proportional to the depth. As a

consequence, the occupancy grid has a higher resolution very close to the camera, which leads to

an unnecessary high grid resolution and thus memory and time consumption. In order to

guarantee a minimal cell size in depth direction we virtually shift the grid by y shift. The definition

of the final occupancy grid coordinates (dgrid; ) can now be stated in terms of the Cartesian

grid

The occupancy grid coordinates (dgrid; grid) are only de-fined for Cartesian coordinates with ygrid >

0. Notice that the resulting grid has the shape of an isosceles trapezoid, similar to the polar grid

in [16].

odo

Let Rcam 2 R3 3

and odo

tcam 2 R3 1

be the rotation and translation that transform a depth

measurement xcam from the local camera coordinate system into the vehicle frame, i.e., x odo =

odo odo

Rcamxcam + tcam. As shown in Fig. 3, the corresponding grid cell is obtained by projecting x odo

codo onto the x-y plane of the grid frame, computing the angle and disparity, and performing the

shift by yshift.

For each grid cell (d; ), we store the number F (d; ) of points from the depth map that vote for

free space in this specific cell, the number O(d; ) of points that vote for occupied space, and

additionally the average disparity dgrid of the points voting for occupied space within a given cell.

If the vehicle frame position xodo of a depth measurement is close to (or below in case of errors in

the depth map) the ground, it votes for free space in its corresponding cell. If x odo lies above the

ground plane up to a maximum height, it votes for occupied space in its cell and the

corresponding average disparity value is updated. The maximum height

Fig. 4. Results of our obstacle detection approach: (Top row) Two input images. (Bottom row)

Obstacles detected in three different frames, projected to the ground plane. The red dots denote

the obstacle position and the cyan lines indicate the uncertainty area along the ray. The last image

contains all obstacles extracted in an interval of 1 seconds around the corresponding frame and

shows that simply adding individual detections produces significant noise.

each grid cell, we aim to determine whether it corresponds to free space or is occupied, i.e.,

contains an obstacle. In robotics, this is often achieved in terms of an occupancy grid using an

inverse sensor model [10]. We follow a different approach which originates from the fusion of

laser point clouds [24] and has since then been used successfully in many computer vision

systems [25], [26], [27]. Its main idea is to follow each ray originating from a camera center until

it hits the first obstacle. For all grid cells traversed by the ray, a negative weight is added to

denote free space. At the measurement, the ray enters the object and hence the space behind the

measurement should be occupied, which is denoted by adding a positive weight. Since we do not

know the thickness of each object, we only enter positive weights for a small region behind the

obstacle. After adding the measurements for all cameras that should be considered, a cell having

a negative accumulated weight is then considered as free space and a cell with a positive weight

as occupied space. For cells that have a weight of exactly zero, the space is considered as

unobserved.

For real-time processing, we need to limit the size of the grid. We achieve this by only

updating a 10 10 meter area which is placed such that it covers the area observed in the current

camera frame. Which part of the fusion grid needs to be updated is determined based on the

wheel odometry reading. For each cell inside the viewing area of the current camera frame, we

determine the obstacle which lies on the ray from the camera center to the grid cell. The weight

added to the current grid cell is defined by comparing the distance l obst of the obstacle to the

camera with the distance lcell between the cell and the camera. As explained in Sec. IV-B, there

are two types of obstacles, conventional obstacles and the free space obstacles placed at the

end of the definitively visible space along a ray. Consequently, two different types of weights are

used. The free space weight wf is used for both types while the obstacle weight wo is only

employed for the standard obstacles. For both types of obstacles, an

Fig. 6. Fusion results for an indoor sequence. Reflections on the ground cause erroneous

detections while backlight complicates precise depth esti-mation. Still, our system is able to

outline the positions of the obstacles.

uncertainty interval (lobst u1; lobst + u2) along the ray is given and we distribute the weight of the

obstacle along the interval. Thus, obstacles that are measured with low accuracy will only

contribute little weight to each grid cell.

The free space weight for the cells along the ray is

where the constant k 0 is added only to cells in front of the obstacle. The obstacle weight w o

should only be non-zero in the uncertainty regions and is thus defined as

Additionally, we impose a minimal uncertainty area to make sure that the weight is spread at

least into a few cells in the grid. We also discard measurements which have a very large

uncertainty interval of more than 4 meters.

All grid cells are initialized with weight zero and are updated whenever new obstacle

detections are available by adding the respective weight to the grid cells. The update is

constantly performed in a separate compute thread con-current to depth map generation and

obstacle detection. The area of the grid that is updated is set to 400 400 cells, resulting in a 2:5cm

resolution for the 10m 10m area. The constant for free space is set to k = 4. Figures 5 and 6

depict results of the fusion approach, where free space is colored green, occupied space is red,

and unobserved space is black.

Car accidents are the key problem of todays traffic world. Particularly this strongly felt in

urban areas where cars density is largest. According to Lithuanian 2011 year car accidents

statistic most common type of car accidents are collisions. It represents 44% of total car

accidents. Accidents happened with pedestrians and obstacles also could be assigning to similar

car accidents type. By summing of everything, we are getting 82% of total accidents [1]. Most of

car accidents happened due to human factor: somnolence, inattention, absence of mind and slow

reaction. All these factors are critical for a driving safety. Using stereo vision system is a way to

reduce number of car accidence. Stereo vision based traffic observation systems are in active

development period. Scientist and many famous car manufacturers are developing durable,

accurate and real time car driver assistance systems. Many of works done by researching and

developing vision system for lane tracking [2], pedestrian or bicyclists detection [3], car

detection [4] or driver monitoring [5].

Fig. 1. Typical structure of stereo vision [6].

The main purpose of using stereo vision system is capability to transform 2D view into 3D

information usually called depth map. Typical visual system consists of two digital video

cameras which are displaced horizontally from one another to obtain different views of a scene,

in a manner similar are used to human binocular vision (Fig. 1). 3D information derived by

finding the corresponding points across multiple cameras images. Correspondences lie to the

same epipolar line. To simplify the matching process, camera images are often rectified. The

result is epipolar line corresponds on image scan line. Before image rectification system has to

meet these requirements: cameras image planes are parallel, focal points height and lengths are

the same. To get a depth map a disparity value must be calculated for every image pixel. A

disparity value for a target point p calculated using following expression

High level is the last processing stage where depth and image data are used for object extraction

and ecognition. Further object list is used for front vehicle

Fig. 2. Stereo vision system architecture consist of three data processing levels.

Suggested approach based on dense edge map calculation and ROI filter. Fig. 3 illustrates a

block diagram of vision system data processing. Totally eight steps sequence to extract and

mark car from 2D image set. Car extraction process starts from capturing of view images.

Next, captured 2D images processed in the middle data processing level. Here 2D data

transformed into 3D information, well known as depth map. ROI filter eliminates redundant

3D information. The rest of data is processed by dense edge map estimation algorithm.

Region of Interest filter is used to eliminate redundant 3D information. ROI can be represented

as a frustum in a 3D environment (Fig. 4). According to fact that system designed to observe

front vehicles a frustum is projected on a roadway and covers one or more drive lines. All data

outside this space are eliminated leaving only significant information. The main benefit of

applying ROI filter is saved extra computation resources for further object extraction processes.

Fig. 4. Illustration of region of interest filtering.

Edge dense map estimation basically is a depth maps transformation from the front view to the

top view perspective. A value of every new map element calculated using following expression:

Where

where A is 3D data array; B is dense edge data array, i{0,1..h-1}, h is image height,

j{0,1..w-1}, w is image width, c is a factor calculated

This transformation is modified following the idea to analyze not whole object but only their

edges. The result of this transformation is an edges map in a perspective of top view. Fig. 5

depicts the transformation principle in a 3D environment. The most density of points is located

on objects edges. Edges are highlighted by summing points in a vertical direction. The more

objects edge is higher and sharper the more points will be counted in a vertical direction. Flat

and low altitude objects have a small amount of points; therefore, objects like a road can be

easily eliminated using simple threshold.

A virtual 3D computer reality was chosen as very first vision system and approached car

extraction algorithm test environment. Specially designed software allows simulating stereo

vision system behavior in a virtual 3D reality. Software based on OpenGL and OpenCV

programming libraries and is written in a C++ language. Software runs on Core 2 Duo 2.53GHz

CPU and 4 GB memory of RAM. Similar kind of test method has been used by other authors [9]

and [10] to test stereo vision systems for relative applications. In order that tested situations

would be more realistic, a segment of street with all following infrastructure: cars, road signs,

buildings, pedestrians were designed and rendered in a virtual 3D reality. In this test stage a

depth map was generated using OpenGL graphic API methods. This allows getting the best depth

map quality. Depth map can be estimated and using any kind of stereo matching algorithm.

Simulation software allows imitating stereo camera view and capture left and right camera

images. Estimated depth map can be represented as a point cloud converting 3D data to real

word coordinates. Different traffic situations, vision systems configurations can be emulated and

repeated many times. This is the main advantage of described simulation software. bottom. The

opposite situation presented in the situation no. 2. Detected car marked with a red colored

rectangle. Red color indicates that car is in the unsafe distance and driver must pay attention.

Close objects represent as big blobs located near to dense edge map bottom. The rest two

situations (no. 3, 4) shows system ability detect not only forward cars but and different kind of

obstacles, for instance pedestrians crossing the street. Another aspect of vision system is ability

to detect multi numbers of obstacles.

Fig. 6. Four traffic situation examples simulated using virtual 3D reality. Left image camera

view, right image edge dense map.

CHAPTER-2

PROPOSED METHOD

A pair of cameras are required to implement stereo vision. Since the application of this work is

to replace existing LIDAR-based perception systems, the system developed in this work needs to

detect obstacles all around the vehicle. Two Ricoh Theta 1 cameras were used to capture 360 by

180 spherical panoramas. The Ricoh Theta camera comprises of two opposite-facing 185 fish-

eye lens and sensors. The two images are stitched by the Ricoh application to form a 360 by 180

spherical image.

The two cameras were displaced vertically as this would result in loss of information directly

above and below the cameras, which are areas, not of interest to us. For the purposes of this

work, the camera was mounted on a tripod at one position to capture the first image. A spirit

level was used to ensure that the camera was horizontal. The camera was then vertically

displaced through a known height on the tripod to the second position to capture the second

image.

The spherical images captured by the ricoh cameras are in equirectangular format i.e. they are

uniformly smapled in azimuth and altitude angles. The azimuth variation is from

to along the x-axis of the image and the altitude variation is from =2 to =2 along the y-axis of

the image. The images are of resolution 2048 1024 pixels.

The disparity for the spherical image pair is the change in the altitude angle between the two

images, since the cameras are displaced vertically. There are several techniques to estimate

disparity values such as those outlined in [14], [15], [16]. The procedure adopted in this work

involves estimating the optical flow of each point. The optical flow estimation is done using the

optical flow software2 developed by [17]. As the cameras are vertically displaced, the optical

flow is in the vertical direction. The optical flow at every pixel gives us the disparity value in

number of pixels, at that pixel.

To speed up the disparity estimation step, the original image is down-sampled to 1022 512

pixels. To further speed up the computation, the top and bottom portions of

the image which dont contain important information for the vehicle are chopped out. An altitude

range of 30_ to 30_ is considered for the process. The resolution of the image so obtained is

1023 _ 171 pixels. The image is further down-sampled to 767 _ 129 pixels, on which the rest of

the algorithm is run. The black and white image of the obstacle map obtained as the output of the

obstacle detection algorithm is then up-sampled to 1023 _ 171 pixels.

image is of resolution 767 by 129 pixels with an altitude range of 30 to 30 .

Fig. 4: Disparity map generated from optical flow estimation. Brighter pixels correspond to

greater disparity and so smaller depth. We observe uniform gradient of disparity along the

ground plane, in the bottom half of the image.

Depth estimation The next step is to determine the depth map. This is achieved through some

basic geometry and trigonometric transformations applied to the disparity map.

where l is the baseline distance, __ is the disparity value, _1 is the altitude angle of the point with

respect to the upper camera center and PA is the distance from the point to the upper camera

center. The depth is then calculated as

Using the above formulae, the depth value for every pixel can be calculated from the

corresponding disparity value.

Fig. 5: Depth map generated from the disparity map. Darker pixels correspond to smaller depth.

OBSTACLE DETECTION

Geometry-based clustering technique has been proposed to detect obstacles in the scene.

Before we dive into the obstacle detection algorithm, we need to come up with a definition for an

obstacle. It is important to understand and visualize obstacles in the 3D space. Let us consider

the most general case of obstacles found above a ground plane and focus our analysis on this

case. We later discuss some of the other possibilities of obstacles or spaces in a scene that need to

be avoided by a vehicle. Thus, obstacles are points or areas in the scene which are at a height

from the dominant ground plane.

Mathematically, we will define obstacles in terms of two distinct points in space in the following

way:

Two points P1 and P2 belong to the same obstacle and are said to be compatible if

1) HT < jP2Z P1Z j < Hmax. The difference between the elevations of the two points is within a

range defined by HT and Hmax.

2) ((P2 P1) (P3 P1)=jjP2 P1jjjjP3 P1jj) > cos T . The point P3 is obtained by displacing P1 through

Hmax in the z direction. This condition enforces that the angle between P 2 and P1 with respect

to the z direction or elevation direction is less than a threshold value.

In the above definition, HT is the minimum height of an object for it to be considered an

obstacle. Hmax is the maximum height of on obstacle. The value of T can be set based on the

accepted traversable slope to classify obstacles appropriately.

The definition is illustrated in Figure 6. We construct an upward cone from P1 in the 3D space

based on the values chosen for the three parameters HT , Hmax and _T . If any of the points in

the scene lie in the frustum as shown, those points are classified as obstacle points.

Fig. 6: Cone formed in the 3D space with P1 at the center. The point P2, if lies within the frustum

formed by HT , Hmax and T , is classified as an obstacle point. (Source: [11])

From the depth map generated before, we have information about the depth of all the pixels.

All the points or pixels are now 3D points in Azimuth-Altitude-Depth space. The points are

transformed from the Azimuth-Altitude-Depth space to the X-Y-Z space.

A naive algorithm would involve examining all point pairs which would have a complexity of

O(N2). The algorithm proposed is more efficient and compares a point with a limited set of

points. These set of points are the points contained within the trapezium formed by the projection

of the frustum or cone formed above the point (referred to as base point from here on) in the 3D

space, onto the image plane. The projected trapezium is scaled according to depth of the base

point. The parameters of the trapezium are

H

where H is the height of the image and depth is the depth of the base point. The first term is a

scaling factor to transform the height to the altitude angle and thus, the number of pixels.

Similarly,

Here, hT is height of the closer parallel side of the trapezium from the base point and h max is

height of the farther parallel side from the base point. The upper left angle of the trapezium is

same as T , the threshold angle chosen for the compatibility definition earlier.

We loop through all the points in the trapezium and form a point pair for each point with the base

point. If the pair of points satisy the definition of obstacles i.e. are compatible, then we classify

that point in the trapezium as an obstacle point. This algorithm has a better time complexity than

the Naive alogrithm. Let K denote the average number of points in the trapezium. The

complexity is O(KN).

Algorithm:

Determine the set of pixels, TP , in the projected trapezium of P on the 2D image plane.

Examine all points in TP and determine set OP of points Pi in TP , compatible with P .

If OP is not empty, classify all points of OP as obstacle points.

Fig. 7: Black and white image of the Obstacle map. The white pixels correspond to obstacle

points.

Post-processing

Due to the lack of texture on the ground, we get fewer feature points in those regions and so

the estimated disparity values are inaccurate. The disparity gradient is discontinuous in those

regions on the ground, resulting in noise in the output. Median filtering and morphological

closing operation are applied to the output of the obstacle detection algorithm to close the small

holes.

Fig. 8: Black and white image of the Obstacle map after post-processing. The small holes in

Figure 7 are filled.

Fig. 9: Original lower image with obstacles overlaid on it. The pixels in yellow correspond to

obstacles.

Polar Map

The primary application of this work is to detect obstacles so as to plan the motion of the

vehicle. It is important to represent the obstacle points in a manner that might be useful for

motion planning. A polar map representation of the obstacle points is chosen. The polar map is

generated by projecting the 3D obstacle points onto the ground plane. In figure 10, the vehicle is

at the center of the circle facing 0 . The obstacle points are depicted in blue with the radius of the

circle indicating the depth from the vehicle. The small patch at 0 , close to the center corresponds

to the white sign board in the scene in figure 3 and the patch at 30 corresponds to the black

platform. The buildings and trees can be seen further away in the map. From the polar map, we

can plan a path for the vehicle, not obstructed by obstacles.

Fig. 10: The agent is at the center of the map, facing 0 . The blue points correspond to polar

positions of the obstacle points around the agent.

CHAPTER-3

RESULTS

Figure 11 depcits a scene which has a good mix of obstacles at short range and long range. The

algorithm detects all the obstacles in scene. The process was tested on various outdoor settings

and gave positive results in all the cases. In figure 13, we can observe that the polar map

representation accurately captures the cars parked next to each other at angles 315 to 35 . Also, in

figure 13, we notice small erroneous detection at 90 on the ground. This is due to the

lack of texture on the ground, which results in inaccurate disparity estimation. Thus, the obstacle

detection algorithm classifies it as an obstacle point. The accuracy of the obstacle detection is

limited by the accuracy of disparity estimation techniques. The programs were developed in

MATLAB3. The average runtime of disparity estimation is 125s and that of obstacle detection is

240s.

It was mentioned earlier that obstacles above the ground were being considered for this work.

But the process developed

also works for potholes in the ground. When a point inside the hole is picked, the cone/frustum in

the 3D space will contain points on the ground around the pothole. These points will be classified

as obstacles which implies that these points will be avoided in the path planning process.

Potholes are areas that need to be avoided and the algorithm does exactly that.

CHAPTER-4

MATLAB

INTRODUCTION TO MATLAB

What Is MATLAB?

MATLAB is a high-performance language for technical computing. It integrates

computation, visualization, and programming in an easy-to-use environment where problems and

solutions are expressed in familiar mathematical notation. Typical uses include

Algorithm development

Data acquisition

MATLAB is an interactive system whose basic data element is an array that does not

require dimensioning. This allows you to solve many technical computing problems, especially

those with matrix and vector formulations, in a fraction of the time it would take to write a

program in a scalar non interactive language such as C or FORTRAN.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to

provide easy access to matrix software developed by the LINPACK and EISPACK projects.

Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of

the art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In university

environments, it is the standard instructional tool for introductory and advanced courses in

mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-

productivity research, development, and analysis.

MATLAB features a family of add-on application-specific solutions called toolboxes. Very

important to most users of MATLAB, toolboxes allow you to learn and apply specialized

technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that

extend the MATLAB environment to solve particular classes of problems. Areas in which

toolboxes are available include signal processing, control systems, neural networks, fuzzy logic,

wavelets, simulation, and many others.

Development Environment:

This is the set of tools and facilities that help you use MATLAB functions and files. Many

of these tools are graphical user interfaces. It includes the MATLAB desktop and Command

Window, a command history, an editor and debugger, and browsers for viewing help, the

workspace, files, and the search path.

like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix

inverse, matrix Eigen values, Bessel functions, and fast Fourier transforms.

This is a high-level matrix/array language with control flow statements, functions, data

structures, input/output, and object-oriented programming features. It allows both "programming

in the small" to rapidly create quick and dirty throw-away programs, and "programming in the

large" to create complete large and complex application programs.

Graphics:

MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as

annotating and printing these graphs. It includes high-level functions for two-dimensional and

three-dimensional data visualization, image processing, animation, and presentation graphics. It

also includes low-level functions that allow you to fully customize the appearance of graphics as

well as to build complete graphical user interfaces on your MATLAB applications.

This is a library that allows you to write C and FORTRAN programs that interact with

MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling

MATLAB as a computational engine, and for reading and writing MAT-files.

MATLAB DESKTOP:-

Mat lab Desktop is the main Mat lab application window. The desktop contains five sub

windows, the command window, the workspace browser, the current directory window, the

command history window, and one or more figure windows, which are shown only when the

user displays a graphic.

The command window is where the user types MATLAB commands and expressions at

the prompt (>>) and where the output of those commands is displayed. MATLAB defines the

workspace as the set of variables that the user creates in a work session. The workspace browser

shows these variables and some information about them. Double clicking on a variable in the

workspace browser launches the Array Editor, which can be used to obtain information and

income instances edit certain properties of the variable.

The current Directory tab above the workspace tab shows the contents of the current

directory, whose path is shown in the current directory window. For example, in the windows

operating system the path might be as follows: C:\MATLAB\Work, indicating that directory

work is a subdirectory of the main directory MATLAB; WHICH IS INSTALLED IN

DRIVE C. clicking on the arrow in the current directory window shows a list of recently used

paths. Clicking on the button to the right of the window allows the user to change the current

directory.

MATLAB uses a search path to find M-files and other MATLAB related files, which are

organize in directories in the computer file system. Any file run in MATLAB must reside in the

current directory or in a directory that is on search path. By default, the files supplied with

MATLAB and math works toolboxes are included in the search path. The easiest way to see

which directories are on the search path.

The easiest way to see which directories are soon the search paths, or to add or modify a

search path, is to select set path from the File menu the desktop, and then use the set path dialog

box. It is good practice to add any commonly used directories to the search path to avoid

repeatedly having the change the current directory.

The Command History Window contains a record of the commands a user has entered in

the command window, including both current and previous MATLAB sessions. Previously

entered MATLAB commands can be selected and re-executed from the command history

window by right clicking on a command or sequence of commands. This action launches a menu

from which to select various options in addition to executing the commands. This is useful to

select various options in addition to executing the commands. This is a useful feature when

experimenting with various commands in a work session.

The MATLAB editor is both a text editor specialized for creating M-files and a graphical

MATLAB debugger. The editor can appear in a window by itself, or it can be a sub window in

the desktop. M-files are denoted by the extension .m, as in pixel up .m. The MATLAB editor

window has numerous pull-down menus for tasks such as saving, viewing, and debugging files.

Because it performs some simple checks and also uses color to differentiate between various

elements of code, this text editor is recommended as the tool of choice for writing and editing M-

functions. To open the editor , type edit at the prompt opens the M-file file name .m in an editor

window, ready for editing. As noted earlier, the file must be in the current directory, or in a

directory in the search path.

Getting Help:

The principal way to get help online is to use the MATLAB help browser, opened as a

separate window either by clicking on the question mark symbol (?) on the desktop toolbar, or by

typing help browser at the prompt in the command window. The help Browser is a web browser

integrated into the MATLAB desktop that displays a Hypertext Markup Language(HTML)

documents. The Help Browser consists of two panes, the help navigator pane, used to find

information, and the display pane, used to view the information. Self-explanatory tabs other than

navigator pane are used to perform a search.

CHAPTER-5

BACKGROUND:

Digital image processing is an area characterized by the need for extensive experimental

work to establish the viability of proposed solutions to a given problem. An important

characteristic underlying the design of image processing systems is the significant level of

testing & experimentation that normally is required before arriving at an acceptable solution.

This characteristic implies that the ability to formulate approaches &quickly prototype candidate

solutions generally plays a major role in reducing the cost & time required to arrive at a viable

system implementation.

What is DIP

An image may be defined as a two-dimensional function f(x, y), where x & y are spatial

coordinates, & the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray

level of the image at that point. When x, y & the amplitude values of f are all finite discrete

quantities, we call the image a digital image. The field of DIP refers to processing digital image

by means of digital computer. Digital image is composed of a finite number of elements, each of

which has a particular location & value. The elements are called pixels.

Vision is the most advanced of our sensor, so it is not surprising that image play the

single most important role in human perception. However, unlike humans, who are limited to the

visual band of the EM spectrum imaging machines cover almost the entire EM spectrum, ranging

from gamma to radio waves. They can operate also on images generated by sources that humans

are not accustomed to associating with image.

There is no general agreement among authors regarding where image processing stops &

other related areas such as image analysis& computer vision start. Sometimes a distinction is

made by defining image processing as a discipline in which both the input & output at a process

are images. This is limiting & somewhat artificial boundary. The area of image analysis (image

understanding) is in between image processing & computer vision.

There are no clear-cut boundaries in the continuum from image processing at one end to

complete vision at the other. However, one useful paradigm is to consider three types of

computerized processes in this continuum: low-, mid-, & high-level processes. Low-level

process involves primitive operations such as image processing to reduce noise, contrast

enhancement & image sharpening. A low- level process is characterized by the fact that both its

inputs & outputs are images.

Mid-level process on images involves tasks such as segmentation, description of that

object to reduce them to a form suitable for computer processing & classification of individual

objects. A mid-level process is characterized by the fact that its inputs generally are images but

its outputs are attributes extracted from those images. Finally higher- level processing involves

Making sense of an ensemble of recognized objects, as in image analysis & at the far end of

the continuum performing the cognitive functions normally associated with human vision.

areas of exceptional social & economic value.

What is an image?

An image is represented as a two dimensional function f(x, y) where x and y are spatial

co-ordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the

image at that point.

A grayscale image is a function I (xylem) of the two spatial coordinates of the image

plane.

I(x, y) is the intensity of the image at the point (x, y) on the image plane.

I (xylem) takes non-negative values assume the image is bounded by a rectangle [0, a] [0, b]I:

[0, a] [0, b] [0, info)

Color image:

It can be represented by three functions, R (xylem) for red, G (xylem) for green and B

(xylem) for blue.

An image may be continuous with respect to the x and y coordinates and also in

amplitude. Converting such an image to digital form requires that the coordinates as well as the

amplitude to be digitized. Digitizing the coordinates values is called sampling. Digitizing the

amplitude values is called quantization.

Coordinate convention:

The result of sampling and quantization is a matrix of real numbers. We use two

principal ways to represent digital images. Assume that an image f(x, y) is sampled so that the

resulting image has M rows and N columns. We say that the image is of size M X N. The values

of the coordinates (xylem) are discrete quantities. For notational clarity and convenience, we use

integer values for these discrete coordinates. In many image processing books, the image origin

is defined to be at (xylem)=(0,0).The next coordinate values along the first row of the image are

(xylem)=(0,1).It is important to keep in mind that the notation (0,1) is used to signify the second

sample along the first row. It does not mean that these are the actual values of physical

coordinates when the image was sampled. Following figure shows the coordinate convention.

Note that x ranges from 0 to M-1 and y from 0 to N-1 in integer increments.

The coordinate convention used in the toolbox to denote arrays is different from the

preceding paragraph in two minor ways. First, instead of using (xylem) the toolbox uses the

notation (race) to indicate rows and columns. Note, however, that the order of coordinates is the

same as the order discussed in the previous paragraph, in the sense that the first element of a

coordinate topples, (alb), refers to a row and the second to a column. The other difference is that

the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to

N in integer increments. IPT documentation refers to the coordinates. Less frequently the toolbox

also employs another coordinate convention called spatial coordinates which uses x to refer to

columns and y to refers to rows. This is the opposite of our use of variables x and y.

Image as Matrices:

The preceding discussion leads to the following representation for a digitized image

function:

f (0,0) f(0,1) .. f(0,N-1)

f (xylem)= . . .

. . .

The right side of this equation is a digital image by definition. Each element of this array

is called an image element, picture element, pixel or pel. The terms image and pixel are used

throughout the rest of our discussions to denote a digital image and its elements.

. . .

f= . . .

Where f (1,1) = f(0,0) (note the use of a moonscape font to denote MATLAB quantities).

Clearly the two representations are identical, except for the shift in origin. The notation f(p ,q)

denotes the element located in row p and the column q. For example f(6,2) is the element in the

sixth row and second column of the matrix f. Typically we use the letters M and N respectively

to denote the number of rows and columns in a matrix. A 1xN matrix is called a row vector

whereas an Mx1 matrix is called a column vector. A 1x1 matrix is a scalar.

Matrices in MATLAB are stored in variables with names such as A, a, RGB, real array

and so on. Variables must begin with a letter and contain only letters, numerals and underscores.

As noted in the previous paragraph, all MATLAB quantities are written using mono-scope

characters. We use conventional Roman, italic notation such as f(x ,y), for mathematical

expressions

Reading Images:

Images are read into the MATLAB environment using function imread whose syntax is

Imread (filename)

Here filename is a spring containing the complete of the image file(including any

applicable extension).For example the command line

Reads the JPEG (above table) image chest ray into image array f. Note the use of single

quotes () to delimit the string filename. The semicolon at the end of a command line is used by

MATLAB for suppressing output If a semicolon is not included. MATLAB displays the results of

the operation(s) specified in that line. The prompt symbol (>>) designates the beginning of a

command line, as it appears in the MATLAB command window.

Data Classes:

Although we work with integers coordinates the values of pixels themselves are not

restricted to be integers in MATLAB. Table above list various data classes supported by

MATLAB and IPT are representing pixels values. The first eight entries in the table are refers to

as numeric data classes. The ninth entry is the char class and, as shown, the last entry is referred

to as logical data class.

All numeric computations in MATLAB are done in double quantities, so this is also a

frequent data class encounter in image processing applications. Class unit 8 also is encountered

frequently, especially when reading data from storages devices, as 8 bit images are most common

representations found in practice. These two data classes, classes logical, and, to a lesser degree,

class unit 16 constitute the primary data classes on which we focus. Many ipt functions however

support all the data classes listed in table. Data class double requires 8 bytes to represent a

number uint8 and into 8 require one byte each, uint16 and int16 requires 2bytes and unit 32.

Name Description

Element).

Uint16 unsigned 16_bit integers in the range [0, 65535] (2byte per element).

Unit 32 unsigned 32_bit integers in the range [0, 4294967295](4 bytes per element).

Int8 signed 8_bit integers in the range [-128,127] 1 byte per element)

Int 16 signed 16_byte integers in the range [32768, 32767] (2 bytes per element).

Int 32 Signed 32_byte integers in the range [-2147483648, 21474833647] (4 byte per

element).

Logical values are 0 to 1 (1byte per element).

Int 32 and single required 4 bytes each. The char data class holds characters in Unicode

representation. A character string is merely a 1*n array of characters logical array contains only

the values 0 to 1,with each element being stored in memory using function logical or by using

relational operators.

Image Types:

1 .Intensity images;

2. Binary images;

3. Indexed images;

4. R G B images.

Most monochrome image processing operations are carried out using binary or intensity

images, so our initial focus is on these two image types. Indexed and RGB colour images.

Intensity Images:

An intensity image is a data matrix whose values have been scaled to represent

intentions. When the elements of an intensity image are of class unit8, or class unit 16, they have

integer values in the range [0,255] and [0, 65535], respectively. If the image is of class double,

the values are floating point numbers. Values of scaled, double intensity images are in the range

[0, 1] by convention.

Binary Images:

Binary images have a very specific meaning in MATLAB.A binary image is a logical

array 0s and1s.Thus, an array of 0s and 1s whose values are of data class, say unit8, is not

considered as a binary image in MATLAB .A numeric array is converted to binary using function

logical. Thus, if A is a numeric array consisting of 0s and 1s, we create an array B using the

statement.

B=logical (A)

If A contains elements other than 0s and 1s.Use of the logical function converts all

nonzero quantities to logical 1s and all entries with value 0 to logical 0s.

If c is a logical array, this function returns a 1.Otherwise returns a 0. Logical array can be

converted to numeric arrays using the data class conversion functions.

Indexed Images:

Matrix map is an m*3 arrays of class double containing floating point values in the range

[0, 1].The length m of the map are equal to the number of colors it defines. Each row of map

specifies the red, green and blue components of a single color. An indexed images uses direct

mapping of pixel intensity values color map values. The color of each pixel is determined by

using the corresponding value the integer matrix x as a pointer in to map. If x is of class double

,then all of its components with values less than or equal to 1 point to the first row in map, all

components with value 2 point to the second row and so on. If x is of class units or unit 16, then

all components value 0 point to the first row in map, all components with value 1 point to the

second and so on.

RGB Image:

An RGB color image is an M*N*3 array of color pixels where each color pixel is triplet

corresponding to the red, green and blue components of an RGB image, at a specific spatial

location. An RGB image may be viewed as stack of three gray scale images that when fed in to

the red, green and blue inputs of a color monitor

Produce a color image on the screen. Convention the three images forming an RGB color

image are referred to as the red, green and blue components images. The data class of the

components images determines their range of values. If an RGB image is of class double the

range of values is [0, 1].

Similarly the range of values is [0,255] or [0, 65535].For RGB images of class units or

unit 16 respectively. The number of bits use to represents the pixel values of the component

images determines the bit depth of an RGB image. For example, if each component image is an

8bit image, the corresponding RGB image is said to be 24 bits deep.

Generally, the number of bits in all component images is the same. In this case the

number of possible color in an RGB image is (2^b) ^3, where b is a number of bits in each

component image. For the 8bit case the number is 16,777,216 colors.

CHAPTER-6

CONCLUSION

We have showed the working of an algorithm to detect obstacles. But the challenge always lies

in running it real-time on the vehicle with a quick update rate so as to react quickly

Fig. 13: Scene 3: (a) lower image; (b) depth map; (c) black and white image of obstacle map; (d)

lower image with obstacles overlaid; (e) polar map representation of obstacles

to changes in the environment. The first among our future steps should be optimizing the run

time of this algorithm and possibly explore efficient platforms and parallel architecture if

required to run it online. There are other quick depth-estimation algorithms [18], [16] in the

literature that are more suitable for real time applications at the cost of reduced accuracy. So a

sensible trade off has to be made on the choice of the depth estimation algorithm.

The obstacle detection algorithm is found to be decently robust in detecting obstacles sticking

out of the ground. But it does not particularly consider holes or cliffs. Given the depth

map and 3D location of points in view, it is easy to build a few more features in the pipeline to

seamlessly handle these kinds of obstacles as well. For example, we could classify the road

based on the fact that it has a steady gradient in depth value and plan a path for the vehicle only

along the definite road, if exists, to avoid falling off cliffs. And of course, besides just obstacles,

the vision system should also detect lane markings, sign boards, bicyclists hand signals, etc to

complete the whole perception package of the vehicle.

CHAPTER-7

REFERENCES

[1] Don Murray and James J Little. Using real-time stereo vision for mobile robot navigation.

Autonomous Robots, 8(2):161171, 2000.

[2] Hans P Moravec. Rover visual obstacle avoidance. In IJCAI, pages 785790, 1981.

[3] Don Murray and Cullen Jennings. Stereo vision based mapping and navigation for mobile

robots. In Robotics and Automation, 1997. Proceedings., 1997 IEEE International

Conference on, volume 2, pages 16941699. IEEE, 1997.

[4] Nicola Bernini, Massimo Bertozzi, Luca Castangia, Marco Patander, and Mario Sabbatelli.

Real-time obstacle detection using stereo vision for autonomous ground vehicles: A survey.

In Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on,

pages 873878. IEEE, 2014.

[5] Alberto Elfes. Using occupancy grids for mobile robot perception and navigation. Computer,

22(6):4657, 1989.

[6] Hernan Badino, Uwe Franke, and Rudolf Mester. Free space com-putation using stochastic

occupancy grids and dynamic programming. In Workshop on Dynamical Vision, ICCV, Rio

de Janeiro, Brazil, volume 20, 2007.

[7] Florin Oniga and Sergiu Nedevschi. Processing dense stereo data using elevation maps: Road

surface, traffic isle, and obstacle detection.

Vehicular Technology, IEEE Transactions on, 59(3):11721182, 2010.

[8] Andreas Wedel, Annemarie Meiner, Clemens Rabe, Uwe Franke, and Daniel Cremers.

Detection and segmentation of independently moving objects from dense scene flow. In

Energy minimization methods in computer vision and pattern recognition, pages 1427.

Springer, 2009.

[9] Uwe Franke, Clemens Rabe, Hernan Badino, and Stefan Gehrig. 6d-vision: Fusion of stereo

and motion for robust environment perception. In Pattern Recognition, pages 216223.

Springer, 2005.

[10] Philip Lenz, Julius Ziegler, Andreas Geiger, and Martin Roser. Sparse scene flow

segmentation for moving object detection in urban environ-ments. In Intelligent Vehicles

Symposium (IV), 2011 IEEE, pages 926 932. IEEE, 2011.

[11] A Talukder, R Manduchi, A Rankin, and L Matthies. Fast and reliable obstacle detection

and segmentation for cross-country navigation. In

Intelligent Vehicle Symposium, 2002. IEEE, volume 2, pages 610618. IEEE, 2002.

[12] Massimo Bertozzi, Luca Bombini, Alberto Broggi, Michele Buzzoni, Elena Cardarelli,

Stefano Cattani, Pietro Cerri, Alessandro Coati, Stefano Debattisti, Andrea Falzoni, et al.

Viac: An out of ordinary experiment. In Intelligent Vehicles Symposium (IV), 2011 IEEE,

pages 175180. IEEE, 2011.

[13] Alberto Broggi, Michele Buzzoni, Mirko Felisa, and Paolo Zani. Stereo obstacle

detection in challenging environments: the viac experience. In

Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages

15991604. IEEE, 2011.

[14] Kyung-Hoon Bae, Dong-Sik Yi, Seung Cheol Kim, and Eun-Soo Kim. A bi-directional

stereo matching algorithm based on adaptive matching window. In Optics & Photonics 2005,

pages 590929590929. Interna-tional Society for Optics and Photonics, 2005.

[15] HanSung Kim and Kwanghoon Sohn. Hierarchical depth estimation for image synthesis

in mixed reality. In Electronic Imaging 2003, pages 544553. International Society for Optics

and Photonics, 2003.

[16] Jochen Schmidt, Heinrich Niemann, and Sebastian Vogt. Dense disparity maps in real-

time with an application to augmented reality. In Applica-tions of Computer Vision, 2002.

(WACV 2002). Proceedings. Sixth IEEE Workshop on, pages 225230. IEEE, 2002.

[17] Deqing Sun, Stefan Roth, and Michael J Black. Secrets of optical flow estimation and

their principles. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE

Conference on, pages 24322439. IEEE, 2010.

[18] Heiko Hirschmuller. Accurate and efficient stereo processing by semi-global matching

and mutual information. In Computer Vision and Pat-tern Recognition, 2005. CVPR 2005.

IEEE Computer Society Conference on, volume 2, pages 807814. IEEE, 2005.

- Kinema Tic 1Transféré parAmit Singh
- Funciones Biblioteca MatLabTransféré parPaúl A Muñoz
- Referencia a Partir de FotosTransféré pargilberto
- Kyto_MethodForMeasuringStereoCameraDepthAccuracyTransféré parRicardo Romero
- ExerciseTransféré parNguyen Nam
- Psychtoolbox3Transféré parVahid Irani
- curve fitting in MatlabTransféré parRichard Frank Choque
- 20. Eng-SSCE Secure Communication Based on a Chaotic-P.karthikTransféré parImpact Journals
- Strategy Insight 2013_0517 - 17 May 2013Transféré parStrategicInnovation
- Hout PerceptionTransféré parEdward Araneta Queipo
- lec4.pdfTransféré parBalram Purnaa
- Airborne LIDAR and GIS Tools in Transmission Line Re-Rating ProjectsTransféré parMohd Razif Sumairi
- Function VarargoutTransféré parJulian Riveros
- A Survey of Machine Vision Techniques for Fruit Sorting and GradingTransféré parInternational Journal for Scientific Research and Development - IJSRD
- controls mannual.docxTransféré parUmair Afzal Shukla
- MATLAB Desktop Keyboard ShortcutsTransféré parJuan Gomez
- mathlab7.pdfTransféré parJeandelaSagesse
- ex1Transféré parapi-322416213
- Read Elapsed Time From Stopwatch - MATLAB Toc - MathWorks IndiaTransféré parBarath
- 06147421Transféré parCamilo Ernesto Pardo Beainy
- matlabTransféré parkarthik0433
- Progres Kelompok 3Transféré parRadianSitumeang
- Chap07C_AT_Stages.pptxTransféré parAulia Luqman Lionar
- SystemRequirements-Release2013bTransféré parRahul_Kittu
- Mann TA TutorialTransféré parM Madan Gopal
- MATLAB & Simulink - 2 page introductionTransféré parshreenandanrao
- MFL67372322_02___edtTransféré parpanchoconce
- THE STUDY OF SLIDER CRANK MECHANISMTransféré parNicusorDuta
- Document ScribdTransféré parABHIJITHMK
- ChetanTransféré parSandip Gupta

- AVR Project BookTransféré parSathiya Nathan
- 2017 NESA CatalogTransféré parJuan Guerra
- HP9000 ContainersTransféré parArvindBalachandran
- ElectricityTransféré parSri Pratheepan
- Pressure & Temp CalTransféré parRaj Solanki
- UCN International Brochure.pdfTransféré parBogdan Curta
- DTV BookletTransféré parapi-3710509
- Arc Oil and Gas Resource Application Form111Transféré parVERGARAPEDRO
- Probleme PAC2 in VHDLTransféré parGeorge Lungu
- 2) Product Marketing Budget Template.xlsTransféré parShravan
- Cardboard Chair ProjectTransféré parhecteezyy
- SweetFX_Settings_Flight Simulator X_Coconut's FSX - DX10 Improved 2016Transféré parBrunoVinicius
- FS-2 FusionManager Product DescriptionTransféré parRicardo Bolanos
- 94830-186831-1-PB.pdfTransféré parfpopescu1425
- IoLua!Transféré paradrian_ze9553
- ALV_StepByStep_12Aug20083211281700880Transféré parKarthik Reddy
- 3865Transféré parsabatino123
- Csca Shopping Centre Database Data DictionaryTransféré paralps_7777
- Updated List of Non-programmable CalculatorsTransféré parJenica Deguzman
- LAN Switching and WirelessTransféré parJibin Chacko
- Enable SSO in SAP System Using Kerberos AuthenticationTransféré parPILLINAGARAJU
- UVM Tutorial for Candy Lovers – 9Transféré parjagdish
- Meta-Analysis Fixed-effect vs Random-effects ModelsTransféré parNurul Mawaddah
- Activ Inspire ActivoteTransféré parapi-2644786
- A Secure Erasure Code-Based Cloud Storage System With Secure Data ForwardingTransféré parJAYAPRAKASH
- A Comparison of Groupware Evaluation MethodologiesTransféré parSara Madelaine Gutiérrez
- PHI Learning EEE Price List Subject WiseTransféré parPHI Learning Pvt. Ltd.
- 8200HPTransféré parjromnu2
- 16-06-08 BSA Amicus Curiae Brief SCOTUS Samsung v. AppleTransféré parFlorian Mueller
- Scales for Jazz Improvisation Dan Haerle PDF DownloadTransféré parHaditya Armstrong