
Plan of the Talk

Introduction Stereo Vision

Parallel Processing: Architecture


Summary and conclusions

Introduction
We are planning to design and implement a real-time stereo vision hardware and software architecture, based on full-custom massively parallel hardware, capable of producing a depth map of the imaged scene.

We aim to achieve a latency of about 20 ms so that the system can run in real time.
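To get a feel for what a 20 ms target implies, the sketch below counts how many matching-cost evaluations a brute-force search would need per frame and per second. The frame size (640 x 480) and the 64 disparity levels are assumed example values, not figures from this talk, and the function name is hypothetical.

```python
def matching_cost_budget(width, height, disparities, latency_s):
    """Return (cost evaluations per frame, evaluations per second required).

    Assumes one cost evaluation per pixel per disparity hypothesis and
    that a whole frame must be finished within latency_s seconds.
    """
    per_frame = width * height * disparities
    per_second = per_frame / latency_s
    return per_frame, per_second


# Assumed example: VGA frame, 64 disparity levels, 20 ms latency target.
per_frame, per_second = matching_cost_budget(640, 480, 64, 0.020)
print(per_frame, per_second)
```

With these assumed numbers the budget is close to a billion cost evaluations per second before any aggregation, which is the kind of load that motivates parallel hardware.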

Algorithm design and Implementation

With two (or more) cameras we can infer depth by triangulation, provided we are able to find corresponding (homologous) points in the two images.
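For a rectified camera pair, triangulation reduces to the standard relation Z = f * B / d, where d is the horizontal offset (disparity) between homologous points. The sketch below assumes a rectified, parallel-axis setup; the parameter names and the example values are illustrative, not taken from the talk.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px: horizontal offset between homologous points, in pixels
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Assumed example values: f = 800 px, B = 0.25 m, d = 40 px.
z = depth_from_disparity(40, 800, 0.25)
print(z)
```

Depth is inversely proportional to disparity, so nearby objects produce large disparities and distant ones produce small disparities.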

Symmetric support
Combines the accuracy of adaptive-weights approaches with the efficiency of the traditional (correlative) approach
Deploys a regularized range filter computed on a block basis, with blocks of size w x w
Increased noise robustness
Efficient pixel-wise cost computation by means of integral image / box-filtering schemes
Results comparable to the top-performing approaches (Segment Support and Adaptive Weights)
Fast: 32 sec on Teddy (w=3)
Moreover, several speed vs. accuracy trade-offs are feasible: 14 sec (w=5), 9 sec (w=7), 5 sec (w=9)
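The integral image / box-filtering idea mentioned above can be sketched as follows. This is plain box aggregation of a per-pixel cost map, not the adaptive-weights method itself; the function names and the border handling (windows clipped at the image edge) are assumptions made for illustration.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 with four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def aggregate_costs(cost, w):
    """Sum costs over a w x w window centred on each pixel (clipped at borders)."""
    h, wd = len(cost), len(cost[0])
    r = w // 2
    ii = integral_image(cost)
    out = [[0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(wd, x + r + 1)
            out[y][x] = box_sum(ii, y0, x0, y1, x1)
    return out


cost = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
agg = aggregate_costs(cost, 3)
print(agg)
```

The cost of each window sum is independent of the window size, which is what makes block-based aggregation cheap.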

Exploits the mutual relationships among neighbouring pixels by explicitly modelling the continuity constraints
Very accurate (significant improvements near depth discontinuities and in low-textured regions)
Notable improvements compared to state-of-the-art approaches
Fast: 37 sec* on Teddy (un-optimized code) deploying the disparity hypotheses provided by Fast Bilateral Stereo
Fast: 15 sec* on Teddy (un-optimized code) deploying the disparity hypotheses provided by Fixed Window

Architecture

Stereo vision and learning algorithms can be easily implemented in parallel. [4][5][6]

These algorithms consist of a large number of mutually independent processes at any particular instant.
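One way to see this independence: in a rectified pair, each scanline can be matched without looking at any other scanline. The sketch below uses a deliberately simple per-pixel absolute-difference cost with winner-takes-all selection, which is an assumption for illustration and not the algorithm proposed in this talk. Threads are used only to show that the rows can be dispatched independently; in CPython they will not speed up pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor


def match_row(left_row, right_row, max_disp):
    """Winner-takes-all disparity for one scanline (absolute-difference cost)."""
    out = []
    for x, left_value in enumerate(left_row):
        best_d, best_cost = 0, None
        # Candidate d pairs left pixel x with right pixel x - d.
        for d in range(min(max_disp, x) + 1):
            cost = abs(left_value - right_row[x - d])
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        out.append(best_d)
    return out


def disparity_map(left, right, max_disp, workers=4):
    """Match every scanline independently; results keep the row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(match_row, left, right, [max_disp] * len(left)))


left = [[0, 0, 9, 0], [5, 0, 0, 0]]
right = [[0, 9, 0, 0], [5, 0, 0, 0]]
disp = disparity_map(left, right, max_disp=2)
print(disp)
```

On dedicated hardware the same decomposition would map rows (or blocks of pixels) onto separate processing units rather than threads.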


The speed at which a single processor can process data (number of instructions per second, FLOPS) cannot be increased beyond a limit:

It is difficult to cool faster CPUs.
Faster CPUs demand smaller chip sizes, which in turn create more heat.

Using a fast/parallel computer:
Can solve existing problems, or more problems, in less time.
Can solve completely new problems, leading to new findings.

The computation time for a problem depends on the following:

Input-Output: requirement of disk read/write.
Communication: a lot of data has to be communicated among processors. Gigabit Ethernet is used.
Memory: a large amount of data has to be available in main memory (DRAM). Latency and bandwidth are very important.
Processor: problems in which a large number of computations have to be done.
Cache: a better memory hierarchy (RAM, L1, L2, L3, ...) and its efficient use.

Requirement
Data acquisition (frames)
Stereo vision computation
Data storage in real time
Adapted for stereo vision algorithms

Solution Proposed
Effective CPU utilization
Parallel and pipeline stages
Checking memory latency time
Balancing slow external storage units and fast closer memory units
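The "parallel and pipeline stages" idea can be sketched in software as a chain of workers connected by small bounded queues, so that acquisition, computation and storage of different frames overlap. This is only an illustration under assumed names (`run_pipeline`, `stage`, the toy stage functions); the proposed system would realise the stages in hardware.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        outbox.put(fn(item))


def run_pipeline(frames, stage_fns, depth=2):
    """Push frames through the stages; bounded queues model small fast buffers."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stage_fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_STOP)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for t in workers + [feeder]:
        t.join()
    return results


# Toy stages standing in for acquisition, stereo computation and storage.
results = run_pipeline(range(5), [lambda f: f + 1, lambda f: f * 2, lambda f: f])
print(results)
```

Each stage handles one frame at a time in order, so results come out in the order the frames went in, while different frames occupy different stages at the same moment.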

Microprocessors
Floating Point (FP) units + SIMD
C/C++ (+ assembly)
Power, cost and size are the main drawbacks

Low power & low cost processor
C/C++
No SIMD (often)

GPUs (Graphics Processing Units)
Raw power
High power dissipation and cost
Programming is difficult (CUDA and OpenCL help)

FPGA (Field Programmable Gate Array)
Efficient, low power (<1 W), low cost
Programming language: VHDL
Coding is difficult and tailored for specific devices

Using standard parallel processing units
Designing the interface between components following standards

Pros
Can be used for solving many problems (servers, DNA, dynamic simulation, etc.)
Less detailed programming
Cons
Generally requires more hardware than is needed for the specific problem
Costly
Physically large, high power consumption

Using Application-specific integrated circuit (ASIC)

ASICs can achieve superior performance for a limited set of applications.
ASICs need a long design cycle, restrict the flexibility of the system, and exclude any post-design optimizations and upgrades in features and algorithms.

Reconfigurable systems (FPGA-based real-time systems)
Reduce the time, cost and expertise requirements in hardware-based algorithm implementation
Can be reprogrammed to facilitate improvement and modification in design

[Figure: architecture block diagram. Labels: Process Distribution; Thread (Core); Memory Unit; SRAM; Permanent Memory; N units.]

[1] S. Mattoccia, S. Giardino, A. Gambini, "Accurate and efficient cost aggregation strategy for stereo correspondence based on approximated joint bilateral filtering", Asian Conference on Computer Vision (ACCV 2009).
[2] S. Mattoccia, "A locally global approach to stereo correspondence", 3D Digital Imaging and Modelling (3DIM 2009).
[3] F. Tombari, S. Mattoccia, L. Di Stefano, E. Addimanda, "Classification and evaluation of cost aggregation methods for stereo correspondence", IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2008).
[4] Lei Chen, Yunde Jia, "A Parallel Reconfigurable Architecture for Real-Time Stereo Vision", Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, Embedded Software and Systems (ICESS '09), 2009.
[5] A. Motten, L. Claesen, "An on-chip parallel memory architecture for a stereo vision system", Expertise Centre for Digital Media, Hasselt Univ., Diepenbeek, Belgium, 17th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), 12-15 Dec. 2010.
[6] http://danstrother.com/2011/01/24/fpga-stereo-vision-project/
[7] http://www.iucaa.ernet.in/~jayanti/parallel.html
[8] Jayanti Prasad, "An Overview of Parallel Computing", Inter-University Centre for Astronomy & Astrophysics, Pune, India (411007), May 20, 2011.
[9] Stefano Mattoccia, "Stereo Vision: Algorithms and Applications", DEIS, University of Bologna, stefano.mattoccia@unibo.it, http://www.vision.deis.unibo.it/smatt/stereo.htm

Thank You !

Rajesh Uniyal

2k10/EC/112

Naman Madan

2k10/EC/086
