Xing Mei 1,2, Xun Sun 1, Mingcai Zhou 1, Shaohui Jiao 1, Haitao Wang 1, Xiaopeng Zhang 2
1 Samsung Advanced Institute of Technology, China Lab
2 LIAMA-NLPR, Institute of Automation, Chinese Academy of Sciences
{xing.mei,xunshine.sun,mingcai.zhou,sh.jiao,ht.wang}@samsung.com, xpzhang@nlpr.ia.ac.cn
This step takes in the aggregated matching cost volume (denoted as C_1) and computes the intermediate disparity results. To further alleviate the matching ambiguities, an optimizer with smoothness constraints and moderate parallelism should be adopted. We employ a multi-direction scanline optimizer based on Hirschmuller's semi-global matching method [2].

Four scanline optimization processes are performed independently: 2 along horizontal directions and 2 along vertical directions. Given a scanline direction r, the path cost C_r(p, d) of pixel p at disparity d is updated as

C_r(p, d) = C_1(p, d) + min( C_r(p - r, d), C_r(p - r, d ± 1) + P_1, min_k C_r(p - r, k) + P_2 ) - min_k C_r(p - r, k)

where the smoothness penalties P_1 and P_2 depend on the color differences D_1 (between p and p - r in the left image) and D_2 (between the corresponding pixels in the right image): P_1 = Π_1, P_2 = Π_2, if D_1 < τ_SO, D_2 < τ_SO; both penalties are reduced when D_1 or D_2 exceeds the threshold τ_SO, which relaxes the smoothness constraint at color edges. The path costs of the four directions are averaged into C_2(p, d), and the disparity with the minimum C_2 value is selected as pixel p's intermediate result.

2.4. Multi-step Disparity Refinement

The disparity results of both images (denoted as D_L and D_R) computed by the previous three steps contain outliers in occlusion regions and at depth discontinuities. After detecting these outliers, the simplest refinement method is to fill them with the nearest reliable disparities [11], which is only useful for small occlusion regions. We instead handle the disparity errors systematically in a multi-step process, where each step tries to remove the errors caused by various factors.
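The per-direction scanline recurrence described above can be sketched as follows: a minimal NumPy sketch of one left-to-right pass over a single scanline, with the adaptive P_1/P_2 selection replaced by fixed penalties for brevity (array shapes and names are our assumptions, not the paper's).

```python
import numpy as np

def scanline_pass(cost, P1=1.0, P2=3.0):
    """One left-to-right scanline-optimization pass.

    cost: (W, D) array of aggregated costs C1 along one scanline.
    Returns the path costs C_r.  The paper adapts P1/P2 per pixel
    from color differences; fixed penalties are used here.
    """
    W, D = cost.shape
    path = np.empty_like(cost, dtype=np.float64)
    path[0] = cost[0]
    for x in range(1, W):
        prev = path[x - 1]
        prev_min = prev.min()
        # Transitions: same disparity, +/-1 disparity with penalty P1,
        # or any larger disparity jump with penalty P2.
        shift = np.minimum(np.roll(prev, 1), np.roll(prev, -1)) + P1
        shift[0] = prev[1] + P1    # no d-1 neighbor at the lower border
        shift[-1] = prev[-2] + P1  # no d+1 neighbor at the upper border
        path[x] = cost[x] + np.minimum(np.minimum(prev, shift),
                                       prev_min + P2) - prev_min
    return path
```

Subtracting min_k C_r(p - r, k) at every pixel keeps the path costs bounded, as in semi-global matching.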
Outlier Detection: The outliers in D_L are first detected with a left-right consistency check: pixel p is an outlier if D_L(p) = D_R(p - (D_L(p), 0)) does not hold. Outliers are further classified into occlusion and mismatch points, since they require different interpolation strategies. We follow the method proposed by Hirschmuller [2]: for outlier p at disparity D_L(p), the intersection of its epipolar line and D_R is checked. If no intersection is detected, p is labelled as occlusion, otherwise as mismatch.

Figure 5. The disparity error maps for the Teddy image pair: (a) before outlier handling; (b) after outlier handling. The errors are marked in gray (occlusion) and black (non-occlusion). The disparity errors are significantly reduced in the outlier handling process.

Iterative Region Voting: The detected outliers should be filled with reliable neighboring disparities. Most accurate stereo algorithms employ segmented regions for outlier handling [2, 20], which are not suitable for GPU implementation. We process these outliers with the constructed cross-based regions and a robust voting scheme.
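The left-right consistency check above can be sketched as follows (a minimal NumPy sketch; the integer-disparity assumption and array names are ours):

```python
import numpy as np

def lr_check(disp_left, disp_right):
    """Flag pixels that fail the left-right consistency check.

    disp_left, disp_right: (H, W) integer disparity maps DL and DR.
    Pixel p = (y, x) is an outlier if DL(p) != DR(p - (DL(p), 0)).
    """
    H, W = disp_left.shape
    outlier = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            d = disp_left[y, x]
            xr = x - d  # corresponding column in the right image
            if xr < 0 or xr >= W or disp_right[y, xr] != d:
                outlier[y, x] = True
    return outlier
```

Classifying the flagged pixels into occlusion and mismatch points then follows the epipolar-intersection test described above.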
For an outlier pixel p, all the reliable disparities in its cross-based support region are collected to build a histogram H_p with d_max + 1 bins. The disparity with the highest bin value (most votes) is denoted as d*_p, and the total number of the reliable pixels is denoted as S_p = Σ_{d=0}^{d_max} H_p(d). p's disparity is then updated with d*_p if enough reliable pixels and votes are found in the support region:

S_p > τ_S,   H_p(d*_p) / S_p > τ_H        (6)

where τ_S and τ_H are two threshold values.

Figure 6. The errors around depth discontinuities are reduced after the adjustment step: (a) before discontinuity adjustment; (b) after discontinuity adjustment.
To process as many outliers as possible, the voting process runs for 5 iterations. The filled outliers are marked as reliable pixels and used in the next iteration, such that valid disparity information can gradually propagate into occlusion regions.
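One voting round of the update in Eq. (6) for a single outlier can be sketched as follows: a minimal NumPy sketch in which the cross-based support region is passed in as a precomputed boolean mask rather than built from the cross skeleton (a simplification of ours).

```python
import numpy as np

def region_vote(disp, reliable, region_mask, d_max, tau_s=20, tau_h=0.4):
    """Vote a disparity for one outlier from its support region.

    disp: (H, W) disparity map; reliable: (H, W) bool mask of
    reliable pixels; region_mask: (H, W) bool mask of the outlier's
    support region.  Returns the winning disparity d*_p, or None if
    S_p > tau_s and H_p(d*_p)/S_p > tau_h do not both hold.
    """
    votes = disp[region_mask & reliable]
    s_p = votes.size                                # S_p
    if s_p == 0:
        return None
    hist = np.bincount(votes, minlength=d_max + 1)  # histogram H_p
    d_star = int(hist.argmax())                     # most votes
    if s_p > tau_s and hist[d_star] / s_p > tau_h:
        return d_star
    return None
```

The parameters tau_s and tau_h correspond to the thresholds τ_S and τ_H of Eq. (6); their default values here are illustrative.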
Proper Interpolation: The remaining outliers are filled with an interpolation strategy that treats occlusion and mismatch points differently. For outlier p, we find the nearest reliable pixels in 16 different directions. If p is an occlusion point, the pixel with the lowest disparity value is selected for interpolation, since p most likely comes from the background; otherwise, the pixel with the most similar color is selected for interpolation. With region voting and interpolation, most outliers are effectively removed from the disparity results, as shown in Figure 5.

Depth Discontinuity Adjustment: In this step, the disparities around the depth discontinuities are further refined with neighboring pixel information. We first detect all the edges in the disparity image. For each pixel p on a disparity edge, two pixels p_1, p_2 from both sides of the edge are collected. D(p) is replaced by D(p_1) or D(p_2) if one of the two pixels has a smaller matching cost than C_2(p, D(p)). This simple method helps to reduce the small errors around discontinuities, as shown by the error maps in Figure 6.

Sub-pixel Enhancement: Finally, a sub-pixel enhancement process based on quadratic polynomial interpolation is performed to reduce the errors caused by discrete disparity levels [20]. For pixel p, its interpolated disparity d* is computed as follows:

d* = d - ( C_2(p, d+) - C_2(p, d-) ) / ( 2 ( C_2(p, d+) + C_2(p, d-) - 2 C_2(p, d) ) )        (7)

where d = D(p), d+ = d + 1, d- = d - 1. The final disparity results are obtained by smoothing the interpolated disparity results with a 3 × 3 median filter.

To verify the effectiveness of the refinement process, the average error percentages in various regions after performing each refinement step are presented in Figure 7. The four refinement steps successfully reduce the error percentage in all regions by 3.8%, but their contributions are distinct for different regions: for non-occluded regions, voting and sub-pixel enhancement are most effective for handling the mismatch outliers; for discontinuity regions, the errors are significantly reduced by voting, discontinuity adjustment and sub-pixel enhancement; most outliers in all regions are removed with voting and interpolation, and small errors due to discontinuities and quantization are reduced by adjustment and sub-pixel enhancement. A systematic integration of these steps guarantees a strong post-processing method.
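The quadratic interpolation of Eq. (7) for a single pixel can be sketched as follows (a minimal NumPy sketch; the border and degenerate-parabola fallbacks are our additions):

```python
import numpy as np

def subpixel(cost, d):
    """Quadratic sub-pixel refinement of a winner-take-all disparity.

    cost: (D,) aggregated costs C2(p, .) for one pixel; d: integer
    winning disparity.  Fits a parabola through the costs at d-1, d,
    d+1 and returns the disparity of its minimum, as in Eq. (7).
    """
    d = int(d)
    if d <= 0 or d >= len(cost) - 1:
        return float(d)  # no neighbor on one side: keep d
    denom = 2.0 * (cost[d + 1] + cost[d - 1] - 2.0 * cost[d])
    if denom == 0.0:
        return float(d)  # flat cost profile: parabola is degenerate
    return d - (cost[d + 1] - cost[d - 1]) / denom
```

A 3 × 3 median filter over the interpolated map then yields the final disparity results.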
3. CUDA Implementation

Compute Unified Device Architecture (CUDA) is a programming interface for parallel computation tasks on NVIDIA graphics hardware. The computation task is coded into a kernel function, which is performed concurrently on data elements by multiple threads. The allocation of the threads is controlled with two hierarchical concepts: grid and block. A kernel launch creates a grid with multiple blocks, and each block consists of multiple threads. The performance of the CUDA implementation is closely related to thread allocation and memory accesses, which need careful tuning for various computation tasks and hardware platforms. Given image resolution W × H and disparity range D, we briefly describe the implementation issues of our algorithm.

Cost Initialization: This step is parallelized with W × H × D threads. The threads are organized into a 2D grid and the block size is set to 32 × 32. Each thread takes care of computing a cost value for a pixel at a given disparity. For the census transform, a square window is required for each pixel, which requires loading more data into the shared memory for fast access.

Cost Aggregation: A grid with W × H threads is created for both steps of the aggregation process. For cross construction, we set the block size to W or H, such that each block can efficiently handle a scanline. For cost aggregation, we follow the method proposed by Zhang et al. [24], which works similarly to the first step. Each thread sums up a pixel's cost values horizontally and vertically in two passes. Data reuse with shared memory is considered in both steps.

Scanline Optimization: This step differs from the previous steps, because the process is sequential in the scanline direction and parallel in the orthogonal direction. A grid with W × D or H × D threads is created according to the scanline direction. D threads are allocated for each scanline, such that path costs on all disparity levels can be computed concurrently. Synchronization between the threads is needed for finding the minimum cost of the previous pixel on the same path.

Disparity Refinement: Each step of the refinement process works on the intermediate disparity images, which can be efficiently processed with W × H threads.

4. Experimental Results

We test our system with the Middlebury benchmark [12]. The test platform is a PC with a Core2Duo 2.20GHz CPU and an NVIDIA GeForce GTX 480 graphics card. The parameters are given in Table 1 and are kept constant for all the data sets.

λ_AD = 10   λ_Census = 30   L_1 = 34   L_2 = 17   τ_1 = 20   τ_2 = 6
Π_1 = 1.0   Π_2 = 3.0   τ_SO = 15   τ_S = 20   τ_H = 0.4

Table 1. Parameter settings for the Middlebury experiments

The disparity results are presented in Figure 8. Our system ranks first in the Middlebury evaluation, as shown in Table 2. Our algorithm performs well on all the data sets, and gives the best results on the Venus image pair, with minimum errors both in non-occluded regions and near depth discontinuities. Compared to algorithms such as CoopRegion [17], the results on the Tsukuba image pair are not competitive. The Tsukuba image pair contains some very dark and noisy regions near the lamp and the desk, which lead to incorrect cross-based support regions for aggregation and refinement.

We run the algorithm both on the CPU and on graphics hardware. For the four data sets (Tsukuba, Venus, Teddy and Cones), the CPU implementation requires 2.5 seconds, 4.5 seconds, 15 seconds and 15 seconds respectively, while the GPU implementation requires only 0.016 seconds, 0.032 seconds, 0.095 seconds and 0.094 seconds respectively. The GPU-friendly system design brings an impressive 140× speedup in processing speed. The average proportions of the GPU running time for the four computation steps are 1%, 70%, 28% and 1% respectively. The iterative cost aggregation step and the scanline optimization process dominate the running time.

Finally, we test our system on two stereo video sequences: a book arrival scene from the HHI database (512 × 384, 60 disparity levels), and an Ilkay scene from the Microsoft i2i database (320 × 240, 50 disparity levels). To test the generalization ability of the system, we use the same set of parameters as for the Middlebury datasets, and no temporal coherence information is employed in the computation process. The snapshots for the two examples are presented in Figure 9, and a video demo that runs at about 10 FPS is available at http://xing-mei.net/resource/video/adcensus.avi. Our system performs reasonably well on these examples, but the results are not as convincing as on the Middlebury datasets: artifacts are visible around depth borders and occlusion regions.

We briefly discuss the limitations of the current system with the video examples. The disparity errors come from several aspects. First, the support regions defined by the cross skeleton rely heavily on color and connectivity constraints. For practical scenes the cross construction process can be easily corrupted by dark regions and image noise. Small regions without enough support area can be produced, which brings significant errors for later computation steps such as cost computation and region voting. Bilateral filtering might be used as a pre-process to reduce the noise while preserving the image edges [1, 15]. Second, the well-designed multi-stage mechanism is a double-edged sword.
It helps us to get accurate results and remove the errors step by step in a systematic way, but it also brings a large set of parameters. By carefully tuning individual parameters, the disparity quality can be improved, but such a scheme is usually laborious and impractical for various real-world applications. A possible solution is to analyze the robustness of the parameters with ground truth data and adaptively set the unstable parameters for different visual contents. Automatic parameter estimation within an iterative framework [25] might also be used to avoid the tricky parameter tuning process.

5. Conclusions

This paper has presented a near real-time stereo system with accurate disparity results. Our system is based on several key techniques: the AD-Census cost measure, cross-based support regions, scanline optimization and a systematic refinement process. These techniques significantly improve the disparity quality without sacrificing performance or parallelism, and are therefore suitable for GPU implementation. Although our system presents nice results for the Middlebury data sets, applying it in real-world applications is still a challenging task, as shown by the video examples. Real-world data usually contains significant image noise, rectification errors and illumination variation, which might cause serious problems for cost computation and support region construction. Robust parameter setting methods are also important to produce satisfactory results. We would like to explore these topics in the future.

Acknowledgment

The authors would like to thank Daniel Scharstein for the Middlebury test bed and personal communications.

References

[1] K. He, J. Sun, and X. Tang. Guided image filtering. In Proc. ECCV, 2010.
[2] H. Hirschmuller. Stereo processing by semiglobal matching and mutual information. IEEE TPAMI, 30(2):328–341, 2008.
[3] H. Hirschmuller and D. Scharstein. Evaluation of stereo matching costs on images with radiometric differences. IEEE TPAMI, 31(9):1582–1599, 2009.
[4] A. Hosni, M. Bleyer, and M. Gelautz. Near real-time stereo with adaptive support weight approaches. In Proc. 3DPVT, 2010.
[5] A. Hosni, M. Bleyer, M. Gelautz, and C. Rheman. Local stereo matching using geodesic support weights. In Proc. ICIP, pages 2093–2096, 2009.
[6] A. Klaus, M. Sormann, and K. Karner. Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In Proc. ICPR, pages 15–18, 2006.
[7] J. Liu and J. Sun. Parallel graph-cuts by adaptive bottom-up merging. In Proc. CVPR, pages 2181–2188, 2010.
[8] S. Mattoccia, F. Tombari, and L. D. Stefano. Stereo vision enabling precise border localization within a scanline optimization framework. In Proc. ACCV, pages 517–527, 2007.
[9] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz. Fast cost-volume filtering for visual correspondence and beyond. In Proc. CVPR, 2011.
[10] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodgson. Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In Proc. ECCV, pages 6311–6316, 2010.
[11] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 47(1-3):7–42, 2002.
[12] D. Scharstein and R. Szeliski. Middlebury stereo evaluation - version 2, 2010. http://vision.middlebury.edu/stereo/eval/.
[13] X. Sun, X. Mei, S. Jiao, M. Zhou, and H. Wang. Stereo matching with reliable disparity propagation. In Proc. 3DIMPVT, pages 132–139, 2011.
[14] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE TPAMI, 30(6):1068–1080, 2008.
[15] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Proc. ICCV, pages 839–846, 1998.
[16] F. Tombari, S. Mattoccia, and L. D. Stefano. Segmentation-based adaptive support for accurate stereo correspondence. In Proc. PSIVT, pages 427–438, 2007.
[17] Z. Wang and Z. Zheng. A region based stereo matching algorithm using cooperative optimization. In Proc. CVPR, pages 1–8, 2008.
[18] Y. Wei, C. Tsuhan, F. Franz, and C. H. James. High performance stereo vision designed for massively data parallel platforms. IEEE TCSVT, 99:1–11, 2010.
[19] Q. Yang, C. Engels, and A. Akbarzadeh. Near real-time stereo for weakly-textured scenes. In Proc. BMVC, pages 80–87, 2008.
[20] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling. IEEE TPAMI, 31(3):492–504, 2009.
[21] K.-J. Yoon and I.-S. Kweon. Adaptive support-weight approach for correspondence search. IEEE TPAMI, 28(4):650–656, 2006.
[22] R. Zabih and J. Woodfill. Non-parametric local transforms for computing visual correspondence. In Proc. ECCV, pages 151–158, 1994.
[23] K. Zhang, J. Lu, and G. Lafruit. Cross-based local stereo matching using orthogonal integral images. IEEE TCSVT, 19(7):1073–1079, 2009.
[24] K. Zhang, J. Lu, G. Lafruit, R. Lauwereins, and L. V. Gool. Real-time accurate stereo with bitwise fast voting on CUDA. In Proc. ICCV Workshop, 2009.
[25] L. Zhang and S. M. Seitz. Estimating optimal parameters for MRF stereo from a single image pair. IEEE TPAMI, 29(2):331–342, 2007.
Figure 7. The average error percentages in non-occlusion (Non-Occ), discontinuity (Disc.) and all regions after performing each refinement step (scanline optimization, region voting, interpolation, disc. adjustment).
Figure 8. Results on the Middlebury data sets. First row: disparity maps generated with our system. Second row: disparity error maps with
threshold 1. Errors in unoccluded and occluded regions are marked in black and gray respectively.
Table 2. The rankings in the Middlebury benchmark. The error percentages in different regions for the four data sets are presented.