ABSTRACT 2D-to-3D conversion is one way to make full use of existing 2D content to produce 3D content. Real-time 2D-to-3D conversion is required for 3D consumer electronic devices, which demands fast processing speed, especially for high-definition videos. In this paper, we propose a reconfigurable VLSI architecture for real-time 2D-to-3D conversion. Two different depth-retrieval methods are implemented in this architecture in order to support the choice of the best method or a combination of different methods. The proposed architecture also supports different resolutions (4K, 1080p, and 720p), and the original view can be configured as either the left view or the right view. To overcome the ''Memory Wall'' problem, we propose a data reuse method that reduces memory traffic for the proposed architecture, improving overall performance enough to realize real-time conversion. Experimental results show that the implemented 2D-to-3D architecture achieves state-of-the-art throughput (4K@30 f/s).
INDEX TERMS 2D-to-3D conversion, data reuse, memory traffic, real-time, reconfigurable.
number of edge pixels in the current frame is not known until the end of the frame in our hardware implementation, and the number of edge pixels is usually very close between two adjacent frames. Therefore, Edge_sum needs to be updated at the end of each frame.

$$\mathrm{Depth_{global}} = \frac{\mathrm{Edge_{count}}}{\mathrm{Edge_{sum}}} \times 255 \tag{3}$$
At last, the combination of the Y, Cb, and Cr color channels is used to refine the global depth into a local depth according to (4), where $f_1$, $f_2$, and $f_3$ are linear functions. The local depths of all the pixels in the current frame compose the initial depth map.

$$\mathrm{Depth_{local}} = \mathrm{Depth_{global}} \times f_1(C_r) \times f_2(C_b) \times f_3(Y) \tag{4}$$
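To make (3) and (4) concrete, the following Python sketch gives a software reference model of the edge-color-based depth computation, not the hardware pipeline itself. The exact definition of Edge_count and the linear functions f1, f2, f3 are not specified in this excerpt, so the raster-order running count and the coefficients below are illustrative assumptions.

```python
import numpy as np

def depth_map_edge_color(y, cb, cr, edge_mask, edge_sum_prev):
    """Software reference for (3) and (4); not the hardware pipeline itself.

    y, cb, cr     : HxW uint8 color channels of the current frame
    edge_mask     : HxW bool, per-pixel Sobel edge decision
    edge_sum_prev : total edge-pixel count of the PREVIOUS frame; the
                    current total is only known at the end of the frame,
                    so Edge_sum is updated once per frame
    """
    # Assumed interpretation: Edge_count is the running count of edge
    # pixels seen so far in raster-scan order.
    edge_count = np.cumsum(edge_mask.ravel()).reshape(edge_mask.shape)

    # (3): global depth from the ratio of counted edges to Edge_sum.
    depth_global = edge_count / max(edge_sum_prev, 1) * 255.0

    # Illustrative linear functions f1, f2, f3 (the paper states only
    # that they are linear; these coefficients are assumptions).
    f1 = lambda c: 0.5 + c.astype(np.float64) / 255.0
    f2 = lambda c: 0.5 + c.astype(np.float64) / 255.0
    f3 = lambda c: 0.5 + c.astype(np.float64) / 255.0

    # (4): local depth refines the global depth with color information.
    depth_local = depth_global * f1(cr) * f2(cb) * f3(y)

    # Return the initial depth map and this frame's edge total, which
    # becomes Edge_sum for the next frame.
    return np.clip(depth_local, 0, 255).astype(np.uint8), int(edge_mask.sum())
```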
B. DIBR
Depth-image-based rendering (DIBR) is used to generate
the new view from the original 2D video sequence and the
extracted depth information (depth map). DIBR includes four parts (Fig. 3): pre-processing of the initial depth map, converting depth to disparity, pixel shifting, and hole filling [21], [22].
FIGURE 3. The workflow of DIBR.

First, as a pre-processing step, a Gaussian filter is used to smooth the initial depth map, and the final depth value for each pixel is generated. The Gaussian filter employs a 5 × 5 template (Fig. 4), which needs 24 reference pixels around the target pixel.

FIGURE 4. Template for Gaussian filter.

Second, the smoothed depth value is converted to the disparity between the position of the pixel in the original view and that in the new view, according to (5), where D represents the viewing distance from the display, t_c is the interpupillary distance (the human eye separation), and Z is the normalized depth value for each pixel.

$$\mathrm{disparity} = x_R - x_L = t_c \left( 1 - \frac{D}{D - Z} \right) \tag{5}$$

Third, the pixels of the new view are generated according to the disparity and the original view. However, there may be some holes between pixels of the new view, because some of the pixels fall in the same position after pixel shifting. Therefore, at last, hole filling is applied to fill color into any remaining holes.

III. PROPOSED ARCHITECTURE FOR 2D-TO-3D CONVERSION
In this section, we first present the overall architecture (Fig. 5) for real-time 2D-to-3D conversion, which supports the choice of a best method or an optimal combination of different methods. Then we explain each module of the proposed architecture in detail.

A. OVERALL ARCHITECTURE
The 2D-to-3D architecture adopts a pixel-level pipeline. The 2D sequence is input pixel by pixel and output unchanged as the original view. The new view is generated after the processing of two modules, depth map generation and DIBR. The input buffer is designed as a line buffer that stores several lines of the original image, so that they can be accessed by different modules of the proposed 2D-to-3D architecture at the same time. The configuration module can be configured to select the best depth generation algorithm for the depth map generation module, which supports two different depth map generation methods. The configuration module can also be configured to support different resolutions, such as 4K, 1080p, or 720p. The depth map generation module is used to generate an initial depth map. After that, the DIBR module processes the initial depth map and generates a new view. The depth map buffer stores several lines of the initial depth map, which will be pre-processed by the Gaussian filter in the DIBR module.

B. RECONFIGURABLE DEPTH MAP GENERATION
The depth map generation module supports two depth-retrieval methods, an edge-based method and an edge-color-based method (Fig. 6). The Sobel module, shared by the two depth-retrieval methods, detects whether a pixel belongs to an edge. The global depth module and the local depth module are used only by the edge-color-based method, to generate the global depth and the local depth, respectively. The value conversion module is used by the edge-based method to convert the values 1 and 0 to 255 and 0, respectively, to obtain an initial depth. The initial depth values generated by the best depth-retrieval method can be selected through configuration.
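A minimal sketch of this method selection, assuming a simple configuration flag in place of the configuration module's select signal; it reuses depth_map_edge_color from the sketch after (4).

```python
import numpy as np

def generate_initial_depth(y, cb, cr, edge_mask, edge_sum_prev, cfg_method):
    """Sketch of the reconfigurable depth map generation module.

    cfg_method stands in for the configuration module's select signal:
    'edge' or 'edge_color'. The Sobel decision (edge_mask) is shared
    by both depth-retrieval methods.
    """
    if cfg_method == 'edge':
        # Value conversion module: map edge decisions 1/0 to 255/0.
        return edge_mask.astype(np.uint8) * 255
    # Edge-color-based method: global depth (3) refined by color (4),
    # reusing depth_map_edge_color from the earlier sketch.
    depth, _ = depth_map_edge_color(y, cb, cr, edge_mask, edge_sum_prev)
    return depth
```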
One 24-bit pixel from the input buffer is divided into three 8-bit data, Cr_data, Cb_data, and Y_data. Three linear functions are computed in parallel by three modules, F(Cr), F(Cb), and F(Y), respectively. The final multiplication of the four values in (4) is performed in pairs for lower delay, with (Depth_global, f_1(C_r)) as one pair and (f_2(C_b), f_3(Y)) as the other. Therefore, there are three multipliers in the Multiply module, for (Depth_global × f_1(C_r)), (f_2(C_b) × f_3(Y)), and the final local depth (Local_depth_data), respectively.

C. DIBR ARCHITECTURE
The DIBR architecture includes four modules (Fig. 11): Gaussian Filter, Disparity, Shift, and Hole Fill. As a pre-processing module, the Gaussian Filter smoothes the initial depth map from the depth map buffer, and the final depth value for each pixel is generated. The final depth value is converted to the disparity by the Disparity module. The pixels of the new view are generated by the Shift module using the disparity and the original view. The Hole Fill module is applied to fill color into the holes generated by the Shift module.

FIGURE 11. DIBR Architecture.

1) GAUSSIAN FILTER
The Gaussian filter reads the depth values from the depth map buffer and employs a 5 × 5 template (Fig. 4) to smooth the depth map. A highly parallel architecture is implemented to ensure real-time processing (Fig. 8). The design of the Gaussian filter module is very similar to that of fx in the Sobel module, except that the scale of the Gaussian filter is larger. First, a multiplier array with 25 multipliers is used to perform the 25 multiplications in parallel. Then, an adder tree is used to implement the sum of the 25 values with the lowest latency.

FIGURE 8. Parallel multipliers with an adder tree for four modules: fx, fy, threshold, and Gaussian filter.

2) DISPARITY, SHIFT AND HOLE FILL
FIGURE 12. Architecture of Disparity, Shift and Hole fill.

The Disparity module receives the smoothed depth values from the Gaussian filter one by one (Fig. 12). Each depth value is first normalized and then transformed to the disparity according to (5). The Shift module first transforms the disparity to the address of the register array with a decoder. Then the pixels of the original view from the input buffer are stored into the register array according to the address. The L/R signal from the configuration module is used to choose whether the conversion is from left view to right view or the reverse. The Hole Fill module is used to fill color into the holes which are generated after the pixel shifting. Each pixel from the Shift module is first judged to be a hole or not; if it is a hole, the pixels near it are used to fill it. There is also a register array to store the line of pixels for hole filling. The output data from the Hole Fill module are the final data of the new view.
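As a behavioral reference for the Disparity, Shift and Hole Fill flow (not cycle-accurate hardware), the following Python sketch processes one line after Gaussian smoothing: depth-to-disparity conversion per (5), pixel shifting, and hole filling. The values of D and t_c, the rounding of disparity to whole pixels, and the fill-from-left rule are illustrative assumptions; the paper states only that nearby pixels fill the holes.

```python
import numpy as np

def dibr_line(depth_line, orig_line, left_to_right=True, D=3.0, t_c=6.0):
    """Behavioral sketch of Disparity, Shift and Hole fill for one line.

    depth_line : uint8 depth values already smoothed by the Gaussian filter
    orig_line  : uint8 pixels of the original view on the same line
    left_to_right : stands in for the L/R configuration signal
    D, t_c     : viewing distance and interpupillary distance; the values
                 and the depth-to-pixel scaling here are assumptions.
    """
    w = len(orig_line)
    new_line = np.full(w, -1, dtype=np.int32)   # -1 marks a hole

    z = depth_line.astype(np.float64) / 255.0   # normalized depth Z
    disparity = t_c * (1.0 - D / (D - z))       # (5): x_R - x_L

    # Shift module: write each original pixel to its shifted address;
    # pixels that land on the same address overwrite, leaving holes
    # at addresses that receive no pixel.
    for x in range(w):
        shift = int(round(disparity[x]))
        x_new = x + shift if left_to_right else x - shift
        if 0 <= x_new < w:
            new_line[x_new] = int(orig_line[x])

    # Hole fill module: each pixel is first judged to be a hole or not;
    # holes are filled from the nearest pixel on the left (an assumed
    # rule; the paper says only that nearby pixels fill the holes).
    for x in range(w):
        if new_line[x] < 0:
            new_line[x] = new_line[x - 1] if x > 0 else 0
    return new_line.astype(np.uint8)
```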
D. MEMORY TRAFFIC REDUCTION FOR REAL-TIME CONVERSION
We propose three levels of data reuse to reduce the memory traffic of the real-time 2D-to-3D architecture. Exploiting data reuse improves the performance of memory access and helps overcome the ''Memory Wall'' to ensure real-time processing. The data reuse method is implemented in the buffer design for the Sobel module and the Gaussian module to reduce both the off-chip and the on-chip memory traffic in the proposed architecture. Sobel and Gaussian are two widely used methods in many applications, so the data reuse method proposed in this paper can also be used in other applications. Some basic concepts are explained in Table 1. One frame or depth map is divided into many blocks (BK) with block size N×N. A block strip (BS) represents a row of blocks.

TABLE 2. Memory traffic and buffer size for different data reuse levels.

TABLE 3. Memory traffic and buffer size of different data reuse levels for Sobel to process one frame.

At each cycle, a new column of three pixels is shifted into the register array. Nine pixels are outputted to the computing logic of the Sobel module at each cycle. In this way, the memory traffic between the input buffer and the register array is reduced to 3×W×H, which is 1/3 of the case with no data reuse.
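The register-array reuse can be illustrated in Python as follows; the sketch counts line-buffer reads to show the 3×W×H traffic (versus 9×W×H with no reuse) for a 3×3 Sobel window. The raster scan order and the gradient-magnitude form are assumptions for illustration.

```python
import numpy as np

def sobel_with_column_reuse(frame):
    """Sketch of the register-array data reuse for a 3x3 Sobel window.

    Instead of fetching all 9 window pixels per output (9*W*H reads),
    the 3x3 register array shifts in one new 3-pixel column per cycle,
    so each pixel is read from the line buffer only 3 times (3*W*H).
    """
    h, w = frame.shape
    gx_k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # Sobel fx
    gy_k = gx_k.T                                          # Sobel fy
    out = np.zeros((h, w), dtype=np.int32)
    reads = 0
    for row in range(1, h - 1):
        window = np.zeros((3, 3), dtype=np.int64)  # register array
        for col in range(w):
            # Shift the register array left and load one new column
            # (3 reads) from the line buffer.
            window[:, :2] = window[:, 1:]
            window[:, 2] = frame[row - 1:row + 2, col]
            reads += 3
            if col >= 2:
                # All 9 pixels are available to the Sobel logic each cycle.
                gx = int((window * gx_k).sum())
                gy = int((window * gy_k).sum())
                out[row, col - 1] = abs(gx) + abs(gy)
    # reads == 3 * W * (H - 2), i.e. ~3*W*H, versus 9*W*H with no reuse.
    return out, reads
```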
TABLE 5. Implementation of the proposed 2D-to-3D architecture and comparison with other implementations.

FIGURE 23. Layout of the 2D-to-3D chip.

TABLE 6. Performance comparison with FPGA, CPU and GPU.

The 2D-to-3D algorithm was also implemented on other platforms, such as CPU [19], CPU+GPU [18], and FPGA [14]. Table 6 gives the performance comparison between the different platforms. In [19], the 2D-to-3D application is implemented on a four-core laptop system with one Intel Core i7-2820QM processor running at 2.3 GHz. Each core is equipped with a 32KB L1 data cache, a 32KB L1 instruction cache, and a 256KB L2 cache. The four cores share an 8MB unified L3 cache. In [18], the system is implemented on a notebook computer. The CPU of the notebook is a 1.60 GHz quad-core CPU with a 6MB cache, featuring simultaneous multithreading. The notebook also has a 1.375 GHz GPU with 7 stream processors, each consisting of 16 cores. As an FPGA implementation [14], DIBR is implemented on a Xilinx Virtex-5 FPGA platform. We find that the proposed design can achieve higher throughput than the other three platforms.
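For scale, 4K@30 f/s (assuming 3840 × 2160 UHD frames) corresponds to a sustained pixel rate of

$$3840 \times 2160 \times 30 \approx 2.49 \times 10^{8}\ \text{pixels/s},$$

so a pixel-level pipeline that accepts one pixel per cycle must run at roughly 249 MHz.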
V. CONCLUSION
In this paper, we propose a reconfigurable VLSI architecture for real-time 2D-to-3D conversion. The proposed architecture supports two depth-retrieval methods and different resolutions. We implement a novel data reuse method to reduce the memory traffic for real-time conversion. Experimental results show that our design achieves state-of-the-art performance, higher than that of implementations on various platforms (FPGA, ASIC, CPU, and GPU).
ACKNOWLEDGMENT
The authors would like to thank all the reviewers for
their valuable comments. The authors also want to thank
Xiudong Li for his contribution to this work.
REFERENCES
[1] Royal Philips Electronics. (2008). Whitepaper WOWvx BlueBox. [Online]. Available: http://www.business-sites.philips.com/global/en/gmm/images/3d/3dcontentceationproducts/downloads/BlueBox?white?paper.pdf
[2] C. Wu, G. Er, X. Xie, T. Li, X. Cao, and Q. Dai, "A novel method for semi-automatic 2D to 3D video conversion," in Proc. 3DTV Conf. True Vis. Capture, Transmiss. Display 3D Video, May 2008, pp. 65–68.
[3] H.-M. Wang, C.-H. Huang, and J.-F. Yang, "Depth maps interpolation from existing pairs of keyframes and depth maps for 3D video generation," in Proc. IEEE Int. Symp. Circuits Syst., May/Jun. 2010, pp. 3248–3251.
[4] H.-M. Wang, C.-H. Huang, and J.-F. Yang, "Block-based depth maps interpolation for efficient multiview content generation," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 12, pp. 1847–1858, Dec. 2011.
[5] I. A. Ideses, L. P. Yaroslavsky, B. Fishbain, and R. Vistuch, "3D from compressed 2D video," Proc. SPIE, vol. 6490, pp. 64901C-1–64901C-8, Mar. 2007.
[6] P. Favaro and S. Soatto, "A geometric approach to shape from defocus," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 406–417, Mar. 2005.
[7] W. J. Tam, C. Vázquez, and F. Speranza, "Three-dimensional TV: A novel method for generating surrogate depth maps using colour information," Proc. SPIE, vol. 7237, pp. 72371A-1–72371A-9, Feb. 2009.
[8] J. Zhang, Y. Yang, and Q. Dai, "A novel 2D-to-3D scheme by visual attention and occlusion analysis," in Proc. 3DTV Conf., True Vis. Capture Transmiss. Display 3D Video, May 2011, pp. 1–4.
[9] H. Dong et al., "An automatic depth map generation for 2D-to-3D conversion," in Proc. 18th IEEE Int. Symp. Consum. Electron. (ISCE), Jun. 2014, pp. 1–2.
[10] T.-C. Chen et al., "Brain-inspired framework for fusion of multiple depth cues," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 7, pp. 1137–1149, Jul. 2013.
[11] Z. Zhang, S. Yin, L. Liu, and S. Wei, "Real-time time-consistent 2D-to-3D video conversion based on color histogram," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Jan. 2015, pp. 179–180.
[12] H. Dong, S. Yin, G. Jiang, L. Liu, and S. Wei, "An automatic depth map generation method by image classification," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Jan. 2015, pp. 179–180.
[13] W. A. Wulf and S. A. McKee, "Hitting the memory wall: Implications of the obvious," ACM SIGARCH Comput. Archit. News, vol. 23, no. 1, pp. 20–24, Mar. 1995.
[14] Y.-K. Lai, Y.-C. Chung, and Y.-F. Lai, "Hardware implementation for real-time 3D rendering in 2D-to-3D conversion," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2013, pp. 893–896.
[15] Y.-C. Fan, B.-L. Lin, and S.-Y. Chou, "Hardware structure of 2D to 3D image conversion system for digital archives," in Proc. IEEE 17th Int. Symp. Consum. Electron. (ISCE), Jun. 2013, pp. 111–112.
[16] S.-F. Hsiao, J.-W. Cheng, W.-L. Wang, and G.-F. Yeh, "Low latency design of depth-image-based rendering using hybrid warping and hole-filling," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2012, pp. 608–611.
[17] Y.-R. Horng, Y.-C. Tseng, and T.-S. Chang, "VLSI architecture for real-time HD1080p view synthesis engine," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 9, pp. 1329–1340, Sep. 2011.
[18] S.-F. Tsai, C.-C. Cheng, C.-T. Li, and L.-G. Chen, "A real-time 1080p 2D-to-3D video conversion system," IEEE Trans. Consum. Electron., vol. 57, no. 2, pp. 915–922, May 2011.
[19] J. Feng, Y. Chen, E. Li, Y. Du, and Y. Zhang, "Accelerating 2D-to-3D video conversion on multi-core systems," in Proc. 11th Int. Conf. Appl. Parallel Sci. Comput. (PARA), Jun. 2012, pp. 410–421.
[20] W. J. Tam, A. S. Yee, J. Ferreira, S. Tariq, and F. Speranza, "Stereoscopic image rendering based on depth maps created from blur and edge information," Proc. SPIE, vol. 5664, pp. 104–115, Mar. 2005.
[21] W.-Y. Chen, Y.-L. Chang, S.-F. Lin, L.-F. Ding, and L.-G. Chen, "Efficient depth image based rendering with edge dependent depth filter and interpolation," in Proc. IEEE Int. Conf. Multimedia Expo, Amsterdam, The Netherlands, Jul. 2005, pp. 1314–1317.
[22] L. Zhang, C. Vázquez, G. Huchet, and W. J. Tam, "DIBR-based conversion from monoscopic to stereoscopic and multi-view video," in 3D-TV System With Depth-Image-Based Rendering. New York, NY, USA: Springer, 2013, pp. 107–143.
[23] W.-Y. Chen, Y.-L. Chang, H.-K. Chiu, S.-Y. Chien, and L.-G. Chen, "Real-time depth image based rendering hardware accelerator for advanced three dimensional television system," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2006, pp. 2069–2072.

ZHEN ZHANG received the B.S. degree in microelectronics from Tsinghua University, China, in 2010, where he is currently pursuing the Ph.D. degree. His current research interests include VLSI SoC design and high-level synthesis of CGRA.

HAO DONG received the B.S. degree from the School of Instrumentation Science and Optoelectronics Engineering, Beihang University, Beijing, China, in 2012. He is currently pursuing the M.S. degree with the Institute of Microelectronics, Tsinghua University, Beijing. His main research interests include computer vision and VLSI SoC design.

RUI SHI received the B.S. degree from Tsinghua University, Beijing, China, in 2012, where he is currently pursuing the M.S. degree with the Institute of Microelectronics. His research interests include mobile computing and computing architecture.