Vous êtes sur la page 1sur 10

Received October 13, 2017, accepted November 24, 2017, date of publication November 28, 2017,

date of current version December 22, 2017.

Digital Object Identifier 10.1109/ACCESS.2017.2778220

Reconfigurable VLSI Architecture for

Real-Time 2D-to-3D Conversion
1 School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
2 Institute of Microelectronics, Tsinghua University, Beijing 100084, China
Corresponding author: Shouyi Yin (yinsy@tsinghua.edu.cn)
This work was supported in part by the China Major S&T Project under Grant 2013ZX01033001-001-003, in part by NNSF of China
under Grant 61774094, Grant 61602285, and Grant 61602284, and in part by the Shandong Provincial Natural Science Foundation, China,
under Grant ZR2015FQ009.

ABSTRACT 2-D-to-3-D conversion is one way to make full use of 2-D contents to produce 3-D contents.
Real-time 2-D-to-3-D conversion is required for 3-D consumer electronic devices, which demands fast
processing speed especially for high-definition videos. In this paper, we propose a reconfigurable VLSI
architecture for real-time 2-D-to-3-D conversion. Two different depth-retrieval methods are implemented in
this architecture in order to support the choice of a best method or combination of different methods. The
proposed architecture can also support different resolutions (4K, 1080p, and 720p) and the original view
can be configured as either left view or right view. In order to overcome the problem of ‘‘Memory Wall’’,
we propose a data reuse method to reduce memory traffic for the proposed architecture so that the overall
performance is improved to realize real-time conversion. The experiment result shows that the implemented
2-D-to-3-D architecture can achieve state-of-the-art throughput (4K@30f/s).

INDEX TERMS 2D-to-3D conversion, data reuse, memory traffic, real-time, reconfigurable.

I. INTRODUCTION Different depth-retrieval methods may be effective in

Three-dimensional video is becoming more and more popular different scenarios, so it is necessary to support differ-
and attracting more and more viewers to experience vivid ent depth-retrieval methods in a system for selection of
stereoscopic visual effects. More and more common con- a best method or optimal combination of different meth-
sumer electronics can play 3D video, such as 3D-TV, laptop, ods [10]. On the other hand, real-time 2D-to-3D conversion is
tablet and so on. However, it is costly and time-consuming required for 3D consumer electronic devices, which demand
to produce high quality 3D contents directly. To alleviate the fast processing speed especially for high-definition videos.
problem of 3D material shortage, converting 2D to 3D is one ‘‘Memory Wall’’ [13] is an obstacle for real-time conver-
way to make full use of the already existing 2D materials. sion, which rises from the performance gap between com-
The difference between 2D contents and 3D contents is that putation speed and memory access speed. A few VLSI
3D contents contain the depth information which represents architectures are proposed to accelerate real-time 2D-to-3D
the relative distances between the objects. The generation conversion or view synthesis [14]–[17], but these architec-
of comfortable and natural depth is important for 2D-to-3D tures did not consider the depth-retrieval part of 2D-to-3D
conversion. There are two steps for 2D-to-3D conversion. conversion. A CPU+GPU platform is proposed to achieve
First, a depth map with the depth values of all pixels is 1080p@30fps 2D-to-3D conversion in [18]. After the
extracted from a 2D picture. Second, depth-image-based ren- optimization of the 2D-to-3D conversion algorithm,
dering (DIBR) is applied to generate new views according to 1080p@36fps is achieved on an Intel Core i7 processor [19].
the original view and the depth map. Semiautomatic depth In this paper, we propose a reconfigurable VLSI archi-
map generation [1]–[4] is very time-consuming. Therefore, tecture for real-time 2D-to-3D conversion. The proposed
algorithms for automatic depth map generation are proposed architecture is a complete (including both depth generation
to reduce the time overhead [5]–[12]. and DIBR) 2D-to-3D VLSI architecture for high-definition

2017 IEEE. Translations and content mining are permitted for academic research only.
26604 Personal use is also permitted, but republication/redistribution requires IEEE permission. VOLUME 5, 2017
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

videos. The real-time performance of the 2D-to-3D archi-

tecture can reach 1080p@120fps or 4K@30fps. The pro-
posed architecture can support two different depth-retrieval
methods, edge-based method and color-edge-based method,
in order to support the choice of a best method or the
best combination of different methods [10]. The proposed
architecture can also support different input/output video
resolutions, such as 4K, 1080p and 720p. The inputted
original view can be outputted as left view or right view
according to the habits of the viewers. The hardware mod-
ules are reused by different configurations to reduce the
hardware overhead. In order to overcome the problem of
‘‘Memory Wall’’, we propose different levels of data reuse FIGURE 1. Workflow of Sobel edge detection.
to reduce memory traffic so that the overall performance is
improved to realize the real-time 2D-to-3D conversion. The
proposed data reuse method is implemented in the on-chip
memory design of the 2D-to-3D architecture. The contribu-
tions of this paper are as follows. (1) A real-time complete
2D-to-3D VLSI architecture is proposed and implemented.
(2) The proposed architecture is reconfigurable to support FIGURE 2. The templates for Sobel edge detection.
different depth-retrieval methods and different resolutions.
(3)The proposed data reuse method is useful to reduce the
memory traffic not only for 2D-to3D conversion but also for Y direction gradient (Gy) and the threshold gradient (Gth) are
other similar applications. similar to that of X direction gradient and two other templates
The rest of the paper is organized as follows. The 2D-to-3D in Fig. 2 are used. The three templates share the same input
algorithms supported by the proposed architecture are intro- pixels. The final gradient G is computed according to (2) and
duced in Section II. The proposed 2D-to-3D VLSI architec- compared with the threshold (Gth) to determine whether the
ture is presented in Section III. Experiment results are shown pixel is on an edge. The Sobel operation results in the output
and analyzed in Section IV. Section V is the conclusion. of binary values 1 (on the edge) and 0 (not on the edge).
Then the binary values 1 and 0 are transformed to values
II. 2D-TO-3D ALGORITHMS of 255 and 0 respectively and an initial depth map is generated
In this section, we present the two 2D-to-3D algorithms for the input image [20].
supported by the proposed architecture. The two algorithms
Gx = (−1) × f (x − 1, y − 1) + 0 × f (x, y − 1)
are both divided into two parts, depth-retrieval and DIBR.
For the two 2D-to-3D algorithms, the depth-retrieval parts are + 1 × f (x + 1, y − 1) + (−2) × f (x − 1, y)
different while the DIBR parts are the same. + 0 × f (x, y) + 2 × f (x + 1, y) + (−1)
× f (x − 1, y + 1) + 0 × f (x, y + 1)
+ 1 × f (x + 1, y + 1) (1)
Two depth-retrieval methods, edge-based method and edge- q
color-based method, are implemented in the proposed archi- G = G2x + G2y (2)
tecture to generate the initial depth map. The two methods are
presented in this subsection. 2) EDGE-COLOR-BASED METHOD
The edge-color-based method considers two depth cues, edge
1) EDGE-BASED METHOD and color [18]. There are three steps to generate the initial
The edge-based method uses the edge information in the depth map.
image as the depth cue. Sobel operator is used for edge detec- First, we use Sobel edge detection to judge whether a pixel
tion. The Sobel edge detection computes the edge direction of belongs to an edge, which is the same as the edge-based
target pixel and checks whether the target pixel is on an edge. method. The outputs of this step are 1 (on the edge) and
The workflow of Sobel edge detection is described 0 (not on the edge), which are sent to the next step.
in Fig. 1. The computation for edge detection utilizes the first Second, using the edge information, a global depth for
two templates (matrices) in Fig. 2 to computes the gradients each row is obtained from the cumulative horizontal edge
of the target pixel in X direction (Gx) and Y direction (Gy) histogram, according to (3). Depthglobal is the global depth
respectively. This step needs eight reference pixels around the for a row, which means that all the pixels in one row have the
target pixel to multiply by the two templates. For example, same global depth. Edgecount is the number of edge pixels for
Gx is computed as in (1), where f(x,y) is the pixel in a the current row. Edgesum is the total number of edge pixels in
frame with the coordinate of x and y. The computations of the previous frame. The previous frame is used because the

VOLUME 5, 2017 26605

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

number of edge pixels in the current frame is not known until III. PROPOSED ARCHITECTURE FOR
the end of the frame for our hardware implementation and 2D-TO-3D CONVERSION
the number of edge pixels is usually very close between two In this section, we first present the overall architecture (Fig. 5)
adjacent frames. Therefore, Edgesum needs to be updated at for real-time 2D-to-3D conversion, which supports the choice
the end of each frame. of a best method or optimal combination of different methods.
Edgecount Then we explain each module of the proposed architecture in
Depthglobal = × 255 (3) detail.
At last, the combination of Y , Cb and Cr color channels
is used to refine the global depth to achieve a local depth
according to (4), where f1 , f2 and f3 are linear functions. The
local depths of all the pixels in the current frame compose the
initial depth map.
Depthlocal = Depthglobal × f1 (Cr ) × f2 (Cb ) × f3 (Y ) (4)

Depth-image-based rendering (DIBR) is used to generate
the new view from the original 2D video sequence and the
extracted depth information (depth map). DIBR includes four
parts (Fig. 3), pre-processing of the initial depth map, convert-
ing depth to disparity, pixel shifting and hole filling [21], [22].

FIGURE 5. Overall architecture for 2D-to-3D conversion.

The 2D-to-3D architecture adopts a pixel-level pipeline.
The 2D sequence is inputted pixel by pixel and outputted
FIGURE 3. The workflow of DIBR.
unchanged as the original view. The new view is generated
First, as a pre-processing step, a Gaussian filter is used to after the processing of two modules, depth map generation
smooth the initial depth map and the final depth value for and DIBR. The input buffer is designed as a line buffer to store
each pixel is generated. The Gaussian filter employs a 5 × 5 several lines of the original image in order to be accessed by
template (Fig. 4), which needs 24 reference pixels around the different modules of the proposed 2D-to-3D architecture at
target pixel. the same time. The configuration module can be configured
to select a best depth generation algorithm for the depth
map generation module, which supports two different depth
map generation methods. The configuration module can also
be configured to support different resolutions, such as 4K,
1080p or 720p. The depth map generation module is used
to generate an initial depth map. After that, DIBR module
processes the initial depth map and generates a new view.
FIGURE 4. Template for Gaussian filter.
Depth map buffer stores several lines of the initial depth map,
which will be pre-processed by Gaussian filter in the DIBR
Second, the smoothed depth value is converted to the dis- module.
parity between the position of the pixel in the original view
and that in the new view, according to (5). D represents the B. RECONFIGURABLE DEPTH MAP GENERATION
viewing distance from the display. tc is the interpupillary dis-
The depth map generation module supports two depth-
tance which is the human eye separation. Z is the normalized
retrieval methods, an edge-based method and an edge-color-
depth value for each pixel.
based method (Fig. 6). The Sobel module detects whether a
D pixel belongs to an edge, which is shared by the two depth-
disparity = xR − xL = tc (1 − ) (5)
D−Z retrieval methods. The global depth module and local depth
Third, the pixels of the new view are generated according module are only used for the edge-color-based method to
to the disparity and the original view. However, there may generate the global depth and the local depth respectively. The
be some holes between pixels of the new view because some value conversion module is used by the edge-based method to
of the pixels fall in the same position after pixel shifting. convert the values 1 and 0 to 255 and 0 respectively to get an
Therefore, at last, hole-filling is applied to fill color into these initial depth. The initial depth values generated by the best
holes if there are still holes. depth-retrieval method can be selected through configuration

26606 VOLUME 5, 2017

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

FIGURE 8. Parallel multipliers with an adder tree for four modules, fx, fy,
threshold and Gassian filter.

FIGURE 6. Architecture for the reconfigurable depth map generation. The

white module is reusable between two depth-retrieval methods while the
grey modules are not.

FIGURE 7. Architecture of Sobel module.

FIGURE 9. Architecture of global depth module.

module (Algorithm selection) according to the content of the

image, and then stored to the depth map buffer. 2) GLOBAL DEPTH MODULE
The architecture of the global depth module is shown
1) SOBEL MODULE in Fig. 9. It generates the global depth for each row. In this
The Sobel module (Fig. 7) checks whether the target pixel design, Row/Col Count module gives control commands to
is on the edge. It receives three pixels each cycle from the determine the end of a line or a frame, according to input
input buffer. The three pixels are from adjacent three lines in signals from the Sobel module (Edge_ready) and the configu-
the input buffer. There is a register array for storing the input ration module (Width/Height of the frame). The edge count of
pixels because nine pixels (3×3) are needed each cycle for the current row is accumulated in the Edge_row module. The
the computing units of Sobel module. Edge_frame module keeps both the accumulated edge count
The fx module and the fy module computes the gradients in the current frame and the sum of edge count in the previous
of target pixel in x direction and y direction respectively, frame. The global depth of a row is computed by Depth_comp
using the first two templates in Fig. 2. The threshold module module when Row/Col Count module finds the end of this
computes the threshold with the third template in Fig. 2 and row, using the edge count of the current row in Edge_row
reads the same nine pixels (3×3) as fx module or fy module and the edge count of the previous frame in Edge_frame
from the register array. The fx module, the fy module and according to (3). Edgesum is supposed to be edge count of
the threshold module all use multiple multipliers to finish the current frame, but we use previous frame to enable the
the multiplications in parallel (Fig. 8). After that, an adder pipeline structure. Therefore, overflow should be avoided in
tree is implemented to finish the additions and the gradi- the Depth_comp module.
ents (Gx, Gy and Gth) are generated. This design ensures the
fastest processing speed for Sobel edge detection. 3) LOCAL DEPTH MODULE
Final gradient module computes the combined gradient G When the global depth of a row is inputted to the local depth
of x direction and y direction, according to (2). After that, module, as shown in Fig. 10, local depths of all pixels in
we can check whether the target pixel is on an edge by this row are computed one by one according to (4). A depth
comparing G with Gth. register is used to store the global depth of the current row.

VOLUME 5, 2017 26607

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

FIGURE 10. Architecture of local depth module.

One 24-bit pixel from the input buffer is divided to three 8-bit
data, Cr_data, Cb_data and Y_data. Three linear functions are
computed in parallel by three modules, F(Cr), F(Cb) and F(Y)
respectively. The final multiplication of the four values in (4)
is performed in pairs for less delay, (DepthGlobal , f1 (Cr )) as a FIGURE 12. Architecture of Disparity, Shift and Hole fill.
pair and (f2 (Cb ) , f3 (Y )) as a pair. Therefore, there are three
multipliers in the Multiply module for (DepthGlobal ×f1 (Cr )), 2) DISPARITY, SHIFT AND HOLE FILL
(f2 (Cb ) × f3 (Y )) and the final local depth (Local_depth_data) The disparity module receives the smoothed depth value from
respectively. the Gaussian filter one by one (Fig. 12). The depth value is
first normalized and then transformed to the disparity accord-
ing to (5). The Shift module first transforms the disparity to
the address of the register array with a decoder. Then the
pixels of the original view from the input buffer are stored
into the register array according to the address. The L/R signal
from the Configurations module is used to choose whether
the conversion is from left view to right view or reverse. The
FIGURE 11. DIBR Architecture. Hole fill module is used to fill color into the holes which are
generated after the pixel shifting. The pixel from the Shift
C. DIBR ARCHITECTURE module is first judged whether it is a hole or not. If it is a
The DIBR architecture includes four modules (Fig. 11), hole, the pixels near it are used to fill the hole. There is also
Gaussian Filter, Disparity, Shift and Hole Fill. As a pre- a register array to store the line of pixels for hole filling. The
processing module, Gaussian Filter smoothes the initial depth output data from the hole fill module are the final data of the
map from the depth map buffer and the final depth value for new view.
each pixel is generated. The final depth value is converted to
the disparity by the disparity module. The pixels of the new D. MEMORY TRAFFIC REDUCTION FOR
view are generated by the shift module using the disparity and REAL-TIME CONVERSION
the original view. The hole fill module is applied to fill color We propose three levels of data reuse to reduce the memory
into the holes generated by the shift module. traffic for the real-time 2D-to-3D architecture. Exploiting
data reuse can improve the performance of memory access
1) GAUSSIAN FILTER and help overcome the ‘‘Memory Wall’’ to ensure the real-
The Gaussian filter reads the depth values from the depth map time processing. The data reuse method is implemented in
buffer and employs a 5 × 5 template (Fig. 4) to smooth the the buffers design for Sobel module and Gaussian module
depth map. A highly parallel architecture is implemented to to reduce the both the off-chip and on-chip memory traffic
ensure real-time processing (Fig. 8). The design of Gaussian in the proposed architecture. Sobel and Gaussian are two
filter module is very similar to that of fx in Sobel module widely used methods in many applications, so the data reuse
except that the scale of the Gaussian filter is larger. First, method proposed in this paper can also be used in other
a multiplier array with 25 multipliers is used to do the applications. Some basic concepts are explained in Table 1.
25 multiplications in parallel. Then, an adder tree is used to One frame or depth map is divided into many blocks (BK)
implement the sums of 25 values in the lowest latency. with the block size N×N. A block strip (BS) represents a row

26608 VOLUME 5, 2017

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

TABLE 1. Basic concepts.

FIGURE 14. Level BK+ data reuse in a frame or a depth map.

FIGURE 13. Level BK data reuse in a frame or a depth map.

FIGURE 15. Level BS data reuse in a frame or a depth map.

TABLE 2. Memory traffic and buffer size for different data reuse levels.

of blocks. W and H stand for the width and height of a frame



For Sobel edge detection or Gaussian filter, a block-based
processing mode is used in a frame or in a depth map. This FIGURE 16. Input buffer design with a Level BS data reuse.
processing mode contains at least two levels of data reuse,
block (BK) level and block strip (BS) level. For the BK
level data reuse (Fig. 13), an (N−1)×N buffer is needed for N +n pixels are loaded into the buffer and there are n+1 target
reusable data between two neighboring blocks. For the BS lines or block strips which are processed at the same time.
level data reuse (Fig. 15), a W×(N−1) buffer is used for The memory traffic of BK+ is (N+n)×W×H/(n+1) with the
storing reusable data between two neighboring block strips. buffer size (N−1) × (N+n) (Table 2). The different data reuse
With no data reuse (Table 2), the memory traffic is supposed levels gives a good trade-off between the memory traffic and
to be N×N×W×H pixels per frame because W×H blocks buffer size. According to Table 2, we can choose a proper
with the size of N×N need to be loaded. In this case, no buffer data reuse level for a given application scenario. For example,
is needed. The memory traffic is reduced to N×W×H and we can choose Level BS if the performance is important while
W×H for the two data reuse levels, BK and BS respectively. the buffer size is sufficient.
However, a more efficient data reuse level usually requires a
larger buffer, (N−1) ×N for BK and (N−1) ×W for BS. 2) IMPLEMENTATION OF DATA REUSE FOR SOBEL MODULE
There is another data reuse level between BK and BS, According to the proposed data reuse methods, we design the
which is named as BK+ (Fig. 14). In each step of BK+, input buffer to implement the BS level data reuse (Fig. 16)

VOLUME 5, 2017 26609

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

FIGURE 17. Register array design for Sobel module.

FIGURE 18. Depth map buffer designed for Gaussian filter.

TABLE 3. Memory traffic and buffer size of different data reuse levels for shifted into the register array. Nine pixels are outputted to the
Sobel to proecess one frame.
computing logic of Sobel module at each cycle. In this way,
the memory traffic between the input buffer and the register
array is reduced to 3×W×H, which is 1/3 of the case with no
data reuse.


The depth map in our design is produced by the depth map
generation module on chip instead of storing the depth map
in the off-chip memory. A depth map buffer is designed
to store the depth map on chip (Fig. 18). The depth buffer
avoids storing the depth data to the off-chip memory and
then loading them to the on-chip memory so it can reduce
and a register array to implement the BK level data reuse
a lot of off-chip memory traffic and can improve the overall
(Fig. 17) so that a two-level data reuse method is implemented
performance of 2D-to-3D. The structure of the depth map
for Sobel edge detection module.
buffer is similar to that of the input buffer. The difference is
The input buffer contains four LineStorages (Fig. 16),
that the 8-bit depth data are stored in the depth map buffer
each of which can store a line of pixels and works as a
instead of 24-bit pixels. Because the window size of Gaussian
shifting buffer. A shifting buffer means that the pixels in
filter is 5×5 in our design, four DepthStorages are used in
the buffer are shifted from the left to the right in the buffer.
the depth map buffer. Each DepthStorage contains a line of
The first LineStorage receives pixels one by one from the
depth data.
input. The other LineStorages receive pixels one by one from
In our implementation, Level BK data reuse is also
their neighboring LineStorages. The pixels from the top two
employed. A 4×5 register array is designed to reduce mem-
LineStorages are used by DIBR module and local depth
ory traffic from depth map buffer to Gaussian filter module
module respectively. The length of LineStorage is designed
on chip, which is similar to the shifting register array for the
to be 4096 pixels to support both the maximum resolution
Sobel module.
of 4K and other resolutions less than 4K.
For Sobel edge detection, the block size N equals 3. Two TABLE 4. Memory traffic and buffer size for different data reuse levels of
LineStorages are used for Sobel to store two lines of pixels. gaussian filter.
The input pixel (Pixel_in) is directly used as the third pixel
line of Sobel edge detection. In this way, Level BS data reuse
is realized in the input buffer for Sobel edge detection. The
number of pixels loaded from off-chip is W×H when using
the input buffer (SRAM) for the Sobel module, while the
number of pixels from off-chip is 9×W×H with no data reuse
(Table 3). The input buffer size is 2×W to implement the
Level BS data reuse for Sobel edge detection.
For Sobel edge detection, a shifting register array is also
designed for implementing Level BK data reuse (Fig. 17).
Six (2×3) registers form a two-column register array. In each The memory traffic and buffer size of different data reuse
cycle, three pixels are received from the input buffer and levels for Gaussian filter are listed in Table 4 with N=5

26610 VOLUME 5, 2017

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

and n=2. If there is no depth buffer on chip (no data reuse),

the off-chip memory traffic is at least 25×W×H bytes. The
reason is that the depth data are needed to be loaded from
off-chip memory. In our implementation, all depth data are
stored and reused on chip and no depth data need to be
loaded from off-chip memory. Level BK data reuse is used
to reduce the memory traffic on chip between the depth
map buffer and the 4×5 register array for Gaussian filter.
If Level BK+ or Level BS is used on chip, the memory
traffic is reduced and buffer size is increased, compared with
Level BK.


In this section, we first give the results (red-cyan images) of
the two 2D-to-3D conversion algorithms which are supported FIGURE 21. The 2D-to-3D platform with a Xilinx Spartan-6 FPGA chip,
by the proposed architecture. Then we describe the FPGA which is used to implement and verify the proposed 2D-to-3D
platform for implementing the 2D-to-3D architecture. At last,
we give the synthesis results and comparison with other
architectures and other platforms like CPU and GPU.

FIGURE 19. The red-cyan images achieved by the edge-color-based

FIGURE 22. Photo of the real-time 2D-to-3D FPGA Platform.

through the HDMI interface (HDMI TX/RX) to the evalu-

ation board. The 2D-to-3D module is on the FPGA chip with
two HDMI interfaces. The output of the 2D-to-3D module
is displayed on a 3D-TV through an HDMI interface. The
HDMI chips and the FPGA chip are configured by the I2C
bus which is connected with the PC through USB interface.
Fig. 22 gives the photo of our FPGA platform.
FIGURE 20. The red-cyan images achieved by the edge-based method.
A. 2D-TO-3D RESULTS To the best of our knowledge, there is not a complete
We give the red-cyan images which are generated by the two 2D-to-3D VLSI architecture in literature. Therefore, we
2D-to-3D methods supported by the proposed architecture compare the implementation of our design with a VSRS
in Fig. 19 and Fig. 20 respectively. We can see that the design [17] and a DIBR design [23] (Table 5).
edge-color-based method usually achieves better results than We implemented the proposed 2D-to-3D architecture in
the edge-based method. However, the focus of this paper is synthesizable Verilog HDL. Synthesis results using TSMC
not on the algorithm improvement but on the architecture 65GP technology are given in Table 5. The gate count
innovation. The proposed architecture gives a chance for (809.0K) and area (1.76mm×1.79mm) demonstrate that our
the selection of a best method or combination of different design costs a small hardware resource. Compared with
methods. the other two implementations, our implementation gets the
state-of-the-art performance, 4K@30f/s. Fig. 23 gives the
B. FPGA IMPLEMENTATION OF THE layout of the chip. In the proposed design, we employ module
PROPOSED ARCHITECTURE reuse between different configurations to reduce the chip
We have implemented the proposed 2D-to-3D architec- area. For example, two depth-retrieval methods can share the
ture on a Xilinx Spartan-6 FPGA platform (Fig. 21). The Sobel module, the input buffer, the depth map buffer and the
2D sequence is inputted from the personal computer (PC) DIBR module.

VOLUME 5, 2017 26611

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

TABLE 5. Implementation of the proposed 2D-to-3D architecture and a 32KB L1 data cache, a 32KB L1 instruction cache and a
comparison with other implementations.
256KB L2 cache. The four cores share an 8MB L3 unified
cache. In [18], the system is implemented on a notebook
computer. CPU of the notebook is a 1.60 GHz quad-core
CPU with 6M cache, featuring simultaneous multithreading.
The notebook also has a 1.375 GHz GPU with 7 stream
processors inside. Each stream processor consists of 16 cores.
As an FPGA implementation [14], DIBR is implemented on
a Xilinx Virtex-5 FPGA platform. We find that the proposed
design can achieve higher throughput than the other three

In this paper, we propose a reconfigurable VLSI architecture
for real-time 2D-to-3D conversion. The proposed architecture
can support two depth-retrieval methods and different resolu-
tions. We implement a novel data reuse method to reduce the
memory traffic for real-time conversion. Experiment results
show that our design achieves state-of-the-art performance,
which is higher than the performance of implementations on
various platforms (FPGA, ASIC, CPU and GPU).

The authors would like to thank all the reviewers for
their valuable comments. The authors also want to thank
Xiudong Li for his contribution to this work.

[1] Royal Philips Electronics. (2008). Whitepaper WOWvx BlueBox. [Online].
Available: http://www.business-sites.philips.com/global/en/gmm/images/
[2] C. Wu, G. Er, X. Xie, T. Li, X. Cao, and Q. Dai, ‘‘A novel method for
semi-automatic 2D to 3D video conversion,’’ in Proc. 3DTV Conf. True
Vis. Capture, Transmiss. Display 3D Video, May 2008, pp. 65–68.
FIGURE 23. Layout of the 2D-to-3D chip. [3] H.-M. Wang, C.-H. Huang, and J.-F. Yang, ‘‘Depth maps interpolation from
existing pairs of keyframes and depth maps for 3D video generation,’’ in
Proc. IEEE Int. Symp. Circuits Syst., May/Jun. 2010, pp. 3248–3251.
TABLE 6. Performance comparison with FPGA, CPU and GPU. [4] H.-M. Wang, C.-H. Huang, and J.-F. Yang, ‘‘Block-based depth maps
interpolation for efficient multiview content generation,’’ IEEE Trans.
Circuits Syst. Video Technol., vol. 21, no. 12, pp. 1847–1858, Dec. 2011.
[5] I. A. Ideses, L. P. Yaroslavsky, B. Fishbain, and R. Vistuch, ‘‘3D from
compressed 2D video,’’ Proc. SPIE, vol. 6490, pp. 64901C-1–64901C-8,
Mar. 2007.
[6] P. Favaro and S. Soatto, ‘‘A geometric approach to shape from defocus,’’
IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 406–417,
Mar. 2005.
[7] W. J. Tam, C. Vázquez, and F. Speranza, ‘‘Three-dimensional TV: A novel
method for generating surrogate depth maps using colour information,’’
Proc. SPIE, vol. 7237, pp. 72371A-1–72371A-9, Feb. 2009.
[8] J. Zhang, Y. Yang, and Q. Dai, ‘‘A novel 2D-to-3D scheme by visual
attention and occlusion analysis,’’ in Proc. 3DTV Conf., True Vis. Capture
Transmiss. Display 3D Video, May 2011, pp. 1–4.
[9] H. Dong et al., ‘‘An automatic depth map generation for 2D-to-3D conver-
sion,’’ in Proc. 18th IEEE Int. Symp. Consum. Electron. (ISCE), Jun. 2014,
pp. 1–2.
[10] T.-C. Chen et al., ‘‘Brain-inspired framework for fusion of multiple
depth cues,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 7,
2D-to-3D algorithm was also implemented on other plat-
pp. 1137–1149, Jul. 2013.
forms, such as CPU [19], CPU+GPU [18], and FPGA [14]. [11] Z. Zhang, S. Yin, L. Liu, and S. Wei, ‘‘Real-time time-consistent
Table 6 gives the performance comparison between different 2D-to-3D video conversion based on color histogram,’’ in Proc. IEEE Int.
platforms. In [19], the 2D-to-3D application is implemented Conf. Consum. Electron. (ICCE), Jan. 2015, pp. 179–180.
[12] H. Dong, S. Yin, G. Jiang, L. Liu, and S. Wei, ‘‘An Automatic Depth Map
on a 4-core laptop system, with one Intel Core i7-2820QM Generation Method by Image Classification,’’ in Proc. IEEE Int. Conf.
processor running at 2.3 GHz. Each core is equipped with Consum. Electron. (ICCE), Jan. 2015, pp. 179–180.

26612 VOLUME 5, 2017

W. Xu et al.: Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion

[13] W. A. Wulf and S. A. McKee, ‘‘Hitting the memory wall: Implications ZHEN ZHANG received the B.S. degree in micro-
of the obvious,’’ ACM SIGARCH Comput. Archit. News, vol. 23, no. 1, electronics from Tsinghua University, China,
pp. 20–24, Mar. 1995. in 2010, where he is currently pursuing the Ph.D.
[14] Y.-K. Lai, Y.-C. Chung, and Y.-F. Lai, ‘‘Hardware implementation for real- degree. His current research interests include VLSI
time 3D rendering in 2D-to-3D conversion,’’ in Proc. IEEE Int. Symp. SoC design and high-level synthesis of CGRA.
Circuits Syst. (ISCAS), May 2013, pp. 893–896.
[15] Y.-C. Fan, B.-L. Lin, and S.-Y. Chou, ‘‘Hardware structure of 2D to 3D
image conversion system for digital archives,’’ in Proc. IEEE 17th Int.
Symp. Consumer Electron. (ISCE), Jun. 2013, pp. 111–112.
[16] S.-F. Hsiao, J.-W. Cheng, W.-L. Wang, and G.-F. Yeh, ‘‘Low latency design
of Depth-Image-Based Rendering using hybrid warping and hole-filling,’’
in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2012, pp. 608–611.
[17] Y.-R. Horng, Y.-C. Tseng, and T.-S. Chang, ‘‘VLSI architecture for real-
time HD1080p view synthesis engine,’’ IEEE Trans. Circuits Syst. Video
Technol., vol. 21, no. 9, pp. 1329–1340, Sep. 2011. HAO DONG received the B.S. degree from the
[18] S.-F. Tsai, C.-C. Cheng, C.-T. Li, and L.-G. Chen, ‘‘A real-time 1080p 2D- School of Instrumentation Science and Optoelec-
to-3D video conversion system,’’ IEEE Trans. Consum. Electron., vol. 57, tronics Engineering, Beihang University, Beijing,
no. 2, pp. 915–922, May 2011. China, in 2012. He is currently pursuing the
[19] J. Feng, Y. Chen, E. Li, Y. Du, and Y. Zhang, ‘‘Accelerating 2D-to-3D video M.S. degree with the Institute of Microelectronics,
conversion on multi-core systems,’’ in Proc. 11th Int. Conf. Appl. Parallel Tsinghua University, Beijing. His main research
Sci. Comput. (PARA), Jun. 2012, pp. 410–421.
interests include computer vision and VLSI SoC
[20] W. J. Tam, A. S. Yee, J. Ferreira, S. Tariq, and F. Speranza, ‘‘Stereoscopic
image rendering based on depth maps created from blur and edge informa-
tion,’’ Proc. SPIE, vol. 5664, pp. 104–115, Mar. 2005.
[21] W.-Y. Chen, Y.-L. Chang, S.-F. Lin, L.-F. Ding, and L.-G. Chen, ‘‘Effi-
cient depth image based rendering with edge dependent depth filter and
interpolation,’’ in Proc. IEEE Int. Conf. Multimedia Expo, Amsterdam,
The Netherlands, Jul. 2005, pp. 1314–1317.
[22] L. Zhang, C. Vázquez, G. Huchet, and W. J. Tam, ‘‘DIBR-based conversion RUI SHI received the B.S. degree from Tsinghua
from monoscopic to stereoscopic and multi-view video,’’ in 3D-TV System University, Beijing, China, in 2012, where he
With Depth-Image-Based Rendering. New York, NY, USA: Springer, 2013, is currently pursuing the M.S. degree with the
pp. 107–143. Institute of Microelectronics. His research inter-
[23] W.-Y. Chen, Y.-L. Chang, H.-K. Chiu, S.-Y. Chien, and L.-G. Chen, ‘‘Real- ests include mobile computing and computing
time depth image based rendering hardware accelerator for advanced
three dimensional television system,’’ in Proc. IEEE Int. Conf. Multimedia
Expo (ICME), Jul. 2006, pp. 2069–2072.

WEIZHI XU received the Ph.D. degree in

computer architecture from the Institute of Com-
puting Technology, Chinese Academy of Sci- LEIBO LIU received the B.S. degree in electronic
ences, China, in 2013. He was a Post-Doctoral engineering from Tsinghua University, Beijing,
Researcher with the Institute of Microelectron- China, in 1999, and the Ph.D. degree from the
ics, Tsinghua University. He is currently with Institute of Microelectronics, Tsinghua University,
Shandong Normal University as an Assistant in 2004. He is currently serves as an Associate
Professor. His research interests include high per- Professor with the Institute of Microelectronics,
formance computing and parallel processing. Tsinghua University. His research interests include
reconfigurable computing, mobile computing, and

SHOUYI YIN received the B.S., M.S., and Ph.D.

degrees in electronic engineering from Tsinghua
University, China, in 2000, 2002, and 2005, SHAOJUN WEI was born in Beijing, China, in
respectively. He was with the Imperial College 1958. He received the Ph.D. degree from the
London as a Research Associate. He is currently Faulté Polytechnique de Mons, Belgium, in 1991.
with the Institute of Microelectronics, Tsinghua He is currently a Professor with the Institute of
University, as an Associate Professor. He has Microelectronics, Tsinghua University. He is also
authored or co-authored over 40 refereed papers, a Senior Member of the Chinese Institute of Elec-
and served as TPC member or reviewers for the tronics. His main research interests include VLSI
international key conferences and leading journals. SoC design, EDA methodology, and ASIC design.
His research interests include SoC design, reconfigurable computing, and
mobile computing.

VOLUME 5, 2017 26613