$$s_i^1 = b\,s_i^0 + d_{i-1}^1 + d_i^1 \tag{4}$$
$$d_i^2 = c\,d_i^1 + s_i^1 + s_{i+1}^1 \tag{5}$$
$$s_i^2 = d\,s_i^1 + d_{i-1}^2 + d_i^2 \tag{6}$$
$$s_i = k_1\,s_i^2 \tag{7}$$
$$d_i = k_2\,d_i^2 \tag{8}$$
where $x_{2i}$ and $x_{2i+1}$ are the even and odd input elements, respectively; $a = 1/\alpha$, $b = 1/(\alpha\beta)$, $c = 1/(\beta\gamma)$, $d = 1/(\gamma\delta)$, $k_1 = \alpha\beta\gamma\delta\zeta$ and $k_2 = \alpha\beta\gamma/\zeta$; and $\alpha$, $\beta$, $\gamma$, $\delta$ and $\zeta$ are the coefficients employed in the lifting scheme and listed in Table 1. $s_i^0$ and $d_i^0$ represent the even and odd parts of the input, respectively, $s_i^n$ and $d_i^n$ (n = 1, 2) represent the intermediate values obtained in the lifting process, and $s_i$ and $d_i$ represent the low-pass and high-pass parts of the output signal, respectively. $k_1$ and $k_2$ are also known as $k_L$ and $k_H$, respectively. In the flipping scheme, the final scaling by the factors $k_L$ and $k_H$ is applied in the last filtering step.
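TABLE 1
LIFTING SCHEME PARAMETERS
| Parameter | Approximate value |
|-----------|--------------------|
| $\alpha$ | -1.586134342059924 |
| $\beta$ | -0.052980118572961 |
| $\gamma$ | 0.882911075530934 |
| $\delta$ | 0.443506852043971 |
| $\zeta$ | 1.230174104914001 |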
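The flipped steps (3)-(8) can be checked numerically against conventional lifting. The following is a minimal sketch rather than the authors' implementation: the function names, the clamped-index boundary rule in `_at`, and the conventional scaling ($s_i = \zeta s_i^2$, $d_i = d_i^2/\zeta$, as implied by $k_1$ and $k_2$ above) are assumptions made for illustration.

```python
# Lifting coefficients from Table 1.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
ZETA = 1.230174104914001

# Flipping constants as defined in the text.
A = 1 / ALPHA
B = 1 / (ALPHA * BETA)
C = 1 / (BETA * GAMMA)
D = 1 / (GAMMA * DELTA)
K1 = ALPHA * BETA * GAMMA * DELTA * ZETA  # k_L
K2 = ALPHA * BETA * GAMMA / ZETA          # k_H


def _at(seq, i):
    """Read seq[i] with the index clamped to the valid range (assumed boundary rule)."""
    return seq[min(max(i, 0), len(seq) - 1)]


def flipping_97(x):
    """Forward 9/7 transform of an even-length list using the flipped steps (3)-(8)."""
    s0, d0 = x[0::2], x[1::2]
    n = len(s0)
    d1 = [A * d0[i] + s0[i] + _at(s0, i + 1) for i in range(n)]
    s1 = [B * s0[i] + _at(d1, i - 1) + d1[i] for i in range(n)]
    d2 = [C * d1[i] + s1[i] + _at(s1, i + 1) for i in range(n)]
    s2 = [D * s1[i] + _at(d2, i - 1) + d2[i] for i in range(n)]
    return [K1 * v for v in s2], [K2 * v for v in d2]


def lifting_97(x):
    """Conventional lifting with the same boundary rule, used only for comparison."""
    s0, d0 = x[0::2], x[1::2]
    n = len(s0)
    d1 = [d0[i] + ALPHA * (s0[i] + _at(s0, i + 1)) for i in range(n)]
    s1 = [s0[i] + BETA * (_at(d1, i - 1) + d1[i]) for i in range(n)]
    d2 = [d1[i] + GAMMA * (s1[i] + _at(s1, i + 1)) for i in range(n)]
    s2 = [s1[i] + DELTA * (_at(d2, i - 1) + d2[i]) for i in range(n)]
    return [ZETA * v for v in s2], [v / ZETA for v in d2]
```

Both functions return the low-pass and high-pass halves as two lists. Up to floating-point rounding they should agree, because each flipped intermediate is the conventional one divided by a fixed product of lifting coefficients, and $k_1$ and $k_2$ restore that product at the end.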
In the flipping scheme, (3) is merged with (4) and (5) is merged with (6), resulting in the following outputs in stage 2 [4]:
$$s_i^1 = b\,s_i^0 + d_{i-1}^1 + d_i^1 = b\,s_i^0 + a\,d_{i-1}^0 + s_{i-1}^0 + s_i^0 + a\,d_i^0 + s_i^0 + s_{i+1}^0 \tag{9}$$
$$s_i^2 = d\,s_i^1 + d_{i-1}^2 + d_i^2 = d\,s_i^1 + c\,d_{i-1}^1 + s_{i-1}^1 + s_i^1 + c\,d_i^1 + s_i^1 + s_{i+1}^1 \tag{10}$$
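Equations (9) and (10) have the same shape and differ only in the pair of constants, (b, a) for (9) and (d, c) for (10), which is why one datapath can serve both. The sketch below illustrates this for interior indices; the function names and the argument order are hypothetical, and boundary handling is deliberately left out.

```python
def merged_stage(outer, inner, s, d, i):
    """Evaluate the shared form of (9) and (10) at an interior index i.

    outer, inner are (b, a) for equation (9) or (d, c) for equation (10);
    s and d are the even-branch and odd-branch sequences entering the stage.
    """
    left = inner * d[i - 1] + s[i - 1] + s[i]   # d_{i-1} of the next level
    right = inner * d[i] + s[i] + s[i + 1]      # d_i of the next level
    return outer * s[i] + left + right


def two_step_stage(outer, inner, s, d, i):
    """Same value computed as two separate steps, e.g. (3) followed by (4)."""
    d_next = [inner * d[j] + s[j] + s[j + 1] for j in range(len(d) - 1)]
    return outer * s[i] + d_next[i - 1] + d_next[i]
```

For equal-length `s` and `d` of length n, both functions are valid for 1 <= i <= n - 2 and return the same value.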
The data path in Fig. 1 was introduced by Hao et al. [6] to implement (9) and (10). It employs one multiplier and two adders to produce one output per clock. It has a shorter critical path than the traditional lifting scheme for the WT: the critical path in the flipping scheme is one multiplier (Tm), whereas the critical path of the lifting scheme includes one multiplier (Tm) and two adders (2Ta).
Fig. 1. Datapath for realization of the first stage of the flipping scheme [6]
Hao et al. [6] introduced the hardware in Fig. 2 to implement the 1D WT according to the flipping scheme. We refer to the architecture in Fig. 2 as the basic unit hereafter. This architecture is the basic building block for 1D and 2D flipping-based WT hardware.
We propose the architecture in Fig. 3 for hardware realization of the 3D WT, which can be used in video coding or 3D image coding. The proposed architecture employs three basic units for the spatial and temporal transforms. In video coding, the spatial transforms are applied in the horizontal and vertical directions of each frame, whereas the temporal transform is applied across the sequence of transformed frames. In 3D image coding, on the other hand, all three transforms are spatial: the first two are applied in the horizontal and vertical directions of each slice of the 3D image, and the third is applied across the transformed slices. Besides the basic units, the proposed architecture has two transpose modules and one scaling unit (SUN). The LLL subband of the resulting WT coefficients can be stored in an external memory module for the next levels of the 3D WT. The proposed architecture can also be used for the 1D and 2D WT.
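The order of the three stages (rows, then columns, then across frames or slices) can be illustrated in software. This is only a behavioural sketch: `haar_1d` is a simple average/difference stand-in chosen for brevity, not the flipping-based 9/7 basic unit, and the `video[frame][row][col]` layout and the function names are assumptions.

```python
def haar_1d(x):
    """Stand-in 1D transform of an even-length list: low-pass half, then high-pass half."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low + high


def wt_3d(video, wt_1d=haar_1d):
    """One level of a separable 3D transform on video[frame][row][col]."""
    frames, rows, cols = len(video), len(video[0]), len(video[0][0])
    # Stage 1: horizontal transform of every row of every frame.
    out = [[wt_1d(row) for row in frame] for frame in video]
    # Stage 2: vertical transform of every column of every frame.
    for f in range(frames):
        for c in range(cols):
            col = wt_1d([out[f][r][c] for r in range(rows)])
            for r in range(rows):
                out[f][r][c] = col[r]
    # Stage 3: temporal transform across frames at each pixel position.
    for r in range(rows):
        for c in range(cols):
            line = wt_1d([out[f][r][c] for f in range(frames)])
            for f in range(frames):
                out[f][r][c] = line[f]
    return out
```

After one level, the LLL subband occupies the first half of each axis of the result. Any other 1D transform with the same list-in, list-out interface can be passed as `wt_1d`.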
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 5, MAY 2012, ISSN 2151-9617
https://sites.google.com/site/journalofcomputing
WWW.JOURNALOFCOMPUTING.ORG 179
Fig. 2. Proposed architecture in [6] to implement 1D WT
Fig. 3. Proposed architecture to implement 3D WT
In the proposed architecture, the first one-dimensional WT is applied to the rows of the image and generates the low (L) and high (H) frequency coefficients. The Transpose 1 module reorders these L and H coefficients into the order required by the next basic unit. Fig. 4 shows Transpose module 1 [6]. The select signals of the multiplexer in Fig. 4 are driven by the main control unit so that the outputs are produced in the desired order. The architecture in [6] requires only one transpose module because it realizes the 2D WT, and the coefficients generated by its second basic WT unit are the final outputs. Our architecture requires a second transpose unit to reorder the coefficients of the second WT stage for the third WT stage.
Fig. 4. Transpose module 1 [6]
The second basic unit applies the WT to the columns of the image and generates the LL, LH, HL and HH coefficients of four consecutive 2D WTs. As in the previous stage, the coefficients are reordered, here by Transpose module 2, into the order required by the third stage. The proposed architecture for Transpose module 2 is depicted in Fig. 5. Transpose module 2 reorders four consecutive groups of LL, LH, HL and HH coefficients into four LL, four LH, four HL and four HH coefficients. Hence, Transpose module 2 requires 16 serially connected registers and one 16-to-1 multiplexer.
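The reordering performed by Transpose module 2 amounts to a 4 x 4 transposition of a block of 16 coefficients. The sketch below is a behavioural model only: the function name and the multiplexer select sequence are assumptions, since the control sequence of Fig. 5 is not given in the text, and real hardware would stream the data through shift registers rather than index a list.

```python
def transpose_module_2(block):
    """Reorder 16 coefficients from 4 x (LL, LH, HL, HH) to 4 LL, 4 LH, 4 HL, 4 HH."""
    if len(block) != 16:
        raise ValueError("expected 16 coefficients")
    registers = list(block)  # models the 16 serially loaded registers
    # Assumed select sequence for the 16-to-1 multiplexer: output slot j reads
    # register 4 * (j % 4) + j // 4, which is a 4 x 4 transposition.
    select = [4 * (j % 4) + j // 4 for j in range(16)]
    return [registers[k] for k in select]
```

Feeding the labels LL0, LH0, HL0, HH0, LL1, ... through the function groups all LL entries first, then all LH, HL and HH entries, each in their original temporal order.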
Fig. 5. Proposed architecture for Transpose module 2
The third basic unit applies the WT to the corresponding coefficients in consecutive images and produces the LLL, LLH, LHL, LHH, HLL, HLH, HHL and HHH coefficients. The resulting coefficients require scaling to generate the final outputs of the 9/7 WT; the scaling is performed in the scaling unit (SUN), which is shown in Fig. 6. Since our architecture implements the 1D, 2D and 3D WT, and the scaling coefficients of these WTs differ, a multiplexer in the SUN selects the appropriate scaling factor according to the generated coefficient and the type of WT. In Fig. 6, $k_H$ and $k_L$ are the scaling coefficients for the 1D WT; $k_{LL}$, $k_{HL}$, $k_{LH}$ and $k_{HH}$ are the scaling coefficients for the 2D WT; and $k_{LLL}$, $k_{LLH}$, $k_{LHL}$, $k_{LHH}$, $k_{HLL}$, $k_{HLH}$, $k_{HHL}$ and $k_{HHH}$ are the scaling coefficients for the 3D WT. The resulting low-frequency coefficients are stored in the external memory unit to be used in the next transform level. The multiplier in Fig. 6 multiplies the input coefficient by the scaling factor at the output of the multiplexer.
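The selection performed by the SUN can be modelled as a lookup followed by one multiplication. The text does not give the values of the 2D and 3D factors; the sketch below assumes that, because the transform is separable and scaling is deferred to the end, each factor is the product of the per-dimension factors $k_L$ and $k_H$ (for example $k_{LH} = k_L k_H$). The function names are hypothetical.

```python
from itertools import product

# Lifting coefficients from Table 1 and the 1D factors defined in the text.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
ZETA = 1.230174104914001
K_L = ALPHA * BETA * GAMMA * DELTA * ZETA
K_H = ALPHA * BETA * GAMMA / ZETA


def scaling_table(dims):
    """Map each subband label (e.g. 'LLH') to its assumed scaling factor."""
    per_axis = {"L": K_L, "H": K_H}
    table = {}
    for label in product("LH", repeat=dims):
        factor = 1.0
        for band in label:
            factor *= per_axis[band]  # assumption: product of per-axis factors
        table["".join(label)] = factor
    return table


def scale(coefficient, subband):
    """Behavioural model of the SUN: the mux picks a factor, one multiplier applies it."""
    return coefficient * scaling_table(len(subband))[subband]
```

Here the length of the subband label (1, 2 or 3 letters) plays the role of the WT type input of the multiplexer.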
Fig. 6. Proposed SUN module
The SUN provides the final WT coefficients. The output of the SUN is fed to an external memory unit that stores the LLL subband coefficients, which are fed back through the input multiplexer to the WT architecture in order to compute the next levels of the WT.
The proposed architecture requires 7 multipliers, 12 adders and 15(4+2N+MN) storage elements for the 3D WT of an M×N×F image sequence. It is a serial architecture which produces one WT coefficient per clock. Its critical path consists of only one multiplier, so its longest delay path is Tm. In the following section we compare the resources and timing specifications of our architecture with a number of architectures proposed in the literature for the 3D WT.
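TABLE 2
COMPARING THE ARCHITECTURES FOR 3D WT
| Type of WT | Type of structure | # of outputs per clock | Critical path | Buffer memory for M×N image | # of adders | # of multipliers | Architecture |
|------------|-------------------|------------------------|---------------|-----------------------------|-------------|------------------|--------------|
| Lifting | parallel | 4 | Tm+2Ta | 4(N+2)M | 72 | 96 | Dai [8] |
| Lifting | parallel | 4 | Tm+2Ta | 3.5N²+4N | 40 | 48 | Das [9] |
| Lifting | parallel | 8 | Tm+2Ta | 5.5N²+8N | 96 | 56 | Xiong [7] |
| Flipping | serial | 1 | Tm | 15(4+2N+MN) | 12 | 7 | Proposed |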
3 SIMULATION RESULTS
The proposed architecture was implemented in VHDL and applied to an image sequence of six 352×288 images. The hardware simulation results were verified against the results of a C-language software implementation of the flipping-based 3D WT. The specifications of the proposed architecture and three previously proposed architectures for the 3D WT are listed in Table 2.
The simulation results indicate that the critical path of the proposed architecture is equal to one multiplier delay (Tm), while the critical path of the other architectures includes two adder delays (2Ta) in addition to one multiplier delay (Tm). Moreover, the proposed architecture requires 7 multipliers and 12 adders, which are respectively about 1/7 and 2/7 of those required by the architecture with the next lowest computational resources [9] in Table 2. Even though the proposed architecture is a serial architecture which outputs one coefficient per clock and the architecture in [9] is a parallel architecture which outputs four samples per clock, the reduction in the computational resources of the proposed architecture compared to [9] is about twice the ratio of their per-clock output rates. In addition, the proposed architecture can also generate 1D and 2D WT coefficients.
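The resource ratios quoted above follow directly from the rows of Table 2 for the proposed architecture and for [9]. A small sketch of the arithmetic, with dictionary and function names chosen only for illustration:

```python
# Figures copied from Table 2.
PROPOSED = {"adders": 12, "multipliers": 7, "outputs_per_clock": 1}
DAS = {"adders": 40, "multipliers": 48, "outputs_per_clock": 4}


def ratio(key):
    """Proposed-to-[9] ratio for one column of Table 2."""
    return PROPOSED[key] / DAS[key]


if __name__ == "__main__":
    for key in ("multipliers", "adders", "outputs_per_clock"):
        print(f"{key}: {ratio(key):.3f}")
```

The multiplier ratio 7/48 is close to 1/7 and the adder ratio 12/40 is close to 2/7, while the output-rate ratio is 1/4.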
4 CONCLUSION
In this paper we introduced a serial architecture for the flipping-based 3D WT. The proposed architecture requires about 1/7 of the multipliers and 2/7 of the adders required by the compared architecture with the fewest processing elements. Although the proposed serial architecture outputs fewer coefficients per clock than the parallel architectures (one versus four or eight in Table 2), its computational resources are at most about 2/7 of theirs. Furthermore, the proposed architecture reduces the critical path from 2Ta+Tm in the other architectures to a single Tm. The proposed architecture can also realize the 1D and 2D WT in addition to the 3D WT. Therefore, it can be used in real-time 1D, 2D and 3D WT applications which require minimum computational resources and a short critical path.
REFERENCES
[1] Chung-Jr Lian, Kuan-Fu Chen, Hong-Hui Chen and Liang-Gee Chen, "Lifting based discrete wavelet transform architecture for JPEG2000," The 2001 IEEE International Symposium on Circuits and Systems (ISCAS 2001), Vol. 2, pp. 445-448, 2001.
[2] G. Dillen, B. Georis, J.D. Legat and O. Cantineau, "Combined line-based architecture for the 5-3 and 9-7 wavelet transform of JPEG2000," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 9, pp. 944-950, Sept. 2003.
[3] Bing-Fei Wu and Chung-Fu Lin, "A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 12, pp. 1615-1628, Dec. 2005.
[4] C. T. Huang, P. C. Tseng and L. G. Chen, "Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Transactions on Signal Processing, Vol. 52, No. 4, April 2004.
[5] Y. Hao, Y. Liu, R. Wang, "High performance hardware implementation architecture for DWT of lifting scheme," International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2008.
[6] Y. Hao, Y. Liu, R. Wang, "Efficient parallel hardware architecture for lifting-based discrete wavelet transform," Chinese Control and Decision Conference (CCDC 2008), 2008.
[7] C. Xiong, J. Hao, J. Tian, J. Liu, "Efficient array architecture for multi-dimensional lifting-based discrete wavelet transform," IEEE Trans. Signal Processing (October 2006) 1089-1099.
[8] Q. Dai, X. Chen, C. Lin, "A novel VLSI architecture for multidimensional discrete wavelet transform," IEEE Trans. Circuits Systems Video Technol. (August 2004) 1105-1110.
[9] B. Das, S. Banerjee, "Data-folded architecture for running 3D DWT using 4-tap Daubechies filters," IEE Proc. Circuits Devices Systems (February 2005) 17-24.
[10] G. Karlsson and M. Vetterli, "Three-dimensional subband coding of video," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 1100-1103, April 1988.
[11] B. J. Kim and W. Pearlman, "An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees (SPIHT)," Proc. DCC'97, IEEE Data Compression Conference, pp. 251-260, Mar. 1997.
[12] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, pp. 243-250, June 1996.
[13] A. Secker and D. Taubman, "Highly scalable video compression using a lifting-based 3D wavelet transform with deformable mesh motion compensation," Proceedings of International Conference on Image Processing, Vol. 3, pp. 749-752, 24-28 June 2002.
[14] A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting," Proc. IEEE Int. Conf. Image Proc., pp. 1029-1032, Oct. 2001.
Zahra Zaremojtahedi received the B.Sc. degree in computer engineering from the computer engineering department of the Central Tehran branch of Iran Islamic Azad University. She is currently an M.Sc. student in the computer engineering department of the Science and Research branch of Iran Islamic Azad University, Tehran, Iran. Her research interests include hardware design for image and video coding applications.
Farzad Zargari (M'07-SM'11) received his B.Sc. degree in Electrical Engineering from Sharif University of Technology and his M.Sc. and Ph.D. degrees in Electrical Engineering from University of Tehran, all in Tehran, Iran.
He is currently a research associate at the information technology department of the Research Institute for ICT, formerly known as Iran Telecom Research Center (ITRC), Ministry of Telecommunications and Information Technology of Iran. He is also a member of the teaching academic staff in the computer engineering department of the Science and Research branch of Islamic Azad University. His research interests include multimedia systems, image and video signal processing algorithms, and hardware implementation of image and video coding standards.