A Very Low Computational Resource Flipping Based Architecture For 3D Wavelet Transform


Zahra ZareMojtahedi, Farzad Zargari
Abstract: The wavelet transform (WT) is among the most widely used transforms in image and video coding. The lifting
scheme was proposed to reduce the computational load of the WT. Despite its advantages, hardware realizations of the
lifting scheme suffer from high critical path latency. The flipping scheme was therefore proposed to reduce the critical
path latency in hardware realizations of the WT. Although architectures exist for hardware realization of the 1D and 2D
flipping-based WT, no architecture has been proposed for the 3D flipping-based WT. In this study we propose a serial
architecture for hardware implementation of the 3D flipping-scheme WT. The proposed architecture has a lower critical
path latency and employs far fewer computational resources than the other architectures proposed for the 3D WT. Moreover,
the proposed architecture supports the 1D and 2D WTs in addition to the 3D WT.
Index Terms: 3D Wavelet Transform, Computational resources, Critical path, Flipping Scheme, Image compression, Lifting
Scheme

1 INTRODUCTION
The 3D wavelet transform is used in various applications
such as video coding and 3D image coding. JPEG2000 is one
of the recent international image and video coding
standards and is based on a global transform using the WT.
The lifting scheme is used in the JPEG2000 standard for
the WT. Even though the lifting scheme reduces the
computational load compared to previous methods for the
WT, the WT still takes a relatively large portion of the
coding time in the JPEG2000 coding process. As a result,
hardware realization of the WT is considered a solution
for reducing the coding time of JPEG2000 in real-time
applications [1]-[3]. Hardware realizations of the lifting
scheme suffer from a long critical path, and the flipping
scheme was introduced to resolve this problem [4]. Even
though hardware solutions have been proposed for the 1D
and 2D flipping WT [5], [6] and for the 3D lifting WT
[7]-[9], to the best of our knowledge no architecture has
been proposed for the 3D flipping WT.
The 3D WT has various applications. One of them is JP3D,
an extension of the JPEG2000 standard for coding
three-dimensional data. It was proposed for coding the
volumetric image data sets produced by various
three-dimensional image acquisition techniques such as
computed tomography (CT), positron emission tomography
(PET) and magnetic resonance imaging (MRI).

Another application of the 3D WT is video coding.
Karlsson et al. proposed the use of the separable 3D WT
for video compression [10]. The separable 3D WT is also
employed in [11], which extends the well-known SPIHT
[12] coding algorithm to the temporal dimension. This
coder has the advantage of producing a fully embedded
rate-scalable bit-stream, with low complexity relative to
motion-compensated coders [13]. A. Secker and D. Taubman
proposed a new approach that uses the lifting realization
of the temporal DWT with motion compensation applied
along the lifting steps [14].
In this paper we propose a serial architecture for
hardware realization of the 3D WT based on the flipping
scheme, which reduces the critical path latency compared
to its counterpart architectures for the 3D lifting-based
WT. Other advantages of the proposed architecture over the
previous architectures in the literature are that it also
produces the 1D and 2D WTs and that it requires very few
computational resources.
The rest of the paper is organized as follows. In section
2, we introduce our proposed architecture followed by
simulation results in section 3. The concluding remarks
are given in section 4.
2 THE PROPOSED ARCHITECTURE
The lifting scheme for the WT using the 9/7 Daubechies
filter is performed as follows:
$$ s_i^0 = x_{2i} \tag{1} $$
$$ d_i^0 = x_{2i+1} \tag{2} $$
$$ d_i^1 = a\,d_i^0 + s_i^0 + s_{i+1}^0 \tag{3} $$

- Zahra ZareMojtahedi is with the computer engineering department of
Science and Research branch of Iran Islamic Azad University, Tehran,
Iran.
- Farzad Zargari is with the department of information technology of
research institute for ICT, formerly known as Iran Telecom Research
Center (ITRC), Tehran, Iran.
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 5, MAY 2012, ISSN 2151-9617
https://sites.google.com/site/journalofcomputing
WWW.JOURNALOFCOMPUTING.ORG 178

$$ s_i^1 = b\,s_i^0 + d_{i-1}^1 + d_i^1 \tag{4} $$
$$ d_i^2 = c\,d_i^1 + s_i^1 + s_{i+1}^1 \tag{5} $$
$$ s_i^2 = d\,s_i^1 + d_{i-1}^2 + d_i^2 \tag{6} $$
$$ s_i = k_1\,s_i^2 \tag{7} $$
$$ d_i = k_2\,d_i^2 \tag{8} $$

where $x_{2i}$ and $x_{2i+1}$ are the even and odd input elements,
respectively; $a = 1/\alpha$, $b = 1/(\alpha\beta)$, $c = 1/(\beta\gamma)$,
$d = 1/(\gamma\delta)$, $k_1 = \alpha\beta\gamma\delta\zeta$ and
$k_2 = \alpha\beta\gamma/\zeta$; and $\alpha$, $\beta$, $\gamma$, $\delta$
and $\zeta$ are the coefficients employed in the lifting scheme, listed in
Table 1. $s_i^0$ and $d_i^0$ represent the even and odd parts of the input,
respectively; $s_i^n$ and $d_i^n$ (n = 1, 2) represent the intermediate
values obtained in the lifting process; and $s_i$ and $d_i$ represent the
low-pass and high-pass parts of the output signal, respectively. $k_1$ and
$k_2$ are also known as $k_L$ and $k_H$, respectively. In the flipping
scheme, scaling by the factors $k_L$ and $k_H$ forms a final stage that is
applied in the last filtering step.
TABLE 1
LIFTING SCHEME PARAMETERS

Parameter    Approximate Value
$\alpha$     -1.586134342059924
$\beta$      -0.052980118572961
$\gamma$      0.882911075530934
$\delta$      0.443506852043971
$\zeta$       1.230174104914001
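
As a rough illustration of equations (1)-(8), the following Python sketch computes the flipped multipliers from the Table 1 values and runs one 1D transform. The function names `flipping_97` and `lifting_97`, the periodic boundary extension, and the comparison against the conventional lifting form (low-pass scaled by $\zeta$, high-pass by $1/\zeta$, as implied by the definitions of $k_1$ and $k_2$) are assumptions of this sketch and are not taken from the paper.

```python
# Lifting coefficients from Table 1.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
ZETA = 1.230174104914001

# Flipped multipliers and final scaling factors as defined in the text.
A = 1 / ALPHA
B = 1 / (ALPHA * BETA)
C = 1 / (BETA * GAMMA)
D = 1 / (GAMMA * DELTA)
K1 = ALPHA * BETA * GAMMA * DELTA * ZETA   # k_L
K2 = ALPHA * BETA * GAMMA / ZETA           # k_H


def flipping_97(x):
    """Forward 1D transform via equations (1)-(8); len(x) must be even.

    Periodic boundary extension is an assumption of this sketch.
    """
    n = len(x) // 2
    s0 = [x[2 * i] for i in range(n)]                              # (1)
    d0 = [x[2 * i + 1] for i in range(n)]                          # (2)
    d1 = [A * d0[i] + s0[i] + s0[(i + 1) % n] for i in range(n)]   # (3)
    s1 = [B * s0[i] + d1[i - 1] + d1[i] for i in range(n)]         # (4)
    d2 = [C * d1[i] + s1[i] + s1[(i + 1) % n] for i in range(n)]   # (5)
    s2 = [D * s1[i] + d2[i - 1] + d2[i] for i in range(n)]         # (6)
    low = [K1 * v for v in s2]                                     # (7)
    high = [K2 * v for v in d2]                                    # (8)
    return low, high


def lifting_97(x):
    """Conventional lifting form with the same periodic extension, for comparison."""
    n = len(x) // 2
    s = [x[2 * i] for i in range(n)]
    d = [x[2 * i + 1] for i in range(n)]
    d = [d[i] + ALPHA * (s[i] + s[(i + 1) % n]) for i in range(n)]
    s = [s[i] + BETA * (d[i - 1] + d[i]) for i in range(n)]
    d = [d[i] + GAMMA * (s[i] + s[(i + 1) % n]) for i in range(n)]
    s = [s[i] + DELTA * (d[i - 1] + d[i]) for i in range(n)]
    return [ZETA * v for v in s], [v / ZETA for v in d]
```

Under these assumptions both functions should return the same coefficients up to rounding, which is a convenient sanity check on the reconstructed values of a, b, c, d, k_1 and k_2.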

In the flipping scheme, (3) is merged with (4) on one side
and (5) is merged with (6) on the other, resulting in the
following outputs in stage 2 [4]:

$$ s_i^1 = b\,s_i^0 + d_{i-1}^1 + d_i^1 = b\,s_i^0 + a\,d_{i-1}^0 + s_{i-1}^0 + s_i^0 + a\,d_i^0 + s_i^0 + s_{i+1}^0 \tag{9} $$

$$ s_i^2 = d\,s_i^1 + d_{i-1}^2 + d_i^2 = d\,s_i^1 + c\,d_{i-1}^1 + s_{i-1}^1 + s_i^1 + c\,d_i^1 + s_i^1 + s_{i+1}^1 \tag{10} $$
The following data path is introduced by Hao et al. [6] to
implement (9) and (10).
This data path employs one multiplier and two adders
to produce one output per clock. It has the advantage of
a shorter critical path than the traditional lifting
scheme for the WT: the critical path in the flipping
scheme is one multiplier (Tm), in contrast to the critical
path of the lifting scheme, which includes one multiplier
(Tm) and two adders (Ta).

Fig. 1. Datapath for realization of the first stage of the flipping scheme [6]
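
The merging of (3) and (4) into (9) can be checked numerically. The sketch below evaluates the stage once as two separate steps and once from the expanded right-hand side of (9). The function names and the periodic boundary handling are assumptions made for illustration; the sketch models only the arithmetic, not the timing of the data path in Fig. 1.

```python
ALPHA = -1.586134342059924
BETA = -0.052980118572961
A = 1 / ALPHA            # a = 1/alpha
B = 1 / (ALPHA * BETA)   # b = 1/(alpha*beta)


def stage_two_steps(s0, d0):
    """Evaluate (3) and then (4) as two separate steps."""
    n = len(s0)
    d1 = [A * d0[i] + s0[i] + s0[(i + 1) % n] for i in range(n)]
    return [B * s0[i] + d1[i - 1] + d1[i] for i in range(n)]


def stage_merged(s0, d0):
    """Evaluate the expanded right-hand side of (9) directly from the inputs."""
    n = len(s0)
    return [
        B * s0[i]
        + A * d0[i - 1] + s0[i - 1] + s0[i]      # expansion of d1[i-1]
        + A * d0[i] + s0[i] + s0[(i + 1) % n]    # expansion of d1[i]
        for i in range(n)
    ]
```

Each output of the merged form needs only multiplications of raw inputs followed by additions, which is consistent with the multiplier-only critical path stated above once the additions are pipelined.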

Hao et al. [6] introduced the hardware in Fig. 2 to
implement the 1D WT according to the flipping scheme. We
refer to the architecture in Fig. 2 as the basic unit
hereafter. This architecture is the basic building block
for 1D and 2D flipping-based WT hardware.
We propose the architecture in Fig. 3 for hardware
realization of the 3D WT, which can be used in video
coding or 3D image coding. The proposed architecture
employs three basic units for the spatial and temporal
transforms. In video coding, the spatial transforms are
applied in the horizontal and vertical directions of each
frame, whereas the temporal transform is applied to the
sequence of transformed frames. In 3D image coding, on the
other hand, all the transforms are spatial: the first two
are applied in the horizontal and vertical directions of
each slice of the 3D image, and the third is applied
across the transformed slices. Besides the basic units,
the proposed architecture has two transpose modules and
one scaling unit (SUN). The resulting WT coefficients can
be saved in an external memory module, which stores the
LLL subband for the next levels of the 3D WT. The proposed
architecture can also be used for the 1D and 2D WT.


Fig. 2. Proposed architecture in [6] to implement 1D WT

Fig. 3. Proposed architecture to implement 3D WT
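
The order of the three passes can be sketched in software as a separable transform over a small volume. This is only a functional model under stated assumptions: the names `wt_1d`, `transform_axis` and `wt_3d`, the nested-list layout `vol[frame][row][col]`, the periodic boundary extension, and the choice to apply the scaling inside each pass (rather than once at the end, as the SUN does in hardware) are all illustrative and not taken from the paper.

```python
ALPHA, BETA, GAMMA, DELTA, ZETA = (
    -1.586134342059924, -0.052980118572961, 0.882911075530934,
    0.443506852043971, 1.230174104914001,
)
A, B = 1 / ALPHA, 1 / (ALPHA * BETA)
C, D = 1 / (BETA * GAMMA), 1 / (GAMMA * DELTA)
K_L = ALPHA * BETA * GAMMA * DELTA * ZETA
K_H = ALPHA * BETA * GAMMA / ZETA


def wt_1d(x):
    """Flipped 9/7 transform of an even-length list; returns low half, then high half."""
    n = len(x) // 2
    s0, d0 = x[0::2], x[1::2]
    d1 = [A * d0[i] + s0[i] + s0[(i + 1) % n] for i in range(n)]
    s1 = [B * s0[i] + d1[i - 1] + d1[i] for i in range(n)]
    d2 = [C * d1[i] + s1[i] + s1[(i + 1) % n] for i in range(n)]
    s2 = [D * s1[i] + d2[i - 1] + d2[i] for i in range(n)]
    return [K_L * v for v in s2] + [K_H * v for v in d2]


def transform_axis(vol, axis):
    """Apply wt_1d along one axis of vol[frame][row][col]; returns a new volume."""
    nf, nr, nc = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * nc for _ in range(nr)] for _ in range(nf)]
    if axis == "row":        # horizontal pass over each row
        lines = [[(f, r, c) for c in range(nc)] for f in range(nf) for r in range(nr)]
    elif axis == "column":   # vertical pass over each column
        lines = [[(f, r, c) for r in range(nr)] for f in range(nf) for c in range(nc)]
    else:                    # temporal (or inter-slice) pass
        lines = [[(f, r, c) for f in range(nf)] for r in range(nr) for c in range(nc)]
    for idx in lines:
        coeffs = wt_1d([vol[f][r][c] for f, r, c in idx])
        for (f, r, c), v in zip(idx, coeffs):
            out[f][r][c] = v
    return out


def wt_3d(vol):
    """One level of the separable 3D WT: rows, then columns, then frames."""
    return transform_axis(transform_axis(transform_axis(vol, "row"), "column"), "frame")
```

After one level, the LLL subband occupies the first half of each axis in this layout; feeding that sub-volume back into `wt_3d` corresponds to the next transform level described above.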

In the proposed architecture the first one-dimensional WT
is applied to the rows of the image and generates the low
(L) and high (H) frequency coefficients. The Transpose 1
module reorders these L and H coefficients into a suitable
order for the next basic unit. Fig. 4 shows Transpose
module 1 [6]. The control signals of the multiplexer in
Fig. 4 are driven by the main control unit to produce the
outputs in the desired order. The architecture proposed in
[6] requires only one transpose module, because it
realizes the 2D WT and the coefficients generated by its
second basic unit are the final outputs of the
architecture. Our architecture requires another transpose
unit to reorder the coefficients of the second WT stage
for the third WT stage.

Fig. 4. Transpose module 1 [6]
The second basic unit applies the WT to the columns of the
image and generates the LL, LH, HL and HH coefficients of
four consecutive 2D WTs. As in the previous stage, the
coefficients are reordered, here by Transpose module 2,
into an appropriate order for the third stage. The
proposed architecture for Transpose module 2 is depicted
in Fig. 5. Transpose module 2 reorders four consecutive
groups of LL, LH, HL and HH coefficients into four LL,
four LH, four HL and four HH coefficients. Hence,
Transpose module 2 requires 16 serially connected
registers and one 16-to-1 multiplexer.

Fig. 5. Proposed architecture for Transpose module 2
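
The reordering performed by Transpose module 2 can be modelled as a fixed permutation of each block of 16 coefficients. In the sketch below, the function name `transpose2_reorder` and the assumed arrival order within each group (LL, LH, HL, HH) are illustrative; the actual register shifting and multiplexer select sequence of Fig. 5 are not modelled.

```python
def transpose2_reorder(stream):
    """Reorder each block of 16 coefficients from 4 x (LL, LH, HL, HH)
    to 4 x LL, 4 x LH, 4 x HL, 4 x HH."""
    if len(stream) % 16 != 0:
        raise ValueError("stream length must be a multiple of 16")
    out = []
    for start in range(0, len(stream), 16):
        regs = stream[start:start + 16]        # contents of the 16 registers
        for band in range(4):                  # 0 = LL, 1 = LH, 2 = HL, 3 = HH
            for group in range(4):             # the four consecutive 2D WTs
                out.append(regs[4 * group + band])   # multiplexer selection
    return out
```

For example, tagging each input with its subband and group index shows that the first four outputs are the LL coefficients of groups 0 to 3, which is the order the third basic unit needs for the temporal pass.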

The third basic unit applies the WT to the corresponding
coefficients in consecutive images and produces the LLL,
LLH, LHL, LHH, HLL, HLH, HHL and HHH coefficients. These
coefficients require scaling to generate the final outputs
in accordance with the 9/7 WT; the scaling is performed in
the Scaling Unit (SUN), which is shown in Fig. 6. Since
our architecture is proposed to implement the 1D, 2D and
3D WTs and the scaling coefficients for these WTs are
different, a multiplexer is used in the SUN to apply the
appropriate scaling according to the generated coefficient
and the type of WT. In Fig. 6, $k_H$ and $k_L$ are the
scaling coefficients for the 1D WT; $k_{LL}$, $k_{HL}$,
$k_{LH}$ and $k_{HH}$ are the scaling coefficients for the
2D WT; and $k_{LLL}$, $k_{LLH}$, $k_{LHL}$, $k_{LHH}$,
$k_{HLL}$, $k_{HLH}$, $k_{HHL}$ and $k_{HHH}$ are the
scaling coefficients for the 3D WT. The resulting
low-frequency coefficients are stored in the external
memory unit to be used in the next transform level. The
multiplier in Fig. 6 multiplies the input coefficient by
the scaling factor selected at the output of the
multiplexer.

Fig. 6. Proposed SUN module
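
The paper does not list the values of the 2D and 3D scaling coefficients. A natural reading, stated here as an assumption, is that each subband factor is the product of the per-dimension factors $k_L$ and $k_H$ (for example $k_{LLH} = k_L \cdot k_L \cdot k_H$), since the scaling of every pass is deferred to the SUN. The sketch below builds such a lookup table; the names `sun_table` and `sun_scale` are hypothetical.

```python
from itertools import product

ALPHA, BETA, GAMMA, DELTA, ZETA = (
    -1.586134342059924, -0.052980118572961, 0.882911075530934,
    0.443506852043971, 1.230174104914001,
)
K_L = ALPHA * BETA * GAMMA * DELTA * ZETA   # k_1 in the text
K_H = ALPHA * BETA * GAMMA / ZETA           # k_2 in the text


def sun_table(dims):
    """Scaling factor for every subband label of a dims-dimensional WT.

    Assumes each factor is the product of k_L / k_H over the dimensions.
    """
    table = {}
    for label in product("LH", repeat=dims):
        factor = 1.0
        for band in label:
            factor *= K_L if band == "L" else K_H
        table["".join(label)] = factor
    return table


def sun_scale(coeff, subband):
    """Model of the SUN: select the factor by subband label, then multiply."""
    return coeff * sun_table(len(subband))[subband]
```

With this model, `sun_table(1)`, `sun_table(2)` and `sun_table(3)` hold 2, 4 and 8 entries, matching the number of multiplexer inputs that the text lists for the 1D, 2D and 3D cases.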

The SUN provides the final WT coefficients. Its output is
fed to an external memory unit that stores the LLL subband
coefficients, which are fed back through the input
multiplexer to the WT architecture in order to compute the
next levels of the WT.
The proposed architecture requires 7 multipliers, 12
adders and 15(4+2N+MN) storage elements for the 3D WT of
an M×N×F image sequence. Our architecture is a serial
architecture that produces one WT coefficient per clock.
Its critical path consists of only one multiplier, and
hence its longest delay path is Tm. In the following
section we compare the resources and timing
specifications of our architecture with a number of
architectures proposed in the literature for the 3D WT.


TABLE 2
COMPARING THE ARCHITECTURES FOR 3D WT

Type of WT  Type of struct  # of outputs per clock  Critical path  Buffer memory for M×N image  # of adders  # of multipliers  Architecture
Lifting     parallel        4                       Tm+2Ta         4(N+2)M                      72           96                Dai [8]
Lifting     parallel        4                       Tm+2Ta         3.5N^2+4N                    40           48                Das [9]
Lifting     parallel        4                       Tm+2Ta         5.5N^2+8N                    96           56                Xiong [7]
Flipping    serial          1                       Tm             15(4+2N+MN)                  12           7                 Proposed

3 SIMULATION RESULTS
The proposed architecture was implemented in VHDL and
applied to an image sequence of six 352×288 images. The
hardware simulation results were verified against the
results of a C-language software implementation of the
flipping-based 3D WT. The specifications of the proposed
architecture and of three previously proposed
architectures for the 3D WT are listed in Table 2.
The simulation results indicate that the critical path of
the proposed architecture is equal to one multiplier delay
(Tm), while the critical path of the other architectures
adds two adder delays (2Ta) to one multiplier delay (Tm).
Moreover, the proposed architecture requires 7 multipliers
and 12 adders, which are respectively about 1/7 and 2/7 of
those of the architecture with the next lowest
computational resources in Table 2 [9]. The proposed
architecture is a serial architecture that outputs one
coefficient per clock, whereas the architecture in [9] is
a parallel architecture that outputs four samples per
clock; even so, the reduction in computational resources
relative to [9], about sevenfold in multipliers, is close
to twice the fourfold reduction in per-clock output rate.
Besides the aforementioned advantages, the proposed
architecture can also generate 1D and 2D WT coefficients.
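
The ratios quoted in this section follow directly from the adder, multiplier and output-rate columns of Table 2. The short sketch below repeats that arithmetic with exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

# Rows of Table 2: (adders, multipliers, outputs per clock).
PROPOSED = (12, 7, 1)
DAS = (40, 48, 4)    # architecture [9]

adder_ratio = Fraction(PROPOSED[0], DAS[0])        # 12/40 = 3/10, close to 2/7
multiplier_ratio = Fraction(PROPOSED[1], DAS[1])   # 7/48, close to 1/7
rate_ratio = Fraction(PROPOSED[2], DAS[2])         # 1/4

# How much larger the multiplier saving is than the loss in output rate.
multiplier_gain = rate_ratio / multiplier_ratio    # 12/7

print(float(multiplier_ratio), float(adder_ratio), float(multiplier_gain))
```

The multiplier saving exceeds the throughput reduction by a factor of 12/7, which is the basis for the "close to twice" statement above, while the adder ratio of 3/10 is the figure rounded to 2/7 in the text.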
4 CONCLUSION
In this paper we introduced a serial architecture for the
flipping-based 3D WT. The proposed architecture requires
about 1/7 of the multipliers and 2/7 of the adders
required by the compared architecture with the fewest
processing elements. Even though the proposed serial
architecture outputs one quarter of the number of
coefficients per clock of the parallel architectures, its
computational resources are about 2/7 or less of theirs.

Furthermore, the proposed architecture reduces the
critical path from the 2Ta+Tm of the other architectures
to a single Tm. Moreover, the 1D and 2D WT coefficients
can be produced by the proposed architecture in addition
to the 3D WT. Therefore, the proposed architecture can be
used in real-time 1D, 2D and 3D WT applications that
require minimum computational resources and critical path.
REFERENCES
[1] Chung-Jr Lian, Kuan-Fu Chen, Hong-Hui Chen and Liang-Gee Chen, "Lifting based discrete wavelet transform architecture for JPEG2000," The 2001 IEEE International Symposium on Circuits and Systems (ISCAS 2001), Vol. 2, pp. 445-448, 2001.
[2] G. Dillen, B. Georis, J.D. Legat and O. Cantineau, "Combined line-based architecture for the 5-3 and 9-7 wavelet transform of JPEG2000," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 9, pp. 944-950, Sept. 2003.
[3] Bing-Fei Wu and Chung-Fu Lin, "A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 12, pp. 1615-1628, Dec. 2005.
[4] C. T. Huang, P. C. Tseng and L. G. Chen, "Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Transactions on Signal Processing, Vol. 52, No. 4, April 2004.
[5] Y. Hao, Y. Liu and R. Wang, "High performance hardware implementation architecture for DWT of lifting scheme," International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2008.
[6] Y. Hao, Y. Liu and R. Wang, "Efficient parallel hardware architecture for lifting-based discrete wavelet transform," Chinese Control and Decision Conference (CCDC 2008), 2008.
[7] C. Xiong, J. Hao, J. Tian and J. Liu, "Efficient array architecture for multi-dimensional lifting-based discrete wavelet transform," IEEE Trans. Signal Processing, pp. 1089-1099, October 2006.
[8] Q. Dai, X. Chen and C. Lin, "A novel VLSI architecture for multidimensional discrete wavelet transform," IEEE Trans. Circuits Systems Video Technol., pp. 1105-1110, August 2004.
[9] B. Das and S. Banerjee, "Data-folded architecture for running 3D DWT using 4-tap Daubechies filters," IEE Proc. Circuits Devices Systems, pp. 17-24, February 2005.
[10] G. Karlsson and M. Vetterli, "Three-dimensional subband coding of video," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 1100-1103, April 1988.
[11] B. J. Kim and W. Pearlman, "An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees (SPIHT)," Proc. DCC '97, IEEE Data Compression Conference, pp. 251-260, Mar. 1997.
[12] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, pp. 243-250, June 1996.
[13] A. Secker and D. Taubman, "Highly scalable video compression using a lifting-based 3D wavelet transform with deformable mesh motion compensation," Proceedings of International Conference on Image Processing, Vol. 3, pp. 749-752, 24-28 June 2002.
[14] A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting," Proc. IEEE Int. Conf. Image Proc., pp. 1029-1032, Oct. 2001.

Zahra ZareMojtahedi received the B.Sc. degree in computer engineering
from the computer engineering department of the Central Tehran
branch of Iran Islamic Azad University. She is currently an M.Sc.
student in the computer engineering department of the Science and
Research branch of Iran Islamic Azad University, Tehran, Iran. Her
research interests include hardware design for image and video
coding applications.

Farzad Zargari (M'07-SM'11) received his B.Sc. degree in Electrical
Engineering from Sharif University of Technology and his M.Sc. and
Ph.D. degrees in Electrical Engineering from University of Tehran, all
in Tehran, Iran.
He is currently a research associate at the information technology
department of the research institute for ICT, formerly known as Iran
Telecom Research Center (ITRC), Ministry of Telecommunications and
Information Technology of Iran. He is also a member of the teaching
academic staff in the computer engineering department of the Science
and Research branch of Islamic Azad University. His research interests
include multimedia systems, image and video signal processing
algorithms, and hardware implementation of image and video coding
standards.

