
Proceedings of ICSP '96

A Fast Algorithm of the DCT and IDCT for VLSI Implementation


Mong Ying, Hou Zhaohuan
Institute of Acoustics, Chinese Academy of Sciences, P.O. Box 2712, Beijing 100080, P.R. China

ABSTRACT Because the DCT performs very close to the statistically optimal Karhunen-Loeve Transform (KLT), it is widely used in digital signal processing, especially for speech and image data compression. DCT algorithms and VLSI architectures capable of real-time computation are urgently required. It is known that a VLSI implementation of distributed arithmetic is very efficient for computing convolution. Here, an algorithm is presented that converts the DCT and inverse DCT (IDCT) to skew-convolution. A VLSI implementation of the algorithm has the same advantages as any implementation using distributed arithmetic.
1. INTRODUCTION Since the introduction of the DCT in the 1970s, a considerable amount of research has been performed on algorithms, architectures and processor designs for computing the DCT. The DCT and inverse DCT (IDCT) have also been used extensively in digital speech and image processing. In particular, the DCT has become an integral part of several standards such as JPEG, MPEG and CCITT Recommendation H.261. Traditionally, the DCT is computed by using multipliers to implement the butterfly structure of various fast algorithms. In a VLSI realization, however, the multipliers, the irregular architecture and the complicated routing of the butterfly approach require more silicon area and reduce the speed. It is known that with distributed arithmetic the resulting VLSI architecture has a highly regular structure and eliminates the need for multipliers, and that circular and skew-circular convolution can be computed efficiently using distributed arithmetic. Therefore, in this paper we present an algorithm that converts the DCT to skew-circular convolution by using the number theoretic transforms (NTTs) technique. The algorithm is very efficient for VLSI application.
2. THE ALGORITHM Given an input sequence $\{x(n),\ n = 0, 1, \ldots, N-1\}$, the 1-D DCT is defined as follows:

$$X(k) = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x(n)\cos\frac{\pi(2n+1)k}{2N}, \qquad k = 1, 2, \ldots, N-1 \tag{1}$$

In the following derivation of the algorithm, the constant scale factor in (1) is ignored and N is assumed to be a power of 2. The defining equation of the odd-indexed DCT components can be rewritten as follows:

$$\begin{aligned}
X(2k+1) &= \sum_{n=0}^{N-1} x(n)\cos\frac{\pi(2n+1)(2k+1)}{2N} \\
&= \sum_{n=0}^{N/2-1} x(2n)\cos\frac{\pi(4n+1)(2k+1)}{2N} + \sum_{n=0}^{N/2-1} x(N-1-2n)\cos\frac{\pi[2(N-1-2n)+1](2k+1)}{2N} \\
&= \sum_{n=0}^{N/2-1} [x(2n) - x(N-1-2n)]\cos\frac{2\pi(4n+1)(2k+1)}{4N}, \qquad k = 0, 1, \ldots, N/2-1
\end{aligned} \tag{2}$$
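The folding step in (2) can be checked numerically. The following is a minimal sketch, assuming the unnormalised form of (1) with the constant scale factor dropped; the function names `dct_direct` and `dct_odd_folded` are illustrative and do not come from the paper.

```python
import math


def dct_direct(x):
    """Unnormalised DCT: X(k) = sum_n x(n) cos(pi (2n+1) k / 2N)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def dct_odd_folded(x):
    """Odd-indexed components X(2k+1) from the last line of equation (2)."""
    N = len(x)          # assumed even
    half = N // 2
    out = []
    for k in range(half):
        acc = 0.0
        for n in range(half):
            # Pre-subtraction of an even-position sample and its mirror partner.
            z = x[2 * n] - x[N - 1 - 2 * n]
            acc += z * math.cos(2 * math.pi * (4 * n + 1) * (2 * k + 1) / (4 * N))
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
full = dct_direct(x)
odd = dct_odd_folded(x)
# The folded sum over N/2 differences reproduces every odd-indexed output.
assert all(abs(odd[k] - full[2 * k + 1]) < 1e-9 for k in range(len(odd)))
```

The folded form needs only N/2 products per output instead of N, at the cost of N/2 subtractions shared by all outputs.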

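It is known from number theory that there is a one-to-one mapping relation

$$2k_{s,i} + 1 = (-1)^{s}\, 3^{i} \bmod 4N \tag{3}$$

[Equations (4) to (6), the corresponding mapping of the input indices $n_j$ and the piecewise forms of both mappings, are not recoverable from the scan.]

Therefore, (2) can be rewritten as follows:

$$X(2k_{s,i}+1) = \sum_{j=0}^{N/2-1} [x(2n_j) - x(N-1-2n_j)]\cos\frac{2\pi(4n_j+1)(2k_{s,i}+1)}{4N} \tag{7}$$

where

$$z(2n_j) = x(2n_j) - x(N-1-2n_j) \tag{8}$$

Considering $3^{N/2} \bmod 4N = 2N+1$ and $\cos(-x) = \cos x$, we obtain

[Equations (9) and (10) are not recoverable from the scan.]

Replacing (10) in (9), and with some derivation, it is possible to write (7) as follows:

$$X(2k_{s,i}+1) = \begin{cases} \displaystyle\sum_{j=0}^{N/2-1} z'(2n_j)\cos\frac{2\pi\cdot 3^{i+j}}{4N}, & 2k_{s,i}+1 < N \\[2ex] \displaystyle -\sum_{j=0}^{N/2-1} z'(2n_j)\cos\frac{2\pi\cdot 3^{i+j}}{4N}, & \text{else} \end{cases} \tag{11}$$

where $z'(2n_j)$ is $z(2n_j)$ with the appropriate negation, and the following important relation is used in the derivation above:

$$\cos\frac{2\pi\cdot 3^{N/2+i}}{4N} = -\cos\frac{2\pi\cdot 3^{i}}{4N}$$

The right side of (11) does not depend on the index $s$, so $X(2k_{s,i}+1)$ can be simplified to $X(2k_i+1)$. Not considering the negation sign in the right side of (11), for $i = 0, 1, \ldots, N/2-1$ we can write

$$X(2k_i+1) = \sum_{j=0}^{N/2-1} z'(2n_j)\cos\frac{2\pi\cdot 3^{i+j}}{4N} \tag{12}$$

as the following matrix operation:

$$\begin{bmatrix} X(2k_0+1) \\ X(2k_1+1) \\ \vdots \\ X(2k_{N/2-1}+1) \end{bmatrix} =
\begin{bmatrix}
\cos\frac{2\pi\cdot 3^{0}}{4N} & \cos\frac{2\pi\cdot 3^{1}}{4N} & \cdots & \cos\frac{2\pi\cdot 3^{N/2-1}}{4N} \\
\cos\frac{2\pi\cdot 3^{1}}{4N} & \cos\frac{2\pi\cdot 3^{2}}{4N} & \cdots & -\cos\frac{2\pi\cdot 3^{0}}{4N} \\
\vdots & \vdots & & \vdots \\
\cos\frac{2\pi\cdot 3^{N/2-1}}{4N} & -\cos\frac{2\pi\cdot 3^{0}}{4N} & \cdots & -\cos\frac{2\pi\cdot 3^{N/2-2}}{4N}
\end{bmatrix}
\begin{bmatrix} z'(2n_0) \\ z'(2n_1) \\ \vdots \\ z'(2n_{N/2-1}) \end{bmatrix} \tag{13}$$

Equations (12) and (13) show that the N/2 odd-indexed DCT components can be computed as a skew-circular convolution of length N/2. From (13), we also see that by precomputing and storing all possible combinations of one set of fixed coefficients $\cos\frac{2\pi\cdot 3^{i}}{4N}$, $i = 0, 1, \ldots, N/2-1$, the multipliers can be replaced by memory look-up tables, and large savings in the number of arithmetic operations can be achieved in a VLSI realization when using distributed arithmetic.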
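The skew-circular structure of (12) and (13) rests on the relation $\cos\frac{2\pi\cdot 3^{N/2+i}}{4N} = -\cos\frac{2\pi\cdot 3^{i}}{4N}$. The sketch below checks that relation numerically and evaluates the sum in (12) using only the first N/2 kernel values. It is an illustration under the assumption that N is a power of 2; the names `skew_kernel` and `skew_circular_convolve` are hypothetical, and the plain multiply-accumulate loop stands in for the distributed-arithmetic look-up tables, which are not modelled here.

```python
import math


def skew_kernel(N):
    """h[m] = cos(2 pi 3^m / 4N) for m = 0 .. N-1, with 3^m reduced mod 4N."""
    return [math.cos(2 * math.pi * pow(3, m, 4 * N) / (4 * N)) for m in range(N)]


def skew_circular_convolve(z, h_half):
    """y[i] = sum_j z[j] * h[i + j], using h[m + L] = -h[m] with L = len(z).

    Only the L stored coefficients in h_half are needed; indices that run
    past the end wrap around with a sign change (the "skew" wrap).
    """
    L = len(z)
    out = []
    for i in range(L):
        acc = 0.0
        for j in range(L):
            m = i + j
            acc += z[j] * (h_half[m] if m < L else -h_half[m - L])
        out.append(acc)
    return out


N = 8
half = N // 2
h = skew_kernel(N)
# Relation used in the derivation: the second half of the kernel is the
# negated first half.
assert all(abs(h[i + half] + h[i]) < 1e-12 for i in range(half))
```

Because of the sign-flipping wrap, each row of the matrix in (13) is a shifted copy of the row above it with the wrapped entries negated, so a single set of N/2 stored coefficients serves all rows.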

The procedure for computing the odd-indexed DCT components has thus been shown. It includes:

1. an input mapping $x(2n) \to x(2n_j)$, $j = 0, 1, \ldots, N/2-1$, according to (4) and (6);
2. subtractions according to (8) with appropriate negations: $z'(2n_j) = x(2n_j) - x(N-1-2n_j)$ for $2n_j < N$, with the difference negated for $2n_j > N$;
3. a skew-circular convolution of the sequence $\{z'(2n_j),\ j = 0, 1, \ldots, N/2-1\}$ and the constant sequence $\{\cos\frac{2\pi\cdot 3^{i}}{4N},\ i = 0, 1, \ldots, N/2-1\}$ according to (13) (noting that the constant scale factor is combined with the cos terms), obtaining $X(2k_i+1)$, $i = 0, 1, \ldots, N/2-1$, with appropriate negations depending on whether $2k_{s,i}+1 < N$ or not, as in (11);
4. an output mapping $X(2k_{s,i}+1) \to X(2k+1)$, $i = 0, 1, \ldots, N/2-1$, according to (3) and (5).
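The four steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the paper's exact index mapping: the mapping equations (4) to (6) are not available here, so the sketch builds its own table writing every odd residue mod 4N as $\pm 3^{e}$ and folds exponents $e \ge N/2$ into a sign change. The names `odd_log_table` and `dct_odd_via_skew_convolution` are hypothetical, N is assumed to be a power of 2, and the constant scale factor is dropped.

```python
import math


def odd_log_table(N):
    """Map each odd residue a mod 4N to (s, e) with a = (-1)^s * 3^e mod 4N."""
    M = 4 * N
    table = {}
    for e in range(N):
        g = pow(3, e, M)
        table[g] = (0, e)
        table[M - g] = (1, e)
    return table


def dct_odd_via_skew_convolution(x):
    """Odd-indexed unnormalised DCT outputs X(1), X(3), ..., X(N-1)."""
    N = len(x)
    half = N // 2
    log3 = odd_log_table(N)

    # Steps 1 and 2: input mapping and pre-subtraction with negation.
    # The sign bit s is irrelevant because cos is even; an exponent
    # e >= N/2 contributes a factor of -1.
    zp = [0.0] * half
    for n in range(half):
        _, e = log3[4 * n + 1]
        sign = -1.0 if e >= half else 1.0
        zp[e % half] = sign * (x[2 * n] - x[N - 1 - 2 * n])

    # Step 3: skew-circular convolution with the fixed cosine sequence.
    h = [math.cos(2 * math.pi * pow(3, m, 4 * N) / (4 * N)) for m in range(half)]
    y = []
    for i in range(half):
        acc = 0.0
        for j in range(half):
            m = i + j
            acc += zp[j] * (h[m] if m < half else -h[m - half])
        y.append(acc)

    # Step 4: output mapping, again with negation for exponents >= N/2.
    odd = [0.0] * half
    for k in range(half):
        _, e = log3[2 * k + 1]
        sign = -1.0 if e >= half else 1.0
        odd[k] = sign * y[e % half]
    return odd
```

For example, `dct_odd_via_skew_convolution([1.0, 2.0, 3.0, 4.0])` returns the two values X(1) and X(3) of the unnormalised 4-point DCT. In hardware the table and signs would be fixed wiring; the dictionary here only makes the permutation explicit.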

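The even-indexed DCT components $\{X(2k),\ k = 0, 1, \ldots, N/2-1\}$ can be computed as follows:

$$\begin{aligned}
X(2k) &= \sum_{n=0}^{N-1} x(n)\cos\frac{\pi(2n+1)\,2k}{2N} \\
&= \sum_{n=0}^{N/2-1} x(2n)\cos\frac{\pi(4n+1)\,2k}{2N} + \sum_{n=0}^{N/2-1} x(N-1-2n)\cos\frac{\pi[2(N-1-2n)+1]\,2k}{2N} \\
&= \sum_{n=0}^{N/2-1} [x(2n) + x(N-1-2n)]\cos\frac{\pi(4n+1)k}{N}
\end{aligned} \tag{14}$$

Obviously, this is an N/2-point DCT of $\{x(2n) + x(N-1-2n),\ n = 0, 1, \ldots, N/2-1\}$. It can again be divided into odd- and even-indexed components, and the even part can in turn be divided into odd and even parts of smaller size, and so on, until the transform length becomes 1. The last 1-point transform, with a scaling factor, gives the DCT component $X(0)$.

Fig. 1 illustrates the algorithm for computing the DCT with N = 8.

For the IDCT, with output sequence $x(n)$ ($n = 0, 1, \ldots, N-1$), the definition can be rewritten as the following two equations:

$$\begin{aligned}
x(2n) &= \sqrt{\frac{2}{N}} \sum_{k=0}^{N/2-1} X(2k+1)\cos\frac{2\pi(4n+1)(2k+1)}{4N} + \sqrt{\frac{2}{N}} \sum_{k=1}^{N/2-1} X(2k)\cos\frac{2\pi(4n+1)\cdot 2k}{4N} + \sqrt{\frac{1}{N}}\,X(0) \\
x(N-1-2n) &= -\sqrt{\frac{2}{N}} \sum_{k=0}^{N/2-1} X(2k+1)\cos\frac{2\pi(4n+1)(2k+1)}{4N} + \sqrt{\frac{2}{N}} \sum_{k=1}^{N/2-1} X(2k)\cos\frac{2\pi(4n+1)\cdot 2k}{4N} + \sqrt{\frac{1}{N}}\,X(0)
\end{aligned} \tag{16}$$

We can see that the odd input indexed part is analogous to the odd-indexed computation in equation (2), and that the even input indexed part, the sum over $X(2k)$ together with the $X(0)$ term, can be divided into odd input indexed and even input indexed parts again, and so on, until the last part is the $X(0)$ term. Therefore, the major computation for the IDCT is the same as for the DCT. The slight difference is that the DCT needs some preadditions, while the IDCT needs some postadditions.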
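The preadditions of the even-indexed DCT part and the postadditions of the IDCT can be illustrated together. The sketch below is an assumption-laden example rather than the paper's architecture: it uses the unnormalised forward transform $X(k) = \sum_n x(n)\cos\frac{\pi(2n+1)k}{2N}$ for all k including k = 0, so the inverse carries the factor 2/N and weights X(0) by 1/2 instead of the scale factors of (16). The sums in (14) are taken in natural order, $u(m) = x(m) + x(N-1-m)$, which is the same set of sums as $x(2n) + x(N-1-2n)$. All function names are illustrative.

```python
import math


def dct_unnormalised(x):
    """X(k) = sum_n x(n) cos(pi (2n+1) k / 2N), k = 0 .. N-1."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def even_part_via_half_dct(x):
    """X(2k), k = 0 .. N/2-1, as an N/2-point DCT of the pre-added sequence."""
    N = len(x)
    u = [x[m] + x[N - 1 - m] for m in range(N // 2)]   # preadditions
    return dct_unnormalised(u)


def idct_split(X):
    """Inverse of dct_unnormalised via the odd/even split and postadditions."""
    N = len(X)
    half = N // 2
    x = [0.0] * N
    for n in range(half):
        odd = sum(
            X[2 * k + 1] * math.cos(2 * math.pi * (4 * n + 1) * (2 * k + 1) / (4 * N))
            for k in range(half)
        )
        even = 0.5 * X[0] + sum(
            X[2 * k] * math.cos(2 * math.pi * (4 * n + 1) * (2 * k) / (4 * N))
            for k in range(1, half)
        )
        # Postadditions: the same two partial sums give both outputs.
        x[2 * n] = (2.0 / N) * (even + odd)
        x[N - 1 - 2 * n] = (2.0 / N) * (even - odd)
    return x


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
X = dct_unnormalised(x)
assert all(abs(a - b) < 1e-9 for a, b in zip(idct_split(X), x))
```

The odd and even partial sums are each computed once per n and reused for the output pair x(2n), x(N-1-2n), which mirrors how the forward transform shares its pre-added and pre-subtracted inputs.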

3. CONCLUSION In this paper, a fast algorithm for computing the DCT and IDCT is presented. Because the VLSI implementation of distributed arithmetic is very efficient for computing convolution, we turn the major computation for the DCT and IDCT into skew-convolution by using the NTTs technique. Thus, the resulting architecture of the DCT and IDCT consists of memories, adders, and registers only; no multiplier of the kind required in any butterfly-structured implementation is needed. Moreover, the major computation for the DCT and IDCT is similar, except that the DCT needs some preadditions, while the IDCT needs some postadditions. Therefore, a processor can be devised to compute both the DCT and IDCT with little overhead. The advantage of the algorithm lies in the input data mapping and the output result mapping.

Fig. 1 Computing the DCT (N=8)
