1 Introduction
The field of Discrete Wavelet Transforms (DWTs) is an amazingly recent one. The basic principles of wavelet theory were put forth in a paper by Gabor in 1946 [4], but all of the definitive papers on discrete wavelets, an extension of Gabor's theories involving functions with compact support, have been published in the past three years. Although the Discrete Wavelet Transform is merely one more tool added to the toolbox of digital signal processing, it is a very important concept for data compression. Its utility in image compression has been effectively demonstrated. This paper discusses the DWT and demonstrates one way in which it can be implemented as a real-time signal processing system. Although this paper will attempt to describe a very general implementation, the actual project used the STAR Semiconductor SPROClab digital signal processing system. A complete wavelet transform system as described herein is available from STAR Semiconductor [7].
set of functions which are defined by a recursive difference equation

    \phi(x) = \sum_{k=0}^{M-1} c_k \phi(2x - k),    (1)
where the range of the summation is determined by the specified number of nonzero coefficients M. The number of nonzero coefficients is arbitrary, and will be referred to as the order of the wavelet. The value of the coefficients is, of course, not arbitrary, but is determined by constraints of orthogonality and normalization. Generally, the area under the wavelet "curve" over all space should be unity, which requires that

    \sum_k c_k = 2.    (2)
Equation (1) is orthogonal to its translations; i.e., \int \phi(x) \phi(x - k)\,dx = 0. What is also desired is an equation which is orthogonal to its dilations, or scales; i.e., \int \psi(x) \psi(2x - k)\,dx = 0. Such a function does exist, and is given by

    \psi(x) = \sum_k (-1)^k c_{1-k} \phi(2x - k),    (3)

which is dependent upon the solution of \phi(x). Normalization requires that

    \sum_k c_k c_{k-2m} = 2\delta_{0m},    (4)

which means that the above sum is zero for all m not equal to zero, and that the sum of the squares of all coefficients is two. Another important equation which can be derived from the above conditions and equations is

    \sum_k (-1)^k c_{1-k} c_{k-2m} = 0.    (5)
A good way to solve for values of equation (1) is to construct a matrix of coefficient values. This is a square M \times M matrix, where M is the number of nonzero coefficients. The matrix is designated L, with entries L_{ij} = c_{2i-j}. This matrix always has an eigenvalue equal to 1, and its corresponding (normalized) eigenvector contains, as its components, the values of the function at integer values of x. Once these values are known, all other values of the function \phi(x) can be generated by applying the recursion equation to get values at half-integer x, quarter-integer x, and so on down to the desired dilation. This effectively determines the accuracy of the function approximation.
This class of wavelet functions is constrained, by definition, to be zero outside of a small interval. This is what makes the wavelet transform able to operate on a finite set of data, a property which is formally called "compact support." Most wavelet functions, when plotted, appear to be extremely irregular. This is due to the fact that the recursion equation assures that a wavelet function is non-differentiable everywhere. The functions which are normally used for performing transforms consist of a few sets of well-chosen coefficients resulting in a function which has a discernible shape. Two of these functions are shown in Figure 1; the first is the Haar basis function, chosen because of its simplicity for the following discussion of wavelets, and the second is the Daubechies-4 wavelet, chosen for its usefulness
Wavelet        c0               c1               c2               c3               c4         c5
Haar           1.0              1.0
Daubechies-4   (1+\sqrt{3})/4   (3+\sqrt{3})/4   (3-\sqrt{3})/4   (1-\sqrt{3})/4
Daubechies-6   0.332671         0.806891         0.459877         -0.135011        -0.085441  0.035226

Table 1: Coefficients for three named wavelet functions.
in data compression. They are named for pioneers in wavelet theory [3, 5]. The nonzero coefficients c_k which determine these functions are summarized in Table 1. Coefficients for the Daubechies-6 wavelet, the one used in the discussion of the wavelet transformer hardware implementation, are also given in Table 1.
[Figure 1: Plots of the wavelet phi (scaling) functions of the Haar and Daubechies-4 wavelets.]
the "low-pass" filter function, related to equation (1):

    a_i = \frac{1}{2} \sum_{j=1}^{N} c_{2i-j+1} f_j,    i = 1, \ldots, N/2,    (6)

and the other half is produced by the "high-pass" filter function, related to equation (3):

    b_i = \frac{1}{2} \sum_{j=1}^{N} (-1)^{j+1} c_{j+2-2i} f_j,    i = 1, \ldots, N/2,    (7)
where N is the input block size, c are the coefficients, f is the input function, and a and b are the output functions. (In the case of the lattice filter, the low- and high-pass outputs are usually referred to as the odd and even outputs, respectively.) The derivation of these equations from the original \phi and \psi equations can be found in [3]. In many situations, the odd, or low-pass, output contains most of the "information content" of the original input signal. The even, or high-pass, output contains the difference between the true input and the value the input would take if it were reconstructed from only the information given in the odd output. In general, higher-order wavelets (i.e., those with more nonzero coefficients) tend to put more information into the odd output, and less into the even output. If the average amplitude of the even output is low enough, then the even half of the signal may be discarded without greatly affecting the quality of the reconstructed signal. An important step in wavelet-based data compression is finding wavelet functions which cause the even terms to be nearly zero.
The Haar wavelet is useful for explanations because it represents a simple interpolation scheme. For example, a sampled sine wave of sixteen data points (note that this is a power of two, as required) is shown in Figure 2. After passing these data through the filter functions, the low- and high-pass outputs shown in Figure 3 result.
[Figure 2: A sampled sinusoid of sixteen data points.]
Figure 3: Low- and high-pass outputs from wavelet decomposition of the sampled sinusoid.
The high-pass output obviously contains less information than the low-pass output. If the signal is reconstructed by an inverse low-pass filter of the form

    f_j^L = \sum_{i=1}^{N/2} c_{2i-j} a_i,    j = 1, \ldots, N,    (8)

then the result is a duplication of each entry from the low-pass filter output. This is a wavelet reconstruction with 2\times data compression. Since the perfect reconstruction is a sum of the inverse low-pass and inverse high-pass filters, the output of the inverse high-pass filter can be calculated; it looks like that shown in Figure 4. This is the result of the inverse high-pass
Figure 4: Inverse low- and high-pass filter outputs in sampled sinusoid reconstruction.
filter function

    f_j^H = \sum_{i=1}^{N/2} (-1)^{j+1} c_{j+1-2i} b_i,    j = 1, \ldots, N.    (9)

The perfectly reconstructed signal is

    f = f^L + f^H,    (10)
where each f is the vector with elements f_j. Using other coefficients and other orders of wavelets yields similar results, except that the outputs are not exactly averages and differences, as in the case using the Haar coefficients.
2.3 Dilation
Since most of the information exists in the low-pass filter output, one can imagine taking this filter output and transforming it again, to get two new sets of data, each one quarter the size of the original input. If, again, little information is carried by the high-pass output, then that output may be discarded as well.
[Figure 5: The pyramid decomposition. At each dilation the remaining data points (marked X) pass through the low-pass and high-pass filters; the odd (low-pass) output feeds the next dilation, halving the number of points each time, from the 1st through the 4th dilation.]
Figure 6: Decompositions of a sixteen-sample block of data, using Haar wavelet coefficients.
Figure 7: Reconstructions of a sixteen-sample block of data, using Haar interpolation and data compression.
length N, so the lowest possible frequency which can be represented by the decomposition is clearly limited by the number of samples in the block, as opposed to the Fourier treatment, in which the decomposition includes all frequencies down to zero due to its infinite support. The rule, therefore, is that successive dilations represent lower and lower frequency content by halves. It is also clear that high rates of compression may require large block sizes of the input, so that more dilations can be made, and so that lower frequencies can be represented in the decomposition.
One should note that careful handling of mean values and low frequencies is required for effective compression and reconstruction. This is particularly true for image data, where small differences in mean values between adjacent blocks can give a reconstructed image a "mottled" appearance.
to six coefficients as they appear in the pyramid algorithm described in Section 2.2, but the nature of the lattice filter is such that the six coefficients of the wavelet description appear as three gain factors (\gamma_0, \gamma_1, \gamma_2) in the lattice and one scaling gain (\beta) at the end. There are always half as many "rungs" in the lattice filter as there are coefficients in the pyramid algorithm description. One rung is associated with each \gamma coefficient, and the placement of the \gamma gain alternates at each successive rung, as shown in Figure 8. An 8th-order filter would have four rungs, with the rung structure repeated for the new coefficient \gamma_3. The filter would have
3.2 Data Handling
Splitting the input stream is only one of a number of data-handling tasks needed to perform
a wavelet decomposition. The other tasks are described below.
There are three basic parameters which describe a wavelet transformer, all (essentially) independent of one another. One is the filter length, which reflects the number of coefficients describing the wavelet functions. This affects the number of butterfly filter stages required in the lattice filter, and the number of delays involved. The second is the block size of the input data to be transformed, which must be a power of two. The third is the number of nested levels, or dilations, of transformation; each pass of a data stream through the lattice filter produces two sets of transformed data points, each half as long as the original. To transform one level deeper, the odd-numbered outputs, which represent the low-frequency data of the input, are processed through another filter which again produces half-length outputs. If the original input stream is segmented into blocks of size 2^D, then the result of the Dth level of transformations is a single data point, and further splitting of the data is impossible. Usually, a full decomposition of this kind is not desired, but for good data compression, a number of dilations will be needed. For thoroughness, the original SPROC implementation was required to perform all possible dilations.
The unfortunate property of wavelet reconstruction is that the deepest nested dilation calculated in the decomposition must be the first to be reconstructed. This means that all transformed data must be saved in memory and output in the reverse of the order in which they were originally calculated (level-wise, that is; within a dilation, the data must be output in the order in which they were calculated). So the size of the input blocks and the resolution of the wavelet decomposition are memory-limited. Most of the work done by the wavelet transformer is scheduling the filter(s) and managing the inputs and outputs.
The above story gets even more complicated. There is a fundamental problem with lattice filters: unlike the convolution equation, which is the best mathematical description of wavelets, the lattice filter doesn't really produce even and odd output streams that are half the length of the input; the use of delay stages results in output streams which are half the length of the input plus the number of delay stages. It is not desirable to output all these extra numbers, because that would require an increasingly greater rate of processing for the output of each decomposition level; additionally, it is not desirable because the convolution description of wavelets assures that a full decomposition can be performed which always produces the same number of output data as input data. There are two ways to resolve this problem. One is to truncate the output and discard the extra endpoint values. Done correctly, this gives a reconstruction which is very close to the original, but not quite exact. The other method involves assuming a "circular" definition of the input, in which case the input stream is wrapped around and fed back through the filter, and the new outputs replace the first outputs (which were calculated on the same data but before all the delay stages were filled with nonzero values). This method yields a perfect reconstruction, but makes scheduling and data routing even more of a headache.
It should be noted that recirculation of the input causes the transform to "see" high-frequency kinks at the boundary between the endpoints of the data block. This prevents the even output response from being perfectly flat and so interferes with data compression. However, the number of transform output values affected by the recirculation is equal to the number of delays in the lattice filter. Good data compression can still be achieved by not discarding the affected values, and the proportion of these values to the total number of even outputs becomes negligible for large block sizes; e.g., for a Daubechies-6 wavelet (which has two delay elements) operating on a block of size 16 and yielding an even output of size eight, six of these values may be discarded for 16/(8 + 2) = 1.6 times compression for that dilation instead of the usual 2 times compression. However, if the block size is 256 and the even output is size 128, then 126 of these values may be discarded for 256/(128 + 2) \approx 1.97 times compression for that dilation.
In order to perform the proper handling of the data as described above, a system was needed to transfer the proper values back and forth to the lattice filter. The circular-definition problem necessitated the use of two filters in parallel in order that the output could keep up with the input for the particular set of parameters chosen for this implementation. Formally, the number of filters required is

    \#\mathrm{filters} = \left\lceil \frac{1}{N} \sum_{d=1}^{D} \left( 2^{-d} N + \delta \right) \right\rceil,

where N is the input block size, D is the number of dilations to be performed, and \delta is the number of delay elements in the lattice filter.
Figure 9: Block diagram of the SPROC Wavelet Transformer
This form of the wavelet transformer has put all of the design complexity into the process of generating the lookup tables for the input and output schedulers. This is a routing problem, and looks very much like the problem encountered when trying to place wires in a gate array. Heuristics such as those used for programming gate arrays can be applied in order to automate the process of creating the tables. The diagrams in Figures 11 and 12 and Table 2 outline this process. Figure 11 shows the creation of the reconstruction filter. Arbitrary numbers are assigned to label all inputs and outputs of two lattice filters and those of the reconstruction filter as a whole. This is done for each of the dilations in turn, always using the first available lattice filter after the inputs for each calculation become available. Note that the first two outputs of each level (dilation) are not assigned. This is due to the circular-input-data effect; the first two inputs are always re-routed through the lattice filter after the other inputs have been taken care of, while the first two outputs are discarded. The unnumbered lines in the table are cycles in which the filter is not used. Note that the first output appears twelve cycles after the first input. This, added to the hardware delay, gives the total throughput delay of the wavelet reconstruction filter.
Figure 12 shows one solution to the routing problem. This is accomplished with a modified "left-edge" algorithm, which works in the following manner: First, the output nets are placed; all of these nets have one end fixed at position zero. Next, the shift register size is made just
[Figure: Generic lattice filter (3-stage), showing the even and odd signal paths through the \gamma_0, \gamma_1, \gamma_2 and \beta gains and the delay elements D2 and D3.]
large enough to accommodate the output nets, and the input nets are placed; these have one end fixed at the top of the shift register. If a collision occurs between an input and output net, the shift register top is raised by one location, and the process continues until all of the inputs and outputs fit into the shift register. Finally, all the other nets are placed in a similar manner to the input nets, starting with those nets which begin at step zero. These nets are placed in the first available spot counting from the top of the shift register. If no space is available, the shift register size is again increased by one and the process is repeated. Note that since the input scheduler module runs before the output scheduler, new output values can be placed in the same memory location as input values which were used for the last time on the same "step."
Table 2 is made directly from the numbers in Figures 11 and 12, to match each of the filter inputs and outputs with its exact memory location at any given moment in sample-time. The delays are determined in a similar manner; part of this process is not shown in the diagrams. The shift register actually has two more locations than shown in Figure 12; the extra two are placed at the top of the shift register, one to hold the value zero for resetting the lattice filters, and the other as a "discard pile" for calculated results which are not used.

As an example of how values are placed in these diagrams, look at the tenth input in Figure 11. This input has been given the label "9." According to the schedule, this input appears at step 9, and is used at step 11 (as counted by the module counter), where it is the odd input to the second lattice filter. Because it is part of the second set of inputs to the topmost dilation of the reconstruction, it must be recirculated through the lattice filter at the end of the calculations for that dilation. The schedule shows that this occurs at the second filter on step 3. Figure 12 shows the path of this net, which starts as an input at step 9, goes to step 11, wraps back around to step 0, and ends at step 3. It starts at memory location 23 and ends at location 13, having been shifted by this amount. Finally, this network is logged in the lookup table of Table 2. At step 11, the net is used as the oddin value for filter 2, and the network is found at memory location 21, as seen from Figure 12. This memory location is the value put in the lookup table. The same procedure is needed for the oddin value for filter 2 at step 3. The input is automatically taken care of, since the input is always placed at the top of the memory block (actually two places below the top; see above), or at memory location 23.
Initially, it was planned to create one transformer, using the Daubechies-6 wavelet, a data block size of sixteen, and performing all four possible dilations of the input. A C program
[Figure 11 content: the "Wavelet Scheduling: I/O Handling" worksheet, tabulating, for each step 0-15, the even/odd input numbers and levels assigned to the two lattice filters, together with the third-round output numbering for the two-filter recomposition.]
Figure 11: Solving the scheduling problem by hand: I/O handling. The two columns represent two lattice filters operating in parallel.
[Figure 12: The shift-register routing solution over steps 0-15. The key distinguishes inputs and outputs used by modules inplace and outplace, intermediate results used by inschedule and by outschedule, delay elements, output values positioned by outschedule, and the numbers corresponding to the I/O handling schedule.]
Table 2: Lookup-table memory locations for each lattice-filter input and output at each step.

Step:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Data for file "recin1.dat" (Filt1) and "recin2.dat" (Filt2)
Evenin
  Filt1    21 20 19 19 17 17 17 15 16 12 13 17  9 12 12 22
  Filt2     6 10 25 25 25 16 25 25 21 21 21 25 25 25 25 21
Oddin
  Filt1    22 21 21 21 19 19 20 20 20 16 16 21 21 21 21 23
  Filt2    13 13 25 25 25 20 25 25 21 21 21 25 25 25 25 21
Delay2in
  Filt1    20 19 24 17 16 15 14 13 12 11 10 13 20 20 20 24
  Filt2    18 17 25 25 25 24 25 25 24  9  8 25 25 25 25 20
Delay3in
  Filt1    19 18 24 16 15 14 13 12 11 10  9 12 16 19 19 24
  Filt2    17 16 25 25 25 24 25 25 24  8  7 25 25 25 25 19

Data for file "recout1.dat" (Filt1) and "recout2.dat" (Filt2)
Evenout
  Filt1    25 21 25 25 19 19 25 20 20 16 21  2  3  4  5 25
  Filt2     7  8 25 25 25 25 25 25 25 25  1 25 25 25 25  6
Oddout
  Filt1    25 20 25 25 17 17 25 15 16 12 16  1  2  3  4 25
  Filt2     6  7 25 25 25 25 25 25 25 25  0 25 25 25 25  5
Delay2out
  Filt1    20 25 18 17 16 25 14 13 12 11 25 21 21 21 21 21
  Filt2    18 25 25 25 25 15 25 25 10  9 14 25 25 25 25 19
Delay3out
  Filt1    19 25 17 16 15 25 13 12 11 10 25 17 20 20 20 20
  Filt2    17 25 25 25 25 14 25 25  9  8 13 25 25 25 25 18
implementing this system is listed in Appendix B. By using a program which automates the process of creating lookup tables, changes to the block size or number of dilations performed require little work. New wavelet coefficients can be generated by a number of methods found in the literature, and installed in the existing structure. (Note Figure 13, where the lattice coefficients are entered as parameters of each lattice filter module.) Higher-order wavelets such as the Daubechies-8 require new routines for the lattice filter having extra "rungs," or else a lattice filter routine which uses the wavelet order as one of its parameters and generates the proper lattice structure automatically. All of the scheduling for new sizes of the lattice filter changes due to the change in the number of delay elements.
The final program revisions made the size of the code independent of the block size of the wavelet transform. Before the revisions, the implementation called for a shift register whose size was roughly proportional to the block size of the input data to be transformed. Because a loop moved the data one place ahead in memory, the execution time of that segment of code was proportional to the size of the shift register. This problem was eliminated by using a pointer to indicate the beginning of the register, and shifting the pointer instead of the data. The only tradeoff was that each read/write of the shift register must take into account the fact that the given address may exceed the allocated memory space of the shift register, and therefore must wrap around to the beginning of the register. The result was that more code locations were needed to do the extra wraparound calculations, but the time saved over shifting all the register data, even for fairly small registers, was quite significant.
3.5 Running the Transformer

The wavelet transformer as it was finally implemented corresponds to Figure 9, except that the structure was duplicated to have both the decomposition and reconstruction parts in the same system, with the output of the decomposition section becoming the input to the reconstruction section, and the output of the reconstruction going to the serial output module. The schematic of this system as built with SPROClab is shown in Figure 13. In reference to this schematic, altinsch is an input scheduler, altoutsc is the output scheduler, altinpl handles the system input, and altoutpl handles the system output. The plug module is an assembler routine with zero lines of code; it terminates an output which is not needed. The "spec=" parameter on the input and output scheduler modules indicates an ASCII file containing the lookup table values, in the order found in Table 2. The "memsize=" parameter indicates the length of the shift register. There are two shift registers in this schematic: one is for the decomposition filter and one is for the reconstruction filter. Note that the gamma coefficients in the diagram are half the value of those listed above. This is due to numerical limits of the hardware, and the division by two is compensated for within the lattice filter module.
The system is run by connecting any signal source to the serial input port, and comparing it to the reconstructed output from the serial output port. The decomposed wavelet outputs can be viewed by probing the output of the decomposition filter. This signal is the concatenation of all four dilations, which unfortunately are not easily broken up into components for viewing on an oscilloscope, but one can get a fairly good idea of what is occurring at each dilation from the signal as it appears. The wavelet output is not static, but constantly changes as long as the frequency of the input signal does not equal the rate of the A/D converters divided by the input block size (e.g., sixteen). This demonstrates the difference
between the Fourier and the wavelet concepts of frequency distribution: for a continuous input signal from a function generator, the frequency distribution is static over infinite time, but the frequency distribution over a short period of time is not, unless the period of time happens to cover exactly an integral number of cycles of the function.

A number of properties of discrete wavelets can be immediately confirmed by the transformer. Changing the dc bias voltage of the incoming signal affects the low-pass output of the last dilation; in the case that all possible dilations are performed, only a single value in the dilation changes. When the frequency of the input is very low, the outputs of the first one or two dilations are nearly zero, but increase in amplitude with increasing frequency. The Daubechies-6 function produces a much flatter response over a larger frequency range than the Haar function for most input waveforms. When the Haar wavelet is used to decompose a low-frequency square wave, the resultant wavelet outputs are all zero except for the low-pass output of the last dilation. This is expected due to the square-wave shape of the Haar function.
References

[1] Aszkenasy, R. "A Cascadable Wavelet Chip for Image Compression: Theory, Design, and Practice," (unpublished note), 1991.

[2] Aware, Inc. "Aware Wavelet Transform Processor (Preliminary)," 1991.

[3] Daubechies, I. "Orthonormal Bases of Compactly Supported Wavelets," Comm. Pure Applied Mathematics, vol. 41, 1988, pp. 909-996.

[4] Gabor, D. "Theory of Communication," J. Inst. of Electr. Eng., vol. 93, 1946, pp. 429-457.

[5] Haar, A. "Zur Theorie der Orthogonalen Funktionensysteme," Mathematische Annalen, vol. 69, 1910, pp. 331-371.

[6] Mallat, S. "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, 1989, pp. 674-693.

[7] STAR Semiconductor Corp. SPROClab Development System User's Guide, STAR Semiconductor, Warren, NJ, 1990.

[8] Strang, G. "Wavelets and Dilation Equations: A Brief Introduction," SIAM Review, vol. 31, no. 4, December 1989, pp. 614-627.

[9] Vaidyanathan, P. "Lattice Structures for Optimal Design and Robust Implementation of Two-Channel Perfect Reconstruction QMF Banks," IEEE Trans. ASSP, vol. 36, no. 1, January 1988, pp. 81-86.
Figure 13: OrCAD schematic of the wavelet transformer.
[Figure 13 content: the schematic netlist. A SER_IN serial input (trigger=port0, rate=19531.25, blocksize=16) and a COUNTER drive a decomposition chain and a reconstruction chain, each built from LATTFILT lattice-filter modules (decomposition gains gamma1=-1.21275, gamma2=-.2730474, gamma3=-.052944, beta=-.332671; reconstruction gains in the reverse order), ALTINSCH/ALTOUTSC schedulers with spec= lookup-table files ("decin0.dat", "decout0.dat", "recin0.dat", "recout1.dat", etc.) and memsize=49 (decomposition) or memsize=26 (reconstruction), ALTSHIFT shift registers, ALTINPL/ALTOUTPL input/output placers, PLUG terminators, and SER_OUT serial outputs to port2 and port3. Title block: Star Semiconductor, "Wavelet Transformer," Document NS-01, October 21, 1991.]
Appendices

A Pyramid Algorithm to Lattice Filter: A Derivation

A.1 Deriving the Lattice Structure

Deriving the lattice filter structure is not a trivial task. This appendix will show the necessary steps, starting with the Mallat pyramid algorithm as presented by Strang [8], needed to achieve a form from which Vaidyanathan [9] has shown that the lattice filter can be constructed.
We begin with the four lter functions, the two decomposition lters (6) and (7) and
the two reconstruction lters (8) and (9). From here, we take a direct z-transform, and
manipulate it in a manner similar to that used for a proof of the convolution property of the
z-transform. For the low-pass lter,
a(z) = \frac{1}{2} \sum_{n=-\infty}^{\infty} \sum_{j=0}^{M-1} c_j f_{2n-j} z^{-n},   (11)

     = \frac{1}{2} \sum_{j=0}^{M-1} c_j \sum_{n=-\infty}^{\infty} f_{2n-j} z^{-n},

     = \frac{1}{2} \sum_{j=0}^{M-1} c_j \left\{ \sum_{m=-\infty}^{\infty} f_m z^{-m/2} \right\} z^{-j/2},

a(z^2) = \frac{1}{2} \left[ \sum_{n=-\infty}^{\infty} c_n z^{-n} \right] f(z).

The z^2 term implies that a is half the length of f. M is the order of the filter. Note that all arrays are indexed from zero for simplicity. The above equation can be rewritten as a low-pass operator,

a(z^2) = L(z) f(z),

where

L(z) = \frac{1}{2} \sum_{n=0}^{M-1} c_n z^{-n}.   (12)
Here, we have changed the limits in order to sum over the nonzero coefficients only. The same steps can be applied to the high-pass filter, and this is presented below:
b(z) = \frac{1}{2} \sum_{n=-\infty}^{\infty} \sum_{j=0}^{M-1} (-1)^j c_{j+1-2n} f_j z^{-n},   (13)

     = \frac{1}{2} \sum_{j=0}^{M-1} f_j \sum_{n=-\infty}^{\infty} (-1)^j c_{j+1-2n} z^{-n},

     = \frac{1}{2} \sum_{j=0}^{M-1} f_j \left\{ \sum_{m=-\infty}^{\infty} (-1)^{2n-m-1} c_{-m} z^{-m/2} \right\} z^{-j/2} z^{-1/2},

b(z^2) = \frac{1}{2} \left[ \sum_{n=-\infty}^{\infty} (-1)^n c_{1-n} z^{-n} \right] f(z).

This can be rewritten as a high-pass operator,

b(z^2) = H(z) f(z),

where

H(z) = \frac{1}{2} \sum_{n=0}^{M-1} (-1)^n c_{1-n} z^{-n}.   (14)
This procedure can also be applied to the inverse filter functions to get

\tilde{L}(z) = \frac{1}{2} \sum_{n=0}^{M-1} c_n z^{n} = L(z^{-1}),   (15)

\tilde{H}(z) = \frac{1}{2} \sum_{n=0}^{M-1} (-1)^n c_{1-n} z^{n} = H(z^{-1}).   (16)
Two proofs are needed to show that these transforms represent quadrature mirror filters. The first invokes the condition (4):

\tilde{L} L + \tilde{H} H = L(z) L(z^{-1}) + H(z) H(z^{-1})

= \frac{1}{4} \left[ \sum_{n=0}^{M-1} c_n z^{-n} \right] \left[ \sum_{n=0}^{M-1} c_n z^{n} \right] + \frac{1}{4} \left[ \sum_{n=0}^{M-1} (-1)^n c_{1-n} z^{-n} \right] \left[ \sum_{n=0}^{M-1} (-1)^n c_{1-n} z^{n} \right].

The odd-lag cross terms cancel between the two products, and by (4) the even-lag terms vanish except at lag zero, leaving

= \frac{1}{4} \left[ \sum_{n=0}^{M-1} c_n^2 + \sum_{n=0}^{M-1} c_{1-n}^2 \right] = 1.   (17)

The second proof relates H to L directly:

L(z) = \frac{1}{2} \sum_{n=-\infty}^{\infty} c_n z^{-n},

L(z^{-1}) = \frac{1}{2} \sum_{n=-\infty}^{\infty} c_n z^{n},

z^{-(M-1)} L(z^{-1}) = \frac{1}{2} \sum_{n=-\infty}^{\infty} c_n z^{n-(M-1)},

z^{-(M-1)} L(-z^{-1}) = \frac{1}{2} \sum_{n=0}^{M-1} (-1)^n c_{1-n} z^{-n},

z^{-(M-1)} L(-z^{-1}) = H(z).   (18)

Now the equations (17) and (18) match the requirements cited in [9] for a quadrature mirror filter, or lattice structure. Reference the Vaidyanathan paper [9] for the remainder of the proof.
If there appears to be a subscript problem in the above equations, it is because the Mallat pyramid algorithm works only if a circular definition of the coefficients is assumed; i.e., in the case of the high-pass filter, the coefficients of subscript (1 - n) will never match up with the input function f_n unless either the coefficient space or the input function space wraps around on itself. In this case we have defined c_n = c_{n+M}. This fact is often passed over in the literature.
A.2 Confirming the Daubechies-6 Lattice Structure
Although the derivation of the lattice structure is difficult, a proof that the 6th-order lattice structure is equivalent to the Mallat pyramid algorithm is fairly straightforward, if somewhat messy. By directly working out the equations of the lattice filter outputs as a function of the inputs, working in z-transform space with each delay unit contributing z^{-1}, we get the following for decomposition:
eo = \beta \left[ \gamma_0 \, oi \, z^{-2} + \gamma_1 \, oi \, z^{-1} - \gamma_0\gamma_1\gamma_2 \, oi \, z^{-1} - \gamma_2 \, oi - \gamma_0\gamma_2 \, ei - \gamma_0\gamma_1 \, ei \, z^{-1} - \gamma_1\gamma_2 \, ei \, z^{-1} + ei \, z^{-2} \right],   (19)

oo = \beta \left[ \gamma_0\gamma_2 \, oi \, z^{-2} - \gamma_0\gamma_1 \, oi \, z^{-1} - \gamma_1\gamma_2 \, oi \, z^{-1} - oi + \gamma_0 \, ei + \gamma_0\gamma_1\gamma_2 \, ei \, z^{-1} - \gamma_1 \, ei \, z^{-1} + \gamma_2 \, ei \, z^{-2} \right],   (20)

and two similar equations for reconstruction, except that the coefficients are reversed (\gamma_0 \leftrightarrow \gamma_2) due to differences in structure of the reconstruction filter. Here ei and oi stand for the even and odd inputs, respectively, and eo and oo stand for the even and odd outputs, respectively (reference Figure 8).
When performing the inverse z-transform of the decomposition filter, we take into account the fact that the input is split into even and odd components. To an input function f(n), each term oi contributes \delta(n - 0) and each term ei contributes \delta(n - 1). Each z^{-1} contributes \delta(n - 2). The even output eo becomes the high-pass filter output a(n), and the odd output oo becomes the low-pass filter output b(n). The equations are:
b_{n/2} = \beta \left[ f_{n-5} + \gamma_0 f_{n-4} + (\gamma_2\gamma_1 + \gamma_1\gamma_0) f_{n-3} + (\gamma_1 - \gamma_0\gamma_1\gamma_2) f_{n-2} + \gamma_0\gamma_2 f_{n-1} - \gamma_2 f_n \right];   (21)

a_{n/2} = \beta \left[ \gamma_2 f_{n-5} - \gamma_0\gamma_2 f_{n-4} + (\gamma_0\gamma_1\gamma_2 - \gamma_1) f_{n-3} - (\gamma_0\gamma_1 + \gamma_1\gamma_2) f_{n-2} + \gamma_0 f_{n-1} - f_n \right].   (22)
These equations are evaluated at even n only. Similar equations can be made from the reconstruction filter equations, assuming that oi becomes b(n) and that ei becomes a(n); each z^{-1} contributes \delta(n - 1). The even outputs eo become f(n) for even n only, and the odd outputs oo become f(n) for odd n only.
The last necessary step of the proof is to show that while equation (22) can be written

a_n = \frac{1}{2} \sum_{j=1}^{N} c_{2n-j+1} f_j, \qquad n = 1, \ldots, \frac{N}{2},

there are only as many sum terms as there are nonzero coefficients, so this can be validly rewritten as a sum over the nonzero coefficient space, as given in the pyramid algorithm equations. If this is done, then the coefficients in (22) correspond directly to the pyramid algorithm coefficients (except for a scaling factor), and the two forms are equivalent. The purpose of the scaling factor mentioned above is to make the overall scaling the same for the reconstruction and decomposition filters. Note that in the Mallat pyramid algorithm discussion, decomposition filter outputs were scaled by 1/2, and the recomposition outputs by 1. For more regularity, this factor has been evenly distributed between the decomposition and reconstruction as the constant 1/\sqrt{2}.
The above equations can be easily solved for a one-rung lattice filter in order to find the lattice coefficients for the Haar wavelet, which cannot be computed by some methods currently in use. These coefficients are shown below, along with the solution to the above equations for the 6th-order Daubechies wavelet.
Wavelet        \gamma_0     \gamma_1    \gamma_2    \beta
Haar            1.0          0.0         0.0         1/\sqrt{2}
Daubechies-6   -2.425492    -0.546095   -0.105888   -0.332671

Table 3: Lattice coefficients for two wavelet functions.
B Wavelet Transform Program
/* Daub6ds.c -- a wavelet decomposition program for the following */
/* parameters: */
/* Input block size = 16 */
/* Number of dilations = 4 */
/* Wavelet type = Daubechies 6th-order */
/* Written by Tim Edwards, June 1991 - October 1991 */
/* Note: for a reconstruction program, SRSIZE must be redefined, */
/* the gamma coefficients reversed, and new values entered */
/* in the lookup tables. */
#include <stdio.h>
#define SRSIZE 49 /* size of the decomposition xform shift register */
#define BLKSIZE 16 /* input block size--must be a power of two */
#define NUMFILT 2 /* number of lattice filters required */
#define HALFORD 3 /* one half of the order of the filter */
main() {
static int dintable[HALFORD-1][NUMFILT][BLKSIZE] = {
44, 47, 47, 41, 37, 39, 35, 37, 47, 36, 34, 42, 47, 43, 42, 43,
48, 43, 48, 38, 48, 36, 48, 38, 48, 44, 48, 40, 48, 44, 48, 48,
43, 47, 47, 40, 36, 38, 31, 36, 47, 35, 27, 34, 47, 38, 25, 42,
48, 42, 48, 37, 48, 35, 48, 35, 48, 33, 48, 33, 48, 31, 48, 48 };
43, 42, 38, 40, 36, 38, 36, 37, 34, 36, 34, 40, 32, 44, 48, 44,
48, 48, 48, 37, 48, 32, 48, 48, 48, 28, 48, 48, 48, 26, 48, 48 };
/* input placement */
evenin[k] = memory[evenintable[k][i]];
oddin[k] = memory[oddintable[k][i]];
for (l = 0; l < HALFORD - 1; l++)
din[k][l] = memory[dintable[l][k][i]];
dout[k][l] = e[k][l - 1] + gamma[l] * din[k][l - 1];
e[k][l] = e[k][l - 1] * gamma[l] - din[k][l - 1];
}
else { /* inverted rung */
dout[k][l] = e[k][l - 1] * gamma[l] + din[k][l - 1];
e[k][l] = e[k][l - 1] - gamma[l] * din[k][l - 1];
}
}
evenout[k] = beta * e[k][HALFORD - 1];
oddout[k] = beta * dout[k][HALFORD - 1];
}
/* output "placement" */