DSP ENGINE FOR ULTRA-LOW-POWER AUDIO APPLICATIONS

R. C. S. Morling*, I. Kale*, S. J. Morris† and F. Custode†

* University of Westminster, Applied DSP and VLSI Research Group, 115 New Cavendish Street, London, W1W 6UW, UK
† Zarlink Semiconductor, Medical BU, 10815 Rancho Bernardo Rd, Suite 210, San Diego, CA 92127, USA

0-7803-7761-3/03/$17.00 ©2003 IEEE

ABSTRACT

An ultra-low-power DSP decimation/interpolation structure is described, demonstrating how algorithmic and architectural schemes were employed for ultimate power efficiency in a DSP-based chip set for audio applications. This circuit was designed and synthesised for a low-VT 0.35 µm CMOS process, allowing Nyquist-rate signals to be decimated from a high-OSR Σ-Δ front-end and interpolated, after voice processing, at the back end. The DSP has been fabricated and operates down to 0.9 V. At 1.25 V, its current consumption is only 90 µA.

1. INTRODUCTION

This paper describes the design and performance of a dedicated processor implementing the DSP required for an ultra-low-power Σ-Δ Codec. Three key factors influence the power required to implement a given signal processing function:

• Use of arithmetic-efficient signal processing algorithms, i.e. obtaining the maximum filtering for the minimum arithmetic.
• Adoption of power-efficient VLSI design.
• Implementation using a process optimised for low power.

The last requirement was achieved by using a logic library specifically designed for low-power applications implemented in a low-leakage DSM CMOS process. The first two requirements are discussed in this paper.

2. STRUCTURAL MODEL OF THE DSP UNIT

The overall behavioural structure of the CODEC is shown in Figure 1. It can be conveniently divided into two parts:

• the ADC path consists of two stages of decimation from a 1280 kHz data-stream from the analog modulator to 20 kilosamples/second (ksps), and a highpass filter;
• the DAC path consists of a highpass filter, an interpolator raising the sample rate from 20 ksps to 80 ksps, and a digital sigma-delta modulator operating at 640 ksps.

The two sections are only interconnected by means of the digital loopback path, which provides a means of testing the complete digital system with a 20 ksps input and output signal.

Figure 1: Overall Behavioural Structure

2.1 First-Stage Decimator

The first stage of decimation uses a fourth-degree slink filter, which is so named from its Fourier transfer function:

$$H_1(\nu_0) = \mathrm{slink}^4(32, \nu_0)\, e^{-j124\pi\nu_0} \qquad (1)$$

where $\nu_0$ is frequency normalised to the modulator rate of 1280 kHz and $\mathrm{slink}(N, x) = \sin(N\pi x) / (N \sin(\pi x))$. The z-domain transfer function is

$$H_1(z_0) = \left[\frac{1 - z_0^{-32}}{32\,(1 - z_0^{-1})}\right]^4 \qquad (2)$$

where $z_0$ is the z-transform variable referenced to the modulator rate. The most efficient implementation of this decimator is obtained by the direct implementation of Equation (2) shown in Figure 2. This structure is especially useful for Σ-Δ converter applications [1]. Note that circular arithmetic must be used throughout the structure. Any word wraps occurring in the accumulators are automatically unwrapped by the differencers.

Figure 2: Slink^4 Decimator Structure

2.2 Second-Stage Decimator

The final rate reduction from 40 ksps to 20 ksps is effected using polyphase allpass structures. These filter structures are extremely efficient, especially in their Halfband (HB) form. The second-stage filtering is performed in two cascaded sub-stages.

The first sub-stage is the Non-Halfband (NHB) two-path allpass structure shown in Figure 3. The top branch is a second-order NHB allpass filter and the lower branch is a third-order NHB allpass filter (Figure 4). These filters are implemented using Numerator-Denominator (ND) Tapped Delay Line (TDL) structures. These use the minimum amount of arithmetic, with 2 multiplications and 4 additions for a second-order NHB and half this for a HB or first-order filter. Such filters are very insensitive to coefficient quantisation effects, and their coefficients were designed using the approach reported in [2]. Their dynamic range performance is better than that of wave digital filter (WDF) realisations [3]. Their only apparent disadvantage is that they require twice as many delayors as the WDF structures. However, since the input and output are also the state variables, they often have to be stored in any case. Furthermore, the feedforward delay of the first stage can be shared between the upper and lower paths and, when sections are cascaded, the feedback delay of the first stage can be shared with the feedforward delayors of the second stage.

Figure 3: Polyphase NHB Lowpass Filter Structure

Figure 4: Second- and Third-Order NHB ND TDL Allpass Filter Structures

The signal is then split into two polyphase channels, each operating at 20 ksps, which are fed through first-order allpass filters and recombined to form a single data stream (Figure 5). This latter structure is equivalent to a HB two-path allpass filter followed by a two-to-one decimation. Each branch is implemented as a first-order ND TDL structure (Figure 6).

Figure 5: Halfband Polyphase Decimator

Figure 6: 1st-Order ND TDL Allpass Filter Structure

2.3 Slink Compensator

The HB filter is followed by the slink compensator, which compensates for the passband roll-off of the slink decimator. The desired response of $1/\mathrm{slink}^4(32, \nu_1/32)$ (where $\nu_1$ is frequency normalised to 20 kHz) is well approximated in the band of interest. The structure, shown in Figure 7, is compatible with the structure of the allpass filters to ease implementation in the same processor.

Figure 7: Slink Compensator

2.4 Highpass Filters

Both the highpass filters, at the output of the decimator and at the input of the interpolator, are first-order filters implemented using the same structure. This is an "EasyTune" design, so named because the cut-off frequency and the gain adjustment (made to keep the Nyquist gain unity) are controlled by only a single parameter, $\mu$. The transfer function of the filter is:

[Equation (3)]

The structure (Figure 8) is an ND Time Delay and Accumulate (TDA) form, which is necessary because of the close proximity of the pole and zero in the transfer function. It is compatible with the allpass polyphase structures used in the decimator and interpolator and uses only one multiplication. Consequently, it makes good use of the allpass computing engine.

Figure 8: "EasyTune" Highpass Filter Structure

The frequency response of the decimator is shown in Figure 9.

Figure 9: Overall Decimator Transfer Function

2.5 AGC Envelope Detector

The signal processing required to produce an instantaneous measurement of the signal power envelope consists of a relative Hilbert transformer and a squarer. Its behavioural structure is shown in Figure 10.

Figure 10: AGC Envelope Detector

Two basic types of operation are required to implement this algorithm: the allpass filters performing a differential Hilbert filter, and the sum-of-squares function. This results in the envelope output being relatively insensitive to phase shifts of a sinusoidal input. Note that, unlike the usual polyphase filter, the coefficients are negative so the poles and zeros are real. The phase difference is $\pi/2$ in the 1 kHz to 9 kHz band. Note that the absolute phase shift is non-linear in both branches. It is the difference of the two paths that results in a phase characteristic which is a good approximation to the Hilbert transform.
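
As a rough illustration of why a 90-degree phase pair followed by a sum of squares gives a phase-insensitive power envelope, the following Python sketch squares and sums an ideal quadrature pair. It is not the allpass implementation described here: the sine and cosine signals merely stand in for the two branch outputs of the differential Hilbert filter, and the function name `envelope_power`, the 2 kHz test tone, the 20 ksps rate and the amplitude of 0.5 are all assumptions chosen for the example.

```python
import math

def envelope_power(i_branch, q_branch):
    """Sum of squares of two branch outputs, sample by sample."""
    return [i * i + q * q for i, q in zip(i_branch, q_branch)]

# Assumed test signal: a 2 kHz tone sampled at 20 ksps with amplitude 0.5.
fs, f, amp = 20_000.0, 2_000.0, 0.5

for phase in (0.0, 0.7, 2.1):  # arbitrary input phases in radians
    theta = [2 * math.pi * f * n / fs + phase for n in range(40)]
    i_branch = [amp * math.sin(t) for t in theta]  # stand-in for one allpass path
    q_branch = [amp * math.cos(t) for t in theta]  # ideal 90-degree-shifted path
    env = envelope_power(i_branch, q_branch)
    # The envelope stays at amp**2 whatever the input phase is.
    print(phase, min(env), max(env))
```

With a real allpass pair the phase difference is only approximately 90 degrees inside the design band, so the envelope would carry a small ripple rather than being exactly constant.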

A HB ND TDL structure was used to realise the two halfband allpass filters. These structures are the same as the first-order allpass structures shown in Figure 6 with the single delayors replaced by double delayors. The outputs of each allpass filter are separately squared then summed to produce the AGC output.

2.6 Interpolator

The interpolator translates the highpass-filtered and scaled input from the 20 ksps input rate by a factor of 4 to 80 ksps in two stages, each implemented using polyphase filters. A simple zero-order-hold (ZOH) register effects the final interpolation by a factor of 8.

Both stages use the structure shown in Figure 11 to increase the sample rate by a factor of 2. This is based on a halfband two-path polyphase lowpass filter operating at the output rate. Since this is equivalent to zero-insertion interpolation (ZII), there is a system gain of 0.5. This is compensated by the removal of the 0.5 scaling factor normally at the output of the filter. Because of the ZII and sample offset in one path, the final addition is merely an interleaving of the outputs from each allpass filter.

Figure 11: Polyphase Interpolator Structure

The interpolator chain input has a gain of 0.5 to allow for transient overshoot. A highpass filter, identical to that used at the end of the decimator, is placed at the beginning of the interpolator.

The first stage doubles the sample rate from 20 ksps to 40 ksps. In order to achieve the required frequency response, two first-order allpass filters are used in cascade for each path of the polyphase structure. Again an ND TDL structure was used.

The second stage of the interpolator takes the sample rate from 40 ksps to 80 ksps. Since the transition band is much wider for this stage, single first-order ND TDL allpass filters are sufficient in each path.

The final stage of interpolation is by a factor of 8, from 80 kHz to 640 kHz, with ZOH filtering alone. Its Fourier transfer function is:

$$H_{ZOH}(\nu_3) = 8\,\mathrm{slink}(8, \nu_3)\, e^{-j7\pi\nu_3} \qquad (4)$$

where $\nu_3$ is frequency normalised to the modulator input rate of 640 kHz. At the output of the main interpolator, the first replication of the input signal spectrum occurs around 80 kHz, which corresponds to the first spectral zero crossing of the ZOH. Thus, the first replication is attenuated by more than 20 dB. Figure 12 shows the overall frequency response of the interpolator.

Figure 12: Overall Interpolator Frequency Response

3. DSP ARCHITECTURE

The computational tasks of the CODEC can be partitioned into those operating at the modulator rate (1.28 MHz) and those operating at significantly lower speeds (between 20 kHz and 80 kHz). The former are implemented by custom processors whose architecture has the same form as their behavioural structure. The latter are undertaken by the Low-Speed Processor (LSP), so named because it takes care of all the lower-rate processing with the exception of the Slink differencer, which requires a 23-bit wordlength, larger than the 20 bits of the LSP. The overall block diagram of the architecture of the CODEC DSP is shown in Figure 13.

Figure 13: Overall Architecture

3.1 Slink Decimator

The Slink Decimator consists of two parts: the Slink Accumulator and the Slink Differencer. The architecture of the Slink Accumulator is a direct implementation of the structure shown in Figure 2, with four separate accumulators in cascade operating at the analog modulator rate of 1280 ksps. In order to minimise the propagation of glitches from one adder to the next, the registers are placed in the forward path rather than the feedback path. Since the input signal is always odd, the LSB processing is independent of the input signal. In the hardware implementation this bit is not implemented and the LSB of the accumulator has a weight of 2. To compensate for the loss of the true LSB, a 1 is injected into the carry input of the first accumulator on alternate cycles. The injection of carries into the later stages of the accumulator is unnecessary, as DC components in these stages are removed by the following differencers. This results in a slightly larger initial transient than if the appropriate carry inputs had been injected in all four accumulators. However, this is still very small and short-lived. The Slink Differencer computes the differences at a rate of 40 ksps and implements the cascade of four differencers shown in Figure 2. Since the rate is relatively low, a single subtractor is used in a dedicated special-purpose two-phase processor.

3.2 LSP Processor

The core of this processor is a two-phase Differencer-Multiplier-Adder unit (DMAC) with an auxiliary two-point averager. This, coupled with a Triple-Port RAM (TPRAM), is capable of executing a first-order allpass filter in a single cycle. A second-order NHB or cascaded first-order allpass filter takes two processor cycles because there are two separate multiplications required per filter. The slink compensator and highpass filters (HPFs) are also implemented in a single cycle. Thus the processor achieves 100% multiplier utilisation. The two-point averager allows decimator recombination and HPF gain adjustment to be done on the fly without wasting cycles in the main DMAC. A simplified view of the architecture of this processor is shown in Figure 14. The basic datapath is 20 bits wide, which is necessary to keep the round-off noise well below the noise floor of the modulators. To remove any danger of word wrap, a guard bit is added at the output of the differencer and another at the output of the multiplier. The two guard bits are removed by the limiters, which clip any overshoot at the output of the DMAC and averager. The output of the multiplier is rounded back to 19 fractional bits by the rounder, which uses either the round-to-zero or the convergent rounding algorithm under microprogram control. The implicit division by 2 wired into the averager uses convergent rounding. The state variables are stored in the TPRAM, which has two read/write addresses and one read-only address. Since only 29 locations are required, this is easily implemented in standard logic.

Figure 14: LSP Architecture

In the allpass filters used in the differential Hilbert filter and the NHB allpass filters used in the first sub-stage of the second-stage decimator, the delayors occur in pairs. It is necessary to toggle the mapping between the state variables and the RAM pair locations on each cycle of the filters. Thus the second delayor is read on each cycle and the new feedforward or feedback sample is written back later in the same cycle. On the next execution of the filter, the addresses are swapped so that the first delayor becomes the second and vice versa. Consequently, the content of the first delayor is automatically available as the second delayor without the need for housekeeping transfers between memory locations. For the NHB filter operating at 40 kHz, this is simply implemented in the microcode by reversing the RAM locations in the second invocation of the filter. The differential Hilbert filters, however, are only executed once per program cycle (20 kHz), so the toggling of locations is effected by locating the delayor pairs adjacent to each other on even address boundaries. For these address locations, the LSB of the address is automatically toggled at the end of each processing frame.

3.3 Control

The operation of the LSP is controlled by a microcode sequence which decodes the cycle counter of the Master Timer. A separate microcode counter is not used. Since the LSP operates at a 640 kHz rate, there is a maximum of 32 processor cycles per Nyquist-rate sample at 20 kHz. Of these, 28 are actually used. For the remaining cycles the processor is suspended to reduce power dissipation. The microcode is synthesised to gates to minimise power and area.

At 1.25 V it consumes only 80 pJ per cycle when implemented on the Zarlink 0.35 µm mixed-signal process. The whole processor, including the digital Σ-Δ modulator, consumes 90 µA at 1.25 V.

3.4 Design-for-Test Circuitry

Two areas of the LSP were identified as being difficult to test: the TPRAM and the DMAC. Therefore, additional DfT hardware was added to improve the testability of these. In both cases, the test signal inputs are generated internally and the outputs are compressed to a digital signature which is output serially. Both circuits are tested using the normal clock strobes, allowing at-speed testing and the detection of timing-related faults. Since both circuits operate at a 640 kHz rate, with the TPRAM read on the first half cycle and the DMAC read during the second half cycle, the two serial outputs are multiplexed to give a single serial signature output operating at 1280 kHz. Since the generation of the signatures is done with dedicated circuitry, the signature analysis can run during normal operation and provide additional observability.

The digital loopback path is extremely useful for functionally testing the digital circuitry in that the loop is entirely digital and therefore the response can be predicted right down to the LSB. These techniques achieved a fault coverage of over 98%.

4. POWER SAVING TECHNIQUES

The following general comments can be made:

• Extremely effective multirate filtering can be achieved using the ND-TDL allpass-based polyphase structure.
• The DMAC processor architecture is capable of implementing not only allpass filters but also resonators and highpass filters in a single cycle.
• Clock gating and two-phase latch-based timing offer lower power and area compared with fully synchronous solutions.
• Signature-based BIST techniques provide a high degree of fault coverage without the need for power-hungry scan paths.

REFERENCES

[1] Morling, R., I. Kale and C.W. Tsang, "The design of a sigma-delta CODEC for mobile telephone applications", Proc. 2nd Int. Conf. on Advanced A-D and D-A Conv. Techniques & their Appls. (ADDA '94), pp. 11-17, Cambridge, UK, July 94.
[2] Kale, I., N.P. Murphy and M.V. Patel, "On establishing the bounds for binary scaled coefficients of fifth and seventh order polyphase halfband filters", Proc. IEEE Int. Symp. on Ccts. & Systs. (ISCAS '94), vol. 2, pp. 473-476, London, May 94.
[3] Morling, R. and I. Kale, "Dynamic range of allpass filter structures", Proc. IEEE Int. Symp. on Ccts. & Systs. (ISCAS '02), vol. 4, pp. 433-436, Phoenix, May 2002.
