
What is DSP?

Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a definition that is both good and general. We can start with dictionary definitions of the words:

Digital: Operating by the use of discrete signals to represent data in the form of numbers
Signal: A variable parameter by which information is conveyed through an electronic circuit
Processing: To perform operations on data according to programmed instructions

Which leads us to a simple definition of:

Digital Signal Processing: Changing or analysing information that is measured as discrete sequences of numbers
Note two unique features of Digital Signal Processing as opposed to plain old ordinary digital processing:

- Signals come from the real world - this intimate connection with the real world leads to many unique needs, such as the need to react in real time and a need to measure signals and convert them to digital numbers
- Signals are discrete - which means the information in between discrete samples is lost

The advantages of DSP are common to many digital systems and include:

Versatility:
- Digital systems can be reprogrammed for other applications (at least where programmable DSP chips are used)
- Digital systems can be ported to different hardware (for example a different DSP chip or board level product)

Repeatability:
- Digital systems can be easily duplicated
- Digital systems do not depend on strict component tolerances
- Digital system responses do not drift with temperature

Simplicity:
- Some things can be done more easily digitally than with analogue systems

DSP is used in a very wide variety of applications, but most share some common features:

- They use a lot of maths (multiplying and adding signals)
- They deal with signals that come from the real world
- They require a response in a certain time

Where general-purpose DSP processors are concerned, most applications deal with signal frequencies that are in the audio range.

Converting analogue signals


Most DSP applications deal with analogue signals, so the analogue signal has to be converted to digital form before it can be processed.

The analogue signal - a continuous variable defined with infinite precision - is converted to a discrete sequence of measured values, which are represented digitally. Information is lost in converting from analogue to digital, due to:

- Inaccuracies in the measurement
- Uncertainty in timing
- Limits on the duration of the measurement

These effects are called quantisation errors.

The continuous analogue signal has to be held before it can be sampled. Otherwise, the signal would be changing during the measurement. Only after it has been held can the signal be measured, and the measurement converted to a digital value.

The sampling results in a discrete set of digital numbers that represent measurements of the signal - usually taken at equal intervals of time. Note that the sampling takes place after the hold. This means that we can sometimes use a slower Analogue to Digital Converter (ADC) than might seem required at first sight. The hold circuit must act fast - fast enough that the signal is not changing during the time the circuit is acquiring the signal value - but the ADC has all the time that the signal is held to make its conversion.

We don't know what we don't measure. In the process of measuring the signal, some information is lost.

Sometimes we may have some a priori knowledge of the signal, or be able to make some assumptions that will let us reconstruct the lost information.

Aliasing
We only sample the signal at intervals. We don't know what happened between the samples. A crude example is to consider a 'glitch' that happened to fall between adjacent samples. Since we don't measure it, we have no way of knowing the glitch was there at all.

In a less obvious case, we might have signal components that are varying rapidly in between samples. Again, we could not track these rapid inter-sample variations. We must sample fast enough to see the most rapid changes in the signal.

Sometimes we may have some a priori knowledge of the signal, or be able to make some assumptions about how the signal behaves in between samples. If we do not sample fast enough, we cannot completely track the most rapid changes in the signal.

Some higher frequencies can be incorrectly interpreted as lower ones.

In the diagram, the high frequency signal is sampled just under twice every cycle. The result is that each sample is taken at a slightly later part of the cycle. If we draw a smooth connecting line between the samples, the resulting curve looks like a lower frequency. This is called 'aliasing' because one frequency looks like another. Note that the problem of aliasing is that we cannot tell which frequency we have - a high frequency looks like a low one, so we cannot tell the two apart. But sometimes we may have some a priori knowledge of the signal, or be able to make some assumptions about how the signal behaves in between samples, that will allow us to tell unambiguously what we have. Nyquist showed that to distinguish unambiguously between all signal frequency components we must sample faster than twice the frequency of the highest frequency component.

In the diagram, the high frequency signal is sampled twice every cycle. If we draw a smooth connecting line between the samples, the resulting curve looks like the original signal. But if the samples happened to fall at the zero crossings, we would see no signal at all - this is why the sampling theorem demands we sample faster than twice the highest signal frequency. This avoids aliasing. The highest signal frequency allowed for a given sample rate is called the Nyquist frequency. Actually, Nyquist says that we have to sample faster than the signal bandwidth, not the highest frequency. But this leads us into multirate signal processing, which is a more advanced subject.
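Aliasing is easy to demonstrate numerically. In this minimal sketch (the sample rate and frequencies are illustrative choices, not taken from the text), a 900 Hz cosine sampled at 1000 samples per second produces exactly the same sample values as a 100 Hz cosine - once sampled, the two frequencies cannot be told apart:

```python
import math

fs = 1000.0    # sample rate in samples per second (illustrative choice)
f_high = 900.0
f_low = 100.0  # the alias: fs - f_high

# Sample both tones at the same instants t = n/fs
high = [math.cos(2 * math.pi * f_high * n / fs) for n in range(50)]
low = [math.cos(2 * math.pi * f_low * n / fs) for n in range(50)]

# The sampled values are identical: the 900 Hz tone has aliased to 100 Hz
max_diff = max(abs(a - b) for a, b in zip(high, low))
```

Any frequency above half the sample rate folds down in this way, which is why the antialias filter described next is needed.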

Antialiasing
Nyquist showed that to distinguish unambiguously between all signal frequency components we must sample faster than twice the frequency of the highest frequency component. To avoid aliasing, we simply filter out all the high frequency components before sampling.

Note that antialias filters must be analogue - it is too late once you have done the sampling.

This simple brute force method avoids the problem of aliasing. But it does remove information - if the signal had high frequency components, we cannot now know anything about them. Although Nyquist showed that provided we sample faster than twice the highest signal frequency we have all the information needed to reconstruct the signal, the sampling theorem does not say the samples will look like the signal.

The diagram shows a high frequency sine wave that is nevertheless sampled fast enough according to Nyquist's sampling theorem - just more than twice per cycle. When straight lines are drawn between the samples, the signal's frequency is indeed evident - but it looks as though the signal is amplitude modulated. This effect arises because each sample is taken at a slightly earlier part of the cycle. Unlike aliasing, the effect does not change the apparent signal frequency. The answer lies in the fact that the sampling theorem says there is enough information to reconstruct the signal - and the correct reconstruction is not just to draw straight lines between samples. The signal is properly reconstructed from the samples by low pass filtering: the low pass filter should be the same as the original antialias filter.

The reconstruction filter interpolates between the samples to make a smoothly varying analogue signal. In the example, the reconstruction filter interpolates between samples in a 'peaky' way that seems at first sight to be strange. The explanation lies in the shape of the reconstruction filter's impulse response.

The impulse response of the reconstruction filter has a classic 'sin(x)/x' shape. The stimulus fed to this filter is the series of discrete impulses which are the samples. Every time an impulse hits the filter, we get 'ringing' - and it is the superposition of all these peaky rings that reconstructs the proper signal. If the signal contains frequency components that are close to the Nyquist frequency, then the reconstruction filter has to be very sharp indeed. This means it will have a very long impulse response, and so needs a long 'memory' with which to fill in the signal between the samples.
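The sin(x)/x reconstruction can be sketched numerically: each sample contributes a scaled, shifted sin(x)/x 'ring', and summing the rings interpolates the signal between the samples. The tone frequency and number of samples here are arbitrary illustrative choices:

```python
import math

def sinc(t):
    """The sin(pi*t)/(pi*t) impulse response of an ideal reconstruction filter."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

# A band-limited tone at 0.1 of the sample rate, sampled at integer times
f = 0.1
samples = [math.sin(2 * math.pi * f * n) for n in range(1000)]

# Reconstruct the signal halfway between two samples by superposing the rings
t = 500.5
recon = sum(s * sinc(t - n) for n, s in enumerate(samples))
true_value = math.sin(2 * math.pi * f * t)
error = abs(recon - true_value)
```

The small residual error comes from truncating the sum to a finite number of samples - the 'long memory' mentioned above.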

Frequency resolution
We only sample the signal for a certain time.

We cannot see slow changes in the signal if we don't wait long enough. In fact we must sample for long enough to detect not only low frequencies in the signal, but also small differences between frequencies. The length of time for which we are prepared to sample the signal determines our ability to resolve adjacent frequencies - the frequency resolution. We must sample for at least one complete cycle of the lowest frequency we want to resolve.

We can see that we face a forced compromise. We must sample fast to avoid aliasing, and for a long time to achieve good frequency resolution. But sampling fast for a long time means we will have a lot of samples - and lots of samples means lots of computation, for which we generally don't have time. So we will have to compromise between resolving frequency components of the signal, and being able to see high frequencies.

Quantisation
When the signal is converted to digital form, the precision is limited by the number of bits available. The diagram shows an analogue signal, which is then converted to a digital representation - in this case, with 8 bit precision. The smoothly varying analogue signal can only be represented as a 'stepped' waveform, due to the limited precision. Sadly, the errors introduced by digitisation are both non-linear and signal dependent.

Non linear means we cannot calculate their effects using normal maths.

Signal dependent means the errors are coherent and so cannot be reduced by simple means.

This is a common problem in DSP. The errors due to limited precision (i.e. word length) are non linear (hence incalculable) and signal dependent (hence coherent). Both are bad news, and mean that we cannot really calculate how a DSP algorithm will perform in limited precision - the only reliable way is to implement it, and test it against signals of the type expected. The non linearity can also lead to instability - particularly with IIR filters.

The word length of hardware used for DSP processing determines the available precision and dynamic range. Uncertainty in the clock timing leads to errors in the sampled signal.

The diagram shows an analogue signal, which is held on the rising edge of a clock signal. If the clock edge occurs at a different time than expected, the signal will be held at the wrong value. Sadly, the errors introduced by timing error are both non-linear and signal dependent.

A real DSP system suffers from three sources of error due to limited word length in the measurement and processing of the signal:

- Limited precision due to word length when the analogue signal is converted to digital form
- Errors in arithmetic due to limited precision within the processor itself
- Limited precision due to word length when the digital samples are converted back to analogue form

These errors are often called 'quantisation error'. The effects of quantisation error are in fact both non-linear and signal dependent. Non linear means we cannot calculate their effects using normal maths. Signal dependent means that even if we could calculate their effect, we would have to do so separately for every type of signal we expect. A simple way to get an idea of the effects of limited word length is to model each of the sources of quantisation error as if it were a source of random noise.

The model of quantisation as injections of random noise is helpful in gaining an idea of the effects. But it is not actually accurate, especially for systems with feedback like IIR filters. The effect of quantisation error is often similar to an injection of random noise.

The diagram shows the spectrum calculated from a pure tone. The top plot shows the spectrum with high precision (double precision floating point). The bottom plot shows the spectrum when the sine wave is quantised to 16 bits. The effect looks very like low-level random noise. The signal to noise ratio is affected by the number of bits in the data format, and by whether the data is fixed point or floating point.
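The effect of word length on signal to noise ratio can be estimated with a quick sketch. The quantiser here is assumed to round to the nearest level over a full-scale range of -1 to +1, and the bit depth, tone frequency and sample count are illustrative choices. For a full-scale sine wave the result should be roughly 6 dB per bit:

```python
import math

bits = 16
step = 2.0 / (2 ** bits)   # quantisation step for a full-scale range of -1..+1
N = 4096

signal_power = 0.0
error_power = 0.0
for n in range(N):
    x = math.sin(2 * math.pi * 31 * n / N)   # a pure tone
    q = round(x / step) * step               # round to the nearest level
    signal_power += x * x
    error_power += (q - x) ** 2

# Roughly 6.02*bits + 1.76 dB for a full-scale sine (about 98 dB at 16 bits)
snr_db = 10 * math.log10(signal_power / error_power)
```

Note that this treats the error as if it were random noise - which, as the text says, is a useful model but not strictly accurate.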

Summary

A DSP system has three fundamental sources of limitation:

- Loss of information because we only take samples of the signal at intervals
- Loss of information because we only sample the signal for a certain length of time
- Errors due to limited precision (i.e. word length) in data storage and arithmetic

The effects of these limitations are as follows:

- Aliasing is the result of sampling, which means we cannot distinguish between high and low frequencies
- Limited frequency resolution is the result of limited duration of sampling, which means we cannot distinguish between adjacent frequencies
- Quantisation error is the result of limited precision (word length) when converting between analogue and digital forms, when storing data, or when performing arithmetic

Aliasing and frequency resolution are fundamental limitations - they arise from the mathematics and cannot be overcome. They are limitations of any sampled data system, not just digital ones. Quantisation error is an artifact of the imperfect precision, and can be improved upon by using an increased word length. It is a feature peculiar to digital systems. Its effects are non-linear and signal dependent, but can sometimes be acceptably modeled as injections of random noise.

Time domain processing

Correlation


Correlation is a weighted moving average:

One signal provides the weighting function.

The diagram shows how a single point of the correlation function is calculated:

- First, one signal is shifted with respect to the other
- The amount of the shift is the position of the correlation function point to be calculated
- Each element of one signal is multiplied by the corresponding element of the other
- The area under the resulting curve is integrated

Correlation requires a lot of calculations. If one signal is of length M and the other is of length N, then we need (N x M) multiplications to calculate the whole correlation function. Note that really, we want to multiply and then accumulate the result - this is typical of DSP operations and is called a 'multiply/accumulate' operation. It is the reason that DSP processors can do multiplications and additions in parallel.
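A single point of the correlation function is exactly one run of multiply/accumulate operations. The steps above can be sketched as direct correlation (restricted here, for simplicity, to the shifts where the two signals fully overlap):

```python
def correlate(x, y):
    """Direct correlation: for each shift, multiply corresponding elements
    of the two signals and accumulate the products."""
    n, m = len(x), len(y)
    result = []
    for shift in range(n - m + 1):
        acc = 0.0
        for k in range(m):
            acc += x[shift + k] * y[k]   # the multiply/accumulate operation
        result.append(acc)
    return result

values = correlate([1.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0])
```

The nested loop makes the (N x M) multiplication count visible, and the inner statement is precisely the operation DSP processors implement in a single cycle.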

Correlation is a maximum when two signals are similar in shape, and are in phase (or 'unshifted' with respect to each other). Correlation is a measure of the similarity between two signals as a function of the time shift between them.

The diagram shows two similar signals.

When the two signals are similar in shape and unshifted with respect to each other, their product is all positive. This is like constructive interference, where the peaks add and the troughs subtract to emphasise each other. The area under this curve gives the value of the correlation function at point zero, and this is a large value. As one signal is shifted with respect to the other, the signals go out of phase - the peaks no longer coincide, so the product can have negative going parts. This is a bit like destructive interference, where the troughs cancel the peaks. The area under this curve gives the value of the correlation function at the value of the shift. The negative going parts of the curve now cancel some of the positive going parts, so the correlation function is smaller. The largest value of the correlation function shows when the two signals were similar in shape and unshifted with respect to each other (or 'in phase'). The breadth of the correlation function - where it has significant value - shows for how long the signals remain similar.

Correlation functions
The correlation function shows how similar two signals are, and for how long they remain similar when one is shifted with respect to the other. Correlating a signal with itself is called autocorrelation. Different sorts of signal have distinctly different autocorrelation functions. We can use these differences to tell signals apart.

The diagram shows three different types of signal:

- Random noise is defined to be uncorrelated - this means it is only similar to itself with no shift at all. Even a shift of one sample either way means there is no correlation at all, so the correlation function of random noise with itself is a single sharp spike at shift zero.
- Periodic signals go in and out of phase as one is shifted with respect to the other. So they will show strong correlation at any shift where the peaks coincide. The autocorrelation function of a periodic signal is itself a periodic signal, with a period the same as that of the original signal.
- Short signals can only be similar to themselves for small values of shift, so their autocorrelation functions are short.

The three types of signal have easily recognisable autocorrelation functions.

Autocorrelation
Autocorrelation (correlating a signal with itself) can be used to extract a signal from noise. The diagram shows how the signal can be extracted from the noise:

- Random noise has a distinctive 'spike' autocorrelation function
- A sine wave has a periodic autocorrelation function
- So the autocorrelation function of a noisy sine wave is a periodic function with a single spike, which contains all the noise power

The separation of signal from noise using autocorrelation works because the autocorrelation function of the noise is easily distinguished from that of the signal.
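The extraction of a periodic signal from noise can be sketched numerically; the period, length and noise level below are illustrative choices. At zero shift the autocorrelation contains all the noise power, but one full period away the noise has averaged out and only the periodic part survives:

```python
import math, random

random.seed(0)
period = 20
n_samples = 4000

# A sine wave buried in random noise of comparable power
noisy = [math.sin(2 * math.pi * n / period) + random.gauss(0.0, 1.0)
         for n in range(n_samples)]

def autocorr(x, lag):
    """Average product of the signal with a shifted copy of itself."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / (len(x) - lag)

r_zero = autocorr(noisy, 0)            # the 'spike': signal power plus noise power
r_cycle = autocorr(noisy, period)      # one period away: only the sine survives
r_half = autocorr(noisy, period // 2)  # half a cycle out of phase: negative
```

The periodic rise and fall of the autocorrelation away from the spike is what reveals the buried sine wave.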

Cross correlation (correlating a signal with another) can be used to detect and locate a known reference signal in noise.

The diagram shows how the signal can be located within the noise.

A copy of the known reference signal is correlated with the unknown signal. The correlation will be high when the reference is similar to the unknown signal. A large value for correlation shows the degree of confidence that the reference signal is detected. The position of the large value of the correlation indicates when the reference signal occurs.
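Detection and location can be sketched in a few lines. The reference pattern, noise level and burial position below are arbitrary illustrative choices; the correlation peak lands at the position where the reference was hidden:

```python
import random

random.seed(1)
reference = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]  # known reference signal
m = len(reference)

# Unknown signal: noise, with a copy of the reference buried at position 40
unknown = [random.gauss(0.0, 0.3) for _ in range(100)]
true_position = 40
for k in range(m):
    unknown[true_position + k] += reference[k]

# Correlate the reference against every shift; the peak locates the signal
def corr_at(shift):
    return sum(unknown[shift + k] * reference[k] for k in range(m))

found = max(range(100 - m + 1), key=corr_at)
```

This is the idea behind matched filtering: the correlation is largest exactly where the unknown signal lines up with the reference.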

Cross correlation to identify a signal

Cross correlation (correlating a signal with another) can be used to identify a signal by comparison with a library of known reference signals.

The diagram shows how the unknown signal can be identified.

A copy of a known reference signal is correlated with the unknown signal. The correlation will be high if the reference is similar to the unknown signal. The unknown signal is correlated with a number of known reference functions. A large value for correlation shows the degree of similarity to the reference. The largest value for correlation is the most likely match.

Cross correlation is one way in which sonar can identify different types of vessel. Each vessel has a unique sonar 'signature'. The sonar system has a library of pre-recorded echoes from different vessels. An unknown sonar echo is correlated with the library of reference echoes. The largest correlation is the most likely match.

Convolution
Convolution is a weighted moving average with one signal flipped back to front:

The equation is the same as for correlation, except that the second signal (y[k - n]) is flipped back to front.

The diagram shows how a single point of the convolution function is calculated:

- First, one signal is flipped back to front
- Then, one signal is shifted with respect to the other
- The amount of the shift is the position of the convolution function point to be calculated
- Each element of one signal is multiplied by the corresponding element of the other
- The area under the resulting curve is integrated

Convolution requires a lot of calculations. If one signal is of length M and the other is of length N, then we need (N x M) multiplications to calculate the whole convolution function. Note that really, we want to multiply and then accumulate the result - this is typical of DSP operations and is called a 'multiply/accumulate' operation. It is the reason that DSP processors can do multiplications and additions in parallel.

Convolution is used for digital filtering. The reason convolution is preferred to correlation for filtering has to do with how the frequency spectra of the two signals interact. Convolving two signals is equivalent to multiplying the frequency spectra of the two signals together - which is easily understood, and is what we mean by filtering. Correlation is equivalent to multiplying the complex conjugate of the frequency spectrum of one signal by the frequency spectrum of the other. Complex conjugation is not so easily understood, and so convolution is used for digital filtering. Convolving by multiplying frequency spectra is called fast convolution.
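The steps above - flip, shift, multiply, integrate - can be sketched as direct convolution. Indexing one signal backwards realises the flip:

```python
def convolve(x, y):
    """Direct convolution: y is flipped back to front, then shifted,
    multiplied and accumulated - correlation without the flip."""
    n, m = len(x), len(y)
    result = [0.0] * (n + m - 1)
    for shift in range(n + m - 1):
        for k in range(m):
            i = shift - k            # running x backwards realises the flip
            if 0 <= i < n:
                result[shift] += x[i] * y[k]
    return result

values = convolve([1.0, 2.0, 3.0], [1.0, 1.0])
```

As with correlation, the inner statement is a multiply/accumulate, and the full output needs (N x M) multiplications.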

Convolution to smooth a signal

Convolution is a weighted moving average with one signal flipped back to front.

Convolving a signal with a smooth weighting function can be used to smooth a signal:

The diagram shows how a noisy sine wave can be smoothed by convolving with a rectangular smoothing function - this is just a moving average. The smoothing property leads to the use of convolution for digital filtering.
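The smoothing effect is easy to measure. In this sketch (signal, noise level and window length are illustrative choices) a noisy constant level is convolved with a 5-point rectangular weighting function, and the deviation from the true level shrinks:

```python
import random

random.seed(2)
# A constant level of 1.0 corrupted by random noise
noisy = [1.0 + random.gauss(0.0, 0.5) for _ in range(1000)]

# Convolve with a rectangular smoothing function: a 5-point moving average
weights = [0.2] * 5
smoothed = [sum(w * noisy[n - k] for k, w in enumerate(weights))
            for n in range(len(weights) - 1, len(noisy))]

# Mean squared deviation from the true level, before and after smoothing
err_before = sum((v - 1.0) ** 2 for v in noisy) / len(noisy)
err_after = sum((v - 1.0) ** 2 for v in smoothed) / len(smoothed)
```

Averaging 5 independent noise samples cuts the noise power by roughly a factor of 5 - the simplest possible digital low pass filter.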

Convolution and correlation

Correlation is a weighted moving average:

Convolution is a weighted moving average with one signal flipped back to front:

Convolution and correlation are the same except for the flip:

Convolution is used for digital filtering. The reason convolution is preferred to correlation for filtering has to do with how the frequency spectra of the two signals interact. Convolving two signals is equivalent to multiplying the frequency spectra of the two signals together - which is easily understood, and is what we mean by filtering. Correlation is equivalent to multiplying the complex conjugate of the frequency spectrum of one signal by the frequency spectrum of the other. Complex conjugation is not so easily understood, and so convolution is used for digital filtering. If one signal is symmetric, convolution and correlation are identical:

If one signal is symmetric, then flipping it back to front does not change it. So convolution and correlation are the same.

Frequency analysis: Fourier transforms

Jean Baptiste Fourier showed that any signal or waveform could be made up just by adding together a series of pure tones (sine waves) with appropriate amplitude and phase. This is a rather startling theory, if you think about it. It means, for instance, that by simply turning on a number of sine wave generators we could sit back and enjoy a Beethoven symphony. Of course we would have to use a very large number of sine wave generators, and we would have to turn them on at the time of the Big Bang and leave them on until the heat death of the universe. Fourier's theorem assumes we add sine waves of infinite duration.

The diagram shows how a square wave can be made up by adding together pure sine waves at the harmonics of the fundamental frequency. Any signal can be made up by adding together the correct sine waves with appropriate amplitude and phase.
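The square wave synthesis can be sketched directly from the standard Fourier series: odd harmonics with amplitudes falling off as 1/k. The number of harmonics below is an arbitrary choice:

```python
import math

def square_approx(t, n_harmonics):
    """Fourier series of a square wave: odd harmonics with amplitude 1/(2k+1)."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * t) / (2 * k + 1) for k in range(n_harmonics))

# With many harmonics the sum approaches the ideal square wave value of +1
value = square_approx(math.pi / 2, 200)
```

Adding more harmonics sharpens the edges of the approximation, exactly as the diagram suggests.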

The Fourier transform is an equation to calculate the frequency, amplitude and phase of each sine wave needed to make up any given signal.

- The Fourier Transform (FT) is a mathematical formula using integrals
- The Discrete Fourier Transform (DFT) is a discrete numerical equivalent, using sums instead of integrals
- The Fast Fourier Transform (FFT) is just a computationally fast way to calculate the DFT

The Discrete Fourier Transform involves a summation:

X[k] = sum over n = 0 to N-1 of x[n] * e^(-j*2*pi*n*k/N)

where j is the square root of minus one (defined as a number whose sole property is that its square is minus one). Note that the DFT and the FFT involve a lot of multiplying and then accumulating the result - this is typical of DSP operations and is called a 'multiply/accumulate' operation. It is the reason that DSP processors can do multiplications and additions in parallel.
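The summation translates almost line for line into code. This sketch computes the DFT directly (illustrative tone and length; a real system would use an FFT, which gives the same answer faster):

```python
import cmath, math

def dft(x):
    """Direct Discrete Fourier Transform: a multiply/accumulate sum
    over all input samples for each output frequency bin."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

# A cosine with exactly 3 cycles in 16 samples
x = [math.cos(2 * math.pi * 3 * n / 16) for n in range(16)]
mags = [abs(X) for X in dft(x)]
# Energy appears only in bins 3 and 16-3 = 13, each with magnitude N/2 = 8
```

The pair of bins reflects the fact that a real cosine is made of two complex exponentials at plus and minus the tone frequency.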

Frequency spectra
Using the Fourier transform, any signal can be analysed into its frequency components.

The diagram shows a recording of speech, and its analysis into frequency components. With some signals it is easy to see that they are composed of different frequencies: for instance a chord played on the piano is obviously made up of the different pure tones generated by the keys pressed. For other signals the connection to frequency is less obvious: for example a hand clap has a frequency spectrum, but it is less easy to see how the individual frequencies are generated. You can use a piano as an acoustic spectrum analyser to show that a handclap has a frequency spectrum:

- Open the lid of the piano and hold down the 'loud' pedal
- Clap your hands loudly over the piano
- You will hear (and see) the strings vibrate to echo the clap sound
- The strings that vibrate show the frequencies
- The amount of vibration shows the amplitude

Each string of the piano acts as a finely tuned resonator.

Using the Fourier transform, any signal can be analysed into its frequency components. Every signal has a frequency spectrum. The signal defines the spectrum; the spectrum defines the signal.

We can move back and forth between the time domain and the frequency domain without losing information. The above statement is true mathematically, but is quite incorrect in any practical sense - since we will lose information due to errors in the calculation, or due to deliberately missing out some information that we can't measure or can't compute. But the basic idea is a good one when visualising time signals and their frequency spectra.
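The mathematical claim - that no information is lost moving between domains - can be checked numerically with a direct DFT and its inverse (a small sketch; the signal length and values are arbitrary):

```python
import cmath, math, random

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: the same sum with the opposite sign in the exponent,
    scaled by 1/N."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

random.seed(3)
signal = [random.uniform(-1.0, 1.0) for _ in range(32)]
round_trip = idft(dft(signal))
worst = max(abs(a - b) for a, b in zip(signal, round_trip))
```

The residual is only floating point rounding - which is itself a small instance of the practical loss of information the text warns about.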

The diagram shows a number of signals and their frequency spectra. Understanding the relation between time and frequency domains is useful:

- Some signals are easier to visualise in the frequency domain
- Some signals are easier to visualise in the time domain
- Some signals take less information to define in the time domain
- Some signals take less information to define in the frequency domain

For example a sine wave takes a lot of information to define accurately in the time domain: but in the frequency domain we only need three data - the frequency, amplitude and phase.

Convolution
Convolution is a weighted moving average with one signal flipped back to front:

Convolution is the same as multiplying frequency spectra.

Convolution by multiplying frequency spectra can take advantage of the Fast Fourier Transform - which is a computationally efficient algorithm. So this can be faster than convolution in the time domain, and is called Fast Convolution.
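The equivalence can be verified directly: zero-pad both signals to the output length, multiply their spectra bin by bin, and transform back. The signals below are arbitrary, and a plain DFT stands in for the FFT so the sketch stays self-contained (a real implementation would use an FFT for speed):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

def direct_convolve(x, y):
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[i + j] += xv * yv
    return out

x = [1.0, 2.0, 0.0, -1.0, 3.0]
h = [0.5, 0.5, 0.5]

# Fast convolution: zero-pad to the output length, multiply spectra, transform back
L = len(x) + len(h) - 1
X = dft(x + [0.0] * (L - len(x)))
H = dft(h + [0.0] * (L - len(h)))
fast = [v.real for v in idft([a * b for a, b in zip(X, H)])]

direct = direct_convolve(x, h)
worst = max(abs(a - b) for a, b in zip(fast, direct))
```

The zero padding matters: without it, multiplying spectra gives circular convolution, where the end of the signal wraps around onto the beginning.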

The Fourier transform assumes the signal is analysed over all time - an infinite duration.

This means that there can be no concept of time in the frequency domain, and so no concept of a frequency changing with time. Mathematically, frequency and time are orthogonal - you cannot mix one with the other. But we can easily understand that some signals do have frequency components that change with time. A piano tune, for example, consists of different notes played at different times; and speech can be heard as having a pitch that rises and falls over time. The Short Time Fourier Transform (STFT) tries to evaluate the way frequency content changes with time:

The diagram shows how the Short Time Fourier Transform works:

- The signal is chopped up into short pieces
- The Fourier transform is taken of each piece

Each frequency spectrum shows the frequency content during a short time, and so the successive spectra show the evolution of frequency content with time. The spectra can be plotted one behind the other in a 'waterfall' diagram as shown. It is important to realise that the Short Time Fourier Transform involves accepting a contradiction in terms, because frequency only has a meaning if we use infinitely long sine waves - and so we cannot strictly apply Fourier Transforms to short pieces of a signal.
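The chop-and-transform procedure can be sketched in a few lines. The frame length and the two tone frequencies are illustrative choices; the dominant frequency bin moves from one frame to the next, tracking the change in the signal:

```python
import cmath, math

def dft_mags(x):
    """Magnitudes of the DFT bins of one short piece of signal."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)))
            for k in range(N)]

frame_len = 32
# A signal whose frequency changes: 4 cycles per frame, then 9 cycles per frame
signal = ([math.cos(2 * math.pi * 4 * n / frame_len) for n in range(frame_len)] +
          [math.cos(2 * math.pi * 9 * n / frame_len) for n in range(frame_len)])

# Chop the signal into short pieces and transform each piece
spectra = [dft_mags(signal[i:i + frame_len])
           for i in range(0, len(signal), frame_len)]

# The dominant bin (searching the lower half-spectrum) moves from 4 to 9
dominant = [max(range(frame_len // 2), key=lambda s=s: s[k] if False else 0) if False else
            max(range(frame_len // 2), key=s.__getitem__) for s in spectra]
```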

Short signals
The Fourier Transform works on signals of infinite duration.

But if we only measure the signal for a short time, we cannot know what happened to the signal before and after we measured it. The Fourier Transform has to make an assumption about what happened to the signal before and after we measured it. The Fourier Transform assumes that any signal can be made by adding a series of sine waves of infinite duration. Sine waves are periodic signals. So the Fourier Transform works as if the data, too, were periodic for all time.

Short signals
If we only measure the signal for a short time, the Fourier Transform works as if the data were periodic for all time. Sometimes this assumption can be correct:

The diagram shows what happens if we only measure a signal for a short time: the Fourier Transform works as if the data were periodic for all time. In the case chosen, it happens that the signal is periodic - and that an integral number of cycles fits into the total duration of the measurement. This means that when the Fourier Transform assumes the signal repeats, the end of one signal segment connects smoothly with the beginning of the next - and the assumed signal happens to be exactly the same as the actual signal.

Frequency leakage
There is a direct relation between a signal's duration in time and the width of its frequency spectrum:

- Short signals have broad frequency spectra
- Long signals have narrow frequency spectra

If we only measure the signal for a short time, the Fourier Transform works as if the data were periodic for all time.

If the signal is periodic, two cases arise:

- If an integral number of cycles fits into the total duration of the measurement, then when the Fourier Transform assumes the signal repeats, the end of one signal segment connects smoothly with the beginning of the next - and the assumed signal happens to be exactly the same as the actual signal.
- If not quite an integral number of cycles fits into the total duration of the measurement, then when the Fourier Transform assumes the signal repeats, the end of one signal segment does not connect smoothly with the beginning of the next - the assumed signal is similar to the actual signal, but has little 'glitches' at regular intervals.

There is a direct relation between a signal's duration in time and the width of its frequency spectrum: short signals have broad frequency spectra, and long signals have narrow frequency spectra.

The 'glitches' are short signals. So they have a broad frequency spectrum. And this broadening is superimposed on the frequency spectrum of the actual signal:

If the period exactly fits the measurement time, the frequency spectrum is correct.

If the period does not match the measurement time, the frequency spectrum is incorrect - it is broadened.

This broadening of the frequency spectrum determines the frequency resolution - the ability to resolve (that is, to distinguish between) two adjacent frequency components. Only the one happy circumstance where the signal is such that an integral number of cycles exactly fits into the measurement time gives the expected frequency spectrum. In all other cases the frequency spectrum is broadened by the 'glitches' at the ends. Matters are made worse because the size of the glitch depends on when the first measurement occurred in the cycle - so the broadening will change if the measurement is repeated. For example a sine wave 'should' have a frequency spectrum which consists of one single line. But in practice, if measured say by a spectrum analyser, the frequency spectrum will be a broad line - with the sides flapping up and down like Batman's cloak. When we see a perfect single line spectrum - for example in the charts sometimes provided with analogue to digital converter chips - this has in fact been obtained by tuning the signal frequency carefully so that the period exactly fits the measurement time and the frequency spectrum is the best obtainable.
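The two cases can be compared numerically. In this sketch (length and tone frequencies are illustrative choices) a tone with an integral number of cycles produces a clean pair of spectral lines, while a tone with 8.5 cycles smears its energy across many bins:

```python
import cmath, math

def dft_mags(x):
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)))
            for k in range(N)]

N = 64
# Integral number of cycles: the period exactly fits the measurement time
exact = dft_mags([math.sin(2 * math.pi * 8 * n / N) for n in range(N)])
# 8.5 cycles: the ends no longer join smoothly when the data repeats
leaky = dft_mags([math.sin(2 * math.pi * 8.5 * n / N) for n in range(N)])

# Count bins carrying significant energy: a single pair of lines vs a broad smear
exact_bins = sum(1 for m in exact if m > 1.0)
leaky_bins = sum(1 for m in leaky if m > 1.0)
```

The 'flapping sides' of the broadened line are the slowly decaying skirts of energy either side of the true frequency.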

Windowing
If we only measure the signal for a short time, the Fourier Transform works as if the data were periodic for all time.

If not quite an integral number of cycles fits into the total duration of the measurement, then when the Fourier Transform assumes the signal repeats, the end of one signal segment does not connect smoothly with the beginning of the next - the assumed signal is similar to the actual signal, but has little 'glitches' at regular intervals. The glitches can be reduced by shaping the signal so that its ends match more smoothly. Since we can't assume anything about the signal, we need a way to make any signal's ends connect smoothly to each other when repeated. One way to do this is to multiply the signal by a 'window' function:

The easiest way to make sure the ends of a signal match is to force them to be zero at the ends: that way, their value is necessarily the same.

-ctually, we also want to a,e sure that the signal is going in the right direction at the ends to atch up s oothly. $he easiest way to do this is to a,e sure neither end is going anywhere # that is, the slope of the signal at its ends should also be 1ero. Put athe atically, a window function has the property that its value and all its derivatives are 1ero at the ends. 'ultiplying by a window function (called .windowing.) suppresses glitches and so avoids the broadening of the fre"uency spectru caused by the glitches.
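The endpoint property is easy to check numerically - a sketch using NumPy's built-in Hann window (any tapered window would do):

```python
import numpy as np

n = 64
w = np.hanning(n)        # Hann window: 0.5 - 0.5*cos(2*pi*k/(n-1))

# The window's value is exactly zero at both ends...
print(w[0], w[-1])       # 0.0 0.0

# ...and its slope is far smaller at the ends than in the middle,
# so a windowed signal starts and finishes 'going nowhere'
slope = np.diff(w)
print(abs(slope[0]) < 0.1 * abs(slope).max())   # True
```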

Windowing can narrow the spectrum and make it closer to what was expected.

Multiplying by a window function (called 'windowing') suppresses glitches and so avoids the broadening of the frequency spectrum caused by the glitches. But it is important to remember that windowing is really a distortion of the original signal:
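The suppression of the broadening can be demonstrated numerically - a sketch in which the 8.5-cycle sine deliberately does not fit the measurement time:

```python
import numpy as np

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 8.5 * t / n)     # ends do not match when repeated

raw = np.abs(np.fft.rfft(x))                  # broad skirts from the glitches
win = np.abs(np.fft.rfft(x * np.hanning(n)))  # glitches suppressed first

# Compare the spectrum level well away from the peak with the peak level:
print(raw[40] / raw.max())     # noticeable leakage
print(win[40] / win.max())     # orders of magnitude lower
```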

The diagram shows the result of applying a window function without proper thought.

The transient response really does have a broad frequency spectrum - but windowing forces it to look more as if it had a narrow frequency spectrum instead. Worse than this, the window function has attenuated the signal at the point where it was largest - so has suppressed a large part of the signal power. This means the overall signal to noise ratio has been reduced. Applying a window function narrows the frequency spectrum at the expense of distortion and signal to noise ratio. Many window functions have been designed to trade off frequency resolution against signal to noise and distortion. Choice among them depends on knowledge of the signal and what you want to do with it.

Wavelets
Fourier's theorem assumes we add sine waves of infinite duration.

As a consequence, the Fourier Transform is good at representing signals which are long and periodic. But the Fourier Transform has problems when used with signals which are short, and not periodic.

Other transforms are possible - fitting the data with different sets of functions than sine waves. The trick is to find a transform whose base set of functions look like the signal with which we are dealing.

The diagram shows a signal that is not a long, periodic signal but rather a periodic signal with decay over a short time. This is not very well matched by the Fourier Transform's infinite sine waves. But it might be better matched by a different set of functions - say, decaying sine waves. Such functions are called 'wavelets' and can be used in the 'wavelet transform'. Note that the wavelet transform cannot really be used to measure frequency, because frequency only has meaning when applied to infinite sine waves. But, as with the Short Time Fourier Transform, we are always willing to stretch a point in order to gain a useful tool. The Fourier Transform's real popularity derives not from any particular mathematical merit, but from the simple fact that someone (Cooley and Tukey) managed to write an efficient program to implement it - called the Fast Fourier Transform (FFT). And now there are lots of FFT programs around for all sorts of processors, so it is likely the FFT will remain the most popular method for many years because of its excellent support.

Filtering: filtering as a frequency selective process


Filtering is a process of selecting, or suppressing, certain frequency components of a signal. A coffee filter allows small particles to pass while trapping the larger grains. A digital filter does a similar thing, but with more subtlety. The digital filter allows certain frequency components of the signal to pass: in this it is similar to the coffee filter, with frequency standing in for particle size. But the digital filter can be more subtle than simply trapping or allowing through: it can attenuate, or suppress, each frequency component by a desired amount. This allows a digital filter to shape the frequency spectrum of the signal. Filtering is often, though not always, done to suppress noise. It depends on the signal's frequency spectrum being different from that of the noise:

The diagram shows how a noisy sine wave viewed as a time domain signal cannot be clearly distinguished from the noise. But when viewed as a frequency spectrum, the sine wave shows as a single clear peak while the noise power is spread over a broad frequency spectrum. By selecting only frequency components that are present in the signal, the noise can be selectively suppressed:

The diagram shows how a noisy sine wave may be 'cleaned up' by selecting only a range of frequencies that include signal frequency components but exclude much of the noise:

The noisy sine wave (shown as a time signal) contains narrow band signal plus broad band noise
The frequency spectrum is modified by suppressing a range outside the signal's frequency components
The resulting signal (shown in the time domain again) looks much cleaner

Digital filter specifications


Digital filters can be more subtly specified than analogue filters, and so are specified in a different way:

Whereas analogue filters are specified in terms of their '3 dB point' and their 'rolloff', digital filters are specified in terms of desired attenuation, and permitted deviations from the desired value in their frequency response:

passband - the band of frequency components that are allowed to pass
stopband - the band of frequency components that are suppressed
passband ripple - the maximum amount by which attenuation in the passband may deviate from nominal gain
stopband attenuation - the minimum amount by which frequency components in the stopband are attenuated

The passband need not necessarily extend to the 3 dB point: for example, if passband ripple is specified as 0.1 dB, then the passband only extends to the point at which attenuation has increased to 0.1 dB. Between the passband and the stopband lies a transition band where the filter's shape may be unspecified.

Note that the stopband attenuation is formally specified as the attenuation to the top of the first side lobe of the filter's frequency response. Digital filters can also have an 'arbitrary response': meaning, the attenuation is specified at certain chosen frequencies, or for certain frequency bands. Digital filters are also characterised by their response to an impulse: a signal consisting of a single value followed by zeroes:
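These specification figures can be read off a real design. A sketch using SciPy's FIR design helpers (assumed available; the tap count, cutoff and band edges are arbitrary illustrative choices):

```python
import numpy as np
from scipy.signal import firwin, freqz

# Design a 101-tap low-pass FIR filter, cutoff at 0.2 of the sample rate,
# using firwin's default (Hamming) window
taps = firwin(101, 0.2, fs=1.0)

# Evaluate the frequency response on a dense grid
f, h = freqz(taps, worN=4096, fs=1.0)
gain_db = 20 * np.log10(np.maximum(np.abs(h), 1e-12))

# Measure the specification figures from the response:
passband = gain_db[f < 0.15]               # well inside the passband
stopband = gain_db[f > 0.25]               # well inside the stopband
ripple = passband.max() - passband.min()   # passband ripple, dB
atten = -stopband.max()                    # stopband attenuation, dB
print(round(ripple, 3), round(atten, 1))
```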

The impulse response is an indication of how long the filter takes to settle into a steady state: it is also an indication of the filter's stability - an impulse response that continues oscillating in the long term indicates the filter may be prone to instability. The impulse response defines the filter just as well as does the frequency response.
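An impulse response is easy to generate numerically - a sketch using SciPy's `lfilter` with a simple recursive filter (the coefficient values are arbitrary):

```python
import numpy as np
from scipy.signal import lfilter

# An impulse: a single 1 followed by zeroes
impulse = np.zeros(32)
impulse[0] = 1.0

# A simple recursive (feedback) filter: y[n] = x[n] + 0.5*y[n-1]
b, a = [1.0], [1.0, -0.5]
h = lfilter(b, a, impulse)

# The impulse response decays steadily towards zero: the filter settles
print(h[:5])     # 1, 0.5, 0.25, 0.125, 0.0625
```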

Filtering in the frequency domain


Filtering can be done directly in the frequency domain, by operating on the signal's frequency spectrum:

The diagram shows how a noisy sine wave can be cleaned up by operating directly upon its frequency spectrum to select only a range of frequencies that include signal frequency components but exclude much of the noise:

The noisy sine wave (shown as a time signal) contains narrow band signal plus broad band noise
The frequency spectrum is calculated
The frequency spectrum is modified by suppressing a range outside the signal's frequency components
The time domain signal is calculated from the frequency spectrum
The resulting signal (shown in the time domain again) looks much cleaner

Filtering in the frequency domain is efficient, because every calculated sample of the filtered signal takes account of all the input samples. Filtering in the frequency domain is sometimes called 'acausal' filtering because (at first sight) it violates the laws of cause and effect.

Because the frequency spectrum contains information about the whole of the signal - for all time values - samples early in the output take account of input values that are late in the signal, and so can be thought of as still to happen. The frequency domain filter 'looks ahead' to see what the signal is going to do, and so violates the laws of cause and effect. Of course this is nonsense - all it means is we delayed a little until the whole signal had been received before starting the filter calculation - so filtering directly in the frequency domain is perfectly permissible and in fact often the best method. It is often used in image processing. There are good reasons why we might not be able to filter in the frequency domain:

We might not be able to afford to wait for future samples - often, we need to deliver the next output as quickly as possible, usually before the next input is received
We might not have enough computational power to calculate the Fourier transform
We might have to calculate on a continuous stream of samples without the luxury of being able to chop the signal into convenient lumps for the Fourier transform
We might not be able to join the edges of the signals smoothly after transforming back from the frequency domain

None of the above reasons should make us ignore the possibility of frequency domain filtering, which is very often the best method. It is often used in image processing, or certain types of experiment where the data necessarily comes in bursts, such as NMR or infrared spectroscopy.
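The calculate/modify/invert sequence can be sketched in a few lines of NumPy (the bin numbers and noise level are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)

# Narrow band signal (a sine at bin 50) plus broad band noise
clean = np.sin(2 * np.pi * 50 * t / n)
noisy = clean + rng.normal(0, 1.0, n)

# Calculate the frequency spectrum
spectrum = np.fft.rfft(noisy)

# Modify the spectrum: suppress everything outside the signal's band
mask = np.zeros_like(spectrum)
mask[45:56] = 1.0
spectrum *= mask

# Calculate the time domain signal from the modified spectrum
filtered = np.fft.irfft(spectrum)

# The filtered signal is far closer to the clean sine than the noisy input
print(np.std(noisy - clean), np.std(filtered - clean))
```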

Digital filter equation

Output from a digital filter is made up from previous inputs and previous outputs, using the operation of convolution:

Two convolutions are involved: one with the previous inputs, and one with the previous outputs. In each case the convolving function is called the filter coefficients. The filter can be drawn as a block diagram:

The filter diagram can show what hardware elements will be required when implementing the filter:

The left hand side of the diagram shows the direct path, involving previous inputs: the right hand side shows the feedback path, operating upon previous outputs.
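The two convolutions can be sketched directly in Python (pure NumPy; the coefficient values are arbitrary, and the sign convention here subtracts the feedback terms):

```python
import numpy as np

def difference_equation(x, a, b):
    """Direct implementation of y[n] = sum(a[k]*x[n-k]) - sum(b[k]*y[n-k]).

    a: coefficients convolved with previous inputs (direct path)
    b: coefficients convolved with previous outputs (feedback path);
       b[0] is unused here for simplicity.
    """
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(a)):          # convolution with previous inputs
            if n - k >= 0:
                y[n] += a[k] * x[n - k]
        for k in range(1, len(b)):       # convolution with previous outputs
            if n - k >= 0:
                y[n] -= b[k] * y[n - k]
    return y

# A simple smoothing filter: average of current and previous input,
# plus a little feedback from the previous output
x = np.zeros(8)
x[0] = 1.0                               # an impulse
y = difference_equation(x, a=[0.5, 0.5], b=[1.0, -0.5])
print(y[:4])    # 0.5, 0.75, 0.375, 0.1875
```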

Filter frequency response


Since filtering is a frequency selective process, the important thing about a digital filter is its frequency response. The filter's frequency response can be calculated from its filter equation:

Where j is the square root of minus one (defined as a number whose sole property is that its square is minus one). The frequency response H(f) is a continuous function, even though the filter equation is a discrete summation. Whilst it is nice to be able to calculate the frequency response given the filter coefficients, when designing a digital filter we want to do the inverse operation: that is, to calculate the filter coefficients having first defined the desired frequency response. So we are faced with an inverse problem. Sadly, there is no general inverse solution to the frequency response equation. To make matters worse, we want to impose an additional constraint on acceptable solutions. Usually, we are designing digital filters with the idea that they will be implemented on some piece of hardware. This means we usually want to design a filter that meets the requirement but which requires the least possible amount of computation: that is, using the smallest number of coefficients. So we are faced with an insoluble inverse problem, on which we wish to impose additional constraints. This is why digital filter design is more an art than a science: the art of finding an acceptable compromise between conflicting constraints. If we have a powerful computer and time to take a coffee break while the filter calculates, the small number of coefficients may not be important - but this is a pretty sloppy way to work and would be more of an academic exercise than a piece of engineering.
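The forward direction (coefficients to frequency response) really is easy - a sketch that evaluates the response directly from the coefficients, using one common sign convention for the feedback terms:

```python
import numpy as np

def frequency_response(a, b, f):
    """Evaluate H(f) = sum_k a[k] e^{-2 pi j k f} / (1 + sum_k b[k] e^{-2 pi j k f})
    at normalized frequencies f (cycles per sample), for direct-path
    coefficients a and feedback coefficients b (b[0] ignored).
    """
    f = np.asarray(f, dtype=float)
    num = sum(a[k] * np.exp(-2j * np.pi * k * f) for k in range(len(a)))
    den = 1 + sum(b[k] * np.exp(-2j * np.pi * k * f) for k in range(1, len(b)))
    return num / den

# A two-point averager: a simple low-pass filter
f = np.linspace(0, 0.5, 6)
h = frequency_response(a=[0.5, 0.5], b=[1.0], f=f)
print(np.abs(h))   # falls from 1 at DC to 0 at the Nyquist frequency
```

Note that H(f) is defined for any f, confirming that the response is a continuous function even though the sum is discrete.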

4(7 filters
It is much easier to approach the problem of calculating filter coefficients if we simplify the filter equation so that we only have to deal with previous inputs (that is, we exclude the possibility of feedback). The filter equation is then simplified:

If such a filter is subjected to an impulse (a signal consisting of one value followed by zeroes) then its output must necessarily become zero after the impulse has run through the summation. So the impulse response of such a filter must necessarily be finite in duration. Such a filter is called a Finite Impulse Response filter or FIR filter.
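A quick numerical check (NumPy; the coefficients are arbitrary): the impulse response of an FIR filter is just its coefficient set, and settles to exactly zero once the impulse has run through.

```python
import numpy as np

coeffs = np.array([0.25, 0.5, 0.25])     # an arbitrary 3-tap FIR filter

impulse = np.zeros(8)
impulse[0] = 1.0
response = np.convolve(impulse, coeffs)[:8]
print(response)      # 0.25, 0.5, 0.25, then zeroes for ever: a finite response
```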

The filter's frequency response is also simplified, because all the bottom half goes away:

It so happens that this frequency response is just the Fourier transform of the filter coefficients. The inverse solution to a Fourier transform is well known: it is simply the inverse Fourier transform.

So the coefficients for an FIR filter can be calculated simply by taking the inverse Fourier transform of the desired frequency response. Here is a recipe for calculating FIR filter coefficients:

Decide upon the desired frequency response
Calculate the inverse Fourier transform
Use the result as the filter coefficients

BUT...

FIR filter design by the window method

So the filter coefficients for an FIR filter can be calculated simply by taking the inverse Fourier transform of the desired frequency response. BUT...

The inverse Fourier transform has to take samples of the continuous desired frequency response
To define a sharp filter needs closely spaced frequency samples - so a lot of them
So the inverse Fourier transform will give us a lot of filter coefficients
But we don't want a lot of filter coefficients

We can do a better job by noting that:

The filter coefficients for an FIR filter are also the impulse response of the filter
The impulse response of an FIR filter dies away to zero
So many of the filter coefficients for an FIR filter are small
And perhaps we can throw away these small values as being less important

Here is a better recipe for calculating FIR filter coefficients based on throwing away the small ones:

Pretend we don't mind lots of filter coefficients
Specify the desired frequency response using lots of samples
Calculate the inverse Fourier transform
This gives us a lot of filter coefficients
So truncate the filter coefficients to give us less
Then calculate the Fourier transform of the truncated set of coefficients to see if it still matches our requirement

BUT...

FIR filter design by the window method

FIR filter coefficients can be calculated by taking the inverse Fourier transform of the desired frequency response and throwing away the small values:

Pretend we don't mind lots of filter coefficients
Specify the desired frequency response using lots of samples
Calculate the inverse Fourier transform
This gives us a lot of filter coefficients
So truncate the filter coefficients to give us less
Then calculate the Fourier transform of the truncated set of coefficients to see if it still matches our requirement

BUT... Truncating the filter coefficients means we have a truncated signal. And a truncated signal has a broad frequency spectrum:

So truncating the filter coefficients means the filter's frequency response can only be defined coarsely. Luckily, we already know a way to sharpen up the frequency spectrum of a truncated signal, by applying a window function. So after truncation, we can apply a window function to sharpen up the filter's frequency response:

So here is an even better recipe for calculating FIR filter coefficients:

Pretend we don't mind lots of filter coefficients
Specify the desired frequency response using lots of samples
Calculate the inverse Fourier transform
This gives us a lot of filter coefficients
So truncate the filter coefficients to give us less
Apply a window function to sharpen up the filter's frequency response
Then calculate the Fourier transform of the truncated set of coefficients to see if it still matches our requirement
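The recipe can be followed step by step in NumPy - a sketch, where the 512-sample specification, 0.2 cutoff, 31-tap truncation and Hann window are all arbitrary illustrative choices:

```python
import numpy as np

# Specify the desired response (ideal low-pass, cutoff 0.2) with many samples
n_big = 512
freqs = np.fft.rfftfreq(n_big)            # 0 .. 0.5 cycles/sample
desired = (freqs <= 0.2).astype(float)    # ideal 'brick wall' response

# Inverse Fourier transform gives lots of filter coefficients
full = np.fft.irfft(desired)              # 512 coefficients

# Centre the impulse response, then truncate to 31 coefficients
centred = np.fft.fftshift(full)
mid = n_big // 2
taps = centred[mid - 15: mid + 16]

# Apply a window function to sharpen up the response
taps = taps * np.hanning(len(taps))

# Check the truncated, windowed set against the requirement
f = np.fft.rfftfreq(1024)
h = np.abs(np.fft.rfft(taps, 1024))
print(h[f < 0.1].min(), h[f > 0.35].max())  # passband near 1, stopband small
```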

This is called the window method of FIR filter design. BUT...

FIR coefficients can be calculated using the window method:

Pretend we don't mind lots of filter coefficients
Specify the desired frequency response using lots of samples
Calculate the inverse Fourier transform
This gives us a lot of filter coefficients
So truncate the filter coefficients to give us less
Apply a window function to sharpen up the filter's frequency response
Then calculate the Fourier transform of the truncated set of coefficients to see if it still matches our requirement

BUT... Most window functions have a fixed attenuation to the top of their first side lobe:

No matter how many filter coefficients you throw at it, you cannot improve on a fixed window's attenuation. This means that the art of FIR filter design by the window method lies in an appropriate choice of window function:

For example, if you need attenuation of 20 dB or less, then a rectangle window is acceptable. If you need 43 dB you are forced to choose the Hanning window, and so on. Sadly, the better window functions need more filter coefficients before their shape can be adequately defined. So if you need only 25 dB of attenuation you should choose a triangle window function which will give you this attenuation: the Hamming window, for example, would give you more attenuation but require more filter coefficients to be adequately defined - and so would be wasteful of computer power. The art of FIR filter design by the window method lies in choosing the window function which meets your requirement with the minimum number of filter coefficients. You may notice that if you want an attenuation of 30 dB you are in trouble: the triangle window is not good enough but the Hanning window is too good (and so uses more coefficients than you need). The Kaiser window function is unique in that its shape is variable. A variable parameter defines the shape, so the Kaiser window is unique in being able to match precisely the attenuation you require without over performing.
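SciPy packages this Kaiser trade-off directly - a sketch (assuming SciPy; the 30 dB target, transition width and cutoff are arbitrary illustrative choices):

```python
import numpy as np
from scipy.signal import kaiserord, firwin, freqz

# Required stopband attenuation (dB) and transition width (as a fraction
# of the Nyquist frequency)
atten_db = 30.0
width = 0.1

# kaiserord returns the number of taps and the Kaiser shape parameter beta
# needed to just meet the attenuation - no more, no less
numtaps, beta = kaiserord(atten_db, width)
taps = firwin(numtaps, 0.4, window=('kaiser', beta))  # cutoff 0.4 x Nyquist

f, h = freqz(taps, worN=2048)
gain_db = 20 * np.log10(np.maximum(np.abs(h), 1e-12))
stop = gain_db[f / np.pi > 0.4 + width]   # beyond the transition band
print(numtaps, round(-stop.max(), 1))     # attenuation close to the 30 dB asked for
```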

FIR filter design by the equiripple method

FIR filter coefficients can be calculated using the window method.

But the window method does not correspond to any known form of optimisation. In fact it can be shown that the window method is not optimal - by which we mean, it does not produce the lowest possible number of filter coefficients that just meets the requirement.

The art of FIR filter design by the window method lies in choosing the window function which meets your requirement with the minimum number of filter coefficients. If the window method design is not good enough we have two choices:

Use another window function and try again
Do something clever

The Remez Exchange algorithm is something clever. It uses a mathematical optimisation method. The following explanation is not mathematically correct, but since we are trying to get an idea of what is going on, and not trying to duplicate the thinking of geniuses, it is worth going through anyway. Using the window method to design a filter we might proceed manually as follows:

Choose a window function that we think will do
Calculate the filter coefficients
Check the actual filter's frequency response against the design goal
If it over performs, reduce the number of filter coefficients or relax the window function design
Try again until we find the filter with the lowest number of filter coefficients possible

In a way, this is what the Remez Exchange algorithm does automatically. It iterates between the filter coefficients and the actual frequency response until it finds the filter that just meets the specification with the lowest possible number of filter coefficients. Actually, the Remez Exchange algorithm never really calculates the frequency response: but it does keep comparing the actual with the design goal. Remez was a Russian. Two Americans - Parks and McClellan - wrote a FORTRAN program to implement the Remez algorithm. So this type of filter design is often called a Parks McClellan filter design. The Remez/Parks McClellan method produces a filter which just meets the specification without over performing. Many of the window method designs actually perform better as you move further away from the passband: this is wasted performance, and means they are using more filter coefficients than they need. Similarly, many of the window method designs actually perform better than the specification within the passband: this is also wasted performance, and means they are using more filter coefficients than they need.

The Remez/Parks McClellan method performs just as well as the specification but no better: one might say it produces the worst possible design that just meets the specification at the lowest possible cost - almost a definition of practical engineering. So Remez/Parks McClellan designs have equal ripple - up to the specification but no more - in both passband and stopband. This is why they are often called equiripple designs.
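SciPy's `remez` function implements the Parks McClellan design - a sketch (the tap count and band edges are arbitrary illustrative choices):

```python
import numpy as np
from scipy.signal import remez, freqz

# Parks-McClellan (Remez exchange) design: a 45-tap low-pass filter with
# passband 0-0.18 and stopband 0.24-0.5 (normalized frequencies, fs = 1)
taps = remez(45, [0, 0.18, 0.24, 0.5], [1, 0], fs=1.0)

f, h = freqz(taps, worN=2048, fs=1.0)
gain_db = 20 * np.log10(np.maximum(np.abs(h), 1e-12))

# Equal ripple: the stopband error stays at a roughly constant level,
# rather than improving further away from the passband
stop = gain_db[f > 0.24]
print(round(stop.max(), 1))    # the stopband attenuation actually achieved
```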

The equiripple design produces the most efficient filters - that is, filters that just meet the specification with the least number of coefficients. But there are reasons why they might not be used in all cases:

A particular filter shape may be desired, hence a choice of a particular window function
Equiripple is very time consuming - a design that takes a few seconds to complete using the window method can easily take ten or twenty minutes with the equiripple method
The window method is very simple and easy to include in a program - for example, where one had to calculate a new filter according to some dynamically changing parameters
There is no guarantee that the Remez Exchange algorithm will converge - it may converge to a false result (hence equiripple designs should always be checked): or it may not converge ever (resulting in hung computers, divide by zero errors and all sorts of other horrors)

IIR filters: Digital filter equation


Output from a digital filter is made up from previous inputs and previous outputs, using the operation of convolution:

Two convolutions are involved: one with the previous inputs, and one with the previous outputs. In each case the convolving function is called the filter coefficients. If such a filter is subjected to an impulse (a signal consisting of one value followed by zeroes) then its output need not necessarily become zero after the impulse has run through the summation. So the impulse response of such a filter can be infinite in duration. Such a filter is called an Infinite Impulse Response filter or IIR filter.

Note that the impulse response need not necessarily be infinite: if it were, the filter would be unstable. In fact for most practical filters, the impulse response will die away to a negligibly small level. One might argue that mathematically the response can go on for ever, getting smaller and smaller: but in a digital world once a level gets below one bit it might as well be zero. The Infinite Impulse Response refers to the ability of the filter to have an infinite impulse response and does not imply that it necessarily will have one: it serves as a warning that this type of filter is prone to feedback and instability. The filter can be drawn as a block diagram:
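The 'below one bit' point can be made concrete (pure NumPy; the 0.9 feedback coefficient and 16-bit word length are arbitrary illustrative choices):

```python
import numpy as np

# IIR filter y[n] = x[n] + 0.9*y[n-1]: the impulse response 0.9**n
# mathematically never reaches zero, but soon drops below one bit
n = np.arange(400)
h = 0.9 ** n

bit_16 = 2.0 ** -16          # one bit of a 16-bit converter
settled = np.argmax(h < bit_16)
print(settled)               # 106: effectively zero after ~106 samples
```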

The filter diagram can show what hardware elements will be required when implementing the filter:

The left hand side of the diagram shows the direct path, involving previous inputs: the right hand side shows the feedback path, operating upon previous outputs.

Filter frequency response


Since filtering is a frequency selective process, the important thing about a digital filter is its frequency response. The filter's frequency response can be calculated from its filter equation:

Where j is the square root of minus one (defined as a number whose sole property is that its square is minus one). The frequency response H(f) is a continuous function, even though the filter equation is a discrete summation. Whilst it is nice to be able to calculate the frequency response given the filter coefficients, when designing a digital filter we want to do the inverse operation: that is, to calculate the filter coefficients having first defined the desired frequency response. So we are faced with an inverse problem. Sadly, there is no general inverse solution to the frequency response equation. To make matters worse, we want to impose an additional constraint on acceptable solutions. Usually, we are designing digital filters with the idea that they will be implemented on some piece of hardware. This means we usually want to design a filter that meets the requirement but which requires the least possible amount of computation: that is, using the smallest number of coefficients. So we are faced with an insoluble inverse problem, on which we wish to impose additional constraints. This is why digital filter design is more an art than a science: the art of finding an acceptable compromise between conflicting constraints. If we have a powerful computer and time to take a coffee break while the filter calculates, the small number of coefficients may not be important - but this is a pretty sloppy way to work and would be more of an academic exercise than a piece of engineering.

The z transform

The equation for the filter's frequency response can be simplified by substituting a new variable, z:

Note that z is a complex number.

Complex numbers can be drawn using an Argand diagram. This is a plot where the horizontal axis represents the real part, and the vertical axis the imaginary part, of the number.

The complex variable z is shown as a vector on the Argand diagram. The z transform is defined as a sum of signal values x[n] multiplied by powers of z:

Which has the curious property of letting us generate an earlier signal value from the present one, because the z transform of x[n-1] is just the z transform of x[n] multiplied by (1/z):

So the z transform of the last signal value can be obtained by multiplying the z transform of the current value by (1/z). This is why, in the filter diagram, the delay elements are represented formally using the 1/z notation.
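In the usual notation (reconstructed here, assuming the one-sided form used for causal signals), the definition and the delay property read:

```latex
X(z) = \sum_{n=0}^{\infty} x[n]\, z^{-n},
\qquad
\mathcal{Z}\{x[n-1]\} = \sum_{n=0}^{\infty} x[n-1]\, z^{-n} = z^{-1} X(z)
\quad \text{(taking } x[-1] = 0\text{)}
```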

The meaning of z

z is a complex number:

When drawn on the Argand diagram, z has the curious property that it can only have a magnitude of 1:

So z, which is the variable used in our frequency response, traces a circle of radius 1 on the Argand diagram. This is called the unit circle. The map of values in the z plane is called the transfer function H(z).

The frequency response is the transfer function H(z) evaluated around the unit circle on the Argand diagram of z:

Note that in the sampled data z plane, frequency response maps onto a circle - which helps to visualise the effect of aliasing. z is a complex number:

When drawn on the Argand diagram, z has the curious property that it can only have a magnitude of 1:

So z, which is the variable used in our frequency response, traces a unit circle on the Argand diagram. At first sight, z can have no value off the unit circle. But if we use a mathematical fiction for a moment, we can imagine that the frequency f could itself be a complex number:

In which case, the j from the imaginary frequency component can cancel the j in the z term:

And the imaginary component of frequency introduces a straightforward exponential decay on top of the complex oscillation: showing that z can be off the unit circle, and that if it is, this relates to transient response. The imaginary frequency has to do with transient response, while the real frequency (both real as in actual, and real as in the real part of a complex number) has to do with steady state oscillation. For real frequencies z lies on the unit circle. Values of the transfer function H(z) for z off the unit circle relate to transient terms:

The position of z, inside or outside the unit circle, determines the stability of transient terms:

If z is inside the unit circle, the transient terms will die away
If z is on the unit circle, oscillations will be in a steady state
If z is outside the unit circle, the transient terms will increase
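The three cases can be checked by raising z to successive powers, which is exactly how a transient term evolves (the magnitudes 0.9, 1.0 and 1.1 and the angle are arbitrary illustrative choices):

```python
import numpy as np

# A transient term behaves like z**n. Its fate depends on |z|:
n = np.arange(60)
inside  = (0.9 * np.exp(1j * 0.3)) ** n   # |z| = 0.9 < 1
on      = (1.0 * np.exp(1j * 0.3)) ** n   # |z| = 1
outside = (1.1 * np.exp(1j * 0.3)) ** n   # |z| = 1.1 > 1

print(abs(inside[-1]) < 0.01)         # True: dies away
print(abs(abs(on[-1]) - 1) < 1e-9)    # True: steady oscillation
print(abs(outside[-1]) > 100)         # True: grows without bound
```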

Poles and zeroes


The IIR filter's transfer function is a ratio of terms:

If the numerator becomes zero, the transfer function will also become zero - this is called a zero of the function
If the denominator becomes zero, we have a division by zero - the function can become infinitely large - this is called a pole of the function

The positions of poles (very large values) affect the stability of the filter:

The shape of the transfer function H(z) is determined by the positions of its poles and zeroes:

This can be visualised using the rubber sheet analogy:

Imagine the Argand diagram laid out on the floor
Place tall vertical poles at the poles of the function
Stretch a rubber sheet over the poles
At zeroes, pin the rubber sheet to the floor
The rubber sheet will take up a shape which is determined by the position of the poles and zeroes

Thanks are due to Jim Richardson for the rubber sheet analogy, which came to mind while he was an instructor officer at the Royal Naval Engineering College, Devonport. Now the frequency response is the transfer function H(z) evaluated around the unit circle on the Argand diagram of z:

And since the shape of the transfer function can be determined from the positions of its poles and zeroes, so can the frequency response.

The frequency response can be determined by tracing around the unit circle on the Argand diagram of the z plane:

Project poles and zeroes radially to hit the unit circle
Poles cause bumps
Zeroes cause dips
The closer to the unit circle, the sharper the feature
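The bump-and-dip behaviour can be reproduced numerically by evaluating a transfer function around the unit circle (the pole and zero positions are arbitrary illustrative choices; conjugate pairs keep the coefficients real):

```python
import numpy as np

# A pole just inside the unit circle at angle 0.25*pi,
# and a zero on the unit circle at angle 0.75*pi
pole = 0.95 * np.exp(1j * 0.25 * np.pi)
zero = 1.00 * np.exp(1j * 0.75 * np.pi)

# Evaluate H(z) = (z - zero)(z - conj(zero)) / ((z - pole)(z - conj(pole)))
# around the unit circle z = e^{jw}
w = np.linspace(0, np.pi, 512)
z = np.exp(1j * w)
h = ((z - zero) * (z - np.conj(zero))) / ((z - pole) * (z - np.conj(pole)))
mag = np.abs(h)

# The pole causes a bump near w = 0.25*pi; the zero a dip (to zero) near 0.75*pi
print(w[np.argmax(mag)] / np.pi, w[np.argmin(mag)] / np.pi)
```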

IIR filter design by impulse invariance


Direct digital IIR filter design is rarely used, for one very simple reason: nobody knows how to do it.

While it is easy to calculate the filter's frequency response, given the filter coefficients, the inverse problem - calculating the filter coefficients from the desired frequency response - is so far an insoluble problem. Not many textbooks admit this. Because we do not know how to design digital IIR filters, we have to fall back on analogue filter designs (for which the mathematics is well understood) and then transform these designs into the sampled data z plane Argand diagram. Note that the filter's impulse response defines it just as well as does its frequency response. Here is a recipe for designing an IIR digital filter:

Decide upon the desired frequency response
Design an appropriate analogue filter
Calculate the impulse response of this analogue filter
Sample the analogue filter's impulse response
Use the result as the filter coefficients

This process is called the method of impulse invariance. The method of impulse invariance seems simple: but it is complicated by all the problems inherent in dealing with sampled data systems. In particular the method is subject to problems of aliasing and frequency resolution.
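The recipe can be followed by hand for a first-order analogue low-pass (a sketch in pure NumPy; the 100 Hz corner, 8 kHz sample rate and scaling-by-T convention are illustrative assumptions):

```python
import numpy as np

# Analogue prototype H(s) = a/(s + a), impulse response h(t) = a*exp(-a*t)
a = 2 * np.pi * 100.0      # analogue pole (100 Hz corner)
fs = 8000.0                # sample rate, comfortably above the corner
T = 1.0 / fs

# Sample the analogue impulse response...
n = np.arange(64)
h_sampled = a * np.exp(-a * n * T) * T     # scaled by T, a common convention

# ...which is exactly the impulse response of the digital recursion
# y[n] = a*T*x[n] + exp(-a*T)*y[n-1]
x = np.zeros(64)
x[0] = 1.0
y = np.zeros(64)
for i in range(64):
    y[i] = a * T * x[i] + (np.exp(-a * T) * y[i - 1] if i > 0 else 0.0)

print(np.allclose(y, h_sampled))   # True: the digital filter is impulse invariant
```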

IIR filter design by the bilinear transform


The method of filter design by impulse invariance suffers from aliasing.

The aliasing will be a problem if the analogue filter prototype's frequency response has significant components at or beyond the Nyquist frequency. The problem with which we are faced is to transform the analogue filter design into the sampled data z plane Argand diagram. The problem of aliasing arises because the frequency axis in the sampled data z plane Argand diagram is a circle:

In the analogue domain the frequency axis is an infinitely long straight line
In the sampled data z plane Argand diagram the frequency axis is a circle

Note also that:

In the analogue domain transient response is shown along the horizontal axis
In the sampled data z plane Argand diagram transient response is shown radially outwards from the center.

The problem of aliasing arises because we wrap an infinitely long, straight frequency axis around a circle. So the frequency axis wraps around and around, and any components above the Nyquist frequency get wrapped back on top of other components.

The bilinear transform is a method of squashing the infinite, straight analogue frequency axis so that it becomes finite. This is like squashing a concertina or accordion. To avoid squashing the filter's desired frequency response too much, the bilinear transform squashes the far ends of the frequency axis the most - leaving the middle portion relatively unsquashed:

The infinite, straight analogue frequency axis is squashed so that it becomes finite: in fact, just long enough to wrap around the unit circle once only. This is also sometimes called frequency warping. Sadly, frequency warping does change the shape of the desired filter frequency response. In particular, it changes the shape of the transition bands. This is a pity, since we went to a lot of trouble designing an analogue filter prototype that gave us the desired frequency response and transition band shapes. One way around this is to warp the analogue filter design before transforming it to the sampled data z plane, this warping being designed so that it will be exactly undone by the frequency warping later on. This is called prewarping.
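The warping and prewarping relations can be sketched numerically. Assuming the usual bilinear transform mapping w_a = (2/T)·tan(w_d·T/2), prewarping a frequency and then applying the transform lands exactly back on the desired digital frequency:

```python
import math

def prewarp(f_d, fs):
    """Analogue frequency (Hz) at which to design the prototype so that,
    after the bilinear transform, the digital response lands exactly at
    f_d (Hz).  Uses w_a = (2/T) * tan(w_d * T / 2)."""
    T = 1.0 / fs
    w_d = 2 * math.pi * f_d
    w_a = (2.0 / T) * math.tan(w_d * T / 2.0)
    return w_a / (2 * math.pi)

def bilinear_freq(f_a, fs):
    """Where an analogue frequency f_a (Hz) ends up on the digital axis
    after the bilinear transform: w_d = (2/T) * atan(w_a * T / 2)."""
    T = 1.0 / fs
    w_a = 2 * math.pi * f_a
    w_d = (2.0 / T) * math.atan(w_a * T / 2.0)
    return w_d / (2 * math.pi)

fs = 8000.0
f_cut = 1000.0                      # desired digital cutoff frequency
f_analogue = prewarp(f_cut, fs)     # design the analogue prototype here
print(round(bilinear_freq(f_analogue, fs), 6))   # 1000.0: warping undone
```

Note how bilinear_freq squashes the whole infinite analogue axis into the finite digital range (atan is bounded), which is exactly the concertina picture above.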

Direct form I

Filters can be drawn as diagrams:

This particular diagram is called the direct form 1, because the diagram can be drawn directly from the filter equation.

The filter diagram can show what hardware elements will be required when implementing the filter:

The left hand side of the diagram shows the direct path, involving previous inputs; the right hand side shows the feedback path, operating upon previous outputs.

Direct form II

The filter diagram for direct form 1 can be drawn directly from the filter equation:

The block diagram is in two halves, and since the results from each half are simply added together, it does not matter in which order they are calculated. So the order of the halves can be swapped:

Now, note that the result after each delay is the same for both branches. So the delays down the centre can be combined:

This is called direct form 2. Its advantage is that it needs fewer delay elements. And since delay elements require hardware (for example, processor registers), direct form 2 requires less hardware and so is more efficient than direct form 1. Direct form 2 is also called canonic, which simply means 'having the minimum number of delay elements'.
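A direct form 2 second order section can be sketched in a few lines. Note that the feedback half is computed first, and that only two delay values are stored, shared by both halves:

```python
def biquad_df2(b, a, x):
    """Second order IIR section in direct form 2 (the 'canonic' form):
    only two delay elements (w1, w2) are needed, shared by the feedback
    and feedforward halves.  With b = (b0, b1, b2) and a = (1, a1, a2):
        w[n] = x[n] - a1*w[n-1] - a2*w[n-2]
        y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]"""
    b0, b1, b2 = b
    _, a1, a2 = a
    w1 = w2 = 0.0
    y = []
    for xn in x:
        w0 = xn - a1 * w1 - a2 * w2            # feedback half first
        y.append(b0 * w0 + b1 * w1 + b2 * w2)  # then the feedforward half
        w2, w1 = w1, w0                        # shift the shared delay line
    return y

# A trivial sanity check: with b = (1,0,0) and a = (1,0,0) the section
# is just a wire, so the input passes through unchanged.
print(biquad_df2((1, 0, 0), (1, 0, 0), [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

The coefficient values here are placeholders; a real section's coefficients would come from the design methods described earlier.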

The transposition theorem says that if we take a filter diagram and reverse all the elements (swapping the order of execution for every element, and reversing the direction of all the flow arrows) then the result is the same: if everything is turned back to front, it all works just the same.

This means that the direct form 1 diagram can be obtained by transposition of the direct form 2 diagram:

For this reason, direct form 1 is often called transposed direct form 2. Don't ask me why these terms seem to be as confusing as they possibly could be; I didn't make them up. I imagine mathematicians sit around at coffee break and come up with new ways to spread despondency amongst us lesser mortals. Here are the main sources of confusion:

- Direct form 1: so called because it can be drawn direct from the filter equation
- Direct form 2: so called because it can be derived by changing the diagram of direct form 1
- Transposed: so called because it is obtained by transposition of direct form 2 (but really, this is just direct form 1)
- Canonic: so called because it has the minimum number of delay elements (but really, it is just direct form 2)

Quantisation in IIR filters


Digital filters are examples of sampled data systems. Sampled data systems suffer from problems of limited precision, which lead to quantisation errors. Apart from errors when measuring signals, these arise within the hardware used for processing:

Primary sources of quantisation error are:

- Errors in arithmetic within the hardware (for example, 16 bit fixed point round off)
- Truncation when results are stored (most DSP processors have extended registers internally, so truncation usually occurs when results are stored to memory)
- Quantisation of filter coefficients, which have to be stored in memory

The effects of quantisation, saturation and overflow are all non-linear, signal dependent errors. This is bad news: non-linear effects cannot be calculated using normal mathematics, and signal dependent (coherent) errors cannot be treated statistically using the usual assumptions about randomness of noise, because they depend on what signals are being processed.

One example of a strange effect of non-linear systems is limit cycles:

A non-linear system can oscillate at a low bit level, even when there is no input. This is not the same as the impulse response being infinite; it is a result of a non-linear (or chaotic) system. Limit cycles can sometimes be heard when listening to modern 'sigma delta' digital to analogue converters. These chips use long digital filters, which are subject to non-linear errors, and you can sometimes hear the effect of limit cycles as quiet hisses or pops even when the digital output to the DAC is held steady. When treating quantisation effects we usually acknowledge that these are non-linear, signal dependent errors:

but we often model these as if they were injections of random noise:

Sadly, with IIR filters the non-linear, signal dependent effects dominate, and the model of quantisation as random noise is completely inadequate. The effects will also depend on the hardware used to implement the filter: for example, most DSP processors have extended registers internally, and whether these are used or not will affect the quantisation error crucially. It is not possible to model the effects of quantisation in an IIR filter using simple random noise models. Some idea of quantisation effects in IIR filters can be gained using complex statistical models, but really the only way to evaluate the effects of quantisation in an IIR filter is to:

- Implement the filter on the target hardware
- Test it with signals of the sort expected
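The limit cycles mentioned above are easy to reproduce in a sketch. This is an illustrative model, not any particular filter: a one pole recursive filter is run with zero input, and the feedback multiply is rounded to an integer at every step, standing in for fixed point quantisation. The ideal output decays to zero; the quantised one gets stuck at a small non-zero level.

```python
def quantised_decay(y0, a=0.95, steps=200):
    """Zero-input response of y[n] = a*y[n-1], with the multiply rounded
    to an integer each step (a crude model of fixed point quantisation).
    Ideally the output decays to zero; the rounded version sticks at a
    small nonzero level: a limit cycle."""
    y, out = y0, []
    for _ in range(steps):
        y = round(a * y)          # the quantiser inside the feedback loop
        out.append(y)
    return out

out = quantised_decay(100)
print(out[-1] != 0 and out[-1] == out[-2])   # True: stuck, never reaches zero
```

The dead band where the output sticks depends on the pole position and the rounding rule, which is exactly the kind of signal dependent, hardware dependent behaviour that defeats random noise models.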

Quantisation in IIR filters


IIR filters are very sensitive to quantisation errors. The higher the order of the filter, the more it suffers from quantisation effects: because the filter is more complex, the errors accumulate more.

In fact, IIR filters are so sensitive to quantisation errors that it is generally unrealistic to expect anything higher than a second order filter to work. This is why IIR filters are usually realised as second order sections. Most analogue filters (except for Bessel filters) are also usually realised as second order sections, which is a convenient excuse but not the real reason. Second order sections can be combined to create higher order filters:

Quantisation in IIR filters


Quantisation errors can be minimised by keeping values large, so that the maximum number of bits is used to represent them. There is a limit to how large numbers can be, determined by the precision of the hardware used for processing. If the maximum number size is exceeded, the hardware may allow overflow or saturation:

Saturation and overflow are both non-linear quantisation errors. Note that overflow, although looking more drastic than saturation, may be preferred. It is a property of two's complement integer arithmetic that if a series of numbers are added together, even if overflow occurs at intermediate stages, then so long as the final result is within the range that can be represented, the result will be correct. Overflow or saturation can be avoided by scaling the input to be small enough that overflow does not occur during the next stage of processing. There are two choices:

- Scaling the input so that overflow can never occur
- Scaling the input so that the biggest reasonably expected signal never overflows

Scaling reduces the number of bits left to represent a signal (dividing down means some low bits are lost), so it increases quantisation errors. Scaling also requires an extra multiplier in the filter, which means more hardware:
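The two's complement property noted above can be checked directly. Here 16 bit wrap-around arithmetic is simulated with masking (a sketch; real fixed point hardware wraps for free):

```python
def wrap16(v):
    """Reduce v to a signed 16 bit two's complement value: overflow
    simply wraps around, as in real fixed point hardware."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

# 30000 + 20000 overflows the 16 bit range at the intermediate stage ...
partial = wrap16(30000 + 20000)
print(partial)                    # -15536: the intermediate result overflowed

# ... but once the final result is back in range, it is still correct:
print(wrap16(partial - 25000))    # 25000, the true value of 30000+20000-25000
```

This is why overflow (wrapping) can be preferable to saturation in a chain of additions: a saturating adder would have clamped the intermediate value and lost the final answer.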

Note that hardware with higher precision, or using floating point arithmetic, may not require scaling, and so can implement filters with fewer operations.

Parallel and cascade IIR structures


Because IIR filters are very sensitive to quantisation errors, they are usually implemented as second order sections. The parallel form is simple:

The outputs from each second order section are simply added together.

If scaling is required, this is done separately for each section. It is possible to scale each section appropriately, and by a different scale factor, to minimise quantisation error. In this case another extra multiplier is required for each section, to scale the individual section outputs back to the same common scale factor before adding them. The order in which parallel sections are calculated does not matter, since the outputs are simply added together at the end.
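The parallel structure can be sketched in a few lines. The section coefficients below are arbitrary illustrative values, not a designed filter; first order sections are used for brevity. Each section runs independently on the same input, and the outputs are simply summed, so the order in which the sections are evaluated makes no difference:

```python
def one_pole(b0, p, x):
    """First order IIR section: y[n] = b0*x[n] + p*y[n-1]."""
    y, out = 0.0, []
    for xn in x:
        y = b0 * xn + p * y
        out.append(y)
    return out

x = [1.0, 0.5, -0.25, 0.0, 1.0]
s1 = one_pole(1.0, 0.5, x)        # hypothetical section 1
s2 = one_pole(0.5, -0.25, x)      # hypothetical section 2

parallel = [u + v for u, v in zip(s1, s2)]   # outputs simply added
swapped  = [v + u for u, v in zip(s1, s2)]   # sections taken in the other order
print(parallel == swapped)        # True: evaluation order is irrelevant
```

In the cascade form below, by contrast, each section's output feeds the next section's input, and ordering does matter once quantisation errors enter the picture.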

In the cascade form, the output of one section forms the input to the next:

Mathematically, it does not matter in which order the sections are placed: the result will be the same. This assumes that there are no errors. In practice, the propagation of errors is crucial to the success of an IIR filter, so the order of the sections in the cascade form is vital.

Cascade IIR structure


In the cascade form, the output of one section forms the input to the next:

In practice, the propagation of errors is crucial to the success of an IIR filter, so the order of the sections in the cascade, and the selection of which filter coefficients to group in each section, is vital:

- Sections with high gain are undesirable, because they increase the need for scaling and so increase quantisation errors
- It is desirable to arrange sections to avoid excessive scaling

To reduce the gain of each section we note that:

- Poles cause high gain (bumps in the frequency response)
- Zeroes cause low gain (dips in the frequency response)
- The closer to the unit circle, the greater the effect

This suggests a way to group poles and zeroes in each section to avoid high gain sections:

Note that the pole closest to the unit circle will provide the highest gain, because it is a large value close to the unit circle. This can best be countered by pairing it with the zero closest to it. Here is a recipe for grouping poles and zeroes to create sections which avoid high gain:

- Pair the pole closest to the unit circle with the zero closest to it (note: not the zero closest to the unit circle)
- Do this for all the poles, working in order of their distance from the unit circle
- Arrange the sections in order of how close their poles are to the unit circle

The question remains whether to place the high gain sections first or last. Recall that:

- Poles are large values (high gain)
- The closer to the unit circle, the higher the gain
- Poles cause bumps in the frequency response
- The closer to the unit circle, the sharper the bump (high Q)
- Poles in the early stages affect the input to later stages
- Poles at late stages have the last word

So the section with the pole closest to the unit circle will have the highest gain, but also the sharpest shape. As with so much else in digital filter design, we are faced with a compromise between conflicting desires:

- Poles close to the unit circle in early stages cause high gain early on, so they require more signal scaling and cause worse quantisation errors later on
- Poles close to the unit circle in late stages cause significant noise shaping at a late stage

DSP processors: characteristics of DSP processors


Although there are many DSP processors, they are mostly designed with the same few basic operations in mind, so they share the same set of basic characteristics. These characteristics fall into three categories:

- Specialised high speed arithmetic
- Data transfer to and from the real world
- Multiple access memory architectures

Typical DSP operations require a few specific operations:

The diagram shows an FIR filter. This illustrates the basic DSP operations:

- Additions and multiplications
- Delays
- Array handling

Each of these operations has its own special set of requirements.

Additions and multiplications require us to:
- Fetch two operands
- Perform the addition or multiplication (usually both)
- Store the result or hold it for a repetition

Delays require us to:
- Hold a value for later use

Array handling requires us to:
- Fetch values from consecutive memory locations
- Copy data from memory to memory

To suit these fundamental operations, DSP processors often have:
- Parallel multiply and add
- Multiple memory accesses (to fetch two operands and store the result)
- Lots of registers to hold data temporarily
- Efficient address generation for array handling
- Special features such as delays or circular addressing
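The fetch, multiply and accumulate pattern is easiest to see in code. Here is a minimal FIR filter sketch in plain Python; a DSP would perform each iteration of the inner loop (fetch two operands, multiply, add) in a single instruction:

```python
def fir(coeffs, x):
    """FIR filter: each output sample is a sum of products (multiply
    accumulate) over the current and previous inputs (the delay line)."""
    taps = [0.0] * len(coeffs)       # the delay line, initially empty
    out = []
    for xn in x:
        taps = [xn] + taps[:-1]      # array handling: shift in the new sample
        acc = 0.0
        for c, t in zip(coeffs, taps):
            acc += c * t             # the multiply/accumulate inner loop
        out.append(acc)
    return out

# A small smoothing filter applied to a step input (illustrative values):
print(fir([0.25, 0.5, 0.25], [4.0, 4.0, 4.0, 4.0]))  # [1.0, 3.0, 4.0, 4.0]
```

The shifting of the taps list is the software equivalent of the delays in the diagram; on real DSP hardware it is usually replaced by circular addressing, so that no data actually moves.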

Mathematics

To perform the simple arithmetic required, DSP processors need special high speed arithmetic units. Most DSP operations require additions and multiplications together. So DSP processors usually have hardware adders and multipliers, which can be used in parallel within a single instruction:

The diagram shows the data path for the Lucent DSP32C processor. The hardware multiply and add work in parallel, so that in the space of a single instruction both an add and a multiply can be completed. Delays require that intermediate values be held for later use. This may also be a requirement, for example, when keeping a running total: the total can be kept within the processor, to avoid wasting repeated reads from and writes to memory. For this reason DSP processors have lots of registers which can be used to hold intermediate values:

Registers may be fixed point or floating point format.

Array handling requires that data can be fetched efficiently from consecutive memory locations. This involves generating the next required memory address. For this reason DSP processors have address registers, which are used to hold addresses and can be used to generate the next needed address efficiently:

The ability to generate new addresses efficiently is a characteristic feature of DSP processors. Usually, the next needed address can be generated during the data fetch or store operation, with no overhead. DSP processors have rich sets of address generation operations:

- *rP (register indirect): read the data pointed to by the address in register rP
- *rP++ (post increment): having read the data, post increment the address pointer to point to the next value in the array
- *rP-- (post decrement): having read the data, post decrement the address pointer to point to the previous value in the array
- *rP++rI (register post increment): having read the data, post increment the address pointer by the amount held in register rI, to point to rI values further down the array
- *rP++rIr (bit reversed): having read the data, post increment the address pointer to point to the next value in the array, as if the address bits were in bit reversed order

The list shows some addressing modes for the Lucent DSP32C processor. The assembler syntax is very similar to C. Whenever an operand is fetched from memory using register indirect addressing, the address register can be incremented to point to the next needed value in the array. This address increment is free (there is no overhead involved in the address calculation), and in the case of the Lucent DSP32C processor up to three such addresses may be generated in each single instruction. Address generation is an important factor in the speed of DSP processors at their specialised operations. The last addressing mode, bit reversed, shows how specialised DSP processors can be. Bit reversed addressing arises when a table of values has to be reordered by reversing the order of the address bits:

- Reverse the order of the bits in each address
- Shuffle the data so that the new, bit reversed, addresses are in ascending order

This operation is required in the Fast Fourier Transform, and just about nowhere else. So one can see that DSP processors are designed specifically to calculate the Fast Fourier Transform efficiently.
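Bit reversed addressing is easier to see in code than in prose. Here is a sketch of the reordering; the DSP's address generator produces these addresses at no cost, but spelling the shuffle out shows what it does:

```python
def bit_reverse(i, bits):
    """Reverse the order of the low 'bits' bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # shift the lowest bit of i into r
        i >>= 1
    return r

# Reading an 8-entry table in bit reversed address order,
# as needed by the radix-2 FFT:
data = list(range(8))                    # addresses 000 .. 111
shuffled = [data[bit_reverse(i, 3)] for i in range(8)]
print(shuffled)                          # [0, 4, 2, 6, 1, 5, 3, 7]
```

For example, address 1 (binary 001) becomes 100, which is 4, so the second value read is data[4].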

Input and output interfaces


In addition to the mathematics, in practice DSP is mostly dealing with the real world. Although this aspect is often forgotten, it is of great importance and marks some of the greatest distinctions between DSP processors and general purpose microprocessors:

In a typical DSP application, the processor will have to deal with multiple sources of data from the real world. In each case, the processor may have to be able to receive and transmit data in real time, without interrupting its internal mathematical operations. There are three sources of data from the real world:

- Signals coming in and going out
- Communication with an overall system controller of a different type
- Communication with other DSP processors of the same type

These multiple communications routes mark the most important distinctions between DSP processors and general purpose processors. When DSP processors first came out, they were rather fast processors: for example the first floating point DSP, the AT&T DSP32, ran at 16 MHz at a time when PC clocks were 5 MHz. This meant that we had very fast floating point processors: a fashionable demonstration at the time was to plug a DSP board into a PC and run a fractal (Mandelbrot) calculation on the DSP and on the PC side by side. The DSP fractal was of course faster. Today, however, the fastest DSP processor is the Texas TMS320C6201, which runs at 200 MHz. This is no longer very fast compared with an entry level PC, and the same fractal today will actually run faster on the PC than on the DSP. But DSP processors are still used. Why? The answer lies only partly in the fact that the DSP can run several operations in parallel: a far more basic answer is that the DSP can handle signals very much better than a Pentium. Try feeding eight channels of high quality audio data in and out of a Pentium simultaneously in real time, without impacting on the processor performance, if you want to see a real difference. The need to deal with these different sources of data efficiently leads to special communication features on DSP processors.

Input and output interfaces II


The need to deal with different sources of data efficiently and in real time leads to special communication features on DSP processors. Signals tend to be fairly continuous, but at audio rates or not much higher. They are usually handled by high speed synchronous serial ports. Serial ports are inexpensive (having only two or three wires) and are well suited to audio or telecommunications data rates up to 10 Mbit/s. Most modern speech and audio analogue to digital converters interface to DSP serial ports with no intervening logic. A synchronous serial port requires only three wires: clock, data, and word sync. The addition of a fourth wire (frame sync) and a high impedance state when not transmitting makes the port capable of Time Division Multiplex (TDM) data handling, which is ideal for telecommunications:

DSP processors usually have synchronous serial ports, transmitting clock and data separately, although some, such as the Motorola DSP56000 family, have asynchronous serial ports as well (where the clock is recovered from the data). Timing is versatile, with options to generate the serial clock from the DSP chip clock or from an external source. The serial ports may also be able to support separate clocks for receive and transmit: a useful feature, for example, in satellite modems, where the clocks are affected by Doppler shifts. Most DSP processors also support companding to A-law or u-law in serial port hardware with no overhead: the Analog Devices ADSP2181 and the Motorola DSP56000 family do this in the serial port, whereas the Lucent DSP32C has a hardware compander in its data path instead. The serial port will usually operate under DMA (data presented at the port is automatically written into DSP memory without stopping the DSP), with or without interrupts. It is usually possible to receive and transmit data simultaneously. The serial port has dedicated instructions, which make it simple to handle. Because the serial port is standard to the chip, many types of actual I/O hardware can be supported with little or no change to code: the DSP program simply deals with the serial port, no matter what I/O hardware is attached to it.
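The companding mentioned above follows the standard u-law curve used in telephony. Here is a sketch of the continuous form of the curve and its inverse; real serial port companders work on quantised 8 bit samples, and the constant mu = 255 is the usual telephony value:

```python
import math

def mulaw_compress(x, mu=255.0):
    """Continuous u-law companding curve for |x| <= 1:
    y = sign(x) * ln(1 + mu*|x|) / ln(1 + mu).
    Small signals are expanded, large signals compressed."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_expand(y, mu=255.0):
    """Inverse of the compression curve."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

x = 0.1
y = mulaw_compress(x)
print(y > x)                              # True: small values are boosted
print(abs(mulaw_expand(y) - x) < 1e-12)   # True: expansion undoes compression
```

The point of the curve is that after quantising y to 8 bits, small signals keep proportionally more precision than they would with linear quantisation, which suits speech well.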

Host communications is an element of many, though not all, DSP systems. Many systems will have another, general purpose, processor to supervise the DSP: for example, the DSP might be on a PC plug-in card or a VME card; simpler systems might have a microcontroller to perform a 'watchdog' function or to initialise the DSP on power up. Whereas signals tend to be continuous, host communication tends to require data transfer in batches: for instance, to download a new program or to update filter coefficients. Some DSP processors have dedicated host ports, which are designed to communicate with another processor of a different type, or with a standard bus. For instance, the Lucent DSP32C has a host port which is effectively an 8 bit or 16 bit ISA bus; the Motorola DSP56301 and the Analog Devices ADSP21060 have host ports which implement the PCI bus. The host port will usually operate under DMA (data presented at the port is automatically written into DSP memory without stopping the DSP), with or without interrupts. It is usually possible to receive and transmit data simultaneously. The host port has dedicated instructions, which make it simple to handle. The host port imposes a welcome element of standardisation on plug-in DSP boards: because it is standard to the chip, it is relatively difficult for individual designers to make the bus interface different. For example, of the 22 main manufacturers of PC plug-in cards using the Lucent DSP32C, 21 are supported by the same PC interface code: this means it is possible to swap between different cards for different purposes, or to change to a cheaper manufacturer, without changing the PC side of the code. Of course this is not foolproof (some engineers will always 'improve upon' a standard by making something incompatible if they can) but at least it limits unwanted creativity.
Interprocessor communications is needed when a DSP application is too much for a single processor, or where many processors are needed to handle multiple but connected data streams. Link ports provide a simple means to connect several DSP processors of the same type. The Texas TMS320C40 and the Analog Devices ADSP21060 both have six link ports (called 'comm ports' on the 'C40). These would ideally be parallel ports at the word length of the processor, but this would use up too many pins (six ports, each 32 bits wide, is 192 pins, which is a lot even if we neglect grounds). So a hybrid called serial/parallel is used: on the 'C40, comm ports are 8 bits wide and it takes four transfers to move one 32 bit word; on the 21060, link ports are 4 bits wide and it takes 8 transfers to move one 32 bit word. The link port will usually operate under DMA (data presented at the port is automatically written into DSP memory without stopping the DSP), with or without interrupts. It is usually possible to receive and transmit data simultaneously. This is a lot of data movement: for example, the Texas TMS320C40 could in principle use all six comm ports at their full rate of 20 Mbyte/s to achieve data transfer rates of 120 Mbyte/s. In practice, of course, such rates exist only in the dreams of marketing men, since other factors such as internal bus bandwidth come into play. The link ports have dedicated instructions, which make them simple to handle. Although they are sometimes used for signal I/O, this is not always a good idea, since it involves very high speed signals over many pins, and it can be hard for external hardware to exactly meet the timing requirements.

Memory architectures


Typical DSP operations require many additions and multiplications. Additions and multiplications require us to:

- Fetch two operands
- Perform the addition or multiplication (usually both)
- Store the result or hold it for a repetition

To fetch the two operands in a single instruction cycle, we need to be able to make two memory accesses simultaneously. Actually, a little thought will show that since we also need to store the result, and to read the instruction itself, we really need more than two memory accesses per instruction cycle. For this reason DSP processors usually support multiple memory accesses in the same instruction cycle. It is not possible to access two different memory addresses simultaneously over a single memory bus. There are two common methods to achieve multiple memory accesses per instruction cycle:

- Harvard architecture
- Modified von Neumann architecture

The Harvard architecture has two separate physical memory buses. This allows two simultaneous memory accesses:

The true Harvard architecture dedicates one bus to fetching instructions, with the other available to fetch operands. This is inadequate for DSP operations, which usually involve at least two operands. So DSP Harvard architectures usually permit the 'program' bus to be used also for access of operands. Note that it is often necessary to fetch three things (the instruction plus two operands), and the Harvard architecture is inadequate to support this: so DSP Harvard architectures often also include a cache memory, which can be used to store instructions that will be reused, leaving both Harvard buses free for fetching operands. This extension (Harvard architecture plus cache) is sometimes called an extended Harvard architecture or Super Harvard ARChitecture (SHARC). The Harvard architecture requires two memory buses. This makes it expensive to bring off the chip: for example, a DSP using 32 bit words and with a 32 bit address space requires at least 64 pins for each memory bus, a total of 128 pins if the Harvard architecture is brought off the chip. This results in very large chips, which are difficult to design into a circuit. Even the simplest DSP operation (an addition involving two operands and a store of the result to memory) requires four memory accesses: three to fetch the two operands and the instruction, plus a fourth to write the result. This exceeds the capabilities of a Harvard architecture. Some processors get around this by using a modified von Neumann architecture. The von Neumann architecture uses only a single memory bus:

This is cheap, requiring fewer pins than the Harvard architecture, and simple to use, because the programmer can place instructions or data anywhere throughout the available memory. But it does not permit multiple memory accesses. The modified von Neumann architecture allows multiple memory accesses per instruction cycle by the simple trick of running the memory clock faster than the instruction cycle. For example, the Lucent DSP32C runs with an 80 MHz clock: this is divided by four to give 20 million instructions per second (MIPS), but the memory clock runs at the full 80 MHz. Each instruction cycle is divided into four 'machine states', and a memory access can be made in each machine state, permitting a total of four memory accesses per instruction cycle:

In this case the modified von Neumann architecture permits all the memory accesses needed to support addition or multiplication: fetch of the instruction; fetch of the two operands; and storage of the result. Both Harvard and von Neumann architectures require the programmer to be careful of where in memory data is placed: for example, with the Harvard architecture, if both needed operands are in the same memory bank, then they cannot be accessed simultaneously.

Example processors


Although there are many DSP processors, they are mostly designed with the same few basic operations in mind, so they share the same set of basic characteristics. This enables us to draw the processor diagrams in a similar way, to bring out the similarities and allow us to concentrate on the differences:

The diagram shows a generalised DSP processor, with the basic features that are common.

These features can be seen in the diagram for one of the earliest DSP processors, the Lucent DSP32C:

The Lucent DSP32C has four memory areas (three internal plus one external), and uses a modified von Neumann architecture to achieve four memory accesses per instruction cycle; the von Neumann architecture is shown by the presence of only a single memory bus. It has four floating point registers; the address generation registers also double as general purpose fixed point registers. The Lucent DSP32C has a host port, showing that this chip is designed to be integrated into systems with another system controller: in this case, a microcontroller or PC (ISA) bus. Looking at one of the more recent DSP processors, the Analog Devices ADSP21060, shows how similar the basic architectures are:

The ADSP21060 has a Harvard architecture, shown by the two memory buses. This is extended by a cache, making it a Super Harvard ARChitecture (SHARC). Note, however, that the Harvard architecture is not fully brought off chip: there is a special bus switch arrangement, which is not shown on the diagram. The 21060 has two serial ports in place of the Lucent DSP32C's one. Its host port implements a PCI bus rather than the older ISA bus. Apart from this, the 21060 introduces four features not found on the Lucent DSP32C:

- There are two sets of address generation registers. DSP processors commonly have to react to interrupts quickly; the two sets of address generation registers allow for swapping between register sets when an interrupt occurs, instead of having to save and restore the complete set of registers.
- There are six link ports, used to connect with up to six other 21060 processors, showing that this processor is intended for use in multiprocessor designs.
- There is a timer, useful for implementing DSP multitasking operating system features using time slicing.
- There is a debug port, allowing direct, non-intrusive debugging of the processor internals.

Data formats


DSP processors store data in fixed point or floating point formats. It is worth noting that fixed point format is not quite the same as integer:

The integer format is straightforward: it represents whole numbers from 0 up to the largest whole number that can be represented with the available number of bits. Fixed point format is used to represent numbers that lie between 0 and 1, with a 'binary point' assumed to lie just after the most significant bit. The most significant bit in both cases carries the sign of the number.

- The size of the fraction represented by the smallest bit is the precision of the fixed point format
- The size of the largest number that can be represented in the available word length is the dynamic range of the fixed point format

To make the best use of the full available word length in the fixed point format, the programmer has to make some decisions:

- If a fixed point number becomes too large for the available word length, the programmer has to scale the number down, by shifting it to the right: in the process, lower bits may drop off the end and be lost
- If a fixed point number is small, the number of bits actually used to represent it is small. The programmer may decide to scale the number up, in order to use more of the available word length

In both cases the programmer has to keep track of by how much the binary point has been shifted, in order to restore all numbers to the same scale at some later stage. Floating point format has the remarkable property of automatically scaling all numbers by moving, and keeping track of, the binary point, so that all numbers use the full word length available but never overflow:
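As a concrete sketch of the fixed point format, here is the common 16 bit fractional representation (often called 'Q15'), with the sign bit followed by 15 fractional bits after the assumed binary point. The helper names are ours, not from any particular DSP toolchain:

```python
def to_q15(x):
    """Quantise a value in [-1, 1) to 16 bit fixed point ('Q15'):
    one sign bit, then 15 fractional bits.  Values outside the
    representable range saturate at the end points."""
    return max(-32768, min(32767, int(round(x * 32768.0))))

def from_q15(q):
    """Interpret a 16 bit fixed point value as a fraction again."""
    return q / 32768.0

x = 0.3
q = to_q15(x)
print(q)                                      # 9830
print(abs(from_q15(q) - x) <= 1.0 / 65536)    # True: error within half an LSB
```

The precision is the value of one least significant bit, 2^-15; the dynamic range runs from -1 up to just under +1. Any value that would exceed the range must be scaled down by the programmer, exactly as described above.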

Floating point numbers have two parts: the mantissa, which is similar to the fixed point part of the number, and an exponent, which is used to keep track of how the binary point is shifted. Every number is scaled by the floating point hardware:

- If a number becomes too large for the available word length, the hardware automatically scales it down, by shifting it to the right
- If a number is small, the hardware automatically scales it up, in order to use the full available word length of the mantissa

In both cases the exponent is used to count how many times the number has been shifted. In floating-point numbers the binary point comes after the second most significant bit in the mantissa. The block floating-point format provides some of the benefits of floating point, but by scaling blocks of numbers rather than each individual number:

Block floating-point numbers are actually represented by the full word length of a fixed-point format.

If any one of a block of numbers becomes too large for the available word length, the programmer scales down all the numbers in the block by shifting them to the right
If the largest of a block of numbers is small, the programmer scales up all numbers in the block, in order to use the full available word length of the mantissa

In both cases the exponent is used to count how many times the numbers in the block have been shifted. Some specialised processors, such as those from Zilog, have special features to support the use of block floating-point format; more usually, it is up to the programmer to test each block of numbers and carry out the necessary scaling.

The floating-point format has one further advantage over fixed point: it is faster. Because of quantisation error, a basic direct form 1 IIR filter second-order section requires an extra multiplier, to scale numbers and avoid overflow. But the floating-point hardware automatically scales every number to avoid overflow, so this extra multiplier is not required:

Precision and dynamic range


The precision with which numbers can be represented is determined by the word length in the fixed-point format, and by the number of bits in the mantissa in the floating-point format. In a 32-bit DSP processor the mantissa is usually 24 bits, so the precision of a floating-point DSP is the same as that of a 24-bit fixed-point processor. But floating point has one further advantage over fixed point: because the hardware automatically scales each number to use the full word length of the mantissa, the full precision is maintained even for small numbers:

There is a potential disadvantage to the way floating point works. Because the hardware automatically scales and normalises every number, the errors due to truncation and rounding depend on the size of the number. If we regard these errors as a source of quantisation noise, then the noise floor is modulated by the size of the signal. Although the modulation can be shown to be always downwards (that is, a 32-bit floating-point format always has noise which is less than that of a 24-bit fixed-point format), the signal-dependent modulation of the noise may be undesirable: notably, the audio industry prefers 24-bit fixed-point DSP processors over floating point because it is thought by some that the floating-point noise floor modulation is audible.

The precision directly affects quantisation error. The largest number which can be represented determines the dynamic range of the data format. In fixed-point format this is straightforward: the dynamic range is the range of numbers that can be represented in the available word length. For floating-point format, though, the binary point is moved automatically to accommodate larger numbers, so the dynamic range is determined by the size of the exponent. For an 8-bit exponent, the dynamic range is close to 1,500 dB:

So the dynamic range of a floating-point format is enormously larger than that of a fixed-point format.

While the dynamic range of a 32-bit floating-point format is large, it is not infinite, so it is possible to suffer overflow and underflow even with a 32-bit floating-point format. A classic example of this can be seen by running fractal (Mandelbrot) calculations on a 32-bit DSP processor: after quite a long time, the fractal pattern ceases to change because the increment size has become too small for a 32-bit floating-point format to represent. Most DSP processors have extended precision registers within the processor:

The diagram shows the data path of the Lucent DSP32C processor. Although this is a 32-bit floating-point processor, it uses 40- and 45-bit registers internally, so results can be held to a wider dynamic range internally than when written to memory.

Review of DSP Processors


Although there are many DSP processors, they are mostly designed with the same few basic operations in mind, so they share the same set of basic characteristics. We can learn a lot by considering how each processor differs from its competitors, gaining an understanding of how to evaluate one processor against others for particular applications. A simple processor design like the Lucent DSP32C shows the basic features of a DSP processor:

Multiple on-chip memories
External memory bus
Hardware add and multiply in parallel
Lots of registers
Serial interface
Host interface

The DSP32C is unusual in having a true von Neumann architecture: rather than use multiple buses to allow multiple memory accesses, it handles up to four sequential memory accesses per cycle. The DMA controller handles serial I/O, independently in and out, using cycle stealing, which does not disturb the DSP execution thread.

The simple DSP32C design uses the address registers to hold integer data, and there is no hardware integer multiplier: astonishingly, integers have to be converted to floating-point format, then back again, for multiplication. We can excuse this lack of fast integer support by recalling that this was one of the first DSP processors, and it was designed specifically for floating-point, not fixed-point, operation: the address registers are for address calculations, with integer operations being only a bonus. For a fixed-point DSP, the address generation needs to be separated from the integer data registers: this may also be efficient for a floating-point DSP if integer calculations are needed very often. Lucent's more modern fixed-point DSP16 processor shows the separation of fixed-point from address registers:

The DSP16A also shows a more conventional use of multiple internal buses (Harvard plus cache) to access two memory operands (plus an instruction). A further arithmetic unit (a shifter) has been added.

DSP often involves a need to switch rapidly between one task and another: for example, on the occurrence of an interrupt. This would usually require all registers currently in use to be saved, and then restored after servicing the interrupt. The DSP16A and the Analog Devices ADSP2181 use two sets of address generation registers:

The two sets of address generation registers can be swapped as a fast alternative to saving and restoring registers when switching between tasks. The ADSP2181 also has a timer, useful for implementing 'time sliced' task switching, such as in a real-time operating system - another indication that this processor was designed with task switching in mind.

It is interesting to see how far a manufacturer carries the same basic processor model into their different designs. Texas Instruments, Analog Devices and Motorola all started with fixed-point devices, and have carried forward those designs into their floating-point processors. AT&T (now Lucent) started with floating point, then brought out fixed-point devices later. The Analog Devices ADSP21060 looks like a floating-point version of the integer ADSP2181:

The 21060 also has six high-speed link ports, which allow it to connect with up to six other processors of the same type. One way to support multiprocessing is to have many fast inter-processor communications ports; another is to have shared memory. The ADSP21060 supports both methods. Shared memory is supported in a very clever way: each processor can directly access a small area of the internal memory of up to four other processors. As with the ADSP2181, the 21060 has lots of internal memory, the idea being that most applications can work without added external memory. Note, though, that the full Harvard architecture is not brought off chip, which means the on-chip memory really does need to be big enough for most applications.

The problem with fixed-point processors is quantisation error, caused by the limited fixed-point precision. Motorola reduce this problem in the DSP56002 by using a 24-bit integer word length:

They also use three internal buses: one for program, two for data (two operands). This is an extension of the standard Harvard architecture, which goes beyond the usual trick of simply adding a cache, to allow access to two operands and the instruction at the same time. Of course, the problem with 24-bit fixed point is its expense, which probably explains why Motorola later produced the cheap, 16-bit DSP56156 - although this looks like a 16-bit variant of the DSP56002:

And of course there has to be a floating-point variant: the DSP96002 looks like a floating-point version of the DSP56002:

The DSP96002 supports multiprocessing with an additional 'global bus', which can connect to other DSP96002 processors; it also has a new DMA controller with its own bus.

The Texas TMS320C25 is quite an early design. It does not have a parallel multiply/add: the multiply is done in one cycle, the add in the next, and the DSP has to address the data for both operations. It has a modified Harvard bus with only one data bus, which sometimes restricts data memory accesses to one per cycle, but it does have a special 'repeat' instruction to repeat an instruction without writing code loops.

The Texas TMS320C50 is the C25 brought up to date: the multiply/add can now achieve single-cycle execution if it is done in a hardware repeat loop. It also uses shadow registers as a fast way to preserve registers when context switching. It has automatic saturation or rounding (which it needs, since the accumulator has no guard bits to prevent overflow), and it has parallel bit manipulation, which is useful in control applications.

The Texas TMS320C30 carries on some of the features of the integer C25, but introduces some new ideas. It has a von Neumann architecture with multiple memory accesses in one cycle, but there are still separate internal buses, which are multiplexed onto the CPU. It also has a dedicated DMA controller.

The Texas TMS320C40 is similar to the C30, but with high-speed communications ports for multiprocessing. It has six high-speed parallel comports, which connect with other C40 processors: these are 8 bits wide, but carry 32-bit data in four successive cycles.

The Texas TMS320C60 is radically different from other DSP processors, in using a Very Long Instruction Word (VLIW) format. It issues a 256-bit instruction, containing up to 8 separate 'mini instructions' to each of 8 functional units. Because of the radically different concept, it is hard to compare this processor with other, more traditional, DSP processors. But despite this, the C60 can still be viewed in a similar way to other DSP processors, which makes apparent the fact that it basically has two data paths, each capable of a multiply/accumulate.

Note that this diagram is very different from the way Texas Instruments draw it. This is for several reasons:

Texas Instruments tend to draw their processors as a set of subsystems, each with a separate block diagram
My diagram does not show some of the more esoteric features, such as the bus switching arrangement needed for the multiple external memory accesses that are required to load the eight 'mini-instructions'

An important point is raised by the placing of the address generation unit on the diagram. Texas Instruments draw the C60 block diagram as having four arithmetic units in each data path, whereas my diagram shows only three: the fourth unit is in fact the address generation calculation. Following my practice for all other DSP processors, I show address generation as separate from the arithmetic units, address calculation being implied by the presence of address registers, as is the case in all DSP processors. The C60 can in fact choose to use the address generation unit for general-purpose calculations if it is not calculating addresses (this is similar, for example, to the Lucent DSP32C), so Texas Instruments' approach is also valid; but for most classic DSP operations address generation would be required, and so the unit would not be available for general-purpose use.

There is an interesting side effect of this. Texas Instruments rate the C60 as a 1600 MIPS device, on the basis that it runs at 200 MHz and has two data paths each with four execution units: 200 MHz x 2 x 4 = 1600 MIPS. But from my diagram, treating the address generation separately, we see only three execution units per data path: 200 MHz x 2 x 3 = 1200 MIPS. The latter figure is the one actually achieved in quoted benchmarks for an FIR filter, and reflects the device's ability to perform arithmetic.

This illustrates a problem in evaluating DSP processors. It is very hard to compare like with like, not least because all manufacturers present their designs in such a way that they show their best performance. The lesson to draw is that one cannot rely on MIPS, MOPS or Mflops ratings, but must carefully try to understand the features of each candidate processor and how they differ from each other, then make a choice based on the best match to the particular application. It is very important to note that a DSP processor's specialised design means it will achieve any quoted MIPS, MOPS or Mflops rating only if programmed to take advantage of all the parallel features it offers.

Programming a DSP processor: a simple FIR filter

The simple FIR filter equation,

    y[n] = sum over k = 0 .. N-1 of c[k] * x[n-k]

can be implemented quite directly in C:

    y[n] = 0.0;
    for (k = 0; k < N; k++)
        y[n] = y[n] + c[k] * x[n-k];

But this naive code is inefficient:

The code is inefficient because:

it uses array indices [i] rather than pointers *ptr
it needs a lot of address arithmetic
it repeatedly accesses the output array

Using pointers
A naive C language program to implement an FIR filter is inefficient because it accesses array elements by array index:

    y[n] = 0.0;
    for (k = 0; k < N; k++)
        y[n] = y[n] + c[k] * x[n-k];

To understand why accessing by array index is inefficient, remember that an array is really just a table of numbers in sequential memory locations. The C compiler only knows the start address of the array. To actually read any array element the compiler first has to find the address of that particular element. So whenever an array element is accessed by its array index [i] the compiler has to make a calculation:

The diagram shows how the compiler would calculate the address of an array element specified by index as x[n - k]. The calculation requires several steps:

Load the start address of the table in memory
Load the value of the index n
Load the value of the index k
Calculate the offset [n - k]
Add the offset to the start address of the array

This entails five operations: three reads from memory, and two arithmetic operations. Only after all five operations can the compiler actually read the array element.

C provides the 'pointer' type precisely to avoid the inefficiencies of accessing array elements by index. In C, the syntax *ptr indicates that ptr is a pointer, which means:

The variable ptr is to be treated as containing an address
The '*' means the data is read from that address

Pointers can be modified after the data has been accessed. The syntax *ptr++ means:

The variable ptr is to be treated as containing an address
The '*' means the data is read from that address
The '++' means that, having read the data, the pointer ptr is incremented to point to the next sequential data element

Accessing the array elements using pointers is more efficient than by index:

Each pointer still has to be initialised, but only once, before the loop, and only to the start address of the array, so no arithmetic is required to calculate offsets. Within the loop, the pointers are simply incremented so that they point automatically to the next array element, ready for the next pass through the loop.

Using pointers is more efficient than array indices on any processor, but it is especially efficient for DSP processors, because DSP processors are excellent at address arithmetic. In fact, address increments often come for free. For example, the Lucent DSP32C processor has several 'free' modes of pointer address generation:

*rP - register indirect: read the data pointed to by the address in register rP
*rP++ - post increment: having read the data, post increment the address pointer to point to the next value in the array
*rP++rI - register post increment: having read the data, post increment the address pointer by the amount held in register rI, to point to rI values further down the array
*rP++rIr - bit reversed: having read the data, post increment the address pointer to point to the next value in the array, as if the address bits were in bit-reversed order

The address increments are performed in the same instruction as the data access to which they refer, and they incur no overhead at all. More than this, as we shall see later, most DSP processors can perform two or three address increments for free in each instruction. So the use of pointers is crucially important for DSP processors.

Some C compilers optimise code. For example, one of the Texas Instruments C compilers would, with full optimisation selected, take the initial naive C code but produce assembler that corresponds closely to the code using pointers.
This is very nice, but there are three cautions to be observed:

Optimisation can often be used only in restricted circumstances - for example, in the absence of interrupts
Optimisation is compiler dependent, so code that relies on compiler optimisation could become very inefficient when ported to another compiler

One reason to use C is so that the programmer can write code that is very close to the operation of the processor. This is often desirable in DSP, where we want a high degree of control over exactly what the processor is doing at all times. Optimisation changes the code you wrote into code the compiler thought was better: in the worst case the code may not actually work when optimised.

Limiting memory accesses

Memory accesses are bottlenecks.

DSP processors can make multiple memory accesses in a single instruction cycle. But the inner loop of the FIR filter program requires four memory accesses: three reads, one for each of the operands, and one write of the result to memory. Even without counting the need to load the instruction, this exceeds the capacity of a DSP processor. For instance, the Lucent DSP32C can make four memory accesses per instruction cycle: two reads of operands, plus one write of the result, plus the read of one instruction. Even this is not enough for the simple line of C code that forms the inner loop of the FIR filter program.

Fortunately, DSP processors have lots of registers, which can be used to hold values inside the processor for later use, thus economising on memory accesses. We can see that the result of the inner loop is used again and again during the loop: as the code is written, it has to be read from memory and then written back to memory in each pass. Making it a register variable allows it to be held within the processor, saving two memory accesses:

    register float temp;

    temp = 0.0;
    for (k = 0; k < N; k++)
        temp = temp + *c_ptr++ * *x_ptr--;

The C declaration 'register float temp' means that the variable temp is to be held in a processor register: in this case, a floating-point register. The inner loop now only requires two memory accesses, to read the two operands *c_ptr and *x_ptr (three accesses if you count the instruction load). This is now within the capabilities of the DSP processor in a single instruction.

A small point to note is that the initialisation temp = 0.0 is wasted. It is simple to use the first calculation as the initialisation instead, reducing the number of iterations of the inner loop:

    register float temp;

    temp = *c_ptr++ * *x_ptr--;

    for (k = 1; k < N; k++)
        temp = temp + *c_ptr++ * *x_ptr--;

This leads to a more efficient C program for the FIR filter:

    float y[N], c[N], x[N];
    float *y_ptr, *c_ptr, *x_ptr;
    register float temp;
    int n, k;

    y_ptr = &y[0];
    for (n = 0; n < N; n++) {
        c_ptr = &c[0];
        x_ptr = &x[N-1];
        temp = *c_ptr++ * *x_ptr--;
        for (k = 1; k < N; k++)
            temp = temp + *c_ptr++ * *x_ptr--;
        *y_ptr++ = temp;
    }

Assembler
To illustrate transcribing the C program for the FIR filter into DSP assembly language, we will use the assembler syntax of the Lucent DSP32C processor. This processor is excellent for this purpose, because its assembler syntax is remarkably similar to C, which makes it easy to see how the C code maps onto the underlying DSP architecture. The illustration remains valid in general for most DSP processors, since their basic design features are so similar; the other processors simply have more impenetrable assembler syntax.

    *r3 is equivalent to the C syntax *c_ptr
    *r3++ is equivalent to the C syntax *c_ptr++
    a1 is equivalent to the C declaration float temp

Some examples of simple DSP32C instructions show the similarity to C further:

    a1 = *r2    fetch a floating-point value from memory, pointed to by address register r2, and store it in the float register a1
    a1 = *r2++  as above but, having done so, increment address register r2 to point to the next floating-point value in memory
The general DSP32C instruction syntax shows the typical DSP processor's ability to perform a multiplication and an addition in a single instruction:

    a = b + c * d

Each term in the instruction can be any of the four floating-point registers, or up to three of the terms can access data through address registers used as pointers:

    a0 = a1 + a2 * a3      using only registers
    a0 = a1 + *r2 * *r3    using pointers for two memory reads

    a1 = a1 + *r2++ * *r3++    using pointers for memory reads and incrementing those pointers

Armed with this rudimentary knowledge of the processor's assembler syntax, we can substitute assembler variables for the C variables:

    temp:  a1 (floating-point register)
    y_ptr: r2 (address register used as a pointer)
    c_ptr: r3 (address register used as a pointer)
    x_ptr: r4 (address register used as a pointer)

The assembler can now be written underneath the C code, exploiting the great similarity of the assembler to C in this case:

    temp = *c_ptr++ * *x_ptr--;
        a1 = *r3++ * *r4--
    for (k = 1; k < N; k++)
        do 0,r1
    temp = temp + *c_ptr++ * *x_ptr--;
        a1 = a1 + *r3++ * *r4--
    *y_ptr++ = temp;
        *r2++ = a1

Note that for this processor, one line of C compiles down to one assembler instruction. The 'do 0,r1' instruction is an efficient and concise way to replace the loop control: it means 'do the next (0+1) instructions (r1+1) times'. This is an example of a 'zero overhead do loop': the processor supports this special instruction with no overhead at all for the actual execution of the loop control.

Real time
Both the naive FIR filter program and its more efficient version assume we can access the whole array of past input values repeatedly:

But this is not the case in real time. Real-time systems face a continuing stream of input data: often, they have to operate on one input sample at a time and generate one output sample for each input sample:

A similar restriction is likely if the filter program is implemented as a subroutine or function call. Only the current input and output are available to the filter, so the filter function itself has to maintain some history of the data and update this history with each new input sample. Management of the history takes up some processing time. The filter needs to know the most recent N input samples, so the real-time filter has to maintain a history array, which is updated with each new input sample by shifting all the history data one location toward 0:

The necessary updating of the history array involves simply adding two extra lines to the C program, to implement the array shifting:

The pointer to previous input samples, *x_ptr, is replaced by a pointer to the history array, *hist_ptr. A new pointer, *hist1_ptr, is initialised to point one location further down the history array, and is used in the shifting of data down the array. The two extra lines of C code represent extra computation: the filter now takes two lines of C code instead of one for the inner loop.

MIPS, MOPS and Mflops

The development of efficient assembly language code shows how efficient a DSP processor can be: each assembler instruction performs several useful operations. But it also shows how difficult it can be to program such a specialised processor efficiently.

    temp = *c_ptr++ * *x_ptr--;
        a1 = *r3++ * *r4--
    for (k = 1; k < N; k++)
        do 0,r1
    temp = temp + *c_ptr++ * *x_ptr--;
        a1 = a1 + *r3++ * *r4--
    *y_ptr++ = temp;
        *r2++ = a1

Bear in mind that we use DSP processors to do specialised jobs fast. If cost is no object, then it may be permissible to throw away processor power by inefficient coding: but in that case we would perhaps be better advised to choose an easier processor to program in the first place. A sensible reason to use a DSP processor is to perform DSP either at lowest cost or at highest speed. In either case, wasting processor power leads to a need for more hardware, which makes a more expensive system, which leads to a more expensive final product, which, in a sane world, would lead to loss of sales to a competitive product that was better designed. One example shows how essential it is to make sure a DSP processor is programmed efficiently:

The diagram shows a single assembler instruction from the Lucent DSP32C processor. This instruction does a lot of things at once:

Two arithmetic operations (an add and a multiply)
Three memory accesses (two reads and a write)
One floating-point register update
Three address pointer increments

All of these operations can be done in one instruction. This is how the processor can be made fast. But if we do not use all of these operations, we are throwing away the potential of the processor and may be slowing it down drastically.

Consider how this instruction can be translated into MIPS or Mflops. The processor runs with an 80 MHz clock. But to achieve four memory accesses per instruction it uses a modified von Neumann memory architecture, which requires it to divide the system clock by four, resulting in an instruction rate of 20 MIPS. If

we go into manic marketing mode, we can have fun working out ever-higher MIPS or MOPS ratings, starting from the 80 MHz clock, as follows:


20 MIPS = 20 MOPS
but 2 floating-point operations per cycle = 40 MOPS
and four memory accesses per instruction = 80 MOPS
plus three pointer increments per instruction = 60 MOPS
plus one floating-point register update = 20 MOPS
making a grand total MOPS rating of 200 MOPS

This exercise serves to illustrate two things:

MIPS, MOPS and Mflops are misleading measures of DSP power
Marketing men can squeeze astonishing figures out of nothing

Of course, we omitted to include in the MOPS rating (as some manufacturers do) the possibility of DMA on the serial port and parallel port, and all the associated increments of DMA address pointers; and if we had multiple comports, each with DMA, we could go really wild...

Apart from a cheap laugh at the expense of marketing, there is a very serious lesson to be drawn from this exercise. Suppose we only did adds with this processor? Then the Mflops rating falls from a respectable 40 Mflops to a pitiful 20 Mflops. And if we don't use the memory accesses, or the pointer increments, then we can cut the MOPS rating from 200 MOPS to 20 MOPS. It is very easy indeed to write very inefficient DSP code. Luckily it is also quite easy, with a little care, to write very efficient DSP code.
