
Digital Signal Processing

BITS Pilani
Pilani|Dubai|Goa|Hyderabad


Previous class:
Importance of LTI system for DSP
Different decompositions of DT signals
Convolution
Computation required in DSP


Today's class:
Evolution of DSP architecture
Numeric Representation used in DSP
Fixed point
Floating point

Analysis of computation required for FIR filter

Can you write the expression for an 8-tap FIR filter?
y[n] = a0 x[n] + a1 x[n-1] + a2 x[n-2] + ... + a7 x[n-7]
The most frequently recurring computation is multiplication followed by accumulation (MAC).
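The 8-tap expression above can be sketched as a loop of multiply-accumulate steps. This is only an illustrative sketch: the function name fir_output, the coefficient values and the input samples are assumptions made for the example, and samples before x[0] are assumed to be zero.

```python
def fir_output(coeffs, samples, n):
    """Return y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples with a negative index are treated as zero (an assumption).
    """
    acc = 0  # the accumulator
    for k, a_k in enumerate(coeffs):
        if n - k >= 0:
            acc += a_k * samples[n - k]  # one multiply-accumulate (MAC)
    return acc


# Hypothetical data: 8 equal taps (a moving sum) and a ramp input.
coeffs = [1] * 8                 # a0 .. a7
samples = list(range(1, 11))     # x[0] .. x[9] = 1 .. 10

y7 = fir_output(coeffs, samples, 7)  # 1 + 2 + ... + 8
y9 = fir_output(coeffs, samples, 9)  # 3 + 4 + ... + 10
print(y7, y9)
```

Each output sample costs one MAC per tap, so an 8-tap filter needs 8 MACs per output.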

DSP processors
On chip data memory to store input sample x(n)
On chip memory to store filter coefficients h(n) (e.g. the
coefficients ai in the FIR filter example)
On chip memory to hold program or instructions that
support DSP operations
DSP processors require multiple operands
simultaneously. Hence, DSP processors should have
multiple operands fetch capacity and multiple memory
access in a single instruction cycle
Dedicated hardware multipliers and accumulators to
carry out multiply and accumulate operations
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956

DSP vs. GPP
DSP:
Real-time throughput requirement
Used in embedded applications
Special features are provided to support DSP computations such as FFT and convolution
Has a MAC unit
GPP:
No real-time throughput requirement
Desktop computing
No special features

DSP
Ordinary microprocessors need many clock cycles to perform add, shift and multiplication.
Most DSPs have specialized instructions to complete the add, shift and save of the result in 1 cycle.


DSP Architecture
Computers need instructions to operate
At every clock cycle they must be told what to do
If instructions are already stored, the computer has to
just fetch and execute them
These are called stored-program machines
Our computer has to fetch the instruction, operate on the
data and store the result
Courtesy : website www.elin.ttu.ee/~olev/lect1.pdf


What is the most suitable architecture for DSP?

Architectural evolution:
Von Neumann

Called the Von Neumann architecture.
Designed by John von Neumann, a Hungarian-American mathematician.
(1) Single memory and single bus for transferring the data into
and out of CPU
(2) Single memory shared by both the program instructions and
data.
Most computers today are of the Von Neumann design.

How many cycles are needed for a MAC instruction on two numbers that reside in external memory?
1. Get the opcode of instruction
(opcode specifies what the operation is, tells the CPU
what to do)
2. Get data1
3. Get data2
4. Multiply and accumulate and store result.
(Assume that CPU computation takes very small time in
comparison to memory access)
So four cycles are needed.
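The count above can be written as a small model. This is a sketch under the slide's assumption that each listed step costs one cycle on the single shared bus; the names VON_NEUMANN_STEPS and von_neumann_mac_cycles are made up for illustration.

```python
# Steps per MAC on a single shared memory and bus, in the order listed above.
# Assumption: each step occupies exactly one cycle.
VON_NEUMANN_STEPS = [
    "fetch opcode",
    "fetch data1",
    "fetch data2",
    "multiply, accumulate and store result",
]


def von_neumann_mac_cycles(num_macs):
    """Cycles for num_macs MAC instructions when no steps can overlap."""
    return num_macs * len(VON_NEUMANN_STEPS)


print(von_neumann_mac_cycles(1))  # one MAC
print(von_neumann_mac_cycles(8))  # one output sample of an 8-tap FIR filter
```

Under this model the cost grows linearly with the number of taps, because nothing overlaps on the single bus.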

Single-Cycle MAC unit

Can compute a sum of n products in n cycles

Harvard architecture

Developed at Harvard University (1940s)
Program instructions and data can be fetched at the same time, increasing overall processing speed.
Most present-day DSPs use this dual-bus architecture.
Ex: ADSP-21xx and AT&T's DSP16xx.

Cycles needed for a MAC instruction in the Harvard architecture?

1. Instruction 1 fetched.
2. Instruction 1 decode and get data from DM and coefficient
from PM
3. Perform MAC operation and store result in DM as well as
fetch Instruction 2 from PM.
4. Instruction 2 decode and get data from DM and coefficient
from PM
5. Perform MAC operation and store result in DM (for inst 2)
as well as fetch Instruction 3 from PM.
So a single MAC operation needs 3 cycles. Because the next fetch overlaps with the current MAC, each following MAC completes 2 cycles later.
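The five steps above follow a regular pattern, which the sketch below reproduces as a cycle-by-cycle schedule. It is a toy model that assumes the pattern continues unchanged for later instructions; harvard_schedule and harvard_total_cycles are hypothetical names.

```python
def harvard_schedule(num_macs):
    """Return {cycle: [events]} following the step list above.

    Assumption: instruction i is fetched in cycle 2i-1, decoded (with both
    operand reads) in cycle 2i, and executed in cycle 2i+1.
    """
    schedule = {}
    for i in range(1, num_macs + 1):
        fetch, decode, execute = 2 * i - 1, 2 * i, 2 * i + 1
        schedule.setdefault(fetch, []).append(f"fetch I{i} from PM")
        schedule.setdefault(decode, []).append(f"decode I{i}, read DM and PM")
        schedule.setdefault(execute, []).append(f"MAC I{i}, store to DM")
    return schedule


def harvard_total_cycles(num_macs):
    """Cycle in which the last MAC completes (0 if there are none)."""
    return max(harvard_schedule(num_macs), default=0)


for cycle, events in sorted(harvard_schedule(2).items()):
    print(cycle, "; ".join(events))
```

In this model the fetch of the next instruction shares a cycle with the current MAC, so the first MAC takes 3 cycles and every additional MAC adds 2.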

Modified Harvard architecture

Three memory banks
How many memory accesses are possible simultaneously?
Allows three independent memory accesses per instruction cycle.
Processors based on a three-bank modified Harvard architecture
include the Zilog Z893xx, Motorola DSP5600x, DSP563xx

Multiple-Access Memories
Using fast memories that
support multiple, sequential
accesses per instruction cycle
over a single set of buses
OR
Using multi-ported memories
that allow multiple concurrent
memory accesses over two or
more independent sets of buses.

This arrangement provides one program memory access and two data memory accesses per instruction word.
Ex: Motorola DSP561xx processors.

Super Harvard Architecture (SHARC DSP)

Part of program memory is used as data memory.
An instruction cache is included in the CPU.
In a program, which part is executed repeatedly?
The first time through a loop, operation is slower.
Subsequent executions of the loop are faster because the instructions come from the cache.
This means that all of the memory-to-CPU information transfers can be accomplished in a single cycle.
EX: ADSP-2106x and new ADSP-211xx

Enhanced DSP architectures:
Very Long Instruction Word (VLIW) architecture:
VLIW CPUs have four to eight execution units.
One VLIW instruction encodes multiple operations.
Ex: if a VLIW device has four execution units, then a VLIW instruction for that device would have four operation fields.
VLIW instructions are usually at least 64 bits in width.
VLIW CPUs use software (the compiler) to decide which operations can run in parallel.
Hardware complexity for instruction scheduling is reduced.
Ex: TMS320C6xx

Very Long Instruction Word (VLIW)

A technique for instruction-level parallelism: instructions without dependencies (known at compile time) are executed in parallel.
Example of a single VLIW instruction:
F=a+b; c=e/g; d=x&y; w=z*h;
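The four operations in the example can share one instruction because none of them reads or writes a variable that another one writes. The sketch below illustrates the kind of check a compiler could make; it is a simplified, conservative toy (greedy, in program order), not how any particular VLIW compiler works, and the names conflicts and pack_bundles are hypothetical.

```python
def conflicts(op, bundle):
    """True if op depends on (or clashes with) an operation already in bundle."""
    dest, srcs = op
    for other_dest, other_srcs in bundle:
        if other_dest in srcs:     # op reads a value written in this bundle
            return True
        if other_dest == dest:     # both write the same variable
            return True
        if dest in other_srcs:     # op overwrites a value read in this bundle
            return True
    return False


def pack_bundles(ops, slots=4):
    """Greedily pack (dest, sources) operations into bundles of at most `slots`."""
    bundles, current = [], []
    for op in ops:
        if len(current) == slots or conflicts(op, current):
            bundles.append(current)
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    return bundles


# F=a+b; c=e/g; d=x&y; w=z*h;  (the operator itself does not matter here)
slide_ops = [("F", ("a", "b")), ("c", ("e", "g")),
             ("d", ("x", "y")), ("w", ("z", "h"))]
print(len(pack_bundles(slide_ops)))  # number of VLIW instructions needed

# Hypothetical variant: c now depends on F, so it cannot share F's bundle.
dependent_ops = [("F", ("a", "b")), ("c", ("F", "g"))]
print(len(pack_bundles(dependent_ops)))
```

With four slots assumed, the example from the slide fits in a single bundle, while the dependent variant needs two.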

Endianness:
Big endian (most significant byte in first location)
Little endian (least significant byte in first location)
How will 12345678 be stored in four locations starting from 4000 in each case?
TI DSP: Little endian
Motorola DSP: Big endian
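One way to check the answer to the question above is with Python's built-in int.to_bytes. This sketch assumes that 12345678 is the 32-bit hexadecimal value 0x12345678 and that the four locations are byte-wide addresses 4000 to 4003; memory_layout is a hypothetical helper name.

```python
def memory_layout(value, base_address, byteorder):
    """Map each address to the byte (as two hex digits) stored there."""
    data = value.to_bytes(4, byteorder)  # byteorder is "big" or "little"
    return {base_address + i: f"{b:02X}" for i, b in enumerate(data)}


# Assumption: the value is hexadecimal and memory is byte-addressed.
big = memory_layout(0x12345678, 4000, "big")
little = memory_layout(0x12345678, 4000, "little")

for address in sorted(big):
    print(address, "big:", big[address], "little:", little[address])
```

Big endian places 12 at address 4000 and 78 at 4003; little endian reverses the order.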


Thanks for your attention
