
Digital Filtering In Hardware

Representations of DSP Algorithms
• Mathematical formulations
• Behavioral description languages
• Applicative language
• Represents a set of equations satisfied by the variables, e.g. Silage
• Prescriptive language
• Explicitly specify the order of assignment, e.g. C and other HLLs
• Descriptive language
• Represents the structure of a DSP system, e.g. VHDL, Verilog
• Graphical representations
• Block diagrams
• Signal flow graph (SFG)
• Data flow graph (DFG)
• Dependence graph (DG)
VLSI DSP 2019 3-2
Block Diagrams (1)
• Consists of functional blocks connected with directed edges
• Functional block, e.g. Add, Mult
• Unit delay element
• Directed edge representing the data flow between blocks
• Basic blocks

VLSI DSP 2008 Y.T. Hwang 3-3


Block Diagrams (2)
• 3-tap FIR example

• Alternative block diagram with data broadcast



Signal Flow Graph (1)
• A collection of nodes and directed edges
• Node: computation or task
• Directed edge (j,k)
• a linear transformation from node j to node k
• Usually as constant gain multiplier or delay elements
• Widely used in digital filter structures
• Flow graph reversal (transposition)
• A transform to obtain equivalent structure
• Applicable to single-input single-output (SISO) systems
• Reverse the directions of all edges
• Exchange the input and output nodes
• Retain the edge gain and edge delay



Signal Flow Graph (2)
• SFG of a 3-tap FIR filter
Original SFG

Transposed SFG
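The equivalence of the original and transposed structures can be checked numerically. The following is a minimal sketch, assuming zero initial register contents and at least two taps; the function names and the test coefficients are illustrative and not taken from the slides.

```python
def fir_direct(x, h):
    """Direct form: delay the input, then multiply and sum."""
    taps = [0] * len(h)            # taps[k] holds x(n-k)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]
        out.append(sum(c * t for c, t in zip(h, taps)))
    return out


def fir_transposed(x, h):
    """Transposed form: broadcast the input, delay the partial sums."""
    state = [0] * (len(h) - 1)     # registers between the adders
    out = []
    for sample in x:
        products = [c * sample for c in h]
        out.append(products[0] + state[0])
        # Each register takes the next product plus the register behind it.
        state = [
            products[k + 1] + (state[k + 1] if k + 1 < len(state) else 0)
            for k in range(len(state))
        ]
    return out


h = [1, 2, 3]                      # hypothetical 3-tap coefficients
x = [1, 0, 0, 2, 5]
print(fir_direct(x, h))
print(fir_transposed(x, h))
```

Both calls print the same sequence, which is what flow graph reversal promises for a single-input single-output system.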



Signal Flow Graph (3)
• Limitations of transposition
• Can be applied to MIMO systems only if they are described by symmetric transform matrices
• More on SFG
• Applicable to linear networks
• Cannot be used to describe multi-rate systems



Data Flow Graph (1)
• DFG
• Node: computation (function or subtask)
• Directed edge: data path or communication between nodes
• Associated edge delay: non-negative
• Associated node delay: execution time of each node
[Figure: the same computation (add and mpy nodes) drawn as a block diagram, a conventional DFG, and a synchronous DFG]


Data Flow Graph (2)
• Applications: high level synthesis
• Firing rules
• A node can fire whenever all the input data are available
• Concurrency: multiple nodes can be fired simultaneously
• Data driven (implicit) scheduling
• Precedence constraint
• Intra-iteration: imposed by edge with no delay
• Inter-iteration: imposed by edge with delay
• Fine-grain (atomic) vs. coarse-grain DFG
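The firing rule and the intra-iteration precedence constraint can be sketched as a small data-driven scheduler. This is an illustrative sketch only: it ignores node execution times, treats data arriving over delayed edges as already available at the start of the iteration, and uses a hypothetical edge list for a 3-tap direct-form FIR filter with node names M0..M2 and A0..A1.

```python
def firing_levels(nodes, zero_delay_edges):
    """Group nodes into steps; all nodes within one step can fire concurrently."""
    pending = {n: set() for n in nodes}
    for src, dst in zero_delay_edges:
        pending[dst].add(src)          # dst must wait for src in this iteration

    fired, levels = set(), []
    while len(fired) < len(nodes):
        # A node is ready once every zero-delay predecessor has fired.
        ready = sorted(n for n in nodes if n not in fired and pending[n] <= fired)
        if not ready:
            raise ValueError("zero-delay loop: no node can fire")
        levels.append(ready)
        fired.update(ready)
    return levels


# Assumed intra-iteration edges of a 3-tap direct-form FIR DFG.
nodes = ["M0", "M1", "M2", "A0", "A1"]
edges = [("M0", "A0"), ("M1", "A0"), ("A0", "A1"), ("M2", "A1")]
print(firing_levels(nodes, edges))
```

The three multipliers land in the first step because nothing within the iteration precedes them, which is the concurrency that data-driven scheduling exposes.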



Data Flow Graph (3)
• 3-tap FIR filter example
Direct form

Transpose form



Data Flow Graph (4)
• Synchronous DFG
• Number of data samples produced or consumed by each node is specified a priori
• Single rate system
• Multi-rate system: different nodes working on different frequencies
• Multi-rate system can be represented by a single rate system via unfolding (unrolling)



Introduction to Iteration bound

• DSP algorithms often contain feedback loops


• Impose an inherent lower bound on the achievable iteration or sample period
• Iteration bound
• Impossible to achieve an iteration period less than the iteration bound even with infinite HW
[Figure: timeline along t showing iterations k-1, k, k+1 and k+2, with the iteration period marked]
Data Flow Graph Representations
• For n = 0 to ∞
y(n) = ay(n-1) + x(n)
[Figure: DFG of this recursion with nodes A and B; annotations mark the execution time of a node, the intra-iteration edge, the inter-iteration edge, and the critical path]

• Iteration – execution of each DFG node once


• Precedence constraints
• Intra-iteration – no delay on edge
• Inter-iteration – at least one delay on edge



Critical Path
• Critical path of a DFG
• The path with the longest computation time among all paths containing zero delays
• The minimum computation time for one iteration of the DFG
• 6→3→2→1
• 5→3→2→1
• Iteration period = 5 u.t.
• Iteration bound
• A recursive DFG has a lower bound on its iteration period



Loop bound and iteration bound (1)
• Loop bound
• Minimum time to execute one loop in the DFG
• tl / wl: tl = loop computation time, wl = number of delays in the loop

• (a) loop bound = (4+2)/2 = 3


• (b) loop bound 1 = (4+2)/2 = 3
• (b) loop bound 2 = (2+4+5)/1 = 11



Loop bound and iteration bound (2)
• In (a), two independent sets of computing threads
• Two iterations in every 6 u.t. → iteration period = 3 u.t.
• A0→B0→A2→B2→A4→B4→A6→…
• A1→B1→A3→B3→A5→B5→A7→…

• In (b)
• Loop 1: A→B→A
• Loop 2: A→B→C→A (critical loop)



Loop bound and iteration bound (3)
• Loop bound of the critical loop = iteration bound of the DSP algorithm
$$T_\infty = \max_{l \in L} \left\{ \frac{t_l}{w_l} \right\}$$
• For example (b):
$$T_\infty = \max_{l \in L} \left\{ \frac{6}{2}, \frac{11}{1} \right\} = 11 \text{ u.t.}$$

• Algorithms to find T∞
• Longest path matrix algorithm
• Minimum cycle mean algorithm
• Negative cycle detection algorithm
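For a graph as small as example (b), the iteration bound can also be found by brute force: enumerate every loop and take the largest loop bound. The sketch below does exactly that and is not one of the three algorithms listed above. The node times (A = 2, B = 4, C = 5) and the per-loop delay totals (2 and 1) follow the slides; the placement of the delays on individual edges is an assumption, since the figure is not reproduced here.

```python
def iteration_bound(node_time, edges):
    """Return (bound, loops); edges are (src, dst, delay_count) triples."""
    succ = {}
    for src, dst, delays in edges:
        succ.setdefault(src, []).append((dst, delays))
    order = sorted(node_time)
    loops = []

    def walk(start, node, path, delays):
        for nxt, d in succ.get(node, []):
            if nxt == start:
                loops.append((path, delays + d))
            # Only visit nodes ordered after start, so each loop is found once.
            elif nxt not in path and order.index(nxt) > order.index(start):
                walk(start, nxt, path + [nxt], delays + d)

    for start in order:
        walk(start, start, [start], 0)

    bounds = []
    for path, delays in loops:
        if delays == 0:
            raise ValueError("delay-free loop: the DFG is not computable")
        bounds.append(sum(node_time[n] for n in path) / delays)
    return max(bounds), loops      # max() fails if the graph has no loop


node_time = {"A": 2, "B": 4, "C": 5}
# Assumed delay placement: 2D on B->A, D on C->A, no delay on the other edges.
edges = [("A", "B", 0), ("B", "A", 2), ("B", "C", 0), ("C", "A", 1)]
bound, loops = iteration_bound(node_time, edges)
print(bound, loops)
```

Exhaustive enumeration grows quickly with graph size, which is one reason the dedicated algorithms listed above exist.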



Cut-set Retiming
• Delay transfer theorem
• Feed-forward cut-set: adding an arbitrary non-negative number of delays to each edge of a feed-forward cut-set of a DFG will not alter its output, except that the output timing will be delayed.
• Feed-back cut-set: transferring the same number of delays from the edges of one direction across a feed-back cut-set of a DFG to all edges of the opposing direction across the same cut-set will not alter the output, only its timing.

(C) 2004-2006 by Yu Hen Hu


Feed-forward Cut-Set Retiming
• Consider the FIR digital filter and its DFG:
y(n) = b0x(n) + b1x(n-1)
• Critical path length = TM+TA
• Select a cut set
• Insert a delay on each edge in the cut set.
• Retiming:
ynew(n) = b0x(n-1) + b1x(n-2)
ynew(n) = y(n-1)
• Critical path = Max(TM, TA)
[Figure: the original DFG (x(n), one delay D giving x(n-1), multipliers b0 and b1, adder producing y(n)) and the retimed DFG with an extra delay D after each multiplier]
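The relation ynew(n) = y(n-1) can be checked with a short simulation. This is a sketch under the assumption of zero initial register contents; the function names and test values are illustrative.

```python
def fir2(x, b0, b1):
    """Original filter: y(n) = b0*x(n) + b1*x(n-1), with x(-1) = 0."""
    x_prev = 0
    out = []
    for s in x:
        out.append(b0 * s + b1 * x_prev)
        x_prev = s
    return out


def fir2_retimed(x, b0, b1):
    """Same filter with one register on each cut-set edge (after each multiplier)."""
    x_prev = 0
    reg0 = reg1 = 0                        # pipeline registers on the cut-set edges
    out = []
    for s in x:
        out.append(reg0 + reg1)            # the adder sees last cycle's products
        reg0, reg1 = b0 * s, b1 * x_prev   # the multipliers refill the registers
        x_prev = s
    return out


x = [1, 2, 3, 4]
y = fir2(x, 2, 3)
y_new = fir2_retimed(x, 2, 3)
print(y)        # [2, 7, 12, 17]
print(y_new)    # [0, 2, 7, 12]
```

The retimed output is the original output delayed by one sample, while the multiplier and the adder no longer sit in the same register-to-register path.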



Feed-back Cut Set Retiming
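• Consider an IIR digital filter
y(n) = a·y(n-2) + x(n)
• Before retiming: loop bound = (TM+TA)/2, clock cycle = TM+TA
• Shift 1 delay to the other edge across a feed-back cut set
• After retiming: loop bound = (TM+TA)/2, clock cycle = Max(TM, TA)
• Filter remains unchanged.
[Figure: the original DFG with 2D on the feedback path through multiplier a, and the retimed DFG with the two delays split as D and D across the cut set]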
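That the filter remains unchanged can be confirmed by simulating both register placements. This sketch assumes zero initial register contents and assumes the retimed structure keeps one register before the multiplier and one after it; the exact edges in the slide's figure may differ.

```python
def iir_original(x, a):
    """y(n) = a*y(n-2) + x(n) with both delays in series before the multiplier."""
    d1 = d2 = 0                    # d1 = y(n-1), d2 = y(n-2)
    out = []
    for s in x:
        y = a * d2 + s             # multiply and add within one clock cycle
        d1, d2 = y, d1
        out.append(y)
    return out


def iir_retimed(x, a):
    """Same filter with one delay moved to the multiplier output."""
    r_in = r_out = 0               # register before and after the multiplier
    out = []
    for s in x:
        y = r_out + s              # only the adder is in this path
        r_in, r_out = y, a * r_in  # the multiplier works on last cycle's y
        out.append(y)
    return out


x = [1, 1, 1, 1, 1]
print(iir_original(x, 2))   # [1, 1, 3, 3, 7]
print(iir_retimed(x, 2))    # [1, 1, 3, 3, 7]
```

Each register-to-register path now contains either the multiplier or the adder, which matches the clock cycle of Max(TM, TA) after retiming.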



Timing Diagram
• Assume tM = tA = 1 t.u.
• Before retiming
x(1) x(2) x(3) x(4)
MAC 1 2 3 4
y(1) y(2) y(3) y(4)

• After retiming

x(1) x(2) x(3) x(4) x(5) x(6) x(7) x(8)


Add 1 2 3 4 5 6 7 8
y(1) y(2) y(3) y(4) y(5) y(6) y(7) y(8)
a y(1)

Mul 0 1 2 3 4 5 6 7 8
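Feed-back Cut Set Retiming
• Consider an IIR digital filter
y(n) = ay(n-1) + x(n)
• loop bound = (TM+TA)
• throughput = 1/(TM+TA)
• Slowed-down version: delay D replaced by 2D, time index m, input interleaved with zeros
x(2k-1) = x(k), x(2k) = 0
• Clock period = (TM+TA)
• Throughput = 1/[2(TM+TA)]
[Figure: the original DFG with one delay D on the feedback path through multiplier a, and the slowed-down DFG with 2D on the same path]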
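The effect of replacing D with 2D and interleaving zeros into the input can be checked with a short simulation. This is a sketch assuming zero initial conditions; Python lists are 0-based, so the slide's x(2k-1) = x(k), x(2k) = 0 becomes "even positions carry data, odd positions carry zeros".

```python
def iir1(x, a):
    """Original filter: y(n) = a*y(n-1) + x(n)."""
    y_prev, out = 0, []
    for s in x:
        y_prev = a * y_prev + s
        out.append(y_prev)
    return out


def iir_2slow(x_slow, a):
    """Slowed-down filter: y(m) = a*y(m-2) + x(m)."""
    d1 = d2 = 0                    # d1 = y(m-1), d2 = y(m-2)
    out = []
    for s in x_slow:
        y = a * d2 + s
        d1, d2 = y, d1
        out.append(y)
    return out


def interleave_zeros(x):
    """Insert a zero after every input sample."""
    out = []
    for s in x:
        out += [s, 0]
    return out


x = [1, 2, 3]
y = iir1(x, 2)
y_slow = iir_2slow(interleave_zeros(x), 2)
print(y)        # [1, 4, 11]
print(y_slow)   # [1, 0, 4, 0, 11, 0]
```

Every other output of the slowed-down filter reproduces the original output, which is why the throughput is halved at the same clock period.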



Slowdown + Retiming
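• Start with y(n) = a y(n-1) + x(n)
• After slowdown and retiming: clock cycle = Max(TM, TA)
• Throughput = 1/[2 Max(TM, TA)]
• Start with y(n) = a y(n-2) + x(n)
• loop bound = (TM+TA)/2
• After retiming: clock cycle = Max(TM, TA)
• Throughput = 1/Max(TM, TA)
[Figure: two retimed DFGs, one with input x(m) and output y(m), one with input x(n) and output y(n); each has one delay D on either side of multiplier a]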



Example 3.2.1
• Node delay = 1 t.u.
• Before retiming:
• Critical path: a3 → a4 → a5 → a6
• Clock cycle time = 4
• 2 delay units
• After cut-set retiming
• Critical path: a3 → a5, a4 → a6
• Clock cycle time = 2
• 6 delay units
• After additional retiming
• Critical path: none
• Clock cycle time = 1
• 11 delay units
[Figure: three DFGs with nodes a1 to a6, showing the delay placement before retiming, after cut-set retiming, and after additional retiming]
DFG Illustration of the Example

T = max. {(1+2+1)/2, (1+2+1)/3} = 2 T = max. {(1+2+1)/2, (1+2+1)/3} = 2


Cr. Path delay = 2+1 = 3 t.u Cr. Path Delay = max{2,2,1+1} = 2 t.u



Dependence Graph (DG)
• The basic representation of an algorithm.
• Shows only dependency among operations.
• No notion of delay is represented.
• No loops or cycles allowed.
• No implementation or hardware constraints are imposed on DG.
• Can be used to represent asynchronous operations.
• Most useful in exploiting inherent parallelism in the algorithm

(C)2002-2006 Yu Hen Hu 26
Data Flow Graph
• Node:
• Computation
• Associated with a computing time.
• Directed edge:
• Data path and delay
• Delay labeled with D or positive integer on edges
• Delay: iteration count
• Example
y(n) = a*y(n-1) + b*u(n)
• The delay of 1 u.t. indicates that computing y(n+1) in the next iteration depends on the result y(n) of the present iteration.

DFG
• Intra-iteration dependency
• A directed edge without any delay
• Inter-iteration dependency
• Directed edge with 1 or more delays
• Node computing delay labeled with parenthesis.
• Critical path: longest path between …
• Example: critical path delay = 4+2+2 = 8 t.u.
• Recursive DFG: contains loops. Must have at least one delay element along any loop. Otherwise, the algorithm is NON-computable!
[Figure: DFG with input x(n) passing through two delays D, multiplier nodes M0, M1, M2 (4 t.u. each), adder nodes A0, A1 (2 t.u. each), and output y(n)]
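The critical path delay of 8 t.u. can be reproduced by searching for the longest path over zero-delay edges only. This is a minimal sketch: the node times come from the slide, while the edge list is an assumed reading of the figure (M0 and M1 feed A0, A0 and M2 feed A1), and the zero-delay subgraph is assumed to be acyclic.

```python
from functools import lru_cache


def critical_path(node_time, zero_delay_edges):
    """Longest total computation time along any path of zero-delay edges."""
    succ = {n: [] for n in node_time}
    for src, dst in zero_delay_edges:
        succ[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Time of this node plus the longest zero-delay path that follows it.
        tail = max((longest_from(n) for n in succ[node]), default=0)
        return node_time[node] + tail

    return max(longest_from(n) for n in node_time)


node_time = {"M0": 4, "M1": 4, "M2": 4, "A0": 2, "A1": 2}
# Assumed zero-delay edges of the FIR DFG in the figure.
edges = [("M0", "A0"), ("M1", "A0"), ("A0", "A1"), ("M2", "A1")]
print(critical_path(node_time, edges))   # 8
```

Edges that carry one or more delays are left out of the edge list on purpose, because a delay breaks the combinational path between two nodes.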

Loop bound and Iteration bound
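$$T_{loop} = \frac{\sum_{i \in loop} t_i}{\sum_{i \in loop} d_i}$$
$$T_\infty = \max_{\text{all loops}} \{ T_{loop} \}$$
[Figure: two DFGs. The first has nodes A (2), B (4) and C (5) with edge delays D and 2D; the second has nodes A (2) and B (4) with an edge delay of 2D]
• T{A-B-A} = (2+4)/2 = 3 t.u.
• T∞ = max{(2+4)/2, (2+4+5)/1} = max{3, 11} = 11 t.u.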
