Paper 5.2
66
The problem can now be formulated:

Minimize $\sum_{i} w_i \cdot d_i$

subject to

$\sum_{o_i \in FU_{t_k}} x_{ij} - M_{t_k} \le 0$, for $C_z \le j \le C_{z+1}$, $1 \le k \le m$; (7)

$\sum_{j=S_i}^{L'_i} x_{ij} = 1$, for all $o_i \in R_z$; (8)

Fig.1 Data Flow Graph

Fig.2 Distribution Graph of b e 1 Scheduling
$T'_i + (C_{z+1} + 1) \cdot d_i - T'_j - (C_{z+1} + 2) \cdot d_j \le -1$, for all $o_i \to o_j$. (11)
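
As a rough illustration of what constraints (7) and (8) demand of a candidate schedule for one zone, the following Python sketch checks them directly. The function name, the dictionary-based data layout, the `required_ops` argument, and the use of the zone bounds in place of each operation's own time frame are all assumptions made for this example; they are not part of the formulation.

```python
from collections import Counter


def satisfies_zone_constraints(schedule, unit_type, unit_limit, zone, required_ops):
    """Check a candidate zone schedule against constraints (7) and (8).

    schedule     : dict, operation name -> control step chosen (x_ij = 1)
    unit_type    : dict, operation name -> function-unit type
    unit_limit   : dict, function-unit type -> number of units available (M)
    zone         : (first_step, last_step) of the zone, inclusive
    required_ops : operations that must be placed inside this zone (assumed set)
    """
    first_step, last_step = zone

    # Constraint (8): each required operation occupies exactly one step of the zone.
    # A dict stores at most one step per operation, so only presence and range
    # need to be checked here.
    for op in required_ops:
        if op not in schedule or not first_step <= schedule[op] <= last_step:
            return False

    # Constraint (7): in every control step, the operations of one type
    # must not outnumber the available units of that type.
    usage = Counter((step, unit_type[op]) for op, step in schedule.items())
    return all(count <= unit_limit[kind] for (_, kind), count in usage.items())
```

A schedule that places two additions in the same step passes only when at least two adders are allowed; with a single adder the check fails on constraint (7).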
The number of variables in the formulation grows as $O(d \cdot n')$, where $d$ is the number of control steps considered in each zone and $n'$ is the number of operations whose time frame intersects the zone.

5. GENERALIZATIONS

We have generalized the ILP formulations to the following variations:
(1) Scheduling with
    a) Chaining
    b) Multi-cycle operations by non-pipelined function units
    c) Multi-cycle operations by pipelined function units
(2) Functional pipelining
(3) Loop folding
(4) Mutually exclusive operations
(5) Scheduling under bus constraint
(6) Minimizing lifetimes of variables

The formulations for scheduling with chaining, multi-cycle operations by non-pipelined function units and minimizing lifetimes of variables have been discussed in [11]. We concentrate on the other variations.

$T'_i + (C_{z+1} + 1) \cdot d_i - T'_j - (C_{z+1} + 1 + d_i) \cdot d_j \le -d_i$, for all $o_i \to o_j$. (11.1)

If an operation cannot be finished within the zone, it may still occupy some resources. Therefore, the formulation of the next zone must take into account the cross-zone operations both in resource utilization and precedence relations.

5.3. Functional Pipelining (Pipelined Data Path)

A pipelined data path allows the execution of multiple tasks concurrently. Two consecutive tasks can be initiated with a certain interval, which is called the latency of the pipelined data path.

For a given latency $l$, the operations in control steps $j + p \cdot l$ ($p = 0, 1, 2, \ldots$) are executed simultaneously and cannot share the same function units. Consequently, constraint (2) is modified as

$\sum_{p=0}^{\lfloor (s-j)/l \rfloor} \sum_{o_i \in FU_{t_k}} x_{i,\,j+p \cdot l} \le M_{t_k}$, for $1 \le j \le l$, $1 \le k \le m$. (2.2)
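
To make the effect of constraint (2.2) concrete, here is a small Python sketch that folds a schedule by the latency and counts how many function units of each type are then needed. The function name and the dictionary-based schedule representation are assumptions chosen for illustration; control steps are taken to be numbered from 1, as in the constraint, and every operation is assumed to occupy a single control step.

```python
from collections import Counter


def pipelined_unit_demand(schedule, unit_type, latency):
    """Return the number of function units of each type needed under pipelining.

    schedule  : dict, operation name -> control step (1-based)
    unit_type : dict, operation name -> function-unit type
    latency   : initiation interval l of the pipelined data path
    """
    # Steps j, j + l, j + 2l, ... overlap in time, so map them to one slot j.
    demand = Counter()
    for op, step in schedule.items():
        slot = (step - 1) % latency + 1
        demand[(slot, unit_type[op])] += 1

    # A type needs as many units as its busiest slot uses.
    needed = {}
    for (_, kind), count in demand.items():
        needed[kind] = max(needed.get(kind, 0), count)
    return needed
```

With a latency of 2, operations scheduled in steps 1 and 3 collide in the same slot and need separate units, whereas with a latency equal to the schedule length no folding occurs and the count reduces to the ordinary per-step demand.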
$N_{FU_2}$, ..., $N_{FU_n}$, etc. $N_{FU}$ can be defined as follows:

$N_{FU} = \max_{1 \le i \le n} N_{FU_i}$ if the node is an XOR node,
$N_{FU} = \sum_{i=1}^{n} N_{FU_i}$ if the node is an AND node,
$N_{FU} = x_{ij}$ if the node is a leaf.

Let $N_{FU_{t_k},j}(e)$ be the number of function units of type $t_k$ required at control step $j$, as illustrated in the above function. Constraint (2) is changed to

$N_{FU_{t_k},j}(e) \le M_{t_k}$, for $1 \le j \le s$, $1 \le k \le m$. (2.3)

5.5. Scheduling under Bus Constraint

In a bussed architecture, when several operations that share a common input variable are scheduled into the same control step, only one bus is needed for that variable (via broadcasting). Thus, the number of busses required at a control step equals the number of distinct input variables of all the operations assigned to this step. (The hypothesis is that the read/write phases are interleaved and the number of reads is more than that of writes at any step.) Suppose the input variables at control step $j$ are $v_1, v_2, \ldots, v_{|V|}$. We introduce a 0-1 integer variable $y_{r,j}$ for $v_r$ ($1 \le r \le |V|$) at step $j$, where $y_{r,j}$ is 1 if $v_r$ is accessed at this step and 0 if not accessed. We have the constraint that the number of $y_{r,j}$ which are assigned to be 1 does not exceed the number of busses $N_B$, i.e.

$\sum_{r=1}^{|V|} y_{r,j} \le N_B$, for $1 \le j \le s$. (14)

Since the transfer of variables during a control step ($y_{r,j}$) is directly related to the assignment of operations to a control step ($x_{i,j}$), we have to define the relationship between them. Let $v_r$ be a shared input of a group of $r_n$ operations, $o_{r_1}, o_{r_2}, \ldots$, and $o_{r_n}$. The value of $y_{r,j}$ is defined as follows: If $x_{r_1,j} = x_{r_2,j} = \ldots = x_{r_n,j} = 0$, then $y_{r,j}$ is given a value 0; otherwise, it is given a value of 1; i.e., $y_{r,j} = OR(x_{r_1,j}, x_{r_2,j}, \ldots, x_{r_n,j})$. The following constraint is included to satisfy the definition of $y_{r,j}$:

6. EXPERIMENTAL RESULTS

The system called ALPS has been implemented and tested. The programs for list scheduling, ASAP, ALAP, and ILP formulations are written in C on a VAX 11/8550 running ULTRIX operating system, and the ILP formulation is solved using the LINDO [15] package on a VAX 11/8800 running VMS operating system. LINDO starts with an optimal linear programming solution and produces an optimal integer solution using the branch-and-bound method. The fifth order wave filter which was borrowed from [16] is given to illustrate various requirements. It contains 26 additions and 8 multiplications. As most systems do, we suppose a multiplication takes 2 cycles while an addition takes 1 cycle to complete. The critical path length is 17 cycles. The run time for the various experiments of the example depends on the number of 0-1 variables and is within tens of seconds.

6.1. Non-pipelined data path

Tables 1 and 2 show the results with a non-pipelined data path. The multiplier can be non-pipelined (Table 1) or pipelined (Table 2). We also take into account the cost of busses. All the results are optimal.

Table 1: Non-pipelined Data Path

Table 2: Non-pipelined Data Path

6.2. Functional pipelining (pipelined data path)

The data flow graph of the fifth order filter is used to test functional pipelining. (There are no data dependencies between iterations; i.e., the outputs of the data flow graph will not feed back into the inputs.) We have achieved the minimal number of resources for each latency and we have also minimized the delay time. The results are shown in the first and second parts of Table 3 for non-pipelined and pipelined multipliers, respectively. The third part of Table 3 shows the results of [17] where the maximum delay is set at 10 cycles. Note that in their implementation, the cycle time is longer so that a multiplication or two additions can be executed within a single control step.

6.3. Loop folding

The critical path length of the fifth order filter can be reduced to 16 cycles after loop folding or retiming while preserving the inter-iteration data precedences. Tables 4 and 5 show the minimal sample period (= loop length) and delays using a non-pipelined multiplier and a pipelined multiplier. Here, delay means the number of control steps required for the entire DFG to be executed. Although delay time is ignored by other systems, we are concerned with it and try to minimize it for two reasons: First, with respect to the sample period, which corresponds to the throughput of the system, the delay time is directly related to the turn around time, which is one of the most important performance criteria. Second, a longer delay increases the lifetimes of the variables. Thus, minimizing the delay time will potentially reduce the register cost. Note that Spaid first retimed the DFG, then performed a scheduling to find the loop length (called clock cycle in Spaid). Our loop
folding technique performs retiming and scheduling simultaneously, which makes a better solution possible. Our scheduler is also able to make a scheduling under the self-timed [12] requirement.

Table 3: Fifth order Filter with Pipelined Data Path (parts: non-pipelined multiplier, pipelined multiplier, result of [17]; rows: Latency, Adders, Multipliers)

7. CONCLUSION

In this paper, we have presented a new approach for scheduling in data path synthesis under resource constraint. Our approach includes list scheduling, ASAP, ALAP and ILP. With it, we are able to solve all the benchmarks optimally in a few seconds. In addition to the model, a new technique, called Zone Scheduling, is proposed to solve large size problems. This method schedules a block of control steps at one time, allowing us to take a more global view of the scheduling problem. Excellent results are obtained when using it to solve a large size problem.

Table 4: Loop Folding

Table 5: Loop Folding (columns: System, Spaid, ALPS; row: Sample period)

REFERENCES

[1] D.D. Gajski, N.D. Dutt and B.M. Pangrle, "Silicon Compilation (Tutorial)", Proceedings of the IEEE 1986 Custom Integrated Circuits Conference, Rochester NY, pp. 102-110, May 1986.
[2] P.G. Paulin and J.P. Knight, "Force-Directed Scheduling in Automatic Data Path Synthesis", Proc. of the 24th Design Automation Conference, pp. 195-202, June 1987.
[3] B.M. Pangrle and D.D. Gajski, "State Synthesis and Connectivity Binding for Microarchitecture Compilation", Proc. of ICCAD-86, pp. 210-213, November 1986.
[4] E.F. Girczyc, "Automatic Generation of Microsequenced Data Path to Realize ADA Circuit Description", Ph.D. Thesis, Carleton Univ., 1984.
[5] N. Park and A.C. Parker, "Sehwa: A Software Package for Synthesis of Pipelines from Behavioral Specifications", IEEE T-CAD, pp. 356-370, March 1988.
[6] C.Y. Hitchcock and D.E. Thomas, "A Method of Automatic Data Path Synthesis", Proc. of the 20th Design Automation Conference, pp. 484-489, June 1983.
[7] P.G. Paulin and J.P. Knight, "Scheduling and Binding Algorithm for High Level Synthesis", Proc. of 26th Design Automation Conference, pp. 1-6, June 1989.
[8] L. Hafer and A.C. Parker, "A Formal Method for the Specification, Analysis, and Design of Register-Transfer Level Digital Logic", Proc. of the 18th Design Automation Conference, pp. 546-553, June 1981.
[9] H. De Man, J. Rabaey, P. Six and L. Claesen, "Cathedral-II: A Silicon Compiler for Digital Signal Processing", IEEE Design and Test, pp. 13-25, December 1986.
[10] B.S. Haroun and M.I. Elmasry, "Architectural Synthesis for DSP Silicon Compiler", IEEE T-CAD, pp. 431-447, April 1989.
[11] Jiahn-Hung Lee, Yu-Chin Hsu and Youn-Long Lin, "A New Integer Linear Programming Formulation for the Scheduling Problem in Data Path Synthesis", Proc. of ICCAD-89, November 1989.
[12] G. Goossens, J. Vandewalle, and H. De Man, "Loop Optimization in Register-Transfer Scheduling for DSP-System", Proc. of the 26th Design Automation Conference, pp. 826-831, June 1989.
[13] E.F. Girczyc, "Loop Winding - a Data Flow Approach to Functional Pipelining", Proceedings of the IEEE ISCAS, pp. 382-385, May 1987.
[14] F. Rose, C. Leiserson, and J. Saxe, "Optimizing Synthesis Circuitry by Retiming", Proc. CalTech Conf. on VLSI, pp. 41-67, Computer Sci. Press, 1983.
[15] "LINDO: Linear INteractive and Discrete Optimizer for Linear, Integer, and Quadratic Programming Problems," LINDO Systems, Inc.
[16] S.Y. Kung, H.J. Whitehouse and T. Kailath, "VLSI and Modern Signal Processing", Prentice Hall, pp. 258-264, 1985.
[17] Ki Soo Hwang, Albert E. Casavant, Ching-Tang Chang and Manuel A. d'Abreu, "Scheduling and Hardware Sharing in Pipelined Data Paths", IEEE Proc. Int. Conf. CAD, pp. 24-27, November 1989.