Balakrishna Kumthekar (1), Luca Benini (2), Enrico Macii (3), Fabio Somenzi (1)
(1) University of Colorado, Boulder, CO 80309
(2) Università di Bologna, Bologna, ITALY 40122
(3) Politecnico di Torino, Torino, ITALY 10129
Abstract
1 Introduction
2 Background
$\forall_x f = f|_x \cdot f|_{x'}$ and $\exists_x f = f|_x + f|_{x'}$
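The two quantification operators can be illustrated on small functions. The following is a minimal sketch, assuming Boolean functions are represented as Python callables over 0/1 arguments; the helper names (`cofactor`, `forall`, `exists`, `truth_table`) are hypothetical and chosen only for this illustration.

```python
from itertools import product

def cofactor(f, var, value):
    """Return f with the argument at position `var` fixed to `value`."""
    def g(*bits):
        fixed = list(bits)
        fixed[var] = value
        return f(*fixed)
    return g

def forall(f, var):
    """Universal quantification: product of the two cofactors."""
    pos, neg = cofactor(f, var, 1), cofactor(f, var, 0)
    return lambda *bits: pos(*bits) & neg(*bits)

def exists(f, var):
    """Existential quantification: sum (OR) of the two cofactors."""
    pos, neg = cofactor(f, var, 1), cofactor(f, var, 0)
    return lambda *bits: pos(*bits) | neg(*bits)

def truth_table(f, n):
    """Enumerate f over all n-bit inputs in lexicographic order."""
    return [f(*bits) for bits in product((0, 1), repeat=n)]

# Example: f(a, b) = a AND b, quantifying variable a (position 0).
f = lambda a, b: a & b
forall_a = truth_table(forall(f, 0), 2)  # no value of b makes f true for both values of a
exists_a = truth_table(exists(f, 0), 2)  # equals b: some value of a makes f true iff b = 1
```

The quantified variable is kept as a dummy argument so that the truth tables stay comparable; the result no longer depends on it.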
The block diagram of the main loop of the optimization algorithm is shown in Figure 2. The first step consists of simulating
the FPGA network to estimate the power dissipation. The user
can provide long streams of typical input patterns, possibly coming
from behavioral or RT-level simulation. Alternatively, the input
probability distributions can be supplied. Within the loop, the
network is re-simulated every $N_{int}$ iterations in order to update
the switching statistics and to ascertain that the power consumption has decreased after every $N_{int}$ optimization
steps. To speed up the procedure, only a few patterns
($m$% of the total) are used for the in-loop simulations. Both
$N_{int}$ and $m$ are user-definable parameters.
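The role of the pattern stream and of the parameter m can be sketched as follows. This is only an illustration under stated assumptions: it takes the per-cycle values of the observed signals as already available (the network simulation itself is not modeled), and the prefix-based subsampling policy in `partial_stream` is hypothetical, not the selection rule used in the paper.

```python
def switching_activity(patterns):
    """Average number of toggles per clock transition, one value per signal."""
    if len(patterns) < 2:
        return [0.0] * (len(patterns[0]) if patterns else 0)
    width = len(patterns[0])
    toggles = [0] * width
    for prev, cur in zip(patterns, patterns[1:]):
        for k in range(width):
            toggles[k] += prev[k] != cur[k]
    return [t / (len(patterns) - 1) for t in toggles]

def partial_stream(patterns, m_percent):
    """Keep the first m_percent of the stream (assumed policy), at least two patterns."""
    count = max(2, len(patterns) * m_percent // 100)
    return patterns[:count]

# Four cycles of two signals.
patterns = [(0, 0), (1, 0), (0, 0), (1, 1)]
full = switching_activity(patterns)
partial = switching_activity(partial_stream(patterns, 50))
```

With a short stream the partial estimate can differ noticeably from the full one, which is why a complete simulation is still useful before and after the loop.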
[Figure residue. Recoverable labels: LUTs with inputs i1, i2, ..., in and outputs o1, o2, ..., on; signals q(o, x), x, z, f(i), h(x); blocks "Original LUTs", "Power Estimate (SW)", "Build Neighborhood", "Optimized LUTs".]
Once the neighborhood and the cluster members are identified,
the Boolean relation $F$ is computed through Equation 1. The
key optimization performed by our procedure consists of finding minimum-power compatible functions for re-programming
the LUTs in the cluster. In the following, let $f_j(i_j)$ and $f_j^{opt}(i_j)$,
$j = 1, \ldots, maxCluster$, represent the functions implemented by
the LUTs in the cluster before and after re-programming, respectively. Notice that the support variables $i$ of the multi-output cluster function $f(i)$ are the union of the $i_j$.
To enforce the constraint that the network connectivity be left
unchanged, the Boolean relation $F$ must be restricted according
to the constant support constraint to yield $R \subseteq F$. Relation $R$ is a
restriction of Boolean relation $F$ with the property that if functions $f_j^{opt}$, $j = 1, \ldots, maxCluster$, compatible with $F$ and with
the same support as the original $f_j$ exist, then they are compatible with $R$ as well. The usefulness of $R$ is that it eliminates
many compatible functions of $F$ that do not satisfy the support
constraints, without excluding any valid solution. We have followed the algorithm proposed by Kukimoto and Fujita [6] for the
computation of $R$. Unfortunately, although $R$ excludes many
functions that are compatible with $F$ but do not meet the
support constraint, it is not guaranteed to contain only valid
compatible functions. Hence, the correctness of the solutions extracted from Boolean relation $R$ must be checked against
the support constraints.
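The final check described above (compatibility with the relation plus the support constraint) can be sketched by exhaustive enumeration. This is a minimal sketch, assuming a single-output relation stored as a dictionary from input vectors to sets of allowed output values; the representation, the helper names, and the example relation are hypothetical, and a BDD-based implementation would be needed for realistic sizes.

```python
from itertools import product

def is_compatible(func, relation, n_inputs):
    """True if func(x) is an allowed output of the relation for every input x."""
    return all(func(x) in relation[x] for x in product((0, 1), repeat=n_inputs))

def support(func, n_inputs):
    """Indices of the inputs that func actually depends on."""
    deps = set()
    for x in product((0, 1), repeat=n_inputs):
        for k in range(n_inputs):
            flipped = x[:k] + (1 - x[k],) + x[k + 1:]
            if func(x) != func(flipped):
                deps.add(k)
    return deps

# Hypothetical relation over inputs (a, b): output fixed when a = b, free otherwise.
relation = {(0, 0): {0}, (0, 1): {0, 1}, (1, 0): {0, 1}, (1, 1): {1}}
candidate_and = lambda x: x[0] & x[1]   # depends on a and b
candidate_a = lambda x: x[0]            # depends on a only
candidate_not_a = lambda x: 1 - x[0]    # violates the relation

allowed_support = {0}  # assume the LUT is wired to input a only
valid = [
    is_compatible(c, relation, 2) and support(c, 2) <= allowed_support
    for c in (candidate_and, candidate_a, candidate_not_a)
]
```

Here `candidate_and` is compatible with the relation but fails the support check, which is the situation the text warns about.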
After $R$ has been computed, the LUTs in the cluster are sorted by decreasing switched capacitance $SW_j C_j$. Starting from the LUT with
the highest switched capacitance, the minimum-power compatible functions are computed. Assume that LUT $j$ has been selected. We
determine the lower bound $l_j(i_j)$ and the upper bound $u_j(i_j)$
for it. Function $l_j$ has the same support as $f_j$ and is the
function with minimum ON-set compatible with $R$, while $u_j$ is
the function with maximum ON-set (and the same support as $f_j$)
compatible with $R$. In symbols: $l_j(i_j) \le h(i_j) \le u_j(i_j)$, $\forall h(i_j)$
compatible with $R$.
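The meaning of the two bounds can be sketched on a small single-output relation. The sketch assumes the relation for the selected LUT is given as a dictionary from input vectors to sets of allowed output values (a hypothetical representation): the lower bound is 1 only where the output is forced to 1, and the upper bound is 1 wherever the value 1 is allowed.

```python
def bounds(relation_j):
    """Return (lower, upper) truth tables for a single-output relation."""
    lower = {x: int(allowed == {1}) for x, allowed in relation_j.items()}
    upper = {x: int(1 in allowed) for x, allowed in relation_j.items()}
    return lower, upper

def within_bounds(h, lower, upper):
    """Check lower(x) <= h(x) <= upper(x) for every input x."""
    return all(lower[x] <= h[x] <= upper[x] for x in lower)

# Hypothetical relation over two inputs: forced when both agree, free otherwise.
relation_j = {(0, 0): {0}, (0, 1): {0, 1}, (1, 0): {0, 1}, (1, 1): {1}}
lower, upper = bounds(relation_j)

# The OR function lies between the bounds; the constant 1 does not.
h_or = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
h_one = {x: 1 for x in relation_j}
```

In this example the lower bound is the AND of the two inputs and the upper bound is their OR, so every function between them is compatible with the relation.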
To compute $l_j$ and $u_j$ we first extract from $R$ the Boolean relation $R_j \subseteq R$, which contains all and only the functions with
support $i_j$:
$$R_j(i_j, o_j) = \forall_{i \in (i \setminus i_j)} \, \exists_{o \in (o \setminus o_j)} \, R(i, o) \qquad (2)$$

The bounds then follow as:

$$l_j(i_j) = R_j|_{o_j}(i_j, o_j) \cdot \overline{R_j|_{o_j'}(i_j, o_j)} \qquad (3)$$

$$u_j(i_j) = R_j|_{o_j}(i_j, o_j) \qquad (4)$$

    stop = iter = 0;
    frequency = N_int;
    do {
        if (iter % frequency == 0) {
            if (iter == 0)
                PerformCompleteSimulation();
            else {
                PerformPartialSimulation();
                if (Power is increased) {
                    Undo the last N_int changes;
                    Leave nodes locked;
                }
            }
        }
        sortedNodes = SortUnlockedNodes(network);
        foreach (node in sortedNodes) {
            if (node is not processed) {
                cluster = SelectCluster(node, maxCluster);
                neighbor = SelectNeighborhood(cluster, maxZNodes);
                F(i, o) = ComputeRelation(cluster, neighbor);
                R(i, o) = RestrictAccordingToSupportConstraints(F(i, o));
                ChooseCompatibleFunctions(R(i, o), potNodeTable);
            }
        }
        if (size of potNodeTable > 0)
            ReProgramAndLockBestNode(potNodeTable);
        else
            stop = 1;
        iter++;
    } while (stop == 0);
    PerformCompleteSimulation();
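The extraction of the relation for a single LUT (universal quantification over the inputs outside its support, existential quantification over the other outputs) can be sketched by brute-force enumeration. This is only an illustration: the relation is assumed to be a Python predicate over full input and output vectors, and `assemble`, `project`, and the example relation are hypothetical names and data, not part of the original tool.

```python
from itertools import product

def assemble(width, fixed, free_idx, free_vals):
    """Build a full bit vector from fixed {position: bit} entries and free positions."""
    vec = [0] * width
    for k, v in fixed.items():
        vec[k] = v
    for k, v in zip(free_idx, free_vals):
        vec[k] = v
    return tuple(vec)

def project(relation, n_in, n_out, in_idx, out_idx):
    """For all other inputs, there exist other outputs such that the relation holds."""
    other_in = [k for k in range(n_in) if k not in in_idx]
    other_out = [k for k in range(n_out) if k != out_idx]
    r_j = {}
    for ij in product((0, 1), repeat=len(in_idx)):
        for oj in (0, 1):
            r_j[(ij, oj)] = all(
                any(
                    relation(
                        assemble(n_in, dict(zip(in_idx, ij)), other_in, rest_i),
                        assemble(n_out, {out_idx: oj}, other_out, rest_o),
                    )
                    for rest_o in product((0, 1), repeat=len(other_out))
                )
                for rest_i in product((0, 1), repeat=len(other_in))
            )
    return r_j

# Hypothetical relation: the OR of two LUT outputs must equal a OR b.
relation = lambda i, o: (o[0] | o[1]) == (i[0] | i[1])
# LUT 0 is assumed to see input a only (position 0) and to drive output 0.
r0 = project(relation, n_in=2, n_out=2, in_idx=(0,), out_idx=0)
```

In the example, output 0 must be 0 when a = 0 and is free when a = 1, provided the other LUT compensates, which is why solutions taken from the restricted relation still have to be checked jointly.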
[Figure residue. Recoverable labels: SIS, .pat, .fpga, PREX, .cap, VIS, .i_stat, PSIM, .opt.]
CLB    PI    PO    Init.   Fin.    R     Sav. (%)   Time
70     41    32    155     118     22    23.8       343
94     41    32    199     101     77    48.9       765
113    10    6     156     117     52    25.3       221
115    60    26    167     151     26    9.4        881
150    33    25    254     200     52    21.4       1570
214    14    8     327     246     91    24.8       1696
307    233   140   330     313     59    5.1        1110
440    173   137   572     562     58    1.7        1582
470    50    22    652     482     206   26.2       17222
515    178   123   907     723     207   20.2       12443
628    16    1     652     627     39    3.8        1558
774    133   81    1170    787     102   32.7       1475
801    207   108   1393    1188    319   14.7       24920
830    257   224   919     782     163   14.9       6948
1406   256   245   1887    1341    461   28.9       28675
5 Conclusions
We have presented a technique to perform power-oriented reconfiguration of a system implemented using LUT-based FPGAs. Our approach has the distinctive property of being applicable to designs for which the layout has already been generated. Our method operates locally on the various LUT clusters
of the original network and performs best on large examples, as
demonstrated by the experimental results we have reported.
References