Eric Kusse
EECS Department
University of California at Berkeley
Berkeley, CA, USA
(now with Intel Corp., Hillsboro, OR)
ekusse@ichips.intel.com

Jan Rabaey
EECS Department
University of California at Berkeley
Berkeley, CA, USA
jan@eecs.berkeley.edu
1. ABSTRACT
1.1 Keywords
FPGAs, Low Energy, Dual Voltage, Pass-transistors, Power,
Embedded, Low Swing, Interconnect Network.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. To copy otherwise, to republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee.
2. INTRODUCTION
With the trend towards integration of complete system
functionality on a single die (systems-on-a-chip), the need
has arisen for the combination of heterogeneous
programmable architectures on a chip [5]. A variety of
companies [3], [4] and academic research groups are
currently striving to integrate traditional microprocessors
and/or DSP processors with large macromodules of embedded
programmable gate arrays (PGAs). The intended function of these
programmable logic modules (PLMs) varies from user-modifiable
periphery to the implementation of high-performance
signal-processing functions that are not
efficiently implemented on traditional programmable
architectures. While this approach can have a major impact
on the performance and flexibility of these future
systems-on-a-chip, it is severely hampered by the energy inefficiency
of today's FPGA implementations. Energy (and/or power)
consumption has become a major concern in IC design and

Design Example              Vdd            Energy
(unlabeled)                 5 V            4.2 mW/MHz
(unlabeled)                 3.3 V          92 uW/MHz
(unlabeled)                 unspecified    7.5 uW/gate
(unlabeled)                 3.3 V          5.5 uW/MHz
54x54 Multiplier            2.5 V          2.23 mW/MHz
DSP Processor               1 V            0.21 mW/MHz
StrongArm Microprocessor    1.5 V          2.1 mW/MHz
Alpha Microprocessor        2 V            60 mW/MHz
Component           Energy (mW/MHz = nJ)    Estimated Cap. (pF)
(unlabeled)         0.025                   1.10
(unlabeled)         0.062, 0.140            2.5, 5.6
(unlabeled)         0.040-0.050             1.95-2.4
[garbled label: "11110 Power"]
(unlabeled)         0.041                   1.64
(unlabeled)         0.054, 0.088            2.7, 4.4
(unlabeled)         0.022, 0.040            1.1, 2.2
Double Line         0.06-0.107              3-5.35
Single Line         0.048-0.088             2.4-4.4
Clock Connection    0.030                   1.5
(unlabeled)         0.128                   6.4
Carry Chain         0.050                   2.5
[Figure: breakdown of power by component. Interconnect power 65%, clock power 21%, IO power 9%; the label of the remaining 5% slice is not recoverable.]
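[Figure: breakdown of interconnect power by line type. Legend: double lines, long lines, CLB input lines, CLB output lines. Recoverable values: CLB input lines 21%, single lines 37%, double lines 27%, carry lines 3%, and one 10% slice whose label is not recoverable. A second chart shows single lines 50% and double lines 37%.]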
[Figure: cell interconnect. Recoverable labels: direct path interconnect, level-1 switch matrix, level-1 and horizontal 4x lines.]
One can see that most of the direct interconnect flows either
vertically or diagonally. The reason for this is that the level-1
interconnect (Figure 6) is better suited for horizontal
routing along the dataflow direction. However, by including
the vertical and diagonal connections, many paths avoid
using the more general-purpose interconnect layer, allowing
that layer to be greatly
simplified. In addition, many mapping targets (adders,
comparators, etc.) require data to be passed across the width
of a datapath from bit-slice to bit-slice; this task is extremely
[Figure: level-1 interconnect. Recoverable labels: level 1, horizontal 4x, vertical 4x, switch matrix.]
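The bit-slice to bit-slice dependency mentioned above can be illustrated with a ripple-carry adder, in which each slice hands its carry to the neighboring slice. The following is a minimal Python sketch of that dataflow only. The function names (`ripple_carry_add`, `to_bits`, `from_bits`) and the 5-bit width are assumptions made for illustration; they do not describe the actual cell or any mapping tool.

```python
def to_bits(value, width):
    """Return the little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-width bit lists one slice at a time.

    Each loop iteration models one bit-slice: it consumes the carry
    produced by the previous slice and passes a new carry to the next.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)       # sum output of this slice
        carry = (a & b) | (carry & (a ^ b))  # carry handed to the next slice
    return sum_bits, carry


if __name__ == "__main__":
    width = 5  # illustrative width, an assumption
    s, c = ripple_carry_add(to_bits(11, width), to_bits(6, width))
    print(from_bits(s), c)
```

The carry signal is the only value that crosses slice boundaries here, which is why a short neighbor-to-neighbor connection is sufficient for this kind of mapping target.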
[Figure: low-swing interconnect circuit. Recoverable labels: low Vt (400 mV), Vdd = 2 V, VddL = 1.5 V, Vswing = 1.1 V, Cinterconnect, inverters.]
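Path              Energy (pJ)       Cap. (fF)
A or B Input      0.275 (145x)      95 (20x)
C Input           0.34 (147x)       140 (17x)
(unlabeled)       0.265 (94x)       120 (9x)
(unlabeled)       0.33 (124x)       180 (9x)
(unlabeled)       0.115 (417x)      55 (43x)
(unlabeled)       0.135 (355x)      60 (40x)
(unlabeled)       0.14 (343x)       65 (37x)
(unlabeled)       0.42 (143x)       200 (15x)
(unlabeled)       0.435 (138x)      220 (13.6x)
(unlabeled)       0.066             40
(unlabeled)       0.13 (115x)       60 (12.5x)
(unlabeled)       0.063             28
(unlabeled)       0.13              32.5
(unlabeled)       0.13              33
(unlabeled)       0.08              20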
From the data, one can see that the cell capacitances and
energy have been greatly reduced. The average energy
reduction for the various components was over two orders
of magnitude. This substantial improvement in energy
consumption comes from a combination of lower voltage
and lower capacitances. The average capacitance of
resources was lowered by 10x-15x. Much of the decrease in
capacitance can be attributed to efficient sizing of drivers,
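How lower voltage and lower capacitance combine can be sketched with the first-order switching-energy model E = C * V^2. The Python sketch below is illustrative only: the 5 V reference supply and the 1.5 V supply for the new cell are assumptions, and the helper names are hypothetical. Under those assumptions, a 10x-15x capacitance reduction alone falls short of two orders of magnitude, while capacitance and voltage together exceed it.

```python
def switching_energy_pj(cap_ff, vdd):
    """First-order dynamic energy C * V^2.

    cap_ff is in femtofarads and vdd in volts, so C * V^2 is in
    femtojoules; the result is converted to picojoules.
    """
    return cap_ff * vdd ** 2 / 1000.0


def energy_reduction(cap_ratio, v_old, v_new):
    """Energy reduction factor from a capacitance ratio and a supply change."""
    return cap_ratio * (v_old / v_new) ** 2


if __name__ == "__main__":
    V_OLD = 5.0  # assumed supply of the reference FPGA
    V_NEW = 1.5  # assumed supply of the low-energy cell
    for cap_ratio in (10, 15):  # capacitance reduction range from the text
        factor = energy_reduction(cap_ratio, V_OLD, V_NEW)
        print(f"{cap_ratio}x capacitance reduction -> {factor:.1f}x energy reduction")
```

As a sanity check on the model, `switching_energy_pj(20, 2.0)` evaluates to 0.08 pJ for 20 fF switched at an assumed 2 V. The real relationship between energy and capacitance can deviate from C * V^2 when the signal swing is lower than the supply.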
5. Results
The layout of a mini-array of 8 cells (4x2) is shown in
Figure 9. From this design, the following table was
extracted listing energy and extracted capacitance data for
the paths in the PGA basic cell. When looking at the table,
one should remember that the relationship between the
6. Summary
In this paper, we presented an analysis of the sources of
power dissipation in FPGA modules. We concluded that
interconnect accounts for the large majority of the energy
dissipation. We have presented a logic module that
addresses some of the problems posed by the interconnect,
as well as a dual-voltage circuit design approach that helps
to reduce the supply voltage while maintaining
performance.
[Figure: bit-sliced datapath example. Recoverable labels: operand inputs a0-a4 and b0-b4, and five blocks labeled D.]
7. Acknowledgments
We would like to acknowledge DARPA's support for the
Pleiades project (under the ACS program). Also, we want
to give credit to our colleagues in the project, who provided
valuable inspiration and insight. We are grateful to Scott
Hauck for making his designs available for study.
8. References
(December 1997).
[3] "Motorola chip to combine ColdFire, FPGA cores", http://techweb.cmp.com/eet/news/98/992news/motorola.html.
[4] National Semiconductor's Adaptive Systems on-a-Chip, http://www.national.com/appinfo/milaero/napa1000.
[5] Rabaey, J., et al., "Heterogeneous Reconfigurable Systems", in Proc. SiPS 97, Leicester, (Nov. 1997), 24-34.
[6] Trimberger, S., "Field Programmable Gate Array Technology", Kluwer Academic Publishers, Boston, Mass., 1994.
[7] Xilinx Corporation, "XC4000 Field Programmable Gate Arrays: Programmable Logic Databook", 1996.
[8] Xilinx Corporation, "Application Brief #14, A Simple Method of Estimating Power in XC4000 XL/EX/E FPGAs", 1997.