Académique Documents
Professionnel Documents
Culture Documents
The increasing use of portable computing and wireless sistor stacks and used that model in a genetic algorithm
communication systems makes power dissipation a major framework to estimate the minimum and maximum values
concern in low power circuit designs [2], [20]. Therefore, for leakage power dissipation.
accurate power estimation tools are of critical importance. This rest of this paper is organized as follows. Sec-
In CMOS digital circuits, power dissipation consists of tion II presents an accurate standby leakage current model
dynamic and static components. In circuits with a high in transistor stacks with the consideration of body eect
supply voltage, a relatively high transistor threshold volt- and DIBL. Section III describes the genetic algorithm to
age can be used, making subthreshold current negligible. estimate the minimum and maximum values for leakage
That is the common assumption for the existing gate-level power. Implementation details and experimental results
techniques for average power estimation [1] [3] [4] [5] [6] for MCNC and ISCAS benchmark circuits are shown in
[7] [14] [15] [16] [17] [18] [22] [23]. However, low power ap- Section IV. Finally, conclusions are given in Section V.
plications have been driving the supply voltage to become
lower and lower, which requires the device threshold to be II. An Accurate Leakage Current Model
reduced so as to satisfy performance requirements. Unfor- An accurate estimate of standby leakage power must con-
tunately, due to the exponential relationship between leak- sider circuit topology as well as signal levels when the cir-
age current and threshold voltage, such scaling leads to a cuit is idle. Kawahara [12] demonstrated this in the design
dramatic increase in leakage current. Consequently, leak- of a low power decoded-drivers for a DRAM. An extra tran-
age power is no longer negligible in low voltage circuits. sistor was placed between the supply line and the pull-up
This necessitates the development of accurate estimation transistor for the driver. This causes a slight reverse bias
tools for leakage power. between the gate and source of the pull-up transistor when
The accuracy of leakage power estimation is critically de- both transistors are turned o. Because subthreshold cur-
pendent on the standby leakage current model. In this pa- rent is exponentially dependent on gate bias, a substantial
per, we have developed an accurate standby leakage current current reduction was obtained. This phenomenon is re-
model which has been veried by HSPICE. Considering re- ferred to as the \stacking eect".
verse biasing between gate and source in transistor stacks, We have developed a more general model of the stack-
DIBL (Drain Induced Barrier Lowering), and the body ef- ing eect [11]. This model considers the general case of
fect, the leakage power of a circuit depends on primary transistor stacks of arbitrary height. It takes into account
input combinations. Hence, the primary input combina- both body eect and DIBL. DIBL (reduction of threshold
tion(s) corresponding to the minimum leakage power can voltage as VDS increases) is especially signicant for sub-
be applied to the circuit during standby mode to minimize micron devices. The leakage of a transistor stack is shown
This research was supported in part by Intel, DARPA (F33615-95- to directly depend on the magnitude of the DIBL eect.
C-1625), NSF CAREER award (9501869-MIP ), and AT&T Let Figure 1 depict a transistor stack to be analyzed.
2
Vg2
Vd2
j =i+1
Vs 2 y
x1 Since we are now only interested in the magnitude of the
Vdi leakage current, we can use VDSN in equation (1) to com-
Vgi Vs i pute the leakage through the bottom transistor. To verify
this computation, one could compute the leakage of other
Isubth
x2
VgN-1
VdN-1
transistors in the stack.
Vs N-1 x3 Please note that this analysis only considers transistors
VgN
VdN
that are turned o. Transistors that are turned on can
be treated as a short circuit. Because of the very small
Vs N
Fig. 1. Schematic and notation for Fig. 2. Schematic for a 3- currents involved, the voltage drop across transistors that
stacking eect analysis input NAND are turned on will be orders of magnitude smaller than the
voltage drop across transistors in the subthreshold region.
In [9], it is noticed that leakage power depends on pri-
Steady state leakage values can be estimated as a function mary input combinations with no explanation for the mech-
of the number of transistors that are turned o. A more anism. The model proposed in this paper, on the other
detailed derivation can be found in [11]. The general ap- hand, can oer a clear explanation. Consider the 3-input
proach is to equate the subthreshold current through each CMOS NAND gate illustrated in Figure 2. For the 111
transistor and then solve for the voltage (VDSi ) across each input combination, the three NMOS transistors are turned
transistor. These voltages can then be used to estimate the on and treated as a short circuit, the leakage current of the
magnitude of the leakage current. The following analysis gate is the sum of the leakage current through the three
is done for an NMOS pull down stack, but is equally ap- PMOS transistors. For the 011, 101, and 110 combinations,
plicable to a PMOS stack. the leakage current is computed for the NMOS transistor
From the BSIM2 MOS transistor model [10], [21], [24], which is turned o. For the 001, 010, and 100 combina-
the subthreshold current of a MOSFET can be modeled as tions, the VDS of the second o transistor is rst obtained
?qV
Isubth = A e nkT (VG ?VS ?VTH0 ?
0 VS +VDS ) (1 ? e kTDS ) (1)
q by equation (2), substituting into equation (1) yields the
leakage current. For the 000 combination, VDS2 is rst ob-
where A = 0Cox 0 W ( kT )2 e1:8. VG , VD and VS are the tained by equation (2), then VDS3 is computed by equation
Leff q (3), and VDS1 = Vdd ? VS1 = Vdd ? VDS2 ? VDS3 by equation
gate voltage, drain voltage, and source voltage of the tran- (4). Substituting VDS1 and VS1 into equation (1) yields the
sistor, respectively. The bulk is connected to ground. VTH0 leakage current. Due to stacking eect and dierences in
is the zero bias threshold voltage. The body eect for small the leakage of PMOS and NMOS, the leakage current of a
values of VS is very nearly linear. It is represented by the gate can vary widely with input combinations.
term
0 VS , where
0 is the linearized body eect coe- Leakage current is exponentially dependent on the
cient. is the DIBL coecient, representing the eect of threshold voltage. Consequently, the values of VTH0 , sub-
VDS (VDS = VD ? VS ) on threshold voltage. Cox is the threshold swing coecient, body eect, and DIBL coe-
gate oxide capacitance. 0 is the zero bias mobility. n is cient are critical to the accuracy of this model. Leakage
the subthreshold swing coecient of the transistor. For the is also very sensitive to temperature, doubling for every 8o
conditions illustrated in Figure 1, the gate voltage is zero. to 10o K increase. This leakage model is only meaningful
First we equate the current of the rst and second tran- when the circuit has been idle for some time. Any where
sistor in the stack. We obtain equation (2) by solving for from a few microseconds up to several milliseconds may be
VDS2 in terms of Vdd and BSIM model parameters. required for the circuit to settle to quiescent levels [11].
VDS2 = q(1 +nkT 0 ln(
2 +
) A2
A1 e qV
nkT + 1)
dd
(2) III. Genetic Algorithm to Obtain Bounds for
Leakage Power
One can similarly equate the current through the (i?1)th Given a primary input combination, the logic value of
and ith transistors, solving for VDSi in terms of VDSi?1 . each internal node can be obtained simply by starting from
This results in equation (3). Equation (3) can be used primary inputs and simulating the circuit level by level
iteratively to nd VDSi for each transistor, starting with (level of a node is equal to the maximum of the level of
the third in the stack. Finally, VDS1 can be determined (if its fan-in nodes plus 1; level of all primary inputs being
desired) by subtracting the sum of these VDSi values from 0). After applying the standby leakage current model pre-
Vdd . sented in the previous section, the leakage current of each
gate can be evaluated. The leakage power Plkg of the sim-
VDSi = q(1nkT 0
+
) ln(1 + Ai?1 (1 ? e? kTq VDSi?1 ))
A (3) the sumcircuit
ulated for the given primary input combination is
of the leakage power consumed by all transistors,
i
3
i.e. X
Plkg = IDSi VDSi (5)
way, more highly t chromosomes have a higher number of
children in the succeeding generation. But when tnesses
Note VDSi and IDSi depend on primary input combina- become closer, things can be dierent. Consider a genera-
tions. This makes the total leakage power of a circuit also tion with the respective tnesses of the best and the worst
depend on primary inputs. chromosomes being 555.15 and 555.75 and the average t-
Enumerative method is the most straightforward method ness of the population being 555.45. The best and the
to obtained the minimumand maximumvalues for standby worst chromosomes will produce almost identical number
leakage power dissipation. However, the exponential com- of children in the next generation, and hence, the eect of
plexity with respective to the number of primary inputs natural selection is almost lost. By using linear normal-
makes it prohibitive for circuits with large number of pri- ization tness technique, we greatly increase the selection
mary inputs. The random search method involves generat- pressure in favor of the best chromosome.
ing a large number of primary inputs, evaluating the leak- After being selected as parents, the chromosomes are
age of each input, and keeping track of the minimum and entered into a mating pool for further operations such as
maximum. One advantage of the random search method is crossover and mutation. The crossover operation uses par-
that average leakage power Plkgavg can be obtained by ent chromosomes to produce children chromosomes by ex-
Plkgavg = (
Xm Plkg )=m
i (6)
changing substrings of the two parent chromosomes. In this
paper, a one-point crossover technique with crossover rate
of 0.8 is used. The mutation operation takes each chromo-
i=1 some of a generation and randomly changes the bit value
as long as the sample length m is long enough, where Plkgi with a given probability called bit mutation rate. In this
is the leakage power for primary input combination i. How- paper, the mutation rate is chosen to be 0.01.
ever, the primary input combination is randomly generated The genetic algorithm will stop after 50 generations and
with no knowledge of previous information. output the minimal leakage power as well as the best case
Genetic algorithm [8] has the advantage of exploiting primary input combination. The pseudo code of the genetic
historical information to speculate on new search points algorithm is given as follows.
with expected improved performance. It derives its be-
havior from a metaphor of natural selection and natural Minimum Leakage Power() f
genetics by the creation of a population of chromosomes GENERATIONS=1; MAX GEN=50;
or individuals represented by articial strings called chro- POPULATION SIZE=100;
mosomes. A general genetic algorithm works as follows. Chromosome length = the number of primary inputs
First an initial population is randomly generated. Then Initialize a population of chromosomes
the following steps are repeated. An objective function is do f
applied to evaluate the tness of each chromosome in the Evaluate each chromosome in the population
population. A new population will be produced by oper- Sort chromosomes by non-decreasing tnesses
ations such as selection, crossover, and mutation in that Apply linear normalization tness technique
order. Each such iteration is referred to as a generation. A Roulette wheel selection to select parent chromosomes
near optimal solution is obtained when stopping criterion Crossover & mutation to create children chromosomes
is satised. Produce a new generation with children chromosomes
The genetic algorithm used in this paper is similar to g While(++GENERATIONS < MAX GEN)
the one proposed by Goldberg [8]. For each simulated cir- g
cuit, 100 initial chromosomes (primary input combinations
of zeros and ones) are randomly generated. The objective It should be pointed out that the above genetic algorithm
function is to obtain the tness (leakage power) for the will give maximum leakage power as well as the worst case
applied primary input combination based on the leakage primary input combination if the chromosomes are sorted
current model presented in the previous section. After the by non-increasing tnesses.
100 chromosomes are evaluated, they are sorted by non-
decreasing tness. Therefore, for the present generation, IV. Implementation and Experimental Results
the rst chromosome represents the best case primary in- The genetic algorithm (GA) to estimate the standby
put combination which gives the minimum leakage power leakage power in CMOS circuits has been implemented in C
up to this point. Then a tness technique called linear nor- under the Berkeley SIS environment. In order to simplify
malization [13] is adopted to calculate tnesses that begin the analysis, gate-mapping was used to map the circuits
with 100 and linearly decrease to 1. It should be empha- to a library which contains NAND, NOR, INVERTER,
sized that the selection operation is based on the normal- and BUFFER. The supply voltage Vdd and the zero bias
ized tnesses instead of the actual values. threshold voltage VTH 0 used in this experiment are 1:0V
The reason for a tness technique is quite simple. A and 0:2V, respectively. The subthreshold swing coe-
commonly used selection technique is the roulette wheel cient is 1:5, which corresponds to the subthreshold slope of
technique in which the parent chromosomes are selected 89.8mv=decade. and
0 are 0.05 and 0.24 for NMOS and
with probabilities proportional to their tnesses. In this 0.047 and 0.11 for PMOS, respectively. For simplicity, all
4
TABLE I
Comparison of HSPICE and the leakage current model
X1 X0 567.5 567.5
Y0
Minimum Leakage Power
472.9 472.9
X1 X0 Maximum Leakage Power
Z0
Y1 378.3 378.3
HF HF
Z3
189.2 189.2
Z2 Z1
0.0 0.0
i2
i3
i4
i5
i6
i7
i8
i9
i10
0:3 while the channel widths for PMOSFETs and NMOS-
FETs are assumed to be 3.6m and 1.8m, respectively.
However, the method is not limited to such assumptions. Fig. 4. Leakage power in W for MCNC benchmark circuits
A. Validation of the Standby Leakage Current Model 625.3 625.3
99
80
55
67
54
31
28
55
C4
C4
C8
3
C1
C1
C2
C3
C5
C6
C7
TABLE II
Minimum & maximum leakage power (P lkgmin ,P lkgmax ) for MCNC & ISCAS benchmarks
Circuit PI's level CPU time (s) Plkgmin(W) Plkgmax (W) Plkgmax=Plkgmin Plkgavg
Chosen # # Rand GA Rand GA Rand GA Rand GA (%) (W)
i1 25 5 48.3 4.5 5.24 5.19 11.09 11.37 2.12 2.19 3.3 7.05
i2 201 4 194.8 17.1 11.63 9.89 37.21 48.41 3.20 4.89 52.8 15.89
i3 132 2 134.9 12.2 15.43 13.16 27.89 28.25 1.81 2.15 18.8 19.22
i4 192 4 228.8 20.8 21.32 17.79 46.78 51.10 2.19 2.87 31.1 27.12
i5 133 6 371.5 33.1 46.80 39.95 76.70 79.30 1.64 1.98 20.7 54.92
i6 138 3 478.3 45.5 44.81 43.49 86.91 92.42 1.94 2.13 9.8 70.45
i7 199 3 598.3 58.6 54.26 52.91 100.03 106.94 1.84 2.02 9.8 83.80
i8 133 8 2570.3 254.9 328.55 318.37 526.07 567.50 1.60 1.78 11.3 447.16
i9 88 7 683.8 67.2 72.56 69.08 152.02 158.35 2.10 2.29 9.0 133.81
i10 257 54 2881.8 282.5 446.18 432.28 509.82 542.45 1.14 1.25 9.6 471.59
C432 36 17 222.4 21.5 40.24 40.07 48.77 51.02 1.21 1.27 5.0 44.60
C499 41 11 554.12 53.4 101.75 100.54 111.83 114.56 1.10 1.14 3.6 106.39
C880 60 23 382.5 37.1 63.69 60.44 86.56 89.62 1.36 1.48 8.8 71.83
C1355 41 23 563.2 55.0 92.55 92.55 116.58 118.31 1.26 1.28 1.6 109.14
C1908 33 40 676.0 65.6 116.02 115.35 133.63 133.79 1.15 1.16 0.9 123.72
C2670 233 32 991.4 94.4 159.98 157.65 182.40 189.88 1.14 1.20 5.3 167.90
C3540 50 47 1296.4 127.0 226.51 223.55 265.27 270.07 1.17 1.21 3.4 242.00
C5315 178 49 2201.2 214.2 344.85 338.13 390.70 408.27 1.13 1.21 7.1 367.09
C6288 32 124 2921.8 276.0 530.52 530.52 590.05 595.87 1.11 1.12 0.9 561.76
C7552 207 43 3367.0 327.9 554.47 546.67 605.90 625.29 1.09 1.14 4.6 575.70
larger than the minimum value. Therefore, for some cir- age, the greater the level number, the more insensitive the
cuits, applying the input combination which gives the mini- leakage power to primary input combinations.
mum leakage power does reduce leakage power signicantly.
Column 11 of Table II shows the ratios of the maximum C. Comparison of GA and Random Search Technique
leakage power over the minimum obtained by the genetic In order to show the eciency and eectiveness of the
algorithm. These ratios represent the sensitivity of leak- genetic algorithm, the random search (RAND) method is
age power with respect to primary input combinations and performed for comparison. Sample numbers for GA and
the possible leakage power saving by applying dierent pri- RAND used in this experiment are 5; 000 and 50; 000, re-
mary input combinations. For the ten MCNC benchmarks spectively.
except i10, such ratios are over 1:7, for circuit i2, the ratio Table II reports the results for MCNC and ISCAS bench-
is even close to 5, which indicates that the leakage power is mark circuits obtained by the random search method and
very sensitive to primary input combinations. For the ten by the genetic algorithm. Columns 6 and 7 show the min-
ISCAS benchmarks, on the other hand, most of the ratios imum leakage power obtained by RAND and GA, respec-
are less than 1:3. This indicates that the leakage of ISCAS tively while columns 8 and 9 report the maximum values.
benchmarks are not sensitive to primary inputs, leaving lit- The last column of the table gives the average leakage
tle freedom to reduce leakage power by applying dierent power obtained by equation (6).
primary inputs. Results for minimumleakage power indicate that for 90%
The reason why ISCAS benchmarks are not as sensi- of the simulated circuits the genetic algorithm can give a
tive to primary input combinations as MCNC benchmarks tighter bound than the random search. For the other 10%
is probably due to the fact that ISCAS benchmarks have of the simulated circuits, both GA and RAND give the
considerably more levels of logic than MCNC benchmarks. same minimum values. For the maximum leakage power,
Except for circuit i10, all MCNC benchmarks have less the GA runs better for all simulated circuits. Consider
than 10 levels (the level number of each circuit is given in circuit i2. The maximumvalue obtained by GA is 48:41W
column 3). ISCAS benchmarks, on the other hand, have which is over 30% higher than 37:21W, the maximum
more than 20 levels except for circuits C432 and C499. obtained by RAND.
The most complicated ISCAS circuit C6288 has over 120 The tightness of the bounds obtained by RAND and GA
levels. As the number of logic levels increases, it becomes can be further illustrated by the percentage dierence be-
more and more dicult to control the state of gates that tween the maximum/minimum leakage ratios (next to last
are several levels away from the primary inputs. On aver- column in Table II). For the MCNC benchmarks, the av-
6
erage percentage dierence is nearly 20%. For circuit i2, 97-12, Purdue University, School of Electrical and Computer
the percentage dierence is over 50%. Engineering, 1997.
[12] T. Kawahara, et al., \Subthreshold current reduction for
Column 4 and column 5 in table II show the CPU time decoded-driver by self-reverse biasing," IEEE Journal of Solid-
for an ULTRA SPARC ONE with 256MB of memory. Since State Circuits, vol. 28, no. 11, pp. 1136-1143, Nov. 1993.
the genetic algorithm only searches 5; 000 combinations of [13] D. Lawrence, \Handbook of Genetic Algorithms," New York:
Van Nostrand Reinhold, 1991.
primary inputs, it is much faster than the random search [14] R. Marculescu, D. Marculescu, and M. Pedram, \Ecient Power
method which searches 50; 000 combinations. Estimation for Highly Correlated Input Streams," ACM/IEEE
Note that the CPU time shown in column 5 is the time Design Automation Conf., 1995, pp. 628-634.
[15] R. Marculescu, D. Marculescu, and M. Pedram, \Switching Ac-
for obtaining the minimum leakage power. In order to ob- tivity Analysis Considering Spatiotemporal Correlations," IEEE
tain the minimumand maximumvalues, the GA has to run Intl. Conf. on Computer-Aided-Design, pp. 294-299, 1994.
twice because the chromosomes in a generation are sorted [16] J. Monteiro, S. Devadas, and B. Lin, \A Methodology for E-
cient Estimation of Switching Activity in Sequential Logic Cir-
dierently. cuits," Intl. Workshop on Low Power Design, pp. 12-17, 1994.
[17] F.N.Najm, \Transition Density, A Stochastic Measure of Activ-
Summary & Conclusions ity in Digital Circuits" 28th ACM/ IEEE Design Automation
V.
Conf., pp. 644-649, 1991.
With the scaling of supply voltage and transistor thresh- [18] F.N.Najm, S. Goel, and I.N. Hajj, \Power Estimation in Se-
quential Circuits," ACM/IEEE Design Automation Conf., pp.
old, the leakage current is of critical importance. We pro- 635-640, 1995.
posed a novel technique to accurately estimate leakage cur- [19] J.M. Rabaey, \Digital Integrated Circuits," New Jersey:
Prentice-Hall, 1996.
rent in CMOS circuits during the standby mode of oper- [20] K. Roy and S. Prasad, \Circuit Activity Based Logic Synthe-
ation. Leakage current model uses the eect of transistor sis for Low Power Reliable Operations," IEEE Trans. on VLSI
stacks in both N-MOS and P-MOS trees and the results Systems, pp. 503-513, Dec. 1993.
indicate that the model is accurate. The leakage current [21] J. Sheu, D.L. Scharfetter, P.K. Ko, and M.C. Jeng, "BSIM:
Berkeley Short-Channel IGFET Model for MOS Transistors",
model oers a clear explanation of why the leakage power IEEE J. Solid-State Circuits, SC-22, 1987.
depends on primary input combinations. A genetic algo- [22] C.-Y. Tsui, M. Pedram, and A. M. Despain, \Exact and Ap-
proximate Methods for Calculating Signal and Transition Prob-
rithm based technique was used to determine the bounds abilities in FSMs," Intl. Workshop on Low Power Design, pp.
for leakage power in CMOS circuits due to dierent input 18-23, 1994.
combinations. Results for minimum and maximum leak- [23] M. G. Xakellis and F. N. Najm, \Statistical Estimation of the
Switching Activity in Digital Circuits," ACM/ IEEE Design Au-
age power indicate that for some circuits leakage power tomation Conf., pp. 728-733, 1994.
can vary widely with dierent primary input combinations. [24] \HSPICE User's Manual", Vol.II, 1996
Applying the best primary input combination to a circuit
during standby mode will signicantly reduce the leakage
power.
References