
Stochastic Methods for Transistor Size Optimization of CMOS VLSI Circuits

Robert Rogenmoser1, Hubert Kaeslin1, and Tobias Blickle2


1 Integrated Systems Laboratory, 2 Computer Engineering and Networks Laboratory
ETH Zurich, Gloriastr. 35, 8092 Zurich, Switzerland

Abstract. The performance of a CMOS circuit depends heavily on its transistor sizes. We have tested a standard optimizer, a Monte Carlo scheme and a method based on Genetic Algorithms combined with very accurate SPICE simulations to automatically optimize transistor sizes of three different digital CMOS circuits. While the standard optimizer and the Monte Carlo scheme are advantageous for small circuits, the method based on Genetic Algorithms was found to be more stable for larger circuits.

1 Introduction
Transistor size optimization is a traditional obligation in VLSI (Very Large Scale Integration) design. It is used to improve the performance of a circuit in order to achieve a design goal in a specific technology. This design goal can be boosting operating speed, lowering power consumption, or lowering area requirements. In this context the netlist of the circuit is already determined; only the width and the length of the MOS (Metal Oxide Semiconductor) transistors can be adjusted. The figures of merit depend in a complex way on the individual sizes of the transistors. Changing transistor sizes in a circuit often leads to surprising results, which are not easily predicted.

It is hard to optimize digital CMOS circuits because their operation can only be modeled by distinguishing between different operating regions for each transistor depending on its terminal voltages (large-signal behavior). In contrast, analog circuits can be expressed using small-signal equivalents of the transistors, which are linearized around the operating point. Even for a two-transistor digital circuit such as an inverter (see Fig. 1) it is very hard to get accurate design equations. For example, a step input to a single inverter results in two operating regions with two differential equations (using level 2 models [Sah64]). If the step input is changed to a more plausible but still not entirely realistic linearly rising ramp, the circuit traverses five regions of operation, each of which is described by its own differential equation.

In general, circuit designers manually optimize circuits by trial and error using an accurate circuit simulator such as SPICE. Good intuition and the designer's understanding of the transistors are important. Still, it is a tedious process of assigning sizes (i.e. width and length of the channel of the MOS transistor) to all

Fig. 1. CMOS Inverter: a) Transistor Diagram, b) Input and Output Waveforms


transistors, verifying the performance by simulation, and starting this process over again by reassigning new transistor sizes. An experienced designer may obtain acceptable results after a few iterations, while less experienced designers may spend much more time optimizing even a small circuit. Automated tools are therefore a welcome alternative.

Some automated transistor sizing tools model transistors as switched RC elements [Heu90]. The linearized netlist of a circuit can then be optimized using standard optimizers because such netlists consist of relatively simple equations, which are easy to differentiate. The results may be satisfactory as a first approximation, but due to the simplified model they are often not sufficient. A commercially available transistor sizing tool has been incorporated in the circuit simulator HSPICE [Hsp]. It performs iterative simulations using accurate transistor models; however, nothing has been published about the algorithms used internally. While the optimization of very small circuits using this tool delivered reasonable results after a few iterations, it was more difficult to obtain results for larger circuits. Optimization was often terminated before even mediocre termination criteria had been met.

Therefore, we have investigated new ways based on stochastic optimization methods to automatically optimize transistor sizes of circuits. Two different methods will be described: (1) guided random search using a single solution and (2) probabilistic optimization based on genetic algorithms (GA). Method (1) was published in [Wur93] and was originally used for sizing (scaling) Domino CMOS circuits. We applied this method to entire digital CMOS circuits and added an adaptive scheme. We will refer to it as the Monte Carlo (MC) scheme. Both methods use highly accurate SPICE simulation for the evaluation of the objective function and have been applied not only to combinational circuits but also to storage elements with internal feedback (e.g. flip-flops). All transistors in a circuit are sized by the GA or MC except predefined minimum-sized transistors, for example those with a pull-up or pull-down function.

A CMOS VLSI circuit optimization tool based on genetic algorithms was also presented in [HK94]. In contrast to our work, simplified linear models for the transistors were used. These models require much lower computation times, but the accuracy declines. To further reduce complexity, only the output transistor pair of each gate of the combinational blocks was optimized.

In the next section we introduce the three digital CMOS circuits with their optimization criteria and objective functions. We then look at the optimization methods in Sect. 3. In Sect. 4 the results are presented and discussed before conclusions are drawn in the last section.

2 CMOS Circuits and Objective Functions


The functionality of a CMOS circuit is determined by its transistor network topology. However, the individual sizes of the transistors determine the final performance of the circuit and, if scaled wrongly, may even prevent the circuit from functioning correctly (e.g. in dynamic logic). For example, the p-channel transistor MP (PMOS) and the n-channel transistor MN (NMOS) in the inverter in Fig. 1a have different electrical characteristics. Therefore their sizes have to be scaled to equalize signal propagation for high and low inputs (with tpd10 = tpd01 as the target, see Fig. 1b). Moreover, they have to be adjusted for the load presented at the output. In the simple case of an inverter, most designers can optimize the transistor sizes using an accurate circuit simulator combined with transistor models extracted from fabricated devices. Typically, the PMOS is sized 1.5 to 3 times larger than the NMOS from experience, and after a few simulation runs the final size of the two is fixed. Because the current through a MOS transistor is proportional to W/L (W: width, L: length of the channel), only W is adjusted and L is kept minimal.

While the optimization is straightforward for an inverter, it becomes more complex for larger circuits where more input and output signals and propagation paths have to be watched. In our analysis, we looked at the optimization of three different CMOS circuits of increasing complexity: (1) a 3-input NAND gate (NAND3), (2) a dynamic edge-triggered D-flip-flop (DFF), and (3) a complementary pass gate full adder (CPLFA), see Table 1. In contrast to [HK94] we used highly accurate SPICE simulations to calculate the fitness function in spite of the relatively large calculation time for one evaluation of the objective function.

Table 1. Specification of optimized CMOS circuits

Circuit   Technology   # MOS total   # MOS optimized   1 evaluation (real)   1 evaluation (CPU)
NAND3     1.0 µm       6             6                 10.1 s                3.0 s
DFF       1.0 µm       16            14                11.6 s                5.8 s
CPLFA     0.6 µm       34            26                46.5 s                36.7 s

Fig. 2. 3-input CMOS NAND gate (NAND3)

2.1 Objective Functions


The most important optimization goal for these circuits was to minimize the propagation delay from all inputs to the output for both the high to low (01) and the low to high (10) transitions, with a limited increase in power consumption. The objective function for the NAND3 was therefore:

$$ f = 0.9 \left( \frac{t_{del}}{t_0} \right)^2 + 0.1 \, \frac{P}{P_0} $$

with

$$ t_{del} = \max(t_{A01}, t_{A10}, t_{B01}, t_{B10}, t_{C01}, t_{C10}), \qquad P = \sum_i W_i $$

The values for $t_{del}$ and $P$ were scaled (through $t_0$ and $P_0$) to ensure that power consumption has a weight of about 10%. The value for $t_{del}$ was determined by performing a SPICE simulation, measuring all propagation delays for each input combination, and extracting the largest value. Power consumption was determined by summation of all transistor widths $W$ (for CMOS circuits, $P = f \, C \, V^2$ with $f$ here denoting the switching frequency, and $C \propto W$).

A similar objective function was used for the DFF. However, in a flip-flop not only the propagation delay $t_{pd}$ but also other parameters such as the setup time $t_{su}$ are important design parameters. Moreover, in the case of the present dynamic D-flip-flop, internal timings such as the precharge time $t_{ch}$ have to be taken into consideration. Fig. 3 shows such a dynamic flip-flop, which was used for high-speed designs [RH96, RH95]. Hence, $t_{del}$ has to be specified in more detail as:

$$ t_{del} = \max(t_{suM} + t_{pdM}, \; 2 \, t_{ch}) $$

with

$$ t_{suM} = \max(t_{su0}, t_{su1}), \qquad t_{pdM} = \max(t_{pdQ0}, t_{pdQ1}, t_{pdQB0}, t_{pdQB1}) $$
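The delay and objective definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the squared, normalized form of the NAND3 objective is assumed, the function names, the dictionary layout and all sample numbers are hypothetical, and the delays would in practice come from SPICE simulations that are not reproduced here.

```python
def nand3_objective(delays, widths, t0, p0):
    """Weighted objective: squared normalized delay plus normalized power proxy.

    delays: dict of measured propagation delays (keys such as "A01" are assumed)
    widths: iterable of transistor widths W_i; their sum stands in for power
    t0, p0: normalization constants (assumed names)
    """
    t_del = max(delays.values())   # worst-case propagation delay
    power = sum(widths)            # P = sum of all transistor widths
    return 0.9 * (t_del / t0) ** 2 + 0.1 * (power / p0)


def dff_delay(t_su, t_pd, t_ch):
    """t_del for the flip-flop: max(t_suM + t_pdM, 2 * t_ch)."""
    t_su_m = max(t_su)             # worst setup time over data values 0 and 1
    t_pd_m = max(t_pd)             # worst propagation delay over Q and QB
    return max(t_su_m + t_pd_m, 2.0 * t_ch)


# Hypothetical numbers, for illustration only.
example_delays = {"A01": 0.8, "A10": 1.0, "B01": 0.9,
                  "B10": 1.1, "C01": 1.0, "C10": 1.2}
example_f = nand3_objective(example_delays, [4, 4, 4, 6, 6, 6], t0=1.2, p0=30)
```

With the normalization constants set to the worst-case delay and the total width of a reference design, as in the made-up example, the reference design itself scores 1, so any value below 1 indicates an improvement.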
Fig. 3. Dynamic edge-triggered D-flip-flop (DFF)


For both circuits, NAND3 and DFF, the timings depend very much on the capacitive load on the outputs. This load has been adjusted so that it corresponds to twice the load that the circuit itself presents at its input (i.e. consecutive identical gates with a fanout of two). For the third circuit, the CPLFA (see Fig. 4), a slightly different objective function was chosen:

$$ f = P \cdot t_{del} $$

Both values can easily be extracted from SPICE simulations. In this circuit all 24 transistors of the network have been optimized together with the 2 transistors of an inverter at the output, whose sizes were then applied to all four output drivers (S to COb). For the six input drivers (A to CIb) the same sizes have been assigned to reflect the drive capability of the optimized circuit. The six capacitors on these drivers represent additional wiring capacitance.

3 Optimization Methods
3.1 Monte Carlo
The first stochastic optimization method is based on [Wur93], where transistors of Domino CMOS circuits were sized in an incremental way. The procedure is as follows:

1. Assign a random size to each transistor.
2. Evaluate the solution (calculate the fitness function).
3. Increase or decrease each size in the set by a fixed step (minimum feature size), or leave it unchanged (each with equal probability).
4. Evaluate the new solution; if this set has a better fitness, continue with it, otherwise keep the previous one.
5. If the maximum number of evaluations is reached, stop; otherwise go to step 3.

We used this Monte Carlo (MC) method for all three circuits and incorporated an adaptive scheme. The step size is first set to four times the minimum feature size.

Fig. 4. Complementary Pass Gate Full Adder (CPLFA). For simplicity not all wires are drawn; nodes with the same name are connected.

After the first third of the optimization it is reduced to twice the minimum feature size, and for the last third to the minimum feature size. The duration of the optimization was limited by the number of function evaluations.
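The Monte Carlo procedure with its three-phase step size can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SPICE-based fitness is replaced by a hypothetical quadratic toy function, lower fitness is taken to be better, widths are integers in units of the feature size, and clamping widths to a `[w_min, w_max]` range is an added assumption.

```python
import random


def monte_carlo_sizing(evaluate, n_transistors, w_min, w_max, feature,
                       max_evals, rng):
    """Guided random search on a single solution with an adaptive step."""
    # Step 1: random initial sizes on the feature-size grid.
    steps = (w_max - w_min) // feature
    current = [w_min + feature * rng.randint(0, steps)
               for _ in range(n_transistors)]
    best_f = evaluate(current)          # Step 2: evaluate the start solution
    evals = 1
    while evals < max_evals:            # Step 5: stop at the evaluation budget
        progress = evals / max_evals
        if progress < 1 / 3:
            step = 4 * feature          # first third: coarse moves
        elif progress < 2 / 3:
            step = 2 * feature          # second third
        else:
            step = feature              # last third: finest moves
        # Step 3: each width goes up, down, or stays, with equal probability.
        candidate = [min(w_max, max(w_min, w + rng.choice((-step, 0, step))))
                     for w in current]
        f = evaluate(candidate)         # Step 4: keep only improvements
        evals += 1
        if f < best_f:
            current, best_f = candidate, f
    return current, best_f


def toy_fitness(widths):
    """Hypothetical stand-in for a SPICE evaluation: distance to width 12."""
    return sum((w - 12) ** 2 for w in widths)


sizes, fitness = monte_carlo_sizing(toy_fitness, n_transistors=6, w_min=2,
                                    w_max=40, feature=1, max_evals=400,
                                    rng=random.Random(0))
```

Because only one solution is kept and a move is accepted only if it improves the fitness, the search cannot leave a local optimum once the step size has shrunk, which is one plausible reading of the larger variance reported later for the largest circuit.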

3.2 Genetic Algorithms

Three different variants of a genetic algorithm have been used and are summarized in Table 2. These parameters have been used to optimize the three circuits. The number of function evaluations per optimization run was set to 400 for the NAND3, to 1000 for the DFF, and to 3000 for the CPLFA. All experiments were performed using the GENEsYs package [Bac92].

The first setup (PT) is the standard setup provided by this package. It uses fitness proportional selection and two-point crossover. The second setup (RU) reflects empirical knowledge of optimal parameter settings. In particular, the following changes have been made:

- ranking selection, as this is known to overcome some serious disadvantages of proportional selection [Whi89, MSV93a],
- uniform crossover, which has shown its superiority in several investigations (see, e.g., [Sys89]),
- the optimal mutation rate of 1/n, where n is the length of the individual in bits [MSV93b].

The third setup (ES) is based on evolution strategies [Sch81] and uses no recombination but the adaptive mutation scheme AMEM (adaptive mutation excluding mutation rates) (see, e.g., [Bac94]).

Table 2. The parameters used in the three different setups

                        PT                       RU                             ES
Population size         20                       20                             20
Selection method        proportional (elitism)   ranking (elitism), max = 1.1   truncation, T = 50%
Crossover method        two-point                uniform                        none
Crossover probability   0.6                      0.6                            -
Mutation rate           0.03                     0.008                          AMEM
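To make the ingredients of the RU setup concrete, the following Python sketch combines linear ranking selection, uniform crossover, a mutation rate of 1/n and elitism on bit strings. It is not the GENEsYs implementation: the linear ranking formula with a parameter `eta_max`, the helper names, and the toy fitness function are assumptions chosen for illustration, and lower fitness is taken to be better.

```python
import random


def linear_ranking_probs(mu, eta_max=1.1):
    """Selection probabilities for ranks 0 (best) to mu - 1 (worst)."""
    eta_min = 2.0 - eta_max
    return [(eta_max - (eta_max - eta_min) * i / (mu - 1)) / mu
            for i in range(mu)]


def uniform_crossover(a, b, rng):
    """Take each bit from either parent with probability 1/2."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(bits, rate, rng):
    """Flip each bit independently with the given probability."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def run_ga(fitness, n_bits, generations, rng, pop_size=20, p_cross=0.6,
           eta_max=1.1):
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    probs = linear_ranking_probs(pop_size, eta_max)
    rate = 1.0 / n_bits                      # mutation rate 1/n
    for _ in range(generations):
        # Sort best first; a real flow would cache the costly evaluations.
        pop.sort(key=fitness)
        children = [pop[0][:]]               # elitism: keep the best as is
        while len(children) < pop_size:
            p1, p2 = rng.choices(pop, weights=probs, k=2)
            if rng.random() < p_cross:
                child = uniform_crossover(p1, p2, rng)
            else:
                child = p1[:]
            children.append(mutate(child, rate, rng))
        pop = children
    return min(pop, key=fitness)


def count_zeros(bits):
    """Hypothetical toy fitness: number of zero bits (0 is optimal)."""
    return sum(1 - b for b in bits)


best = run_ga(count_zeros, n_bits=16, generations=30, rng=random.Random(42))
```

Ranking selection depends only on the order of the individuals, not on the magnitude of their fitness values, so the selection pressure stays constant over the run; this is the property usually cited in its favor over proportional selection.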

4 Results and Discussion


In order to compare the performance of the different optimizers, the maximum number of individuals (or simulations) was set equal for all approaches. Each experiment was performed 10 times. The overall best solutions found for the different circuits are shown in Table 3. The MC method seems to be well suited for all three problems. However, looking at the results of several simulations reveals a slightly different picture.

Table 3. Best Fitness Value for Different Optimization Methods

Circuit   Monte Carlo   Genetic Algorithm
          MC            PT      RU      ES
NAND3     0.345         0.361   0.373   0.350
DFF       0.977         1.094   1.086   1.009
CPLFA     1.321         1.562   1.425   1.791

Figures 5a-c show the statistics for the three problems and the four optimizers at the end of each run. The line through the box marks the median of the data, while the upper and lower edges of the box mark the upper and lower quartiles. The box encloses the interquartile range (IQR), that is, the range of the half of the data that clusters around the middle. The tails reach to the upper and lower adjacent values, where the adjacent values are the largest (smallest) datum that is not more than 1.5 times the IQR above the upper (below the lower) quartile. All other data points are plotted beyond the ends of the tails as dots.

For the two small circuits (NAND3 and DFF) the MC outperforms the three genetic algorithms in both performance and accuracy. For the large example, however, the MC has a much larger variance. The most stable optimizer for the large problem is the RU genetic algorithm. For the CPLFA circuit the performance of the three genetic algorithms is compared in Fig. 6. Here, the average of the 10 best individuals in each generation is shown as a function of the generation number. The graph shows the superiority of the RU setup. The performance of the MC is also added to this graph; however, only every 20th solution has been incorporated.
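The box-plot quantities described above can be computed as in the following sketch. The quartile convention is not stated in the text, so the "inclusive" interpolation of Python's `statistics.quantiles` is an assumption, and the ten sample values are invented for illustration rather than taken from the experiments.

```python
import statistics


def box_stats(data):
    """Median, quartiles, IQR, adjacent values and outliers of a sample.

    Requires at least two data points.
    """
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Adjacent values: extreme data points still inside the fences.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_adjacent": inside[0],
        "upper_adjacent": inside[-1],
        "outliers": outliers,
    }


# Ten hypothetical final fitness values, standing in for ten runs.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

In this made-up sample the value 30 lies more than 1.5 times the IQR above the upper quartile, so it would be drawn as a dot beyond the upper tail, which ends at 9.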

Fig. 5. Statistics for the three circuits: a) NAND3, b) DFF, c) CPLFA (fitness for each strategy PT, RU, ES and MC)

Fig. 6. The optimization processes in dependence of the generations for the three GA setups and the MC used. The graph shows the average of the best individuals for CPLFA.

Also, further computation time might have led to better results, as the objective function values continue to decline. Some of the results were surprising, for example the resulting transistor sizes of a 2-input NAND gate depicted in Fig. 7. In this NAND gate, the NMOS transistors are tapered as one might expect, but the PMOS transistors have different sizes, which seems wrong at first glance and is unlikely to be chosen even by an experienced VLSI designer. Simulation of this NAND gate results in equal rise and fall times for any input combination. The sizing therefore must be correct. Analyzing the circuit in more detail reveals the reason. If node Q has to be

Fig. 7. 1 µm CMOS NAND gate with transistor sizes optimized by GA (W/L: MPA 22/2, MPB 28/2, MNA 10/2, MNB 13/2)


charged by transistor MPB, input B is low and transistor MNB is off. In the worst case input A is high and transistor MPB has to charge not only the output Q but also the internal node C. On the other hand, if transistor MPA has to charge the output Q, input A is low, transistor MNA is turning off, and no internal capacitance has to be charged. The difference between the sizes of the two PMOS transistors may appear insignificant, but any additional transistor width translates directly into higher power consumption. Furthermore, the capacitance of input A is more than 20% smaller, which may appear insignificant for standard designs. However, in high-speed designs this may be just the critical quantity for maximizing performance.
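The figure of "more than 20%" can be checked with a short calculation from the sizes given in Fig. 7. The sketch assumes that input A drives the gates of MPA and MNA, that input B drives MPB and MNB, and that gate capacitance is proportional to gate width at equal channel length; the function name is hypothetical.

```python
# W values from Fig. 7 (all transistors share the same channel length).
WIDTHS = {"MPA": 22, "MPB": 28, "MNA": 10, "MNB": 13}


def relative_cap_reduction(widths):
    """Fraction by which input A's gate load is smaller than input B's."""
    load_a = widths["MPA"] + widths["MNA"]   # total gate width on input A
    load_b = widths["MPB"] + widths["MNB"]   # total gate width on input B
    return 1.0 - load_a / load_b


reduction = relative_cap_reduction(WIDTHS)
```

Under these assumptions the total gate width is 32 on input A versus 41 on input B, a reduction of 9/41 or roughly 22%, which agrees with the statement in the text.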

5 Conclusion
In this paper we have shown that the important but tedious work of manually optimizing transistor sizes of VLSI circuits can be accurately performed by stochastic optimization. We have compared four stochastic optimization methods on three CMOS subcircuits of different complexity (6, 16, 34 MOSFETs). On the two smaller subcircuits the Monte Carlo method has been found to yield better results than the three genetic algorithms investigated. On the largest of the three subcircuits, in contrast, the genetic algorithms were found to yield significantly smaller variances provided they were allowed to run for a sufficient number of generations, which suggests that the Monte Carlo method tends to end up in a suboptimal solution more frequently as problem size increases. Among the genetic algorithms, the most consistent results were obtained by using ranking selection and uniform crossover.

All our experiments started from a random seed; no initial guess was submitted. An advantage of using genetic algorithms is the opportunity to include a set of good initial solutions in the first generation. The optimized circuits had a performance comparable to that of circuits optimized by experienced designers. However, the results were obtained automatically (e.g. 8 h CPU time for the CPLFA on a Sparc S-10-30), while manual optimization of a circuit can take many hours of expensive engineer's time. The GA has been applied to the optimization of the dynamic logic-flip-flops used in the design of an 800 MHz 1 µm CMOS adder [RH96].

References
[Bac92] Thomas Bäck. GENEsYs, 1992. Computer Science Department, LSXI, University of Dortmund, Baroper Str. 301, D-4600 Dortmund 50, Germany.
[Bac94] Thomas Bäck. Evolutionary Algorithms in Theory and Practice. PhD thesis, Fachbereich Informatik, Universität Dortmund, 1994.
[Heu90] L. S. Heusler. Transistor Sizing for Timing Optimization of Combinational Digital CMOS Circuits. PhD thesis, ETH Zurich, 1990.
[HK94] A. M. Hill and S-M. Kang. Genetic Algorithm Based Design Optimization of CMOS VLSI Circuits. In Proceedings of the Third International Conference on Parallel Problem Solving from Nature - PPSN III, pages 546-555, October 1994.
[Hsp] HSPICE - Circuit Simulator, Metasoft.
[MSV93a] Heinz Mühlenbein and Dirk Schlierkamp-Voosen. Predictive models for the breeder genetic algorithm. Evolutionary Computation, 1(1), 1993.
[MSV93b] Heinz Mühlenbein and Dirk Schlierkamp-Voosen. The science of breeding and its application to the breeder genetic algorithm. Evolutionary Computation, 1(4), 1993.
[RH95] R. Rogenmoser and Q. Huang. A 375 MHz 1-µm CMOS 8-Bit Multiplier. In Proceedings of the 1995 Symposium on VLSI Circuits, pages 13-14, June 1995.
[RH96] R. Rogenmoser and Q. Huang. An 800-MHz 1-µm CMOS Pipelined 8-bit Adder using True Single-Phase Clocked Logic-Flip-Flops. IEEE Journal of Solid-State Circuits, 31(3):401-409, March 1996.
[Sah64] C. T. Sah. Characteristics of the Metal-Oxide-Semiconductor Transistors. IEEE Transactions on Electron Devices, ED-11:324-345, July 1964.
[Sch81] H.-P. Schwefel. Numerical Optimization of Computer Models. Wiley, Chichester, 1981.
[Sys89] Gilbert Syswerda. Uniform crossover in genetic algorithms. In J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 2-9, San Mateo, CA, 1989. Morgan Kaufmann Publishers.
[Whi89] Darrell Whitley. The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. In J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 116-121, San Mateo, CA, 1989. Morgan Kaufmann Publishers.
[Wur93] L. T. Wurtz. An Efficient Scaling Procedure for Domino CMOS Logic. IEEE Journal of Solid-State Circuits, 28(9):979-982, September 1993.

This article was processed using the LaTeX macro package with LLNCS style
