Ee141 Hw4 Sol

UNIVERSITY OF CALIFORNIA, BERKELEY
College of Engineering
Department of Electrical Engineering and Computer Sciences
Elad Alon Homework #4 - Solutions EECS141

Due Thursday, September 24, 5pm, box in 240 Cory
PROBLEM 1: Inverter Chain
In this problem you will optimize the delay of a chain of four inverters. The load
capacitance is $C_L = 64\,C_{in}$, where Cin represents the input capacitance of the first
inverter in the chain. Assume that the input capacitance of the first inverter is Cunit,
$\gamma = 0.8$, and tinv is the unit delay of an inverter as defined in lecture
(i.e., $t_p = t_{inv}(\gamma + f)$).
a) Size the inverters (with respect to Cin) to minimize the delay.
Solution:
From the lectures we know that the optimal way to size the inverter chain for
minimal delay is to size every inverter for the same fanout f. Since we are given
the input and the output capacitance and the number of stages, we can find the
total fanout (F) and the fanout of each stage (f) as:
$$F = \frac{64\,C_{in}}{C_{in}} = 64, \qquad f = \sqrt[4]{64} = 2\sqrt{2} \approx 2.83.$$
Now we know that the optimal sizing for the chain is (starting from the beginning
of the chain): $C_{in}$, $f\,C_{in}$, $f^2 C_{in}$, $f^3 C_{in}$. The exact numbers are shown in the
figure below.
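The equal-fanout sizing rule can be checked numerically. The following is a minimal sketch, not part of the original solution; the function name `size_inverter_chain` is chosen here for illustration, and all capacitances are expressed in units of Cin.

```python
def size_inverter_chain(c_in, c_load, n_stages):
    """Equal-fanout sizing: return (stage fanout f, input capacitance of each stage)."""
    total_fanout = c_load / c_in              # F = CL / Cin
    f = total_fanout ** (1.0 / n_stages)      # f = F^(1/N)
    sizes = [c_in * f ** k for k in range(n_stages)]
    return f, sizes


# Capacitances are in units of Cin, so c_in = 1.
f, sizes = size_inverter_chain(c_in=1.0, c_load=64.0, n_stages=4)
print(f"f = {f:.3f}")
print("sizes / Cin:", [round(s, 2) for s in sizes])
```

The third and fourth inverters come out at 8 Cin and roughly 22.6 Cin, which the text rounds to 22.7 Cin in part c).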
b) What is the optimal delay?
Solution:
Every stage has the same fanout and therefore the same delay: $t_{d,1} = t_{inv}(\gamma + f)$.
The total delay is therefore:
$$t_d = 4\,t_{d,1} = 4\,t_{inv}(\gamma + f) \approx 14.5\,t_{inv}.$$
c) Now add an additional load of 500*Cin after the 3rd inverter in the chain. With the
same sizing as in part a), now what is the delay of the chain?
Solution:
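With the sizing from part a), the fanout of the third inverter in the chain will
change due to the added capacitive load. The third inverter has an input
capacitance of $f^2 C_{in} = 8\,C_{in}$ and now drives the fourth inverter
($f^3 C_{in} \approx 22.7\,C_{in}$) plus the added $500\,C_{in}$, so its delay is:
$$t_{d,3} = t_{inv}(\gamma + f_3) = t_{inv}\left(\gamma + \frac{500 + 22.7}{8}\right) = t_{inv}(\gamma + 65.3)$$
and therefore the total delay is:
$$t_d = 3\,t_{d,1} + t_{d,3} = 3\,t_{inv}(\gamma + f) + t_{inv}(\gamma + f_3) \approx 77\,t_{inv}.$$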
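As a cross-check of parts b) and c), the per-stage delay $t_{inv}(\gamma + f)$ can be summed stage by stage. This is a small sketch under the problem's assumptions; `chain_delay` and its `extra_loads` argument are illustrative names, not something defined in the course material.

```python
GAMMA = 0.8  # parasitic delay factor given in the problem statement


def chain_delay(sizes, c_load, extra_loads=None, gamma=GAMMA):
    """Total delay in units of t_inv for an inverter chain.

    sizes:       input capacitance of each inverter (units of Cin)
    c_load:      final load capacitance (units of Cin)
    extra_loads: {stage index: extra capacitance on that stage's output}
    """
    extra_loads = extra_loads or {}
    total = 0.0
    for i, c_stage in enumerate(sizes):
        c_next = sizes[i + 1] if i + 1 < len(sizes) else c_load
        c_out = c_next + extra_loads.get(i, 0.0)
        total += gamma + c_out / c_stage       # t_inv * (gamma + fanout)
    return total


f = 64 ** 0.25
sizes = [f ** k for k in range(4)]              # sizing from part a)
delay_b = chain_delay(sizes, 64.0)              # no side load
delay_c = chain_delay(sizes, 64.0, {2: 500.0})  # 500*Cin after the 3rd inverter
print(f"part b: {delay_b:.1f} t_inv, part c: {delay_c:.1f} t_inv")
```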
d) [BONUS] How could you modify design of the chain (i.e., change sizes, add or
remove stages, etc.) to improve the delay of the circuit from part c)?
Solution:
The large fixed load from this capacitor is now going to be the dominant factor
for the overall delay of the chain. This means that we should probably treat the
chain of the first 3 inverters as a new sizing problem, still with the same Cin for
the first stage (and leaving the last inverter the same size), but with the final load
equal to 522.7 Cin. (We'll see next week that this heuristic will indeed get us
close to the true optimal results.)
Assuming we stick with just 3 inverters for this new chain, the new overall fanout
will be:
$$F = \frac{522.7\,C_{in}}{C_{in}} = 522.7.$$
So, the new f we should be targeting for the first 3 stages is:
$$f = \sqrt[3]{522.7} \approx 8.05$$
Hence, the total delay is now:
$$t_d = 3\,t_{d,1} + t_{d,4} = 3\,t_{inv}(\gamma + f) + t_{inv}(\gamma + f_4) \approx 30.2\,t_{inv},$$
where $t_{d,4}$ is the delay of the unchanged last inverter, whose fanout is
$f_4 = 64/22.7 \approx 2.82$. This is drastically lower than the result from part c). In
fact, we can do even better than this if we allow ourselves to change the number
of stages before the fixed capacitive load. In this case, the optimal number of
stages is $\log_4(522.7) \approx 4.52$, which if we round to 4 (5) stages gives an f of 4.78
(3.5) and a delay of ~25.95 tinv (25.11 tinv).
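The effect of the stage count can be tabulated with a short script. This is a sketch that assumes, as the text does, a fixed load of 522.7 Cin on the resized front chain and an unchanged 22.7 Cin last inverter driving 64 Cin; the function name `delay_with_n_stages` is hypothetical.

```python
GAMMA = 0.8
C_FIXED = 522.7        # 500*Cin side load plus the 22.7*Cin last inverter
F_LAST = 64.0 / 22.7   # fanout of the unchanged last inverter


def delay_with_n_stages(n, gamma=GAMMA):
    """Return (stage fanout, total delay in t_inv) with n stages before the fixed load."""
    f = C_FIXED ** (1.0 / n)
    front = n * (gamma + f)      # n equal-fanout stages driving the fixed load
    last = gamma + F_LAST        # final inverter driving 64*Cin
    return f, front + last


for n in (3, 4, 5):
    f, delay = delay_with_n_stages(n)
    print(f"N = {n}: f = {f:.2f}, delay = {delay:.2f} t_inv")
```

The delay drops noticeably from 3 to 4 stages and only slightly from 4 to 5, consistent with the optimum lying between the two.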
PROBLEM 2: Complex Logic and LE
For this problem assume that $\gamma = 0.8$ and tinv is the unit delay of an inverter as defined in
lecture (i.e., $t_p = t_{inv}(\gamma + f)$). Express all the delay values in terms of tinv.
a) Implement the logic function given by the expression shown below as a complex
CMOS gate followed by an inverter. Assuming the complex gate is sized for
equal rise and fall delays, what is its LE?
Out= AB+CD
Solution:
The LE of the complex gate is:
$$LE = \frac{C_{in,gate}}{C_{in,inv}} = \frac{6}{3} = 2.$$
b) Implement the same logic function using only NAND2, NOR2, and inverters.
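Solution:
[Figure not reproduced. Consistent with the analysis in parts d) and e): two NAND2
gates (inputs A, B and C, D), each followed by an inverter, drive a NOR2 gate that is
followed by an output inverter.]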
c) Add a load capacitance of CL=128*Cin at the output of your circuit from part a).
Now size the circuit to minimize the delay. What is the minimum delay? Note
that you can assume that it is always the input closest to the output node that
switches, i.e., you do not need to include any of the parasitics on the
intermediate nodes of the complex gate when calculating delay.
Solution:
In part a) we found the LE of the complex CMOS gate to be equal to 2. Since the
fanout is F = 128, we can calculate the total path effort to be:
$$PE = LE \cdot F = 2 \cdot 128 = 256.$$
Our circuit has only two stages, and the optimal sizing to minimize the delay is to
distribute the effective fanout (EF) evenly over both stages. To size the chain, first
we need to find the EF/stage:
$$EF/\text{stage} = \sqrt{256} = 16.$$
Now that we know the EF/stage, we can size the inverter:
$$C_2 = \frac{C_L}{EF/\text{stage}} = \frac{128\,C_{in}}{16} = 8\,C_{in}.$$
It's always good to double-check the size of the first gate to make sure that
we get Cin with the EF/stage we calculated (in case we made a mistake
somewhere):
$$C_1 = LE_1 \frac{C_2}{EF/\text{stage}} = 2 \cdot \frac{8\,C_{in}}{16} = C_{in}.$$
To find the overall delay, we need to know the parasitic delay of the complex
gate. Since we've sized the gate for the same drive strength as a unit inverter,
we can find it just by comparing the diffusion capacitance at the output:
$$p_{gate} = \frac{C_{par,gate}}{C_{in,inv}} = \frac{(4 + 4 + 2 + 2)\,C_d}{(2 + 1)\,C_g} = 4\gamma.$$
Putting it all together, the optimal delay of this chain is:
$$t_d = t_{d,gate} + t_{d,inv} = t_{inv}(p_{gate} + EF) + t_{inv}(\gamma + EF) = 36\,t_{inv}.$$
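The sizing procedure used here (compute the path effort, take the N-th root, then size backward from the load) can be sketched as follows. The helper names `size_path` and `path_delay` are illustrative; capacitances are in units of Cin and delays in units of tinv.

```python
import math

GAMMA = 0.8


def size_path(logical_efforts, c_load):
    """Return (EF per stage, input capacitance of each stage), sizing backward from the load."""
    n = len(logical_efforts)
    path_effort = math.prod(logical_efforts) * c_load   # PE = LE * F, with Cin = 1
    ef = path_effort ** (1.0 / n)
    caps = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):
        caps[i] = logical_efforts[i] * c_next / ef       # C_i = LE_i * C_(i+1) / EF
        c_next = caps[i]
    return ef, caps


def path_delay(parasitics, ef):
    """Sum of t_inv * (p_i + EF) over all stages, in units of t_inv."""
    return sum(p + ef for p in parasitics)


# Part c): complex gate (LE = 2, p = 4*gamma) followed by an inverter (LE = 1, p = gamma).
ef, caps = size_path([2.0, 1.0], c_load=128.0)
delay = path_delay([4 * GAMMA, GAMMA], ef)
print(f"EF/stage = {ef:.1f}, caps = {caps}, delay = {delay:.1f} t_inv")
```

The first entry of `caps` serves as the same sanity check the text performs: it should come back as Cin.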
d) Add a load capacitance CL=128*Cin at the output of your circuit from part b) and
size the circuit to minimize the delay. Note that Cin in parts a) and b) must be
equal. Now what is the minimum delay?
Hint: You might want to change the implementation from part b) so that the
circuit has a total number of stages that is close to the optimal one. Also, pay
attention to which gates are better drivers and try to rearrange the circuit so that
they are placed closer to the load.
Solution:
For the implementation presented in part b), the total LE of the chain is:
$$LE = \prod LE_i = \frac{4}{3} \cdot 1 \cdot \frac{5}{3} \cdot 1 = \frac{20}{9}.$$
Since the fanout is F = 128, total path effort is:
$$PE = LE \cdot F = \frac{20}{9} \cdot 128 = 284.44.$$
Now we can calculate the optimal number of stages:
$$N = \log_4 PE = \log_4 284.44 \approx 4.1.$$
So, the conclusion is that our implementation should have 4 stages, which our
implementation in part b) indeed has. It will however turn out that a better
implementation for part b) is possible, but first let's work out the sizing and delay
for this chain.
Settling on four stages and given our PE, we can now find the EF/stage:
$$EF/\text{stage} = \sqrt[4]{284.44} \approx 4.1$$
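This leads to optimal sizings of:
$$C_4 = LE_4 \frac{C_L}{EF} = 1 \cdot \frac{128\,C_{in}}{4.1} = 31.2\,C_{in}$$
$$C_3 = LE_3 \frac{C_4}{EF} = \frac{5}{3} \cdot \frac{31.2\,C_{in}}{4.1} = 12.7\,C_{in}$$
$$C_2 = LE_2 \frac{C_3}{EF} = 1 \cdot \frac{12.7\,C_{in}}{4.1} = 3.1\,C_{in}$$
$$C_1 = LE_1 \frac{C_2}{EF} = \frac{4}{3} \cdot \frac{3.1\,C_{in}}{4.1} \approx C_{in}$$
(checking the first stage)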
We also need to find pNAND2 and pNOR2 to plug in to the delay equation:
$$p_{NAND2} = \frac{C_{par,NAND2}}{C_{in,inv}} = \frac{(2 + 2 + 2)\,C_d}{(2 + 1)\,C_g} = 2\gamma.$$
$$p_{NOR2} = \frac{C_{par,NOR2}}{C_{in,inv}} = \frac{(4 + 1 + 1)\,C_d}{(2 + 1)\,C_g} = 2\gamma$$
Hence, the overall delay is:
$$t_d = t_{d,NAND2} + t_{d,NOR2} + 2\,t_{d,INV} = t_{inv}(2\gamma + 2\gamma + 2\gamma) + 4\,t_{inv}(EF/\text{stage}) \approx 21.2\,t_{inv}$$
As previously mentioned, we can actually come up with an implementation for
part b) that is both faster and will have a little bit less area because it uses more
inverters closer to the final load. (Note that you will receive full credit for
correctly analyzing/sizing the original implementation; the new implementation is
described here only for the sake of completeness.) This implementation is shown
below:
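In this case, $LE = \prod LE_i = \frac{4}{3} \cdot \frac{4}{3} \cdot 1 \cdot 1 = \frac{16}{9}$, and PE and EF are:
$$PE = LE \cdot F = \frac{16}{9} \cdot 128 = 227.56.$$
$$EF/\text{stage} = \sqrt[4]{227.56} \approx 3.89.$$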
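This leads to optimal sizings of:
$$C_4 = LE_4 \frac{C_L}{EF} = 1 \cdot \frac{128\,C_{in}}{3.89} = 32.9\,C_{in}$$
$$C_3 = LE_3 \frac{C_4}{EF} = 1 \cdot \frac{32.9\,C_{in}}{3.89} \approx 8.5\,C_{in}$$
$$C_2 = LE_2 \frac{C_3}{EF} = \frac{4}{3} \cdot \frac{8.5\,C_{in}}{3.89} = 2.9\,C_{in}$$
$$C_1 = LE_1 \frac{C_2}{EF} = \frac{4}{3} \cdot \frac{2.9\,C_{in}}{3.89} \approx C_{in}$$
(checking the first stage)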
Hence, the overall optimal delay is now:
$$t_d = 2\,t_{d,NAND2} + 2\,t_{d,INV} = 2\,t_{inv}(p_{NAND2} + EF) + 2\,t_{inv}(\gamma + EF) \approx 20.4\,t_{inv},$$
which is a little over half of the 36 tinv we got in part c).
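The two four-stage implementations of part d) can be compared side by side with a short script. This is a sketch using the LE and parasitic values derived above; the `analyze` helper and the tuple layout `(name, LE, p)` are assumptions made for illustration.

```python
import math

GAMMA = 0.8

# (name, logical effort, parasitic delay) for each gate type used in part d)
NAND2 = ("NAND2", 4 / 3, 2 * GAMMA)
NOR2 = ("NOR2", 5 / 3, 2 * GAMMA)
INV = ("INV", 1.0, GAMMA)


def analyze(stages, c_load):
    """Return (PE, EF per stage, input caps, delay in t_inv) for a path listed input to output."""
    n = len(stages)
    pe = math.prod(le for _, le, _ in stages) * c_load
    ef = pe ** (1.0 / n)
    caps, c_next = [], c_load
    for _, le, _ in reversed(stages):
        c_next = le * c_next / ef        # size each stage from the load backward
        caps.append(c_next)
    caps.reverse()
    delay = sum(p + ef for _, _, p in stages)
    return pe, ef, caps, delay


original = [NAND2, INV, NOR2, INV]
improved = [NAND2, NAND2, INV, INV]

for label, path in (("original", original), ("improved", improved)):
    pe, ef, caps, delay = analyze(path, c_load=128.0)
    print(f"{label}: PE = {pe:.2f}, N_opt = {math.log(pe, 4):.2f}, "
          f"EF = {ef:.2f}, caps = {[round(c, 1) for c in caps]}, delay = {delay:.1f} t_inv")
```

Without intermediate rounding the improved path comes out near 20.3 tinv; the 20.4 tinv quoted in the text follows from rounding EF to 3.89 first.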

e) Compare the implementations from part c) and d) from the standpoint of total
transistor width (which is related to the overall area and power of the design).
Solution:
Since gate capacitance is directly proportional to transistor width, the easiest way
to compare these two implementations is to sum all the input capacitances of all
the gates in the circuit. Note that for complex circuits we have to include the input
capacitances from every input.
For the circuit from part c) we get that the total amount of gate capacitance is:
$$C_{tot,c} = 4 \cdot C_{in} + 8\,C_{in} = 12\,C_{in}.$$
On the other hand, for the circuit from part d) with the original implementation
(the one that uses a NOR2), the total gate capacitance is:
$$C_{tot,d} = 2 \cdot C_{in} + 2 \cdot C_{in} + 2 \cdot 3.1\,C_{in} + 2 \cdot 12.7\,C_{in} + 31.2\,C_{in} = 66.8\,C_{in}$$
With the improved implementation (that uses only NAND2 and INV), the total
gate capacitance is:
$$C_{tot,d} = 2 \cdot C_{in} + 2 \cdot C_{in} + 2 \cdot 2.9\,C_{in} + 8.5\,C_{in} + 32.9\,C_{in} = 51.2\,C_{in}.$$
So, as promised, the second implementation of part d) indeed has both lower area
and lower delay than the first implementation. However, the circuit in part d) is
~4 times larger than part c), although it does buy us substantially reduced (by
almost a factor of two) delay.
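The bookkeeping for total gate capacitance is easy to get wrong, since every input of every gate counts. The sketch below tallies it using the rounded sizes from the text; the tuple layout `(cap per input, inputs per gate, gate count)` is a convention invented here for illustration.

```python
def total_gate_cap(gates):
    """Sum input capacitance over all gates; each entry is (cap per input, inputs, count)."""
    return sum(cap * n_inputs * count for cap, n_inputs, count in gates)


# All capacitances in units of Cin, using the rounded sizes from parts c) and d).
part_c = [(1.0, 4, 1),            # complex gate: 4 inputs of Cin each
          (8.0, 1, 1)]            # output inverter
part_d_original = [(1.0, 2, 2),   # two NAND2 gates
                   (3.1, 1, 2),   # two inverters
                   (12.7, 2, 1),  # NOR2
                   (31.2, 1, 1)]  # output inverter
part_d_improved = [(1.0, 2, 2),   # two first-level NAND2 gates
                   (2.9, 2, 1),   # second-level NAND2
                   (8.5, 1, 1),   # inverter
                   (32.9, 1, 1)]  # output inverter

c_c = total_gate_cap(part_c)
c_d1 = total_gate_cap(part_d_original)
c_d2 = total_gate_cap(part_d_improved)
print(f"c): {c_c:.1f} Cin, d) original: {c_d1:.1f} Cin, d) improved: {c_d2:.1f} Cin")
print(f"improved d) / c) = {c_d2 / c_c:.2f}")
```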
f) If CL=2*Cin, which implementation is now faster? Which one is better in terms of
total transistor width? Note that you do not need to repeat all of the calculations; a
well-explained, intuitive answer will receive the full credit.
Solution:
If the final load is reduced to 2*Cin, the PE for part c) is now approximately:
$$PE = LE \cdot F = 2 \cdot 2 = 4$$
For part d) (with the improved implementation) the PE is:
$$PE = LE \cdot F = \frac{16}{9} \cdot 2 = 3.56.$$
In both cases it turns out that the optimal number of stages is 1. However, since
the logic function cannot be implemented in one stage (remember, using static
CMOS every stage has to invert), the minimum number of stages in terms of
functionality is 2, which matches the circuit from part c). Specifically, the
EF/stage of the circuit in c) would be 2, while the EF/stage for the chain in d)
would be ~1.37. The conclusion is that for such a small load (2 Cin), the
implementation from part c) would not only be faster, it would also be quite a bit
smaller.
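Although the problem only asks for an intuitive answer, the claim is easy to verify numerically. The sketch below reuses the LE and parasitic values derived in parts c) and d); `min_delay` is a hypothetical helper name.

```python
GAMMA = 0.8


def min_delay(logical_efforts, parasitics, c_load):
    """Return (EF per stage, minimum delay in t_inv) for a path with Cin = 1."""
    pe = c_load
    for le in logical_efforts:
        pe *= le                                   # PE = (product of LEs) * F
    ef = pe ** (1.0 / len(logical_efforts))
    return ef, sum(parasitics) + len(logical_efforts) * ef


# Part c): complex gate + inverter.  Part d), improved: NAND2, NAND2, INV, INV.
ef_c, delay_c = min_delay([2.0, 1.0], [4 * GAMMA, GAMMA], c_load=2.0)
ef_d, delay_d = min_delay([4 / 3, 4 / 3, 1.0, 1.0],
                          [2 * GAMMA, 2 * GAMMA, GAMMA, GAMMA], c_load=2.0)
print(f"c): EF = {ef_c:.2f}, delay = {delay_c:.1f} t_inv")
print(f"d): EF = {ef_d:.2f}, delay = {delay_d:.1f} t_inv")
```

With such a small load the parasitic delays dominate, so the extra two stages of the implementation from part d) only add delay.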
PROBLEM 3: Switch Model and LE
In this problem you will use the switch model to find the delay of NAND2 and NOR2
gates and estimate the LE of these gates. Assume that in our technology the gate
capacitance is CG = 2 fF/µm and that the junction capacitance is CD = 1.6 fF/µm. Also assume
that Rp = 20 kΩ/□ and Rn = 10 kΩ/□.
a) Using the switch model, calculate the delay of a NAND2 driving 4 identical copies of
itself when the bottom input (i.e., the transistor that is closest to ground)
switches. You should calculate the delay both for the rising and the falling edge.
Solution:
For the NAND2 gate shown in the figure above, in order to calculate tpHL we
should assume that the B input switches while A = 1. The switch model for this
situation is presented in the figure on the next page.
Using Elmore delay, the time constant for this circuit is:
$$\tau_{HL} = R_N C_{int} + 2 R_N (C_L + C_{par}),$$
where, assuming that each transistor is sized to be 2 µm wide (note that any choice
of size will result in the same final delay), the capacitors are:
$$C_{int} = 2C_g + 2C_d + 2C_d = 10.4\,\text{fF}$$
$$C_{par} = 2C_d + 2C_d + 2C_d = 9.6\,\text{fF}$$
$$C_L = 4 \cdot C_{in} = 4 \cdot 4C_g = 32\,\text{fF}$$
Similarly, the resistance of each transistor is:
$$R_N = R_n \frac{L}{W} = 10\,\text{k}\Omega \cdot \frac{0.09\,\mu\text{m}}{2\,\mu\text{m}} = 450\,\Omega,$$
Therefore, tpHL for the NAND gate will be:
$$t_{d,HL} = \ln 2 \cdot \tau_{HL} = \ln 2 \cdot [450\,\Omega \cdot 10.4\,\text{fF} + 900\,\Omega \cdot (9.6\,\text{fF} + 32\,\text{fF})] = 29.2\,\text{ps}.$$

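Now we need to calculate tpLH (still with B switching and A = 1); the switch model
for this case is presented below:
Again using Elmore time constants:
$$\tau_{LH} = R_P (C_L + C_{par} + C_{int})$$
where $R_P = R_p \frac{L}{W} = 20\,\text{k}\Omega \cdot \frac{0.09\,\mu\text{m}}{2\,\mu\text{m}} = 900\,\Omega$, and therefore:
$$t_{d,LH} = \ln 2 \cdot \tau_{LH} = \ln 2 \cdot [900\,\Omega \cdot (9.6\,\text{fF} + 32\,\text{fF} + 10.4\,\text{fF})] = 32.4\,\text{ps}.$$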
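The NAND2 numbers above can be reproduced with a few lines of arithmetic, which also confirms the remark that the delay does not depend on the chosen width. This is a sketch of the switch-model calculation only; the function name `nand2_delays` and the constant names are chosen here for illustration.

```python
import math

C_G = 2.0e-15     # gate capacitance, F per um of width
C_D = 1.6e-15     # junction capacitance, F per um of width
R_N_SQ = 10e3     # NMOS resistance, ohms per square
R_P_SQ = 20e3     # PMOS resistance, ohms per square
L_UM = 0.09       # channel length in um, as used in the text


def nand2_delays(w_um=2.0):
    """Return (t_pHL, t_pLH) in seconds for a NAND2 with all devices w_um wide, fanout of 4."""
    r_n = R_N_SQ * L_UM / w_um
    r_p = R_P_SQ * L_UM / w_um
    c_int = w_um * C_G + 2 * w_um * C_D        # internal node of the NMOS stack
    c_par = 3 * w_um * C_D                     # drains on the output node
    c_load = 4 * (2 * w_um * C_G)              # four copies, each input sees NMOS + PMOS gate
    tau_hl = r_n * c_int + 2 * r_n * (c_load + c_par)   # Elmore: two NMOS in series
    tau_lh = r_p * (c_load + c_par + c_int)             # one PMOS charges all three
    return math.log(2) * tau_hl, math.log(2) * tau_lh


t_hl, t_lh = nand2_delays(2.0)
print(f"tpHL = {t_hl * 1e12:.1f} ps, tpLH = {t_lh * 1e12:.1f} ps")
```

Calling `nand2_delays(1.0)` gives the same result, because every resistance scales as 1/W and every capacitance as W.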
b) Repeat part a) for a NOR2 gate when the input that controls the top-most PMOS
(i.e., the one closest to Vdd) switches.
Solution:
Similar to what we had in part a) of this problem, for the NOR2 gate shown in the
figure above, we should assume that B transitions while A = 0. The switch model
for this situation is presented in the figure below.
Once again using Elmore to find tpLH:
$$\tau_{LH} = R_P C_{int} + 2 R_P (C_L + C_{par})$$
where $R_P = R_p \frac{L}{W} = 20\,\text{k}\Omega \cdot \frac{0.09\,\mu\text{m}}{4\,\mu\text{m}} = 450\,\Omega$, and the capacitors are:
$$C_{int} = 4C_g + 4C_d + 4C_d = 20.8\,\text{fF}$$
$$C_{par} = 4C_d + 1C_d + 1C_d = 9.6\,\text{fF}$$
$$C_L = 4 \cdot C_{in} = 4 \cdot 5C_g = 40\,\text{fF}$$
Therefore, tpLH is:
$$t_{d,LH} = \ln 2 \cdot \tau_{LH} = \ln 2 \cdot [450\,\Omega \cdot 20.8\,\text{fF} + 900\,\Omega \cdot (9.6\,\text{fF} + 40\,\text{fF})] = 37.4\,\text{ps}.$$
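Now to calculate tpHL (still with B switching and A = 0), for which the switch model
is shown below:
$$\tau_{HL} = R_N (C_L + C_{par} + C_{int})$$
where $R_N = R_n \frac{L}{W} = 10\,\text{k}\Omega \cdot \frac{0.09\,\mu\text{m}}{1\,\mu\text{m}} = 900\,\Omega$, and therefore:
$$t_{d,HL} = \ln 2 \cdot \tau_{HL} = \ln 2 \cdot [900\,\Omega \cdot (9.6\,\text{fF} + 40\,\text{fF} + 20.8\,\text{fF})] = 43.9\,\text{ps}.$$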
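The same arithmetic for the NOR2 gate is sketched below, with 4 µm PMOS and 1 µm NMOS devices as in the text. The function name `nor2_delays` and the `scale` argument are illustrative assumptions.

```python
import math

C_G = 2.0e-15     # gate capacitance, F per um of width
C_D = 1.6e-15     # junction capacitance, F per um of width
R_N_SQ = 10e3     # NMOS resistance, ohms per square
R_P_SQ = 20e3     # PMOS resistance, ohms per square
L_UM = 0.09       # channel length in um


def nor2_delays(scale=1.0):
    """Return (t_pLH, t_pHL) in seconds for a NOR2 with Wp = 4*scale um, Wn = 1*scale um."""
    w_p, w_n = 4.0 * scale, 1.0 * scale
    r_p = R_P_SQ * L_UM / w_p
    r_n = R_N_SQ * L_UM / w_n
    c_int = w_p * C_G + 2 * w_p * C_D          # internal node of the PMOS stack
    c_par = w_p * C_D + 2 * w_n * C_D          # one PMOS drain and two NMOS drains
    c_load = 4 * (w_p + w_n) * C_G             # four copies of the gate
    tau_lh = r_p * c_int + 2 * r_p * (c_load + c_par)   # Elmore: two PMOS in series
    tau_hl = r_n * (c_load + c_par + c_int)             # one NMOS discharges all three
    return math.log(2) * tau_lh, math.log(2) * tau_hl


t_lh, t_hl = nor2_delays()
print(f"tpLH = {t_lh * 1e12:.1f} ps, tpHL = {t_hl * 1e12:.1f} ps")
```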
c) Using HSPICE and the graphical method described in class, extract the LEs of the
NAND2 and NOR2 gates. Assume that for NAND2 it's always the bottom input
(one closest to ground) and for NOR2 it's always the topmost input (one that is
closest to Vdd) that is switching. How do these extracted values for LE compare
with predictions from the simple switch model?
Note that to measure the delay and extract LE, you should use the circuit from
HW#2 and HW#3 with the chain of 4 NAND2 (4 NOR2) gates each having the
same fanout. The example for a chain of NAND2 gates each having a fanout of 4
is shown in the figure below (A is the bottom input).
Solution:
To make the chain of NOR2 gates we use the same idea as for the chain of NAND2
gates: we want to propagate the input signal all the way to the output, thus we
connect all other inputs to ground:
To find the LE for NAND2 and NOR2, we size the gates in the chain such that
every one will have the same fanout. Then, using the formula for the
delay of a single gate:
$$t_{gate} = t_{inv}(p_{gate} + LE_{gate} \cdot f),$$
and measuring delay for different values of fanout f, we can find the LE of the
gate. To simulate this in HSPICE, we sweep the value of f and get plots of delay
vs. fanout for NAND2 and NOR2 gates (as well as for inverters). The HSPICE
deck is given at the end of the solution to this problem.
From this diagram we can first calculate the tinv parameter by reading off the delays
for an inverter with fanout of 0 and 5:
$$t_{inv} = \frac{t_{d,5} - t_{d,0}}{5} = 5.2\,\text{ps}.$$
Now we can calculate LE_NAND2 and LE_NOR2 by doing the same thing for the
NAND2 and NOR2 gates:
$$LE_{NAND2} = \frac{t_{d,5} - t_{d,0}}{5\,t_{inv}} = 1.15$$
$$LE_{NOR2} = \frac{t_{d,5} - t_{d,0}}{5\,t_{inv}} = 1.46$$
These values are smaller than what was expected based on our calculation using
the switch model (LE_NAND2 = 4/3 and LE_NOR2 = 5/3). As we'll see shortly, the
main cause of this is that in modern technologies a stack of two transistors doesn't
actually have twice the resistance of a single transistor.
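The slope-based extraction can be written as two small functions. The measured delays themselves are not listed in this solution, so the sketch below only inverts the formula from the reported tinv and LE values to show the implied delay differences; `extract_t_inv` and `extract_le` are hypothetical helper names.

```python
def extract_t_inv(td_f0, td_f5, delta_f=5.0):
    """Slope of inverter delay vs. fanout (same time unit as the inputs)."""
    return (td_f5 - td_f0) / delta_f


def extract_le(td_f0, td_f5, t_inv, delta_f=5.0):
    """Slope of gate delay vs. fanout, normalized to t_inv."""
    return (td_f5 - td_f0) / (delta_f * t_inv)


T_INV_PS = 5.2                                   # reported in the text
reported = {"NAND2": 1.15, "NOR2": 1.46}         # extracted LEs reported in the text
predicted = {"NAND2": 4 / 3, "NOR2": 5 / 3}      # switch-model predictions

for gate, le in reported.items():
    # Delay difference between f = 5 and f = 0 implied by the reported LE
    # (derived here, not read from the simulation plot).
    implied_delta_ps = le * 5.0 * T_INV_PS
    ratio = le / predicted[gate]
    print(f"{gate}: implied t_d,5 - t_d,0 = {implied_delta_ps:.1f} ps, "
          f"extracted/predicted LE = {ratio:.2f}")
```

Both extracted values come out at roughly 86 to 88 percent of the switch-model prediction.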
HSPICE deck for finding LE:

*** HW4 Problem 3c ***
.LIB '/home/ff/ee141/MODELS/gpdk090_mos.sp' TT_s1v
.PARAM vddval=1.2
.PARAM fanout=1
.SUBCKT inv vdd gnd in out

Mp out in vdd vdd gpdk090_pmos1v W=2u L=0.09u
Mn out in gnd gnd gpdk090_nmos1v W=1u L=0.09u
.ENDS
.SUBCKT nand2 vdd gnd in1 in2 out

Mp1 out in1 vdd vdd gpdk090_pmos1v W=2u L=0.09u
Mp2 out in2 vdd vdd gpdk090_pmos1v W=2u L=0.09u
Mn1 int in1 gnd gnd gpdk090_nmos1v W=2u L=0.09u
Mn2 out in2 int gnd gpdk090_nmos1v W=2u L=0.09u
.ENDS
.SUBCKT nor2 vdd gnd in1 in2 out

Mp1 int in1 vdd vdd gpdk090_pmos1v W=4u L=0.09u
Mp2 out in2 int vdd gpdk090_pmos1v W=4u L=0.09u
Mn1 out in1 gnd gnd gpdk090_nmos1v W=1u L=0.09u
Mn2 out in2 gnd gnd gpdk090_nmos1v W=1u L=0.09u
.ENDS
* Voltage Sources
V1 vdd 0 'vddval'
V2 vin1 0 PWL 0 0V 10p 0V 11p 'vddval' 1011p 'vddval' 1012p 0V
* NAND2 chain
Xnand2_1 vdd 0 vin1 vdd out1 nand2 M=1
Xnand2_2 vdd 0 out1 vdd out2 nand2 M=fanout
Xnand2_3 vdd 0 out2 vdd out3 nand2 M='fanout*fanout'
Xnand2_4 vdd 0 out3 vdd out4 nand2 M='fanout*fanout*fanout'
* NOR2 chain
Xnor2_1 vdd 0 vin1 0 out11 nor2 M=1
Xnor2_2 vdd 0 out11 0 out21 nor2 M=fanout
Xnor2_3 vdd 0 out21 0 out31 nor2 M='fanout*fanout'
Xnor2_4 vdd 0 out31 0 out41 nor2 M='fanout*fanout*fanout'
* INV chain
Xinv_1 vdd 0 vin1 out12 inv M=1
Xinv_2 vdd 0 out12 out22 inv M=fanout
Xinv_3 vdd 0 out22 out32 inv M='fanout*fanout'
Xinv_4 vdd 0 out32 out42 inv M='fanout*fanout*fanout'
* options
.option post=2 nomod
.op
* analysis
.TRAN 0.1PS 1.5NS sweep fanout 0.001 5.001 1
*nand2 chain
.MEASURE TRAN tpHL TRIG V(out1) VAL='vddval/2' RISE=1 TARG V(out2) VAL='vddval/2' FALL=1
.MEASURE TRAN tpLH TRIG V(out1) VAL='vddval/2' FALL=1 TARG V(out2) VAL='vddval/2' RISE=1
.MEASURE TRAN tpavg PARAM='(tpHL+tpLH)/2'
*nor2 chain
.MEASURE TRAN tpHL1 TRIG V(out11) VAL='vddval/2' RISE=1 TARG V(out21) VAL='vddval/2' FALL=1
.MEASURE TRAN tpLH1 TRIG V(out11) VAL='vddval/2' FALL=1 TARG V(out21) VAL='vddval/2' RISE=1
.MEASURE TRAN tpavg1 PARAM='(tpHL1+tpLH1)/2'
*inverter chain
.MEASURE TRAN tpHL2 TRIG V(out12) VAL='vddval/2' RISE=1 TARG V(out22) VAL='vddval/2' FALL=1
.MEASURE TRAN tpLH2 TRIG V(out12) VAL='vddval/2' FALL=1 TARG V(out22) VAL='vddval/2' RISE=1
.MEASURE TRAN tpavg2 PARAM='(tpHL2+tpLH2)/2'
.END
