College of Engineering
Department of Electrical Engineering and Computer Sciences
In this problem you will optimize the delay of a chain of four inverters. The load
capacitance is CL = 64*Cin, where Cin represents the input capacitance of the first inverter in the
chain. Assume that the input capacitance of the first inverter is Cunit, that γ = 0.8, and that tinv is the
unit delay of an inverter as defined in lecture (i.e., $t_p = t_{inv}(\gamma + f)$).
Solution:
From the lectures we know that the optimal way to size the inverter chain for
minimal delay is to size every inverter for the same fanout f. Since we are given
the input and the output capacitance and the number of stages, we can find the
total fanout (F) and the fanout of each stage (f) as:
$$F = \frac{64C_{in}}{C_{in}} = 64, \qquad f = \sqrt[4]{64} = 2\sqrt{2} \approx 2.83.$$
Now we know that the optimal sizing for the chain is (starting from the beginning
of the chain): $C_{in}$, $fC_{in}$, $f^2C_{in}$, $f^3C_{in}$. The exact numbers are shown in the figure
below.
b) What is the optimal delay?
Solution:
Every stage has the same fanout and therefore the same delay: $t_{d,1} = t_{inv}(\gamma + f)$.
The total delay is therefore:
$$t_d = 4t_{inv}(\gamma + f) = 4(0.8 + 2.83)\,t_{inv} \approx 14.5\,t_{inv}.$$
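As a quick numerical check of parts a) and b), here is a minimal sketch that sizes an N-stage inverter chain for equal fanout and sums the stage delays. It assumes the lecture delay model $t_p = t_{inv}(\gamma + f)$ with γ = 0.8 and expresses capacitances in units of Cin; the function names are illustrative only.

```python
GAMMA = 0.8  # intrinsic-to-gate capacitance ratio used in this problem


def size_chain(c_in, c_load, n_stages):
    """Return the per-stage fanout f and the input capacitance of every stage."""
    F = c_load / c_in               # total fanout of the chain
    f = F ** (1.0 / n_stages)       # equal fanout per stage minimizes delay
    sizes = [c_in * f ** i for i in range(n_stages)]
    return f, sizes


def chain_delay(f, n_stages, gamma=GAMMA):
    """Total delay in units of t_inv when every stage has fanout f."""
    return n_stages * (gamma + f)


f, sizes = size_chain(c_in=1.0, c_load=64.0, n_stages=4)
print(f"f = {f:.2f}")
print("stage sizes (x Cin):", [round(c, 2) for c in sizes])
print(f"total delay = {chain_delay(f, 4):.1f} t_inv")
```

With these inputs it prints f = 2.83, stage sizes of about 1, 2.83, 8 and 22.63 Cin, and a total delay of about 14.5 t_inv.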
c) Now add an additional load of 500*Cin after the 3rd inverter in the chain. With the
same sizing as in part a), now what is the delay of the chain?
Solution:
With the sizing from part a), the fanout of the third inverter in the chain will
change due to the added capacitive load. The delay of this inverter is now:
$$t_{d,3} = t_{inv}(\gamma + f_3) = t_{inv}\left(\gamma + \frac{500 + 22.7}{8}\right) = t_{inv}(\gamma + 65.3).$$
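The other three stages keep their original fanout, so the delay of the whole chain in part c) is the three unchanged stage delays plus $t_{d,3}$. The sketch below evaluates this under the same $t_p = t_{inv}(\gamma + f)$ model, with capacitances in units of Cin; it uses the exact f = 2√2 rather than the rounded stage sizes, so its numbers can differ slightly from a hand calculation.

```python
GAMMA = 0.8
N_STAGES = 4
C_LOAD = 64.0        # final load, in units of Cin
C_EXTRA = 500.0      # extra load added after the 3rd inverter

f = C_LOAD ** (1.0 / N_STAGES)                     # sizing from part a)
sizes = [f ** i for i in range(N_STAGES)]          # input cap of each inverter

# Capacitance driven by each stage: the next stage's input (plus the extra
# load on the 3rd inverter's output), and C_LOAD for the last stage.
loads = sizes[1:] + [C_LOAD]
loads[2] += C_EXTRA

stage_delays = [GAMMA + load / c for c, load in zip(sizes, loads)]
print("stage delays (t_inv):", [round(d, 2) for d in stage_delays])
print(f"total delay = {sum(stage_delays):.1f} t_inv")
```

Under these assumptions the total comes out at roughly 77 t_inv, of which about 66 t_inv is the third stage.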
d) [BONUS] How could you modify the design of the chain (i.e., change sizes, add or
remove stages, etc.) to improve the delay of the circuit from part c)?
Solution:
The large fixed load from this capacitor is now going to be the dominant factor
for the overall delay of the chain. This means that we should probably treat the
chain of the first 3 inverters as a new sizing problem, still with the same Cin for
the first stage (and leaving the last inverter the same size), but with the final load
equal to 522.7Cin. (We'll see next week that this heuristic will indeed get us
close to the true optimal results.)
Assuming we stick with just 3 inverters for this new chain, the new overall fanout
will be:
$$F = \frac{522.7C_{in}}{C_{in}} = 522.7.$$
So, the new f we should be targeting for the first 3 stages is:
$$f = \sqrt[3]{522.7} \approx 8.06,$$
which is drastically smaller than the fanout of 65.3 seen by the third inverter in part c). In
fact, we can do even better than this if we allow ourselves to change the number
of stages before the fixed capacitive load. In this case, the optimal number of
stages is $\log_4(522.7) \approx 4.51$; rounding to 4 (5) stages gives an f of 4.78
(3.5) and a delay of ~25.95 tinv (25.11 tinv).
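The stage-count trade-off above can be tabulated with a few lines of code. This minimal sketch assumes, as in the text, that the n resized stages drive the 522.7Cin node with equal fanout, that the last inverter keeps its part a) size (so its fanout stays 2√2), and the same $t_p = t_{inv}(\gamma + f)$ delay model.

```python
import math

GAMMA = 0.8
C_NODE = 522.7                   # 500 Cin fixed load + input of the last inverter
LAST_STAGE = GAMMA + 64 ** 0.25  # last inverter unchanged: fanout 2*sqrt(2)


def delay_with_n_stages(n):
    """Return (f, delay in t_inv) with n equal-fanout stages before the fixed load."""
    f = C_NODE ** (1.0 / n)
    return f, n * (GAMMA + f) + LAST_STAGE


print(f"log4(F) = {math.log(C_NODE, 4):.2f}")
for n in (3, 4, 5):
    f, d = delay_with_n_stages(n)
    print(f"{n} stages: f = {f:.2f}, delay = {d:.2f} t_inv")
```

The 4- and 5-stage rows reproduce the ~25.95 t_inv and ~25.11 t_inv quoted above, and the 3-stage row shows how much is lost by keeping only three stages.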
For this problem assume that γ = 0.8 and that tinv is the unit delay of an inverter as defined in
lecture (i.e., $t_p = t_{inv}(\gamma + f)$). Express all the delay values in terms of tinv.
a) Implement the logic function given by the expression shown below as a complex
CMOS gate followed by an inverter. Assuming the complex gate is sized for
equal rise and fall delays, what is its LE?
Out= AB+CD
Solution:
The LE of the complex gate is:
$$LE = \frac{C_{in,gate}}{C_{in,inv}} = \frac{6}{3} = 2.$$
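The LE above follows from comparing transistor widths. The sketch below assumes an AOI22-style implementation of the complex gate (two-transistor NMOS stacks of width 2 and two-transistor PMOS stacks of width 4, relative to a unit inverter with NMOS width 1 and PMOS width 2); these widths are inferred from the 6/3 ratio rather than read from the figure, and `logical_effort` is an illustrative helper.

```python
def logical_effort(nmos_width, pmos_width, inv_nmos=1.0, inv_pmos=2.0):
    """LE = input capacitance of one gate input / input capacitance of a unit inverter.

    Gate capacitance is proportional to width, so widths stand in for capacitances.
    """
    return (nmos_width + pmos_width) / (inv_nmos + inv_pmos)


# Each input of the complex gate drives one NMOS in a 2-high stack (width 2)
# and one PMOS in a 2-high stack (width 4), matching a unit inverter's drive.
le_gate = logical_effort(nmos_width=2.0, pmos_width=4.0)
print(f"LE of complex gate = {le_gate}")

# Same idea for the simple gates used later in this solution.
print(f"LE_NAND2 = {logical_effort(2.0, 2.0):.3f}, LE_NOR2 = {logical_effort(1.0, 4.0):.3f}")
```

Applied to a NAND2 (widths 2 and 2) or a NOR2 (widths 1 and 4), the same ratio gives 4/3 and 5/3, the values used in parts b) and d).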
b) Implement the same logic function using only NAND2, NOR2, and inverters.
Solution:
c) Add a load capacitance of CL=128*Cin at the output of your circuit from part a).
Now size the circuit to minimize the delay. What is the minimum delay? Note
that you can assume that it is always the input closest to the output node that
switches i.e., you do not need to include any of the parasitics on the
intermediate nodes of the complex gate when calculating delay.
Solution:
In part a) we found the LE of the complex CMOS gate to be equal to 2. Since the
fanout is F=128, we can calculate the total path effort to be:
$$PE = LE\cdot F = 2\cdot 128 = 256.$$
Our circuit has only two stages, and the optimal sizing to minimize the delay is to
distribute EF evenly on both stages. To size the chain, first we need to find the
EF/stage:
$$EF/\text{stage} = \sqrt{256} = 16.$$
Now that we know the EF/stage, we can size the inverter:
$$C_2 = \frac{C_L}{EF/\text{stage}} = \frac{128C_{in}}{16} = 8C_{in}.$$
It's always good to double-check the size of the first gate to make sure that
we get Cin with the EF/stage we calculated (in case we made a mistake
somewhere):
$$C_1 = LE_1\frac{C_2}{EF/\text{stage}} = 2\cdot\frac{8C_{in}}{16} = C_{in}.$$
To find the overall delay, we need to know the parasitic delay of the complex
gate, which, since we've sized the gate for equal drive strength as a unit inverter,
we can find just by comparing the diffusion capacitance at the output:
$$p_{gate} = \frac{C_{par,gate}}{C_{in,inv}} = \frac{(4+4+2+2)C_d}{(2+1)C_g} = 4.$$
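Putting the pieces together, here is a minimal sketch that sizes a path for equal effort per stage and evaluates its delay. It assumes the single-gate delay model $t_d = t_{inv}(p\gamma + LE\cdot f)$, the natural extension of the inverter formula used in this problem, with p taken relative to a unit inverter (p = 4 for the complex gate, 1 for the inverter); `optimize_path` is an illustrative helper and needs Python 3.8+ for `math.prod`.

```python
import math

GAMMA = 0.8


def optimize_path(les, ps, fanout, gamma=GAMMA):
    """Size a path for minimum delay by giving every stage the same effort.

    les: logical effort of each stage, listed from input to output.
    ps: parasitic delay of each stage, relative to a unit inverter.
    fanout: C_L / C_in for the whole path.
    Returns (EF per stage, input cap of each stage in Cin, delay in t_inv).
    """
    n = len(les)
    pe = math.prod(les) * fanout          # path effort PE = LE * F
    ef = pe ** (1.0 / n)                  # equal effort per stage
    caps = []
    c_next = fanout                       # capacitance driven by the last stage
    for le in reversed(les):              # size backwards from the load
        c = le * c_next / ef              # C_i = LE_i * C_{i+1} / EF
        caps.append(c)
        c_next = c
    caps.reverse()
    delay = sum(gamma * p + ef for p in ps)
    return ef, caps, delay


ef, caps, delay = optimize_path(les=[2.0, 1.0], ps=[4.0, 1.0], fanout=128.0)
print(f"EF/stage = {ef:.1f}, sizes = {[round(c, 2) for c in caps]}, delay = {delay:.1f} t_inv")
```

The first size coming back as 1.0 is the same sanity check as in the text; under the assumed model the delay is (3.2 + 16) + (0.8 + 16) = 36 t_inv.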
d) Add a load capacitance CL=128*Cin at the output of your circuit from part b) and
size the circuit to minimize the delay. Note that Cin in parts a) and b) must be
equal. Now what is the minimum delay?
Hint: You might want to change the implementation from part b) so that the
circuit has a total number of stages that is close to the optimal one. Also, pay
attention to which gates are better drivers and try to rearrange the circuit so that
they are placed closer to the load.
Solution:
For the implementation presented in part b), the total LE of the chain is:
$$LE = LE_1 LE_2 LE_3 LE_4 = \frac{4}{3}\cdot 1\cdot\frac{5}{3}\cdot 1 = \frac{20}{9},$$
$$PE = LE\cdot F = \frac{20}{9}\cdot 128 \approx 284.44.$$
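Settling on four stages (since $\log_4 284.44 \approx 4.1$) and given our PE, we can now find the EF/stage, $EF = \sqrt[4]{284.44} \approx 4.1$, and size the chain backwards from the load:
$$C_4 = LE_4\frac{C_L}{EF} = 1\cdot\frac{128C_{in}}{4.1} \approx 31.2C_{in}$$
$$C_3 = LE_3\frac{C_4}{EF} = \frac{5}{3}\cdot\frac{31.2C_{in}}{4.1} \approx 12.7C_{in}$$
$$C_2 = LE_2\frac{C_3}{EF} = 1\cdot\frac{12.7C_{in}}{4.1} \approx 3.1C_{in}$$
$$C_1 = LE_1\frac{C_2}{EF} = \frac{4}{3}\cdot\frac{3.1C_{in}}{4.1} \approx C_{in} \quad \text{(checking the first stage)}$$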
We also need to find pNAND2 and pNOR2 to plug into the delay equation:
$$p_{NAND2} = \frac{C_{par,NAND2}}{C_{in,inv}} = \frac{(2+2+2)C_d}{(2+1)C_g} = 2,$$
$$p_{NOR2} = \frac{C_{par,NOR2}}{C_{in,inv}} = \frac{(4+1+1)C_d}{(2+1)C_g} = 2.$$
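For a numerical check of the sizing above, here is a minimal sketch that repeats the equal-effort sizing with unrounded numbers and also evaluates the path delay. It assumes the stage delay $t_d = t_{inv}(p\gamma + LE\cdot f)$ with γ = 0.8, and a NAND2 → INV → NOR2 → INV stage order inferred from the LE values in the sizing equations; `optimize_path` is an illustrative helper, not from the course material.

```python
import math

GAMMA = 0.8


def optimize_path(les, ps, fanout, gamma=GAMMA):
    """Equal-effort sizing: returns (EF, stage input caps in Cin, delay in t_inv)."""
    ef = (math.prod(les) * fanout) ** (1.0 / len(les))
    caps, c_next = [], fanout
    for le in reversed(les):          # size backwards: C_i = LE_i * C_{i+1} / EF
        c_next = le * c_next / ef
        caps.insert(0, c_next)
    return ef, caps, sum(gamma * p + ef for p in ps)


# Part b) circuit as sized above: NAND2 -> INV -> NOR2 -> INV (assumed stage order,
# read off the LE values used in the sizing equations).
les = [4 / 3, 1.0, 5 / 3, 1.0]
ps = [2.0, 1.0, 2.0, 1.0]             # p_NAND2 = p_NOR2 = 2, p_INV = 1
ef, caps, delay = optimize_path(les, ps, fanout=128.0)
print(f"EF = {ef:.2f}")
print("sizes (x Cin):", [round(c, 1) for c in caps])
print(f"delay = {delay:.1f} t_inv")
```

Because EF is not rounded to 4.1 here (it is about 4.107), the stage sizes come out close to, but not exactly at, the hand-calculated values; the delay is roughly 21.2 t_inv under the assumed model.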
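For the improved implementation, which uses only NAND2 gates and inverters (NAND2, NAND2, INV, INV from input to output), the total LE drops to (4/3)(4/3) = 16/9, so:
$$PE = LE\cdot F = \frac{16}{9}\cdot 128 \approx 227.56,$$
and with four stages $EF = \sqrt[4]{227.56} \approx 3.89$:
$$C_4 = LE_4\frac{C_L}{EF} = 1\cdot\frac{128C_{in}}{3.89} \approx 32.9C_{in}$$
$$C_3 = LE_3\frac{C_4}{EF} = 1\cdot\frac{32.9C_{in}}{3.89} \approx 8.5C_{in}$$
$$C_2 = LE_2\frac{C_3}{EF} = \frac{4}{3}\cdot\frac{8.5C_{in}}{3.89} \approx 2.9C_{in}$$
$$C_1 = LE_1\frac{C_2}{EF} = \frac{4}{3}\cdot\frac{2.9C_{in}}{3.89} \approx C_{in} \quad \text{(checking the first stage)}$$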
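The same kind of check works for the NAND2-only version. This minimal sketch assumes the delay model $t_d = t_{inv}(p\gamma + LE\cdot f)$ with γ = 0.8, p = 2 for NAND2 and p = 1 for the inverter, and the NAND2 → NAND2 → INV → INV order implied by the LE values in the sizing; `optimize_path` is an illustrative helper.

```python
import math

GAMMA = 0.8


def optimize_path(les, ps, fanout, gamma=GAMMA):
    """Equal-effort sizing: returns (EF, stage input caps in Cin, delay in t_inv)."""
    ef = (math.prod(les) * fanout) ** (1.0 / len(les))
    caps, c_next = [], fanout
    for le in reversed(les):          # size backwards: C_i = LE_i * C_{i+1} / EF
        c_next = le * c_next / ef
        caps.insert(0, c_next)
    return ef, caps, sum(gamma * p + ef for p in ps)


# Improved circuit: NAND2 -> NAND2 -> INV -> INV (assumed from the LE values above)
les = [4 / 3, 4 / 3, 1.0, 1.0]
ps = [2.0, 2.0, 1.0, 1.0]
ef, caps, delay = optimize_path(les, ps, fanout=128.0)
print(f"EF = {ef:.3f}")
print("sizes (x Cin):", [round(c, 1) for c in caps])
print(f"delay = {delay:.1f} t_inv")
```

The unrounded EF is about 3.884, so the sizes round to about 1, 2.9, 8.5 and 33Cin, and the delay is roughly 20.3 t_inv under the assumed model, consistent with this version also being the faster of the two four-stage circuits.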
Solution:
Since gate capacitance is directly proportional to transistor width, the easiest way
to compare these two implementations is to sum the input capacitances of all
the gates in the circuit. Note that for multi-input gates we have to include the input
capacitance of every input.
For the circuit from part c) we get that the total amount of gate capacitance is:
On the other hand, for the circuit from part d) with the original implementation
(the one that uses a NOR2), the total gate capacitance is:
With the improved implementation (that uses only NAND2 and INV), the total
gate capacitance is:
So, as promised, the second implementation of part d) indeed has both lower area
and lower delay than the first implementation. However, the circuit in part d) is
~4 times larger than the one in part c), although it does buy us a substantially lower
delay (by almost a factor of two).
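The area comparison amounts to summing, over every gate, (number of inputs) × (input capacitance per input), using the rounded sizes from parts c) and d). This sketch assumes the gate-level structures implied by the sizing: in part d) the first stage consists of two NAND2 gates (one for A·B and one for C·D), and in the NOR2-based version the second stage consists of two inverters. If the figures show a different arrangement, the lists would need adjusting.

```python
def total_gate_cap(stages):
    """Sum of input capacitance (units of Cin) over all gates.

    Each entry is (number of gates, inputs per gate, capacitance per input).
    """
    return sum(n_gates * n_inputs * c for n_gates, n_inputs, c in stages)


part_c = [(1, 4, 1.0), (1, 1, 8.0)]                                  # complex gate + INV
part_d_nor = [(2, 2, 1.0), (2, 1, 3.1), (1, 2, 12.7), (1, 1, 31.2)]  # NAND2, INV, NOR2, INV
part_d_nand = [(2, 2, 1.0), (1, 2, 2.9), (1, 1, 8.5), (1, 1, 32.9)]  # NAND2, NAND2, INV, INV

for name, stages in [("c", part_c), ("d, NOR2", part_d_nor), ("d, NAND2", part_d_nand)]:
    print(f"part {name}: {total_gate_cap(stages):.1f} Cin")
```

Under these assumptions the totals are about 12Cin, 66.8Cin and 51.2Cin, so the NAND2-only circuit is smaller than the NOR2 version and roughly 4 times the size of the part c) circuit.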
Solution:
If the final load is reduced to 2Cin, the PE for part c) is now:
$$PE = LE\cdot F = 2\cdot 2 = 4,$$
while for the improved circuit from part d) it is:
$$PE = LE\cdot F = \frac{16}{9}\cdot 2 \approx 3.56.$$
In both cases it turns out that the optimal number of stages is 1. However, since
the logic function cannot be implemented in one stage (remember, using static
CMOS every stage has to invert), the minimum number of stages in terms of
functionality is 2, which matches the circuit from part c). Specifically, the
EF/stage of the circuit in c) would be 2, while the EF/stage for the chain in d)
would be ~1.37. The conclusion is that for such a small load (2Cin), the
implementation from part c) would not only be faster, it would also be quite a bit
smaller.
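A minimal sketch of this small-load comparison, assuming the same delay model $t_d = t_{inv}(p\gamma + LE\cdot f)$ with γ = 0.8 and the p values computed earlier (4 for the complex gate, 2 for NAND2, 1 for the inverter); `path_delay` is an illustrative helper.

```python
import math

GAMMA = 0.8


def path_delay(les, ps, fanout, gamma=GAMMA):
    """Return (PE, optimal stage count log4(PE), EF per stage, delay in t_inv)."""
    pe = math.prod(les) * fanout
    ef = pe ** (1.0 / len(les))
    return pe, math.log(pe, 4), ef, sum(gamma * p + ef for p in ps)


for name, les, ps in [
    ("part c)", [2.0, 1.0], [4.0, 1.0]),                          # complex gate + INV
    ("part d)", [4 / 3, 4 / 3, 1.0, 1.0], [2.0, 2.0, 1.0, 1.0]),  # NAND2-NAND2-INV-INV
]:
    pe, n_opt, ef, d = path_delay(les, ps, fanout=2.0)
    print(f"{name}: PE = {pe:.2f}, log4(PE) = {n_opt:.2f}, EF = {ef:.2f}, delay = {d:.1f} t_inv")
```

Under the assumed model the two-stage circuit comes out at about 8.0 t_inv versus about 10.3 t_inv for the four-stage one, in line with the conclusion above.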
In this problem you will use the switch model to find the delay of NAND2 and NOR2
gates and estimate what the LEs of these gates are. Assume that in our technology the gate
capacitance is CG = 2 fF/µm and the junction capacitance is CD = 1.6 fF/µm. Also assume
that Rp = 20 kΩ/□ and Rn = 10 kΩ/□.
a) Using the switch model, calculate the delay of a NAND2 driving 4 identical copies of
itself when the bottom input (i.e., the transistor that is closest to ground)
switches. You should calculate the delay both for the rising and the falling edge.
Solution:
For the NAND2 gate shown in the figure above, in order to calculate tpHL we
should assume that the B input switches while A = 1. The switch model for this
situation is presented in the figure on the next page.
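Using Elmore delay, the time constant for this circuit is:
$$\tau_{HL} = R_N C_{int} + 2R_N(C_{par} + C_L),$$
where, assuming that each transistor is sized to be 2 µm wide (note that any choice
of size will result in the same final delay), the capacitors are:
$$C_{par} = 2C_D + 2C_D + 2C_D = 9.6\,\text{fF}, \qquad C_L = 4C_{in} = 4\cdot 4C_G = 32\,\text{fF},$$
and
$$R_N = R_n\frac{L}{W} = 10\,\text{k}\Omega\cdot\frac{0.09\,\mu\text{m}}{2\,\mu\text{m}} = 450\,\Omega.$$
For the rising edge:
$$\tau_{LH} = R_P(C_L + C_{par} + C_{int}),$$
where $R_P = R_p\frac{L}{W} = 20\,\text{k}\Omega\cdot\frac{0.09\,\mu\text{m}}{2\,\mu\text{m}} = 900\,\Omega$, and therefore: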
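The resistances and capacitances above are just sheet resistance × L/W and per-µm capacitance × width. A minimal sketch, assuming L = 0.09 µm as in the calculation and the 2 µm transistor widths chosen above; the names are illustrative.

```python
L_UM = 0.09        # channel length (µm), as used in the solution
RSQ_N = 10e3       # NMOS resistance, ohms per square
RSQ_P = 20e3       # PMOS resistance, ohms per square
CG = 2.0           # gate capacitance, fF per µm of width
CD = 1.6           # junction capacitance, fF per µm of width


def r_on(r_sq, w_um, l_um=L_UM):
    """Switch-model on-resistance: R = R_sq * L / W."""
    return r_sq * l_um / w_um


# NAND2 with every transistor 2 µm wide
r_n = r_on(RSQ_N, 2.0)              # pull-down transistor
r_p = r_on(RSQ_P, 2.0)              # pull-up transistor
c_par = (2.0 + 2.0 + 2.0) * CD      # output-node diffusion: 2 PMOS + 1 NMOS
c_in = (2.0 + 2.0) * CG             # one NAND2 input: 1 NMOS + 1 PMOS gate
c_l = 4 * c_in                      # fanout of 4 identical NAND2 gates
print(f"R_N = {r_n:.0f} ohm, R_P = {r_p:.0f} ohm, C_par = {c_par:.1f} fF, C_L = {c_l:.0f} fF")
```

This reproduces the 450 Ω, 900 Ω, 9.6 fF and 32 fF used above.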
b) Repeat part a) for a NOR2 gate when the input that controls the top-most PMOS
(i.e., the one closest to Vdd) switches.
Solution:
Similar to what we had in part a) of this problem, for the NOR2 gate shown in the
figure above, we should assume that B transitions while A = 0. The switch model
for this situation is presented in the figure below.
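where $R_P = R_p\frac{L}{W} = 20\,\text{k}\Omega\cdot\frac{0.09\,\mu\text{m}}{4\,\mu\text{m}} = 450\,\Omega$, and the capacitors are:
$$C_{par} = 4C_D + 1C_D + 1C_D = 9.6\,\text{fF}, \qquad C_L = 4C_{in} = 4\cdot 5C_G = 40\,\text{fF},$$
$$t_{d,LH} = \ln 2\cdot\tau_{LH} = \ln 2\cdot[450\,\Omega\cdot 20.8\,\text{fF} + 900\,\Omega\cdot(9.6\,\text{fF} + 40\,\text{fF})] = 37.4\,\text{ps}.$$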
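The Elmore expression above is a sum over the nodes of a series RC ladder, each node capacitance weighted by the total resistance between it and the supply rail. This minimal sketch reproduces the $t_{d,LH}$ number for the NOR2, reusing the values from this calculation (including the 20.8 fF internal-node capacitance as written there) and taking the 50% delay as ln 2 · τ; `elmore_ladder` is an illustrative helper.

```python
import math


def elmore_ladder(resistances, caps):
    """Elmore time constant of a series RC ladder driven from one end.

    resistances[i] is the switch between node i-1 (node -1 = supply rail) and
    node i; caps[i] is the capacitance on node i. Each capacitance is weighted
    by the total resistance from the rail to its node.
    """
    tau, r_total = 0.0, 0.0
    for r, c in zip(resistances, caps):
        r_total += r
        tau += r_total * c
    return tau


# NOR2 rising output, top PMOS switching: Vdd -> R_P -> internal node -> R_P -> output
r_p = 450.0                         # ohms, 4 µm PMOS
c_int = 20.8e-15                    # internal node (value used in the solution)
c_out = 9.6e-15 + 40e-15            # C_par + C_L at the output
tau_lh = elmore_ladder([r_p, r_p], [c_int, c_out])
t_d_lh = math.log(2) * tau_lh
print(f"t_d,LH = {t_d_lh * 1e12:.1f} ps")
```

The same helper applies to the NAND2 pull-down by listing the bottom NMOS first and the top NMOS second.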
Now we calculate tpHL, for which the switch model is shown below:
where $R_N = R_n\frac{L}{W} = 10\,\text{k}\Omega\cdot\frac{0.09\,\mu\text{m}}{1\,\mu\text{m}} = 900\,\Omega$, and therefore:
c) Using HSPICE and the graphical method described in class, extract the LEs of the
NAND2 and NOR2 gates. Assume that for NAND2 it's always the bottom input
(the one closest to ground) and for NOR2 it's always the topmost input (the one
closest to Vdd) that switches. How do these extracted values for LE compare
with predictions from the simple switch model?
Note that to measure the delay and extract LE, you should use the circuit from
HW#2 and HW#3 with the chain of 4 NAND2 (4 NOR2) gates each having the
same fanout. The example for a chain of NAND2 gates each having a fanout of 4
is shown on figure below (A is the bottom input).
Solution:
To make the chain of NOR2 gates, we use the same idea as for the chain of NAND2
gates: we want to propagate the input signal all the way to the output, so we
connect all other inputs to ground:
To find the LE for NAND2 and NOR2, we size the gates in the chain such that
every one has the same fanout. Then, using the formula for the delay of a single gate,
$$t_d = t_{inv}(p\gamma + LE\cdot f),$$
and measuring the delay for different values of fanout f, we can find the LE of the
gate. To simulate this in HSPICE, we sweep the value of f and get plots of delay
vs. fanout for NAND2 and NOR2 gates (as well as for inverters). The HSPICE
deck is given at the end of the solution to this problem.
From this diagram we can first calculate the tinv parameter by reading off the delays
for an inverter with fanouts of 0 and 5:
$$t_{inv} = \frac{t_{d,5} - t_{d,0}}{5} = 5.2\,\text{ps}.$$
Now we can calculate LE_NAND2 and LE_NOR2 by doing the same thing for the
NAND2 and NOR2 gates:
$$LE_{NAND2} = \frac{t_{d,5} - t_{d,0}}{5t_{inv}} = 1.15, \qquad LE_{NOR2} = \frac{t_{d,5} - t_{d,0}}{5t_{inv}} = 1.46.$$
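The graphical extraction amounts to measuring slopes of delay versus fanout: assuming $t_d = t_{inv}(p\gamma + LE\cdot f)$, the inverter's slope is $t_{inv}$ and a gate's slope is $LE\cdot t_{inv}$. This minimal sketch fits each slope by least squares over the whole sweep rather than using only the f = 0 and f = 5 points. The sample data are synthetic, generated from that model with the values extracted above (and p = 2 for NAND2), at fanouts matching the deck's `sweep fanout 0.001 5.001 1`; they are not HSPICE results.

```python
def slope(fanouts, delays):
    """Least-squares slope of delay versus fanout."""
    n = len(fanouts)
    mean_f = sum(fanouts) / n
    mean_d = sum(delays) / n
    num = sum((f - mean_f) * (d - mean_d) for f, d in zip(fanouts, delays))
    den = sum((f - mean_f) ** 2 for f in fanouts)
    return num / den


def extract_le(fanouts, gate_delays, inv_delays):
    """LE = (gate slope) / (inverter slope); the inverter slope is t_inv."""
    return slope(fanouts, gate_delays) / slope(fanouts, inv_delays)


# Synthetic delays (ps) from t_d = t_inv * (p*gamma + LE*f) with t_inv = 5.2 ps,
# gamma = 0.8, p = 1 for the inverter and p = 2, LE = 1.15 for the NAND2
# (illustrative values only, not simulation output).
fanouts = [0, 1, 2, 3, 4, 5]
inv = [5.2 * (0.8 + f) for f in fanouts]
nand2 = [5.2 * (2 * 0.8 + 1.15 * f) for f in fanouts]
print(f"t_inv = {slope(fanouts, inv):.2f} ps, LE_NAND2 = {extract_le(fanouts, nand2, inv):.2f}")
```

With real measurements, the measured `tpavg` and `tpavg2` values at each swept fanout would replace the synthetic lists; fitting all points makes the estimate less sensitive to any single noisy measurement.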
These values are smaller than what was expected based on our calculation using
the switch model (LE_NAND2 = 4/3 and LE_NOR2 = 5/3). As we'll see shortly, the
main cause of this is that in modern technologies a stack of two transistors doesn't
actually have twice the resistance of a single transistor.
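* HSPICE deck: delay vs. fanout for chains of NAND2, NOR2 and INV gates
* (HSPICE treats the first line as the title, so .PARAM must not be on line 1.)
* Subcircuits nand2, nor2 and inv and the device models are assumed to be
* defined elsewhere (e.g., via .include); they are not shown here.
.PARAM vddval=1.2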
.PARAM fanout=1
* Voltage Sources
V1 vdd 0 'vddval'
V2 vin1 0 PWL 0 0V 10p 0V 11p 'vddval' 1011p 'vddval' 1012p 0V
* NAND2 chain
Xnand2_1 vdd 0 vin1 vdd out1 nand2 M=1
Xnand2_2 vdd 0 out1 vdd out2 nand2 M=fanout
Xnand2_3 vdd 0 out2 vdd out3 nand2 M='fanout*fanout'
Xnand2_4 vdd 0 out3 vdd out4 nand2 M='fanout*fanout*fanout'
* NOR2 chain
Xnor2_1 vdd 0 vin1 0 out11 nor2 M=1
Xnor2_2 vdd 0 out11 0 out21 nor2 M=fanout
Xnor2_3 vdd 0 out21 0 out31 nor2 M='fanout*fanout'
Xnor2_4 vdd 0 out31 0 out41 nor2 M='fanout*fanout*fanout'
* INV chain
Xinv_1 vdd 0 vin1 out12 inv M=1
Xinv_2 vdd 0 out12 out22 inv M=fanout
Xinv_3 vdd 0 out22 out32 inv M='fanout*fanout'
Xinv_4 vdd 0 out32 out42 inv M='fanout*fanout*fanout'
* options
.option post=2 nomod
.op
* analysis
.TRAN 0.1PS 1.5NS sweep fanout 0.001 5.001 1
*nand2 chain
.MEASURE TRAN tpHL TRIG V(out1) VAL='vddval/2' RISE=1 TARG V(out2) VAL='vddval/2' FALL=1
.MEASURE TRAN tpLH TRIG V(out1) VAL='vddval/2' FALL=1 TARG V(out2) VAL='vddval/2' RISE=1
.MEASURE TRAN tpavg PARAM='(tpHL+tpLH)/2'
*nor2 chain
.MEASURE TRAN tpHL1 TRIG V(out11) VAL='vddval/2' RISE=1 TARG V(out21) VAL='vddval/2' FALL=1
.MEASURE TRAN tpLH1 TRIG V(out11) VAL='vddval/2' FALL=1 TARG V(out21) VAL='vddval/2' RISE=1
.MEASURE TRAN tpavg1 PARAM='(tpHL1+tpLH1)/2'
*inverter chain
.MEASURE TRAN tpHL2 TRIG V(out12) VAL='vddval/2' RISE=1 TARG V(out22) VAL='vddval/2' FALL=1
.MEASURE TRAN tpLH2 TRIG V(out12) VAL='vddval/2' FALL=1 TARG V(out22) VAL='vddval/2' RISE=1
.MEASURE TRAN tpavg2 PARAM='(tpHL2+tpLH2)/2'
.END