VLSI Design - An Introduction

Introduction to VLSI Design
Roy Paily
roypaily@iitg.ernet.in
Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati, Guwahati- 781039, Assam, India
Plan of Talk
Introduction to digital integrated circuits
CMOS devices and scaling. CMOS logic gates and
their layout. Propagation delay, noise margins,
and power dissipation. Combinational (e.g.,
arithmetic) design
Course goals
Ability to design and implement CMOS digital
circuits and optimize them with respect to
different constraints: size (cost), speed and power
dissipation
Transistor Revolution
• Transistor – Bardeen (Bell Labs) in 1947
• Bipolar transistor – Shockley in 1949
• First bipolar digital logic gate – Harris
in 1956
• First monolithic IC – Jack Kilby in 1959
• First commercial IC logic gates –
Fairchild 1960
• TTL – 1962 into the 1990’s
• ECL – 1974 into the 1980’s
MOSFET Technology
• MOSFET transistor - Lilienfeld (Canada) in 1925
and Heil (England) in 1935
• CMOS – 1960’s, but plagued with manufacturing
problems
• PMOS in 1960’s (calculators)
• NMOS in 1970’s (4004, 8080) – for speed
• CMOS in 1980’s – preferred MOSFET technology
because of power benefits
• BiCMOS, Gallium-Arsenide, Silicon-Germanium
• SOI, Copper-Low K, …
Transistors on lead microprocessors
double every 2 years
[Figure: transistor count (millions of transistors, log scale) of lead microprocessors versus year, 1970 to 2010, from the 4004 through the 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6. Annotation: 2X growth in 1.96 years.]
Evolution in DRAM Chip Capacity
[Figure: Kbit capacity per chip (log scale) versus year, 1980 to 2010, with data points at 64, 256, 1,000, 4,000, 16,000, 64,000, 256,000, 1,000,000, 4,000,000, 16,000,000 and 64,000,000 Kbit.]
Die size grows by 14% to satisfy
Moore’s Law
[Figure: die size (mm, log scale) of lead microprocessors, 4004 through P6, versus year, 1970 to 2010. Annotations: ~7% growth per year, ~2X growth in 10 years.]
Lead microprocessors frequency
doubles every 2 years
[Figure: clock frequency (MHz, log scale) of lead microprocessors, 4004 through P6, versus year, 1970 to 2010. Annotation: 2X every 2 years.]
Courtesy, Intel
Lead Microprocessors power
continues to increase
[Figure: power (Watts, log scale) of lead microprocessors, 4004 through P6, versus year, 1971 to 2000.]
Power delivery and dissipation will be prohibitive

Courtesy, Intel
Power Density
[Figure: power density (W/cm2, log scale) of lead microprocessors, 4004 through P6, versus year, 1970 to 2010, with reference levels for a hot plate, a nuclear reactor and a rocket nozzle.]
Courtesy, Intel
Technology Directions: SIA Roadmap
Year                  1999   2002      2005   2008   2011   2014
Feature size (nm)     180    130       100    70     50     35
Mtrans/cm2            7      14-26     47     115    284    701
Chip size (mm2)       170    170-214   235    269    308    354
Signal pins/chip      768    1024      1024   1280   1408   1472
Clock rate (MHz)      600    800       1100   1400   1800   2200
Wiring levels         6-7    7-8       8-9    9      9-10   10
Power supply (V)      1.8    1.5       1.2    0.9    0.6    0.6
High-perf power (W)   90     130       160    170    174    183
Battery power (W)     1.4    2.0       2.4    2.0    2.2    2.4
http://www.itrs.net/ntrs/publntrs.nsf
Why Scaling?
• Technology shrinks by ~0.7 per generation

• With every generation can integrate 2x more
functions on a chip; chip cost does not increase
significantly
• Cost of a function decreases by 2x
• But …
– How to design chips with more and more functions?
– Design engineering population does not double every
two years…
Review: Design Abstraction Levels
[Figure: design abstraction levels, from highest to lowest: SYSTEM, MODULE, GATE, CIRCUIT (Vin, Vout), DEVICE (gate G, source S, drain D, n+ regions)]
Fundamental Design Metrics
• Functionality
• Cost
– NRE (fixed) costs - design effort
– RE (variable) costs - cost of parts, assembly, test
• Reliability, robustness
– Noise margins
– Noise immunity
• Performance
– Speed (delay)
– Power consumption; energy
• Time-to-market
Static Gate Behavior
• Steady-state parameters of a gate (its static behavior) tell how robust a circuit is with respect to both variations in the manufacturing process and noise disturbances.
• Digital circuits perform operations on Boolean variables
• A logical variable is associated with a nominal voltage level for each logic state
1 ⇔ VOH and 0 ⇔ VOL
[Figure: inverter with input V(x) and output V(y)]
VOH = !(VOL)
VOL = !(VOH)
• Difference between VOH and VOL is the logic or signal swing Vsw
DC Operation
Voltage Transfer Characteristics (VTC)
• Plot of output voltage as a function of the input voltage
[Figure: VTC of an inverter, V(y) versus V(x), marking VOH = f(VIL), VOL = f(VIH), VIL, VIH, and the switching threshold VM where the curve crosses the line V(y) = V(x)]

Mapping Logic Levels to the Voltage Domain
• The regions of acceptable high and low voltages are delimited by VIH and VIL, which represent the points on the VTC curve where the gain = -1
[Figure: VTC, V(y) versus V(x), with the two slope = -1 points defining VIL and VIH; inputs below VIL are a "0", inputs above VIH are a "1", and the range in between is the undefined region; output levels VOH and VOL]
Noise Margins
• For robust circuits, want the "0" and "1" intervals to be as large as possible
Noise Margin High: NMH = VOH - VIH
Noise Margin Low: NML = VIL - VOL
[Figure: gate output levels (VDD, VOH "1", VOL "0", Gnd) next to gate input levels (VDD, VIH, undefined region, VIL, Gnd)]
• Large noise margins are desirable, but not sufficient …

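The two noise-margin definitions can be checked numerically. The sketch below is only an illustration: the function name and the voltage levels are hypothetical and are not taken from the slides.

```python
def noise_margins(voh, vol, vih, vil):
    """Return (NMH, NML) in volts from the static levels of a gate."""
    nmh = voh - vih  # noise margin high: NMH = VOH - VIH
    nml = vil - vol  # noise margin low:  NML = VIL - VOL
    return nmh, nml


# Hypothetical levels for a 2.5 V gate (illustrative values, not from the slides).
nmh, nml = noise_margins(voh=2.5, vol=0.0, vih=1.4, vil=1.0)
print(f"NMH = {nmh:.2f} V, NML = {nml:.2f} V")
```

A gate with a symmetrical VTC would give NMH close to NML; unequal values indicate a switching threshold away from mid-swing.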
The Regenerative Property
• A gate with the regenerative property ensures that a disturbed signal converges back to a nominal voltage level
[Figure: chain of inverters with node voltages v0 … v6; simulated V (volts, -1 to 5) versus t (nsec, 0 to 10) for v0, v1 and v2]
Conditions for Regeneration
[Figure: chain of inverters with node voltages v0 … v6]
v1 = f(v0) ⇒ v1 = finv(v2)
[Figure: f(v) and finv(v) plotted together, tracing v0, v1, v2, v3, for a regenerative gate and for a nonregenerative gate]
• To be regenerative, the VTC must have a transient region with a gain greater than 1 (in absolute value) bordered by two valid zones where the gain is smaller than 1. Such a gate has two stable operating points.
Noise Immunity
• Noise margin expresses the ability of a circuit to overpower a noise source
– noise sources: supply noise, cross talk, interference, offset
• Absolute noise margin values are deceptive
– a floating node is more easily disturbed than a node driven by a low impedance (in terms of voltage)
• Noise immunity expresses the ability of the system to process and transmit information correctly in the presence of noise
• For good noise immunity, the signal swing (i.e., the difference between VOH and VOL) and the noise margin have to be large enough to overpower the impact of fixed sources of noise
Fan-In and Fan-Out
• Fan-out (N) – number of load gates connected to the output of the driving gate
– gates with large fan-out are slower
• Fan-in (M) – the number of inputs to the gate
– gates with large fan-in are bigger and slower
The Ideal Inverter
• The ideal gate should have
– infinite gain in the transition region
– a gate threshold located in the middle of the logic swing
– high and low noise margins equal to half the swing
– input and output impedances of infinity and zero, resp.
[Figure: ideal VTC, Vout versus Vin, with gain g = -∞ at the threshold]
Ri = ∞
Ro = 0
Fanout = ∞
NMH = NML = VDD/2
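Delay Definitions
[Figure: input waveform Vin and output waveform Vout of an inverter versus time t]
• Propagation delay: tpHL and tpLH are measured between the 50% points of the input and output waveforms
tp = (tpHL + tpLH)/2
• Signal slopes: the fall time tf and rise time tr are measured between the 10% and 90% points of the output waveform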
Modeling Propagation Delay
• Model circuit as first-order RC network
[Figure: resistor R between vin and vout, capacitor C from vout to ground]
vout(t) = (1 – e^(–t/τ)) V
where τ = RC
Time to reach 50% point is
t = ln(2) τ = 0.69 τ
Time to go from the 10% point to the 90% point is
t = ln(9) τ = 2.2 τ
• Matches the delay of an inverter gate

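The first-order model can be evaluated directly. The following sketch computes the 50% delay and the 10%-to-90% rise time of an RC step response; the R and C values are hypothetical, chosen only to give a round time constant.

```python
import math


def time_to_fraction(frac, tau):
    """Time for vout(t) = (1 - exp(-t/tau)) * V to reach frac * V."""
    return -tau * math.log(1.0 - frac)


# Hypothetical values: R = 10 kOhm, C = 10 fF, so tau = 100 ps.
tau = 10e3 * 10e-15
t50 = time_to_fraction(0.5, tau)                                     # ln(2) * tau
rise_10_90 = time_to_fraction(0.9, tau) - time_to_fraction(0.1, tau)  # ln(9) * tau

print(f"t50 = {t50 * 1e12:.1f} ps, 10%-90% rise = {rise_10_90 * 1e12:.1f} ps")
```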
Power and Energy Dissipation
• Power consumption: how much energy is consumed per operation and how much heat the circuit dissipates
– supply line sizing (determined by peak power)
Ppeak = Vdd ipeak
– battery lifetime (determined by average power dissipation)
p(t) = v(t) i(t) = Vdd i(t)
Pavg = (1/T) ∫ p(t) dt = (Vdd/T) ∫ idd(t) dt
– packaging and cooling requirements
• Two important components: static and dynamic
E (joules) = CL Vdd^2 P0→1 + tsc Vdd Ipeak P0→1 + Vdd Ileakage
f0→1 = P0→1 * fclock
P (watts) = CL Vdd^2 f0→1 + tsc Vdd Ipeak f0→1 + Vdd Ileakage
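As a rough illustration of how the three terms of the power equation compare, the sketch below evaluates each one separately. All numeric values and the function name are hypothetical; they are not data from the slides.

```python
def power_components(c_load, vdd, f_clock, p01, t_sc, i_peak, i_leak):
    """Return (dynamic, short-circuit, leakage) power in watts."""
    f01 = p01 * f_clock                   # 0->1 transitions per second
    p_dyn = c_load * vdd ** 2 * f01       # CL * Vdd^2 * f0->1
    p_sc = t_sc * vdd * i_peak * f01      # tsc * Vdd * Ipeak * f0->1
    p_leak = vdd * i_leak                 # Vdd * Ileakage
    return p_dyn, p_sc, p_leak


# Hypothetical gate: 10 fF load, 2.5 V supply, 500 MHz clock, P0->1 = 0.25,
# 50 ps short-circuit window, 100 uA peak current, 1 nA leakage.
p_dyn, p_sc, p_leak = power_components(
    c_load=10e-15, vdd=2.5, f_clock=500e6, p01=0.25,
    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9)
print(f"dynamic {p_dyn * 1e6:.2f} uW, short-circuit {p_sc * 1e6:.2f} uW, "
      f"leakage {p_leak * 1e9:.2f} nW")
```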
Power and Energy Dissipation
• Propagation delay and the power consumption of a gate are related
• Propagation delay is (mostly) determined by the speed at which a given amount of energy can be stored on the gate capacitors
– the faster the energy transfer (higher power dissipation) the faster the gate
• For a given technology and gate topology, the product of the power consumption and the propagation delay is a constant
– Power-delay product (PDP) – energy consumed by the gate per switching event
• An ideal gate is one that is fast and consumes little energy, so the ultimate quality metric is
– Energy-delay product (EDP) = power × delay^2
The MOS Transistor
[Figure: MOS transistor structure; labels: Polysilicon, Aluminum]
Switch Model of NMOS Transistor
[Figure: NMOS transistor as a switch between source (of carriers) and drain (of carriers), controlled by |VGS| at the gate]
• Open (off), Gate = '0': |VGS| < |VT|
• Closed (on), Gate = '1': |VGS| > |VT|, with on-resistance Ron

Switch Model of PMOS Transistor
[Figure: PMOS transistor as a switch between source (of carriers) and drain (of carriers), controlled by |VGS| at the gate]
• Open (off), Gate = '1': |VGS| > |VDD – |VT||
• Closed (on), Gate = '0': |VGS| < |VDD – |VT||, with on-resistance Ron

Voltage-Current Relation: Linear Mode
For long-channel devices (L > 0.25 micron)
• When VDS ≤ VGS – VT
ID = k'n (W/L) [(VGS – VT) VDS – VDS^2/2]
where
k'n = μn Cox = μn εox/tox is the process transconductance parameter (μn is the carrier mobility (m^2/V·sec))
kn = k'n W/L is the gain factor of the device
For small VDS, there is a linear dependence between VDS and ID, hence the name resistive or linear region
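Transistor in Saturation Mode
Assuming VGS > VT and VDS > VGS - VT
[Figure: NMOS cross-section (S, G, D, n+ regions) with the channel pinched off near the drain; the voltage across the induced channel is VGS - VT]
The current remains constant (saturates).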

Voltage-Current Relation: Saturation Mode
For long channel devices
• When VDS ≥ VGS – VT
ID' = (k'n/2) (W/L) (VGS – VT)^2
since the voltage difference over the induced channel (from the pinch-off point to the source) remains fixed at VGS – VT
• However, the effective length of the conductive channel is modulated by the applied VDS, so
ID = ID' (1 + λ VDS)
where λ is the channel-length modulation (varies with the inverse of the channel length)
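The linear-mode and saturation-mode relations combine into one piecewise long-channel model. The sketch below is a minimal version of it; the default k'n value is an assumption made for illustration (VT = 0.4 V and W/L = 1.5 follow the plot captions in these slides), and the function name is hypothetical.

```python
def nmos_drain_current(vgs, vds, vt=0.4, kp=115e-6, w_over_l=1.5, lam=0.0):
    """Long-channel NMOS drain current in amperes.

    kp is the process transconductance k'n in A/V^2 (assumed value),
    lam is the channel-length modulation parameter in 1/V.
    """
    if vgs <= vt:
        return 0.0                                    # cut-off
    v_ov = vgs - vt
    if vds <= v_ov:                                   # resistive (linear) region
        return kp * w_over_l * (v_ov * vds - vds ** 2 / 2)
    return 0.5 * kp * w_over_l * v_ov ** 2 * (1 + lam * vds)  # saturation


for vgs in (1.0, 1.5, 2.0, 2.5):
    print(f"VGS = {vgs} V: ID(VDS = 2.5 V) = {nmos_drain_current(vgs, 2.5) * 1e6:.1f} uA")
```

With lam = 0 the two expressions meet at VDS = VGS - VT; a nonzero lam introduces a small step at that boundary, which is a known limitation of this simple form.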
Current Determinants
• For a fixed VDS and VGS (> VT), IDS is a function of
– the distance between the source and drain – L
– the channel width – W
– the threshold voltage – VT
– the thickness of the SiO2 – tox
– the dielectric of the gate insulator (SiO2) – εox
– the carrier mobility
• for NMOS: μn = 500 cm^2/V-sec
• for PMOS: μp = 180 cm^2/V-sec
Long Channel I-V Plot (NMOS)
[Figure: ID (A, ×10^-4, 0 to 6) versus VDS (V, 0 to 2.5) for VGS = 1.0, 1.5, 2.0 and 2.5 V; the curve VDS = VGS - VT separates the linear and saturation regions; cut-off along the VDS axis]
NMOS transistor, 0.25um, Ld = 10um, W/L = 1.5, VDD = 2.5V, VT = 0.4V
Short Channel Effects
• Behavior of short channel device mainly due to
– Velocity saturation – the velocity of the carriers saturates due to scattering (collisions suffered by the carriers)
[Figure: carrier velocity versus electric field ξ (V/µm), 0 to 3; the velocity saturates above the critical field ξc = 1.5 V/µm]
• For an NMOS device with L of 0.25 µm, only a couple of volts difference between D and S are needed to reach velocity saturation
Velocity Saturation Effects
• For short channel devices and large enough VGS – VT
– VDSAT < VGS – VT so the device enters saturation before VDS reaches VGS – VT and operates more often in saturation
– IDSAT has a linear dependence wrt VGS so a reduced amount of current is delivered for a given control voltage
Short Channel I-V Plot (PMOS)
• All polarities of all voltages and currents are reversed
[Figure: ID (A, ×10^-4, 0 to -1) versus VDS (V) for VGS = -1.0, -1.5, -2.0 and -2.5 V]
PMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = -0.4V

CMOS Inverter:
A First Look
[Figure: CMOS inverter between VDD and ground with input Vin, output Vout and load capacitance CL]
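CMOS Inverter:
Steady State Response
[Figure: switch models of the inverter. Vin = 0: the PMOS (Rp) connects the output to VDD, Vout = 1. Vin = VDD: the NMOS (Rn) connects the output to ground, Vout = 0.]
VOL = 0
VOH = VDD
VM = f(Rn, Rp)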
CMOS Properties
• Full rail-to-rail swing ⇒ high noise margins
– Logic levels not dependent upon the relative device sizes ⇒ transistors can be minimum size ⇒ ratioless
• Always a path to Vdd or GND in steady state ⇒ low output impedance (output resistance in kΩ range) ⇒ large fan-out (albeit with degraded performance)
• Extremely high input resistance (gate of MOS transistor is near perfect insulator) ⇒ nearly zero steady-state input current
• No direct steady-state path between power and ground ⇒ no static power dissipation
• Propagation delay is a function of load capacitance and resistance of transistors
Short Channel I-V Plot (NMOS)
[Figure: ID (A, ×10^-4, 0 to 2.5) versus VDS (V, 0 to 2.5) for VGS = 1.0, 1.5, 2.0 and 2.5 V]
NMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = 0.4V

Transforming PMOS I-V Lines
• Want common coordinate set Vin, Vout, and IDn
IDSp = -IDSn
VGSn = Vin ; VGSp = Vin - VDD
VDSn = Vout ; VDSp = Vout - VDD
[Figure: PMOS I-V curves (VGSp = -1 and VGSp = -2.5) mirrored around the x-axis (IDn = -IDp) and then shifted horizontally over VDD (Vin = VDD + VGSp, Vout = VDD + VDSp), giving curves labeled Vin = 0 and Vin = 1.5 in the IDn versus Vout plane]
CMOS Inverter Load Lines
[Figure: IDn (A, ×10^-4, 0 to 2.5) versus Vout (V, 0 to 2.5), showing the NMOS curves and the transformed PMOS load lines for Vin = 0, 0.5, 1.0, 1.5, 2.0 and 2.5 V]
0.25um, W/Ln = 1.5, W/Lp = 4.5, VDD = 2.5V, VTn = 0.4V, VTp = -0.4V
CMOS Inverter VTC
[Figure: Vout (V) versus Vin (V), both 0 to 2.5 V, with five operating regions as Vin increases:]
– NMOS off, PMOS res
– NMOS sat, PMOS res
– NMOS sat, PMOS sat
– NMOS res, PMOS sat
– NMOS res, PMOS off
Impact of Process Variation on VTC Curve
[Figure: Vout (V) versus Vin (V), both 0 to 2.5 V, for three cases: nominal; good PMOS with bad NMOS; bad PMOS with good NMOS]
Process variations (mostly) cause a shift in the switching threshold

CMOS Inverter:
Switch Model of Dynamic Behavior
[Figure: switch models with load CL. Vin = 0: CL charges to VDD through Rp. Vin = VDD: CL discharges to ground through Rn.]
• Gate response time is determined by the time to charge CL through Rp (discharge CL through Rn)
Inverter Propagation Delay
• Propagation delay is proportional to the time-constant of the network formed by the pull-down resistor and the load capacitance
[Figure: switch model for Vin = VDD; CL discharges through Rn so that Vout = 0]
tpHL = f(Rn, CL)
tpHL = ln(2) Reqn CL = 0.69 Reqn CL
tpLH = ln(2) Reqp CL = 0.69 Reqp CL
tp = (tpHL + tpLH)/2 = 0.69 CL (Reqn + Reqp)/2
• To equalize rise and fall times make the on-resistance of the NMOS and PMOS approximately equal.
Inverter Transient Response
VDD = 2.5V, 0.25 µm
W/Ln = 1.5
W/Lp = 4.5
Reqn = 13 kΩ (÷ 1.5)
Reqp = 31 kΩ (÷ 4.5)
[Figure: Vin and Vout (V, -0.5 to 3) versus t (sec, ×10^-10, 0 to 2.5) with tf, tr, tpHL and tpLH marked]
tpHL = 36 psec
tpLH = 29 psec
so tp = 32.5 psec
From simulation: tpHL = 39.9 psec and tpLH = 31.7 psec
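The hand-calculated delays can be reproduced from the 0.69 Req CL expressions. The slide does not state the load capacitance, so the CL value below is an assumption chosen to land close to the quoted 36 psec and 29 psec; the equivalent resistances and W/L ratios are the ones listed above.

```python
def inverter_delays(req_n, req_p, c_load):
    """Return (tpHL, tpLH, tp) in seconds from the 0.69 * Req * CL model."""
    tphl = 0.69 * req_n * c_load
    tplh = 0.69 * req_p * c_load
    return tphl, tplh, (tphl + tplh) / 2


req_n = 13e3 / 1.5   # 13 kOhm unit NMOS scaled by W/Ln = 1.5
req_p = 31e3 / 4.5   # 31 kOhm unit PMOS scaled by W/Lp = 4.5
c_load = 6.1e-15     # assumed load capacitance (not given on the slide)

tphl, tplh, tp = inverter_delays(req_n, req_p, c_load)
print(f"tpHL = {tphl * 1e12:.1f} ps, tpLH = {tplh * 1e12:.1f} ps, tp = {tp * 1e12:.1f} ps")
```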

Inverter Propagation Delay, Revisited
• To see how a designer can optimize the delay of a gate, have to expand the Req in the delay equation
tpHL = 0.69 Reqn CL
= 0.69 (3/4 (CL VDD)/IDSATn)
≈ 0.52 CL / (W/Ln k'n VDSATn)
[Figure: delay versus VDD (V), 0.8 to 2.4 V; vertical scale 1 to 5.5]
Design for Performance
• Reduce CL
– internal diffusion capacitance of the gate itself
• keep the drain diffusion as small as possible
– interconnect capacitance
– fanout
• Increase W/L ratio of the transistor
– the most powerful and effective performance
optimization tool in the hands of the designer
– watch out for self-loading! – when the intrinsic
capacitance dominates the extrinsic load
• Increase VDD
– can trade-off energy for performance
– increasing VDD above a certain level yields only very
minimal improvements
– reliability concerns enforce a firm upper bound on VDD
NMOS/PMOS Ratio
• So far have sized the PMOS and NMOS so that the Req's match (ratio of 3 to 3.5)
– symmetrical VTC
– equal high-to-low and low-to-high propagation delays
• If speed is the only concern, reduce the width of the PMOS device!
– widening the PMOS degrades the tpHL due to larger parasitic capacitance
β = (W/Lp)/(W/Ln)
r = Reqp/Reqn (resistance ratio of identically-sized PMOS and NMOS)
βopt = √r when wiring capacitance is negligible
PMOS/NMOS Ratio Effects
[Figure: tpLH, tpHL and tp (sec, ×10^-11, 3 to 5) versus β = (W/Lp)/(W/Ln), 1 to 5]
• β of 2.4 (= 31 kΩ/13 kΩ) gives symmetrical response
• β of 1.6 to 1.9 gives optimal performance
Device Sizing for Performance
• Divide capacitive load, CL, into
– Cint : intrinsic - diffusion and Miller effect
– Cext : extrinsic - wiring and fanout
tp = 0.69 Req Cint (1 + Cext/Cint) = tp0 (1 + Cext/Cint)
– where tp0 = 0.69 Req Cint is the intrinsic (unloaded) delay of the gate
• Widening both PMOS and NMOS by a factor S reduces Req by an identical factor (Req = Rref/S), but raises the intrinsic capacitance by the same factor (Cint = S Ciref)
tp = 0.69 Rref Ciref (1 + Cext/(S Ciref)) = tp0 (1 + Cext/(S Ciref))
– tp0 is independent of the sizing of the gate; with no load the drive of the gate is totally offset by the increased capacitance
– any S sufficiently larger than (Cext/Cint) yields the best performance gains with least area impact
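The diminishing return of upsizing follows directly from tp = tp0 (1 + Cext/(S Ciref)). The sketch below tabulates it for a few sizing factors; tp0, Ciref and Cext are hypothetical values picked for readability, not figures from the slides.

```python
def sized_inverter_delay(s, tp0, c_ext, c_iref):
    """Delay of an inverter widened by factor s, driving a fixed external load."""
    return tp0 * (1.0 + c_ext / (s * c_iref))


# Hypothetical reference gate: tp0 = 20 ps, Ciref = Cext = 3 fF.
tp0, c_iref, c_ext = 20e-12, 3e-15, 3e-15
delays = {s: sized_inverter_delay(s, tp0, c_ext, c_iref) for s in (1, 2, 5, 10, 15)}
for s, tp in delays.items():
    print(f"S = {s:2d}: tp = {tp * 1e12:.1f} ps")
```

However large S becomes, the delay only approaches tp0 and never drops below it, which is the self-loading limit.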
Sizing Impacts on Delay
[Figure: tp (sec, ×10^-11, 2 to 3.8) versus sizing factor S, 1 to 15, for a fixed load; the self-loading effect (intrinsic capacitance dominates) limits the gain at large S]
The majority of the improvement is already obtained for S = 5. Sizing factors larger than 10 barely yield any extra gain (and cost significantly more area).
CMOS Circuit Styles
• Static complementary CMOS - except during switching, output
connected to either VDD or GND via a low-resistance path
– high noise margins
• full rail to rail swing
• VOH and VOL are at VDD and GND, respectively
– low output impedance, high input impedance
– no steady state path between VDD and GND (no static
power consumption)
– delay a function of load capacitance and transistor
resistance
– comparable rise and fall times (under the appropriate
transistor sizing conditions)
• Dynamic CMOS - relies on temporary storage of signal values
on the capacitance of high-impedance circuit nodes
– simpler, faster gates
– increased sensitivity to noise
Static Complementary CMOS
• Pull-up network (PUN) and pull-down network (PDN)
[Figure: inputs In1, In2, … InN drive both a PUN between VDD and the output F(In1,In2,…InN) and a PDN between the output and GND]
– PUN: PMOS transistors only; pull-up: make a connection from VDD to F when F(In1,In2,…InN) = 1
– PDN: NMOS transistors only; pull-down: make a connection from F to GND when F(In1,In2,…InN) = 0
• PUN and PDN are dual logic networks

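Threshold Drops
[Figure: charging and discharging CL through a single NMOS or PMOS transistor]
• PUN
– PMOS pull-up: output goes 0 → VDD
– NMOS pull-up: output goes 0 → VDD - VTn
• PDN
– NMOS pull-down: output goes VDD → 0
– PMOS pull-down: output goes VDD → |VTp|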
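Construction of PDN
• NMOS devices in series implement a NAND function
[Figure: NMOS transistors A and B in series; output !(A • B)]
• NMOS devices in parallel implement a NOR function
[Figure: NMOS transistors A and B in parallel; output !(A + B)]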
Dual PUN and PDN
• PUN and PDN are dual networks
– a parallel connection of transistors in the PUN
corresponds to a series connection of the PDN
• Complementary gate is naturally inverting
(NAND, NOR, AOI, OAI)
• Number of transistors for an N-input logic
gate is 2N
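CMOS NAND
A B F
0 0 1
0 1 1
1 0 1
1 1 0
[Figure: NAND gate symbol and transistor schematic: PMOS A and B in parallel in the PUN, NMOS A and B in series in the PDN; F = !(A • B)]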
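CMOS NOR
A B F
0 0 1
0 1 0
1 0 0
1 1 0
[Figure: NOR gate symbol and transistor schematic: PMOS A and B in series in the PUN, NMOS A and B in parallel in the PDN; F = !(A + B)]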
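Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: PDN with NMOS D in parallel with the series connection of A and (B parallel C); PUN with PMOS D in series with the parallel connection of A and (B series C)]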
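A quick way to see that the PUN and PDN of this gate are dual is to enumerate all input combinations at switch level. The sketch below does that for OUT = !(D + A • (B + C)); the two function names are hypothetical helpers, and the networks are written straight from the Boolean expression.

```python
from itertools import product


def pdn_conducts(a, b, c, d):
    # NMOS pull-down: D in parallel with (A in series with (B parallel C)).
    return d or (a and (b or c))


def pun_conducts(a, b, c, d):
    # PMOS pull-up (dual network): a PMOS switch closes on a low input.
    return (not d) and ((not a) or ((not b) and (not c)))


checked = 0
for a, b, c, d in product([False, True], repeat=4):
    # Exactly one network conducts for every input combination.
    assert pdn_conducts(a, b, c, d) != pun_conducts(a, b, c, d)
    out = pun_conducts(a, b, c, d)          # output is high when the PUN conducts
    assert out == (not (d or (a and (b or c))))
    checked += 1
print(f"{checked} input combinations checked")
```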
Standard Cell Layout Methodology
[Figure: standard cell layout with VDD and GND rails, signal wires and a routing channel]
What logic function is this?

VTC is Data-Dependent
[Figure: 2-input NAND, F = !(A • B), with 0.5/0.25 NMOS (M1 driven by B next to GND, M2 driven by A above it, internal node capacitance Cint between them) and 0.75/0.25 PMOS (M3 driven by A, M4 driven by B); VTC curves from 0 to 3 V for A,B: 0 -> 1; B=1, A: 0 -> 1; and A=1, B: 0 -> 1; annotation: weaker PUN]
VGS1 = VB
VGS2 = VA – VDS1
• The threshold voltage of M2 is higher than M1 due to the body effect (γ)
VTn1 = VTn0
VTn2 = VTn0 + γ(√(|2φF| + Vint) - √|2φF|)
since VSB of M2 is not zero (when VB = 0) due to the presence of Cint
Review: CMOS Inverter: Dynamic
[Figure: switch model with Vin = VDD; CL discharges through Rn]
tpHL = f(Rn, CL)
tpHL = 0.69 Reqn CL
tpHL = 0.69 (3/4 (CL VDD)/IDSATn)
= 0.52 CL / (W/Ln k'n VDSATn)
Designing Inverters for Performance
• Reduce CL
– internal diffusion capacitance of the gate itself
– interconnect capacitance
– fanout
• Increase W/L ratio of the transistor
– the most powerful and effective performance
optimization tool in the hands of the designer
– watch out for self-loading!
• Increase VDD
– only minimal improvement in performance at the cost
of increased energy dissipation
• Slope engineering - keeping signal rise and fall times
smaller than or equal to the gate propagation delays and
of approximately equal values
– good for performance
– good for power consumption
Switch Delay Model
[Figure: switch-level RC models (Rp, Rn, Cint, CL) of the NAND, NOR and INVERTER gates, each transistor replaced by a switch with on-resistance Req]
Input Pattern Effects on Delay
[Figure: switch model of a 2-input NAND with two Rp in parallel, two Rn in series, Cint and CL]
• Delay is dependent on the pattern of inputs
• Low to high transition
– both inputs go low
• delay is 0.69 Rp/2 CL since two p-resistors are on in parallel
– one input goes low
• delay is 0.69 Rp CL
• High to low transition
– both inputs go high
• delay is 0.69 2Rn CL
• Adding transistors in series (without sizing) slows down the circuit
Delay Dependence on Input Patterns
2-input NAND with
NMOS = 0.5 µm/0.25 µm
PMOS = 0.75 µm/0.25 µm
CL = 10 fF
[Figure: output voltage (V, -0.5 to 3) versus time (psec, 0 to 400) for the input patterns A=B=1→0; A=1→0, B=1; and A=1, B=1→0]
Input Data Pattern    Delay (psec)
A=B=0→1               69
A=1, B=0→1            62
A=0→1, B=1            50
A=B=1→0               35
A=1, B=1→0            76
A=1→0, B=1            57
Transistor Sizing
[Figure: switch models of 2-input gates (Rp, Rn, Cint, CL) with relative transistor sizes 1 and 2 annotated]
Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: complex gate with size annotations. PUN: B 4 12, A 2 6, C 4 12, D 2 6. PDN: A 2, D 1, B 2, C 2.]
Fan-In Considerations
[Figure: 4-input NAND with inputs A, B, C, D; output load CL and internal node capacitances C3, C2, C1 along the NMOS chain]
Distributed RC model (Elmore delay)
tpHL = 0.69 Reqn (C1 + 2C2 + 3C3 + 4CL)
Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.
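The Elmore expression generalizes to any number of series NMOS devices: the capacitance at the k-th node above ground is discharged through k transistors. The sketch below is a minimal version under the assumption that every transistor has the same Reqn; the function name and the numeric values are hypothetical.

```python
def nand_tphl_elmore(req_n, internal_caps, c_load):
    """tpHL of an N-input NAND pull-down chain using the Elmore delay.

    internal_caps lists C1, C2, ... starting at the node nearest GND;
    the output load c_load sits above all N series transistors.
    """
    caps = list(internal_caps) + [c_load]
    elmore = sum((k + 1) * req_n * c for k, c in enumerate(caps))
    return 0.69 * elmore


# Hypothetical values: Reqn = 10 kOhm, 1 fF per internal node, 4 fF load.
for fan_in in (2, 4, 8):
    tphl = nand_tphl_elmore(10e3, [1e-15] * (fan_in - 1), 4e-15)
    print(f"fan-in {fan_in}: tpHL = {tphl * 1e12:.0f} ps")
```

For four inputs the sum reduces to Reqn (C1 + 2C2 + 3C3 + 4CL); with equal node capacitances the weights 1 + 2 + … + N grow as N(N+1)/2, which is the quadratic trend noted above.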
tp as a Function of Fan-In
[Figure: tp (psec, 0 to 1250) versus fan-in, 2 to 16; tpHL is a quadratic function of fan-in, tpLH is a linear function of fan-in]
• Gates with a fan-in greater than 4 should be avoided.

Fast Complex Gates: Design Technique
• Transistor sizing
– as long as fan-out capacitance dominates
• Progressive sizing
[Figure: distributed RC line of series NMOS devices, M1 (input In1, node C1) next to ground up to MN (input InN) at the output with load CL]
– M1 > M2 > M3 > … > MN (the FET closest to the output should be the smallest)
– Can reduce delay by more than 20%; decreasing gains as technology shrinks
• Input re-ordering
– when not all inputs arrive at the same time
[Figure: two 3-transistor series chains M1, M2, M3 with load CL and internal nodes C1, C2; the critical-path input makes a 0→1 transition while the other inputs are 1]
– critical-path input on M1 (farthest from the output): CL, C1 and C2 are all charged; delay determined by time to discharge CL, C1 and C2
– critical-path input on M3 (closest to the output): C1 and C2 are already discharged; delay determined by time to discharge CL
Sizing and Ordering Effects
[Figure: 4-input NAND with CL = 100 fF; PMOS A, B, C, D each sized 3; NMOS chain A, B, C, D with size annotations 44, 45, 46, 47 and internal nodes C3, C2, C1]
• Progressive sizing in pull-down chain gives up to a 23% improvement.
• Input ordering saves 5%
– critical path A – 23%
– critical path D – 17%
• Alternative logic structures
F = ABCDEFGH
• Isolating fan-in from fan-out using buffer insertion
[Figure: gate driving CL directly versus through an inserted buffer]
• Real lesson is that optimizing the propagation delay of a gate in isolation is misguided.
Lowering Dynamic Power
Pdyn = CL VDD^2 P0→1 f
• Capacitance (CL): function of fan-out, wire length, transistor sizes
• Supply voltage (VDD): has been dropping with successive generations
• Activity factor (P0→1): how often, on average, do wires switch?
• Clock frequency (f): increasing…
Short Circuit Power Consumption
[Figure: CMOS inverter with input Vin, short-circuit current Isc, output Vout and load CL]
Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting.
Leakage (Static) Power Consumption
[Figure: CMOS inverter with leakage current Ileakage drawn from VDD; components: drain junction leakage, gate leakage, sub-threshold current]
• Sub-threshold current is the dominant factor.
• All increase exponentially with temperature!

Leakage as a Function of VT
• Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
[Figure: ID (A, log scale, 10^-12 to 10^-2) versus VGS (V), 0 to 1 V, for VT = 0.1 V and VT = 0.4 V]
• A 90 mV/decade VT roll-off, so each 255 mV increase in VT gives 3 orders of magnitude reduction in leakage (but adversely affects performance)
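The relation between a threshold increase and the leakage reduction is a simple power of ten. The sketch below assumes an idealized exponential subthreshold characteristic with a constant slope in mV/decade; the function name is hypothetical.

```python
def leakage_reduction(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt_mv."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)


print(f"255 mV at 90 mV/decade: {leakage_reduction(255):.0f}x")
print(f"270 mV at 90 mV/decade: {leakage_reduction(270):.0f}x")
```

Under this idealization, 255 mV at 90 mV/decade is about 2.8 decades, so the "3 orders of magnitude" figure is a rounded value; exactly three decades would need 270 mV at that slope.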
TSMC Processes Leakage and VT
                         CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
Vdd                      1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
Tox (effective)          42 Å      42 Å      42 Å       42 Å      29 Å      24 Å
Lgate                    0.16 µm   0.16 µm   0.18 µm    0.13 µm   0.11 µm   0.08 µm
IDSat (n/p) (µA/µm)      600/260   500/180   320/130    780/360   860/370   920/400
Ioff (leakage) (pA/µm)   20        1.60      0.15       300       1,800     13,000
VTn                      0.42 V    0.63 V    0.73 V     0.40 V    0.29 V    0.25 V
FET Perf. (GHz)          30        22        14         43        52        80
From MPR, 2000

Exponential Increase in Leakage
Currents
[Figure: Ileakage (nA/µm, log scale, 1 to 10000) versus temperature (°C), 30 to 110, for the 0.25, 0.18, 0.13 and 0.1 µm technology generations]
From De, 1999
Review: Energy & Power Equations
E = CL VDD^2 P0→1 + tsc VDD Ipeak P0→1 + VDD Ileakage
f0→1 = P0→1 * fclock
P = CL VDD^2 f0→1 + tsc VDD Ipeak f0→1 + VDD Ileakage
• Dynamic power (first term): ~90% today and decreasing relatively
• Short-circuit power (second term): ~8% today and decreasing absolutely
• Leakage power (third term): ~2% today and increasing
Power and Energy Design Space
            Constant Throughput/Latency              Variable Throughput/Latency
Energy      Design Time       Non-active Modules     Run Time
Active      Logic Design      Clock Gating           DFS, DVS (Dynamic Freq, Voltage Scaling)
            Reduced Vdd
            Sizing
            Multi-Vdd
Leakage     + Multi-VT        Sleep Transistors      + Variable VT
                              Multi-Vdd
                              Variable VT
Dynamic Power Consumption is Data Dependent
• Switching activity, P0→1, has two components
– A static component – function of the logic topology
– A dynamic component – function of the timing behavior (glitching)
Static transition probability
P0→1 = Pout=0 x Pout=1 = P0 x (1-P0)
2-input NOR Gate
A B Out
0 0 1
0 1 0
1 0 0
1 1 0
With input signal probabilities PA=1 = 1/2 and PB=1 = 1/2
NOR static transition probability = 3/4 x 1/4 = 3/16
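The static transition probability can be computed for any gate by enumerating its truth table. The sketch below assumes statistically independent inputs (the same assumption behind the 3/16 result); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, input_one_probs):
    """P0->1 = P(out = 0) * P(out = 1), assuming independent inputs."""
    p_one = Fraction(0)
    for bits in product([0, 1], repeat=len(input_one_probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, input_one_probs):
            weight *= p if bit else (1 - p)
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one


def nor2(a, b):
    return int(not (a or b))


half = Fraction(1, 2)
p01_nor = static_transition_probability(nor2, [half, half])
print(p01_nor)  # 3/16
```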
NOR Gate Transition Probabilities
• Switching activity is a strong function of the input signal statistics
– PA and PB are the probabilities that inputs A and B are one
[Figure: 2-input NOR gate with load CL; plot of P0→1 over PA and PB, each from 0 to 1]
P0→1 = P0 x P1 = (1-(1-PA)(1-PB)) (1-PA)(1-PB)

Transition Probabilities for Some Basic Gates
Gate   P0→1 = Pout=0 x Pout=1
NOR    (1 - (1 - PA)(1 - PB)) x (1 - PA)(1 - PB)
OR     (1 - PA)(1 - PB) x (1 - (1 - PA)(1 - PB))
NAND   PAPB x (1 - PAPB)
AND    (1 - PAPB) x PAPB
XOR    (1 - (PA + PB - 2PAPB)) x (PA + PB - 2PAPB)
[Figure: input A (signal probability 0.5) drives a gate with output X; X and input B (signal probability 0.5) drive a second gate with output Z]
For X: P0→1 = P0 x P1 = (1-PA) PA = 0.5 x 0.5 = 0.25
For Z: P0→1 = P0 x P1 = (1-PXPB) PXPB = (1 – (0.5 x 0.5)) x (0.5 x 0.5) = 3/16
Logic Restructuring
• Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P0→1 = P0 x P1 = (1 - PAPB) x PAPB
[Figure: F = A•B•C•D with all input signal probabilities 0.5.
Chain: W = A•B (3/16), X = W•C (7/64), F = X•D (15/256).
Tree: Y = A•B ((1-0.25)*0.25 = 3/16), Z = C•D (3/16), F = Y•Z (15/256).]
• Chain implementation has a lower overall switching activity than the tree implementation for random inputs
• Ignores glitching effects
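The chain and tree totals can be reproduced with exact fractions. The sketch below assumes independent inputs with signal probability 0.5 and ignores glitching, as the slide does; the helper name is hypothetical.

```python
from fractions import Fraction


def and_gate(p_a, p_b):
    """Return (P(out = 1), P0->1) for a 2-input AND with independent inputs."""
    p_one = p_a * p_b
    return p_one, (1 - p_one) * p_one


half = Fraction(1, 2)

# Chain: W = A.B, X = W.C, F = X.D
w, act_w = and_gate(half, half)
x, act_x = and_gate(w, half)
_, act_f = and_gate(x, half)
chain_total = act_w + act_x + act_f

# Tree: Y = A.B, Z = C.D, F = Y.Z
y, act_y = and_gate(half, half)
z, act_z = and_gate(half, half)
_, act_f_tree = and_gate(y, z)
tree_total = act_y + act_z + act_f_tree

print(f"chain: {chain_total}, tree: {tree_total}")
```

The per-node values match the figure (3/16, 7/64, 15/256 for the chain; 3/16, 3/16, 15/256 for the tree), and the chain total of 91/256 is lower than the tree total of 111/256.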
Input Ordering
[Figure: F = A•B•C with PA = 0.5, PB = 0.2, PC = 0.1.
Left: X = A•B, then F = X•C; at X, P0→1 = (1-0.5x0.2)x(0.5x0.2) = 0.09.
Right: X = B•C, then F = X•A; at X, P0→1 = (1-0.2x0.1)x(0.2x0.1) = 0.0196.]
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
NMOS Transistors in Series/Parallel
• Primary inputs drive both gate and source/drain terminals
• NMOS switch closes when the gate input is high
[Figure: NMOS A and B in series between X and Y] X = Y if A and B
[Figure: NMOS A and B in parallel between X and Y] X = Y if A or B
• Remember - NMOS transistors pass a strong 0 but a weak 1
PMOS Transistors in Series/Parallel
• Primary inputs drive both gate and source/drain terminals
• PMOS switch closes when the gate input is low
[Figure: PMOS A and B in series between X and Y] X = Y if !A and !B = !(A + B)
[Figure: PMOS A and B in parallel between X and Y] X = Y if !A or !B = !(A • B)
• Remember - PMOS transistors pass a strong 1 but a weak 0
Pass Transistor (PT) Logic
[Figure: pass-transistor AND gate, F = AB: an NMOS controlled by B passes A, and an NMOS controlled by !B passes 0]
• Gate is static – a low-impedance path exists to both supply rails under all circumstances
• N transistors instead of 2N
• No static power consumption
• Ratioless
• Bidirectional (versus unidirectional)

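Differential PT Logic (CPL)
[Figure: complementary inputs A, !A, B, !B feed a PT network producing F and an inverse PT network producing !F]
[Figure: CPL gates sharing one topology: AND/NAND (F = AB, !F = !(AB)), OR/NOR (F = A+B, !F = !(A+B)), XOR/XNOR (F = A⊕B, !F = !(A⊕B))]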
CPL Properties
• Differential so complementary data inputs and outputs
are always available (so don’t need extra inverters)
• Still static, since the output defining nodes are always
tied to VDD or GND through a low resistance path
• Design is modular; all gates use the same topology,
only the inputs are permuted.
• Simple XOR makes it attractive for structures like
adders
• Fast (assuming number of transistors in series is small)
• Additional routing overhead for complementary signals
• Still have static power dissipation problems
NMOS Only PT Driving an Inverter
[Figure: NMOS pass transistor with gate A = VDD and input In = VDD driving node x, which drives an inverter (M1, M2); Vx = VDD - VTn]
• Vx does not pull up to VDD, but VDD – VTn
• Threshold voltage drop causes static power consumption (M2 may be weakly conducting forming a path from VDD to GND)
• Notice VTn of the pass transistor increases due to body effect (VSB)
Voltage Swing of PT Driving an Inverter
[Figure: NMOS pass transistor (gate at VDD, body B) driving node x and an inverter with output Out; transistor sizes 1.5/0.25, 0.5/0.25 and 0.5/0.25; simulated In, x and Out (V, 0 to 3) versus time (ns, 0 to 2) for In = 0 → VDD; x settles at 1.8V]
• Body effect – large VSB at x when pulling high (B is tied to GND and S charged up close to VDD)
• So the voltage drop is even worse
Vx = VDD - (VTn0 + γ(√(|2φf| + Vx) - √|2φf|))
Cascaded NMOS Only PTs
[Figure, left: output x of M1 (gate B = VDD, input A = VDD) drives the gate of M2 (input C = VDD); x = VDD - VTn1 and swing on y = VDD - VTn1 - VTn2]
[Figure, right: M1 and M2 in series with gates B = VDD and C = VDD and input A = VDD; swing on y = VDD - VTn1]
• Pass transistor gates should never be cascaded as on the left
• Logic on the right suffers from static power dissipation and reduced noise margins
Solution 1: Level Restorer
[Figure: NMOS pass transistor Mn (gate B, input A) drives node x and an inverter (M1, M2) with output Out; PMOS level restorer Mr connects x to VDD. With A = 1: Out = 0 and Mr is on. With A = 0: x = 0, Out = 1 and Mr is off.]
• Full swing on x (due to Level Restorer) so no static power consumption by inverter
• No static backward current path through Level Restorer and PT since Restorer is only active when A is high
• For correct operation Mr must be sized correctly (ratioed)
Solution 2: Multiple VT Transistors
• Technology solution: Use (near) zero VT devices for the NMOS PTs to eliminate most of the threshold drop (body effect still in force preventing full swing to VDD)
[Figure: two low VT pass transistors sharing the output Out; In2 = 0V passes through the transistor with A = 2.5V (on), while the transistor with B = 0V and In1 = 2.5V is off but leaking, creating a sneak path]
• Impacts static power consumption due to subthreshold currents flowing through the PTs (even if VGS is below VT)
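Solution 3: Transmission Gates (TGs)
• Most widely used solution
[Figure: transmission gate between A and B, an NMOS gated by C in parallel with a PMOS gated by !C, with its circuit symbol; with C = VDD and !C = GND, both A = VDD and A = GND pass to B]
• Full swing bidirectional switch controlled by the gate signal C, A = B if C = 1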
TG Multiplexer
[Figure: two transmission gates controlled by S and !S select In1 or In2; an output inverter between VDD and GND produces F]
F = !(In1 • S + In2 • !S)
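Differential TG Logic (DPL)
[Figure: DPL AND/NAND and XOR/XNOR gates built from NMOS and PMOS pass transistors with complementary inputs A, !A, B, !B and connections to VDD and GND; outputs F = AB with !F = !(AB), and F = A⊕B with !F = !(A⊕B)]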
The 1-bit Binary Adder
[Figure: 1-bit Full Adder (FA) block with inputs A, B, Cin and outputs S, Cout]
A B Cin   Cout S   carry status
0 0 0     0    0   kill
0 0 1     0    1   kill
0 1 0     0    1   propagate
0 1 1     1    0   propagate
1 0 0     0    1   propagate
1 0 1     1    0   propagate
1 1 0     1    0   generate
1 1 1     1    1   generate
G = A&B
P = A⊕B
K = !A & !B
S = A ⊕ B ⊕ Cin = P ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function) = G | P&Cin
• How can we use it to build a 64-bit adder?
• How can we modify it easily to build an adder/subtractor?
• How can we make it better (faster, lower power, smaller)?
Static CMOS Full Adder Circuit
!Cout = !Cin & (!A | !B) | (!A & !B)
!Sum = Cout & (!A | !B | !Cin) | (!A & !B & !Cin)
[Figure: static complementary CMOS full adder with inputs A, B, Cin; a carry stage produces !Cout, which feeds a sum stage producing !Sum]
Cout = Cin & (A | B) | (A & B)
Sum = !Cout & (A | B | Cin) | (A & B & Cin)
CPL Full Adder
[Figure: CPL full adder. Pass-transistor networks driven by A, !A, B, !B, Cin, !Cin produce the complementary outputs Sum/!Sum and Cout/!Cout.]
TG Full Adder
[Figure: transmission gate full adder schematic; labeled signals include A, Cin, Sum, and Cout.]
Mirror Adder
24+4 transistors
[Figure: mirror adder schematic with per-transistor width annotations. The carry stage generates !Cout and the sum stage generates !S; labeled paths: kill, generate, 0-propagate, 1-propagate.]
Cout = A&B | B&Cin | A&Cin
SUM = A&B&Cin | !COUT&(A | B | Cin)
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2 inverter transistor gates (to form Cin for the next more significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
Mirror Adder Features
• The NMOS and PMOS chains are completely
symmetrical with a maximum of two series transistors
in the carry circuitry, guaranteeing identical rise and fall
transitions if the NMOS and PMOS devices are properly
sized.
• When laying out the cell, the most critical issue is the
minimization of the capacitances at node !Cout (four
diffusion capacitances, two internal gate capacitances,
and two inverter gate capacitances). Shared diffusions
can reduce the stack node capacitances.
• The transistors connected to Cin are placed closest to
the output.
• Only the transistors in the carry stage have to be
optimized for optimal speed. All transistors in the sum
stage can be minimal size.
A 64-bit Adder/Subtractor
• Ripple Carry Adder (RCA) built out of 64 FAs
• Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in
• RCA
  – advantage: simple logic, so small (low cost)
  – disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: chain of 64 1-bit FAs. The add/subt control drives C0 = Cin; stage i takes Ai, Bi, Ci and produces Si and Ci+1; the final carry is C64 = Cout.]
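A bit-level sketch of the adder/subtractor described on this slide is given below. It models behavior only, not timing or glitching; the function name `ripple_add_sub` and its parameters are hypothetical, and operands are assumed to be non-negative integers that fit in n bits.

```python
def ripple_add_sub(a: int, b: int, subtract: bool = False, n: int = 64) -> tuple[int, int]:
    """Bit-serial model of an n-bit ripple carry adder/subtractor.

    Returns (result, carry_out). For subtraction each bit of b is XORed
    with the add/subt control, and the same control is used as C0.
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                          # C0 = Cin = add/subt
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl        # XOR gate complements the subtrahend
        p = ai ^ bi
        result |= (p ^ carry) << i        # Si
        carry = (ai & bi) | (p & carry)   # Ci+1 ripples to the next FA
    return result, carry


print(ripple_add_sub(5, 3))                  # addition
print(ripple_add_sub(5, 3, subtract=True))   # two's complement subtraction
```

The loop makes the O(N) dependency explicit: each iteration needs the carry produced by the previous one.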
Ripple Carry Adder (RCA)
[Figure: 4-bit ripple carry adder. Four FAs with inputs (A3, B3) ... (A0, B0), sums S3 ... S0, carry-in C0 = Cin and carry-out Cout = C4.]
Tadder ≈ TFA(A,B→Cout) + (N-2)·TFA(Cin→Cout) + TFA(Cin→S)
T = O(N) worst case delay
Real Goal: Make the fastest possible carry path
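To see how the worst-case delay grows with word length, the delay expression can be evaluated directly. The per-stage delays below are hypothetical numbers in arbitrary units, chosen only to illustrate the linear growth.

```python
def rca_delay(n: int, t_ab_cout: float, t_cin_cout: float, t_cin_s: float) -> float:
    """Worst-case delay of an n-bit ripple carry adder (n >= 2)."""
    # First FA: A,B -> Cout; middle FAs: Cin -> Cout; last FA: Cin -> S.
    return t_ab_cout + (n - 2) * t_cin_cout + t_cin_s


# Hypothetical per-stage delays, not taken from the slides.
for n in (4, 16, 64):
    print(n, rca_delay(n, t_ab_cout=3.0, t_cin_cout=2.0, t_cin_s=2.5))
```

The Cin→Cout term is multiplied by N-2, so it dominates for wide adders, which is why the carry path is the one worth optimizing.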
Inversion Property
• Inverting all inputs to a FA results in inverted values for all outputs
[Figure: an FA with inputs A, B, Cin and outputs S, Cout is equivalent to an FA with all inputs and all outputs inverted.]
!S (A, B, Cin) = S(!A, !B, !Cin)
!Cout (A, B, Cin) = Cout (!A, !B, !Cin)
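The two identities can be confirmed by exhaustive enumeration. The helper `fa` below is just the textbook full adder function, written out for the check.

```python
from itertools import product


def fa(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (S, Cout) of a 1-bit full adder."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


for a, b, cin in product((0, 1), repeat=3):
    s, cout = fa(a, b, cin)
    s_inv, cout_inv = fa(1 - a, 1 - b, 1 - cin)
    assert s_inv == 1 - s          # !S(A, B, Cin) = S(!A, !B, !Cin)
    assert cout_inv == 1 - cout    # !Cout(A, B, Cin) = Cout(!A, !B, !Cin)
```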

Exploiting the Inversion Property
[Figure: 4-bit ripple carry adder built from FA’ cells, alternating inverted cells and regular cells; inputs (A3, B3) ... (A0, B0), sums S3 ... S0, carry-in C0 = Cin and carry-out Cout = C4.]
• Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder)
• Now need two “flavors” of FAs
Fast Carry Chain Design
• The key to fast addition is a low latency carry network
• What matters is whether in a given position a carry is
– generated Gi = Ai & Bi = AiBi
– propagated Pi = Ai ⊕ Bi (sometimes use Ai | Bi)
– annihilated (killed) Ki = !Ai & !Bi
• Giving a carry recurrence of
Ci+1 = Gi | PiCi
C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0 C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0
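The expanded expressions follow from substituting the recurrence into itself. As a sanity check, the sketch below computes the carries both ways and compares them for every pair of 4-bit operands; the function names are illustrative.

```python
from itertools import product


def carries_recurrence(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries C0..CN from C(i+1) = G(i) | P(i) & C(i)."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def carries_expanded(g: list[int], p: list[int], c0: int) -> list[int]:
    """Same carries from the flattened sum-of-products form."""
    c = [c0]
    for i in range(len(g)):
        term = g[i]
        prefix = p[i]                     # running product P(i) P(i-1) ...
        for j in range(i - 1, -1, -1):
            term |= prefix & g[j]
            prefix &= p[j]
        term |= prefix & c0               # last term: all P's times C0
        c.append(term)
    return c


for a, b, c0 in product(range(16), range(16), (0, 1)):
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    assert carries_recurrence(g, p, c0) == carries_expanded(g, p, c0)
    assert carries_expanded(g, p, c0)[4] == (a + b + c0) >> 4
```

The expanded form removes the serial dependency between carries, at the cost of product terms that grow with the bit position.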
Manchester Carry Chain
• Switches controlled by Gi and Pi
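[Figure: one Manchester carry chain stage between nodes !Ci and !Ci+1, with switches controlled by Gi and Pi and a clk-controlled precharge device.]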
• Total delay of
– time to form the switch control signals Gi and Pi
– setup time for the switches
– signal propagation delay through N switches in the
worst case
4-bit Sliced MCC Adder
[Figure: four bit slices. Each slice forms Gi (AND) and Pi (XOR) from Ai and Bi; the clocked Manchester chain carries !C0 through !C1, !C2, !C3 to !C4; an XOR per slice produces S3 ... S0.]
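Domino Manchester Carry Chain Circuit
[Figure: 4-bit domino Manchester carry chain from Ci,0 to Ci,4 with pass transistors P0 ... P3, pull-down transistors G0 ... G3, clk-controlled precharge and evaluate devices, and annotated transistor sizes.]
Internal node functions:
!(G0 | P0 Ci,0)
!(G1 | P1G0 | P1P0 Ci,0)
!(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)
!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)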
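A switch-level intuition for these node functions: every chain node is precharged high, and during evaluation a node discharges either through its own Gi pull-down or through the Pi pass transistor from an already discharged neighbor. The sketch below models only that logical behavior. It ignores charge sharing, transistor sizing, and timing, and the function name `domino_mcc` is an assumption for the example.

```python
def domino_mcc(g: list[int], p: list[int], c_in: int) -> list[int]:
    """Return the node values [!C1, ..., !CN] after one evaluate phase."""
    n = len(g)
    # Precharge phase (clk = 0): every chain node is pulled high.
    node = [1] * n
    # Evaluate phase (clk = 1): the carry-in end is low when Ci,0 = 1.
    prev = 1 - c_in
    for i in range(n):
        # Node i discharges through the Gi pull-down, or through the
        # Pi pass transistor if the previous node has discharged.
        if g[i] or (p[i] and prev == 0):
            node[i] = 0
        prev = node[i]
    return node


# All positions propagate: the carry-in discharges the whole chain.
print(domino_mcc([0, 0, 0, 0], [1, 1, 1, 1], c_in=1))
```

The all-propagate case is the worst case for delay, since the discharge has to travel through every pass transistor in series.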
Carry-Skip (Carry-Bypass) Adder
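[Figure: 4-bit block of FAs with inputs (A3, B3) ... (A0, B0), sums S3 ... S0 and carry-in Ci,0, with a bypass path from Ci,0 to the block carry-out Co,3 controlled by BP.]
BP = P0 P1 P2 P3 “Block Propagate”
• If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0; otherwise the block itself kills or generates the carry internally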
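A behavioral sketch of one carry-skip block follows. It assumes P is defined as the XOR of the operand bits and models the bypass as a simple selection between Ci,0 and the rippled carry; the name `carry_skip_block` is hypothetical.

```python
def carry_skip_block(a: int, b: int, c_in: int, width: int = 4) -> tuple[int, int, int]:
    """Return (sum, carry_out, block_propagate) for one width-bit block."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    bp = int(all(p))                      # BP = P0 & P1 & P2 & P3

    carry, s = c_in, 0
    for i in range(width):                # ordinary ripple inside the block
        s |= (p[i] ^ carry) << i
        carry = g[i] | (p[i] & carry)

    # Bypass: when BP = 1 the carry-in is forwarded directly to the output.
    c_out = c_in if bp else carry
    return s, c_out, bp


# Exhaustive check against integer addition for a 4-bit block.
for a in range(16):
    for b in range(16):
        for c_in in (0, 1):
            s, c_out, _ = carry_skip_block(a, b, c_in)
            assert s + 16 * c_out == a + b + c_in
```

The bypass never changes the logical result; it only shortens the time at which the block carry-out becomes valid when every bit propagates.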
Carry-Skip Chain Implementation
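[Figure: carry-skip chain implementation. A carry chain with pass transistors P0 ... P3 and pull-down transistors G0 ... G3 between the block carry-in (Cin) and !Cout; BP controls the bypass from block carry-in to block carry-out.]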
4-bit Block Carry-Skip Adder
[Figure: 16-bit adder built from four 4-bit blocks (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15). Each block has Setup, Carry Propagation, and Sum stages; Ci,0 enters the lowest block.]
• Worst-case delay: a carry generated in bit 0 ripples through bits 1, 2, and 3, skips the middle two groups, then ripples in the last group from bit 12 to bit 15
Tadd = tsetup + B·tcarry + ((N/B) - 1)·tskip + B·tcarry + tsum
where B is the group size in bits and N is the total number of bits
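The delay expression can be evaluated for different group sizes to see the trade-off between rippling inside a group and skipping across groups. The unit delays below are hypothetical and are used only for illustration; the formula is taken as written on the slide, and B is assumed to divide N.

```python
def carry_skip_delay(n: int, b: int, t_setup: float, t_carry: float,
                     t_skip: float, t_sum: float) -> float:
    """Worst-case delay of an n-bit carry-skip adder with b-bit groups."""
    # Ripple in the first group, skip the groups in between, ripple in the last.
    return t_setup + b * t_carry + (n // b - 1) * t_skip + b * t_carry + t_sum


# Hypothetical unit delays, not taken from the slides.
params = dict(t_setup=1.0, t_carry=1.0, t_skip=1.0, t_sum=1.0)
for b in (2, 4, 8):
    print(b, carry_skip_delay(16, b, **params))
```

Small groups pay for many skip stages and large groups pay for long ripples, so the delay is minimized at an intermediate group size that depends on the ratio of tskip to tcarry.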
