Introduction to VLSI Design

Roy Paily
roypaily@iitg.ernet.in
Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati, Guwahati- 781039, Assam, India
Plan of Talk
Introduction to digital integrated circuits
CMOS devices and scaling. CMOS logic gates and
their layout. Propagation delay, noise margins,
and power dissipation. Combinational (e.g.,
arithmetic) design
Course goals
Ability to design and implement CMOS digital
circuits and optimize them with respect to
different constraints: size (cost), speed and power
dissipation
Transistor Revolution
• Transistor – Bardeen (Bell Labs) in 1947
• Bipolar transistor – Shockley in 1949
• First bipolar digital logic gate – Harris
in 1956
• First monolithic IC – Jack Kilby in 1959
• First commercial IC logic gates –
Fairchild 1960
• TTL – 1962 into the 1990’s
• ECL – 1974 into the 1980’s
MOSFET Technology
• MOSFET transistor - Lilienfeld (Canada) in 1925
and Heil (England) in 1935
• CMOS – 1960’s, but plagued with manufacturing
problems
• PMOS in 1960’s (calculators)
• NMOS in 1970’s (4004, 8080) – for speed
• CMOS in 1980’s – preferred MOSFET technology
because of power benefits
• BiCMOS, Gallium-Arsenide, Silicon-Germanium
• SOI, Copper-Low K, …
Transistors on lead microprocessors double every 2 years
[Figure: transistors (millions, log scale from 0.001 to 1000) vs. year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors; annotated "2X growth in 1.96 years!"]
Evolution in DRAM Chip Capacity
[Figure: Kbit capacity/chip (log scale from 10 to 100,000,000) vs. year (1980 to 2010); labeled data points at 64, 256, 1,000, 4,000, 16,000, 64,000, 256,000, 1,000,000, 4,000,000, 16,000,000 and 64,000,000 Kbit]
Die size grows by 14% to satisfy Moore’s Law
[Figure: die size (mm, log scale from 1 to 100) vs. year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors; annotated "~7% growth per year" and "~2X growth in 10 years"]
Lead microprocessors frequency doubles every 2 years
[Figure: frequency (MHz, log scale from 0.1 to 10000) vs. year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors; annotated "2X every 2 years"]
Courtesy, Intel
Lead Microprocessors power continues to increase
[Figure: power (watts, log scale from 0.1 to 100) vs. year (1971 to 2000) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors]
Power delivery and dissipation will be prohibitive
Courtesy, Intel
Power Density
[Figure: power density (W/cm2, log scale from 1 to 10000) vs. year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels marked for a hot plate, a nuclear reactor and a rocket nozzle]
Courtesy, Intel
Technology Directions: SIA Roadmap

Year                  1999   2002      2005   2008   2011   2014
Feature size (nm)     180    130       100    70     50     35
Mtrans/cm2            7      14-26     47     115    284    701
Chip size (mm2)       170    170-214   235    269    308    354
Signal pins/chip      768    1024      1024   1280   1408   1472
Clock rate (MHz)      600    800       1100   1400   1800   2200
Wiring levels         6-7    7-8       8-9    9      9-10   10
Power supply (V)      1.8    1.5       1.2    0.9    0.6    0.6
High-perf power (W)   90     130       160    170    174    183
Battery power (W)     1.4    2.0       2.4    2.0    2.2    2.4

http://www.itrs.net/ntrs/publntrs.nsf
Why Scaling?

• Technology shrinks by ~0.7 per generation


• With every generation can integrate 2x more
functions on a chip; chip cost does not increase
significantly
• Cost of a function decreases by 2x
• But …
– How to design chips with more and more functions?
– Design engineering population does not double every
two years…
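Review: Design Abstraction Levels
[Figure: hierarchy of design abstraction levels, from top to bottom: SYSTEM, MODULE, GATE, CIRCUIT (inverter with Vin and Vout), DEVICE (MOS cross-section with G, S, D and n+ regions)]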
Fundamental Design Metrics
• Functionality
• Cost
– NRE (fixed) costs - design effort
– RE (variable) costs - cost of parts, assembly, test
• Reliability, robustness
– Noise margins
– Noise immunity
• Performance
– Speed (delay)
– Power consumption; energy
• Time-to-market
Static Gate Behavior
• Steady-state parameters of a gate – static behavior – tell how robust a circuit is with respect to both variations in the manufacturing process and to noise disturbances.
• Digital circuits perform operations on Boolean variables
• A logical variable is associated with a nominal voltage level for each logic state
1 ⇔ VOH and 0 ⇔ VOL
[Figure: inverter with input V(x) and output V(y)]
VOH = ! (VOL)
VOL = ! (VOH)
• Difference between VOH and VOL is the logic or signal swing Vsw
DC Operation
Voltage Transfer Characteristics (VTC)
• Plot of output voltage as a function of the input voltage
[Figure: inverter VTC, V(y) vs. V(x), with VOH = f (VIL), VOL = f (VIH), the input levels VIL and VIH, and the switching threshold VM where the curve crosses the line V(y) = V(x)]
Mapping Logic Levels to the Voltage Domain
• The regions of acceptable high and low voltages are delimited by VIH and VIL that represent the points on the VTC curve where the gain = -1
[Figure: VTC, V(y) vs. V(x), with the slope = -1 points at VIL and VIH; the "1" range lies between VIH and VOH, the "0" range between VOL and VIL, and the undefined region between VIL and VIH]
Noise Margins
• For robust circuits, want the “0” and “1” intervals to be as large as possible
[Figure: gate output levels (VOH, VOL) next to gate input levels (VIH, VIL), both between VDD and Gnd, with the undefined region between VIL and VIH]
Noise Margin High: NMH = VOH - VIH
Noise Margin Low: NML = VIL - VOL
• Large noise margins are desirable, but not sufficient …
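The two noise-margin definitions above can be checked with a few lines of Python. This is only a sketch: the function name and the voltage levels are illustrative assumptions, not values from the slides.

```python
def noise_margins(v_oh, v_ol, v_ih, v_il):
    """Return (NMH, NML) in volts from the four static voltage levels."""
    nm_high = v_oh - v_ih  # NMH = VOH - VIH
    nm_low = v_il - v_ol   # NML = VIL - VOL
    return nm_high, nm_low

# Hypothetical levels for a 2.5 V gate (assumed for illustration only).
nmh, nml = noise_margins(v_oh=2.5, v_ol=0.0, v_ih=1.5, v_il=1.0)
print(f"NMH = {nmh:.2f} V, NML = {nml:.2f} V")
```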


The Regenerative Property
• A gate with the regenerative property ensures that a disturbed signal converges back to a nominal voltage level
[Figure: chain of inverters with node voltages v0 through v6, and simulated waveforms V (volts, -1 to 5) vs. t (nsec, 0 to 10) for v0, v1 and v2]
Conditions for Regeneration
[Figure: chain of inverters with node voltages v0 through v6]
v1 = f(v0) ⇒ v1 = finv(v2)
[Figure: plots of f(v) and finv(v) showing how v0, v1, v2, v3 evolve for a regenerative gate and for a nonregenerative gate]
• To be regenerative, the VTC must have a transient region with a gain greater than 1 (in absolute value) bordered by two valid zones where the gain is smaller than 1. Such a gate has two stable operating points.
Noise Immunity
• Noise margin expresses the ability of a circuit to overpower a noise source
– noise sources: supply noise, cross talk, interference, offset
• Absolute noise margin values are deceptive
– a floating node is more easily disturbed than a node driven by a low impedance (in terms of voltage)
• Noise immunity expresses the ability of the system to process and transmit information correctly in the presence of noise
• For good noise immunity, the signal swing (i.e., the difference between VOH and VOL) and the noise margin have to be large enough to overpower the impact of fixed sources of noise
Fan-In and Fan-Out
• Fan-out – number of load gates connected to the output of the driving gate (N)
– gates with large fan-out are slower
• Fan-in – the number of inputs to the gate (M)
– gates with large fan-in are bigger and slower
The Ideal Inverter
• The ideal gate should have
– infinite gain in the transition region
– a gate threshold located in the middle of the logic swing
– high and low noise margins equal to half the swing
– input and output impedances of infinity and zero, resp.
[Figure: ideal VTC, Vout vs. Vin, with a vertical transition (g = -∞) at mid-swing]
Ri = ∞
Ro = 0
Fanout = ∞
NMH = NML = VDD/2
Delay Definitions
[Figure: input waveform Vin and output waveform Vout of an inverter vs. t; tpHL and tpLH are measured between the 50% points of input and output, and the fall and rise times tf and tr between the 90% and 10% points of the output]
• Propagation delay: tp = (tpHL + tpLH)/2
• Signal slopes: fall time tf and rise time tr
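Modeling Propagation Delay
• Model circuit as first-order RC network
$$v_{out}(t) = (1 - e^{-t/\tau})\,V$$
where τ = RC
[Figure: RC network with input vin, series resistor R, capacitor C to ground, and output vout]
Time to reach 50% point is
$$t = \ln(2)\,\tau = 0.69\,\tau$$
Time to go from the 10% point to the 90% point is
$$t = \ln(9)\,\tau = 2.2\,\tau$$
• Matches the delay of an inverter gate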
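A quick numerical check of the 0.69 τ and 2.2 τ figures for the first-order RC model. This is a sketch with a normalized time constant; the helper name is an assumption. It also shows that ln(9) τ is the interval between the 10% and 90% points of the step response.

```python
import math

def rc_step_response(t, tau, v_final=1.0):
    """Output of a first-order RC network t seconds after a step to v_final."""
    return v_final * (1.0 - math.exp(-t / tau))

tau = 1.0                      # normalized time constant (tau = R*C)
t_50 = math.log(2) * tau       # time to reach the 50% point
t_10 = -math.log(0.9) * tau    # time to reach the 10% point
t_90 = math.log(10) * tau      # time to reach the 90% point
t_rise = t_90 - t_10           # 10%-to-90% interval, equal to ln(9)*tau

print(f"t_50   = {t_50:.2f} tau")
print(f"t_rise = {t_rise:.2f} tau")
```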


Power and Energy Dissipation
• Power consumption: how much energy is consumed per operation and how much heat the circuit dissipates
– supply line sizing (determined by peak power)
$$P_{peak} = V_{dd}\, i_{peak}$$
– battery lifetime (determined by average power dissipation)
$$p(t) = v(t)\,i(t) = V_{dd}\,i(t)$$
$$P_{avg} = \frac{1}{T}\int p(t)\,dt = \frac{V_{dd}}{T}\int i_{dd}(t)\,dt$$
– packaging and cooling requirements
• Two important components: static and dynamic
Energy per clock cycle:
$$E\ \text{(joules)} = C_L V_{dd}^2 P_{0\to1} + t_{sc} V_{dd} I_{peak} P_{0\to1} + V_{dd} I_{leakage}/f_{clock}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P\ \text{(watts)} = C_L V_{dd}^2 f_{0\to1} + t_{sc} V_{dd} I_{peak} f_{0\to1} + V_{dd} I_{leakage}$$
Power and Energy Dissipation
• Propagation delay and the power consumption of a gate are related
• Propagation delay is (mostly) determined by the speed at which a given amount of energy can be stored on the gate capacitors
– the faster the energy transfer (higher power dissipation) the faster the gate
• For a given technology and gate topology, the product of the power consumption and the propagation delay is a constant
– Power-delay product (PDP) – energy consumed by the gate per switching event
• An ideal gate is one that is fast and consumes little energy, so the ultimate quality metric is
– Energy-delay product (EDP) = PDP × delay = power × delay²
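The MOS Transistor
[Figure: cross-section of a MOS transistor, with polysilicon and aluminum layers labeled]
Switch Model of NMOS Transistor
[Figure: NMOS transistor with gate, source (of carriers) and drain (of carriers), modeled as a switch with on-resistance Ron controlled by |VGS|]
• Open (off) (Gate = ‘0’): |VGS| < |VT|
• Closed (on) (Gate = ‘1’): |VGS| > |VT|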
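Switch Model of PMOS Transistor
[Figure: PMOS transistor with gate, source (of carriers) and drain (of carriers), modeled as a switch with on-resistance Ron controlled by |VGS|]
With the source tied to VDD and VG the gate voltage:
• Open (off) (Gate = ‘1’): VG > VDD – |VT|, i.e. |VGS| < |VT|
• Closed (on) (Gate = ‘0’): VG < VDD – |VT|, i.e. |VGS| > |VT|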

Voltage-Current Relation: Linear Mode
For long-channel devices (L > 0.25 micron)
• When VDS ≤ VGS – VT
$$I_D = k'_n \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$$
where
k’n = µnCox = µnεox/tox is the process transconductance parameter (µn is the carrier mobility (m2/Vsec))
kn = k’n W/L is the gain factor of the device
For small VDS, there is a linear dependence between VDS and ID, hence the name resistive or linear region
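Transistor in Saturation Mode
Assuming VGS > VT and VDS > VGS - VT
[Figure: NMOS cross-section with source S, gate G, drain D and drain current ID; the voltage across the induced channel is VGS - VT, ending at the pinch-off point near the drain]
The current remains constant (saturates).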
Voltage-Current Relation: Saturation Mode
For long channel devices
• When VDS ≥ VGS – VT
$$I_D' = \frac{k'_n}{2}\frac{W}{L}(V_{GS} - V_T)^2$$
since the voltage difference over the induced channel (from the pinch-off point to the source) remains fixed at VGS – VT
• However, the effective length of the conductive channel is modulated by the applied VDS, so
$$I_D = I_D'(1 + \lambda V_{DS})$$
where λ is the channel-length modulation (varies with the inverse of the channel length)
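The linear-mode and saturation-mode equations can be combined into one piecewise function. The sketch below is not a calibrated device model: the function name and the k'n and λ values are assumptions for illustration, while W/L = 1.5 and VT = 0.4 V follow the example device used in the slides.

```python
def nmos_drain_current(v_gs, v_ds, k_n_prime, w_over_l, v_t, lam=0.0):
    """Long-channel NMOS drain current in amperes, for v_ds >= 0."""
    v_ov = v_gs - v_t                      # overdrive voltage VGS - VT
    if v_ov <= 0:
        return 0.0                         # cut-off (subthreshold current ignored)
    if v_ds <= v_ov:
        # Linear (resistive) region
        return k_n_prime * w_over_l * (v_ov * v_ds - v_ds ** 2 / 2)
    # Saturation region, with channel-length modulation applied as on the slide
    return 0.5 * k_n_prime * w_over_l * v_ov ** 2 * (1 + lam * v_ds)

K_N_PRIME = 115e-6   # A/V^2, assumed value for illustration
LAMBDA = 0.06        # 1/V, assumed value for illustration

for v_gs in (1.0, 1.5, 2.0, 2.5):
    i_d = nmos_drain_current(v_gs, 2.5, K_N_PRIME, 1.5, 0.4, LAMBDA)
    print(f"VGS = {v_gs:.1f} V -> ID = {i_d * 1e6:.1f} uA")
```

With λ = 0 the two branches meet at VDS = VGS – VT; with λ > 0 this simple form has a small step at the boundary, which is one reason unified models apply the (1 + λVDS) factor in both regions.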
Current Determinants
• For a fixed VDS and VGS (> VT), IDS is a function of
– the distance between the source and drain – L
– the channel width – W
– the threshold voltage – VT
– the thickness of the SiO2 – tox
– the dielectric of the gate insulator (SiO2) – εox
– the carrier mobility
• for NMOS: µn = 500 cm2/V-sec
• for PMOS: µp = 180 cm2/V-sec
Long Channel I-V Plot (NMOS)
[Figure: ID (A, ×10-4, 0 to 6) vs. VDS (V, 0 to 2.5) for VGS = 1.0V, 1.5V, 2.0V and 2.5V; the curve VDS = VGS - VT separates the linear and saturation regions, and cut-off lies along the VDS axis]
NMOS transistor, 0.25um, Ld = 10um, W/L = 1.5, VDD = 2.5V, VT = 0.4V
Short Channel Effects
• Behavior of short channel device mainly due to
• Velocity saturation – the velocity of the carriers saturates due to scattering (collisions suffered by the carriers)
[Figure: carrier velocity υn (m/s), saturating near 10^5, vs. electric field ξ (V/µm) from 0 to 3, with the critical field ξc = 1.5]
• For an NMOS device with L of 0.25 µm, only a couple of volts difference between D and S are needed to reach velocity saturation
Velocity Saturation Effects
For short channel devices and large enough VGS – VT
• VDSAT < VGS – VT so the device enters saturation before VDS reaches VGS – VT and operates more often in saturation
• IDSAT has a linear dependence wrt VGS so a reduced amount of current is delivered for a given control voltage
Short Channel I-V Plot (PMOS)
• All polarities of all voltages and currents are reversed
[Figure: ID (A, ×10-4, 0 to -1) vs. VDS (V) for VGS = -1.0V, -1.5V, -2.0V and -2.5V]
PMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = -0.4V


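CMOS Inverter: A First Look
[Figure: CMOS inverter between VDD and ground, with input Vin, output Vout and load capacitance CL]
CMOS Inverter: Steady State Response
[Figure: switch models of the inverter; for Vin = 0 the output is connected to VDD through Rp (Vout = 1), for Vin = VDD it is connected to ground through Rn (Vout = 0)]
VOL = 0
VOH = VDD
VM = f(Rn, Rp)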
CMOS Properties
• Full rail-to-rail swing ⇒ high noise margins
– Logic levels not dependent upon the relative device sizes ⇒ transistors can be minimum size ⇒ ratioless
• Always a path to Vdd or GND in steady state ⇒ low output impedance (output resistance in kΩ range) ⇒ large fan-out (albeit with degraded performance)
• Extremely high input resistance (gate of MOS transistor is near perfect insulator) ⇒ nearly zero steady-state input current
• No direct path steady-state between power and ground ⇒ no static power dissipation
• Propagation delay function of load capacitance and resistance of transistors
Short Channel I-V Plot (NMOS)
[Figure: ID (A, ×10-4, 0 to 2.5) vs. VDS (V, 0 to 2.5) for VGS = 1.0V, 1.5V, 2.0V and 2.5V]
NMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = 0.4V
Transforming PMOS I-V Lines
• Want common coordinate set Vin, Vout, and IDn
IDSp = -IDSn
VGSn = Vin ; VGSp = Vin - VDD
VDSn = Vout ; VDSp = Vout - VDD
[Figure: PMOS I-V curves (VGSp = -1 and VGSp = -2.5) are mirrored around the x-axis (IDn = -IDp) and then shifted horizontally over VDD (Vin = VDD + VGSp, Vout = VDD + VDSp), giving curves of IDn vs. Vout labeled Vin = 0 and Vin = 1.5]
CMOS Inverter Load Lines
[Figure: PMOS and NMOS load lines, IDn (A, ×10-4, 0 to 2.5) vs. Vout (V, 0 to 2.5), drawn for Vin = 0V, 0.5V, 1.0V, 1.5V, 2.0V and 2.5V]
0.25um, W/Ln = 1.5, W/Lp = 4.5, VDD = 2.5V, VTn = 0.4V, VTp = -0.4V
CMOS Inverter VTC
[Figure: Vout (V) vs. Vin (V), both from 0 to 2.5, with five operating regions along the curve as Vin increases: NMOS off / PMOS res; NMOS sat / PMOS res; NMOS sat / PMOS sat; NMOS res / PMOS sat; NMOS res / PMOS off]
Impact of Process Variation on VTC Curve
[Figure: Vout (V) vs. Vin (V), both from 0 to 2.5, showing the nominal VTC between the "Good PMOS, Bad NMOS" and "Bad PMOS, Good NMOS" curves]
Process variations (mostly) cause a shift in the switching threshold
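CMOS Inverter: Switch Model of Dynamic Behavior
[Figure: switch models of the inverter with load CL; for Vin = 0, CL is charged from VDD through Rp, and for Vin = VDD, CL is discharged to ground through Rn]
• Gate response time is determined by the time to charge CL through Rp (discharge CL through Rn)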
Inverter Propagation Delay
• Propagation delay is proportional to the time-constant of the network formed by the pull-down resistor and the load capacitance
[Figure: pull-down switch model with Vin = VDD; CL discharges through Rn until Vout = 0]
tpHL = f(Rn, CL)
tpHL = ln(2) Reqn CL = 0.69 Reqn CL
tpLH = ln(2) Reqp CL = 0.69 Reqp CL
tp = (tpHL + tpLH)/2 = 0.69 CL(Reqn + Reqp)/2
• To equalize rise and fall times make the on-resistance of the NMOS and PMOS approximately equal.
Inverter Transient Response
[Figure: simulated Vin and Vout (V, -0.5 to 3) vs. t (sec, ×10-10, 0 to 2.5), with tf, tr, tpHL and tpLH marked]
VDD = 2.5V, 0.25µm
W/Ln = 1.5
W/Lp = 4.5
Reqn = 13 kΩ (÷ 1.5)
Reqp = 31 kΩ (÷ 4.5)
tpHL = 36 psec
tpLH = 29 psec
so tp = 32.5 psec
From simulation: tpHL = 39.9 psec and tpLH = 31.7 psec
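The hand-calculated delays above follow from tp = 0.69 Req CL. The sketch below reproduces them using the equivalent resistances quoted on the slide. The load capacitance is not stated there, so the 6 fF value is an assumption back-calculated from the quoted 36 psec, and the function name is hypothetical.

```python
def inverter_delays(r_eqn, r_eqp, c_load):
    """Return (tpHL, tpLH, tp) in seconds for the switch-level RC model."""
    t_phl = 0.69 * r_eqn * c_load   # discharge CL through the NMOS
    t_plh = 0.69 * r_eqp * c_load   # charge CL through the PMOS
    return t_phl, t_plh, (t_phl + t_plh) / 2

r_eqn = 13e3 / 1.5    # ohms, 13 kOhm divided by W/Ln = 1.5
r_eqp = 31e3 / 4.5    # ohms, 31 kOhm divided by W/Lp = 4.5
c_load = 6.0e-15      # farads, ASSUMED (not given on the slide)

t_phl, t_plh, t_p = inverter_delays(r_eqn, r_eqp, c_load)
print(f"tpHL = {t_phl * 1e12:.1f} ps, tpLH = {t_plh * 1e12:.1f} ps, tp = {t_p * 1e12:.1f} ps")
```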


Inverter Propagation Delay, Revisited
• To see how a designer can optimize the delay of a gate, the Req in the delay equation has to be expanded
$$t_{pHL} = 0.69\,R_{eqn} C_L = 0.69\left(\frac{3}{4}\,\frac{C_L V_{DD}}{I_{DSATn}}\right) \approx 0.52\,\frac{C_L}{(W/L_n)\,k'_n\,V_{DSATn}}$$
[Figure: delay (axis values 1 to 5.5) vs. VDD (V, 0.8 to 2.4)]
Design for Performance
• Reduce CL
– internal diffusion capacitance of the gate itself
• keep the drain diffusion as small as possible
– interconnect capacitance
– fanout
• Increase W/L ratio of the transistor
– the most powerful and effective performance
optimization tool in the hands of the designer
– watch out for self-loading! – when the intrinsic
capacitance dominates the extrinsic load
• Increase VDD
– can trade-off energy for performance
– increasing VDD above a certain level yields only very
minimal improvements
– reliability concerns enforce a firm upper bound on VDD
NMOS/PMOS Ratio
• So far have sized the PMOS and NMOS so that the Req’s match (ratio of 3 to 3.5)
– symmetrical VTC
– equal high-to-low and low-to-high propagation delays
• If speed is the only concern, reduce the width of the PMOS device!
– widening the PMOS degrades the tpHL due to larger parasitic capacitance
β = (W/Lp)/(W/Ln)
r = Reqp/Reqn (resistance ratio of identically-sized PMOS and NMOS)
βopt = √r when wiring capacitance is negligible
PMOS/NMOS Ratio Effects
[Figure: tpLH, tpHL and tp (sec, ×10-11, 3 to 5) vs. β = (W/Lp)/(W/Ln) from 1 to 5]
• β of 2.4 (= 31 kΩ/13 kΩ) gives symmetrical response
• β of 1.6 to 1.9 gives optimal performance
Device Sizing for Performance
• Divide capacitive load, CL, into
– Cint : intrinsic - diffusion and Miller effect
– Cext : extrinsic - wiring and fanout
tp = 0.69 Req Cint (1 + Cext/Cint) = tp0 (1 + Cext/Cint)
– where tp0 = 0.69 Req Cint is the intrinsic (unloaded) delay of the gate
• Widening both PMOS and NMOS by a factor S reduces Req by an identical factor (Req = Rref/S), but raises the intrinsic capacitance by the same factor (Cint = SCiref)
tp = 0.69 Rref Ciref (1 + Cext/(SCiref)) = tp0(1 + Cext/(SCiref))
– tp0 is independent of the sizing of the gate; with no load the drive of the gate is totally offset by the increased capacitance
– any S sufficiently larger than (Cext/Cint) yields the best performance gains with least area impact
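The diminishing return from sizing can be seen by evaluating tp = tp0 (1 + Cext/(S Ciref)) for a few sizing factors. This is only a sketch: the function name and the tp0, Cext and Ciref values are hypothetical, chosen to make the trend visible.

```python
def sized_delay(s, t_p0, c_ext, c_iref):
    """tp = tp0 * (1 + Cext / (S * Ciref)) for a sizing factor S."""
    return t_p0 * (1.0 + c_ext / (s * c_iref))

T_P0 = 20e-12     # seconds, hypothetical intrinsic delay
C_IREF = 3e-15    # farads, hypothetical intrinsic capacitance of the reference gate
C_EXT = 3e-15     # farads, hypothetical fixed external load

delays = {s: sized_delay(s, T_P0, C_EXT, C_IREF) for s in (1, 2, 5, 10, 15)}
for s, tp in delays.items():
    print(f"S = {s:2d}: tp = {tp * 1e12:.1f} ps")
```

The delay approaches tp0 as S grows, so most of the gain comes at small S while the area cost keeps rising.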
Sizing Impacts on Delay
[Figure: tp (sec, ×10-11, 2 to 3.8) vs. sizing factor S (1 to 15) for a fixed load; the curve flattens where the self-loading effect sets in (intrinsic capacitance dominates)]
The majority of the improvement is already obtained for S = 5. Sizing factors larger than 10 barely yield any extra gain (and cost significantly more area).
CMOS Circuit Styles
• Static complementary CMOS - except during switching, output
connected to either VDD or GND via a low-resistance path
– high noise margins
• full rail to rail swing
• VOH and VOL are at VDD and GND, respectively
– low output impedance, high input impedance
– no steady state path between VDD and GND (no static
power consumption)
– delay a function of load capacitance and transistor
resistance
– comparable rise and fall times (under the appropriate
transistor sizing conditions)
• Dynamic CMOS - relies on temporary storage of signal values
on the capacitance of high-impedance circuit nodes
– simpler, faster gates
– increased sensitivity to noise
Static Complementary CMOS
• Pull-up network (PUN) and pull-down network (PDN)
[Figure: inputs In1, In2, … InN drive both a PUN connected between VDD and the output F(In1,In2,…InN) and a PDN connected between the output and GND]
• PUN (PMOS transistors only) – pull-up: make a connection from VDD to F when F(In1,In2,…InN) = 1
• PDN (NMOS transistors only) – pull-down: make a connection from F to GND when F(In1,In2,…InN) = 0
PUN and PDN are dual logic networks
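Threshold Drops
[Figure: four single-transistor cases driving a load CL, with the gate tied to a supply rail and source (S) and drain (D) labeled]
• PUN, source at VDD: output charges 0 → VDD
• PUN, drain at VDD: output charges 0 → VDD - VTn
• PDN, source at ground: output discharges VDD → 0
• PDN, drain at ground: output discharges VDD → |VTp|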
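Construction of PDN
• NMOS devices in series implement a NAND function
[Figure: series NMOS devices with inputs A and B, conducting when A • B]
• NMOS devices in parallel implement a NOR function
[Figure: parallel NMOS devices with inputs A and B, conducting when A + B]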
Dual PUN and PDN
• PUN and PDN are dual networks
– a parallel connection of transistors in the PUN
corresponds to a series connection of the PDN
• Complementary gate is naturally inverting
(NAND, NOR, AOI, OAI)
• Number of transistors for an N-input logic
gate is 2N
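CMOS NAND
A B F
0 0 1
0 1 1
1 0 1
1 1 0
[Figure: two-input CMOS NAND gate with inputs A and B; F = !(A • B)]
CMOS NOR
A B F
0 0 1
0 1 0
1 0 0
1 1 0
[Figure: two-input CMOS NOR gate with inputs A and B; F = !(A + B)]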
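Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: complex gate with inputs A, B, C and D. PDN: D in parallel with the series connection of A and (B in parallel with C). PUN: the dual network, D in series with (A in parallel with the series connection of B and C)]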
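A brute-force check that the pull-down and pull-up networks of this gate are duals, so that exactly one of them conducts for every input combination. This is a sketch: the function names are assumptions, and the network structure is read off the expression OUT = !(D + A • (B + C)).

```python
from itertools import product

def pdn_conducts(a, b, c, d):
    """NMOS pull-down: D in parallel with (A in series with (B parallel C))."""
    return d or (a and (b or c))

def pun_conducts(a, b, c, d):
    """PMOS pull-up (dual network): each PMOS conducts when its input is low."""
    na, nb, nc, nd = (not a), (not b), (not c), (not d)
    return nd and (na or (nb and nc))

for a, b, c, d in product((False, True), repeat=4):
    down = pdn_conducts(a, b, c, d)
    up = pun_conducts(a, b, c, d)
    assert up != down                               # never both on, never both off
    assert up == (not (d or (a and (b or c))))      # output high exactly when the PUN conducts

print("PUN and PDN are complementary for all 16 input combinations")
```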
Standard Cell Layout Methodology
[Figure: standard cell layout with VDD and GND rails, signals between them, and a routing channel]
What logic function is this?
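VTC is Data-Dependent
[Figure: two-input NAND, F = !(A • B), with PMOS M3 (input A) and M4 (input B) and stacked NMOS M2 (input A) above M1 (input B); the internal node between M2 and M1 has capacitance Cint, VGS1 = VB and VGS2 = VA – VDS1. Devices: 0.5µm/0.25µm NMOS, 0.75µm/0.25µm PMOS]
[Figure: VTCs (0 to 3 V) for three input cases: A,B: 0 -> 1; B=1, A: 0 -> 1; A=1, B: 0 -> 1; annotated "weaker PUN"]
• The threshold voltage of M2 is higher than M1 due to the body effect (γ)
VTn1 = VTn0
$$V_{Tn2} = V_{Tn0} + \gamma\left(\sqrt{|2\phi_F| + V_{int}} - \sqrt{|2\phi_F|}\right)$$
since VSB of M2 is not zero (when VB = 0) due to the presence of Cint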
Review: CMOS Inverter: Dynamic
[Figure: pull-down switch model with Vin = VDD; CL discharges to ground through Rn]
tpHL = f(Rn, CL)
tpHL = 0.69 Reqn CL
tpHL = 0.69 (3/4 (CL VDD)/ IDSATn )
≈ 0.52 CL / (W/Ln k’n VDSATn )
Designing Inverters for Performance
• Reduce CL
– internal diffusion capacitance of the gate itself
– interconnect capacitance
– fanout
• Increase W/L ratio of the transistor
– the most powerful and effective performance
optimization tool in the hands of the designer
– watch out for self-loading!
• Increase VDD
– only minimal improvement in performance at the cost
of increased energy dissipation
• Slope engineering - keeping signal rise and fall times
smaller than or equal to the gate propagation delays and
of approximately equal values
– good for performance
– good for power consumption
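Switch Delay Model
[Figure: switch-level RC models of the NAND, NOR and INVERTER; each transistor is replaced by a switch in series with Rp or Rn, with load capacitance CL at the output and internal node capacitance Cint in the series stacks, reducing to an equivalent resistance Req]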
Input Pattern Effects on Delay
[Figure: switch model of the two-input NAND with parallel pull-up resistors Rp (inputs A, B), series pull-down resistors Rn (inputs A, B), load CL and internal capacitance Cint]
• Delay is dependent on the pattern of inputs
• Low to high transition
– both inputs go low
• delay is 0.69 (Rp/2) CL since two p-resistors are on in parallel
– one input goes low
• delay is 0.69 Rp CL
• High to low transition
– both inputs go high
• delay is 0.69 (2Rn) CL
• Adding transistors in series (without sizing) slows down the circuit
Delay Dependence on Input Patterns
2-input NAND with
NMOS = 0.5µm/0.25µm
PMOS = 0.75µm/0.25µm
CL = 10 fF
[Figure: output voltage (V, -0.5 to 3) vs. time (psec, 0 to 400) for the cases A=B=1→0; A=1→0, B=1; and A=1, B=1→0]

Input Data Pattern    Delay (psec)
A=B=0→1               69
A=1, B=0→1            62
A=0→1, B=1            50
A=B=1→0               35
A=1, B=1→0            76
A=1→0, B=1            57
Transistor Sizing
[Figure: switch models of the two-input NAND and NOR gates (resistors Rp and Rn, capacitances CL and Cint) annotated with relative transistor sizes of 1 and 2]
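Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: the complex gate annotated with device sizes. Pull-up devices (two values listed per device): A = 2, 6; B = 4, 12; C = 4, 12; D = 2, 6. Pull-down devices: A = 2, B = 2, C = 2, D = 1]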
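Fan-In Considerations
[Figure: four-input gate with inputs A, B, C, D; series NMOS stack A, B, C, D from the output down to ground, with CL at the output and internal node capacitances C3, C2, C1 between the stacked devices]
Distributed RC model (Elmore delay)
tpHL = 0.69 Reqn(C1+2C2+3C3+4CL)
Propagation delay deteriorates rapidly as a function of fan-in – quadratically in the worst case.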
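The Elmore expression above generalizes to a stack of any depth: the k-th node above ground discharges through k series resistors. A minimal sketch, assuming every series device has the same resistance Reqn; the function names and the normalized values are hypothetical.

```python
def elmore_tphl(r_eqn, node_caps):
    """tpHL = 0.69 * Reqn * (C1 + 2*C2 + ... + N*CL).

    node_caps lists the capacitances from the node nearest ground (C1)
    up to the output node (CL).
    """
    return 0.69 * r_eqn * sum(k * c for k, c in enumerate(node_caps, start=1))

def stack_delay(fan_in, r_eqn=1.0, c_node=1.0):
    """Normalized delay of a stack with equal capacitance on every node."""
    return elmore_tphl(r_eqn, [c_node] * fan_in)

for n in (2, 4, 8):
    print(f"fan-in {n}: normalized tpHL = {stack_delay(n):.2f}")
```

With equal node capacitances the sum is N(N+1)/2, which is where the quadratic growth with fan-in comes from.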
tp as a Function of Fan-In
[Figure: tp (psec, 0 to 1250) vs. fan-in (2 to 16); tpHL is a quadratic function of fan-in, tpLH is a linear function of fan-in, and tp lies between them]
• Gates with a fan-in greater than 4 should be avoided.


Fast Complex Gates: Design Technique
• Transistor sizing
– as long as fan-out capacitance dominates
• Progressive sizing
[Figure: distributed RC line formed by a series NMOS stack, from MN (input InN) at the output node CL down through M3 (In3, C3), M2 (In2, C2) and M1 (In1, C1)]
M1 > M2 > M3 > … > MN (the FET closest to the output should be the smallest)
Can reduce delay by more than 20%; decreasing gains as technology shrinks
Fast Complex Gates: Design Technique
• Input re-ordering
– when not all inputs arrive at the same time
[Figure, left: the critical-path input (0→1) is In1 on M1, with In3 = 1 on M3 and In2 = 1 on M2; CL, C2 and C1 are all charged, so the delay is determined by the time to discharge CL, C1 and C2]
[Figure, right: the critical-path input (0→1) is In1 on M3, next to the output, with In2 = 1 on M2 and In3 = 1 on M1; CL is charged while C2 and C1 are already discharged, so the delay is determined by the time to discharge CL]
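Sizing and Ordering Effects
[Figure: four-input gate with pull-up devices A, B, C, D of size 3 and a series pull-down stack A, B, C, D with internal capacitances C3, C2, C1 and CL = 100 fF; the pull-down devices are sized 4 each, or progressively 4, 5, 6, 7 from A to D]
Progressive sizing in pull-down chain gives up to a 23% improvement.
Input ordering saves 5%
critical path A – 23%
critical path D – 17%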
Fast Complex Gates: Design Technique
• Alternative logic structures

F = ABCDEFGH
Fast Complex Gates: Design Technique
• Isolating fan-in from fan-out using buffer insertion
[Figure: a gate driving CL directly, and the same gate driving CL through an inserted buffer]
• Real lesson is that optimizing the propagation delay of a gate in isolation is misguided.
Lowering Dynamic Power
$$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$$
• Capacitance: function of fan-out, wire length, transistor sizes
• Supply voltage: has been dropping with successive generations
• Activity factor: how often, on average, do wires switch?
• Clock frequency: increasing…
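Because VDD enters the dynamic power expression quadratically, supply scaling is the strongest of the four levers. The sketch below evaluates the expression for two supply voltages; the function name and all numeric values are hypothetical.

```python
def dynamic_power(c_load, v_dd, p_01, f_clock):
    """Pdyn = CL * VDD^2 * P(0->1) * f, in watts."""
    return c_load * v_dd ** 2 * p_01 * f_clock

# Hypothetical operating point, for illustration only.
base = dynamic_power(c_load=10e-15, v_dd=2.5, p_01=0.25, f_clock=500e6)
low_v = dynamic_power(c_load=10e-15, v_dd=1.25, p_01=0.25, f_clock=500e6)

print(f"Pdyn at 2.5 V : {base * 1e6:.3f} uW")
print(f"Pdyn at 1.25 V: {low_v * 1e6:.3f} uW")
```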
Short Circuit Power Consumption
[Figure: inverter with input Vin, output Vout and load CL; the short-circuit current Isc flows from VDD to GND]
Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting.
Leakage (Static) Power Consumption
[Figure: inverter with output Vout drawing Ileakage from VDD; the components are drain junction leakage, gate leakage and sub-threshold current]
Sub-threshold current is the dominant factor.
All increase exponentially with temperature!


Leakage as a Function of VT
• Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
[Figure: ID (A, log scale from 10-12 to 10-2) vs. VGS (V, 0 to 1) for VT = 0.4V and VT = 0.1V]
• A 90mV/decade VT roll-off, so each increase in VT of about 270mV (3 × 90mV) gives 3 orders of magnitude reduction in leakage (but adversely affects performance)
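The relation between a threshold shift and the leakage reduction follows directly from the subthreshold slope: every slope-worth of extra VT buys one decade. A small sketch, assuming the 90 mV/decade figure quoted above; the function name is hypothetical.

```python
def leakage_reduction_factor(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt_mv."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)

for delta in (90, 180, 255, 270, 300):
    factor = leakage_reduction_factor(delta)
    print(f"delta VT = {delta:3d} mV -> leakage reduced by about {factor:,.0f}x")
```

At 90 mV/decade a full three decades takes 270 mV; a 255 mV shift gives somewhat less than a factor of 1000.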
TSMC Processes Leakage and VT

                         CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
Vdd                      1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
Tox (effective)          42 Å      42 Å      42 Å       42 Å      29 Å      24 Å
Lgate                    0.16 µm   0.16 µm   0.18 µm    0.13 µm   0.11 µm   0.08 µm
IDSat (n/p) (µA/µm)      600/260   500/180   320/130    780/360   860/370   920/400
Ioff (leakage) (pA/µm)   20        1.60      0.15       300       1,800     13,000
VTn                      0.42 V    0.63 V    0.73 V     0.40 V    0.29 V    0.25 V
FET Perf. (GHz)          30        22        14         43        52        80

From MPR, 2000


Exponential Increase in Leakage Currents
[Figure: Ileakage (nA/µm, log scale from 1 to 10000) vs. Temp (°C, 30 to 110), with curves labeled 0.25, 0.18, 0.13 and 0.1]
From De,1999
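Review: Energy & Power Equations
Energy per clock cycle:
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}/f_{clock}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$
• First term: dynamic power (~90% today and decreasing relatively)
• Second term: short-circuit power (~8% today and decreasing absolutely)
• Third term: leakage power (~2% today and increasing)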
Power and Energy Design Space

           Constant Throughput/Latency                   Variable Throughput/Latency
Energy     Design Time             Non-active Modules    Run Time
Active     Logic Design,           Clock Gating          DFS, DVS (Dynamic Freq,
           Reduced Vdd, Sizing,                          Voltage Scaling)
           Multi-Vdd
Leakage    + Multi-VT              Sleep Transistors,    + Variable VT
                                   Multi-Vdd,
                                   Variable VT
Dynamic Power Consumption is Data Dependent
• Switching activity, P0→1, has two components
– A static component – function of the logic topology
– A dynamic component – function of the timing behavior (glitching)
Static transition probability
P0→1 = Pout=0 x Pout=1 = P0 x (1-P0)
2-input NOR Gate
A B Out
0 0 1
0 1 0
1 0 0
1 1 0
With input signal probabilities PA=1 = 1/2 and PB=1 = 1/2
NOR static transition probability = 3/4 x 1/4 = 3/16
NOR Gate Transition Probabilities
• Switching activity is a strong function of the input signal statistics
– PA and PB are the probabilities that inputs A and B are one
[Figure: two-input NOR gate with inputs A and B driving CL, and a plot of P0→1 over PA and PB, each from 0 to 1]
P0→1 = P0 x P1 = (1-(1-PA)(1-PB)) (1-PA)(1-PB)

Transition Probabilities for Some Basic Gates

        P0→1 = Pout=0 x Pout=1
NOR     (1 - (1 - PA)(1 - PB)) x (1 - PA)(1 - PB)
OR      (1 - PA)(1 - PB) x (1 - (1 - PA)(1 - PB))
NAND    PAPB x (1 - PAPB)
AND     (1 - PAPB) x PAPB
XOR     (1 - (PA + PB- 2PAPB)) x (PA + PB- 2PAPB)

[Figure: input A (probability 0.5) drives a gate with output X; X and input B (probability 0.5) drive a second gate with output Z]
For X: P0→1 = P0 x P1 = (1-PA) PA = 0.5 x 0.5 = 0.25
For Z: P0→1 = P0 x P1 = (1-PXPB) PXPB = (1 – (0.5 x 0.5)) x (0.5 x 0.5) = 3/16
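The closed-form entries in the table can be cross-checked by enumerating each gate's truth table. This is a sketch that assumes statistically independent inputs, as the static analysis does; the function and dictionary names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def transition_probability(gate, input_probs):
    """P(0->1) = P(out=0) * P(out=1), assuming independent inputs."""
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p   # probability of this input pattern
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one

GATES = {
    "NOR": lambda a, b: not (a or b),
    "OR": lambda a, b: bool(a or b),
    "NAND": lambda a, b: not (a and b),
    "AND": lambda a, b: bool(a and b),
    "XOR": lambda a, b: a != b,
}

half = Fraction(1, 2)
activities = {name: transition_probability(g, [half, half]) for name, g in GATES.items()}
for name, p01 in activities.items():
    print(f"{name:4s} P(0->1) = {p01}")
```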
Logic Restructuring
• Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P0→1 = P0 x P1 = (1 - PAPB) x PAPB, e.g. (1-0.25)*0.25 = 3/16
[Figure: chain implementation with inputs A, B, C, D (each 0.5): W = A•B (3/16), X = W•C (7/64), F = X•D (15/256)]
[Figure: tree implementation with inputs A, B, C, D (each 0.5): Y = A•B (3/16), Z = C•D (3/16), F = Y•Z (15/256)]
Chain implementation has a lower overall switching activity than the tree implementation for random inputs
Ignores glitching effects
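The chain-versus-tree comparison can be reproduced by propagating signal probabilities through the AND gates and summing the per-node activities. A sketch assuming independent inputs with probability 0.5 and ignoring glitching; the helper name is hypothetical.

```python
from fractions import Fraction

def and_activity(p_a, p_b):
    """Return (P(out=1), P(0->1)) for a 2-input AND with independent inputs."""
    p_out = p_a * p_b
    return p_out, (1 - p_out) * p_out

half = Fraction(1, 2)

# Chain: W = A*B, X = W*C, F = X*D
p_w, act_w = and_activity(half, half)
p_x, act_x = and_activity(p_w, half)
_, act_f_chain = and_activity(p_x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = A*B, Z = C*D, F = Y*Z
p_y, act_y = and_activity(half, half)
p_z, act_z = and_activity(half, half)
_, act_f_tree = and_activity(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print(f"chain: {act_w} + {act_x} + {act_f_chain} = {chain_total}")
print(f"tree : {act_y} + {act_z} + {act_f_tree} = {tree_total}")
```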
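Input Ordering
[Figure, left: A (0.5) and B (0.2) feed the first gate with output X, which is combined with C (0.1) to give F; activity at X = (1-0.5x0.2)x(0.5x0.2) = 0.09]
[Figure, right: B (0.2) and C (0.1) feed the first gate with output X, which is combined with A (0.5) to give F; activity at X = (1-0.2x0.1)x(0.2x0.1) = 0.0196]
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)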
NMOS Transistors in Series/Parallel
• Primary inputs drive both gate and source/drain terminals
• NMOS switch closes when the gate input is high
– series (gates A and B between X and Y): X = Y if A and B
– parallel (gates A and B between X and Y): X = Y if A or B
• Remember - NMOS transistors pass a strong 0 but a weak 1
PMOS Transistors in Series/Parallel
• Primary inputs drive both gate and source/drain terminals
• PMOS switch closes when the gate input is low
– series (gates A and B between X and Y): X = Y if !A and !B = !(A + B)
– parallel (gates A and B between X and Y): X = Y if !A or !B = !(A • B)
• Remember - PMOS transistors pass a strong 1 but a weak 0
Pass Transistor (PT) Logic
[Figure: pass-transistor implementations of F = A•B, in which B gates the input A onto F and !B gates a 0 onto F]
• Gate is static – a low-impedance path exists to both supply rails under all circumstances
• N transistors instead of 2N
• No static power consumption
• Ratioless
• Bidirectional (versus unidirectional)

Differential PT Logic (CPL)
[Figure: complementary inputs A, !A, B, !B drive a PT network producing F and an inverse PT network producing !F]
[Figure: CPL gate examples: AND/NAND (F = AB, !F = !(AB)), OR/NOR (F = A+B, !F = !(A+B)) and XOR/XNOR (F = A⊕B, !F = !(A⊕B))]
CPL Properties
• Differential so complementary data inputs and outputs
are always available (so don’t need extra inverters)
• Still static, since the output defining nodes are always
tied to VDD or GND through a low resistance path
• Design is modular; all gates use the same topology,
only the inputs are permuted.
• Simple XOR makes it attractive for structures like
adders
• Fast (assuming number of transistors in series is small)
• Additional routing overhead for complementary signals
• Still have static power dissipation problems
NMOS Only PT Driving an Inverter
[Figure: NMOS pass transistor with In = VDD at the drain (D) and A = VDD on the gate; its source (S) node x reaches Vx = VDD - VTn and drives an inverter built from M2 and M1]
• Vx does not pull up to VDD, but VDD – VTn
• Threshold voltage drop causes static power consumption (M2 may be weakly conducting forming a path from VDD to GND)
• Notice that VTn of the pass transistor increases due to body effect (VSB)
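Voltage Swing of PT Driving an Inverter
[Figure: NMOS pass transistor (gate at VDD, body B) passing In = 0 → VDD from D to S at node x, which drives an inverter with output Out; device sizes 1.5/0.25, 0.5/0.25 and 0.5/0.25. Simulated In, x and Out (Voltage, V, 0 to 3) vs. Time (ns, 0 to 2), with x = 1.8V]
• Body effect – large VSB at x - when pulling high (B is tied to GND and S charged up close to VDD)
• So the voltage drop is even worse
$$V_x = V_{DD} - \left(V_{Tn0} + \gamma\left(\sqrt{|2\phi_f| + V_x} - \sqrt{|2\phi_f|}\right)\right)$$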
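Because Vx appears on both sides of the expression, it has to be solved iteratively. The sketch below uses simple fixed-point iteration. VDD = 2.5 V and VTn0 = 0.4 V follow the example device in the slides, while γ and |2φf| are assumed values for illustration and the function name is hypothetical.

```python
import math

def pass_transistor_high_level(v_dd, v_tn0, gamma, two_phi_f, iterations=50):
    """Solve Vx = VDD - (VTn0 + gamma*(sqrt(|2phi_f| + Vx) - sqrt(|2phi_f|)))."""
    v_x = v_dd - v_tn0  # starting guess: threshold drop without body effect
    for _ in range(iterations):
        v_t = v_tn0 + gamma * (math.sqrt(two_phi_f + v_x) - math.sqrt(two_phi_f))
        v_x = v_dd - v_t
    return v_x

GAMMA = 0.4        # V^0.5, ASSUMED body-effect coefficient
TWO_PHI_F = 0.6    # V, ASSUMED value of |2*phi_f|

v_x = pass_transistor_high_level(v_dd=2.5, v_tn0=0.4, gamma=GAMMA, two_phi_f=TWO_PHI_F)
print(f"Vx settles at about {v_x:.2f} V")
```

With these assumed parameters the result lands close to the x = 1.8V level shown in the simulation.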
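Cascaded NMOS Only PTs
[Figure, left: M1 (gate B = VDD, input A = VDD) produces x = VDD - VTn1, and x drives the gate (G) of M2 (input C = VDD), whose source (S) is node y feeding Out; swing on y = VDD - VTn1 - VTn2]
[Figure, right: M1 and M2 in series (gates B = VDD and C = VDD, input A = VDD) with nodes x and y feeding Out; swing on y = VDD - VTn1]
• Pass transistor gates should never be cascaded as on the left
• Logic on the right suffers from static power dissipation and reduced noise margins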
Solution 1: Level Restorer
[Figure: pass transistor Mn (gate B) passes A to node x, which drives an inverter (M2, M1) with output Out; the level restorer Mr connects x to VDD. For A = 1: x goes 0 → 1, Out = 0 and Mr is on. For A = 0: Out = 1 and Mr is off]
• Full swing on x (due to Level Restorer) so no static power consumption by inverter
• No static backward current path through Level Restorer and PT since Restorer is only active when A is high
• For correct operation Mr must be sized correctly (ratioed)
Solution 2: Multiple VT Transistors
• Technology solution: Use (near) zero VT devices for the
NMOS PTs to eliminate most of the threshold drop (body
effect still in force preventing full swing to VDD)

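[Figure: two low VT pass transistors sharing the output Out; the one with gate A = 2.5V is on and passes In2 = 0V, while the one with gate B = 0V is off but leaking, creating a sneak path from In1 = 2.5V]
• Impacts static power consumption due to subthreshold currents flowing through the PTs (even if VGS is below VT)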
Solution 3: Transmission Gates (TGs)
• Most widely used solution
[Figure: transmission gate between A and B controlled by C and !C, with its symbol; with C = VDD and !C = GND, both A = VDD and A = GND are passed to B]
• Full swing bidirectional switch controlled by the gate signal C, A = B if C = 1
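TG Multiplexer
[Figure: two transmission gates controlled by S and !S select between In1 and In2 to produce F; layout with VDD and GND rails and inputs In1, S, !S, In2]
F = !(In1 • S + In2 • !S)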
Differential TG Logic (DPL)
[Figure: DPL gate examples built from transmission gates with complementary inputs A, !A, B, !B and connections to VDD and GND: AND/NAND (F = AB, !F = !(AB)) and XOR/XNOR (F = A⊕B, !F = !(A⊕B))]
The 1-bit Binary Adder
[Figure: 1-bit full adder (FA) with inputs A, B and Cin and outputs S and Cout]

A B Cin   Cout S   carry status
0 0 0     0    0   kill
0 0 1     0    1   kill
0 1 0     0    1   propagate
0 1 1     1    0   propagate
1 0 0     0    1   propagate
1 0 1     1    0   propagate
1 1 0     1    0   generate
1 1 1     1    1   generate

G = A&B
P = A⊕B
K = !A & !B
S = A ⊕ B ⊕ Cin = P ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function)
     = G | P&Cin
• How can we use it to build a 64-bit adder?
• How can we modify it easily to build an adder/subtractor?
• How can we make it better (faster, lower power, smaller)?
Static CMOS Full Adder Circuit
[Figure: static CMOS full adder. The carry stage produces !Cout from A, B and Cin; the sum stage reuses !Cout together with A, B and Cin to produce !Sum.]

!Cout = !Cin & (!A | !B) | (!A & !B)
!Sum = Cout & (!A | !B | !Cin) | (!A & !B & !Cin)

Cout = Cin & (A | B) | (A & B)
Sum = !Cout & (A | B | Cin) | (A & B & Cin)
CPL Full Adder
[Figure: CPL full adder. Pass-transistor networks driven by A, B, Cin and their complements produce the differential outputs Sum/!Sum and Cout/!Cout.]
TG Full Adder
[Figure: transmission-gate full adder with inputs A, B, Cin and outputs Sum, Cout.]
Mirror Adder
24+4 transistors
[Figure: mirror adder schematic with transistor widths annotated (PMOS 8, 6 and 4; NMOS 4, 3 and 2). In the carry stage the PMOS network provides the kill and 0-propagate paths and the NMOS network provides the generate and 1-propagate paths. Outputs are !Cout and !S.]

Cout = A&B | B&Cin | A&Cin
Sum = A&B&Cin | !Cout&(A | B | Cin)

Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal
fan-out for each is also 2. Since !Cout drives 2 internal and 2 inverter
transistor gates (to form Cin for the next more significant bit adder), the carry
circuit should be oversized. PMOS/NMOS ratio of 2.
Mirror Adder Features
• The NMOS and PMOS chains are completely
symmetrical with a maximum of two series transistors
in the carry circuitry, guaranteeing identical rise and fall
transitions if the NMOS and PMOS devices are properly
sized.
• When laying out the cell, the most critical issue is the
minimization of the capacitances at node !Cout (four
diffusion capacitances, two internal gate capacitances,
and two inverter gate capacitances). Shared diffusions
can reduce the stack node capacitances.
• The transistors connected to Cin are placed closest to
the output.
• Only the transistors in the carry stage have to be
optimized for optimal speed. All transistors in the sum
stage can be minimal size.
A 64-bit Adder/Subtractor
• Ripple Carry Adder (RCA) built out of 64 FAs
• Subtraction – complement all subtrahend bits (xor gates) and
set the low order carry-in
• RCA
  – advantage: simple logic, so small (low cost)
  – disadvantage: slow (O(N) for N bits) and lots of glitching
    (so lots of energy consumption)

[Figure: 64 1-bit FAs in a chain. Bit i takes Ai, Bi and Ci and produces Si and Ci+1, from C0 = Cin to C64 = Cout. The add/subt control drives the xor gates on the B inputs and sets C0.]
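A bit-level sketch of the adder/subtractor described above: the add/subt control XORs every B bit and also feeds the low-order carry-in, so subtraction becomes A + !B + 1. The function name ripple_add_sub and its parameters are illustrative, and this is a behavioural model rather than a circuit description.

```python
def ripple_add_sub(a, b, subtract=False, n=64):
    """N-bit ripple-carry adder/subtractor built from 1-bit full adders.
    Returns (result, carry_out) where result is an n-bit unsigned value."""
    sub = 1 if subtract else 0
    carry = sub  # add/subt control sets the low-order carry-in C0
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub  # xor gate complements B when subtracting
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # majority function
        result |= s << i
    return result, carry

print(ripple_add_sub(5, 3))                      # addition
print(ripple_add_sub(5, 3, subtract=True))       # 5 - 3
print(ripple_add_sub(3, 5, subtract=True, n=8))  # 3 - 5 in 8-bit two's complement
```

The loop makes the O(N) behaviour visible: each bit needs the carry from the bit before it.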
Ripple Carry Adder (RCA)
[Figure: 4-bit ripple-carry adder. Four FAs with inputs A3..A0 and B3..B0 and outputs S3..S0; the carry ripples from C0 = Cin to Cout = C4.]

Tadder ≈ TFA(A,B→Cout) + (N-2)TFA(Cin→Cout) + TFA(Cin→S)

T = O(N) worst case delay

Real Goal: Make the fastest possible carry path
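Inversion Property
• Inverting all inputs to a FA results in inverted values for all
outputs

[Figure: an FA with inputs A, B, Cin and outputs S, Cout is equivalent to an FA with inverted inputs !A, !B, !Cin and inverted outputs !S, !Cout.]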

!S (A, B, Cin) = S(!A, !B, !Cin)

!Cout (A, B, Cin) = Cout (!A, !B, !Cin)
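Since a full adder has only eight input combinations, the two identities above can be confirmed exhaustively. The helper names fa_sum and fa_cout below are illustrative.

```python
from itertools import product

def fa_sum(a, b, cin):
    return a ^ b ^ cin

def fa_cout(a, b, cin):
    return (a & b) | (cin & (a | b))

def inv(x):
    """Logical complement of a single bit."""
    return 1 - x

# Check both identities for all eight input combinations.
inversion_holds = all(
    inv(fa_sum(a, b, c)) == fa_sum(inv(a), inv(b), inv(c))
    and inv(fa_cout(a, b, c)) == fa_cout(inv(a), inv(b), inv(c))
    for a, b, c in product((0, 1), repeat=3)
)
print("inversion property holds:", inversion_holds)
```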


Exploiting the Inversion Property
[Figure: 4-bit ripple-carry adder built from FA’ cells (inputs A3..A0, B3..B0, outputs S3..S0, carry from C0 = Cin to Cout = C4), alternating between inverted cells and regular cells.]

• Minimizes the critical path (the carry chain) by eliminating
inverters between the FAs (will need to increase the transistor
sizing on the carry chain portion of the mirror adder).

• Now need two “flavors” of FAs
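A behavioural sketch of the idea, under the assumption that FA’ is a mirror adder cell without output inverters (so it produces !S and !Cout), that even bit positions receive true operands, and that odd positions receive complemented operands. The function names are illustrative.

```python
def fa_prime(a, b, cin):
    """Assumed FA' cell: a full adder whose outputs are left inverted.
    Returns (!S, !Cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a | b))
    return 1 - s, 1 - cout

def ripple_add_alternating(a, b, cin=0, n=4):
    """Ripple-carry adder with no inverters in the carry chain.
    Returns (sum, carry_out) in true polarity."""
    carry = cin
    carry_is_inverted = False
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if carry_is_inverted:
            # Inverted inputs (including the inverted carry) plus the
            # inverting cell give true-polarity S and Cout.
            s, carry = fa_prime(1 - ai, 1 - bi, carry)
            carry_is_inverted = False
        else:
            # True inputs: the cell emits !S and !Cout. Only the sum
            # output needs an inverter, and it is off the carry path.
            ns, carry = fa_prime(ai, bi, carry)
            s = 1 - ns
            carry_is_inverted = True
        result |= s << i
    cout = 1 - carry if carry_is_inverted else carry
    return result, cout

print(ripple_add_alternating(9, 7))
```

The carry passes from cell to cell without any inverter in between, which is the point of the slide; the extra inverters sit on the operand and sum pins instead.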


Fast Carry Chain Design
• The key to fast addition is a low latency carry network
• What matters is whether in a given position a carry is
– generated Gi = Ai & Bi = AiBi
– propagated Pi = Ai ⊕ Bi (sometimes use Ai | Bi)
– annihilated (killed) Ki = !Ai & !Bi
• Giving a carry recurrence of
Ci+1 = Gi | PiCi

C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0 C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0
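The recurrence and its expanded form for C4 can be compared directly. This is a sketch with illustrative function names; it uses the XOR definition of Pi.

```python
def carry_signals(a, b, c0=0, n=4):
    """Return (g, p, c): per-bit generate and propagate lists, and the
    carries c[0..n] from the recurrence C(i+1) = G(i) | P(i) & C(i)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    c = [c0]
    for i in range(n):
        c.append(g[i] | (p[i] & c[i]))
    return g, p, c

def c4_expanded(g, p, c0):
    """C4 written out in terms of G, P and C0 only (no intermediate carries)."""
    return (g[3]
            | p[3] & g[2]
            | p[3] & p[2] & g[1]
            | p[3] & p[2] & p[1] & g[0]
            | p[3] & p[2] & p[1] & p[0] & c0)

g, p, c = carry_signals(0b1011, 0b0110, c0=1)
print("carries C0..C4:", c)
print("expanded C4:   ", c4_expanded(g, p, 1))
```

The expanded form removes the dependence on intermediate carries at the cost of wider gates, which is the trade-off a fast carry network has to manage.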
Manchester Carry Chain
• Switches controlled by Gi and Pi

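[Figure: one Manchester carry chain stage. A switch controlled by Pi connects !Ci to !Ci+1, a switch controlled by Gi drives !Ci+1, and clk clocks the stage.]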
• Total delay of
– time to form the switch control signals Gi and Pi
– setup time for the switches
– signal propagation delay through N switches in the
worst case
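4-bit Sliced MCC Adder
[Figure: four bit slices with inputs A3..A0 and B3..B0. Each slice forms Gi (AND gate) and Pi from Ai and Bi; the clocked Manchester chain carries !C0 through !C1, !C2 and !C3 to !C4; an XOR in each slice produces the sum bits S3..S0.]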
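Domino Manchester Carry Chain Circuit
[Figure: 4-bit domino Manchester carry chain with transistor sizes annotated. The chain is clocked by clk; pass transistors P3..P0 connect the nodes between Ci,4 and Ci,0, and transistors G3..G0 act on the chain nodes.]

Node values along the chain, from the Ci,0 end to the Ci,4 end:
!(G0 | P0 Ci,0)
!(G1 | P1G0 | P1P0 Ci,0)
!(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)
!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)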
Carry-Skip (Carry-Bypass) Adder
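[Figure: 4-bit carry-skip block. Four FAs with inputs A3..A0 and B3..B0 and outputs S3..S0 ripple the carry from Ci,0; a bypass selected by BP chooses between Ci,0 and the rippled carry to produce Co,3.]

BP = P0 P1 P2 P3 “Block Propagate”

If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0; otherwise the
block itself kills or generates the carry internally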
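A behavioural sketch of one carry-skip block: the sum bits always come from the ripple chain, while the block carry-out is taken from the bypass when BP = 1. The function name carry_skip_block and its return tuple are illustrative.

```python
def carry_skip_block(a, b, cin, width=4):
    """One carry-skip block. Returns (sum_bits, carry_out, bp)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    bp = int(all(p))  # block propagate: P0 & P1 & P2 & P3

    carry = cin
    s = 0
    for i in range(width):
        s |= (p[i] ^ carry) << i
        carry = g[i] | (p[i] & carry)  # ripple path inside the block

    # Bypass: when every bit propagates, the carry-out equals the carry-in,
    # so it can be forwarded without waiting for the ripple path.
    cout = cin if bp else carry
    return s, cout, bp

print(carry_skip_block(0b1010, 0b0101, 1))  # all bits propagate, carry skips
print(carry_skip_block(0b1100, 0b0100, 0))  # carry generated inside the block
```

Functionally the bypass never changes the result; it only shortens the path the carry has to travel when the whole block propagates.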
Carry-Skip Chain Implementation
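[Figure: carry-skip chain implementation. A chain with propagate switches P3..P0 and generate transistors G3..G0 runs from Cin (block carry-in) to !Cout (carry-out); a bypass controlled by BP links the block carry-in to the block carry-out.]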
4-bit Block Carry-Skip Adder
[Figure: 16-bit adder built from four 4-bit carry-skip blocks (bits 0 to 3, 4 to 7, 8 to 11 and 12 to 15). Each block has Setup, Carry Propagation and Sum stages; Ci,0 enters the block for bits 0 to 3.]

Worst-case delay → carry from bit 0 to bit 15 = carry generated in bit
0, ripples through bits 1, 2, and 3, skips the middle two groups (B is
the group size in bits), ripples in the last group from bit 12 to bit 15

Tadd = tsetup + B tcarry + ((N/B) - 1) tskip + B tcarry + tsum
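The delay expression can be evaluated for different block sizes. The sketch below implements the formula exactly as written on the slide; the unit delays used in the example (every component delay equal to 1) are placeholders, not measured values.

```python
def carry_skip_delay(n, b, t_setup, t_carry, t_skip, t_sum):
    """Tadd = tsetup + B*tcarry + ((N/B) - 1)*tskip + B*tcarry + tsum,
    for an N-bit adder split into blocks of B bits."""
    if n % b != 0:
        raise ValueError("n must be a multiple of the block size b")
    return t_setup + b * t_carry + (n // b - 1) * t_skip + b * t_carry + t_sum

# Placeholder unit delays, only to show how the terms trade off.
for block_size in (2, 4, 8):
    t = carry_skip_delay(16, block_size, t_setup=1, t_carry=1, t_skip=1, t_sum=1)
    print(f"N=16, B={block_size}: Tadd = {t}")
```

Small blocks add skip stages and large blocks lengthen the ripple inside the first and last groups, so the delay has a minimum at an intermediate block size.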
