Roy Paily
roypaily@iitg.ernet.in
Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati, Guwahati- 781039, Assam, India
Plan of Talk
Introduction to digital integrated circuits
CMOS devices and scaling. CMOS logic gates and
their layout. Propagation delay, noise margins,
and power dissipation. Combinational (e.g.,
arithmetic) design
Course goals
Ability to design and implement CMOS digital
circuits and optimize them with respect to
different constraints: size (cost), speed and power
dissipation
Transistor Revolution
• Transistor – Bardeen (Bell Labs) in 1947
• Bipolar transistor – Shockley in 1949
• First bipolar digital logic gate – Harris
in 1956
• First monolithic IC – Jack Kilby in 1959
• First commercial IC logic gates –
Fairchild 1960
• TTL – 1962 into the 1990’s
• ECL – 1974 into the 1980’s
MOSFET Technology
• MOSFET transistor - Lilienfeld (Canada) in 1925
and Heil (England) in 1935
• CMOS – 1960’s, but plagued with manufacturing
problems
• PMOS in 1960’s (calculators)
• NMOS in 1970’s (4004, 8080) – for speed
• CMOS in 1980’s – preferred MOSFET technology
because of power benefits
• BiCMOS, Gallium-Arsenide, Silicon-Germanium
• SOI, Copper-Low K, …
Transistors on lead microprocessors
double every 2 years
2X growth in 1.96 years!
[Figure: Transistors (MT) versus Year, 1970 to 2010, log scale from 0.001 to 1000, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, and P6.]
Evolution in DRAM Chip Capacity
[Figure: Kbit capacity/chip versus Year, 1980 to 2010, log scale, rising in roughly 4X steps from 64 through 256, 1,000, 4,000, 16,000, 64,000, 256,000, 1,000,000, 4,000,000, and 16,000,000 to 64,000,000.]
Die size grows by 14% to satisfy
Moore’s Law
[Figure: Die size (mm) versus Year, 1970 to 2010, log scale from 1 to 100, up to the P6.]
Lead microprocessors frequency
doubles every 2 years
[Figure: Frequency (MHz) versus Year, 1970 to 2010, log scale from 0.1 to 10000, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, and P6.]
Courtesy, Intel
Lead Microprocessors power
continues to increase
[Figure: Power (Watts) versus Year, 1971 to 2000, log scale from 0.1 to 100, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, and P6.]
[Figure: log-scale plot versus Year, 1970 to 2010, placing the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, and P6 against Hot Plate and Nuclear Reactor reference levels.]
Courtesy, Intel
Technology Directions: SIA Roadmap
http://www.itrs.net/ntrs/publntrs.nsf
Why Scaling?
[Figure: design abstraction levels: MODULE, GATE, CIRCUIT (Vin, Vout), and DEVICE (G, S, D, with n+ regions).]
Fundamental Design Metrics
• Functionality
• Cost
– NRE (fixed) costs - design effort
– RE (variable) costs - cost of parts, assembly, test
• Reliability, robustness
– Noise margins
– Noise immunity
• Performance
– Speed (delay)
– Power consumption; energy
• Time-to-market
Static Gate Behavior
Steady-state parameters of a gate (its static behavior) tell how robust a circuit is with respect to both variations in the manufacturing process and noise disturbances.
Digital circuits perform operations on Boolean variables.
A logical variable is associated with a nominal voltage level for each logic state:
1 ⇔ VOH and 0 ⇔ VOL
For an inverter with input V(x) and output V(y): VOH = !(VOL) and VOL = !(VOH)
The difference between VOH and VOL is the logic or signal swing Vsw.
DC Operation
Voltage Transfer Characteristics (VTC)
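V(y) = f(V(x)), with input V(x) and output V(y)
VOH = f(VIL)
VOL = f(VIH)
Switching threshold VM: the point on the VTC where V(y) = V(x)
VIL and VIH: the input voltages where the VTC has Slope = -1; inputs between VIL and VIH fall in the undefined region
[Figure: VTC, V(y) versus V(x), marking VOH ("1"), VOL ("0"), VIL, VIH, and VM.]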
Noise Margins
For robust circuits, want the “0” and “1” intervals to be as large as possible
Noise Margin High: NMH = VOH - VIH
Noise Margin Low: NML = VIL - VOL
[Figure: gate output levels (VDD, VOH "1", VOL "0", Gnd) beside gate input levels (VDD, VIH, undefined region, VIL, Gnd), with NMH and NML marked between them.]
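The two noise-margin definitions can be checked numerically. The sketch below is only an illustration: the function name and the voltage levels are assumptions chosen for a 2.5 V gate, not values taken from the slides.

```python
def noise_margins(v_oh, v_ol, v_ih, v_il):
    """Return (NMH, NML) in volts from the four static voltage levels."""
    nm_high = v_oh - v_ih  # NMH = VOH - VIH
    nm_low = v_il - v_ol   # NML = VIL - VOL
    return nm_high, nm_low


# Hypothetical levels for a 2.5 V gate (illustrative only).
nmh, nml = noise_margins(v_oh=2.5, v_ol=0.0, v_ih=1.4, v_il=1.0)
print(f"NMH = {nmh:.2f} V, NML = {nml:.2f} V")
```

A gate is more robust when both values are large; a negative margin would mean the output levels of one gate fall inside the undefined input region of the next.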
[Figure: chain of gates with node voltages v0 to v6; V (volts), from -1 to 5, versus t (nsec), 0 to 10, showing v0, v1, and v2.]
Conditions for Regeneration
v1 = f(v0) and v1 = finv(v2)
[Figure: chain of gates with node voltages v0 to v6; two panels plotting f(v) and finv(v) together and tracing v0, v1, v2, v3 from one curve to the other.]
The Ideal Inverter
The ideal gate should have
• infinite gain in the transition region
• a gate threshold located in the middle of the logic swing
• high and low noise margins equal to half the swing
• input and output impedances of infinity and zero, resp.
Ri = ∞, Ro = 0, Fanout = ∞, g = -∞
[Figure: ideal VTC, Vout versus Vin.]
Delay Definitions
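Propagation delay, measured between the 50% points of the input and output waveforms:
tp = (tpHL + tpLH)/2
Signal slopes, measured between the 10% and 90% points of the output waveform: fall time tf and rise time tr
[Figure: input waveform Vin and output waveform Vout versus t, marking tpHL, tpLH, tf, and tr.]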
Modeling Propagation Delay
Model circuit as first-order RC network with time constant τ = RC, driven by a step on vin
Time to reach 50% point is
t = ln(2) τ = 0.69 τ
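The 0.69 factor follows from the step response of a first-order RC network, vout(t) = V (1 - e^(-t/RC)). A minimal sketch, assuming an ideal step input and hypothetical values for R and C:

```python
import math


def time_to_fraction(r_ohm, c_farad, fraction):
    """Time for an RC step response to reach `fraction` of its final value."""
    tau = r_ohm * c_farad
    return -tau * math.log(1.0 - fraction)


# Hypothetical driver resistance and load capacitance (illustrative only).
R, C = 10e3, 10e-15
t50 = time_to_fraction(R, C, 0.5)
print(f"t50 = {t50 * 1e12:.1f} ps = {t50 / (R * C):.2f} * RC")
```

The same helper gives the 10% and 90% crossing times if a rise-time estimate is wanted.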
Switch Model of NMOS Transistor
[Figure: NMOS transistor as a switch: the gate voltage |VGS| controls a path of resistance Ron between source (of carriers) and drain (of carriers); cross-section with n+ regions showing pinch-off at VGS - VT.]
[Figure: ID versus VDS (V), 0 to 2.5, for VGS = 1.0V, 1.5V, and 2.0V, showing the cut-off, linear, and saturation regions.]
NMOS transistor, 0.25um, Ld = 10um, W/L = 1.5, VDD = 2.5V, VT = 0.4V
Short Channel Effects
Behavior of short channel device mainly due to
Velocity saturation – the velocity of the carriers saturates due to scattering (collisions suffered by the carriers)
[Figure: carrier velocity versus electric field ξ (V/µm), 0 to 3, saturating above the critical field ξc = 1.5.]
IDSAT has a linear dependence with respect to VGS, so a reduced amount of current is delivered for a given control voltage
Short Channel I-V Plot (PMOS)
All polarities of all voltages and currents are reversed
[Figure: PMOS I-V curves (×10^-4 scale, 0 to -1) versus VDS (V), -2 to 0, for VGS = -2.0V and VGS = -2.5V.]
[Figure: CMOS inverter between VDD and ground, with input Vin and output Vout loaded by CL.]
CMOS Inverter:
Steady State Response
Vin = 0: the output connects to VDD through Rp, so Vout = 1 and VOH = VDD
Vin = VDD: the output connects to ground through Rn, so Vout = 0 and VOL = 0
VM = f(Rn, Rp)
CMOS Properties
• Full rail-to-rail swing, so high noise margins
– Logic levels not dependent upon the relative device sizes, so transistors can be minimum size (ratioless)
• Always a path to Vdd or GND in steady state, so low output impedance (output resistance in kΩ range) and large fan-out (albeit with degraded performance)
• Extremely high input resistance (gate of MOS transistor is near perfect insulator), so nearly zero steady-state input current
• No direct steady-state path between power and ground, so no static power dissipation
• Propagation delay is a function of load capacitance and resistance of transistors
Short Channel I-V Plot (NMOS)
[Figure: NMOS I-V curves (×10^-4 scale, 0 to 2.5) versus VDS (V), 0 to 2.5, for VGS = 1.5V, 2.0V, and 2.5V.]
NMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = 0.4V
PMOS curves transformed onto the NMOS axes:
IDn = -IDp (mirror around x-axis)
Vin = VDD + VGSp
Vout = VDD + VDSp (horiz. shift over VDD)
[Figure: PMOS curves for Vin = 0 and Vin = 1.5, that is, VGSp = -2.5 and VGSp = -1.]
CMOS Inverter Load Lines
[Figure: PMOS and NMOS load lines on a ×10^-4 scale, with curves labelled from Vin = 0V to Vin = 2.5V and operating regions such as PMOS sat marked.]
Impact of Process Variation on VTC Curve
[Figure: Vout (V) versus Vin (V), both 0 to 2.5, for three cases: Good PMOS with Bad NMOS, Nominal, and Bad PMOS with Good NMOS.]
[Figure: switch model of the inverter. With Vin = 0, CL charges to VDD through Rp (tpLH); with Vin = VDD, CL discharges through Rn (tpHL).]
β = (W/Lp)/(W/Ln)
β of 2.4 (= 31 kΩ/13 kΩ) gives symmetrical response
β of 1.6 to 1.9 gives optimal performance
[Figure: tpLH, tpHL, and tp versus β, for β from 1 to 5.]
Device Sizing for Performance
• Divide capacitive load, CL, into
– Cint : intrinsic - diffusion and Miller effect
– Cext : extrinsic - wiring and fanout
tp = 0.69 Req Cint (1 + Cext/Cint) = tp0 (1 + Cext/Cint)
– where tp0 = 0.69 Req Cint is the intrinsic (unloaded) delay of
the gate
– tp0 is independent of the sizing of the gate; with no load, the increased drive of the gate is totally offset by the increased capacitance
– any sizing factor S sufficiently larger than (Cext/Cint) yields the best performance gains with least area impact
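The effect of the sizing factor can be sketched numerically. The model below assumes, beyond what the slide states, that sizing a reference gate by S gives Req = Rref/S and Cint = S·Ciref while Cext stays fixed; the names and component values are hypothetical.

```python
def sized_gate_delay(s, r_ref, c_iref, c_ext):
    """Delay of a reference gate scaled by sizing factor s.

    Assumption: Req = r_ref / s and Cint = s * c_iref, with c_ext fixed.
    """
    t_p0 = 0.69 * r_ref * c_iref  # intrinsic delay, independent of s
    return t_p0 * (1.0 + c_ext / (s * c_iref))


# Hypothetical reference gate and external load (illustrative only).
R_REF, C_IREF, C_EXT = 10e3, 3e-15, 60e-15
delays = {s: sized_gate_delay(s, R_REF, C_IREF, C_EXT) for s in (1, 5, 10, 15)}
for s, t in delays.items():
    print(f"S = {s:2d}: tp = {t * 1e12:.1f} ps")
```

Under these assumptions the delay approaches tp0 as S grows, so each further increase in S buys less speed for the same added area.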
Sizing Impacts on Delay
The majority of the improvement is already obtained for S = 5. Sizing factors larger than 10 barely yield any extra gain (and cost significantly more area).
[Figure: delay (×10^-11 scale, 2 to 3.8) versus S, 1 to 15, for a fixed load; the curve flattens at large S because of the self-loading effect (intrinsic capacitance dominates).]
CMOS Circuit Styles
• Static complementary CMOS - except during switching, output
connected to either VDD or GND via a low-resistance path
– high noise margins
• full rail to rail swing
• VOH and VOL are at VDD and GND, respectively
– low output impedance, high input impedance
– no steady state path between VDD and GND (no static
power consumption)
– delay a function of load capacitance and transistor
resistance
– comparable rise and fall times (under the appropriate
transistor sizing conditions)
• Dynamic CMOS - relies on temporary storage of signal values
on the capacitance of high-impedance circuit nodes
– simpler, faster gates
– increased sensitivity to noise
Static Complementary CMOS
Pull-up network (PUN) and pull-down network (PDN)
PUN (PMOS transistors only): pull-up, makes a connection from VDD to F when F(In1,In2,…InN) = 1
PDN (NMOS transistors only): pull-down, makes a connection from F to GND when F(In1,In2,…InN) = 0
[Figure: inputs In1, In2, …, InN drive both the PUN, between VDD and the output F(In1,In2,…InN), and the PDN, between the output and GND.]
Construction of PDN
• NMOS devices in series implement a NAND function: the PDN conducts when A • B
• NMOS devices in parallel implement a NOR function: the PDN conducts when A + B
Dual PUN and PDN
• PUN and PDN are dual networks
– a parallel connection of transistors in the PUN
corresponds to a series connection of the PDN
• Complementary gate is naturally inverting
(NAND, NOR, AOI, OAI)
• Number of transistors for an N-input logic
gate is 2N
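Duality can be checked by brute force: for every input pattern exactly one of the two networks should conduct. The sketch below uses the complex gate from this deck, OUT = !(D + A • (B + C)), and models series connections as AND and parallel connections as OR; the function names are illustrative.

```python
from itertools import product


def pdn_conducts(a, b, c, d):
    """NMOS pull-down: D in parallel with (A in series with (B parallel C))."""
    return d or (a and (b or c))


def pun_conducts(a, b, c, d):
    """PMOS pull-up, the dual network: series and parallel are swapped,
    and each PMOS device conducts when its input is low."""
    na, nb, nc, nd = (not a), (not b), (not c), (not d)
    return nd and (na or (nb and nc))


checks = []
for a, b, c, d in product([False, True], repeat=4):
    # Exactly one network conducts, so the output is always driven
    # and there is never a VDD-to-GND path.
    checks.append(pdn_conducts(a, b, c, d) != pun_conducts(a, b, c, d))

print(all(checks))
```

With N = 4 inputs each network has four transistors, which matches the 2N count for a complementary gate.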
CMOS NAND
A B F
0 0 1
0 1 1
1 0 1
1 1 0
[Figure: CMOS NAND, F = !(A • B): PMOS A and B in parallel in the PUN, NMOS A and B in series in the PDN.]
CMOS NOR
A B F
0 0 1
0 1 0
1 0 0
1 1 0
[Figure: CMOS NOR, F = !(A + B): PMOS A and B in series in the PUN, NMOS A and B in parallel in the PDN.]
Complex CMOS Gate
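OUT = !(D + A • (B + C))
[Figure: PDN with NMOS D in parallel with the series connection of A and (B parallel to C); PUN with PMOS D in series with the parallel connection of A and (B in series with C).]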
Standard Cell Layout Methodology
[Figure: standard cell rows with VDD and GND rails, and signals routed in the routing channel.]
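[Figure: 2-input NAND, F = !(A • B), with PMOS M3 (input A) and M4 (input B) in parallel and NMOS M2 (input A, VGS2 = VA – VDS1) in series with M1 (input B, VGS1 = VB), internal capacitance Cint; Vout curves on 0 to 2 axes for three cases: A,B: 0 -> 1; B=1, A: 0 -> 1; A=1, B: 0 -> 1, with the weaker PUN case marked.]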
tpHL = 0.69 Reqn CL
[Figure: switch-level RC models of the INVERTER, NAND, and NOR, with each transistor replaced by Rp or Rn, load CL, and internal capacitance Cint.]
Input Pattern Effects on Delay
• Delay is dependent on the pattern of inputs
• Low to high transition
– both inputs go low
• delay is 0.69 Rp/2 CL since two p-resistors are on in parallel
– one input goes low
• delay is 0.69 Rp CL
• High to low transition
– both inputs go high
• delay is 0.69 2Rn CL
• Adding transistors in series (without sizing) slows down the circuit
[Figure: switch model of the 2-input NAND: Rp for A and Rp for B in parallel, Rn for A and Rn for B in series, load CL, and internal capacitance Cint.]
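The three first-order cases for the 2-input NAND can be tabulated with a few lines of code. This is a sketch only: it ignores the internal capacitance Cint, and the resistance and load values are borrowed for illustration rather than fitted to any device.

```python
def nand2_delays(r_p, r_n, c_load):
    """First-order delays of a 2-input NAND, ignoring Cint."""
    return {
        "LH, both inputs go low": 0.69 * (r_p / 2) * c_load,  # two PMOS in parallel
        "LH, one input goes low": 0.69 * r_p * c_load,        # single PMOS
        "HL, both inputs go high": 0.69 * (2 * r_n) * c_load, # two NMOS in series
    }


# Illustrative values: Rp = 31 kOhm, Rn = 13 kOhm, CL = 10 fF.
nand_delays = nand2_delays(r_p=31e3, r_n=13e3, c_load=10e-15)
for case, t in nand_delays.items():
    print(f"{case}: {t * 1e12:.0f} ps")
```

The series pull-down path doubles the discharge resistance, which is why sizing the series devices wider is the usual remedy.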
Delay Dependence on Input Patterns
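2-input NAND with
NMOS = 0.5 µm/0.25 µm
PMOS = 0.75 µm/0.25 µm
CL = 10 fF
Input data pattern    Delay (psec)
A = 1, B = 0→1        62
A = 0→1, B = 1        50
A = B = 1→0           35
A = 1, B = 1→0        76
A = 1→0, B = 1        57
[Figure: output voltage (-0.5 to 3) versus time, psec (0 to 400), with curves labelled A = B = 1→0 and A = 1, B = 1→0.]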
Transistor Sizing
[Figure: switch models of the NAND and NOR gates (Rp, Rn, CL, Cint) with inputs A and B, annotated with relative transistor sizes 1 and 2.]
Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
PUN (PMOS) sizes, two values annotated per device: A 2 (6), B 4 (12), C 4 (12), D 2 (6)
PDN (NMOS) sizes: A 2, B 2, C 2, D 1
Fan-In Considerations
[Figure: 4-input gate with inputs A, B, C, D; series pull-down chain with internal node capacitances C1, C2, C3 and load CL.]
Distributed RC model (Elmore delay)
tpHL = 0.69 Reqn(C1+2C2+3C3+4CL)
[Figure: tp (psec), 0 to 1250, versus fan-in, 2 to 16: tpHL is a quadratic function of fan-in, tpLH is a linear function of fan-in.]
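The Elmore expression generalizes to any number of series devices: node i discharges through i resistors of value Reqn. A minimal sketch, assuming equal Reqn for every transistor and hypothetical capacitance values:

```python
def elmore_tphl(r_eqn, node_caps):
    """0.69 * Reqn * (C1 + 2*C2 + ... + N*CL) for N series NMOS devices.

    node_caps lists capacitances from the node nearest GND (C1) up to the
    output load (CL).
    """
    weighted = sum(i * c for i, c in enumerate(node_caps, start=1))
    return 0.69 * r_eqn * weighted


# Hypothetical values (illustrative only).
R_EQN, C_INTERNAL, C_LOAD = 13e3, 1e-15, 10e-15
tphl4 = elmore_tphl(R_EQN, [C_INTERNAL] * 3 + [C_LOAD])
for fan_in in (2, 4, 8, 16):
    caps = [C_INTERNAL] * (fan_in - 1) + [C_LOAD]
    print(f"fan-in {fan_in:2d}: tpHL = {elmore_tphl(R_EQN, caps) * 1e12:.0f} ps")
```

Because both the number of terms and the weights grow with fan-in, the sum grows roughly quadratically, consistent with the tpHL trend described above.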
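• Progressive sizing
Progressive sizing in pull-down chain gives up to a 23% improvement.
[Figure: pull-down chain drawn as a distributed RC line with inputs In1 to In3, transistor M3 next to the charged load CL, and a 0→1 input transition; a second panel with inputs A to D annotated 44, 45, 46, 47, internal capacitances C1, C2, C3, and CL = 100 fF.]
F = ABCDEFGH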
Fast Complex Gates: Design Technique
• Isolating fan-in from fan-out using buffer insertion
[Figure: gate driving the load CL, without and with an inserted buffer.]
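[Figure: drain junction leakage at the output node Vout.]
A 90 mV/decade VT roll-off, so each 255 mV increase in VT gives about 3 orders of magnitude reduction in leakage current
[Figure: ID (A), log scale from 10^-2 downward.]
[Figure: Ileakage (nA/µm), log scale from 1 to 1000, versus Temp (°C), 30 to 110, with curves labelled 0.25, 0.18, 0.13, and 0.1.]
From De, 1999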
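The quoted 90 mV/decade and 255 mV figures can be related with one line of arithmetic. The sketch below simply divides the threshold shift by the slope to get decades of current change; the variable names are illustrative.

```python
SLOPE_MV_PER_DECADE = 90.0   # subthreshold slope quoted on the slide
DELTA_VT_MV = 255.0          # threshold voltage increase quoted on the slide

decades = DELTA_VT_MV / SLOPE_MV_PER_DECADE
reduction_factor = 10.0 ** decades
print(f"{decades:.2f} decades, about {reduction_factor:.0f}x less leakage")
```

The result is a little under three decades, so "3 orders of magnitude" is a rounded figure.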
Review: Energy & Power Equations
E = CL VDD^2 P0→1 + tsc VDD Ipeak P0→1 + VDD Ileakage
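The three terms of the energy equation can be evaluated separately to see which one dominates. This is a sketch under stated assumptions: all component values are hypothetical, and the leakage term VDD·Ileakage (a power) is multiplied by an assumed cycle time so that all three results are energies per cycle.

```python
def energy_terms(c_load, v_dd, p_01, t_sc, i_peak, i_leak, t_cycle):
    """Return (dynamic, short_circuit, leakage) energy per cycle in joules."""
    dynamic = c_load * v_dd ** 2 * p_01
    short_circuit = t_sc * v_dd * i_peak * p_01
    # Assumption: leakage power VDD * Ileakage integrated over one cycle.
    leakage = v_dd * i_leak * t_cycle
    return dynamic, short_circuit, leakage


# Hypothetical operating point (illustrative only).
e_dyn, e_sc, e_leak = energy_terms(
    c_load=10e-15, v_dd=2.5, p_01=0.5,
    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9, t_cycle=10e-9,
)
print(f"dynamic {e_dyn:.3e} J, short-circuit {e_sc:.3e} J, leakage {e_leak:.3e} J")
```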
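[Figure: switch networks between nodes X and Y controlled by A and B: for NMOS switches in parallel, X = Y if A or B; for PMOS switches in parallel, X = Y if !A or !B = !(A • B).]
Ratioless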
[Figure: CPL gate structure: inputs A, !A, B, !B feed a PT network producing F and an inverse PT network producing !F.]
AND/NAND: F = AB
OR/NOR: F = A+B
XOR/XNOR: F = A⊕B
CPL Properties
• Differential so complementary data inputs and outputs
are always available (so don’t need extra inverters)
• Still static, since the output defining nodes are always
tied to VDD or GND through a low resistance path
• Design is modular; all gates use the same topology,
only the inputs are permuted.
• Simple XOR makes it attractive for structures like
adders
• Fast (assuming number of transistors in series is small)
• Additional routing overhead for complementary signals
• Still have static power dissipation problems
NMOS Only PT Driving an Inverter
In = VDD, A = VDD: Vx = VDD - VTn
[Figure: NMOS pass transistor M1 (0.5/0.25), with drain D and source S, passing In to node x, which drives an inverter (M2) with output Out; Voltage, V versus Time, ns, 0 to 2.]
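[Figure: pass-transistor network built from low VT transistors with In1 = 2.5V, In2 = 0V, A = 2.5V, B = 0V: one transistor is on, the other is off but leaking, creating a sneak path to Out.]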
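[Figure: transmission gate between A and B with control levels C = VDD and C = GND, shown for A = VDD and A = GND; transmission gate multiplexer selecting In1 or In2 onto F under control S.]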
Differential TG Logic (DPL)
[Figure: differential TG logic gates built from A, !A, B, !B with VDD and GND connections: AND/NAND (F = AB and its complement) and XOR/XNOR (F = A⊕B and its complement).]
The 1-bit Binary Adder
[Figure: 1-bit Full Adder (FA) with inputs A, B, Cin and outputs S, Cout.]
A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate
G = A&B
P = A⊕B
K = !A & !B
S = A ⊕ B ⊕ Cin = P ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function)
     = G | P&Cin
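The generate, propagate, and kill signals can be checked against the truth table by enumerating all eight input patterns. A minimal sketch with illustrative names:

```python
from itertools import product


def full_adder(a, b, cin):
    """1-bit full adder expressed through G, P, and K."""
    g = a & b                # generate
    p = a ^ b                # propagate
    k = (1 - a) & (1 - b)    # kill
    s = p ^ cin
    cout = g | (p & cin)
    return s, cout, (g, p, k)


rows = []
for a, b, cin in product((0, 1), repeat=3):
    s, cout, (g, p, k) = full_adder(a, b, cin)
    # Cout must equal the majority function and the pair must encode the sum.
    assert cout == (a & b) | (a & cin) | (b & cin)
    assert 2 * cout + s == a + b + cin
    status = "generate" if g else "propagate" if p else "kill"
    rows.append((a, b, cin, cout, s, status))

for row in rows:
    print(*row)
```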
Static CMOS Full Adder Circuit
!Cout = !Cin & (!A | !B) | (!A & !B)
!Sum = Cout & (!A | !B | !Cin) | (!A & !B & !Cin)
Cout = Cin & (A | B) | (A & B)
Sum = !Cout & (A | B | Cin) | (A & B & Cin)
[Figure: static CMOS full adder schematic with a !Cout stage and a !Sum stage built from A, B, and Cin.]
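The four expressions above can be verified exhaustively, including the fact that the complemented forms have the same structure with all inputs inverted. A sketch with illustrative helper names:

```python
from itertools import product


def inv(x):
    """Logical complement of a 0/1 value."""
    return 1 - x


def static_cmos_fa(a, b, cin):
    """Sum and carry using the Cout-reuse form of the static CMOS adder."""
    cout = (cin & (a | b)) | (a & b)
    s = (inv(cout) & (a | b | cin)) | (a & b & cin)
    return s, cout


fa_mismatches = []
for a, b, cin in product((0, 1), repeat=3):
    s, cout = static_cmos_fa(a, b, cin)
    ncout = (inv(cin) & (inv(a) | inv(b))) | (inv(a) & inv(b))
    nsum = (cout & (inv(a) | inv(b) | inv(cin))) | (inv(a) & inv(b) & inv(cin))
    if 2 * cout + s != a + b + cin or ncout != inv(cout) or nsum != inv(s):
        fa_mismatches.append((a, b, cin))

print(fa_mismatches)
```

An empty list means all four equations agree with binary addition for every input pattern.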
CPL Full Adder
[Figure: CPL full adder: pass-transistor networks driven by A, B, Cin and their complements produce Sum, !Sum, Cout, and !Cout.]
TG Full Adder
[Figure: transmission gate full adder with inputs A and Cin shown and outputs Sum and Cout.]
Mirror Adder
24+4 transistors
[Figure: mirror adder schematic with outputs !Cout and !S, inputs A, B, and Cin, PMOS devices annotated with sizes 4, 6, and 8 and NMOS devices with sizes 2, 3, and 4; the carry paths are labelled kill, generate, 0-propagate, and 1-propagate.]
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2 inverter transistor gates (to form Cin for the next more significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
Mirror Adder Features
• The NMOS and PMOS chains are completely
symmetrical with a maximum of two series transistors
in the carry circuitry, guaranteeing identical rise and fall
transitions if the NMOS and PMOS devices are properly
sized.
• When laying out the cell, the most critical issue is the
minimization of the capacitances at node !Cout (four
diffusion capacitances, two internal gate capacitances,
and two inverter gate capacitances). Shared diffusions
can reduce the stack node capacitances.
• The transistors connected to Cin are placed closest to
the output.
• Only the transistors in the carry stage have to be
optimized for speed. All transistors in the sum
stage can be minimal size.
A 64-bit Adder/Subtractor
Ripple Carry Adder (RCA) built out of 64 FAs
Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in
RCA
advantage: simple logic, so small (low cost)
disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: 64 1-bit FAs in a chain; stage i takes Ai, Bi, and Ci and produces Si and Ci+1, with C0 = Cin, C64 = Cout, and an add/subt control.]
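A bit-level model makes the adder/subtractor scheme concrete. The sketch below is a functional illustration, not a circuit description: the function name and the use of Python integers as bit vectors are assumptions.

```python
def ripple_add_sub(a, b, subtract=False, n_bits=64):
    """Ripple-carry add or subtract of two n_bits-wide unsigned values.

    For subtraction, every bit of b is XORed with the add/subt control and
    the same control is fed in as the low-order carry-in (two's complement).
    Returns (result, carry_out).
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                      # C0 = Cin
    result = 0
    for i in range(n_bits):           # carry ripples from bit 0 upward
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl    # xor gate on the subtrahend bit
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry


print(ripple_add_sub(5, 3))
print(ripple_add_sub(5, 3, subtract=True))
```

The loop visits the bits strictly in order, which mirrors the O(N) carry path of the hardware.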
Ripple Carry Adder (RCA)
[Figure: 4-bit ripple carry adder: four FAs with inputs A3 B3 to A0 B0, C0 = Cin, Cout = C4, and sums S3 to S0; a second version alternates inverted cells and regular cells.]
C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0 C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0
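The expanded carry equations can be checked against the recurrence Ci+1 = Gi | Pi Ci that they unroll. A sketch with illustrative function names, testing every 4-bit operand pair and both carry-in values:

```python
from itertools import product


def lookahead_carries(g, p, c0):
    """C1..C4 computed directly from G, P, and C0 (index 0 = LSB)."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def ripple_carries(g, p, c0):
    """Same carries from the bit-by-bit recurrence Ci+1 = Gi | Pi Ci."""
    carries, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


cla_mismatches = 0
for a, b, c0 in product(range(16), range(16), (0, 1)):
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    if lookahead_carries(g, p, c0) != ripple_carries(g, p, c0):
        cla_mismatches += 1

print(cla_mismatches)
```

Each lookahead carry depends only on G, P, and C0, so all four can be formed in parallel at the cost of wider gates.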
Manchester Carry Chain
• Switches controlled by Gi and Pi
[Figure: one chain stage between !Ci and !Ci+1, with switches driven by Gi, Pi, and clk.]
• Total delay of
– time to form the switch control signals Gi and Pi
– setup time for the switches
– signal propagation delay through N switches in the worst case
[Figure: 4-bit sliced MCC adder: inputs A3 B3 to A0 B0, clk, carry chain from !C0 to !C4, and sums S3 to S0.]
Domino Manchester Carry Chain
Circuit
[Figure: domino Manchester carry chain with clk, P3 to P0, G3 to G0, carry-in Ci,0 and carry-out Ci,4, transistors annotated with relative sizes 1 to 6; chain nodes include !(G0 | P0 Ci,0) and !(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0).]
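[Figure: four FAs with carry-in Ci,0, carry-out Co,3, and sums S3 to S0.]
BP = P0 P1 P2 P3 (“Block Propagate”)
[Figure: carry chain from Cin to !Cout built from P3 to P0 and G3 to G0, with a bypass controlled by BP.]
4-bit Block Carry-Skip Adder
[Figure: 16-bit adder split into blocks for bits 12 to 15, bits 8 to 11, bits 4 to 7, and bits 0 to 3.]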
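A functional model shows why the skip path is safe: when BP = 1 every bit in the block propagates, so the rippled carry-out already equals the block carry-in and a bypass can forward it directly. The sketch below assumes a simple multiplexer controlled by BP for the bypass and uses illustrative function names; it models behavior only, not delay.

```python
def block_add(a4, b4, cin):
    """4-bit ripple block returning (sum_bits, carry_out) with a BP bypass."""
    carry, sum_bits, bp = cin, 0, 1
    for i in range(4):
        ai, bi = (a4 >> i) & 1, (b4 >> i) & 1
        p = ai ^ bi
        bp &= p                              # BP = P0 P1 P2 P3
        sum_bits |= (p ^ carry) << i
        carry = (ai & bi) | (p & carry)
    cout = cin if bp else carry              # assumed skip mux controlled by BP
    return sum_bits, cout


def carry_skip_add16(a, b, cin=0):
    """16-bit adder built from four 4-bit blocks, bits 0 to 3 first."""
    total, carry = 0, cin
    for blk in range(4):
        s, carry = block_add((a >> (4 * blk)) & 0xF, (b >> (4 * blk)) & 0xF, carry)
        total |= s << (4 * blk)
    return total, carry


print(carry_skip_add16(0x0FFF, 0x0001))
```

In hardware the benefit is timing: a carry entering a block with BP = 1 does not have to ripple through its four stages.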