Circuit Level

LOW POWER DESIGN
CIRCUIT LEVEL
Static CMOS circuits consume no static power except for leakage
Increasing requirements for speed and functionality push classic static CMOS logic to the limits of acceptable power consumption
Different CMOS logic styles and special circuit design techniques have been proposed to improve power characteristics as well as speed
CIRCUIT LEVEL
When speed and area are taken into account, many factors can influence the efficiency of each of the proposed design techniques
Signal probabilities are the determining factor in whether or not to use dynamic logic in a design
Pseudo nMOS reduces power when used in complex logic functions that switch at high frequency
In this chapter we will discuss
Several logic styles in terms of performance, area and power
consumption
Overview of latches and flip flops focusing on power
characteristics
Reducing power dissipation based on transistor sizing and
reordering
Energy issues in the design of drivers for large loads
LOGIC STYLE
Discussion about the influence of each logic
style on speed, size and power dissipation
Concept and power considerations relative to
other logic families
Static logic
Static means that at every point in time the output of a logic gate is connected through a low resistance path to one of the power supply rails
Static CMOS gates are considered here
A static CMOS gate consists of a pull up network and a pull down network
Static CMOS has the following characteristics
Ratioless logic
High noise margins which offer low sensitivity to
noise
Sufficient speed, especially for small gates
Comparable rise and fall times under
appropriate scaling
Ease of design
LOGIC STYLE
POWER CONSIDERATIONS
The other three power components, namely dynamic, short circuit and leakage power, remain
If transistors are properly sized, short circuit power is less than 10% of the total power
Recent techniques attempt to operate circuits at supply voltages that are less than the sum of the pMOS and nMOS threshold voltages, so that both transistors never conduct at the same time
Leakage power is due to a) leakage currents through the reverse biased diodes present at a transistor's drain and b) subthreshold leakage currents due to carrier diffusion between source and drain when the gate to source voltage of a transistor is below the threshold voltage
POWER CONSIDERATIONS
A variation of static logic called branch based logic provides a layout optimization for low power by reducing parasitic node capacitances. Logic cells are designed exclusively with branches, which are implemented by transistors in series between supply and output
This approach achieves improved speed and lower power consumption compared to conventional libraries with a large number of complex gates
Example: a 16 bit adder has been shown to have lower static and dynamic power dissipation than an equivalent complementary CMOS adder
DYNAMIC LOGIC
Goals: reduce the transistor count, increase speed and avoid static power consumption
Operation has two phases: precharge and conditional evaluation
Power consumption
Power is consumed during the precharge phase each time the output capacitor was discharged in the preceding cycle
A dynamic gate can consume power even if the inputs remain constant
The output transition probability, which determines the dynamic power consumption, depends only on the signal probabilities
$P_{0 \to 1} = N_0 / 2^N$
$N_0$: number of 0s for the output signal in the truth table
$N$: number of inputs
Power considerations
Leakage power must be carefully taken into account when a circuit operates in standby mode for long periods of time
A variation of static logic, branch based logic (BBL), provides layout optimization for low power by reducing parasitic node capacitance
Logic cells are designed exclusively with branches implemented by transistors in series between supply and output
Branch Based Logic
Used to build a cell library with a limited number
of standard cells
Achieves improved speed and low power
consumption compared to conventional libraries
with a large number of complex gates
Better optimization enabled by a limited choice
of cells
BBL 16 bit adder has been shown to have
lower static and dynamic dissipation compared
to an equivalent CMOS adder
Dynamic logic
Reduce transistor count
Increase speed
Avoid the static power consumption which is present in pseudo nMOS
Dynamic gates are clocked and based on a sequence of 2 phases: precharge and evaluation
Precharge: the pMOS conducts and the output node is precharged while the nMOS is cut off
Dynamic logic
No DC current flows regardless of the input signals
Evaluation: the pMOS is off and the nMOS is on; depending on the inputs there is a conditional path between the output and ground
If there is no path, the output node remains at its precharged level, giving a high output value
At most one transition is allowed during the evaluation phase; otherwise charge redistribution occurs and corrupts the output node voltage
Drawback: single phase dynamic gates cannot be cascaded
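The precharge and evaluation sequence described above can be sketched as a small behavioral model. This is only an illustration: the function names are hypothetical, the node is assumed to start discharged, and charge redistribution and leakage are ignored. It counts how often the output node has to be recharged, which is when energy is drawn from the supply.

```python
def dynamic_gate_cycles(pulldown, inputs_per_cycle):
    """Simulate precharge/evaluate cycles of a dynamic gate.

    pulldown(inputs) returns True when the nMOS network conducts to ground.
    Returns (outputs, precharge_events); precharge_events counts the
    0 -> 1 recharges of the output node.
    """
    outputs = []
    precharge_events = 0
    node_high = False  # assumption: the node starts discharged
    for inputs in inputs_per_cycle:
        # Precharge phase: pMOS on, nMOS path cut off, node pulled high.
        if not node_high:
            precharge_events += 1
        node_high = True
        # Evaluation phase: the node discharges only if the pull down conducts.
        if pulldown(inputs):
            node_high = False
        outputs.append(int(node_high))
    return outputs, precharge_events


def nor2_pulldown(inputs):
    """Pull down network of a 2 input NOR: conducts if either input is 1."""
    a, b = inputs
    return bool(a or b)


# Constant inputs (1, 0): the output is 0 every cycle, so the node is
# recharged in every precharge phase even though nothing changes.
outs, events = dynamic_gate_cycles(nor2_pulldown, [(1, 0)] * 4)
print(outs, events)
```

With the inputs held at (0, 0) instead, the node is charged once and never discharged, so only the first precharge costs energy in this model.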

In a dynamic implementation power is consumed every time the output equals 0
A dynamic gate can consume power even if the inputs remain constant
The output transition probability, which determines the dynamic power consumption, depends only on the signal probabilities
For uniformly distributed inputs the transition probability is
$P_{0 \to 1} = N_0 / 2^N$
where $N_0$ is the number of 0s for the output signal in the truth table and $N$ is the number of inputs
NOR2 gate
Dynamic power consumption of the dynamic implementation:
$P_{NOR} = 0.75\, C_L V_{dd}^2 f_c$
The power consumption of the static implementation is significantly smaller:
$P_{NOR} = (3/16)\, C_L V_{dd}^2 f_c$
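The two NOR2 coefficients can be checked from the truth table. The sketch below assumes uniformly distributed, independent inputs; for the static gate it uses the usual estimate P(0->1) = P(out=0) * P(out=1), which is an assumption not spelled out in the slides. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def transition_probabilities(func, n_inputs):
    """Return (dynamic, static) 0->1 transition probabilities of a gate."""
    rows = list(product((0, 1), repeat=n_inputs))
    n0 = sum(1 for row in rows if func(*row) == 0)
    p0 = Fraction(n0, len(rows))   # probability that the output is 0
    dynamic = p0                   # P(0->1) = N0 / 2^N
    static = p0 * (1 - p0)         # P(0->1) = P(out=0) * P(out=1)
    return dynamic, static


def nor2(a, b):
    return int(not (a or b))


dyn, sta = transition_probabilities(nor2, 2)
print(dyn, sta)  # 3/4 3/16
```

The same function can be applied to other gates, for example a NAND2, to see how strongly the dynamic activity depends on how many truth table rows give a 0 output.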

The capacitance being switched is smaller in dynamic logic than in the static implementation
Total power includes the power dissipated in driving the capacitance of the clock lines
Advantages of static vs dynamic logic

Two and four phase clocking strategies have been developed to overcome the problem of cascading dynamic gates (8 clock signals, not suited for low power)
To correct the problem, modifications to the basic dynamic logic style have been proposed: Domino, np-CMOS and NORA logic
Comparison
                        Static logic                       Dynamic logic
Glitches                30% energy increase                Intrinsically does not have this problem
Short circuit current   Less than 10% of the total power   -
Parasitic capacitance   -                                  Fewer transistors, reduced switched capacitance
Switching activity      Depends on previous state          Does not depend on previous state; generally higher activity factor
Power down modes        Effectively used                   Not well suited
Clock power             No clock                           Due to gate capacitance of the precharge MOS transistor
Pass transistor logic
Used to reduce physical capacitance
Boolean functions are implemented as a network of switches realized by pass transistors
A series connection gives the AND function, a parallel connection the OR function
Relatively expensive for simple monotonic gates
Efficient in terms of transistor count for XOR and MUX
XOR implementation in CMOS: 12 transistors
In pass transistor logic: 4 transistors
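The series/parallel rule can be illustrated with a purely logical switch model. This is a sketch under simplifying assumptions: signals are ideal booleans, so the threshold drop is ignored, and the helper names are hypothetical.

```python
def series(*switches):
    """Pass transistors in series conduct only if all are on (AND)."""
    return all(switches)


def parallel(*branches):
    """Branches in parallel conduct if any one of them conducts (OR)."""
    return any(branches)


def xor_pass(a, b):
    """XOR built from two parallel branches of series switches.

    Branch 1: a AND (NOT b); branch 2: (NOT a) AND b.
    """
    return int(parallel(series(a, not b), series(not a, b)))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_pass(a, b))
```

The model only shows why XOR and MUX map naturally onto switch networks; it says nothing about the electrical level restoration that a real pass network needs.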
Pass transistor logic
Full adder in CMOS: 28 transistors
In pass transistor logic: 24 transistors
Pass transistor logic has the inherent problem of the threshold drop across a transistor
This causes static power dissipation and requires the addition of level restoring transistors
n channel only networks are not suited for low supply voltages
The most important logic style is complementary pass transistor logic
Complementary pass transistor
Complementary pass transistor logic (CPL) consists of 2 nMOS logic networks and 2 small pull up pMOS transistors for level restoration
CPL is ratioless, and its high noise margins enable reliable operation even at low voltages
CPL
Output driving capability due to output inverters
Fast differential stage due to cross coupled
pMOS
Complementary pass transistor
Small input loads reduce the overall switched capacitance
Power consumption is lower and rise/fall times are faster

Application
High performance applications such as multipliers
Power consideration
CPL gates have fewer transistors, small transistor sizes and smaller node capacitances
Significant power reduction can be achieved if the threshold drop and the static power dissipation of the output inverter are properly addressed
Example
A pass gate family adder with zero threshold pass transistors at a supply voltage of 4 V consumes 30% less energy than a conventional static design
Full adder simulation results
Logic family   Delay (ns)        Power (mW)        Power delay (normalized to CMOS)
               3.3 V    1.5 V    3.3 V    1.5 V    3.3 V    1.5 V
CMOS           1.89     7.88     32.9     6.4      1.00     1.00
CPL            1.39     8.33     34.1     6.0      0.76     0.99
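The power delay columns of the full adder table follow from the delay and power columns. A minimal sketch, assuming the last two columns are the delay times power product normalized to the CMOS row (the dictionary layout and function name are illustrative):

```python
# (delay, power) pairs copied from the full adder table, in the table's units.
results = {
    "CMOS": {"3.3 V": (1.89, 32.9), "1.5 V": (7.88, 6.4)},
    "CPL": {"3.3 V": (1.39, 34.1), "1.5 V": (8.33, 6.0)},
}


def normalized_pdp(family, supply, reference="CMOS"):
    """Power delay product of a family relative to the reference family."""
    delay, power = results[family][supply]
    ref_delay, ref_power = results[reference][supply]
    return (delay * power) / (ref_delay * ref_power)


for supply in ("3.3 V", "1.5 V"):
    print(supply, round(normalized_pdp("CPL", supply), 2))
```

The calculation reproduces the 0.76 and 0.99 entries, which shows that the CPL advantage for the full adder almost disappears at 1.5 V.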

Providing 50 % energy savings
Multiplier based on full adders and using modified Booth encoding: power savings of 18% and speed improvement of 30%
PASS TRANSISTOR
Static CMOS proves to be superior to all pass transistor logic styles in both delay and power for all logic gates except the full adder at higher supply voltages
Pass transistor logic is not the best choice for low power design
The full adder is based on XOR gates and 2:1 MUXes, which are suitable for pass transistors
SPL
The number of full adders in a design is limited compared to other logic gates and flip flops
Single rail pass transistor logic (SPL) is a viable alternative if low power and compatibility
Interest in pass transistor logic has increased during the last few years
This is shown by the large number of designs and by synthesis methodologies that target pass transistor logic, starting from a higher level, technology independent design specification
Single rail pass transistor logic
Also known as single ended pass transistor logic and referred to as LEAP (LEAn integration using Pass transistors)
Offers a promising approach to low power circuit design
It is the simplest member of the pass transistor logic family
Like CPL, it uses only nMOS transistors in the pass network
It does not implement the second pass transistor network for the complementary signals, which are generated locally if required
SPL
3 main components:
Input inverters that buffer the inputs and generate all signals for the pass transistor network
A pass transistor network that implements the logic function with n type transistors; the output swing at the end of the network is 0 V to $V_{dd} - V_{tn}$
Output buffers for speed improvement, including a weak pMOS transistor for voltage level restoration and elimination of short circuit currents in the output inverter
SPL
For optimum power delay product an output buffer must be inserted every 3 to 4 stages, with more buffers in the critical path
The basic element in the pass transistor network is the 2 input MUX; each MUX is a node in the BDD (binary decision diagram)
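The correspondence between BDD nodes and 2 input multiplexers can be sketched as follows. The tuple encoding of the BDD is a hypothetical choice made for this illustration, not a format taken from any synthesis tool.

```python
def mux2(select, when0, when1):
    """2 input multiplexer, the basic element of the pass network."""
    return when1 if select else when0


# Hypothetical encoding: a node is (variable, low_child, high_child);
# the leaves are the constants 0 and 1. This BDD represents a XOR b.
XOR_BDD = ("a", ("b", 0, 1), ("b", 1, 0))


def eval_bdd(node, assignment):
    """Evaluate a BDD by treating every node as one 2 input MUX."""
    if node in (0, 1):
        return node
    var, low, high = node
    return mux2(assignment[var],
                eval_bdd(low, assignment),
                eval_bdd(high, assignment))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, eval_bdd(XOR_BDD, {"a": a, "b": b}))
```

Because each node maps to one MUX, the size of the BDD gives a direct estimate of the size of the pass transistor network, which is what makes automated synthesis from a BDD attractive.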
Advantages
The SPL library has no more than 10 components, built from only 3 main components
Most functions are based on MUX and XOR, which are implemented efficiently in terms of transistor count
The pass transistor network contains only nMOS transistors, resulting in a compact layout with fast operation
Circuit synthesis starting from a BDD can be automated
POWER
Power consumption is very sensitive to the minimum voltage of operation
The nMOS transistor is better down to 1 V when $V_t$ = 0.4 V
Lower threshold may become a severe constraint against low power design
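As a rough worked example of why the minimum supply voltage matters, take the degraded high level of an nMOS pass network as the supply minus one threshold, using the 1 V and 0.4 V figures above and neglecting body effect (an assumption made only for this illustration):

```latex
V_{high} = V_{dd} - V_{tn} = 1.0\,\mathrm{V} - 0.4\,\mathrm{V} = 0.6\,\mathrm{V}
```

Under these assumptions the internal high level is only 0.2 V above the threshold before the output buffer restores it.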
Report
Circuit with 7 inputs and 4 outputs: reduction in power consumption of around 15.5% with a 31% increase in speed
4 bit adder/subtractor circuit: no significant power reduction
At 3.3 V in 0.35 µm technology the power consumption of a full adder and a 4 bit adder is about
Delay is worse, and significantly greater at lower supply voltages
SPL does not work for low supply voltage (1.5 V)
Final conclusion: the power delay performance of SPL has to be investigated for future deep submicron technologies
Other logic styles
Pseudo nMOS
For N inputs only N+1 transistors are required, resulting in smaller area and smaller parasitic capacitance
The logic is ratioed, so transistor sizes have to be selected carefully
Static power: a minimum size gate consumes 1 mW, so 100,000 gates consume 50 W (with the output low in half of them)
Power is reduced for complex logic functions switching at high frequency, where dynamic power is lower due to the reduced capacitance
When the circuit makes a low to high transition the large pMOS is on, giving a large current and a fast transition
Suited for gates that should switch only during certain time periods (e.g. decoders)
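The static power figure quoted above is simple arithmetic. The sketch below assumes that static current flows only while a gate output is low; the function name is illustrative.

```python
def pseudo_nmos_static_power(num_gates, power_per_gate_w, fraction_output_low):
    """Total static power in watts of a set of pseudo nMOS gates."""
    return num_gates * fraction_output_low * power_per_gate_w


# 100,000 gates, 1 mW per conducting gate, half of the outputs low.
total_w = pseudo_nmos_static_power(100_000, 1e-3, 0.5)
print(f"{total_w:.0f} W")  # 50 W
```

This is why pseudo nMOS is reserved for a small number of gates, or for gates that are active only during certain periods.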

Differential cascode voltage switch logic
(DCVSL)
Eliminates the static current of pseudo nMOS; based on dual rail signals
Advantages
Faster switching due to reduced capacitance
Compared to pseudo nMOS, DCVSL is superior in that there is no static power
Current during switching increases due to the large pull up transistor
Two pull down networks increase the area, but both output polarities are available
Dynamic DCVSL is power hungry

Differential current switch logic
(DCSL)
Suitable for high fan in gates, restricting internal node voltage swings (1 V for a 5 V supply voltage)
Once evaluation is complete the DCSL gate does not respond to its inputs (it acts like a latch followed by a combinational circuit)
For moderate trees DCSL reduces power dissipation by 1/2
DCSL problems
As a precharged differential logic it has a high activity factor
Sensitive to noise
AND/NAND cannot be implemented
Requires balanced layout techniques
Charge recycling Differential logic
Power consumption is reduced by reusing some of the already used charge for the precharge
A half supply precharge level is achieved, 50% of that with full swing
0.8 µm, 5 V Manchester carry chain: 27% improvement in power delay product compared with DCVSL
Push pull pass transistor logic
Similar to CPL
The logic employs two complementary pass transistor networks
The complementary network turns on the corresponding pull up or pull down transistor to compensate for the threshold voltage drop
Push pull eliminates the need for output buffers with restoring transistors
A good low power choice of logic style
Results for a 40 stage full adder indicate a power delay product of only 60% compared to SPL
In 0.8 µm, 3.3 V CMOS technology the power delay product for a multiplier is 42% for PPL, 63% for CPL and 78% for the SPL implementation
Logic styles - Discussion
It is not possible to conclude that a specific logic style is optimum in terms of both performance and power consumption
SPL has been a promising logic style in the era of low power design, with reduced transistor count for complex functions; its drawback is operation in submicron technologies (below 1 V)
Analysis should not focus only on dynamic power due to charging and discharging; short circuit power dissipation, energy consumption due to glitching activity, subthreshold currents and the ability to benefit from future developments (ultra low supply voltages) should also be taken into account
Logic styles - Discussion
Static logic remains the most reliable logic
style
Simple, robust and relatively low power
techniques, ease of design and
advantages of being supported by the
majority of electronic design automation
(EDA) tools
Latches and Flip-flops
The timing elements are the clock distribution network and the clocked registers (flip flops and latches)
The clock distribution network accounts for a substantial part of the total system power consumption
To save power the clocked capacitance should be minimized
LATCHES
Dynamic latches are the simplest and most efficient timing circuits
Classic latch: fastest latch (due to true single stage), a transmission gate followed by an inverter
C²MOS latch: slower, robust, more power efficient (no contact at the intermediate nodes)
Both have four clocked transistors (including the inverter, three are loading the clock input)
True single phase clocked (TSPC) half latch: isolates high inputs at clock low
Non-precharged TSPC latch: two of the TSPC half latch circuits; slower, more robust and only two clocked transistors
LATCHES
Dual rail CVSL logic: depends on transistor ratios; the n-transistors must be stronger than the p-transistors (to flip the latch)
Dynamic single transistor clocked (DSTC) latch: uses a common clocked transistor to save power; fast and power efficient but sensitive to input glitches (in the hold state)
In simulation the non precharged true single phase clocked (NPTSPC) latch has the lowest power (due to two clocked transistors and the lack of precharging)
TSPC and PTSPC come next in power consumption
STATIC LATCHES
Use positive feedback (cross coupled inverters)
3GATE latch: more complex, low power consumption (four clocked transistors)
TGATE latch: classical transmission gate latch, 6 clocked transistors
The above two are based on standard CMOS cell libraries
SRAM memory cell based latch (6 transistors): depends on transistor ratios
SSTC latch: static version of DSTC
STATIC LATCHES
SSTC and SRAM latches have the highest speed and low power consumption; SSTC operation speed is higher than SRAM
The p versions of these latches have worse performance; they depend on transistor ratios
The 3GATE latch has lower power consumption than TGATE but worse speed

DYNAMIC FLIP FLOP
A dynamic f/f is constructed by cascading two latches of different polarities; this is used in the classic, C²MOS and NPTSPC f/fs
An efficient f/f is constructed from a p-half TSPC latch and a PTSPC latch (9T); an inverter is used for the complementary output
A dual rail f/f using DSTC latches has only two clocked transistors
Simulation results show that the DSTC, NPTSPC and 9T f/fs have the lowest power consumption; the NPTSPC and 9T f/fs can improve speed without the complementary output

STATIC FLIP FLOP
3GATE, TGATE and SSTC flip flops are shown
SSTC uses only two clocked transistors: very low power consumption and minimum delay
The 3GATE f/f consumes less power than TGATE
DOUBLE EDGE TRIGGERED F/F
Triggered on both clock edges, so the clock frequency can be halved for the same data rate, reducing the power dissipation in the clock distribution
Double edge triggered (DET) and single edge triggered (SET) f/fs, both static and dynamic, have two D type latches; DET latches are arranged in parallel whereas SET latches are in series
DET has about 20% lower power consumption than SET
The dynamic SET f/f is slightly faster but requires a clock operating at twice the frequency of the dynamic DET
Dynamic f/fs consume less power than static ones
System level energy savings are possible with DET; an example shows that DET can save about 17% power
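The clock power argument can be written out with the usual switched capacitance model. The symbols are illustrative, and the sketch assumes, as a simplification, that the SET and DET designs present the same clock capacitance $C_{clk}$:

```latex
P_{clk}^{SET} = C_{clk} V_{dd}^2 f, \qquad
P_{clk}^{DET} = C_{clk} V_{dd}^2 \frac{f}{2} = \tfrac{1}{2} P_{clk}^{SET}
```

This covers only the clock distribution; the rest of the system power is not halved, which is consistent with the smaller overall savings quoted above (about 20% and 17%).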
TRANSISTOR ORDERING
The relative placement and ordering of transistors can reduce power
Reordering the transistors in a CMOS gate affects the switching activity
Power dissipation can be reduced in two ways: by minimizing the drain source capacitance and by signal probability algorithms
The highest capacitances have to be placed closest to supply and ground (fig b)
A signal probability algorithm applied to the pull down network was reported to be better (fig a)
Reported reduction in power dissipation: 15.1% in the worst case and 7.2% on average
MUX, adder and ALU: 12% average power reduction with a 4% increase in delay
Reordering rules reduce power by about 10% on average and 30% in some cases
TRANSISTOR SIZING
Goal: minimize the power consumption under a given delay constraint by sizing the transistors. Two types of algorithm:
Algorithms that start with a circuit that satisfies the timing constraint and reduce the size of the gates to reduce power dissipation
Algorithms that perform an initial power optimal sizing on each gate; if the power minimal layout satisfies the delay constraint, the process terminates
This algorithm is more complex; it takes into account circuit capacitance and short circuit power dissipation
The power consumption of a CMOS circuit is a convex function
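The first type of algorithm (start from a circuit that meets timing, then shrink gates) can be sketched on a toy inverter chain. Everything here is an assumption made for illustration: stage delay is modeled as load divided by gate size, power as the sum of gate sizes, and the greedy loop is not the method of any specific published tool.

```python
def chain_delay(sizes, load):
    """Delay of a gate chain: each stage drives the next stage's input."""
    loads = list(sizes[1:]) + [load]
    return sum(c / s for s, c in zip(sizes, loads))


def chain_power(sizes):
    """Switched capacitance proxy: proportional to total gate size."""
    return sum(sizes)


def downsize(sizes, load, max_delay, step=0.1, min_size=1.0):
    """Greedily shrink gates while the delay constraint stays satisfied."""
    sizes = list(sizes)
    if chain_delay(sizes, load) > max_delay:
        raise ValueError("starting point must satisfy the timing constraint")
    improved = True
    while improved:
        improved = False
        for i in range(len(sizes)):
            trial = sizes.copy()
            trial[i] = round(trial[i] - step, 10)
            if trial[i] >= min_size and chain_delay(trial, load) <= max_delay:
                sizes = trial
                improved = True
    return sizes


start = [4.0, 4.0, 4.0]
final = downsize(start, load=8.0, max_delay=5.0)
print(final, chain_delay(final, 8.0), chain_power(final))
```

The loop stops when no single gate can be reduced by one step without violating the delay constraint, so the result is a local optimum of this simple model rather than a guaranteed global one.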
DRIVERS FOR LARGE LOADS
High capacitance loads must be driven at reasonable speed with short rise/fall times
Long rise/fall times mean larger short circuit power consumption
Examples: clock networks, clock drivers, long buses, long interconnects and chip outputs
To drive big loads a tapered inverter chain is used
The scaling factor in a uniformly tapered buffer is chosen to minimize the power delay product
Simulation shows 15-35% savings in power delay product compared with sizing for minimum propagation delay
Non uniform tapering shows an 8% improvement in dynamic switching energy; the improvement becomes smaller, 3-5%, for total switching energy
Use a uniform buffer: it is much simpler and provides better insight into the optimization of the power delay product
Tapering factor $\beta$; $Y = C_L / C_i$
$C_i$: input capacitance of the first inverter
$\beta^N = Y$, so $N = \ln Y / \ln \beta$
Total delay time: $\tau_{tot} = (\ln Y / \ln \beta)\, \beta\, \tau_i$
$\tau_i$: propagation delay of the first inverter
Differentiating with respect to $\beta$ gives the optimum value for delay, $\beta = e = 2.72$
Total capacitance of the chain: $C_{tot} = (Y - 1)\, C_i / (\beta - 1)$
Power delay product of the inverter chain: $P_{tot}\, \tau_{tot} \propto (\ln Y / \ln \beta)\, \beta\, (Y - 1) / (\beta - 1)$
The optimum power delay product is reported at $\beta = 4.25$
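The delay optimum of the uniform taper model can be checked numerically. The sketch below writes beta for the tapering factor (the symbol name is assumed), treats the number of stages as a continuous quantity, and uses only the simple delay and capacitance expressions; it does not attempt to reproduce the reported power delay optimum, which depends on a more detailed model.

```python
import math


def tapered_chain(Y, beta):
    """Return (stages, delay, capacitance) for load ratio Y and taper beta.

    delay is in units of the first inverter's delay,
    capacitance in units of the first inverter's input capacitance.
    """
    n_stages = math.log(Y) / math.log(beta)
    delay = n_stages * beta
    capacitance = (Y - 1) / (beta - 1)
    return n_stages, delay, capacitance


def best_delay_beta(Y, candidates):
    """Pick the candidate taper with the smallest total delay."""
    return min(candidates, key=lambda b: tapered_chain(Y, b)[1])


candidates = [1.5 + 0.01 * k for k in range(1000)]  # 1.5 up to about 11.5
best = best_delay_beta(1000, candidates)
print(round(best, 2), round(math.e, 2))

# A larger taper trades delay for less chain capacitance.
for beta in (math.e, 3.5, 9.0):
    n, d, c = tapered_chain(1000, beta)
    print(round(beta, 2), round(n, 2), round(d, 1), round(c, 1))
```

The search lands next to e, matching the analytic result, while the capacitance column falls steadily as the taper grows, which is the reason a power aware design picks a taper above e.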

As the tapering factor increases, the total capacitance and the power consumption decrease
When the tapering factor is increased from 3.5 to 9, the power consumption overhead is reduced from 80% to 25% and the delay cost increases by 20%
For very large values of the tapering factor the delay will be too large

Conclusions
Different logic style, circuit structures and
circuit design techniques have been
presented
