White Paper

Power Management in Complex SoC Design

Jim Flynn, Senior IC Design Engineer, Synopsys Professional Services


Brandon Waldo, Senior IC Design Engineer, Synopsys Professional Services

April 2005
The need to reduce power consumption—long recognized as a significant design issue—becomes more
critical as larger, faster ICs go into portable applications. As a result, techniques for managing power
throughout the design flow are evolving to assure that all parts of the product receive power properly and
efficiently, and that the product is reliable. Techniques such as multi-voltage islands and dynamic scaling
of both clock frequency and threshold voltage help conserve battery power in portable applications, while
delivering high performance.

Perhaps more critically, increases in system-on-chip (SoC) size and speed have led to power consumption
challenges across a broad range of designs that have not been viewed traditionally as supply-limited. In
these designs, heat dissipation and reliability issues such as electromigration and IR drop have become
vitally important. (For information on dealing with power-related reliability issues, please consult the
Synopsys Professional Services’ White paper “Design Planning Strategies to Improve Physical Design
Flows—Floorplanning and Power Planning” http://www.synopsys.com/cgi-bin/sps/wp/dps/paper1.cgi)

Power issues in mainstream deep submicron designs may limit functionality or performance and severely
affect manufacturability and yield. Higher power dissipation increases junction temperature, which slows
transistors and increases interconnect resistance. Design techniques aimed at improving performance
may therefore fall short if power is not considered. Lower-than-expected performance decreases device
yield. Additionally, higher power dissipation requires more system-level measures for thermal management.
In general, these power issues are increasing SoC and system costs. Managing power consumption at
appropriate points in the SoC design flow keeps these costs under control.

Where an SoC Consumes Power


The total power consumed by a chip equals dynamic power plus static power. Dynamic power is the power
consumed in switching logic states, both internal to the cells (internal power) and for driving the chip’s nets
and external loads (switching power):

Dynamic power = $C V^2 F$

where C is the load capacitance, V is the voltage swing and F is the number of logic-state transitions per unit of time.

As semiconductor structures become smaller, device and interconnect capacitances decrease, allowing for
higher performance and lower power. Countering these factors are power increases due to larger designs
and higher switching rates.

Static power (leakage power) is consumed while transistors are not switching:

Static power = $V \cdot I_{STAT}$

Although transistors have some reverse-biased diode leakage from drain to substrate, the larger portion of
leakage power is due to the sub-threshold current through a transistor that is turned off. This sub-threshold
current results from the conduction between source and drain through the transistor channel.

The sub-threshold leakage current is problematic because it increases as transistor threshold voltages
(Vth) decrease. In fact, the move to the 130-nanometer (nm) node and beyond may boost leakage power to as
much as 50 percent of the total chip power (Figure 1). Increased leakage power also contributes to an
exponential increase in reliability-related failures in chips (even in standby).

©2007 Synopsys, Inc. 


[Figure 1 chart: leakage power and active power, in watts (axis 0 to 250), by technology node (0.25µ, 0.18µ, 0.13µ, 0.10µ, 0.07µ).]

Figure 1: Increase in leakage power—Bringing down transistor threshold voltages helps decrease dynamic
power but increases sub-threshold leakage current. A power-aware design flow is thus needed to meet timing
requirements and keep power consumption within acceptable limits. Source: Intel. Published in IC Insights Inc.
2003 Technology Trends.

As CMOS technologies scale down, the main approach for reducing power has been to scale down the
supply voltage VDD. Voltage scaling is a good technique for controlling a chip’s dynamic power because of
the quadratic effect of voltage on power consumption. However, just reducing the power supply degrades
circuit speed because the switching delay time is proportional to the load capacitance and the ratio
Vth/VDD. To maintain sufficient drive strength for fast switching, Vth must decrease in proportion to VDD. This
relationship leads to the leakage power increase. Fortunately, a power-aware design flow helps balance
timing requirements with various power goals.

Power Solutions
The higher the level of design abstraction, the greater the influence on power consumption. At the system
and algorithm levels, for example, using a parallel approach rather than a serial implementation reduces
clock frequencies, which helps to decrease power consumption significantly. The lower power of the parallel
approach may come at the expense of somewhat greater area or slower performance.

As an example of the effect of parallel versus serial architectures, one chip that received data samples
serially processed the samples in parallel, reducing this logic’s clock speed from 80 MHz to 10 MHz.
Additionally, the supply voltage was reduced from 1.8V to 1.25V. The parallel processing logic was much
larger than the serial processing equivalent, but the reduced voltage and operating frequency cut the
logic’s power consumption by 75 percent. The parallel approach saved power because power varies with the
square of the voltage but only linearly with frequency and switching activity. In other designs, the
area penalty has been small but the power savings significant, so it is worth exploring the tradeoffs.
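
The arithmetic behind this example can be sketched with the dynamic power relation given earlier (C times V squared times F). This is only an illustration under stated assumptions: the 75 percent figure is treated as dynamic power only, and the growth of the parallel logic is folded into a single hypothetical factor, cap_ratio, standing for switched capacitance. The function name and that factor are not from the original design.

```python
def dynamic_power_ratio(v_old, v_new, f_old, f_new, cap_ratio=1.0):
    """Return P_new / P_old for dynamic power = C * V^2 * F.

    cap_ratio is a hypothetical factor for how much the switched
    capacitance grows when the logic is made parallel.
    """
    return cap_ratio * (v_new / v_old) ** 2 * (f_new / f_old)


# Figures quoted in the text: 1.8V -> 1.25V and 80 MHz -> 10 MHz.
ratio_same_cap = dynamic_power_ratio(1.8, 1.25, 80e6, 10e6)

# Capacitance growth that would leave exactly 25 percent of the original
# power (a 75 percent saving), assuming only dynamic power matters.
implied_cap_ratio = 0.25 / ratio_same_cap

print(f"power ratio with unchanged capacitance: {ratio_same_cap:.4f}")
print(f"capacitance growth consistent with a 75% saving: {implied_cap_ratio:.2f}x")
```

Under these assumptions, voltage and frequency alone would cut power to about 6 percent of the original, so the reported 75 percent saving is consistent with the parallel logic switching roughly four times as much capacitance as the serial version.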

System Design
    Power optimization: architecture optimization (e.g. parallel vs. serial), supply voltage scaling, clock frequency scaling
    Power analysis: power estimates based on estimated gate counts and estimated activity

RTL Design
    Power optimization: module clock gating
    Power analysis: RTL power analysis based on defined clocks and registers, estimated gate counts and realistic activity

Floorplanning
    Power optimization: voltage islands

Synthesis
    Power optimization: threshold voltage scaling, power optimization in synthesis, RTL clock gating
    Power analysis: gate-level power analysis based on actual gate counts, realistic activity, accurate routing and final libraries

Place and Route
    Power analysis: gate-level power analysis based on actual gate counts, realistic activity, accurate routing and final libraries

Figure 2: In the context of the design flow, the potential for power savings is greatest early in the flow,
while the accuracy of power estimates improves as the flow progresses.

Figure 2 references several power optimization and analysis techniques that can be used throughout an
SoC design flow. The power solutions covered in this paper include:

• Module clock gating


• Multiple supply voltages
• Multiple threshold voltages
• Power optimization in synthesis, including RTL clock gating

Because techniques such as clock gating and dividing affect design for test (DFT), that topic is also
addressed. A brief design example at the end of the paper shows the benefits of combining dynamic
frequency and voltage scaling.

Power Estimation and Analysis
Over the course of the design flow, it is useful to estimate power consumption at four stages (Table 1). The
accuracy of the estimate improves at each stage as additional design and library information becomes available.

When to perform the estimation   How gates are calculated   How load is calculated   Estimation tool(s) used
1. Design/library exploration    Rough estimation           Unknown/in definition    Spreadsheet
2. Pre/early synthesis           Rough estimation           DC-Wire Load Models      Design Compiler, Power Compiler
3. Post-synthesis                Accurate (placed)          Wire Load Models/SPEF    Power Compiler, Physical Compiler, PrimePower
4. Post-layout                   Exact                      Extracted (SPEF)         PrimePower

Table 1: Four stages of power consumption estimation (Recommended).

RTL Power Analysis


In the earliest stages of a design flow, power analysis provides rough estimates of a design’s power
consumption. Libraries may not be selected yet, so library data may be limited. Before the library is
selected, a spreadsheet analysis can be used to reveal the best power-conscious libraries and design
architectures. After the library is selected, Design Compiler® and Power Compiler™ can be used instead of
the spreadsheet method or to supply values for use in the spreadsheets.

The power-analysis spreadsheet includes approximate gate counts, rough activity-per-block values, side-by-
side vendor µW/MHz data, and relative power estimates. The analysis at this point also helps to show if a
design consumes too much power to be practical–thus avoiding weeks of design effort to implement a chip
that will never be manufactured.

To use the spreadsheet analysis method, it is necessary to estimate each block’s gate count (number of
library cells of each type) and activity level. The amount of energy consumed by the switching of each cell
type is also needed; data from a library vendor’s manuals can be used to assign an appropriate power value
relative to speed (in µW/MHz). A block’s internal power consumption for a particular type of cell is given by
the equation:

Power consumption = Gate Count * µW/MHz * Activity * Frequency

Summing these power values for all the different types of cells in a block gives the block’s overall internal
active-power estimate. Before synthesis, gate counts are estimated based on architectural choices and
an understanding of the design. For example, approximate gate counts can be drawn from features such
as bus sizes, word lengths, control layers and memory depth. When the library has been selected, the gate
counts for a block can be estimated by using Design Compiler’s report_reference capability after early
synthesis, which reports the number of each instance type for the design.

A key aspect of the power calculation is assigning the activity levels. The gates of a design have different
activity levels that can be estimated with or without a simulation to extract switching activity. After the
library is selected, however, a functional simulation is recommended to determine the switching activity.

Switching activity is measured in terms of a toggle rate (TR). Toggle rate is the number of logic-0-to-logic-1
and logic-1-to-logic-0 transitions of a design object (for example, a net, pin or port) per unit of time. A net
having an activity of 50 logic-1-to-logic-0 transitions and 50 logic-0-to-logic-1 transitions during a 100ns
interval has a TR of 1. A net having an activity of five logic-1-to-logic-0 transitions and five
logic-0-to-logic-1 transitions during a 10ns interval also has a TR of 1. These examples have nanoseconds as
the unit of time, and a TR of 1 indicates one activity transition per ns. Power and TR can be related by
understanding that for each transition an amount of energy must be supplied to change the state of an
internal circuit during the time interval of the state change.
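
As a quick check of the toggle-rate definition, the two nets described above can be computed directly. This is a minimal sketch; the function name is illustrative and is not part of any Synopsys tool.

```python
def toggle_rate(rising, falling, interval_ns):
    """Toggle rate: total 0->1 and 1->0 transitions per nanosecond."""
    if interval_ns <= 0:
        raise ValueError("interval must be positive")
    return (rising + falling) / interval_ns


# The two nets described in the text.
tr_long = toggle_rate(rising=50, falling=50, interval_ns=100)
tr_short = toggle_rate(rising=5, falling=5, interval_ns=10)

print(tr_long, tr_short)
```

Both nets have the same toggle rate even though the observation windows differ, which is why TR is a convenient way to compare activity between simulations of different lengths.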

Keep in mind that power estimates at any level of abstraction are meaningful only when the switching
activity represents the chip’s actual working operation. A common mistake is to use a vector set that
simulates system boot sequences when trying to determine activity. This activity rarely represents actual
working conditions and therefore leads to inaccurate power estimates. An RTL simulator helps to generate
a Switching Activity Interchange Format (SAIF) file automatically, but the activity values are accurate only
if the vector set is realistic. Current tools are not able to generate such vectors automatically—the task
requires an understanding of the circuit’s intent.

Figure 3 shows the Programming Language Interface (PLI) system tasks that can be used within VCS®
to generate an SAIF file during simulation. Power Compiler offers a power_estimate capability that
uses an SAIF file to define libraries and constraints and annotate the design for power estimation. Power
Compiler’s default switching activity for non-annotated ports is 0.25 toggle per positive edge; this value is
applied and propagated throughout the block.

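$set_gate_level_monitoring("rtl_on");  // monitor RTL-level objects
$set_toggle_region;                    // select the design region to monitor
$toggle_start;                         // begin recording toggles
$toggle_stop;                          // stop recording toggles
$toggle_report;                        // write the recorded activity to the SAIF file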

Figure 3: Programming Language Interface (PLI) commands. These commands cause VCS to generate an SAIF
file for use in Power Compiler.

Table 2 lists examples of results estimated using the above methods. After calculating internal power,
switching power can be estimated as 30 percent of internal power. Without accurate load and switching
data, this value is only a rough estimate. Such estimates are useful mainly as a way to compare the power
implications of various design strategies rather than as predictors of a chip’s actual power consumption.
As mentioned earlier, however, rough estimates at the RTL stage do provide an early warning that a design
may turn out to be unacceptably hot.

Block 1 (Frequency = 100 MHz)

Gate type   Gate count   Activity   µW/MHz
A           125          0.25       5
B           150          0.05       12
C           50           0.1        16

Table 2: Example of estimation results using the spreadsheet method
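
The spreadsheet calculation for Block 1 can be sketched as follows. It assumes that the first number listed for each gate type in Table 2 is its gate count, and it applies the 30 percent switching-power rule of thumb from the text. The variable names are illustrative only.

```python
FREQ_MHZ = 100.0            # Block 1 clock frequency from Table 2
SWITCHING_FRACTION = 0.30   # rule of thumb from the text: switching = 30% of internal

# Gate type -> (gate count, activity, uW per MHz); counts are an
# interpretation of the first value given for each type in Table 2.
BLOCK_1 = {
    "A": (125, 0.25, 5.0),
    "B": (150, 0.05, 12.0),
    "C": (50, 0.1, 16.0),
}


def internal_power_uw(count, activity, uw_per_mhz, freq_mhz):
    """Power consumption = Gate Count * uW/MHz * Activity * Frequency."""
    return count * uw_per_mhz * activity * freq_mhz


per_type_uw = {
    name: internal_power_uw(count, activity, uw_per_mhz, FREQ_MHZ)
    for name, (count, activity, uw_per_mhz) in BLOCK_1.items()
}
internal_mw = sum(per_type_uw.values()) / 1000.0
switching_mw = SWITCHING_FRACTION * internal_mw
total_mw = internal_mw + switching_mw

for name, power in per_type_uw.items():
    print(f"gate type {name}: {power / 1000.0:.3f} mW internal")
print(f"internal {internal_mw:.3f} mW, switching {switching_mw:.3f} mW, total {total_mw:.3f} mW")
```

With these inputs the block's internal power comes to roughly 32.6 mW and the rough total to about 42.4 mW. As the text notes, such numbers are best used to compare design alternatives rather than to predict the final chip power.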

report_reference:
Reference   Library    Unit Area   Count   Total Area   Attributes
INV         tech_lib   1.00        1       1.00
MX1P        tech_lib   8.00        8       64.00        n
NAND2       tech_lib   1.00        6       6.00
NAND3       tech_lib   2.00        1       2.00
Total 8 references                         174.00

Table 3: Example of report_reference command output

Switching power is usually the most important value to determine in early analysis, but it is also possible
to estimate leakage power based on each cell type’s leakage data. Since leakage is different for high and
low states, the leakage analysis must be based on the static probability that a signal is at a certain logic
state. Static probability is expressed as a value between 0 and 1. This value can be estimated based on a
signal’s function. For example, an active-low reset signal typically has a logic-one static probability (SP1) at
or near 1.0 (100 percent). For a data-bus signal, SP1 can usually be assumed as 0.5 (50 percent) unless
some architectural characteristic suggests otherwise. After the library is selected, static probability can be
calculated during simulation by comparing the time a signal is at a certain logic state to the total time of
simulation.
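
A minimal sketch of the static-probability calculation follows. It assumes the simulator output has already been reduced to a list of intervals during which the signal is at logic 1. The state-dependent leakage values are hypothetical placeholders, not library data.

```python
def static_probability_high(high_intervals_ns, total_time_ns):
    """SP1: fraction of the simulation time a signal spends at logic 1."""
    if total_time_ns <= 0:
        raise ValueError("total simulation time must be positive")
    time_high = sum(end - start for start, end in high_intervals_ns)
    return time_high / total_time_ns


def expected_leakage_nw(sp1, leak_high_nw, leak_low_nw):
    """Weight the two state-dependent leakage values by static probability."""
    return sp1 * leak_high_nw + (1.0 - sp1) * leak_low_nw


# Active-low reset released after 20 ns of a 1000 ns simulation.
sp1_reset_n = static_probability_high([(20, 1000)], 1000)

# Data-bus bit that is high for half of the simulation.
sp1_data = static_probability_high([(0, 100), (300, 500), (700, 900)], 1000)

# Hypothetical cell leakage: 4 nW with the input high, 2 nW with it low.
leak_data_nw = expected_leakage_nw(sp1_data, leak_high_nw=4.0, leak_low_nw=2.0)

print(sp1_reset_n, sp1_data, leak_data_nw)
```

The reset signal ends up with SP1 near 1.0 and the data bit with SP1 of 0.5, matching the rules of thumb given above.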

Gate-level Power Analysis


After synthesis, it is possible to get fairly accurate power estimates from Power Compiler based on
actual gate counts and simulated activity. The most significant sources of inaccuracy at this point are the
activity and pre-layout wire-load values. Accuracy is improved by generating an SAIF file from gate-level
simulations. In VCS, the same commands shown in Figure 3 generate the SAIF file, except that the first
command should be:

$set_gate_level_monitoring("on");

Again it must be emphasized that activity values are accurate only when the simulation vectors represent
actual application behavior. Physical Compiler® helps improve the accuracy of the load values through its
write_parasitics -distributed command, run after physical optimization. This command produces a
SPEF file annotated with Steiner-route and RC parasitic estimates.

After layout, a gate-level simulation helps generate a Value Change Dump (VCD) file for use in
PrimePower® analysis. VCD files log changes to signal values during a simulation and provide the design’s
nodal activity, structural data, hierarchical connectivity, path delays, timing and event information.

Note that chip I/Os can be a significant source of inaccuracy if they are numerous, switching at high speed
and driving long wires. If design goals require accurate rather than worst-case power estimates, lumped
load models for the I/Os may produce overly pessimistic results. To get a more accurate picture, HSPICE®
simulations can be performed on critical I/O cell types with accurate distributed-impedance models.
The I/O cell power can then be calculated using numeric methods that determine charge and energy
per rising/falling edge. Given the HSPICE output of current and time, the internal energy per transient
is calculated using the trapezoidal integration method (in Matlab, for example). The I/O activity recorded
during PrimePower analysis is used to scale I/O power, and the total I/O power is combined with the core
power for an overall power estimate.
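
The I/O energy calculation can be sketched in a few lines. This sketch assumes a constant supply voltage, so that the energy per edge is the supply voltage times the integrated supply current. The sample waveform and the edge rates are made up for illustration and do not come from an HSPICE run.

```python
def trapezoid(xs, ys):
    """Trapezoidal integration of samples ys over sample points xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching samples")
    return sum(
        0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
        for k in range(len(xs) - 1)
    )


def energy_per_edge(time_s, current_a, supply_v):
    """Return (charge in coulombs, energy in joules) for one transition."""
    charge_c = trapezoid(time_s, current_a)
    return charge_c, supply_v * charge_c


def io_power_w(energy_rise_j, energy_fall_j, rises_per_s, falls_per_s):
    """Scale per-edge energies by the recorded I/O activity."""
    return energy_rise_j * rises_per_s + energy_fall_j * falls_per_s


# Hypothetical triangular supply-current pulse: 0 -> 10 mA -> 0 over 2 ns.
t = [0.0, 1e-9, 2e-9]
i = [0.0, 10e-3, 0.0]
charge, energy = energy_per_edge(t, i, supply_v=3.3)

# Hypothetical activity: 10 million rising and falling edges per second,
# with the same energy assumed for both edge types.
power = io_power_w(energy, energy, 1e7, 1e7)

print(charge, energy, power)
```

In practice the rising and falling edges would each have their own simulated waveform, and the edge rates would come from the recorded I/O activity.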

To show how power estimates vary using the methods described here over different phases of the design
and implementation cycle, Figure 4 shows examples based on one block (a high-speed FIR filter) in a
DSP design. This example demonstrates how the power estimates vary depending on the accuracy of the
information supplied. The graph shows how the estimates changed for an example block at four points in
the flow:

• Case 1—An estimate using worst-case switching activity and worst-case wire load estimates
• Case 2—An estimate using more accurate wire load estimates and worst-case activity
• Case 3—An estimate using accurate wire load estimates and realistic activity
• Case 4—An estimate using exact wire loads (extracted) and realistic activity based on
SPICE-accurate simulation

[Figure 4 chart: estimated power in mW (axis 0 to 600) for Cases 1 through 4; the bar labels read 518, 430, 260 and 237.]

Figure 4: In the course of a design flow, power estimates can vary considerably.

Power Optimization Techniques


Figure 5 categorizes the power optimization techniques with respect to static vs. dynamic power and
the level of design abstraction at which the technique is applied. The use of one or more of these
methodologies depends on the design goals. Incorporating the methods into a design flow provides an
integrated power-management design strategy.

[Figure 5 diagram: the vertical axis runs from IC Design (RTL coding) at the top to IC Implementation (RTL to GDSII) at the bottom; the horizontal axis runs from Static Power on the left to Dynamic Power on the right.
1. Physical approach (power supply control or voltage island): multi-power supply, power gating
2. Design approach: multi-clock source, clock gating
3. Synthesis approach: multi-VT synthesis, low power synthesis]

Figure 5: Power optimization techniques for different stages of a design flow (from
top to bottom) and how they affect static or dynamic power (from left to right).

Module Clock Gating
Module clock gating can be used at the architectural level to disable the clock to parts of the design that
are not in use. Power Compiler can replace manually inserted clock-gating logic, gating the clock to
any module with an Integrated Clock Gating (ICG) cell from the library. The tool identifies such
combinational gating logic automatically once the user explicitly creates the clock in the script.

Module clock gating can be applied in a series of levels, including the chip level, domain level (DSP, CPU,
etc.), module and sub-module. When the whole chip is in idle mode but must respond to external wakeup
events, an application can gate the chip clock. The same is true at the lowest level; when no memory
access is needed, the clock to the SDRAM controller can be switched off, given that the SDRAM is first set
to self-refresh mode. In addition to turning clocks on and off, the gating structure can include configurable
clock dividers to change the clock speed to various parts of the design.

Designing such a clock structure depends on an understanding of the chip’s function and insights from
power analysis about how much power can be saved by clock-gating ever-smaller portions of the design. In
general, clock switching power is more than 30 percent of a chip’s total power consumption, so clock gating
at all levels is usually well worth the effort.

Clock Gating Challenges


Beyond the complexities of deciding where and how to gate and/or divide clocks, high-level clock gating
involves a number of timing and DFT issues (more on DFT later). The timing issues can be appreciated by
observing that a long path in a clock structure might go through a DPLL, a clock divider, several mode-
switching multiplexers and several levels of clock gating.

While a tool such as Astro™ CTS (clock tree synthesis) synthesizes high-quality clock trees for typical chips,
complex gated clocks and dividers can require manual intervention, largely based on the need to modify
parts of the design outside the purview of the tool. This intervention may be needed to prevent severe clock
phase delay, for example. Clock phase delay might occur because registers and non-CTS cells in a high-
level clock hierarchy are placed far apart, causing an increase in high-level expanded clock tree insertion
delays and thus an increase in clock phase delay. Netweight-based placement control of non-CTS cells
can avoid the problem. This method involves extracting nets that connect the clock gating cells, switching
multiplexers and driven CTS macros, then applying heavy net weights to these nets to pull the cells close
to each other in the placement optimization. The optimization then minimizes the cells’ load and hence cell
delays and output slews.

A poor floorplan for clock distribution can also cause phase delay problems because clock tree synthesis
balances the clock tree according to the delay of the longest clock tree branch. A single long clock path
due to a poor floorplan therefore increases the entire clock tree insertion delay. Careful floorplanning
constraints for better clock tree balancing prevent this problem.

Other sources of clock phase delay are bad placement of non-CTS cells and large slew at non-CTS cell
outputs. The Synopsys Professional Services paper “Clock Distribution and Balancing In a Large and
Complex ASIC: Issues and Solutions” gives solutions to these problems as well as methods for dealing with
three other clock distribution issues: clock skew reduction, clock duty cycle distortion reduction and clock
gating efficiency. (The paper is available at http://www.synopsys.com/sps/techpapers.html.) The paper
also provides a clock-balancing automation strategy. Manual clock tree analysis and balancing methods are
not suitable for complex ASIC designs due to time-to-market constraints. The automation strategy involves
three steps: extracting a common shared clock distribution topology, defining a local balance strategy
for each clock path that does not fit in the common clock distribution, and combining these local balance
constraints with the constraints of the common clock distribution. The result is a clock tree synthesis
constraint for the CTS tools to balance the complete clock distribution automatically.

Another timing issue is the clock glitch that can occur when restarting a clock asynchronously. (Figure 6
shows how this glitch occurs.) It is therefore necessary to include a circuit that times the restart to avoid
the glitch.

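[Figure 6 diagram: a clock-switching circuit with inputs CLK0, CLK1 and Select and output Out Clock, followed by waveforms for CLK0, CLK1, Select and Out Clock that show a glitch on Out Clock.]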

Figure 6: Clock switching glitch — After “turning off” a clock using clock
gating, the clock restart must be timed to avoid the glitch shown here.

Multiple Voltage Islands


While clock gating helps limit dynamic power consumption, the use of multiple supply and/or threshold
voltages can help manage both dynamic and leakage power. Threshold voltage does not have to scale
directly with supply voltage, though the two are related, as explained earlier.

The use of voltage islands or voltage domains offers a way to meet both power consumption and
performance requirements. In this scheme, sections of logic are grouped physically into separate regions
according to their functionality. The logic regions that must operate at the highest speed use the highest
supply voltage, while less timing-critical regions use lower supply voltages.

Because logic operating at a lower supply voltage switches more slowly, frequency scaling is necessary
along with the voltage scaling, so the voltage island approach works well with clock gating. The logic in a
clock-gated block constantly consumes leakage power, but reducing the supply voltage to this block reduces
the leakage.

Multiple supply voltages must be provided through separate power pins or analog voltage regulators
integrated into the device. The efficiency of these voltage regulators must be included in power calculations
for the device. If only a small portion of the design will operate at a lower voltage, more power may be lost
in the voltage regulator than is saved in the lower-voltage logic. Note that voltage island design may require
level-shifter cells to ensure a proper rail shift for signals traveling between voltage domains.
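
The regulator tradeoff can be illustrated with a rough model. This sketch assumes that block power scales with the square of the supply voltage at a fixed frequency, that the regulator has a constant conversion efficiency, and that it has a fixed quiescent overhead. All numeric values, the quiescent term and the function name are hypothetical.

```python
def regulated_island_saving_mw(block_power_mw, v_high, v_low,
                               efficiency, quiescent_mw):
    """Net power saved by moving a block to a regulated lower supply.

    block_power_mw: block power at the high supply voltage.
    efficiency, quiescent_mw: hypothetical regulator parameters.
    A negative result means the regulator costs more than it saves.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    low_voltage_power = block_power_mw * (v_low / v_high) ** 2
    drawn_from_supply = low_voltage_power / efficiency + quiescent_mw
    return block_power_mw - drawn_from_supply


# Same hypothetical regulator (85% efficient, 2 mW overhead), two block sizes.
large_block_saving = regulated_island_saving_mw(100.0, 1.8, 1.2, 0.85, 2.0)
small_block_saving = regulated_island_saving_mw(3.0, 1.8, 1.2, 0.85, 2.0)

print(f"large block: {large_block_saving:+.2f} mW")
print(f"small block: {small_block_saving:+.2f} mW")
```

Under these assumptions the large block shows a clear net saving, while the small block loses power overall because the regulator overhead exceeds what the lower voltage saves.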

In addition to reducing supply voltages, it is possible to vary the supply voltage of an island depending
on system requirements. Among other challenges, this method requires the use of cells that have been
characterized at all voltages. Synopsys Scalable Polynomial Models (SPMs) support the necessary timing
and power information. Non-Linear look-up table Models (NLMs) can also be used for voltage-island designs.

An SoC can also be designed to power-down certain voltage islands to eliminate their leakage power.
Such islands require the use of power isolation cells, which can be simple AND gates. The outputs from a
powered-down section into an active power domain should never be allowed to float. Power isolation logic
ensures that all inputs to the active power domain are clamped to a stable value. Additionally, a state-
retention technique may be required so that the blocks can resume operation when powered-up. Powering-
down various islands’ voltages or scaling their voltages dynamically may also require power-sequencing
circuitry to ensure correct operation of the chip.

Multiple-threshold Design
Multiple supply-voltage islands work well with multi-threshold synthesis. Optimization meets timing goals by
using low-Vth cells on critical timing paths and high-Vth cells on non-critical paths. Note that better leakage
quality of results can be obtained by using state-dependent leakage models, if the silicon vendor provides
such models.
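
The idea of keeping low-Vth cells only where timing requires them can be sketched with a toy greedy pass. This is not Power Compiler's algorithm. It assumes each cell sits on a single path with a known slack, and that swapping a cell to high-Vth adds a fixed delay and saves a fixed amount of leakage. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    path: str                 # the single timing path this cell sits on (toy model)
    delay_penalty_ps: float   # extra delay if swapped to high-Vth (must be > 0)
    leakage_saving_nw: float  # leakage saved if swapped to high-Vth
    vth: str = "low"          # every cell starts as low-Vth


def assign_high_vth(cells, path_slack_ps):
    """Greedily swap cells to high-Vth while their path slack stays >= 0."""
    slack = dict(path_slack_ps)
    saved_nw = 0.0
    # Try the swaps with the best leakage saving per picosecond first.
    ranked = sorted(
        cells,
        key=lambda c: c.leakage_saving_nw / c.delay_penalty_ps,
        reverse=True,
    )
    for cell in ranked:
        if cell.delay_penalty_ps <= slack[cell.path]:
            cell.vth = "high"
            slack[cell.path] -= cell.delay_penalty_ps
            saved_nw += cell.leakage_saving_nw
    return saved_nw, slack


cells = [
    Cell("u1", "critical", 40.0, 10.0),
    Cell("u2", "critical", 40.0, 10.0),
    Cell("u3", "side", 50.0, 12.0),
    Cell("u4", "side", 50.0, 9.0),
    Cell("u5", "side", 50.0, 8.0),
]
saved_nw, remaining_slack = assign_high_vth(cells, {"critical": 0.0, "side": 120.0})

print(saved_nw, remaining_slack, {c.name: c.vth for c in cells})
```

In this toy case the cells on the zero-slack critical path stay low-Vth, while the non-critical path absorbs two high-Vth swaps before its slack runs out.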

A one- or two-pass synthesis flow can be used for multi-threshold designs, depending on the design team’s
methodology or preference. Initial synthesis may be performed with the low-Vth, high-performance library,
followed by an incremental compile using multi-Vth libraries to reduce leakage current. For designs in
which both timing and leakage are important, one-pass synthesis uses multi-Vth libraries simultaneously.
The design is first optimized for timing, then leakage power optimization is performed without affecting
the achieved timing (i.e., the worst negative slack, or WNS). The timing optimization is not degraded by
power optimization. The power optimization is followed by area optimization. The use of multi-Vth libraries
is recommended in the synthesis environment (using Power Compiler with Design Compiler or Physical
Compiler) when optimizing for leakage power for either the one- or two-pass flow.

The flow relies on the use of a reasonable leakage constraint, set in Power Compiler by the
set_max_leakage_power command.

Power Optimization in Synthesis


Synthesis tools have the ability to optimize designs for power consumption with techniques such as RTL
clock gating insertion and gate-level power optimization. These techniques are implemented by Power
Compiler in conjunction with Design Compiler and/or Physical Compiler.

RTL clock gating shuts down the clock to large register banks when the outputs of these flip-flops are not
needed. Figure 7 shows the difference between a clock gating circuit and the synchronous load enable
circuit that Design Compiler would otherwise use. The feedback net and multiplexer of the synchronous
load enable circuit are replaced by a latch and a two-input gate inserted in the register’s clock net.

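Fig 7a: Synchronous load-enable implementation without clock gating

    always @(posedge CLK)
      if (EN)
        D_out <= D_in;

    elaborate

[Diagram: the FSM drives EN; D_in, EN and CLK feed the register bank, which produces D_out.]

Fig 7b: Synchronous load-enable implementation with clock gating

    always @(posedge CLK)
      if (EN)
        D_out <= D_in;

    elaborate -gate_clock

[Diagram: the FSM drives EN into a latch; the latch output is gated with CLK to form G_CLK, which clocks the register bank; D_in feeds the register bank, which produces D_out.]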

Figure 7: Power optimization during synthesis — Power Compiler automatically inserts clock-gating circuits,
replacing typical Design Compiler implementations (a) with the gating circuit (b).

This type of clock gating has a relatively low impact on area because the gating circuits replace
muxes; in fact, it reduces the area used by 5 to 15 percent. Power Compiler implements the gating
automatically, and it requires no RTL code change, though it is possible to specify the gating manually
using a variety of coding styles.

Power Compiler also has the capability to replace the manually inserted clock gates with an ICG from the
library. This feature helps support the legacy blocks or IPs that have manual clock gates throughout the
physical flow. Power Compiler recognizes the ICG’s power-related attributes, which aid in the placement
of such cells. For advanced users of clock gating, Power Compiler helps obtain greater power savings by
performing multi-stage clock gating. In this technique, one clock gating cell feeds another clock gating cell
instead of a register bank. (This technique is also an RTL-based feature.)

RTL clock gating saves power in several ways. Internal power consumption decreases because the clock
does not continuously feed register banks, switching power decreases because of reduced capacitance on
the clock network, and power decreases further because downstream logic does not change.

When Power Compiler works with Physical Compiler, the placements for the clock gating cells are
optimized. Within the Physical Compiler flow, Power Compiler makes sure that the gate element cells are
placed close together and that the gating element is placed close to the sequential elements it drives. This
layout reduces the clock skew that can otherwise occur with clock gating.

Clock gating can reduce a chip’s testability unless specific DFT features are added. Because the clock
signal is gated with an internal signal, a test engineer cannot control the loading of the DFT scan flip-flops.
This problem is avoided by adding a test pin and assigning a fixed value (1’b1) to it during test compilation.
No specific coding style is required. Figure 8 shows a clock gating circuit with a control point added.

[Figure 8 diagram: labeled elements are the control point “OR” gate, Test_Mode, Enable, Clock, Reset, a mux, a +1 incrementer, the sequential circuit and Data_out.]

Figure 8: Clock gating circuit with added control point — Because clock gating makes part of the
circuit untestable, clock-gated designs require the addition of control points, as shown here.

The options of Power Compiler’s set_clock_gating_style command improve the chip’s testability
by specifying the amount and type of testability logic added during clock gating. It is possible to add a
control point for testing before or after the clock-gating latch, for example, and choose test_mode or
scan_enable mode. Other options add observability logic or setup and hold-time margin. Before using the
Design Compiler commands check_test or check_dft, run the following commands: hookup_testports
and set_test_hold 1 Test_Mode.

Note that clock gating should not be used on designs that have variables (or signals) from which Design
Compiler implements master/slave flip-flops. Design Compiler uses the clocked_on_also signal-type
attribute in implementing these flip-flops. At the abstraction level at which clock gating occurs, Power
Compiler does not recognize this attribute and will gate only the slave clock of the flip-flop. It is possible to
use the set_clock_gating_signals command to exclude specific design variables (or signals) that
are implemented as master-slave flip-flops:

    dc_shell> set_clock_gating_signals -design TOP -exclude { A B }

In general, the coding that works best is a basic synchronous load-enable implementation in one of four
styles that can be mixed or nested together:

• “If–Else” statements
• Conditional assignments
• “Case” statements
• “For” loops
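All four styles describe the same underlying behavior: a register that loads new data when its enable is asserted and otherwise holds its value. The following behavioral sketch, written in Python rather than an HDL and using hypothetical function names, compares a load-enable register built with a feedback mux against the same register with a gated clock. It is an illustration under simplified assumptions (one enable sample per clock cycle), not a model of any tool's output.

```python
def run_with_feedback_mux(data, enables, q=0):
    """Load-enable register: the flip-flop is clocked every cycle and a mux
    selects either the new data or the currently held value."""
    trace = []
    for d, en in zip(data, enables):
        q = d if en else q
        trace.append(q)
    return trace, len(trace)        # (outputs, clock edges seen by the flip-flop)


def run_with_gated_clock(data, enables, q=0):
    """Same register with a gated clock: an edge arrives only when enabled."""
    trace, edges = [], 0
    for d, en in zip(data, enables):
        if en:
            q = d
            edges += 1
        trace.append(q)
    return trace, edges


data, enables = [5, 6, 7, 8], [1, 0, 0, 1]
print(run_with_feedback_mux(data, enables))   # ([5, 5, 5, 8], 4)
print(run_with_gated_clock(data, enables))    # ([5, 5, 5, 8], 2)
```

The outputs are identical, but the gated version delivers clock edges only on the cycles in which the register actually loads.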

In addition to RTL optimization, Power Compiler optimizes power simultaneously with timing and area using
the following gate-level optimization techniques (in order of priority):

• Sizing
• Technology mapping
• Pin swapping
• Factoring
• Buffer insertion
• Phase assignment

These optimizations require the use of a power-characterized library. Because Power Compiler maintains
timing automatically and keeps area within the designer’s constraints, the tool provides “push-button” power
savings at the gate level.

High-level Power Management Example


To show the potential for high-level power management in SoCs, Synopsys partnered with ARM®, National
Semiconductor® and Artisan Components® to create a test chip that demonstrates dramatic power savings.
The chip uses specialized hardware and software to control the voltage and clock frequency of various chip
domains, applying high-level control for the voltage and frequency scaling techniques described earlier in
this paper.

The control elements include ARM Intelligent Energy Manager software that balances processor workload
and energy consumption. PowerWise hardware from National monitors performance and communicates
with voltage regulators to scale the supply voltage to the minimum operating level at each operating
frequency. This system compensates for silicon performance variations due to the manufacturing process
as well as run-time performance changes due to temperature fluctuations.

The 240-MHz chip is partitioned into three primary power domains: voltage-scaled CPU and memory
power domains and a standard fixed-voltage domain for the rest of the chip. The independent power
domains allow precise voltage control and current measurement for the CPU and RAM. Standard cells and
level shifters operate in the 0.7-1.32V range.

For cache-intensive workloads, both the power consumption and the precise time to process a workload
were measured to compare dynamic frequency scaling alone with dynamic voltage and frequency scaling.
Figure 9 summarizes the results normalized to the 1.2V operating voltage. Note that this diagram shows
the power savings only for the chip’s dynamic-voltage-and-frequency-scaling subsystem. Normally in such
SoCs, some of the chip will not be voltage scalable. Components such as external memories typically
operate at a fixed voltage, so design partitioning and planning must take into account the system-level
power savings.
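The effect of a fixed-voltage portion on system-level savings can be estimated with simple arithmetic. The sketch below is illustrative only: the function name and the 60 percent / 20 percent figures are hypothetical inputs chosen for the example, not measurements from the test chip.

```python
def system_power_fraction(scalable_share: float, scaled_fraction: float) -> float:
    """Fraction of baseline system power remaining when only part of the
    system is voltage and frequency scaled.

    scalable_share  -- share of baseline power drawn by the scalable domains (0..1)
    scaled_fraction -- power of those domains after scaling, relative to
                       their own baseline (0..1)
    """
    if not (0.0 <= scalable_share <= 1.0 and 0.0 <= scaled_fraction <= 1.0):
        raise ValueError("both arguments must lie between 0 and 1")
    fixed_share = 1.0 - scalable_share          # domains that cannot be scaled
    return fixed_share + scalable_share * scaled_fraction


# Hypothetical split: 60% of baseline power is scalable and drops to 20% of itself.
print(round(system_power_fraction(0.6, 0.2), 2))   # 0.52
```

In this hypothetical case an 80 percent reduction in the scalable domains yields only a 48 percent reduction for the system, which is why partitioning decisions matter.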

Figure 9: The ARM test chip shows that a combination of power-saving
techniques can reduce a device’s power consumption dramatically.

The figure shows that voltage and frequency scaling can significantly reduce energy consumption compared
to frequency scaling alone. Running at 120 MHz cuts power requirements by half, for example, but scaling
the supply voltage at the same time slashes power consumption to about 20 percent of full power.
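These figures are consistent with the standard first-order model of CMOS dynamic power, in which power is proportional to C·V²·f. The sketch below applies that model to the numbers above. It is an approximation that ignores leakage and any fixed overhead, the function names are hypothetical, and the supply voltage it derives is an inference from the model, not a value reported for the test chip.

```python
def relative_dynamic_power(freq_ratio: float, volt_ratio: float = 1.0) -> float:
    """Dynamic power relative to full speed, assuming P is proportional to V**2 * f."""
    return freq_ratio * volt_ratio ** 2


def volt_ratio_for(power_ratio: float, freq_ratio: float) -> float:
    """Voltage ratio that gives the requested power ratio at a given frequency ratio."""
    return (power_ratio / freq_ratio) ** 0.5


half_speed = relative_dynamic_power(0.5)    # 120 MHz vs. 240 MHz, voltage unchanged
v_ratio = volt_ratio_for(0.20, 0.5)         # ratio implied by a 20 percent result
print(half_speed, round(v_ratio, 3), round(1.2 * v_ratio, 2))   # 0.5 0.632 0.76
```

Under these assumptions, halving the frequency alone halves the power, and reaching about 20 percent of full power would correspond to a supply of roughly 0.76 V relative to the 1.2V reference.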

Summary
Dramatic power reductions such as those achieved by the Synopsys, ARM, National and Artisan test chip are
possible through a combination of high- and low-level power management techniques. The typical SoC may
not require all of these techniques, but mainstream solutions are available to meet all design requirements.

Choosing the right solutions depends on careful power analysis as well as understanding the capabilities
of available tools. Analyzing power requirements as early as possible in the design flow helps avoid power-
related disasters. Early analysis also makes power goals easier to attain because higher-level techniques
save the greatest amount of power.

About Synopsys Professional Services


Synopsys Professional Services provides a broad range of consulting and design services to chip
developers worldwide to help them achieve success in their design programs. These services address all
critical phases of the SoC development process and are tightly aligned with Synopsys’ EDA tools and IP
products to help customers accelerate their learning curves, develop and deploy advanced methodologies,
and achieve successful tape-outs. We offer customers a variety of engagement models to address their
project-specific and long-term design needs. For more information on Synopsys Professional Services visit
our website at www.synopsys.com/sps .

Author Biography
Jim Flynn
Senior IC Design Engineer, Synopsys Professional Services
Jim Flynn has been with Synopsys for 5 years and is currently a Senior Design Engineer working in the
Synopsys Professional Services group where he focuses on analog IC design and the design of high-speed
connectivity circuits (PCIE/SATA/XAUI PHY). Prior to Synopsys, Mr. Flynn worked at Lockheed Martin
as a circuit designer. Mr. Flynn has an MSEE in Analog IC Design from the Florida Institute of Technology.

Brandon Waldo
Senior IC Design Engineer, Synopsys Professional Services
Brandon Waldo is a Senior Design Consultant working in Synopsys Professional Services where he specializes
in low-power design, physical design and signal integrity analysis. He joined Synopsys in 2001 and has over
15 years of experience in semiconductor design. Prior to joining Synopsys, Mr. Waldo worked at Motorola
and Advanced Micro Devices doing full-custom, semi-custom and ASIC designs on several microprocessor
projects. Mr. Waldo has BS and MS degrees in Electrical Engineering from Texas A&M University.

References
Design Planning Strategies to Improve Physical Design Flows: Floorplanning and Power Planning, Synopsys Professional Services
White Paper, August 2003 (authors: Sachin Idgunjj, Steve Lloyd, Rick Mitchell, Ron Spillman, Jon Young).

BAE SYSTEMS Mission Specific Processor Technology
<http://wwwin.synopsys.com/sps/docs/marketing/techpapers/CSPAD_GOMAC_2003_Final_public.pdf4>
CSPAD GOMAC 2003
Dale Rickard, Richard Berger, Ernie Chan, BAE SYSTEMS
Synopsys Professional Services
Steve Patton and Robert Anderson, Lockheed Martin
Richard Brown, Dennis Sylvester, Matthew Gathaus, Harmander Deogun, University of Michigan
K. J. Ray Liu, Charles Pandana and Nitin Chandrachoodan, University of Maryland

A Hierarchical Power Analysis Methodology and Case Study
<http://wwwin.synopsys.com/sps/docs/marketing/techpapers/1_snugeu03_hierarchical-cs.pdf>
SNUG Europe 2003
James P. Flynn, Synopsys Professional Services

Challenges in the Hierarchical STA of a Low-Power 3G Wireless Application Platform
<http://wwwin.synopsys.com/sps/docs/marketing/techpapers/8_snugsj03_LowPowerSTA.pdf>
SNUG San Jose 2003
James SW. Song, Satyendra R.P.Raju Datla, Yuanqiao Zheng, Texas Instruments
Stewart Shankel, Kaijian Shi, Synopsys Professional Services

Clock Distribution and Balancing In a Large and Complex ASIC: Issues and Solutions
<http://wwwin.synopsys.com/sps/docs/marketing/techpapers/13_designcon03_omap.pdf>
DesignCon 2003
James Song, Sandeep Aggarwal, Texas Instruments, Inc.
Kaijian Shi, Stewart Shankel, Synopsys Professional Services

700 East Middlefield Road, Mountain View, CA 94043 T 650 584 5000 www.synopsys.com
©2007 Synopsys, Inc. Synopsys, the Synopsys logo, Design Compiler, Physical Compiler, VCS, PrimePower, and HSPICE are
registered trademarks and Power Compiler and Astro are trademarks of Synopsys, Inc. All other products or service names
mentioned herein are trademarks of their respective holders and should be treated as such.
Printed in the U.S.A. 03/07.CE.WO.06-14884