Technology Note

RTL Synthesis
Magma RTL synthesis includes:

High-level optimization, including expression synthesis, resource sharing, and automatic
clockgating.

Area optimization, including mapping to cell instances of HyperCell models.


A HyperCell model is a technology-specific scalable timing model created during library
preparation. The HyperCell model represents a range of cell sizes, rather than a single size.
The HyperCell model contains timing and area descriptions for a collection of cell drive
strengths of a particular function. For more detailed information, see Understanding
HyperCell Models on page 29.

Constraint-driven optimization (sometimes called timing-driven optimization), which includes
scan insertion, strength trimming, and arithmetic architecture selection.

Introduction to Magma RTL Synthesis


Magma RTL synthesis uses strength-based synthesis. The strength of a cell is the ratio between the
output capacitance and input capacitance of the cell. Mapping to cell instances of HyperCell models
enables use of this ratio for balancing cell sizes and delays to maintain constant timing. By using
HyperCell models and strength-based synthesis, Talus Design can defer cell sizing until later in the
design flow, during placement.
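
To make the definition of strength concrete, the following minimal Python sketch computes the ratio and shows how, for a fixed strength, the input capacitance (used here as a stand-in for cell size) follows from the load. The function names and the capacitance values are illustrative assumptions; they are not part of Talus Design.

```python
def strength(c_out_ff, c_in_ff):
    """Strength as defined above: output capacitance / input capacitance."""
    if c_in_ff <= 0:
        raise ValueError("input capacitance must be positive")
    return c_out_ff / c_in_ff


def input_cap_for(c_out_ff, target_strength):
    """Input capacitance that keeps the strength at target_strength for a load."""
    if target_strength <= 0:
        raise ValueError("strength must be positive")
    return c_out_ff / target_strength


if __name__ == "__main__":
    # Hypothetical loads in femtofarads; the strength is held at 4 for both.
    for load in (20.0, 80.0):
        c_in = input_cap_for(load, 4.0)
        print(f"load={load} fF -> input cap={c_in} fF, strength={strength(load, c_in)}")
```

Under these assumptions, a larger load at the same strength simply implies a proportionally larger cell, which is why the actual size can be chosen later, once the load is known.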
Figure 1 shows the RTL synthesis portion of the Talus Design flow. This chapter describes the steps
in more detail. For steps related to DFT, see the Design for Test Technology Guide.

Talus 1.0

Figure 1: RTL Synthesis Flow Diagram

[Flow diagram. The steps it shows, in order:]
1. Import library Volcano: prepare and load a library Volcano.
2. Import RTL: analyze and elaborate the RTL, and select the initial architecture.
3. Perform high-level optimization: perform automatic clockgating, share resources, and
perform expression synthesis.
4. Select a scan style.
5. Perform area-based optimization: map to instances of HyperCell models and optimize area.
6. Apply timing constraints.
7. Flatten user hierarchy: flatten the user-specified hierarchy in the design.
8. Check DFT rules and insert scan.
9. Perform constraint-based optimization: optimize for timing, assign strength values, and
perform strength trimming.
10. Perform timing and strength analysis: analyze the final results.

Selecting a scan style, checking DFT rules, and inserting scan are described in the Design for Test
Technology Guide.

Importing a Library Volcano


A library Volcano is a Magma database that contains the information necessary to synthesize your
RTL and perform prototyping. The necessary information includes a logic and timing library,
technology rules, and a physical library. If you plan to design for test, include DFT rules in the
Volcano.
Command
import volcano directory
Example
import volcano ./volcanoes/cmos_tech13.volcano
The technology library is not used during high-level optimization (fix rtl). High-level optimization
produces a technology-independent implementation using Magma primitives. The technology library
is used later in synthesis, during area-based optimization (fix netlist).
For information about creating a library Volcano, see the Magma Basics Technology Guide.

Importing RTL
The import rtl command performs both analysis and elaboration of HDL files.
When you import RTL, Talus Design reads and checks the Verilog and VHDL files for correct syntax.
During elaboration, the tool resolves all unresolved references and instantiates parameterized
modules and entities.
Talus Design RTL synthesis is completely compatible with industry-standard coding styles. Talus
Design supports the Synopsys coding style, commonly used pragmas, and instantiated DesignWare
components. For more information, see Using Pragmas on page 5 and Using DesignWare
Components on page 6.
The VHDL elaboration engine provides correct-by-construction implementation. Correct-by-construction
means that by using simultaneous type and data inference, there should be no formal
mismatches with the IEEE VHDL Language Reference Manual (LRM).
During elaboration, don't-care conditions are detected and automatically optimized without any
additional steps on your part.
After you use the import rtl command, a Talus library called /work is created. After elaboration, the
Talus /work library contains the design data. You can view the design using the Talus GUI or export it
as a Volcano.

Note: Do not export your elaborated design Volcano. Run fix rtl first, and then use the export
volcano command if you need to export a design Volcano.

Importing Different Versions of VHDL and Verilog


Talus Design can import VHDL-93 and VHDL-87. Talus Design also supports Verilog IEEE Std 1364-1995
and IEEE Std 1364-2001.
Table 1 summarizes the commands to use for importing the supported versions of HDL.
Table 1: Importing Different Versions of HDL

To Import    Use These Commands      Note
VHDL-93      import rtl -vhdl        VHDL-93 is the default version of VHDL.
VHDL-87      import rtl -vhdl -87
Verilog      import rtl -verilog     Standard Verilog is the default format.

The following example shows how to import a series of Verilog files:


Example
import rtl -analyze -verilog -include ./include_dir \
./rtl/chip.v ./rtl/cntr.v ./rtl/interface.v

Mixing VHDL and Verilog HDL


Use separate import rtl commands to mix VHDL and Verilog files in a design. You can mix Verilog
and VHDL objects at any level in the design hierarchy. Talus Design does not automatically detect
the type of HDL; you must set the -vhdl or -verilog switch. Example 1 shows how to mix VHDL and
Verilog.

Example 1: Importing Mixed-Language RTL


import rtl -analyze -vhdl \
    reg_file.vhd \
    alu.vhd \
    cntr.vhd
import rtl -analyze -verilog \
    chip.v \
    pads.v \
    interface.v \
    state.v \
    core.v
run rtl elaborate -verilog -case preserve chip
Note: When elaborating a design with mixed-language RTL, use the switch (-vhdl or -verilog) that
matches the language of the top-level design. Then, elaborate the design from the top level.

Analyzing and Elaborating Separately


You can run analysis and elaboration separately by using the import rtl -analyze and run rtl
elaborate commands. Analyzing and elaborating separately writes more information to the log files
than analyzing and elaborating in one step. The additional log file information can be useful if you are
debugging a problem in your design.
Example
import rtl -analyze chip
run rtl elaborate chip
Note: The default behavior of import rtl is to analyze and elaborate your design. The -analyze
option is necessary only if you later elaborate with run rtl elaborate.

Using Pragmas
Pragmas are directives that cause the compiler to perform specialized tasks during synthesis. Talus
Design supports several pragmas used by legacy synthesis tools. These directives appear as
comments in Verilog source files; therefore, they are ignored by simulation tools.
Table 2 shows the supported pragmas and how they are used. Although Table 2 uses the
// synthesis and // synopsys syntax of Verilog designs, Talus Design also supports the --synthesis
and --synopsys syntax used in VHDL.

Table 2: Pragma Directives

Pragma                                Function

// synopsys translate_on              Allows the synthesis parser to interpret the subsequent
// synthesis translate_on             lines in the source file as intended for synthesis only.

// synopsys translate_off             Turns off the parsing of subsequent lines in the source
// synthesis translate_off            file for synthesis.

// synopsys parallel_case             Forces Talus Design to generate multiplexer logic rather
// synthesis parallel_case            than a priority encoder for the given case statement. The
                                      directive must be issued on the line on which the case
                                      statement is defined. The parallel_case directive can
                                      result in mismatches between simulation results and the
                                      synthesized logic.

// synopsys full_case                 Prevents unwanted latches associated with missing default
// synthesis full_case                case values. The directive must be issued on the line on
                                      which the case statement is defined.

// synopsys map_to_module             Flags a Verilog function for implementation as a distinct
// synthesis map_to_module            component in the design.

// synopsys return_port_name          Identifies a return port, which is used in conjunction
// synthesis return_port_name         with the map_to_module directive. Verilog functions do
                                      not have return ports.

// synopsys sync_set_reset            Identifies synchronous control signals for register and
// synthesis sync_set_reset           latch inference.

// synopsys async_set_reset_local     Identifies asynchronous control signals for register and
// synthesis async_set_reset_local    latch inference.

Using DesignWare Components


Talus Design supports direct instantiation of many Synopsys DesignWare Foundation components.
DesignWare components are prefixed with the name of the library.
Example
DW01...
DW02...
DW03...

Initially, all components are assigned minimum-area architectures. During constraint-driven
optimization, architectures are reselected to meet timing constraints. You can control how Talus
Design chooses default architectures; for information on doing this, see Setting Default Architecture
for Arithmetic Operators on page 20.

Adding Target Libraries


Use the config rtl targetlib command to add target libraries.
Adding target libraries can be necessary if there are instances in the RTL design where Verilog
module descriptions or VHDL entity architecture descriptions are not found when reading in the RTL
design. Talus Design checks the target libraries to find any matching entity that can be used to bind
such instances.
For example, you might have RAM or ROM modules in the design that are not in the default
technology library. In this case, Talus Design cannot find the referenced cells unless you specify the
library.
For information on power gating library attributes, see the section Adding Library Attributes for
Power Gating on page 22.

Flattening Hierarchy During RTL Optimization


By default, Talus Design maintains the design hierarchy you specify. Thus, after the RTL files are
read and elaborated, the full hierarchy is present. During elaboration, Talus Design might add some
tool-induced hierarchy during operator inference. For example, when Talus Design infers an adder
module from an addition operator (+) in the RTL code, the tool creates hierarchy around the adder to
enable certain internal optimization steps later in the flow.
You can flatten some or all of the design hierarchy to extend the scope and improve the results of
RTL optimizations that are otherwise limited by hierarchical boundaries. The fix rtl command can
flatten tool-induced hierarchy and user hierarchy according to settings you make with the config rtl
autoflatten command.
Although the exact size of RTL modules is unknown during RTL optimization, Talus Design makes an
estimate based on the RTL operation types and bit widths of the signals. You can specify global size
thresholds for the flattening of RTL user models and tool-induced hierarchy by using the config rtl
autoflatten command. To specify autoflattening settings on a particular model, use the force rtl
autoflatten command.
In any case, you flatten the entire design after fix netlist or after scan insertion, as shown in
the flow in Figure 1 on page 2.

Configuring Hierarchy Flattening


During fix rtl (or other fix... commands), Talus Design performs flattening according to the settings
you make with the config rtl autoflatten command and the force rtl autoflatten command.
Flattening during optimization is sometimes called autoflattening because the actual flattening is
done automatically during the fix... commands if you have configured options using config rtl
autoflatten. By setting the config rtl autoflatten command you can:

Set the minimum size of any user model to remain after flattening.
config rtl autoflatten -min integer
After flattening, models of this size or less remain untouched by flattening.

Set an upper size limit for the combined parent and child user model after flattening.
config rtl autoflatten -max integer
When flattened into the parent cell, if the combination is greater than this maximum size,
flattening does not take place on the child cell.

Flatten only the user models that are purely combinational and that meet the limits you
specify with -min and -max. By default, this option is turned off.
config rtl autoflatten -comb

Set a bit-width threshold for flattening tool-inferred models.
config rtl autoflatten -bitwidth_threshold integer
You can use this option to flatten tool-inferred models at or below the specified bit width. With
the default setting of 3, tool-inferred models (an adder, for example) are flattened if they have
a bit width of 3 or less.

As noted previously, the exact size of a user model is unknown during high-level optimization (fix rtl).
Talus Design estimates the size based on the RTL operation types and bit widths of signals.
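
The two most clearly stated rules above, the bit-width threshold and the -max limit, can be sketched as simple predicates. This is only an illustration of the described behavior, not the tool's implementation. The function names are hypothetical, sizes are in arbitrary estimated units, and the sketch assumes that the combined size is the sum of the parent and child estimates. The -min and -comb options are deliberately left out.

```python
DEFAULT_BITWIDTH_THRESHOLD = 3  # default value stated in the text above


def flatten_tool_inferred(bit_width, threshold=DEFAULT_BITWIDTH_THRESHOLD):
    """Tool-inferred models at or below the bit-width threshold are flattened."""
    return bit_width <= threshold


def child_fits_in_parent(parent_size, child_size, max_size):
    """-max rule: skip flattening when the combined size exceeds max_size.

    Assumption: the combined size is parent_size + child_size.
    """
    return parent_size + child_size <= max_size


if __name__ == "__main__":
    print(flatten_tool_inferred(3))              # 3-bit adder, default threshold
    print(flatten_tool_inferred(8))              # 8-bit adder, default threshold
    print(child_fits_in_parent(600, 400, 1000))  # combined size equals the limit
    print(child_fits_in_parent(600, 401, 1000))  # combined size exceeds the limit
```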

To perform autoflattening, integrate the following steps into your synthesis flow:
1. Configure autoflattening options. Use the following syntax:
config rtl autoflatten
[-min integer]
[-max integer]
[-comb]
[-bitwidth_threshold integer]
[-rtl]
Set options according to your design and compute resource requirements. By default, the -min,
-max, and -comb options apply only to gate-level user models. Use the -rtl switch to
apply the settings to RTL user models.
2. After importing the design, generate initial autoflatten reports.
report rtl autoflatten $m -max_depth int -file output_file
The report rtl autoflatten command uses your autoflattening settings to predict the results of
autoflattening before you run fix rtl. After you run fix rtl, you can use the report rtl
autoflatten command to report the actual results. By default, the report is not hierarchical.
Here, the -max_depth option directs Talus Design to report down int levels of hierarchy from
$m and send the report to output_file. The default for -max_depth is 3.
3. Import your constraint file in SDC format.
import sdc $m -lib $l file.sdc -translation file1.tcl -rtl
The import sdc -rtl command does not annotate timing constraints on the design for timing-driven
optimization. Instead, this step identifies all objects in the design that will later be
constrained and preserves those objects during autoflattening. Because timing constraints
are not annotated when you use the -rtl switch, before timing-driven optimization (fix time)
you must import the SDC timing constraints again using import sdc without the -rtl switch.
During the importing of the SDC file, Talus Design translates the constraints into Magma Tcl
commands and creates an M-Tcl file. The M-Tcl file also contains information messages and
warnings. To specify a name for the translation file, use the -translation file_name option.
4. Perform high-level optimization.
fix rtl $m
The fix rtl command performs autoflattening according to the options set in step 1. For more
information about high-level optimization, see Performing High-Level Optimization on
page 10.

Explicitly Manipulating Hierarchy


To manipulate hierarchy explicitly, use force keep to maintain a cell or model. For example, if you
use the force keep command on a model, you preserve the hierarchy of the model instances. The
force keep command preserves the named design object, whether it be a cell, net, port, entity,
model, or bus. Use the clear keep command to clear the keep attribute on a design object.
You can use the data flatten command to flatten model instances explicitly. You can use this
command immediately after elaboration (using the GUI or command line), or you can run it from a
design-specific script. Using data flatten is not normally necessary.
To avoid keeping hierarchical levels throughout the rest of the flow, run clear keep to clear the force
keep attributes after flattening. Doing so allows global flattening of the entire design after fix netlist,
as shown in the synthesis flow in Figure 1 on page 2.
For more information about the force keep and clear keep commands, see the online man pages.

Performing High-Level Optimization


High-level optimization performs RTL optimization on an elaborated RTL design and generates a
technology-independent implementation using Magma primitives. Use the fix rtl command to
perform high-level optimization. The fix rtl command:

Performs optimization steps

Performs automatic RTL clockgating

Finds, groups, and optimizes arithmetic expressions

Performs final synthesis of a complete netlist

Flattens the hierarchy under a model or cell

The fix rtl command uses the settings you specify in the following config... and force... commands:
At the Global Level       At the Cell or Model Level    Description
config rtl autoflatten    force rtl autoflatten         Configures global settings for flattening
                                                        hierarchy (the data flatten command)
config rtl clockgate      force rtl clockgate           Specifies clockgating options
config rtl datapath       force rtl datapath            Controls selection of architecture for
                                                        data-path elements
config rtl merging        force rtl merging             Controls merging of data-path elements
config rtl sharing        force rtl sharing             Controls sharing of data-path elements

See the fix rtl online man page for information about switching certain optimizations on and off.

Enabling FSM Optimization


Finite state machine (FSM) optimization allows you to specify the implementation style for any given
FSM within a design. Based on the implementation style of the specified FSMs, there can be a direct
impact on timing, area, DFT, and power optimization of the design.
To enable FSM optimization, set config rtl fsm on. FSM optimization occurs during high-level
optimization with fix rtl or if you explicitly invoke the run rtl fsm command. The default setting for
config rtl fsm is off. The config rtl fsm command must be set before invoking fix rtl or run rtl fsm.

Specifying an Optimization Style


Use either the force rtl fsm command or the config rtl fsm command to specify a particular FSM
style for a model. For more details about syntax, see the online man pages.
The following FSM styles are supported:

binary: Talus Design infers a binary encoding style FSM in all instances in the current
model.

one_hot: Talus Design infers a one-hot encoding style FSM in all instances in the current
model.

gray: Talus Design infers a Gray code encoding style FSM in all instances in the current
model.

auto (default): Talus Design infers the most efficient FSM style based on the RTL coding
style.
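
For readers unfamiliar with the three explicit encodings, the following Python sketch generates the state codes each style would assign to an FSM with a given number of states. It uses the standard textbook definitions of binary, Gray, and one-hot codes; the function names are illustrative, and the sketch makes no claim about how Talus Design orders or assigns states internally.

```python
def _width(num_states):
    """Number of bits needed to give num_states states distinct codes."""
    return max(1, (num_states - 1).bit_length())


def binary_codes(num_states):
    """Plain binary count: state i gets the value i."""
    w = _width(num_states)
    return [format(i, f"0{w}b") for i in range(num_states)]


def gray_codes(num_states):
    """Reflected Gray code: consecutive states differ in exactly one bit."""
    w = _width(num_states)
    return [format(i ^ (i >> 1), f"0{w}b") for i in range(num_states)]


def one_hot_codes(num_states):
    """One flip-flop per state: exactly one bit is set in each code."""
    return [format(1 << i, f"0{num_states}b") for i in range(num_states)]


if __name__ == "__main__":
    for name, fn in (("binary", binary_codes), ("gray", gray_codes), ("one_hot", one_hot_codes)):
        print(name, fn(4))
```

Binary and Gray encodings need the fewest flip-flops, while one-hot uses one flip-flop per state in exchange for simpler next-state decoding, which is one reason the chosen style affects timing, area, and power.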

Running FSM Optimization


The run rtl fsm command runs FSM optimization. You can invoke the command explicitly or from fix
rtl during high-level optimization.

Reporting FSMs in the Design


To view the FSMs in the design, use the report rtl fsm command. This command reports the FSMs
found or optimized in the design.

Generic RTL Optimizations


After any manual or automatic hierarchy manipulation, the run rtl optimize command performs the
generic RTL optimization steps.
The run rtl optimize command accomplishes these tasks:

Inference of storage elements, such as flip-flops and latches

Constant propagation

Removal of various unnecessary data flow graph nodes

The run rtl optimize command optimizes all models that must proceed through the rest of the Talus
Design flow. When you set the required model argument to the top model of the design, the
command hierarchically works down from the given model.

Setting Options for Clockgating


Clockgating reduces power consumption by switching off the clock to flip-flops when the values of
those flip-flops do not change.
To enable global clockgating and set global clockgating options, use the config rtl clockgate
command before running fix rtl. To enable clockgating and set clockgating options on specific
models, use the force rtl clockgate command. Settings you make locally on a model by the force rtl
clockgate command override any global settings of the config rtl clockgate command.
After you enable clockgating and set options, clockgate insertion takes place during fix rtl. When
clockgate insertion is enabled, Talus Design detects synchronously enabled registers and replaces
them with appropriate logic according to the settings of config rtl clockgate or force rtl clockgate.
With the config rtl clockgate or force rtl clockgate command, you can:

Select a latch-based (default) or latch-free clockgating style.

Set the minimum number of bits for gating (the default is 4).

Choose whether test control points are inserted, and determine their location in relation to the
gating latch.
By default, no test point is inserted; however, when you insert a test point, the default location
is before the latch. Test point insertion applies only to a latch-based clockgating style.

Perform hierarchical clockgating with the -share option. By default the global clock is routed
to all registers. Hierarchical clockgating moves the gating point as far up the hierarchy of the
clock tree as possible. This reduces the overall number of gating points.

Perform advanced register clockgating with the -effort option. This restructures the enable
logic and the hold MUX to reduce the overall enable logic.

Create clockgating logic during insertion (sometimes called discrete clockgating) or use an
integrated clockgating cell (ICG) from your library.
Using ICGs can reduce clock skew and improve routability by reducing the total number of
pins. Discrete clockgates and ICGs are fully supported by Talus Design scan insertion;
however, you must identify the ICGs before scan insertion. For more information about
identifying integrated clockgates for scan, see the Design for Test Technology Guide.
To find out if your library has ICGs, you can write a short script that searches for the
appropriate library attribute.
Example
# Print every model in library $l that has the integrated clockgating cell attribute.
proc identify_icgs {l} {
    data loop e lib_entity $l {
        data loop mod entity_model $e {
            #puts $mod
            if { [data get -exists $mod clock_gating_integrated_cell] } {puts $mod}
        }
    }
}

Talus Design supports both rising- and falling-edge sequential elements during clockgate insertion as
well as positive edge-triggered or negative edge-triggered elements.
Table 3 lists clockgating options. For complete syntax and details, see the online man pages.
Table 3: Summary of Clockgating Options

To do this: Control whether gating logic is latch based or latch free in a design.
Use this option: config rtl clockgate on -style latched | combinational
Note: Latch-based clockgates are the default. A latch-based clockgating style avoids glitches
on the gated clock signal. Use a latch-free (combinational) style only if you have special
requirements.

To do this: Set a minimum bit threshold for clockgating in a design.
Use this option: config rtl clockgate on -min_bits number
Note: Clockgating introduces some overhead in the design by adding gating logic. You can set
the minimum bit width for synchronously enabled registers in a clock domain. The default is 4.

To do this: Insert test control points during clockgating in a design.
Use this option: config rtl clockgate on -test before | after
Note: Test control points are not inserted by default. However, when you use the -test option,
the default location is before the gating latch.

To do this: Use a specific ICG when inserting clockgates in a design.
Use this option: config rtl clockgate on -integrated cell_name
Note: Use cell_name to specify the name (and library path) of the ICG to use. Your library
must contain at least one ICG and be characterized.

To do this: Use any appropriate ICG to insert clockgates in a design.
Use this option: config rtl clockgate on -integrated ""
Note: By using an empty set of quotation marks, clockgating can choose any appropriate ICG.
You can use additional command options such as -style and -test if your ICGs are
appropriately characterized. To use an ICG, your library must contain at least one ICG and be
characterized.

To do this: Set clockgating options locally on a model instead of globally on a design.
Use this option: force rtl clockgate on model [-hier] [additional options]
Note: By default, this command acts only on the specified model. Use the -hier option to
propagate clockgate settings downward in the hierarchy of the model.

To do this: Report the potential or results of clockgating.
Use this option: report rtl clockgate model
Note: This command considers your config rtl clockgate settings and generates a detailed
report. If run before clockgating, the report shows the potential results of clockgating. After
clockgating, the report shows the results of clockgating.
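
The effect of the minimum bit threshold can be sketched in Python. This is an illustration of the idea only, not the tool's algorithm. It assumes that register bits are grouped by clock and enable signal and that a group is gated when its total width reaches the threshold; the function name, the tuple layout, and the register names are hypothetical.

```python
from collections import defaultdict

DEFAULT_MIN_BITS = 4  # default value stated in Table 3


def select_gating_groups(registers, min_bits=DEFAULT_MIN_BITS):
    """Return {(clock, enable): [register names]} for groups wide enough to gate.

    registers: iterable of (name, clock, enable, bits); enable is None for a
    register without a synchronous enable, which is never a gating candidate.
    """
    widths = defaultdict(int)
    members = defaultdict(list)
    for name, clock, enable, bits in registers:
        if enable is None:
            continue
        widths[(clock, enable)] += bits
        members[(clock, enable)].append(name)
    return {key: names for key, names in members.items() if widths[key] >= min_bits}


if __name__ == "__main__":
    regs = [
        ("q", "clk", "E", 4),      # 4 bits on enable E
        ("flag", "clk", "F", 1),   # 1 bit on enable F, below the default threshold
        ("cnt", "clk", None, 8),   # no enable
        ("a", "clk", "G", 2),      # 2 + 2 bits share enable G
        ("b", "clk", "G", 2),
    ]
    print(select_gating_groups(regs))
    print(select_gating_groups(regs, min_bits=10))
```

Raising the threshold trades fewer gating cells (less overhead) against fewer gated registers (less power saving).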

Figure 2 shows a simple circuit that can benefit from clockgate insertion. In this example, the
enabled flip-flops clock in new values when the signal on the enable (E) input is high.

Figure 2: Design Before Clockgating

reg [3:0] q;
always @ (posedge clk)
begin
  if (E)
    q <= d;
end

[Schematic: a bank of four enabled flip-flops q[3:0] with the 4-bit data input d, the enable
input E, and the clock clk.]

Figure 3 shows the result of clockgate insertion.


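Figure 3: Design After Clockgating

[Schematic: four D-type flip-flops with data input d, clocked by the output of the gating
logic. The gating logic consists of a latch (cg_latch) on the enable signal e and an AND gate
(cg) that combines the latch output with clk.]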

The enabled flip-flops are replaced by D-type flip-flops and an AND gate is inserted in the clock path
(cg in Figure 3). The AND gate gates the clock. By using the latch (cg_latch in Figure 3), the timing
requirements on the gating signal (not E) are relaxed. Figure 3 shows a discrete clockgating solution.
If you use an ICG, the gating logic is contained within a single library cell.
For more information, see the Clock Implementation Technology Guide.
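
The reason the latch relaxes the timing of the gating signal can be seen in a small sampled-waveform simulation. The Python sketch below compares a latch-based gate with a plain AND gate. It assumes the usual arrangement for rising-edge flip-flops, in which the latch is transparent while the clock is low and holds its value while the clock is high; the function name and the waveforms are illustrative.

```python
def simulate_gating(clk, enable, latched=True):
    """Return the gated clock for sampled clk and enable waveforms (lists of 0/1).

    latched=True:  the enable passes through a latch that is transparent while
                   clk is 0 and holds while clk is 1, then is ANDed with clk.
    latched=False: clk is ANDed directly with enable (latch-free style).
    """
    gated = []
    held = 0
    for c, e in zip(clk, enable):
        if c == 0:
            held = e  # latch is transparent during the low phase
        gated.append(c & (held if latched else e))
    return gated


if __name__ == "__main__":
    clk    = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
    # The enable falls during the first high phase and rises during the second.
    enable = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print("latch-based:", simulate_gating(clk, enable, latched=True))
    print("latch-free: ", simulate_gating(clk, enable, latched=False))
```

In the latch-free result, the first clock pulse is cut short and a partial pulse appears in the second high phase. With the latch, the gated clock contains only full-width pulses, because enable changes during the high phase are not seen until the clock goes low.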

Expression Merging and Resource Sharing


The fix rtl command performs several optimizations on arithmetic operations and expressions after
clockgating, most importantly expression merging and resource sharing.

Expression Merging
Expression merging combines multiple operators into a single expression for synthesis. Synthesizing
larger expressions as a whole can improve both area and delay. If arithmetic functions are not
merged, they are synthesized on an operation-by-operation basis.
Expression merging is enabled by default. To disable or re-enable it, use the config rtl merging
command.
Figure 4 illustrates tree-wise connected operations that are merged into expressions:
Figure 4: Expression Merging of Tree-Wise Connected Operations

assign y = a + b + c * d + e + f;

[Schematic: five operation cells, C3 through C7, connected as a tree: one multiplier
(star_2_2_2) and four adders (plus_2_2_2), with inputs a through f.]

Talus Design usually introduces operation cells when arithmetic operators are used in the RTL
description. During expression merging, all operation cells in the given model are found
hierarchically. Those that are connected in a tree-wise manner (as in Figure 4) are grouped into
expression models with an additional level of hierarchy (tool-induced hierarchy). The functionality of
the expression model is conveyed in an expression and implemented by the module generator.

Resource Sharing
Resource sharing saves area, when possible, by sharing resources such as adders and subtractors,
thus reducing the amount of circuitry needed. This optimization runs on arithmetic operations and
expressions.
Resource sharing occurs both before and after expression merging when you run the run rtl
expression command. Set the scope of resource sharing with the config rtl sharing and force rtl
sharing commands.
Resource sharing can result in a delay penalty. By default, run rtl expression applies maximum
sharing to save area. In rare cases, when sharing limits the performance later in the flow, you can
use the force rtl sharing command to locally prevent sharing.
There are two types of resource sharing:

Single-operation sharing
Sharing on an operation-by-operation basis. Single-operation sharing shares identical
operations over the entire design before expression grouping occurs.

Expression sharing
Sharing parts of arithmetic expressions after expression grouping occurs. Expression sharing
can share operations that are not single operations in the RTL or can map slightly different
operations, such as plus and minus, on shared circuitry.

Figure 5 shows expression sharing of operations that are not single operations in the RTL.

Figure 5: Sharing Equal Operations

assign z = a + b + c;
assign y = a + c; // a + c will be shared

[Schematic. Before expression sharing: three adder cells, C18, C23, and C27 (types plus_7_7_8
and plus_8_7_9). The operation a + c is not a single operation in the RTL. After expression
sharing: the expressions exp_0, exp_1, and exp_2 with inputs a, b, and c, and the shared
result z_RS.]
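
The saving in Figure 5 can be reproduced with a small Python sketch. It treats each sum as a bag of operands, relies on addition being commutative and associative, and rewrites a larger sum to reuse a smaller one whose operands it contains. The function names are hypothetical, bit widths and overflow are ignored, and the sketch is only one possible way to find such sharing; it is not the algorithm used by run rtl expression.

```python
from collections import Counter


def adder_count(sums):
    """Two-input adders needed when every sum is built independently."""
    return sum(len(operands) - 1 for operands in sums.values())


def share_subsums(sums):
    """Rewrite each sum to reuse any other whole sum whose operands it contains.

    sums: {result name: list of operand names}. In the returned dictionary an
    operand may be the name of another sum, which stands for a reused result.
    """
    done = {}    # result name -> Counter of its original (leaf) operands
    result = {}
    for name, operands in sorted(sums.items(), key=lambda item: len(item[1])):
        remaining = Counter(operands)
        reused = []
        for other, need in sorted(done.items(), key=lambda item: -sum(item[1].values())):
            if not (need - remaining):  # all operands of `other` are still available
                remaining -= need
                reused.append(other)
        result[name] = reused + sorted(remaining.elements())
        done[name] = Counter(operands)
    return result


if __name__ == "__main__":
    rtl = {"z": ["a", "b", "c"], "y": ["a", "c"]}
    shared = share_subsums(rtl)
    print(shared)
    print("adders before:", adder_count(rtl), "after:", adder_count(shared))
```

For the two assign statements in Figure 5, z is rewritten as y + b, so the adder count drops from three to two even though a + c never appears as a single operation in the expression for z.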

Figure 6 illustrates mapping of slightly different operations on shared circuitry. This sharing can be
applied only after verification of the mutual exclusivity of the results.

Figure 6: Sharing Slightly Different Operations

always @(a or b or c or s)
begin
  if (s)
    z = a + b - c;
  else
    z = a + c; // a + c and a - c will be shared
end

[Schematic. Before expression sharing: the cells C3, C4, and C17 (two plus_8_8_8 adders and
one minus_8_8_8 subtractor) feed a multiplexer (merge_4, type mux_2_8) selected by s. After
expression sharing: a single expression, exp_0.]
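
Because the two branches in Figure 6 are mutually exclusive, a + c and a - c are never needed at the same time and can use one adder/subtractor controlled by s. The Python sketch below models the original and one possible shared arrangement as fixed-width arithmetic and checks that they agree. The source does not show the internal structure of the shared expression, so the arrangement, the function names, and the default width are assumptions.

```python
def unshared(a, b, c, s, width=8):
    """Direct model of the RTL: separate operators in each branch."""
    mask = (1 << width) - 1
    return ((a + b - c) if s else (a + c)) & mask


def shared(a, b, c, s, width=8):
    """One adder/subtractor for a +/- c, followed by an adder that adds b only when s is set."""
    mask = (1 << width) - 1
    t = (a - c) if s else (a + c)   # shared add/subtract unit, mode selected by s
    return (t + (b if s else 0)) & mask


def equivalent(width=4):
    """Exhaustively compare both models for all inputs of the given width."""
    values = range(1 << width)
    return all(
        unshared(a, b, c, s, width) == shared(a, b, c, s, width)
        for a in values for b in values for c in values for s in (0, 1)
    )


if __name__ == "__main__":
    print("equivalent for 4-bit operands:", equivalent(4))
```

The exhaustive check plays the role of the verification step mentioned above: the shared circuit is acceptable only because both versions produce the same result for every value of s.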

You can enable or disable both kinds of resource sharing with the config rtl sharing command
(global) and the force rtl sharing command (for specific design modules).

RTL Implementation
RTL implementation maps all RTL constructs and expressions to Magma primitives.
The fix rtl command completes its functions by running the module generator for arithmetic
constructs and expressions.
Talus Design has an extensive RTL library of data-path elements, each with a range of architectures
(area versus speed). For example, arithmetic modules can be implemented using different
architectures such as carry ripple, carry lookahead, carry select, and carry save. These optimized
modules can be automatically inferred from the RTL description or manually instantiated.
The module generator synthesizes these modules and implements expressions found during the
fix rtl command. Use the config rtl datapath command to determine which architectures the
module generator uses.
The best architecture for implementing a module depends on the module's context (for example,
timing). Because the amount of information about context is limited at this point in the flow, these
operators are marked as data-path elements and an initial, smallest-area implementation is
generated. When more physical and timing information becomes available later in the design flow,
Talus Design changes the initial implementation of data-path elements to the best architecture for
that context.

Setting Default Architecture for Arithmetic Operators


When Talus Design imports RTL and infers arithmetic operators, the default architecture initially
inferred by the tool is that of the smallest area. You can change the default by using import rtl
-arithmetic. Valid arguments for the -arithmetic option are:

Argument: auto
Function: Automatically infers the fastest or smallest architecture, as appropriate.
Note: Default. This option provides the most flexibility. During elaboration, Talus Design
implements the smallest architecture. However, during later synthesis this option allows Talus
Design to swap architecture to the smallest or fastest, as appropriate.

Argument: small
Function: Infers the smallest architecture.
Note: Talus Design elaborates the smallest architecture and retains it throughout the design
flow.

Argument: fast
Function: Infers the fastest architecture.
Note: Talus Design elaborates the fastest architecture and retains it throughout the design
flow.

During subsequent high-level, area-based, constraint-based, or physical optimization, you can
control data-path element architecture by using the config rtl datapath command. See the online
man page for more information.

Example: Performing High-Level Optimization


Example 2 illustrates the following tasks:

Inserting latch-based clockgating with a bit-width threshold of 10 bits.

Inserting testability logic, using the test_ena pin to control it.

Setting all adders to fast carry-lookahead architecture (this overrides the automatic
architecture selection capability).

Performing high-level optimization.

Example 2: Example Code for Performing High-Level Optimization


config rtl clockgate on -style latched -min_bits 10
config scan clockgate test_mode test_ena
config rtl datapath cpa_arch fcla
fix rtl $m

Performing Power Gating by Mapping to Retention Flip-Flops
Power gating involves the use of retention flip-flops, which are flip-flops with an additional input
(a retention pin) that puts the flip-flops into a sleep mode to save power. The retention flip-flops can
have single or dual retention pins. Talus Design can map to these flip-flops during fix netlist and
connect them to constants that are described in the library (.lib).
After timing optimization you can automatically or explicitly stitch the retention pins into the correct
retention net.
Talus Design offers two methods of implementing power gating:

RTL tag-based method


Sequential elements are mapped according to Verilog or VHDL tags you insert in the RTL
design block. You use power gating commands to associate the tags with power gating cell
classes defined by cell attributes in the library.

Cell-based method
For this method, no RTL tags are required. Talus Design implements power gating from cell-level
specifications you make on the command line or in a script. You can use the cell-based
method to override RTL tags in a block. You can also choose to implement power gating in a
portion of the design using the tag method and in another portion of the design using the
cell-based method.

Checking Library Support for Power Gating


To do power gating, you must have retention flip-flops defined in your library that use the appropriate
power_gating_pin and power_gating_cell attributes for power gating. See the section Adding Library
Attributes for Power Gating on page 22 for more information.

Adding Library Attributes for Power Gating


To support mapping to retention flip-flops and retention latches, the technology library must contain
the appropriate cell-level and pin-level attributes. Power gating library attributes are as follows:

power_gating_cell

power_gating_pin

map_to_logic

power_gating_cell Attribute
This required cell-level .lib attribute specifies that the cell is a retention sequential cell. The attribute
specifies which class of power gating cells the retention cell belongs to. All retention cells must
belong to a group of sequential cells.
After library preparation, all the cells in the library with identical functionality, pinout, and
power_gating_cell attributes are grouped together into a single entity for HyperCell preparation.
The power_gating_cell attribute is stored on the cell's entity and cell model.
The value of the power_gating_cell attribute is accessible using the data get command.
Syntax:
power_gating_cell: "STRING";
.lib example:
power_gating_cell: "CLK_LOW";

Example
power_gating_cell: "CTRL_LOW"
talus> data get /mylib/RETDFF
talus> data get /mylib/RETDFF power_gating_cell
CTRL_LOW

power_gating_pin Attribute
This optional pin-level .lib attribute has two components:

power_gating_pin_string
This user-defined string is a label that specifies the power gating pin to which the retention
control signal should be stitched. Within a level of hierarchy, all cells connected to the same
retention control pin must share the same power_gating_pin_string value. The converse
of this statement is also true: Any components that should not be connected to each other
should not share a common power_gating_pin_string value.

power_gating_pin_value
This user-defined value specifies whether the retention control pin should be tied high (1) or
low (0) before the retention pin is stitched to the control signal. The maximum number of
retention control pins for a design is dictated by the number of unique power_gating_pin
classes in the library.

When you specify this attribute, the map_to_logic attribute is not necessary.
Syntax
power_gating_pin : (power_gating_pin_string, power_gating_pin_value)
Where power_gating_pin_string is a string and power_gating_pin_value is either 0 or 1.
The following .lib example specifies that the retention control signal be stitched to all the
power_pin_1 pins and that the pins are tied low before they are connected to the control signal.
Example
power_gating_pin (power_pin_1, "0");

Example
talus> data get /mylib/RETDFF/RETDFF/mpin:RET_CTRL type name gate_area
{{power_gating_pin_val}} connection_class {{power_gating_pin_str}}
capacitance fanout_load max_fanout_force mpin_typ_cap
mpin_typ_cap_parallel model_pin_type access_layer direction flow

map_to_logic Attribute
This optional pin-level .lib attribute is placed upon retention control pins of a retention cell. The
attribute specifies whether the retention control pin should be tied high or low before retention pin
stitching.
This attribute is not needed when the power_gating_pin is specified.
Syntax:
map_to_logic: (0 | 1);
.lib example
map_to_logic : "0";
After library preparation, this .lib attribute is stored as an attribute on the pin of the library model that
is accessible using the data get command.
Example
talus> data get /mylib/RETDFF/RETDFF/mpin:RET_CTRL
map_to_logic
0

Using RTL Tags for Power Gating


Talus Design allows mapping of cells to retention flip-flops and retention latches. Such retention
elements can be put into a sleep mode in order to save power (power gating).
To specify which sequential cells in a design to map to a retention cell, you can designate RTL tags
in the RTL description of the design cells. The following examples show a Verilog always block
(Example 3) and VHDL process block (Example 4) with added RTL tags for power gating.
Example 3: Verilog Syntax Including RTL Tag for Power Gating
always @(posedge clk or negedge RESET)
begin : ret_clk_free_desIn_r   // RTL tag for power gating
  if (!RESET) desIn_r <= #1 64'b0;
  else desIn_r <= #1 desIn;
end

Example 4: VHDL Syntax Including RTL Tag for Power Gating


-- RTL tag for power gating
ret_clk_control_state : process(clk, control_state, new_key, sub_parray_done, sub_sboxes_done)
begin
  if (clk'event and clk = '1') then
    if (control_state = S_INIT_START) then
      control_state <= S_INIT;
    .
    .
    .
    elsif (control_state = S_SUB_SBOXES) then
      if (sub_sboxes_done = '1' ) then
        control_state <= S_NORMAL;
      else
        control_state <= S_SUB_SBOXES;
      end if;
    end if;
  end if;
end process;

The methods for implementing power gating are described in more detail in Reporting RTL Tagged
Blocks in the Design on page 26.

Implementing Power Gating Using the RTL Tag-Based Method
To use this method, your design must have RTL tags inserted in blocks that are to be mapped to
retention flip-flops. For more information on adding RTL power gating tags to blocks, see Using RTL
Tags for Power Gating on page 24.
By using the force model power_gating command, you define which tagged RTL blocks are
associated with which type of retention flip-flop in the library.
The following example specifies that all RTL blocks tagged with the ret_clk_free_desIn_r tag be
mapped to a retention flip-flop of type CLK_FREE.
Example
force model power_gating $m -tag ret_clk_free_desIn_r -type CLK_FREE

If you specify the tag as an empty string (-tag ""), then untagged RTL blocks are mapped as
specified. If you specify the sequential cell type as an empty string (-type ""), Talus Design maps the
RTL blocks to standard sequential cells rather than retention sequential cells.

In the following example, all sequential cells without RTL tags in the top block $m and all its
subblocks are mapped to CLK_LOW registers. Any RTL blocks tagged
ret_clk_low_des3_key_b_r0 or ret_clk_free_des3_key_c_r0 are also mapped to CLK_LOW
registers.
Example
force model power_gating $m -tag "" -type CLK_LOW -hier
force model power_gating $m -tag ret_clk_low_des3_key_b_r0 -type CLK_LOW -hier
force model power_gating $m -tag ret_clk_free_des3_key_c_r0 -type CLK_LOW -hier

In the following example, U1 (a subblock of the top block) has its untagged RTL blocks mapped
to CLK_FREE registers. Tagged blocks ret_clk_low_des_key_r and ret_clk_free_desIn_r are also
mapped to CLK_FREE. Finally, U2 (a subblock of U1) has all of its blocks tagged
ret_clk_low_key_sel_K_ri mapped to standard cells. Untagged blocks in U2 are mapped to
CLK_FREE cells, a setting inherited from the empty string passed to -tag at the $sub1 block.
Example
set sub1 [data list cell_model $m/U1]
set sub2 [data list cell_model $m/U1/U2]
force model power_gating $sub1 -tag "" -type CLK_FREE -hier
force model power_gating $sub1 -tag ret_clk_low_des_key_r -type CLK_FREE -hier
force model power_gating $sub1 -tag ret_clk_free_desIn_r -type CLK_FREE -hier
force model power_gating $sub2 -tag ret_clk_low_key_sel_K_ri -type "" -hier
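
The way settings made with -hier are inherited can be modeled outside the tool. The following Python sketch is an illustration only: the resolve function, the dictionary layout, the block paths, and the rule that the nearest enclosing block with a matching setting wins are all assumptions made for this sketch, not documented Talus Design behavior.

```python
# Hypothetical model of how -hier power gating settings might resolve.
# settings[block][tag] = retention type. An empty tag means "untagged
# blocks"; an empty type means "standard (non-retention) sequential cells".
settings = {
    "top/U1": {
        "": "CLK_FREE",
        "ret_clk_low_des_key_r": "CLK_FREE",
        "ret_clk_free_desIn_r": "CLK_FREE",
    },
    "top/U1/U2": {
        "ret_clk_low_key_sel_K_ri": "",
    },
}


def resolve(block, tag=""):
    """Return the retention type for an RTL block with the given tag.

    Walks from the block up through its parents and returns the first
    setting found for the tag (assumed -hier semantics). Returns None
    when no setting applies.
    """
    path = block.split("/")
    while path:
        scope = settings.get("/".join(path), {})
        if tag in scope:
            return scope[tag]
        path.pop()
    return None


# Tagged block in U2: explicitly mapped to standard cells.
print(repr(resolve("top/U1/U2", "ret_clk_low_key_sel_K_ri")))
# Untagged block in U2: no local setting, so the U1 setting is used.
print(repr(resolve("top/U1/U2")))
```

In this model, a setting on U2 takes priority over one on U1 only for the tags that U2 names, which matches the outcome described for the example above.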

Reporting RTL Tagged Blocks in the Design


During analysis and elaboration (import rtl command), the RTL tags for power gating are parsed
and saved in the design database. After fix rtl, the tagged modules in the RTL can be examined with
the report power_gating command.
Example
report power_gating [data list model_cell $m/U1] -file power_gating_sub.rpt

This example creates a report for subblock U1 and puts the report in a file called
power_gating_sub.rpt.
During fix netlist, the delay primitives are replaced with retention flip-flops or latches. The retention
control pins are tied either high or low, depending upon the values of the map_to_logic or
power_gating_pin attributes.
If scan-equivalent cells exist for the retention registers, DFT scan replacement can replace retention
flip-flops with their scan equivalents. No additional steps to the normal DFT flow are required; the
DFT flow treats retention flip-flops in the same manner as normal flip-flops.

Implementing Power Gating Using a Cell-Based Method


You can override the RTL tag-based method by using the cell-based method. In this method, you use
the force model power_gating command to specify cell-specific mappings to retention flip-flops.
Any RTL tags that have been specified for this cell are superseded by the force model
power_gating command.
After fix rtl, sequential cells are implemented with the name assigned to them in the RTL with a suffix
of _reg. For example, consider the following Verilog RTL:
always @(posedge clk or negedge RESET)
begin
if(!RESET) key_c_r[0] <= 1'b1;
else
key_c_r[0] <= key_c;
end
During fix rtl a register called key_c_r_reg[0] will be inferred.
In the following example, the sequential cell key_c_r_reg[0] is mapped to retention cells of
connection class CLK_FREE. This mapping occurs after fix rtl.
Example
force model power_gating /work/top/top -cell key_c_r_reg[0] -type CLK_FREE

Stitching Retention Pins for Power Gating


You can stitch retention pins with the run plan power_gating_net command. This command connects
only retention pins that are either tied off to 1'b0 or 1'b1, or are disconnected. When a
retention flip-flop is inferred, the retention pin is tied off to 1'b0 or 1'b1 according to the
power_gating_pin_value attribute.
The run plan power_gating_net command detaches the retention control pins from 1'b0 or 1'b1 (if
the retention control pins are connected) and connects them to the user-specified retention control
pin:
run plan power_gating_net model power_gating_pin_class pin
The following example finds all retention flip-flop pins that have a power_gating_pin_string of RET1.
If these pins are either unconnected or are connected to 1'b0 or 1'b1, then the command reconnects
the pins to the pin $m/mpin:RET_CTRL. The command creates new nets (if necessary) to perform
the stitching:
run plan power_gating_net $m RET1 $m/mpin:RET_CTRL
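
The selection rule that the stitching step applies can be sketched in a few lines of Python. This is a minimal model for illustration, not the tool's implementation: the stitch_retention_pins function, the dictionary keys, and the pin and net names are hypothetical.

```python
CONSTANTS = {"1'b0", "1'b1"}


def stitch_retention_pins(pins, pin_class, control_net):
    """Reconnect eligible retention pins to control_net.

    pins is a list of dicts with the keys name, pin_class (the
    power_gating_pin_string), and net (None when unconnected).
    Only pins of the requested class that are unconnected or tied to a
    constant are reconnected; pins already on another net are left alone.
    Returns the names of the pins that were stitched.
    """
    stitched = []
    for pin in pins:
        if pin["pin_class"] != pin_class:
            continue
        if pin["net"] is None or pin["net"] in CONSTANTS:
            pin["net"] = control_net
            stitched.append(pin["name"])
    return stitched


pins = [
    {"name": "ff1/RET", "pin_class": "RET1", "net": "1'b0"},
    {"name": "ff2/RET", "pin_class": "RET1", "net": None},
    {"name": "ff3/RET", "pin_class": "RET1", "net": "other_ctrl"},
    {"name": "ff4/RET", "pin_class": "RET2", "net": "1'b1"},
]
print(stitch_retention_pins(pins, "RET1", "RET_CTRL"))
```

Here ff3/RET is skipped because it is already driven by another net, and ff4/RET is skipped because it belongs to a different pin class.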

Performing Area-Based Optimization


Before you perform area-based optimization, be sure to specify a scan style to prevent use of
nonscan cells or cells without a scan equivalent for functional mode implementation. For information
about selecting a scan style, see the Design for Test Technology Guide.
Use the fix netlist command to perform area-based optimization. Area-based optimization optimizes
the design for area and maps it to cell instances of HyperCell models. The fix netlist command
performs the following actions:

Binds unbound cells to the technology library

Removes tool-induced hierarchical barriers on inferred RTL modules


At this time in the flow, however, your user hierarchy is maintained. Later, after fix netlist, you
do explicit flattening as shown in Figure 1 on page 2.

Unmaps any cells that have been mapped to the technology library by exchanging them for
Magma primitive cells

Maps the RTL components to the technology library

Removes unreachable and floating cells from a mapped or unmapped model, and removes
single-input and constant cells from an unmapped model
A model cannot have floating logic when it is bound to the physical library. The sweep does
not traverse through latches and, by default, sweeps only the top-level model.

Checks the model for basic netlist and floorplan integrity

Searches for patterns in the logic and collapses primitive logic nodes into a large logic node
to represent each pattern
Collapsing the patterns prevents logic restructuring from destroying regular patterns.

Optimizes logic of primitives


You can control this step by setting the force gate independent command and the force
gate opt_mode command. For more information about optimizing primitive logic, see
Optimizing Boolean Logic During Area Optimization on page 30.

Maps primitive logic to technology-specific instances of HyperCell models based on the
specified technology library

Removes redundancies in the circuits of the model, using constant propagation and single
stuck-at fault testing (SAT) that uses SAT-based automatic test-pattern generation (ATPG)

Uniquifies the hierarchy by copying models


Most algorithms require a one-to-one relation between cells and models before such
algorithms can run properly. The hierarchy is not changed, but each model is guaranteed to
be instantiated only once. If more instantiations are found (multiple cells bound to the same
model), the model is copied so that a one-to-one relation between cells and models is
created.

Flags flip-flop models for replacement with their equivalent scan flip-flops
This step is invoked by fix netlist -scan and is for third-party DFT flows only; it is not
compatible with the Magma DFT flow.
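
The uniquify step in the list above can be pictured with a short Python sketch. It is a simplified model under stated assumptions: the uniquify function, the dictionaries, and the copy naming scheme (model_1, model_2, and so on) are hypothetical and do not describe how Talus Design names copied models.

```python
import copy


def uniquify(cells, models):
    """Give every cell its own model by copying shared models.

    cells  maps cell name -> model name
    models maps model name -> model contents (any copyable object)
    Returns new (cells, models) dicts; the inputs are left unchanged.
    """
    new_cells = {}
    new_models = {}
    seen = {}  # model name -> number of cells bound to it so far
    for cell, model in cells.items():
        count = seen.get(model, 0)
        seen[model] = count + 1
        # The first cell keeps the original model; later cells get copies.
        name = model if count == 0 else f"{model}_{count}"
        new_models[name] = copy.deepcopy(models[model])
        new_cells[cell] = name
    return new_cells, new_models


cells = {"u_a": "adder", "u_b": "adder", "u_c": "mux"}
models = {"adder": ["xor", "and"], "mux": ["mux2"]}
new_cells, new_models = uniquify(cells, models)
print(new_cells)
```

After the call, no two cells share a model, while the cell names and therefore the hierarchy are unchanged.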

Area-based optimization performs Boolean optimization on the design and maps it to the technology
library. Because no timing constraints are set yet, a smallest-area mapping is done. The result of
area-based optimization is a hierarchical design that is represented as instances of HyperCell
models.
During fix netlist, don't-care conditions for data-path elements are detected and automatically
optimized. The effort level for fix netlist must be set to high. No further action is required on your
part.

Understanding HyperCell Models


During area-based optimization, the fix netlist command maps the design to cell instances of
HyperCell models.
A HyperCell model is a scalable timing model that represents a range of cell sizes, rather than a
single size. During library preparation, the run prepare hyper command creates HyperCell models
by analyzing models with the same functionality and identical pin names to collect their properties.
Cells with the same function and identical pin names are called an entity. Thus, during library
preparation, a HyperCell model is created for each entity.
A HyperCell model contains timing and area descriptions for a collection of cell drive strengths of a
particular entity. A HyperCell model is a mathematical representation or equation that scales linearly
with the load it drives. The HyperCell model represents the relation between delay and strength in
the network. For each cell in the design, the mapping function represents the available tradeoffs
between the drive strengths and delays that the cell can have versus their effects on area.
Using HyperCell models, Talus Design makes rapid structural decisions for the design early in the
flow. Later, after physical optimization of the design, the instances of HyperCell models are mapped
back to actual library (standard) cells that match the required characteristics for the optimized logic.
Remapping occurs just before detailed placement, routing, and the other physical optimization steps.

Linking the Design to One or More Libraries


If your RTL design contains instantiations of library cells or uses modules from libraries other than
your technology library, you must link the design to the additional libraries.
Any unbound cells or direct instantiations of technology library cells (such as pad cells or special
buffers) in the RTL design are bound to library cells in fix netlist.
The fix netlist command uses the -no_uniquify option when binding the technology library. When
binding to another user library, using the -no_uniquify option can decrease the runtime of
subsequent steps in the flow without decreasing quality.

Preventing Propagation of X-States


Talus Design supports set/reset optimization to prevent the propagation of unknown states (X-states)
through the reset signal of a design.
If the design is still in RTL form, you can specify the appropriate pragma (synchronous or
asynchronous) in the RTL file. When using pragmas in the RTL file, specify the reset signal as the
first signal in the sensitivity list. This ensures that the RTL parser detects the reset signal. For a list of
supported pragmas and compiler directives, see Using Pragmas on page 5.

Implementing Synchronous Set and Reset


Talus Design detects and implements synchronous set/reset from the RTL constructs. The command
to control this is the config syncsetreset command.
If the config syncsetreset command is set to on, the mapper maps the flip-flops with set/reset logic
to flip-flop models in the library with built-in set/reset logic. For mapping to occur, such models must
exist in the library and using them must improve the cost function. If there are no models with built-in
set/reset in the library or using them does not improve the cost, the Talus Design mapper still
ensures that set/reset logic is preserved during mapping. The mapper also preserves the structure
and mapping of set/reset logic during the unmap/map cycles during timing optimization (fix time).

Optimizing Boolean Logic During Area Optimization


During area optimization, the fix netlist command optimizes Boolean logic. During Boolean
optimization, Talus Design analyzes the Boolean logic in your design to find opportunities for literal
reduction. The primary goal of Boolean optimization is to reduce area by reduction of Boolean
literals.

You can control Boolean optimization by:

Setting the logic partition size with the force gate independent command. For more
information, see the next section, Setting the Partition Size for Boolean Optimization.

Choosing between two optimization modes set by the force gate opt_mode command. For
more information, see Controlling Boolean Optimization in Timing-Critical Blocks on
page 32.

Setting the Partition Size for Boolean Optimization


The partition size is the amount of logic that the run gate independent command considers at one
time when performing Boolean optimization on your design. The partition size is expressed as a
number of Boolean literals. The run gate independent command considers portions of logic of this
size until all the Boolean logic is considered.
The larger the partition size, the more effective Boolean optimization can be; however, larger
partition sizes can have an impact on runtime and memory resources. Also, Boolean optimization
can sometimes result in small increases in delay.
You can set the partition size explicitly by using the force gate independent command on a block or
design. Use the -hier switch to set the partition size on a block and its subblocks.
When you run fix netlist -effort high, Talus Design implicitly sets the partition size to a global value
of 70,000 literals. When you run fix netlist without the -effort high option, the global partition size
is set to 15,000 literals. The partition sizes that are implicitly set by the effort level of fix netlist
are overridden by settings you define with force gate independent at a lower level in the design.
You can also use force gate independent -hier at the top level of the design.
If a block is particularly timing critical, you can turn off Boolean optimization by setting the partition
size to zero.
In the following example, a partition size of 10,000 literals is set on the ALU block and any subblocks:
Example
force gate independent /work/ALU/ALU 10000 -hier
To query or clear the partition size set on a block, use query gate independent or clear gate
independent.
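
The precedence between the implicit global partition size and explicit block settings can be summarized in a short Python sketch. It is only a model of the rules stated above: the effective_partition_size function, the hierarchical block names, and the choice that the nearest setting wins when several enclosing blocks have one are assumptions of this sketch.

```python
# Global partition sizes in Boolean literals, taken from the text above.
GLOBAL_PARTITION = {"high": 70_000, "default": 15_000}


def effective_partition_size(block, overrides, effort="default"):
    """Return the partition size (in Boolean literals) used for a block.

    overrides maps block path -> (size, hier). hier=True models the -hier
    switch, which also applies the size to subblocks. A size of 0 turns
    Boolean optimization off for that block.
    """
    path = block.split("/")
    depth = len(path)
    while path:
        entry = overrides.get("/".join(path))
        if entry is not None:
            size, hier = entry
            # A setting on the block itself always applies; a setting on a
            # parent applies only if it was made with -hier.
            if hier or len(path) == depth:
                return size
        path.pop()
    return GLOBAL_PARTITION[effort]


overrides = {"top/alu": (10_000, True), "top/ctrl": (0, False)}
print(effective_partition_size("top/alu/mult", overrides, effort="high"))
print(effective_partition_size("top/ctrl/fsm", overrides))
```

In this model the mult block inherits the 10,000-literal setting from its parent, while the fsm block falls back to the global value because the setting on ctrl was made without -hier.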

Controlling Boolean Optimization in Timing-Critical Blocks


The run gate independent command performs Boolean optimization in either of two modes:

Area mode
In area mode, Boolean optimization's only goal is area reduction. It does not consider delay.
Use the force gate opt_mode area command to explicitly turn on the area mode of run gate
independent. Area mode is the default mode for the fix netlist command.

Delay mode
In delay mode, Boolean optimization considers impact on delay; however, the primary goal is
still area reduction. Use the force gate opt_mode delay command to turn on the delay mode
of run gate independent.

If a block is timing critical, you can use force gate opt_mode delay to turn on the delay mode of
Boolean optimization. However, delay mode can result in larger area than using area mode. If you
need to, you can completely turn off Boolean optimization as described in Setting the Partition Size
for Boolean Optimization on page 31.
To query or clear the mode setting on a block, use query gate opt_mode or clear gate opt_mode.

Exporting a Netlist
At any point in the flow, you can export a Verilog netlist using the export verilog netlist command.
Use the -minsize option to cause all the gates in the generated netlist to be implemented using
minimum drive gates from the technology library. Without this option, the netlist contains cell
instances of HyperCell models. A netlist with instances of HyperCell models cannot be simulated or
used for formal verification.

Example: Performing Area-Based Optimization


Example 5 illustrates the following tasks:

Selecting the muxed_flip_flop scan style

Performing area-based optimization

Exporting a netlist

Example 5: Example Code for Area-Based Optimization


force dft scan style $m muxed_flip_flop
fix netlist $m $l
export verilog netlist $m ./netlists/chip.v -minsize

Applying Timing Constraints


You can apply timing constraints by importing SDC constraints or by using the force timing... group
of commands.

Importing SDC Constraints


The Magma design tools read an SDC constraints file and apply the constraints to the design.
The following example imports and applies SDC constraints from a file called constraints.sdc, then
saves a translation of the constraints to an M-Tcl file named translated.tcl:
import sdc $m constraints.sdc -translation translated.tcl
The translated.tcl file contains commands that have been translated from SDC format to Magma
format, including comments listing the original SDC commands. The translated.tcl file also lists any
constraints that are not translated to Magma format.
SDC does not specify design units. By default, the constraint reader uses the following units:

Nanoseconds for time

Picofarads for capacitance

Kilohms for resistance

Your constraints should use units that are consistent with the library. You can configure the constraint
reader to use different units using the config sdc unit... commands. These are the alternatives:
config sdc unit time {n p f}
config sdc unit capacitance {n p f}
config sdc unit resistance {k 1 m}

In these commands, the factor represents the multiplier used to describe the SDC constraint units,
as follows:

Factor    Multiplier
n         nano
p         pico
f         femto
k         kilo
1         one (no multiplier)
m         milli
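
As a worked illustration of these factors, the following Python sketch scales a constraint value to base SI units. The to_base_units function and the FACTOR table are hypothetical helpers written for this note; they are not part of the constraint reader.

```python
# SI multipliers for the factor letters accepted by the
# config sdc unit commands (n, p, f for time and capacitance;
# k, 1, m for resistance).
FACTOR = {"n": 1e-9, "p": 1e-12, "f": 1e-15, "k": 1e3, "1": 1.0, "m": 1e-3}


def to_base_units(value, factor):
    """Scale an SDC value by its unit factor (to seconds, farads, or ohms)."""
    return value * FACTOR[factor]


# With the default units, a clock period of 2.5 means 2.5 ns,
# a load of 0.05 means 0.05 pF, and a resistance of 3 means 3 kilohms.
print(to_base_units(2.5, "n"))
print(to_base_units(0.05, "p"))
print(to_base_units(3, "k"))
```

A constraints file written in picoseconds would therefore be misread by a factor of 1000 unless the time unit is changed from the default n to p.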

Specifying Timing Constraints


Figure 7 shows the general flow for specifying timing constraints.
Figure 7: Flow for Specifying Timing Constraints
Define clock domains (including latency and margin).

Define I/O timing.

Specify I/O adjustment.

Define constants (constant propagation).

Define timing exceptions.

For detailed information about specifying timing constraints, see the fix time information in the
Timing Constraints and Analysis Technology Guide.

Flattening or Maintaining Hierarchy


Talus Design is a high-capacity RTL, DFT, and physical synthesis tool capable of handling
multimillion-gate designs. Flattening of design hierarchy is recommended for optimal quality of
results (QOR).
Talus Design distinguishes between the concepts of flattening and maintaining hierarchy as follows:

Flattening physically collapses subdesigns into parent designs. By removing unnecessary
levels of design hierarchy, Talus Design improves QOR.

Maintaining hierarchy performs no action, other than marking levels of hierarchy that must be
preserved during the exporting of netlists and other data formats. Typical applications are to
preserve subdesigns in the netlist for functional and timing (SDF back-annotation) simulation,
formal verification, third-party ATPG manipulation, and visual inspection.

The recommended practice is to:

Flatten all design hierarchy, subject to runtime and capacity limits, as early as possible in the
flow.

Maintain only those parts of the design hierarchy required by your flow.

Note: If you want to turn on register retiming, you must do so before flattening your design. For
more information on register retiming during timing-driven optimization, see Enabling
Register Retiming on page 39.

Flattening in the Talus Design Flow


The fix rtl command performs an initial partial flattening. This is intended for user-inferred
subdesigns, with a threshold size determined by the config rtl flatten or force rtl flatten
commands. A suggested threshold for initial flattening is 5K estimated gates.
Flattening (full or partial) of the design can be performed at one of several points in the design flow.
Considerations affecting the point at which flattening is performed include:

DFT strategy
Will you do flat or hierarchical insertion with Talus Design? Does your scan flow interface with
third-party DFT tools?

Back-end strategy
For a Talus Design flat flow, the design should be fully flattened (for optimal results) before
placement by the fix cell command.

For a prototyping flow, the design hierarchy can be partitioned into GlassBox abstractions for
physical implementation.
Typically, the full flattening step is performed at the earliest possible of the following points:

After fix netlist (flat DFT and physical optimization flow)

Before fix time (hierarchical DFT, flat physical optimization flow)

Before fix cell (hierarchical floorplanning flow)

Performing Explicit Flattening on a Cell or Model


This is the command for explicitly flattening a design hierarchy from model $m downward:
data flatten $m
Note: For a cell, perform the flattening operation on the model, and not the cell itself. Hence, the
correct flatten command for a cell is as follows:
data flatten [data list cell_model $c]
Flattening is not applied to any model or cell that is subject to a force keep command.
For information on the force maintain command, see the online man page. Typically, the force
maintain command is applied after RTL elaboration, before the model is flattened.
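
The effect of flattening from a model downward, with some models excluded, can be sketched in Python. This is a conceptual model only: the flatten function, the hierarchy dictionary, the keep set (standing in for models protected by force keep), and the slash-separated cell names are assumptions of this sketch.

```python
def flatten(model, hierarchy, keep=frozenset()):
    """Return the cells of model after collapsing its subhierarchy.

    hierarchy maps a model name to a list of (cell_name, child) pairs;
    a child that is not a key of hierarchy is a leaf library cell.
    Models named in keep are left as single hierarchical cells.
    """
    flat = []
    for cell, child in hierarchy[model]:
        if child in hierarchy and child not in keep:
            # Collapse the submodel: its cells move up into the parent,
            # prefixed with the instance name to stay unique.
            for sub_cell, leaf in flatten(child, hierarchy, keep):
                flat.append((f"{cell}/{sub_cell}", leaf))
        else:
            flat.append((cell, child))
    return flat


hierarchy = {
    "top": [("u_core", "core"), ("u_pad", "PADIN")],
    "core": [("u_alu", "alu"), ("u_ff", "DFF")],
    "alu": [("u_add", "ADD"), ("u_inv", "INV")],
}
print(flatten("top", hierarchy))
print(flatten("top", hierarchy, keep={"alu"}))
```

In the second call the alu model survives as one hierarchical cell inside the flattened top level, which is the kind of result you would expect when a model must be preserved for later export or verification.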

Performing Constraint-Based Optimization


The fix time command runs constraint-based optimization.
During constraint-based optimization, the design is optimized using your timing and scan constraints;
cells are mapped to technology-specific instances of HyperCell models (for more information about
HyperCell models, see Understanding HyperCell Models on page 29). Constraint-based
optimization unmaps and maps logic on noncritical paths before mapping logic for critical paths. Gate
delays are adjusted by restructuring logic and by adjusting the strengths of instances of HyperCell
models to meet your timing requirements.

Mapping and Restructuring Logic in fix time


The fix time command maps and restructures logic as follows:

Maps primitive logic that has been unmapped by the run gate unmap command to
technology-specific instances of HyperCell models based on the specified technology library.

Improves the timing of critical paths by moving late-arriving negative-slack signals to the top
of the logic cones involved in the critical path. Also propagates constants, removes dead
logic, and attempts to limit cell count increase.
Logic restructuring occurs on each path within a slack range in order of path criticality.

Restructures logic on critical paths and performs architecture swapping to reduce delay.

After constraint-based optimization, do floorplanning and physical synthesis. For more information
on floorplanning, see the Floorplanning Technology Guide. For more information on physical
synthesis, see the Physical Synthesis chapter in the Design Optimization Technology Guide.

Grouping Paths for Timing Optimization


To create path groups or add a path to an existing group, use the force gate pathgroup command.
Optimization of path groups is supported during fix time.
Path grouping enables you to cluster various paths of a design into individually optimizable
collections. Path grouping is a useful technique for designs that contain a large number of failing
paths that are within a particular area of a design. During timing optimization, such failing paths can
prevent optimization on other portions of the circuit and reduce QOR for both timing and area.
By defining path groups, you can use a multipass flow to handle such problem paths. For example, at
the end of the first iteration of optimization you can define path groups as critical paths, then optimize
the path groups separately in the next iteration. This is especially useful for designs with preliminary
constraints, or for any other situation in which some paths might be overconstrained.
Any timing point in the design can be a member of only one path group. A warning is issued if the
specified points are not timing endpoints; however, you can add any pin in the design to a path
group. If the group does not exist, it is created and the specified pins are added. The path is defined
as the worst path to the specified endpoint.
You can define path grouping in a design based on clock domains or on user-defined specifications.
You can define path groups and continue your synthesis flow with the existing Volcano (as in a fix
cell multipass flow), or you can write out path group force... commands in an M-Tcl file and reload
them later. You can define path groups in terms of slack ranges, or you can explicitly specify the
endpoints. You can report the slack for each path group.
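
As an illustration of grouping by slack range, the following Python sketch assigns endpoints to groups while honoring the rule that a timing point belongs to only one group. It is a conceptual model, not Magma code: the endpoint names, slack values, and group names are hypothetical, and the half-open range convention is an assumption.

```python
def group_by_slack(endpoint_slack, ranges):
    """Assign each endpoint to the first slack range that contains it.

    endpoint_slack: dict of endpoint name -> worst slack (ns)
    ranges: list of (group_name, low, high); an endpoint joins a group
            when low <= slack < high (assumed convention).
    Endpoints that match no range are left ungrouped.
    """
    groups = {}
    for endpoint, slack in endpoint_slack.items():
        for name, low, high in ranges:
            if low <= slack < high:
                groups.setdefault(name, []).append(endpoint)
                break  # a timing point can be in only one group
    return groups


# Hypothetical endpoints and their worst slacks in nanoseconds
slacks = {"u1/q_reg/D": -1.8, "u2/q_reg/D": -0.3, "u3/q_reg/D": 0.4}
ranges = [("severe", float("-inf"), -1.0), ("marginal", -1.0, 0.0)]

groups = group_by_slack(slacks, ranges)
print(groups)
```

Separating severely failing endpoints from marginal ones in this way mirrors the multipass idea described above: each collection can then be optimized on its own in a later iteration.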

For details, syntax, and examples of path group commands, see the online man pages for the
following commands:

clear gate pathgroup

force gate pathgroup

query gate pathgroup

report force pathgroup

report pathgroup

report timing path

run gate speed

Defining a separate path group for each clock domain results in better optimization. Although paths
in different clock domains can be optimized, such paths currently cannot be placed in the same
group.
Example
force gate pathgroup $m group_name tcl_list_endpoints
clear gate pathgroup $m [-group A | -point pin_name]
where

$m Clears all path groups.

$m -group A Clears path group A.

$m -point pin_name Removes pin pin_name from its path group if the pin is a member of a
group.

If you clear a pin that is the last member of a group, the group is also cleared.
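
The clearing behavior described above can be modeled in a few lines of Python. This is a toy model for illustration only, not the tool's implementation: the class and method names are hypothetical, and the behavior when a pin is added to a second group (here, the pin moves) is an assumption that the text does not specify.

```python
class PathGroups:
    """Toy model: each pin maps to at most one group, and a group
    exists only while it has at least one member."""

    def __init__(self):
        self.group_of = {}  # pin name -> group name

    def add(self, group, pins):
        for pin in pins:
            self.group_of[pin] = group  # assumption: re-adding a pin moves it

    def groups(self):
        return sorted(set(self.group_of.values()))

    def clear(self, group=None, point=None):
        if group is None and point is None:
            self.group_of.clear()            # clear all path groups
        elif group is not None:
            self.group_of = {p: g for p, g in self.group_of.items()
                             if g != group}  # clear one group
        else:
            self.group_of.pop(point, None)   # no effect if pin is ungrouped


pg = PathGroups()
pg.add("A", ["pin1", "pin2"])
pg.add("B", ["pin3"])
pg.clear(point="pin3")   # pin3 was the last member of B, so B disappears
print(pg.groups())       # ['A']
```

Because groups are derived from pin membership rather than stored separately, removing the last pin of a group removes the group as a side effect, which matches the rule stated above.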
To query the path group for an element, run:
query gate pathgroup

Example
query gate pathgroup $m end_point
report pathgroup $m [-group group | -short]

where

-group group Prints only the specified group.

-short Prints all defined groups without their members.

Enabling Register Retiming


During timing-driven optimization, situations can occur in which timing can be improved if
combinational logic is moved across a register boundary. Register retiming performs this move by
repositioning registers relative to the combinational logic between the problem registers.
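
The idea behind retiming can be sketched with a simple model of one register placed somewhere along a chain of gates between an input register and an output register. The following Python sketch is not Magma code and is far simpler than a real retiming engine: it assumes hypothetical gate delays and ignores setup time, clock skew, and set/clear state preservation. It only shows how moving the register across combinational logic can shorten the longest stage.

```python
def best_register_position(gate_delays):
    """Return (index, period) for the register position that minimizes
    the longer of the two stage delays.

    The register is placed after gate_delays[:index], so the first stage
    delay is sum(gate_delays[:index]) and the second is the remainder.
    """
    total = sum(gate_delays)
    best = None
    before = 0.0
    for i in range(len(gate_delays) + 1):
        period = max(before, total - before)
        if best is None or period < best[1]:
            best = (i, period)
        if i < len(gate_delays):
            before += gate_delays[i]
    return best


delays = [2.0, 2.0, 1.0, 1.0]   # hypothetical gate delays along the path

# Original design: the register sits after the third gate
original = max(sum(delays[:3]), sum(delays[3:]))

position, retimed = best_register_position(delays)
print(original, position, retimed)   # 5.0 1 4.0
```

In this example the original placement leaves a 5.0 stage followed by a 1.0 stage; moving the register back across two gates balances the stages at 2.0 and 4.0, so the minimum clock period drops from 5.0 to 4.0.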
To enable retiming, use config gate retime or force gate retime. To perform register retiming
explicitly, use the run gate retime command. Retiming is off by default.
When turned on, register retiming:

Works on any uniquified and mapped design

Preserves hierarchy (user-specified or tool-induced)

Maintains set/clear states

Simultaneously retimes all models for which retiming is enabled

No grouping is needed and no special constraints for retiming are required. Follow these steps to
enable register retiming:
1. Set force gate retime on all blocks to be retimed.
2. Run retiming on the top-level model (run gate retime).
The register retiming commands are as follows:

config gate retime

force gate retime

force gate pipeline

run gate retime

run gate unclockgate

See the online man pages for detailed examples and syntax.
Retiming can change the timing and flip-flop count in the circuit and has an indirect impact on the
combinational area. For best results, follow these guidelines and techniques during retiming:

Where possible, flatten hierarchy inside blocks that will be retimed. This gives the retiming
engine more flexibility when moving flip-flops.

If there are synchronous controls in the logic, decompose complex flip-flops by setting config
gate retime -decomp on.

If only a few blocks need to be retimed, use config gate retime -mode flop_slack to
improve results by increasing the scope of retiming.

Use a MUXed flip-flop style and retime registers before scan insertion. You can use force dft
scan style $m muxed_flip_flop to set the MUXed scan style. Register retiming does not
retime flip-flops on scan chains.

Avoid clockgating on paths that need to be retimed.

Important: Retiming changes the timing nodes in a design; therefore, be aware that register
retiming has implications for formal verification. Register retiming is not supported for
designs that are placed.

Example Script for RTL Synthesis


Example 6: Script for Performing RTL Synthesis
# Set m to the top-level model
set m /work/chip/chip
# Set l to the library
set l /cmos_tech18

# IMPORT LIBRARY VOLCANO
# Import the prepared library volcano (if not already loaded)
import volcano ./volcanoes/cmos_tech18.volcano

# IMPORT RTL
# Import all the RTL files for the design
import rtl -analyze -verilog \
    -include ./include_dir \
    ./rtl/chip.v ./rtl/cntr.v \
    ./rtl/interface.v
# Elaborate the design
run rtl elaborate chip

# HIGH-LEVEL OPTIMIZATION
# Insert latch-based clockgating with a bit width threshold of 10 bits
config rtl clockgate on \
    -style latched -min_bits 10
# Insert testability and use the pin test_ena to control the testability logic
config scan clockgate test_mode test_ena
# Perform high-level optimization
fix rtl $m

# AREA-BASED OPTIMIZATION
# Select muxed_flip_flop scan style
force dft scan style $m muxed_flip_flop
# Perform area-based optimization
fix netlist $m $l
# Export a netlist
export verilog netlist $m \
    ./netlists/chip.v -minsize

# APPLY TIMING AND TEST CONSTRAINTS
# Maintain the controller model boundaries
force maintain /work/cntr/cntr

# Define the clock domains
force timing clock $m/clk1 9n -waveform {-rise 3n -fall 6n}
force timing latency $m/clk1 2ns -type source
force timing latency $m/clk1 1ns -type network
force timing margin $m/clk1 setup 100p
force timing margin $m/clk1 hold 50ps

force timing clock $m/clk2 9n -waveform {-rise 3n -fall 6n}
force timing latency $m/clk2 2ns -type source
force timing latency $m/clk2 1.5ns -type network
force timing margin $m/clk2 setup 100p
force timing margin $m/clk2 hold 50ps

# Define I/O constraints
force timing delay $m/clk1 \
    "[data list "model_pin -direction in" $m]" -time 1.5n
force timing check \
    "[data list "model_pin -direction out" $m]" $m/clk2 -time 1n

# Set up constants for constant propagation
config timing propagate constants combinational
force timing constant $m/test_mode 0

# Define timing exceptions
force timing false -from $m/clk1 -to $m/clk2
force timing false -from $m/clk2 -to $m/clk1

# FLATTEN
# Flatten the design
data flatten $m

# TEST CONSTRAINTS
# Exclude any registers from scan, or assign specific registers to a
# specific scan chain
force dft scan include $m "$m/cntr2/out2_reg $m/cntr1/out1_reg" 0
force dft scan disable $m $m/cntr1/out2_reg
# Define the scan chains
force dft scan chain $m 0 sdi1 sdo1
force dft scan chain $m 1 sdi2 sdo2
# Define the scan control signals
force dft scan control $m sena scan_enable

# INSERT SCAN
# Insert the scan chains
run dft scan insert $m $l

# CONSTRAINT-BASED OPTIMIZATION
# Run constraint-based optimization
fix time $m $l -libonly
# Export a netlist
export verilog netlist $m \
    ./netlists/chip_ft.v -minsize

# TIMING AND STRENGTH ANALYSIS
# Configure the timing report
config report timing path \
    "PIN_NAME NET_NAME DELAY STRENGTH TOTAL_LOAD SLEW SLACK"
# Report the worst-case path
report timing path $m
# Configure the endpoint strength report
config report timing node {NODE_NAME ENDPOINT_GAIN}
# Report the endpoint strength
report timing node $m -endpoints \
    -sort ENDPOINT_GAIN -file gain.rpt

Copyright 1997-2008 Magma Design Automation, Inc.


RTL Synthesis
This document, as well as the software described in it, are furnished under license and can be used or copied only in
accordance with the terms of such license. The content of this document is furnished for information use only, is subject to
change without notice, and should not be construed as a commitment by Magma Design Automation Inc. Magma Design
Automation Inc. assumes no responsibility or liability for any errors, omissions, or inaccuracies that might appear in this
book.
Except as permitted by such license, no part of this publication can be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, recording, or otherwise, without the prior written
permission of Magma Design Automation Inc. Further, this document and the software described in it constitute the
confidential information of Magma Design Automation Inc. and cannot be disclosed within your company or to any third
party except as expressly permitted by such license.
The absence of a name, tagline, symbol or logo in these lists does not constitute a waiver of any and all intellectual property
rights that Magma Design Automation Inc. has established in any of its product, feature, or service names or logos.

Registered Trademarks
Magma, the Magma logo, Magma Design Automation, Blast Chip, Blast Fusion, Blast Gates, Blast Noise, Blast RTL, Blast
Speed, Blast Wrap, FixedTiming, MegaLab, Melting Logical & Physical Design, MOLTEN, QuickCap, SiliconSmart, Talus,
and YieldManager are registered trademarks of Magma Design Automation Inc.

Trademarks
ArchEvaluator, Automated Chip Creation, Blast Create, Blast DFT, Blast FPGA, Blast Logic, Blast Plan, Blast Power, Blast
Prototype, Blast Rail, Blast SA, Blast View, Blast Yield, Camelot, Characterization-to-Silicon, Design Ahead of the Curve,
Diamond SI, Fastest Path from RTL to Silicon, FineSim, FineWave, GlassBox, Hydra, HyperCell, MagmaCast, Merlin,
Native Parallel Technology, PALACE, Physical Netlist, Quartz, QuickInd, QuickRules, Relative Floorplanning Constraints,
Relative Placement Constraint, RioMagic, Sign-off in the Loop, Silicon Integrity, SiliconSmart CR, SiliconSmart I/O,
SiliconSmart MR, SiliconSmart SI, Smart Sampling, SuperSite, Titan, and Volcano are trademarks of Magma Design
Automation Inc.
Sun, Sun Microsystems, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United
States and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks
of SPARC International, Inc. in the United States and in other countries. UNIX is a registered trademark of The Open
Group.
All other trademarks are the property of their respective owners.
Notice to U.S. government end users. The software and documentation are "commercial items," as that term is defined at
48 C.F.R. 2.101, consisting of "commercial computer software" and "commercial computer software documentation," as
such terms are used in 48 C.F.R. 12.212 or 48 C.F.R. 227.7202, as applicable. Consistent with 48 C.F.R. 12.212 or 48
C.F.R. 227.7202-1 through 227.7202-4, as applicable, the commercial computer software and commercial computer
software documentation are being licensed to U.S. government end users (A) only as commercial items and (B) with only
those rights as are granted to all other end users pursuant to the terms and conditions set forth in the Magma standard
commercial agreement for this software. Unpublished rights reserved under the copyright laws of the United States.
Magma trademarks, taglines, symbols, and logo are registered trademarks or trademarks of Magma Design Automation
Inc., in the United States and/or other countries. This trademark list is provided for informational purposes only; Magma
Design Automation Inc. does not provide any express or implicit warranties or guarantees with respect to the information
provided in this document.

Printed in the U.S.A.
