Datapath Synthesis in Encounter RTL Compiler
Product Version 11.2
March 2012
© 2003-2012 Cadence Design Systems, Inc. All rights reserved.
Printed in the United States of America.
Cadence Design Systems, Inc. (Cadence), 2655 Seely Ave., San Jose, CA 95134, USA.
Open SystemC, Open SystemC Initiative, OSCI, SystemC, and SystemC Initiative are trademarks or registered
trademarks of Open SystemC Initiative, Inc. in the United States and other countries and are used with
permission.
Trademarks: Trademarks and service marks of Cadence Design Systems, Inc. contained in this document are
attributed to Cadence with the appropriate symbol. For queries regarding Cadence's trademarks, contact the
corporate legal department at the address shown above or call 800.862.4522. All other trademarks are the
property of their respective holders.
Restricted Permission: This publication is protected by copyright law and international treaties and contains
trade secrets and proprietary information owned by Cadence. Unauthorized reproduction or distribution of this
publication, or any portion of it, may result in civil and criminal penalties. Except as specified in this permission
statement, this publication may not be copied, reproduced, modified, published, uploaded, posted, transmitted,
or distributed in any way, without prior written permission from Cadence. Unless otherwise agreed to by
Cadence in writing, this statement grants Cadence customers permission to print one (1) hard copy of this
publication subject to the following conditions:
1. The publication may be used only in accordance with a written agreement between Cadence and its
customer.
2. The publication may not be modified in any way.
3. Any authorized copy of the publication or portion thereof must include all original copyright, trademark,
and other proprietary notices and this permission statement.
4. The information contained in this document cannot be used in the development of like products or
software, whether for internal or external use, and shall not be used for the benefit of any other party,
whether or not for consideration.
Patents: Cadence Product Encounter RTL Compiler, described in this document, is protected by U.S. Patents
[5,892,687]; [6,470,486]; [6,772,398]; [6,772,399]; [6,807,651]; [6,832,357]; [7,007,247]; and [8,127,260].
Disclaimer: Information in this publication is subject to change without notice and does not represent a
commitment on the part of Cadence. Except as may be explicitly set forth in such agreement, Cadence does
not make, and expressly disclaims, any representations or warranties as to the completeness, accuracy or
usefulness of the information contained in this document. Cadence does not warrant that use of such
information will not infringe any third party rights, nor does Cadence assume any liability for damages or costs
of any kind that may result from use of such information.
Restricted Rights: Use, duplication, or disclosure by the Government is subject to restrictions as set forth in
FAR52.227-14 and DFAR252.227-7013 et seq. or its successor.

Contents
List of Figures

List of Examples

Preface
  About This Manual
  Additional References
  How to Use the Documentation Set
  Reporting Problems or Errors in Manuals
  Customer Support
    Cadence Online Support
    Other Support Offerings
  Messages
  Man Pages
  Command-Line Help
    Getting the Syntax for a Command
    Getting the Syntax for an Attribute
    Searching for Attributes
    Searching For Commands When You Are Unsure of the Name
  Documentation Conventions
    Text Command Syntax

1 Datapath Optimization
  Overview of Datapath Optimization
    Supported HDL Operators
  Sharing Transformations
    Controlling Sharing Transformations
  Speculation Transformations
    Controlling Speculation Transformations
  Carry-Save Adder Transformations
    Controlling CSA Transformations
    Typical Transformation Scenarios
    Preventing CSA Transformations
  Architecture Selection
    Context-Driven Architecture Selection
    Target Library-Based Architecture Selection
    Timing-Driven Architecture Selection
    Dynamic Datapath Block Generation
    Controlling Architecture Selection
    Controlling Sub-Architecture Selection
  Improving Quality of Results (QoR)

2 Datapath Reporting
  Overview of Datapath Reporting
  Command Syntax
  Interpreting the Datapath Report
    Module and Instance Names
    Operator Type and Signed Type
    Architecture
    Width of Input and Output Operands
    Area
    File Name, Line Number, and Column Number
  Report Datapath for RTL Sharing
  Using the report datapath Command at Different Stages in the Design Flow
    elaboration Design Stage
    synthesize -to_generic Design Stage
    synthesize -to_mapped Design Stage

3 RC-Datapath (RC-DP) Functions
  Datapath Function Primitives
  Primitive Functionality
    $abs
    $blend
    $carrysave
    $intround
    $lead0
    $lead1
    $rotatel
    $rotater
    $round
  Parameter Functions
  ChipWare Components

4 Datapath Coding Styles
  Overview
  Starting from the RTL
  Importing the Gate-Level Netlist
  Keeping Relevant Operators in the Same Hierarchy
  Inferring Datapath Modules
  Applying Carry-save Arithmetic Automatically
  Inferring a Constant Multiplier
  Using Signed Arithmetic
    Using Signed Part-Select / Concatenation
    Implementing Programmable Unsigned and Signed Multiplier
    Mixing Unsigned and Signed Expressions
    Avoiding Manual Signal Extension Techniques
    VHDL Signed/Unsigned Type Conversion
  Inferring a Square Automatically
  Implying Upper-Bit Truncation
  Following Self-Determined Bit Width Rules
  Avoiding Instantiated Components
  Arithmetically Complementing an Operand

Index

List of Figures
Figure 1-1 Sharing
Figure 1-2 Speculation (Unsharing)
Figure 1-3 Carry-save Transformation on the Critical Path
Figure 1-4 Carry-save Transformation
Figure 1-5 Timing-Driven Architecture Selection

List of Examples
Example 1-1 HDL description of a multiplexer
Example 1-2 HDL description
Example 1-3 Merging Across HDL Statements
Example 1-4 Merging Product-of-Sum
Example 1-5 Merging a Comparator with Arithmetic Operators
Example 1-6 Merging One Upstream Operator to Multiple Downstream Operators
Example 1-7 Transforming CSA over a Multiplexer
Example 1-8 Transforming CSA over an Inverter
Example 1-9 Arithmetic with Lower-Bit Truncation, Truncation after Addition
Example 1-10 Instantiated Operators Cannot Be Merged
Example 1-11 Inferred Operators Can Be Merged
Example 1-12 Gate-Level Netlist Cannot Be Merged
Example 1-13 Non-Interacting Operators Cannot Be Merged
Example 2-1 report datapath Header Information
Example 2-2 Area Statistics
Example 2-3 Design with Datapath Operators
Example 2-4 Tcl Script
Example 2-5 Results of report datapath After Using the Elaborate Command
Example 2-6 Report datapath Results after Using synthesize -to_generic
Example 2-7 Verilog Design with Sharing
Example 2-8 report datapath for RTL Sharing that Shows the Mux Operator
Example 3-1 $abs Function Primitive
Example 3-2 $abs Simulation Model
Example 3-3 $blend Function Primitive
Example 3-4 $blend Simulation Model
Example 3-5 $carrysave Function Primitive
Example 3-6 $intround Function Primitive
Example 3-7 $lead0 Function Primitive
Example 3-8 $lead0 Simulation Model
Example 3-9 $lead1 Function Primitive
Example 3-10 $lead1 Simulation Model
Example 3-11 $rotatel Function Primitive
Example 3-12 $rotatel Simulation Model
Example 3-13 $rotater Function Primitive
Example 3-14 $rotater Simulation Model
Example 3-15 $round Function Primitive
Example 3-16 $round Simulation Model
Example 4-1 Keeping Operators in Separate Levels of the Design Hierarchy
Example 4-2 Inferring Operators at the Same Level of Design Hierarchy
Example 4-3 Inferring Operators at the Same Level of Design Hierarchy
Example 4-4 Manual Shift-and-Add Operations in Verilog-1995
Example 4-5 Recommended Coding for Unsigned Constant Multiplication
Example 4-6 Manual Shift-and-Add Sequence
Example 4-7 Recommended Coding Style for Signed Constant Multiplication
Example 4-8 Signed Addition in Verilog-1995
Example 4-9 Signed Addition
Example 4-10 Signed Multiplication in Verilog-1995
Example 4-11 Signed Multiplication
Example 4-12 Using Part-Select of Signed Vectors Become Unsigned
Example 4-13 Correct Coding Style Using Part-Select of Signed Vectors
Example 4-14 Using Two Operators to Implement Switchable Unsigned and Signed Multiplier
Example 4-15 Using a Single Operator to Implement Switchable Unsigned and Signed Multiplier
Example 4-16 Mixing Operands of Different Sign Type Results in Unsigned Functionality
Example 4-17 Inferring a Signed Multiplier when Mixing Operands of Different Sign Type
Example 4-18 Mixing Operands of Different Sign Type Results in Unsigned Functionality
Example 4-19 Inferring a Signed Multiplier when Mixing Operands of Different Sign Type
Example 4-20 Manual Zero-Extension for Unsigned Signals
Example 4-21 Recommended Coding Style for Unsigned Signals
Example 4-22 Manual Sign-Extension for Signed Signals
Example 4-23 Recommended Coding Style for Signed Signals
Example 4-24 Inconsistency Between Declaration and Usage
Example 4-25 Using Consistent Numeric Types
Example 4-26 Inferring an Unsigned Square with an Unsigned Signal
Example 4-27 Inferring a Signed Square with a Signed Signal
Example 4-28 Operator Merging is Allowed if Truncation Does Not Affect Final Outcome
Example 4-29 Arithmetic With Full Precision Facilitates Operator Merging
Example 4-30 Mixture of Implied Upper-Bit Truncation and Full Precision Arithmetic May Hurt Operator Merging
Example 4-31 Mixture of Explicit Upper-Bit Truncation and Full-Precision Arithmetic May Still Allow Operator Merging
Example 4-32 Design That Triggers the Self-Determined Rule of Addition
Example 4-33 LRM Interpretation of Example 4-32
Example 4-34 Merging-Inclined Variation of Example 4-32
Example 4-35 Design That Triggers the Self-Determined Rule of Multiplication
Example 4-36 LRM Interpretation of Example 4-35
Example 4-37 Merging-Inclined Variation of Example 4-35
Example 4-38 Instantiating Arithmetic DW Components Results in Bad QoS
Example 4-39 Writing Arithmetic Expressions
Example 4-40 Arithmetically Complementing Operands
Example 4-41 Manually Complementing Operands Prevents SoP Extraction

Preface

About This Manual
Additional References
How to Use the Documentation Set
Customer Support
Messages
Man Pages
Command-Line Help
Documentation Conventions

About This Manual


This manual describes datapath synthesis in RTL Compiler.

Additional References
The following sources are helpful references, but are not included with the product
documentation:
TclTutor, a computer-aided instruction package for learning the Tcl language: http://www.msen.com/~clif/TclTutor.html
TCL Reference, Tcl and the Tk Toolkit, John K. Ousterhout, Addison-Wesley Publishing Company
IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language (IEEE Std. 1364-1995)
IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language (IEEE Std. 1364-2001)
IEEE Standard VHDL Language Reference Manual (IEEE Std. 1076-1987)
IEEE Standard VHDL Language Reference Manual (IEEE Std. 1076-1993)
Note: For information on purchasing IEEE specifications, go to http://shop.ieee.org/store/ and
click on Standards.

How to Use the Documentation Set

INSTALLATION AND CONFIGURATION
  Cadence Installation Guide
  Cadence License Manager
  README File

NEW FEATURES AND SOLUTIONS TO PROBLEMS
  README File
  What's New in Encounter RTL Compiler
  Known Problems and Solutions in Encounter RTL Compiler

TASKS AND CONCEPTS
  Getting Started with Encounter RTL Compiler
  Using Encounter RTL Compiler
  HDL Modeling in Encounter RTL Compiler
  ChipWare Developer in Encounter RTL Compiler
  Library Guide for Encounter RTL Compiler
  Setting Constraints and Performing Timing Analysis in Encounter RTL Compiler
  Datapath Synthesis in Encounter RTL Compiler
  Low Power in Encounter RTL Compiler
  Design for Test in Encounter RTL Compiler

REFERENCES
  Attribute Reference for Encounter RTL Compiler
  Command Reference for Encounter RTL Compiler
  ChipWare in Encounter RTL Compiler
  GUI Guide for Encounter RTL Compiler
  Quick Reference for Encounter RTL Compiler

Reporting Problems or Errors in Manuals


The Cadence Help online documentation lets you view, search, and print Cadence product
documentation. You can access Cadence Help by typing cdnshelp from your Cadence tools
hierarchy.

Contact Cadence Customer Support to file a PCR if you find:

An error in the manual
Any missing information in a manual
A problem using the Cadence Help documentation system

Customer Support
Cadence offers live and online support, as well as customer education and training programs.

Cadence Online Support


The Cadence online support website offers answers to your most common technical
questions. It lets you search more than 40,000 FAQs, notifications, software updates, and
technical solutions documents that give you step-by-step instructions on how to solve known
problems. It also gives you product-specific e-mail notifications, software updates, service
request tracking, up-to-date release information, full site search capabilities, software update
ordering, and much more.

For more information on Cadence online support go to:

http://support.cadence.com

Other Support Offerings


Support centers: Provide live customer support from Cadence experts who can
answer many questions related to products and platforms.
Software downloads: Provide you with the latest versions of Cadence products.
Education services: Offer instructor-led classes, self-paced Internet courses, and virtual
classrooms.
University software program support: Provides you with the latest information to
answer your technical questions.

For more information on these support offerings go to:

http://www.cadence.com/support

Messages
From within RTL Compiler, there are two ways to get information about messages:
Use the report messages command.
For example:
rc:/> report messages

This returns the detailed information for each message output in your current RTL
Compiler run. It also includes a summary of how many times each message was issued.
Use the man command.
Note: You can only use the man command for messages within RTL Compiler.
For example, to get more information about the TIM-11 message, type the following
command:
rc:/> man TIM-11

If you do not get the details that you need or do not understand a message, contact Cadence
Customer Support to file a CCR.

Man Pages
In addition to the Command and Attribute References, you can also access information about
the commands and attributes using the man pages in RTL Compiler. Man pages contain the
same content as the Command and Attribute References. To use the man pages from the
UNIX shell:
1. Set your environment to view the correct directory:
setenv MANPATH $CDN_SYNTH_ROOT/share/synth/man

2. Enter the name of the command or attribute that you want either in RTL Compiler or
within the UNIX shell. For example:
man check_dft_rules
man cell_leakage_power
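
The setenv line in step 1 is csh syntax. For sh-compatible shells such as bash, a rough equivalent might look like the sketch below. It is an illustration only: the fallback path /opt/cadence/rc is a hypothetical placeholder, and unlike the setenv line above, this version prepends to any existing MANPATH rather than replacing it.

```shell
# Hypothetical fallback install location; use your site's actual CDN_SYNTH_ROOT.
CDN_SYNTH_ROOT="${CDN_SYNTH_ROOT:-/opt/cadence/rc}"

# Man page directory named in step 1 of this section.
rc_man="$CDN_SYNTH_ROOT/share/synth/man"

# Put the RTL Compiler man pages first, keeping any existing MANPATH entries.
if [ -n "${MANPATH:-}" ]; then
  MANPATH="$rc_man:$MANPATH"
else
  MANPATH="$rc_man"
fi
export MANPATH

echo "$MANPATH"
```

With MANPATH set this way, the man commands shown in step 2 can be run from the same shell session.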

You can also use the more command, which behaves like its UNIX counterpart. If the output
of a man page is too long to be displayed completely on the screen, use the more command
to break up the output. Use the spacebar to page forward, as with the UNIX more command.
rc:/> more report timing

Command-Line Help
You can get quick syntax help for commands and attributes at the RTL Compiler
command-line prompt. There are also enhanced search capabilities so you can more easily
search for the command or attribute that you need.
Note: The command syntax representation in this document does not necessarily match the
information that you get when you type help command_name. In many cases, the order of
the arguments is different. Furthermore, the syntax in this document includes all of the
dependencies, whereas the help information does this only to a certain degree.

If you have any suggestions for improving the command-line help, please e-mail them to:
rc_pubs@cadence.com

Getting the Syntax for a Command


Type the help command followed by the command name. For example:
rc:/> help path_delay

This returns the syntax for the path_delay command.

Getting the Syntax for an Attribute


Type the following:
rc:/> get_attribute attribute_name * -help

For example:
rc:/> get_attribute max_transition * -help

This returns the syntax for the max_transition attribute.

Searching for Attributes


Get a list of all the available attributes by typing the following command:
rc:/> get_attribute * * -help

Type a sequence of letters after the set_attribute command and press Tab to get a
list of all attributes that contain those letters. For example:
rc:/> set_attr li
ambiguous "li": lib_lef_consistency_check_enable lib_search_path libcell
liberty_attributes libpin library library_domain line_number

Searching For Commands When You Are Unsure of the Name


You can use help to find a command if you only know part of its name, even as little as one
letter.
You can type a single letter and press Tab to get a list of all commands that start with
that letter.
For example:
rc:/> c <Tab>

This returns the following commands:


ambiguous "c": cache_vname calling_proc case catch cd cdsdoc change_names
check_dft_rules chipware clear clock clock_gating clock_ports close cmdExpand
command_is_complete concat configure_pad_dft connect_scan_chains continue
cwd_install ..

You can type a sequence of letters and press Tab to get a list of all commands that start
with those letters.
For example:
rc:/> path_<Tab>

This returns the following commands:


ambiguous command name "path_": path_adjust path_delay path_disable path_group
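
The prefix matching described above can be pictured with a small shell sketch. It only models the idea of selecting names by their leading letters and is not how RTL Compiler implements Tab completion. The command list is a hand-copied, incomplete sample taken from the output above, plus the report command used earlier in this preface.

```shell
# Illustrative sample of command names; not the full RTL Compiler command set.
commands="path_adjust path_delay path_disable path_group report"
prefix="path_"

# Collect every name that starts with the typed prefix.
matches=""
for name in $commands; do
  case "$name" in
    "$prefix"*) matches="${matches:+$matches }$name" ;;
  esac
done

echo "$matches"
```

Changing prefix to a single letter models the one-letter case in the same way.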

Documentation Conventions

Text Command Syntax


The list below defines the syntax conventions used for the RTL Compiler text interface
commands.

literal                  Nonitalic words indicate keywords you enter literally. These
                         keywords represent command or option names.

arguments and options    Words in italics indicate user-defined arguments or information
                         for which you must substitute a name or a value.

|                        Vertical bars (OR-bars) separate possible choices for a single
                         argument.

[ ]                      Brackets indicate optional arguments. When used with OR-bars,
                         they enclose a list of choices from which you can choose one.

{ }                      Braces indicate that a choice is required from the list of
                         arguments separated by OR-bars. Choose one from the list.
                         {argument1 | argument2 | argument3}

{ }                      Braces, used in Tcl commands, indicate that the braces must be
                         typed in.

...                      Three dots (...) indicate that you can repeat the previous
                         argument. If the three dots are used with brackets (that is,
                         [argument]...), you can specify zero or more arguments. If
                         the three dots are used without brackets (argument...), you
                         must specify at least one argument.

#                        The pound sign precedes comments in command files.


1
Datapath Optimization

Overview of Datapath Optimization
Sharing Transformations
Controlling Sharing Transformations
Speculation Transformations
Controlling Speculation Transformations
Carry-Save Adder Transformations
Controlling CSA Transformations
Typical Transformation Scenarios
Preventing CSA Transformations
Architecture Selection
Context-Driven Architecture Selection
Target Library-Based Architecture Selection
Timing-Driven Architecture Selection
Dynamic Datapath Block Generation
Controlling Architecture Selection
Controlling Sub-Architecture Selection
Improving Quality of Results (QoR)


Overview of Datapath Optimization


Datapath synthesis in RTL Compiler starts from the RTL code that infers or directly describes
datapath logic. The RTL code is written in industry-standard design description languages,
such as Verilog or VHDL. RTL Compiler reads in the RTL code and synthesizes it down to
gates. Equipped with built-in datapath knowledge, RTL Compiler performs operator-level
optimizations and constructs gate-level datapath structures. This combines datapath
synthesis and logic synthesis in one tool in standard synthesis flows. Datapath synthesis in
RTL Compiler does not involve layout generation or regularity-driven placement (tiling).

This chapter describes datapath optimization performed in RTL Compiler, as well as how to
optimize RTL code to improve synthesis quality of results of datapath designs.

Supported HDL Operators


Verilog-1995
    arithmetic: +, -, unary -, *, /, %
    shifting: >>, <<
    relational: ==, !=, >, >=, <, <=
Verilog-2001
    All the Verilog-1995 operators plus the following:
    arithmetic: ** (in a limited fashion)
    shifting: <<<, >>>
    The ** operator is supported in the following two styles of expression:
        constant ** constant
        2 ** variable
VHDL-1987
    arithmetic: +, -, unary -, *, /, mod, rem, abs
    relational: =, /=, >, >=, <, <=
VHDL-1993
    All the VHDL-1987 operators plus the following:
    shift/rotate: sll, srl, sla, sra, rol, ror


Sharing Transformations
Sharing is an optimization technique that reclaims design area (reduces gate count) without
degrading design performance. This optimization is based on the principle that two similar
arithmetic operations can be performed on one arithmetic hardware component if they are
never used at the same time.

This technique shares hardware resources across portions of the design. During the
synthesis flow, resource sharing is performed automatically to reduce area. Sharing is
performed by merging two datapath modules into one, while guaranteeing that the design
timing constraints are not violated. This results in a design with smaller area and possibly
reduced delay.

Example 1-1 on page 25 shows the HDL description of a multiplexer with two multipliers and
a corresponding diagram. After sharing, there is only one multiplier on the output of the
multiplexer as shown in Figure 1-1 on page 26.

Example 1-1 HDL description of a multiplexer

Verilog
module sharing_example (a, b, c, d, cond, y);
parameter w = 16;
input [w-1:0] a, b, c, d;
input cond;
output [w*2-1:0] y;
wire [w*2-1:0] a_times_b = a * b;
wire [w*2-1:0] c_times_d = c * d;
assign y = cond ? a_times_b : c_times_d;
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity sharing_example is
generic (w : natural := 16);
port ( a, b, c, d : in unsigned (w-1 downto 0);
cond : in std_logic;
y : out unsigned (w*2-1 downto 0) );
end sharing_example;
architecture rtl of sharing_example is
signal a_times_b, c_times_d : unsigned (w*2-1 downto 0);
begin
a_times_b <= a * b;
c_times_d <= c * d;
y <= a_times_b when (cond = '1') else c_times_d;
end rtl;


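Figure 1-1 Sharing

[Diagram. Implementation of HDL without sharing: two multipliers (a * b and c * d) feed a
multiplexer selected by Cond, which drives out. Implementation of HDL after sharing
optimization: two multiplexers selected by Cond choose the operands (a or c, b or d) of a
single multiplier, which drives out.]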
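The equivalence behind the sharing transformation can be checked with a small behavioral model. The following Python sketch is illustrative only and is not part of RTL Compiler; the function names unshared and shared are hypothetical, and the operand width of 16 bits is taken from Example 1-1.

```python
# Behavioral model of resource sharing (illustrative; not RTL Compiler code).
W = 16
MASK = (1 << W) - 1


def unshared(a, b, c, d, cond):
    """Two multipliers followed by a multiplexer, as written in Example 1-1."""
    a_times_b = (a & MASK) * (b & MASK)
    c_times_d = (c & MASK) * (d & MASK)
    return a_times_b if cond else c_times_d


def shared(a, b, c, d, cond):
    """Two operand multiplexers followed by a single multiplier."""
    x = a if cond else c
    y = b if cond else d
    return (x & MASK) * (y & MASK)
```

Both functions return the same value for every input because only one of the two products is ever selected, which is the condition under which one multiplier can serve both operations.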


Controlling Sharing Transformations


While sharing transformations are performed implicitly, you can manually control the sharing
transformation process using the following command and attributes.

Enable sharing transformations using the following command:
synthesize -to_generic -effort high

To turn off RTL sharing transformations, use the dp_sharing root attribute.
rc:/> set_attribute dp_sharing none /

Default: basic

To turn off sharing transformations within the design, use the dp_sharing design
attribute.
rc:/> set_attribute dp_sharing none [find /designs* -design name]

Default: inherited

To turn off sharing transformations within a particular subdesign, use the dp_sharing
subdesign attribute.
rc:/> set_attribute dp_sharing none [find /designs* -subdesign name]

Default: inherited

See Report Datapath for RTL Sharing on page 60 for an example report.


Speculation Transformations
If the select pin of a multiplexer (mux) is on the critical path, and the mux drives a datapath
component, then the operation at the output side of the mux can be duplicated at the input
side of the mux. This unsharing process speeds up the critical path.

Figure 1-2 on page 29 shows the select line of the multiplexer on the critical path. The adder
at the output side of the mux is duplicated and moved toward the input side of the mux,
thereby removing it from the critical path.

Example 1-2 HDL description


Verilog
module ex3 (a, b, c, cond, y);
parameter w = 16;
input [w-1:0] a, b, c;
input cond;
output [w-1:0] y;
wire [w-1:0] a_mux_b = cond ? a : b;
assign y = a_mux_b + c;
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex3 is
generic (w : natural := 16);
port ( a, b, c : in unsigned (w-1 downto 0);
cond : in std_logic;
y : out unsigned (w-1 downto 0) );
end ex3;
architecture rtl of ex3 is
signal a_mux_b : unsigned (w-1 downto 0);
begin
a_mux_b <= a when (cond = '1') else b;
y <= a_mux_b + c;
end rtl;


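Figure 1-2 Speculation (Unsharing)

[Diagram. Before the transformation, a multiplexer selects between a and b and its output is
added to c. After the transformation, two adders compute a + c and b + c ahead of the
multiplexer. The critical path runs through the select line of the multiplexer.]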
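The speculation transformation relies on the fact that adding after the multiplexer gives the same result as adding on both inputs and selecting afterward. The following Python sketch is a behavioral model only, with hypothetical function names, using the 16-bit width of Example 1-2.

```python
# Behavioral model of speculation (unsharing); illustrative only.
W = 16
MASK = (1 << W) - 1


def original(a, b, c, cond):
    """Multiplexer first, then one adder: the select line sees the adder delay."""
    a_mux_b = a if cond else b
    return (a_mux_b + c) & MASK


def speculated(a, b, c, cond):
    """Both sums are computed up front; the late select only picks a result."""
    sum_a = (a + c) & MASK
    sum_b = (b + c) & MASK
    return sum_a if cond else sum_b
```

The speculated form uses two adders instead of one, which is why the transformation trades area for a shorter path from the select pin to the output.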


Controlling Speculation Transformations


While speculation transformations are performed implicitly, you can manually control the
speculation transformation process using the following command and attributes.

Enable speculation transformations using the following command:
synthesize -to_generic -effort high

To generally turn off RTL speculation transformations, use the dp_speculation root
attribute:
rc:/> set_attribute dp_speculation none /

Default: basic

To turn off RTL speculation transformations for a design, use the dp_speculation
design attribute.
rc:/> set_attribute dp_speculation none [find /designs* -design name]

Default: inherited

To turn off speculation transformations within a particular subdesign, use the
dp_speculation subdesign attribute.
rc:/> set_attribute dp_speculation none [find /designs* -subdesign name]

Default: inherited


Carry-Save Adder Transformations


The guiding principle of CSA transformation is to preserve functionality. RTL Compiler
automatically applies carry-save adder (CSA) transformations, which reduce the number
of carry-propagate adders in the design to improve timing and area. In doing so, it may keep
some internal signals in carrysave form.

More specifically, CSA transformation is applicable to datapath operators whose isolated
implementation includes a carry propagate adder. This includes all arithmetic and relational
operators. When the output of one such operator feeds into another datapath operator, it
becomes a candidate for CSA transformation.

A fundamental characteristic of CSA transformation is that the transformed arithmetic implies
full-precision at intermediary signals. Inside the datapath portion of a design, an intermediary
signal comes from an upstream datapath operator and goes into one or more downstream
datapath operators. If an intermediary signal is truncated before flowing downstream, then
CSA transformations may distort the overall functionality.

When a set of datapath operators is identified as a candidate for CSA transformation, RTL
Compiler looks at each intermediary signal between them and examines whether it is
equipped with an appropriate bit range to carry the needed precision of its computation. Any
disqualified intermediary signal becomes the boundary of CSA transformation.

A truncation may or may not block transformation. In the process of qualifying an intermediary
signal for CSA transformation, RTL Compiler looks into the intention behind the RTL code and
analyzes the information content it needs to carry. The qualifying or disqualifying decision is
based on the required precision, instead of the HDL-implied precision suggested by its source
operator alone.
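The effect of a truncated intermediary signal can be illustrated with a small numerical model. The following Python sketch is an assumption-laden illustration, not a description of the tool's internal analysis; the functions discrete and merged and their width parameters are hypothetical. It computes y = (a + b) + c, where the intermediary signal p = a + b is stored in p_width bits.

```python
# Illustrative model: when does truncating an intermediary signal matter?
def discrete(a, b, c, p_width, y_width):
    """Separate adders: p is truncated to p_width bits before the second add."""
    p = (a + b) & ((1 << p_width) - 1)
    return (p + c) & ((1 << y_width) - 1)


def merged(a, b, c, y_width):
    """Merged operator: full precision inside, truncated only at the output."""
    return (a + b + c) & ((1 << y_width) - 1)
```

With 16-bit operands, a 16-bit p, and a 17-bit y, the inputs a = 0xFFFF, b = 1, c = 0 give different results for the two functions, so merging would change the functionality. If y is also only 16 bits wide, the lost carry of p is never needed and both functions agree, which is the sense in which the decision depends on the required precision.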

CSA transformation does not span across hierarchical boundaries. The output ports of a
module are never made in carrysave form. CSA transformation does not span across clock
cycles, and a signal feeding a register is never made in carrysave form.

A carry-save adder (CSA) takes three numbers and produces two outputs, one formed with
the sums and the other with the carryouts.

The most straightforward way to add up a set of numbers is to employ an adder tree. Each
adder consumes two numbers and produces one. The adder at the root of the tree generates
the final sum.

Use the carrysave transformation technique to greatly improve both timing and area.
Figure 1-3 on page 32 shows the carrysave transformation technique.


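Figure 1-3 Carry-save Transformation on the Critical Path

[Diagram. (a) Carry-propagate: a tree of carry-propagate adders sums the inputs a, b, c, d, e,
and f to produce y. (b) Carrysave: carrysave blocks reduce the same six inputs, and one
carry-propagate adder produces y. The critical path is marked in both diagrams.]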

Diagram (a) shows three carry propagate adders on the critical path.

Diagram (b) has only one carry-propagate adder on the critical path and shows how special
carrysave blocks can be used to perform the same carry-propagate addition. By taking in
three input numbers and generating two output numbers, the carrysave block adds up three
numbers without resolving the carry propagation. At the end, the only two remaining numbers
are said to be the sum in a carrysave form. A carry-propagate adder is needed to add the two
numbers to produce the final sum.

In the figure, a dedicated symbol represents a carry-save adder (CSA). The carrysave block
does not incur the delay of carry propagation. Its delay is small and is independent of the
width of the operands.
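The behavior of a carrysave block can be sketched in a few lines. The following Python model is illustrative only; the names carry_save_add and vector_sum are hypothetical, and the reduction order is one simple choice rather than the structure RTL Compiler builds. Each bit position is handled independently (XOR for the sum, majority for the carry), which is why no carry ripples across the word.

```python
# Illustrative model of a 3:2 carry-save adder and a carry-save vector sum.
def carry_save_add(x, y, z):
    """Return (sum_word, carry_word) with sum_word + carry_word == x + y + z."""
    sum_word = x ^ y ^ z                               # per-bit sum, no carry chain
    carry_word = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, shifted left
    return sum_word, carry_word


def vector_sum(operands):
    """Reduce non-negative operands with carry-save stages, then add once."""
    pending = list(operands)
    while len(pending) > 2:
        x, y, z = pending[:3]
        pending = pending[3:] + list(carry_save_add(x, y, z))
    return sum(pending)    # the only carry-propagate addition
```

For example, carry_save_add(5, 6, 7) returns the pair (4, 14), whose sum is 18. In vector_sum, every stage replaces three numbers with two, so six operands need four carrysave stages and a single final carry-propagate addition.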

This carrysave concept is applicable in various scenarios, such as the vector sum, which
adds up a set of numbers, or the Wallace tree, which adds up partial products inside of a
multiplier.

Timing analysis on a multiplier or a vector sum often identifies the carry-propagate adder as
a significant portion of the critical path. Therefore, employing a technique that merges
arithmetic operators can significantly improve timing and area. Figure 1-4 on page 33 shows
how this works on a block computing y = a * b + c * d.


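Figure 1-4 Carry-save Transformation

[Diagram. (a) Two multipliers, a * b and c * d, each containing an internal adder marked (+),
feed a separate adder that produces y. (b) A single merged operator computing a*b+c*d
produces y.]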

Figure 1-4 (a) shows an implementation using discrete operators, that is, without any CSA
transformations. It takes two multipliers and one adder to implement this function. The adders
are used to implement the multipliers, as indicated with the (+) in the diagram.

Traditionally, the synthesis tool labors to optimize each of these discrete operators
individually, without taking into account how they interact with each other. Each of these
operators has a carry-propagate adder; therefore, there are two carry-propagate adders
on the critical path.

Figure 1-4 (b) shows an implementation after CSA transformations. RTL Compiler looks at
the design at the operator level and recognizes that this is a cluster of arithmetic operators
that can be merged. Instead of implementing three discrete components, RTL Compiler
merges them as one larger complex operator, optimizing the entire merged operator. By
doing so, there is only one carry-propagate adder on the critical path.

The merged operator is no longer a multiplier or an adder. It is a complex operator computing
a * b + c * d.
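One way to picture the merged operator is as a single pool of partial products that is reduced in carrysave form and resolved once. The following Python sketch assumes simple non-Booth partial products and unsigned operands; the function names are hypothetical and the structure is not claimed to match what RTL Compiler generates.

```python
# Illustrative model of a merged sum-of-products operator (unsigned operands).
def partial_products(x, y):
    """Non-Booth partial products of x * y: x shifted by each set bit of y."""
    return [x << i for i in range(y.bit_length()) if (y >> i) & 1]


def compress(rows):
    """Apply 3:2 carry-save stages until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        x, y, z = rows[:3]
        rows = rows[3:] + [x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1]
    return rows


def sum_of_products(a, b, c, d):
    """Compute a * b + c * d with one final carry-propagate addition."""
    rows = compress(partial_products(a, b) + partial_products(c, d))
    return sum(rows)
```

Because the partial products of both multiplications enter the same reduction, neither product is ever resolved into a single number on its own; only the last step performs carry propagation.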


Controlling CSA Transformations


While CSA transformations are performed implicitly, you can manually control the CSA
transformation process using the following attributes.

To generally turn off carry-save transformations, use the dp_csa root attribute.
rc:/> set_attribute dp_csa none /

Default: basic

To turn off carry-save transformations within a design, use the dp_csa design attribute.
rc:/> set_attribute dp_csa none [find /designs* -design name]

Default: inherited

To turn off carry-save transformations within a particular subdesign, use the dp_csa
subdesign attribute.
rc:/> set_attribute dp_csa none [find /designs* -subdesign name]

Default: inherited

Typical Transformation Scenarios


If one mergeable operator has a multiple fanout, and one or more of the downstream
operators are mergeable operators, then this upstream operator can be merged with each
individual downstream mergeable operator.

For each merging candidate of any scenario, merging may or may not take place, depending
on whether the original functionality can be preserved.

The purpose of merging operators is to eliminate intermediary carry propagate adders. This
can be applied to a set of datapath operators interacting with each other. The most typical
scenarios are vector sum, multiply-add, sum of product, or a combination of these. Datapath
operators are transformed in the following scenarios.
Vector Sum

a + b + c

Multiply-Add

a * b + c

Sum-of-Product

a * b + c * d


Combination of Vector Sum, Multiply-Add, Sum-of-Product

a * b + c * d + e - f

Merging is not limited to operators inferred in the same HDL statement. As shown in
Example 1-3 on page 35, both of the HDL code segments implement signal y using a single
merged operator.

Example 1-3 Merging Across HDL Statements


Operators across HDL statements
Verilog

p = a * b;
q = c * d;
y = p + q;

VHDL

p <= a * b;
q <= c * d;
y <= p + q;

Operators in the same HDL statement


Verilog

y = a * b + c * d;

VHDL

y <= a * b + c * d;


Product-of-Sum CSA

A product-of-sums (POS) expression is the product (multiplication) of summations. A less
commonly seen scenario is product-of-sum, (a+b)*c, as shown in Example 1-4 on
page 36.

Example 1-4 Merging Product-of-Sum

Verilog
wire [13:0] a, b, c, d;
wire [15:0] cs, p;
wire [31:0] q;
wire [32:0] y;

assign cs = a + b + c + d;
assign y = cs * p + q;

VHDL
signal a, b, c, d : unsigned (13 downto 0);
signal cs, p : unsigned (15 downto 0);
signal q : unsigned (31 downto 0);
signal y : unsigned (32 downto 0);

cs <= resize(a,16) + resize(b,16) + resize(c,16) + resize(d,16);
y <= resize(cs * p, 33) + resize(q,33);

In this case, the internal signal (cs) is kept in carrysave form and the downstream multiplier
is equipped with a special architecture that takes care of inputs in carrysave form. This
effectively merges the upstream adders with the downstream multiplier. There is only one
carry propagate adder from each input to output.

Comparator

Another common CSA transformation scenario is the comparison of the result of arithmetic
expressions, as shown in Example 1-5 on page 37. Both p and q in the example are in
carrysave format.

Other transformation scenarios involving comparators include comparing one or two
instances of sum-of-product, product-of-sum, or a combination of vector sum,
sum-of-product, and product-of-sum.


Example 1-5 Merging a Comparator with Arithmetic Operators

Verilog
wire [15:0] a, b, c, d;
wire [16:0] p, q;
wire is_greater, is_equal;
assign p = a + b;
assign q = c + d;
assign is_greater = (p > q);
assign is_equal = (p == q);

VHDL
signal a, b, c, d : unsigned (15 downto 0);
signal p, q : unsigned (16 downto 0);
signal is_greater, is_equal : std_logic;
p <= resize(a,17) + resize(b,17);
q <= resize(c,17) + resize(d,17);
is_greater <= '1' when (p > q) else '0';
is_equal <= '1' when (p = q) else '0';
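One way to see why a comparator can be merged with the adders that feed it is to rewrite the comparison as a test on a single arithmetic expression. The following Python sketch is an illustration under that assumption, not a description of the architecture RTL Compiler builds; the function names are hypothetical and the inputs are the 16-bit unsigned operands of Example 1-5.

```python
# Illustrative model: comparing two sums as a test on one merged expression.
WIDTH = 18                      # enough bits for a + b - c - d with 16-bit inputs
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def discrete(a, b, c, d):
    """Resolve p and q separately, then compare them."""
    p = a + b
    q = c + d
    return p > q, p == q


def merged(a, b, c, d):
    """Evaluate a + b - c - d once (two's complement) and inspect the result."""
    diff = (a + b - c - d) & MASK
    is_equal = diff == 0
    is_greater = (diff & SIGN) == 0 and not is_equal
    return is_greater, is_equal
```

In the merged form, p and q never need to exist as fully resolved numbers, which is consistent with keeping both in carrysave format.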

Multiple-Fanout

A more challenging scenario is when one mergeable operator feeds into more than one
operator downstream, as shown in Example 1-6 on page 37, where the one multiplier needs
to be merged with each of the two downstream adders.

Example 1-6 Merging One Upstream Operator to Multiple Downstream Operators

Verilog
cs = a * b;
x = cs + c;
y = cs + d;

VHDL
cs <= a * b;
x <= cs + c;
y <= cs + d;

In this case, the internal signal (cs) is kept in carrysave form and the downstream clusters (x
and y) add them up. This effectively merges the upstream multiplier with each of the two
downstream adders. There is only one carry propagate adder from input to each output.


CSA over a Multiplexer

Example 1-7 on page 38 shows a transformation scenario involving CSA over a multiplexer.
The tmp signal is in carrysave format.

Example 1-7 Transforming CSA over a Multiplexer

Verilog
module test(a, b, c, d, s, e, z);
input [7:0] a, b, c, d;
input [14:0] e;
input s;
output [17:0] z;
wire [16:0] tmp;
assign tmp = s ? a*b + c*d : a*c + b*d;
assign z = tmp + e;
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity test is
port ( a, b, c, d : in unsigned (7 downto 0);
s : in std_logic;
e : in unsigned (14 downto 0);
z : out unsigned (17 downto 0) );
end test;
architecture rtl of test is
signal tmp : unsigned (16 downto 0);
begin
tmp <= (resize(a*b,17) + resize(c*d,17)) when (s = '1')
else (resize(a*c,17) + resize(b*d,17));
z <= resize(tmp,18) + resize(e,18);
end rtl;

CSA over an Inverter

Example 1-8 on page 38 shows a transformation scenario involving CSA over an inverter.
The tmp signal is in carrysave format.

Example 1-8 Transforming CSA over an Inverter

Verilog
module test (a, b, c, z);
input [15:0] a, b, c;
output [16:0] z;
wire [16:0] tmp = a + b;
assign z = ~tmp + c;
endmodule


VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( a, b, c : in unsigned (15 downto 0);
z : out unsigned (16 downto 0) );
end test;
architecture rtl of test is
signal tmp : unsigned (16 downto 0);
begin
tmp <= resize(a,17) + resize(b,17);
z <= (not tmp) + c;
end rtl;
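The inverter in Example 1-8 does not have to break the arithmetic cluster, because a bitwise inversion is itself an arithmetic operation in two's complement: for a 17-bit value, ~tmp equals -tmp - 1 modulo 2**17. The following Python sketch checks that identity; it is a behavioral illustration with hypothetical function names, not the tool's implementation.

```python
# Illustrative model of Example 1-8: z = ~(a + b) + c in 17 bits.
WIDTH = 17
MASK = (1 << WIDTH) - 1


def discrete(a, b, c):
    """Resolve tmp, invert it, then add c."""
    tmp = (a + b) & MASK
    return ((~tmp & MASK) + c) & MASK


def merged(a, b, c):
    """Single expression using ~tmp == -tmp - 1 (mod 2**WIDTH)."""
    return (c - a - b - 1) & MASK
```

Since the whole computation reduces to one sum of signed terms, the intermediate value tmp can stay in carrysave form.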

Truncated CSA

Example 1-9 on page 39 shows a special CSA case known as a truncated CSA. The r signal
in the example will be transformed.

Example 1-9 Arithmetic with Lower-Bit Truncation, Truncation after Addition

Verilog
wire [15:0] a, b, c, d;
wire [16:0] p, q;
wire [17:0] r;
wire [15:0] y;
assign p = a + b;
assign q = c + d;
assign r = p + q;
assign y = r[17:2]; // the three operators are merged

VHDL
signal a, b, c, d : unsigned (15 downto 0);
signal p, q : unsigned (16 downto 0);
signal r : unsigned (17 downto 0);
signal y : unsigned (15 downto 0);

p <= resize(a,17) + resize(b,17);
q <= resize(c,17) + resize(d,17);
r <= resize(p,18) + resize(q,18);
y <= r(17 downto 2); -- the three operators are merged


Preventing CSA Transformations


There are various occasions in which the design has multiple datapath operators with no
truncation, but RTL Compiler does not merge them. The following sections are some typical
examples where datapath operators cannot be merged.

Non-Inferred, Instantiated Operators

CSA transformations work on inferred operators, but not on instantiated ones, as shown in
Example 1-11 on page 41. There are two reasons for this:
An instantiated module is a user-defined design hierarchy that RTL Compiler must honor.
RTL Compiler is not allowed to automatically dissolve a user-defined module for the
purpose of CSA transformations, as shown in Example 1-10 on page 40.
RTL Compiler does not guess the arithmetic functionality of a user-defined module by its
module name, its input or output port names, or its input or output bit widths.

Example 1-10 Instantiated Operators Cannot Be Merged

Verilog
module test (a, b, c, d, y);
input [7:0] a, b, c, d;
wire [15:0] p, q;
output [15:0] y;

CW_mult #(8, 8) u1 (.A(a), .B(b), .TC(1'b0), .Z(p));
CW_mult #(8, 8) u2 (.A(c), .B(d), .TC(1'b0), .Z(q));
CW_sum #(2, 16, 16) u3 (.A({p, q}), .TC(1'b0), .Z(y));
endmodule

VHDL
library ieee, cw;
use ieee.std_logic_1164.all;
use cw.components.all;
entity test is
port ( a, b, c, d : in std_logic_vector (7 downto 0);
y : out std_logic_vector (15 downto 0) );
end test;
architecture rtl of test is
signal p, q : std_logic_vector (15 downto 0);
begin
u1 : CW_mult generic map (wA => 8, wB => 8)
port map (A => a, B => b, TC => '0', Z => p);
u2 : CW_mult generic map (wA => 8, wB => 8)
port map (A => c, B => d, TC => '0', Z => q);
u3 : CW_sum generic map (nI => 2, wAi => 16, wZ => 16)
port map (A => (p & q), TC => '0', Z => y);
end rtl;


Example 1-11 Inferred Operators Can Be Merged

Verilog
module test (a, b, c, d, y);
input [7:0] a, b, c, d;
wire [15:0] p, q;
output [15:0] y;

assign p = a * b;
assign q = c * d;
assign y = p + q;
endmodule

VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( a, b, c, d : in unsigned (7 downto 0);
y : out unsigned (15 downto 0) );
end test;
architecture rtl of test is
signal p, q : unsigned (15 downto 0);
begin
p <= a * b;
q <= c * d;
y <= p + q;
end rtl;

Non-Inferred, Gate-Level Netlist

CSA transformation works on inferred operators. RTL Compiler knows exactly what they do
and how they can be merged. RTL Compiler cannot merge operators represented by
imported gate-level netlists for the following two reasons, which are the same as for an
instantiated component:
1. RTL Compiler must respect the user-defined design hierarchy.
2. RTL Compiler does not guess the functionality of a user-defined module and does not
try to reverse-engineer the hidden functionality in a gate-level representation. For
example, in Example 1-12 on page 41, RTL Compiler cannot determine whether the
numbers are being added or not.

Example 1-12 Gate-Level Netlist Cannot Be Merged

Verilog
module add8 (y, a, b);
input [7:0] a, b;
output [7:0] y;
wire n1, n2, n3, n4, n5, n6, n7;
HA1 i0 (.A(a[0]), .B(b[0]), .S(y[0]), .CO(n1));
FA1A i1 (.A(a[1]), .B(b[1]), .CI(n1), .S(y[1]), .CO(n2));
FA1A i2 (.A(a[2]), .B(b[2]), .CI(n2), .S(y[2]), .CO(n3));
FA1A i3 (.A(a[3]), .B(b[3]), .CI(n3), .S(y[3]), .CO(n4));
FA1A i4 (.A(a[4]), .B(b[4]), .CI(n4), .S(y[4]), .CO(n5));
FA1A i5 (.A(a[5]), .B(b[5]), .CI(n5), .S(y[5]), .CO(n6));
FA1A i6 (.A(a[6]), .B(b[6]), .CI(n6), .S(y[6]), .CO(n7));
EO3 i7 (.A(a[7]), .B(b[7]), .C(n7), .Z(y[7]));
endmodule
module test (y, a, b, c);
input [7:0] a, b, c;
wire [7:0] p;
output [7:0] y;
// assign y = a + b + c;
add8 u0 (.a(a), .b(b), .y(p));
add8 u1 (.a(p), .b(c), .y(y));
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity add8 is
port ( y : out unsigned (7 downto 0);
a, b : in unsigned (7 downto 0) );
end add8;
architecture rtl of add8 is
component HA1 port ( A, B : in std_logic; S, CO : out std_logic );
end component;
component FA1A port ( A, B, CI : in std_logic; S, CO : out std_logic );
end component;
component EO3 port ( A, B, C : in std_logic; Z : out std_logic );
end component;
signal n1, n2, n3, n4, n5, n6, n7 : std_logic;
begin
i0 : HA1 port map ( A => a(0), B => b(0), S => y(0), CO => n1);
i1 : FA1A port map ( A => a(1), B => b(1), CI => n1, S => y(1), CO => n2);
i2 : FA1A port map ( A => a(2), B => b(2), CI => n2, S => y(2), CO => n3);
i3 : FA1A port map ( A => a(3), B => b(3), CI => n3, S => y(3), CO => n4);
i4 : FA1A port map ( A => a(4), B => b(4), CI => n4, S => y(4), CO => n5);
i5 : FA1A port map ( A => a(5), B => b(5), CI => n5, S => y(5), CO => n6);
i6 : FA1A port map ( A => a(6), B => b(6), CI => n6, S => y(6), CO => n7);
i7 : EO3 port map ( A => a(7), B => b(7), C => n7, Z => y(7) );
end rtl;
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (7 downto 0);
a, b, c : in unsigned (7 downto 0) );
end test;
architecture rtl of test is
component add8
port ( y : out unsigned (7 downto 0);
a, b : in unsigned (7 downto 0) );
end component;
signal p : unsigned (7 downto 0);
begin
-- y <= a + b + c;
u0 : add8 port map (a => a, b => b, y => p);
u1 : add8 port map (a => p, b => c, y => y);
end rtl;


Non-Interacting Datapath Operators

Often there is RTL code that has multiple datapath operators, but RTL Compiler concludes
that they cannot be merged because these operators do not interact with each other, as
shown by the RTL code in Example 1-13 on page 43.

In Example 1-13 on page 43, the operators come from the same source and their outputs go
into the same mux, although different pins. These operators are not interacting with each
other.

Example 1-13 Non-Interacting Operators Cannot Be Merged


Verilog
case (code)
2'b00 : y = a + b;
2'b01 : y = a - b;
2'b10 : y = a * b;
default : y = a + b;
endcase

VHDL
signal mode : unsigned (1 downto 0);
signal a, b, y : unsigned (15 downto 0);
variable p : unsigned (31 downto 0);
case mode is
when "00" => y <= a + b;
when "01" => y <= a - b;
when "10" => p := a * b; y <= p(15 downto 0);
when others => y <= a + b;
end case;


Architecture Selection
The best architecture for a datapath operator is a function of the design constraints,
technology library, and its surrounding logic. The choice should not be uniform among all
operators since each operator has its own environment. Whether done manually or
automatically through a synthesis tool, architecture selection requires accurate timing
analysis and precise decisions based on the delay and area calculations.

For each operator in the design, whether it is merged or discrete, RTL Compiler selects the
best architecture, also known as speed grade. There can be multiple architectures or speed
grades for datapath operators, such as addition, subtraction, multiplication, and division.
Furthermore, the implementation of the selected architecture is refined to improve the overall
quality of results. These architecture selection and implementation refinement decisions are
a function of timing constraints, surrounding control logic, and the target technology library.

Context-Driven Architecture Selection


Part of the criteria affecting implementation selection is the design context. For example, if
there is a constant multiplier or a square, then RTL Compiler automatically performs
appropriate optimizations (shift-and-add type optimization). RTL Compiler will not
implement a full-blown multiplier as a starting point and use constant propagation to optimize
it. Another example is the partial product encoding scheme inside of a multiplier. RTL
Compiler chooses between the booth, non-booth and radix8 architectures based on the size
and signedness of the operands and the library.
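The shift-and-add idea for a constant multiplier can be sketched as follows. This Python model is a minimal illustration assuming a non-negative constant and a plain binary decomposition; the function names are hypothetical, and the actual optimization RTL Compiler applies may use a different decomposition.

```python
# Illustrative shift-and-add model of multiplication by a constant.
def shift_add_terms(constant):
    """Bit positions of the set bits of a non-negative constant."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(x, constant):
    """Sum one shifted copy of x per set bit of the constant."""
    return sum(x << shift for shift in shift_add_terms(constant))
```

For the constant 10 (binary 1010), shift_add_terms returns [1, 3], so x * 10 is built as (x << 1) + (x << 3). Only two shifted terms are added, compared with one partial product per multiplier bit in a general-purpose multiplier.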

Target Library-Based Architecture Selection


Architecture selection is also affected by the target library. During the setup stage, RTL
Compiler develops its architecture preference based on what cells are available in the library,
as well as the timing and area characteristics of those cells. One example is the choice
between booth and non-booth encoding schemes.


Timing-Driven Architecture Selection


If an operator sits on the critical path, then you want a fast architecture. If an operator sits off
the critical path, then you want a small architecture that meets timing. RTL Compiler always
tries to honor timing with the smallest possible area.

Figure 1-5 is a simplified scenario of timing-driven architecture selection. The adder at the
upper half of the figure is more timing critical and may be implemented using a fast
architecture. The adder at the lower half of the figure has more slack and can be implemented
using a slower architecture. RTL Compiler evaluates the situation and may change the
architecture if it helps timing or area.

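Figure 1-5 Timing-Driven Architecture Selection

[Diagram. Two adders: the adder on the critical path is labeled fast, and the other adder is
labeled slow. The diagram also contains a multiplier.]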

Dynamic Datapath Block Generation


Generation of the datapath block happens dynamically. All of these architecture selection and
implementation selection procedures occur during optimization. The actual implementation
of a datapath block may change depending on the surrounding logic. There is no built-in static
architecture or implementation and there is no simplified assumption about the surrounding
timing profile.


Controlling Architecture Selection


Explicitly control the datapath architecture selection process, (fix a datapath component
to a particular speed grade) by specifying a valid implementation with the following
attribute:
rc:/> set_attribute user_speed_grade \
{very_slow | slow | medium | fast | very_fast} [find /designs* -subdesign name]

Note: If you choose to explicitly control this process through the user_speed_grade
attribute, then you must do so after using the synthesize -to_generic command.
Otherwise, RTL Compiler ignores the user-specified speed grade and implements an
architecture that may or may not coincide with the specified speed grade.

The number of architectures available to a particular datapath operator is usually less than
five. Therefore, in some cases, RTL Compiler will map different speed grades to the same
architecture. For example, the very_slow and slow speed grades may actually be
implemented with the same architecture.
Find out the architecture of a particular datapath component using the speed_grade
attribute. For example:
rc:/> get_attribute speed_grade [find /designs* -subdesign name]

The speed_grade attribute is a read-only attribute that returns a string whose value is
either very_fast, fast, medium, slow, or very_slow.

Controlling Sub-Architecture Selection


Some datapath components, such as multipliers and dividers, can have multiple
sub-architectures. In RTL Compiler, a multiplier can have one of three sub-architectures:
booth, non_booth, and radix8. In its simplest form, a partial product is the multiplicand
multiplied by one of the bits in the multiplier. Booth encoding is one way to implement the
partial product generator. It looks at multiple bits of the multiplier while generating each
partial product, which leads to a smaller number of partial products. The trade-off is that
the partial product generator itself becomes bigger and slower. Radix8 is another encoding
scheme for the partial product generator. It further reduces the number of partial products
generated. Radix8 encoding can make the multiplier slower and smaller, depending on the
width of the multiplicand and the multiplier.

Booth encoding may make the multiplier faster, smaller, or both depending on the width of the
multiplicand, the multiplier, and the underlying technology library.
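
To make the reduction in partial products concrete, the following Python sketch recodes a signed multiplier into radix-4 Booth digits, one digit per partial product. It shows the textbook radix-4 recoding only. That this matches the internals of the booth sub-architecture is an assumption, and booth_radix4_digits is a hypothetical name.

```python
def booth_radix4_digits(value, width):
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits.

    Returns digits d[0..n-1], each in {-2, -1, 0, 1, 2}, such that
    value == sum(d[i] * 4**i for i in range(n)).
    """
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in a signed word of this width")
    n = (width + 1) // 2                      # number of digits (partial products)
    ext = value & ((1 << (2 * n)) - 1)        # two's complement, sign-extended to 2n bits
    # bits[0] is the implicit zero to the right of the LSB; bits[k + 1] is bit k.
    bits = [0] + [(ext >> i) & 1 for i in range(2 * n)]
    digits = []
    for i in range(n):
        b_lo, b_mid, b_hi = bits[2 * i], bits[2 * i + 1], bits[2 * i + 2]
        digits.append(b_lo + b_mid - 2 * b_hi)
    return digits


# 7 = (-1) * 4**0 + 2 * 4**1
digits = booth_radix4_digits(7, 4)   # [-1, 2]
```

In the simple scheme an 8-bit multiplier produces 8 partial products, one per bit. The recoding above produces 4 digits for the same multiplier, at the cost of a generator that must also form the multiples -2, -1, 1, and 2 of the multiplicand.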


To explicitly control the sub-architecture of a multiplier, use one of the following two methods:
Specify a valid implementation using the user_sub_arch attribute.
For example:
set_attribute user_sub_arch {booth | non_booth | radix8} \
[find /designs* -subdesign name]

Use the sub_arch pragma in the RTL. Examples:


assign Z = a * /* cadence sub_arch booth */ b;
assign Z = a * /* cadence sub_arch non_booth */ b;
assign Z = a * /* cadence sub_arch radix8 */ b;

If any string other than booth, non_booth or radix8 is provided as the sub-architecture,
then that value is ignored and a warning message is issued.

Once you have set the sub_arch of a multiplier, the rest of the RTL Compiler flow will honor
the setting.
Note: In a CSA_tree group, different participating multipliers can have different sub_arch
pragma settings. For example:
assign Z = a * /* cadence sub_arch booth */ b +
c * /* cadence sub_arch non_booth */ d;

Find out the sub-architecture of a particular multiplier component using the sub_arch
attribute. For example:
rc:/> get_attribute sub_arch [find /designs* -subdesign name]

The sub_arch attribute returns a string whose value is either booth, non_booth, or
radix8.


Improving Quality of Results (QoR)


You can potentially improve QoR by using the following attributes.
Perform architecture downsizing after mapping by setting the dp_postmap_downsize
attribute to true. For example:
rc:/> set_attribute dp_postmap_downsize true /

Default: false

Using this attribute accurately downsizes a datapath component without degrading timing.
However, run time may increase.
Note: This attribute is effective only during incremental optimization. The
operation is not performed if the synthesize -to_mapped -no_incremental
command is used.
Perform architecture upsizing after mapping by setting the dp_postmap_upsize
attribute to true. For example:
rc:/> set_attribute dp_postmap_upsize true /

Default: false
Using this attribute can improve slack when a datapath component affects the
critical path. The timing improvement may come at the cost of a longer run time.
Note: This attribute is effective only during incremental optimization. The operation is
not performed if the synthesize -to_mapped -no_incremental command is used.


2
Datapath Reporting

Overview of Datapath Reporting
Command Syntax
Interpreting the Datapath Report
Module and Instance Names
Operator Type and Signed Type
Architecture
Width of Input and Output Operands
Area
File Name, Line Number, and Column Number
Report Datapath for RTL Sharing
Using the report datapath Command at Different Stages in the Design Flow
elaboration Design Stage
synthesize -to_generic Design Stage
synthesize -to_mapped Design Stage


Overview of Datapath Reporting


Use the report datapath command, which is available immediately after elaboration, to
Identify datapath operators
Examine how Carrysave Arithmetic (CSA) transformations are applied
Examine the selected architectures
Examine the datapath area

Command Syntax
Report datapath operators that were inferred from the design using the following
command:
report datapath [-full_path] [-no_header] [-no_area_statistics] [-mux] [-all]
[design] [-max_width string] [-print_instantiated] [-print_inferred]
[> file]

-all reports all the datapath operators present in the design including muxes, as shown
in Example 2-5.
Note: The mux operators are different from the MUX library cells that are picked by the
mapper or are available in the technology library.
design specifies a particular design on which to report datapath operators. By default,
RTL Compiler reports on the current design.
file specifies the name of the file to write the report.
-full_path reports the path of each HDL source file along with the file name. By default,
RTL Compiler reports only the file name.
The -full_path option is useful when the HDL source files being synthesized are
located in one or more directories other than the directory in which RTL Compiler is
being executed.
For example, if RTL Compiler is run in the following directory:
/home/somebody/report_example

and sources the example.v file from the following directory:


/home/somebody/report_example/src_files

then by default, the report datapath command reports the example.v file in the
Filename column of the report.


If you use the -full_path option with the report datapath command, then the path of
the file is displayed as it was provided to the read_hdl command. For example,
assume the following execution and source file directories:
exec dir : /home/somebody/report_example
file dir : /home/somebody/report_example/src_files

If you source the example.v file as follows:


read_hdl src_files/example.v

Then, when you use the -full_path option, the path of the source file relative to the
current directory is reported. For example:
src_files/example.v

-max_width string specifies the maximum width of individual columns. By default,
the maximum width for a column is 20. If a name is more than 20 characters, then it will
wrap to the next line.
The valid column names are Operator, Signedness, Inputs, Outputs, Cell, Area,
Line, Col, and Filename.
The following command specifies a 30-character width for the Filename column and
specifies a 0-character width for the Area column:
report datapath -max_width {{filename 30} {area 0}}
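
The wrapping behavior can be pictured with a small Python sketch. It assumes a plain character-count wrap, which is consistent with how cells such as wallace and unsigned appear in the example reports in this chapter. wrap_cell is a hypothetical helper, not part of RTL Compiler, and it does not model the 0-character width case.

```python
def wrap_cell(text, max_width=20):
    """Split a report cell into rows of at most `max_width` characters."""
    if max_width < 1:
        raise ValueError("this sketch only models widths of 1 or more")
    if not text:
        return [""]
    return [text[i:i + max_width] for i in range(0, len(text), max_width)]


# With {Operator 3}, the CSA tree operator name wraps over three rows.
rows = wrap_cell("wallace", 3)       # ['wal', 'lac', 'e']
# With {Signedness 5}, the sign type wraps over two rows.
sign_rows = wrap_cell("unsigned", 5)  # ['unsig', 'ned']
```

When post-processing a report, the wrapped rows of a cell must be joined again before the cell is interpreted.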

-no_header suppresses the header information shown in Example 2-1.

Example 2-1 report datapath Header Information


============================================================
Generated by: RTL Compiler Version
Generated on: Date
Module: cpu
Technology library: tutorial 1.0
Operating conditions: typical_case (balanced_tree)
Wireload mode: enclosed
============================================================

-no_area_statistics suppresses the table that shows the total area and
percentage information. The area and the percentage of the total area consumed by the
datapath operators in the design are only available after issuing the
synthesize -to_mapped command.
By default, when using the report datapath command on a mapped netlist containing
datapath operators, you will get the area statistics of the design, as shown in
Example 2-2.


Example 2-2 Area Statistics

------------------------------------
datapath modules 4938.00 100.00
mux modules 0.00 0.00
others 0.00 0.00
------------------------------------
total 4938.00 100.00

This information is useful in determining the percentage of the design that contains
datapath operators.
By default, the area report is suppressed for a netlist that contains only generic cells (no
library cells).
-mux reports muxes present in the design, as shown in Example 2-8. Muxes are not
reported by default. Using the -mux option only displays the muxes in the design and
suppresses the other datapath operators. To view both, use the -all option.
-print_inferred reports only the inferred datapath components in the design.
-print_instantiated reports only the instantiated datapath components in the
design.


Interpreting the Datapath Report


The following examples take a design (Example 2-3) through synthesis and then report the
datapath operators with the report datapath command.

Example 2-3 Design with Datapath Operators


module tst (a, b, c, x, y);
input [11:0] a, b, c;
output [23:0] x, y;
CW_multadd #(12, 12, 12, 24) u2 (a, b, c, 1'b0, x);
assign y = b * c + a;
endmodule

Note: As Example 2-4 shows, set the hdl_track_filename_row_col attribute to true
to enable filename, column, and line number tracking in the datapath report, shown in
Example 2-5.


Example 2-4 Tcl Script


set_attribute hdl_track_filename_row_col true
set_attribute library tutorial.lbr
read_hdl ex1.v
elaborate
report datapath -max_width {
{Module 12}
{Instance 6}
{Operator 3}
{Signedness 5}
{Architecture 2}
{Inputs 5}
{Outputs 3}
{CellArea 7}
{Line 4}
{Col 3}
{Filename 8}
}
synthesize -to_generic
report datapath -max_width {
{Module 12}
{Instance 6}
{Operator 3}
{Signedness 5}
{Architecture 2}
{Inputs 5}
{Outputs 3}
{CellArea 7}
{Line 4}
{Col 3}
{Filename 8}
}


Example 2-5 Results of report datapath After Using the elaborate Command

Instantiated datapath components
Ope Signe Input Out CellAre Line Col Filename
rat dness s put a
or s
=================================================================
tst
u2
module:CW__CW_multadd__builtin_wA12_wB12_wC12_wZ24
CW/CW_multadd/builtin
n/a n/a 12x12 24 2349.00 4 36x ex1.v
x12x1
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
mul_1_19
very_fast/booth * signe 13x13 24 1763.25
d
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
add_1_24
very_fast + signe 24x13 24 581.25
d
=================================================================

Inferred components
Ope Signe Input Out CellAre Line Col Filename
rat dness s put a
or s
=================================================================
tst
mul_5_19
module:mult_unsigned
very_fast/non_booth
* unsig 12x12 24 1762.50 5 19x ex1.v
ned
=================================================================
tst
add_5_23
module:add_unsigned
very_fast + unsig 24x12 24 414.00 5 23x ex1.v
ned
=================================================================

Type CellArea Percentage
-------------------------------------
datapath modules 4525.50 100.00
external muxes 0.00 0.00
others 0.00 0.00
-------------------------------------
total 4525.50 100.00

Example 2-6 shows the report datapath results after using the
synthesize -to_generic command.


Example 2-6 Report datapath Results after Using synthesize -to_generic


Instantiated datapath components
Ope Signe Input Out CellAre Line Col Filename
rat dness s put a
or s
===========================================================
tst
u2
module:CW__CW_multadd__builtin_wA12_wB12_wC12_wZ24
CW/CW_multadd/builtin
n/a n/a 12x12 24 1914.00 4 36x ex1.v
x12x1
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
csa_tree_add_1_24
wal 13x13 24x 1321.50
lac x13 24x
e 1
-----------------------------------------------------------
add_1_24 + signe 24x13 24 1 24x CW_multa
d dd.v
mul_1_19 * signe 13x13 24 1 19x CW_multa
d dd.v
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
final_adder_add_1_24
very_fast + signe 24x24 24 588.00
d x1
===========================================================
Inferred components
Ope Signe Input Out CellAre Line Col Filename
rat dness s put a
or s
===========================================================
tst
final_adder_add_5_23
module:add_unsigned_carry
very_fast + unsig 24x24 24 586.50 5 19x ex1.v
ned x1
===========================================================
tst
csa_tree_add_5_23
module:csa_tree_167
wal 12x12 24x 1321.50
lac x12 24x
e 1
-----------------------------------------------------------
add_5_23 + unsig 24x12 24 5 23x ex1.v
ned
mul_5_19
non_booth * unsig 12x12 24 5 19x ex1.v
ned
===========================================================
Type CellArea Percentage
-------------------------------------
datapath modules 3822.00 100.00
external muxes 0.00 0.00
others 0.00 0.00
-------------------------------------
total 3822.00 100.00


The following list briefly describes the column headings in the report datapath
command report.

Module        Name of the datapath partition
Instance      Instance name of the datapath partition
Operator      Operator type of the individual datapath operator
Signedness    Sign type of the individual datapath operator
Architecture  Selected architecture of the individual datapath operator
Inputs        Bit width of the input operands of the individual datapath operator
Outputs       Bit width of the output operands of the individual datapath operator
Area          Area consumed by the datapath partition
Line          Line number in the RTL code where the operator is inferred
Col           Column number in the RTL code where the operator is inferred
Filename      File name of the RTL code that infers these operators

Note: For the Architecture column, the no_value listed in the report refers to a CSA tree
that contains the partial product generators of two multipliers and a carrysave tree of
a*b*c+d. The partial product generators have two sub-architectures, booth and non-booth.
An adder does not have a sub-architecture choice. Therefore, no_value indicates that there
is no sub-architecture.

The report regarding the datapath modules is followed by a datapath area summary (under
the headings: Type, Area, and Percentage) that includes the entire design. This second part
of the report shows the percentage of area consumed by all the datapath operators in the
entire design.

The following sections describe the column headings of the datapath report in greater detail.

Module and Instance Names


Datapath operators are grouped into one hierarchical instance (a datapath partition) in the
following two scenarios:
A discrete datapath operator that does not have any automatic Carrysave Arithmetic
(CSA) transformation opportunities
A group of datapath operators that will be merged as a CSA tree


An artificial level of design hierarchy is created for each partition. The Module and Instance
columns show its module name and instance name, respectively. RTL Compiler generates the
module and instance names. They cannot be predetermined.

Operator Type and Signed Type


Each row of the datapath report is dedicated to a specific datapath operator in a datapath
partition. The Operator column shows the operator type while the Signedness column
shows the sign type of the datapath operator. The sign type can be either signed or
unsigned.

If the datapath operator is multiply (*), add (+), or subtract (-), then its Verilog symbol is
shown.

The ++ symbol indicates the increment operator.

If a datapath partition has multiple operators, then these operators form a CSA tree, which
has only one carry-propagate adder on the timing path through it. Examining how operators
are grouped into partitions reveals how the carrysave transformation is applied.

CSA trees are shown as wallace in the Operator column.

Architecture
The Architecture column shows the selected architecture of an individual datapath operator.
Before using the synthesize -to_mapped command, the information in this column is
only intermediate, or temporary, because RTL Compiler has not yet chosen the optimal
architecture. Only after using the synthesize -to_mapped command will the final
implemented architecture appear.

Width of Input and Output Operands


A datapath operator has one or more input operands and one or more output operands. The
Inputs column shows the bit width of its input operands and the Outputs column shows the
bit width of its output operands.

There is only one input operand if the operator is a unary minus. There are three input
operands if the operator is, for example, an addition with a one-bit carry-in. There are two
output operands if the output is in carrysave format.

Multiple operands are separated by the x character in the table.
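
When a report is post-processed by a script, a width field can be split back into its operand widths. The following Python sketch shows one way to do this. parse_widths is a hypothetical helper, and it assumes the cell has already been joined if it wrapped over several rows.

```python
def parse_widths(field):
    """Split a width field such as '24x24x1' into a list of bit widths."""
    return [int(part) for part in field.split("x")]


# An adder with a one-bit carry-in has three input operands.
adder_inputs = parse_widths("24x24x1")   # [24, 24, 1]
# A carrysave result has two output words.
csa_outputs = parse_widths("24x24")      # [24, 24]
```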


Area
The Area column shows the area occupied by a datapath partition. It is only available after
using the synthesize -to_mapped command.

The second part of the datapath report, under the Type, Area, and Percentage headings
summarizes the percentage of area taken by the datapath partition in the design. These
numbers are not meaningful until you have used the synthesize -to_mapped command.

File Name, Line Number, and Column Number


Datapath operators are inferred from the design and the Filename column shows the file that
contains the inferred datapath operator. The Line and Col columns indicate the exact
location in the RTL code where the datapath operator was inferred.
Enable file, row, and column information tracking by setting the
hdl_track_filename_row_col attribute to true:
rc:/> set_attribute hdl_track_filename_row_col true /

Set this attribute before elaboration.

When this attribute is set to false, no file, row, or column information is available or
printed.


Report Datapath for RTL Sharing


The following examples take a design (Example 2-7) through synthesis and then report the mux
operators with the report datapath -all command (Example 2-8).

Example 2-7 Verilog Design with Sharing


module sharing (in1, in2, in3, in4, sel, out);
input [7:0] in1, in2, in3, in4;
input sel;

output [8:0] out;
reg [8:0] out;
always @(in1 or in2 or in3 or in4 or sel)
begin
if (sel == 1'b0)
out = in1 + in2;
else
out = in3 + in4;
end
endmodule // sharing

Example 2-8 report datapath for RTL Sharing that Shows the Mux Operator
Inferred components
Operator Signedness Inputs Outputs CellArea
========================================================
sharing
add_12_14
module:add_unsigned
very_fast + unsigned 8x8 9 130.50
========================================================
sharing
add_10_14
module:add_unsigned
very_fast + unsigned 8x8 9 130.50
========================================================
sharing
mux_out_9_10
module:mux
very_fast mux n/a 2x9x9 9 27.00
========================================================


Using the report datapath Command at Different Stages in the Design Flow
This section describes the report datapath command results after using
elaborate
synthesize -to_generic
synthesize -to_mapped

elaboration Design Stage


After elaboration, all the discrete operators found in the RTL code are listed in the report.
Every datapath operator in the RTL code is represented by a datapath component. Use the
report datapath command after elaboration to examine the discrete datapath operators
inferred from the RTL code. Every adder has the very_fast speed grade. The booth
or non-booth choice of a multiplier is based on the width. If the width is greater than or equal
to 13, then it is a booth multiplier. If the width is less than or equal to 12, then it is a non-booth
multiplier. At this stage, there are no CSA trees.

synthesize -to_generic Design Stage


CSA transformation (or operator merging) is performed during the
synthesize -to_generic design stage. At this stage, use the report datapath command to
examine:
CSA grouping
Resource sharing

An adder chain such as y = a + b + c + d + e becomes a csa_tree followed by a
final_add.
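
The idea behind this transformation can be sketched in Python for non-negative integers. The sketch reduces the operands with 3:2 carry-save compressors until two words remain, then performs a single carry-propagate addition. It illustrates the general technique only. The function names are hypothetical, and the order in which RTL Compiler arranges its compressors is not modeled.

```python
def csa_3to2(x, y, z):
    """3:2 carry-save compressor.

    Returns (sum_word, carry_word) with sum_word + carry_word == x + y + z.
    Each bit position is computed independently, so no carry ripples.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def csa_tree_sum(operands):
    """Reduce the operands to two words, then add them with one final adder."""
    words = list(operands)
    while len(words) > 2:
        s, c = csa_3to2(words[0], words[1], words[2])
        words = words[3:] + [s, c]      # three words in, two words out
    return sum(words)                    # the only carry-propagate addition


# y = a + b + c + d + e with a single carry-propagate adder at the end
y = csa_tree_sum([3, 5, 7, 11, 13])      # 39
```

Each pass through the loop replaces three words with two, so five operands need three compressor stages before the final adder.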

synthesize -to_mapped Design Stage


Architecture and implementation selection is performed during the synthesize
-to_mapped design stage. At this stage, use the report datapath command to examine:
CSA grouping
Resource sharing
Architecture selection
At this stage, there are no changes to how discrete operators are grouped into CSA trees.
Small operators are automatically ungrouped.


3
RC-Datapath (RC-DP) Functions

Datapath Function Primitives
Primitive Functionality
$abs
$blend
$carrysave
$intround
$lead0
$lead1
$rotatel
$rotater
$round
Parameter Functions
$log2(int)
$max(int, int)
$min(int, int)
ChipWare Components


Datapath Function Primitives


To describe and synthesize advanced datapath designs, RTL Compiler supports an extended
set of advanced datapath functions in Verilog. This extended language interface enables
concise description of advanced, complex datapath designs, and allows the integration of
advanced datapath synthesis and logic synthesis in one tool.

Table 3-1 shows the function primitives. All of these functions have an output precision that
can be computed simply from the input signal attributes.

Table 3-1 Datapath Function Primitives

Syntax                          Function
$abs(X)                         Returns the absolute value of the signed input X.
$carrysave(X)                   If the input can be generated in carrysave format, returns a
                                concatenation of the two words of the carrysave result.
$blend(X0, X1, alpha, alpha1)   Alpha blender. Returns the value obtained by interpolating
                                between the values represented by X0 and X1.
$intround(X, pos)               Returns an approximate product of the first input by rounding
                                off as many least-significant bits of each partial product as
                                the value of the second input, pos.
$lead0(X)                       Returns the count of leading 0s. If the input is an all-0
                                signal, then the output is an all-1 signal.
$lead1(X)                       Returns the count of leading 1s. If the input is an all-1
                                signal, then the output is an all-1 signal.
$round(X, pos)                  Rounds off as many least significant bits (LSBs) of the input
                                X as the value of pos.
$rotatel(X, Y)                  Left-rotates the data input X by as many bits as given by the
                                value of the amount input Y.
$rotater(X, Y)                  Right-rotates the data input X by as many bits as given by the
                                value of the amount input Y.

Of the primitives listed in this table, $blend and $abs are mergeable functions that are used
in sum of products operations. These built-in signal functions are potentially mergeable with
surrounding computations representing sum or sum-of-product. Detailed information about
the mergeability of individual primitives is available in the following section.
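
As an informal illustration of two of the simpler primitives in Table 3-1, the following Python sketch models a left and a right rotation on a fixed-width word. It is not the simulation model shipped with the tool. The function names mirror the primitives for readability, and reducing the rotation amount modulo the word width is an assumption made here.

```python
def rotatel(data, amount, width):
    """Left-rotate a `width`-bit word by `amount` bits.

    Assumption: amounts of `width` or more wrap around (amount modulo width).
    """
    mask = (1 << width) - 1
    data &= mask
    k = amount % width
    return ((data << k) | (data >> (width - k))) & mask


def rotater(data, amount, width):
    """Right-rotate a `width`-bit word by `amount` bits (same assumption)."""
    mask = (1 << width) - 1
    data &= mask
    k = amount % width
    return ((data >> k) | (data << (width - k))) & mask


left = rotatel(0b1001, 1, 4)    # 0b0011
right = rotater(0b1001, 1, 4)   # 0b1100
```

Rotating left and then right by the same amount returns the original word, which is a convenient sanity check for any model of these primitives.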


Primitive Functionality
This section describes the functionality of the primitives listed in Table 3-1. For all datapath
function primitives, each argument can be a signed or unsigned expression, though not a
word array or a word array slice. Therefore, function primitives cannot be used on arrays or
word arrays or on expressions whose result type is an array or word array slice. Whenever an
expression is used as an argument to a function primitive, the width and sign type of the
expression result are self-determined by Verilog rules. An integer constant has the width of
32 and an array reference has the same width and sign type as each array word.

The following is a list of the datapath function primitives:
$abs
$blend
$carrysave
$intround
$lead0
$lead1
$rotatel
$rotater
$round


$abs

Syntax and Port Name


$abs(X)

Output Format

The width of the output of $abs is the same as the width of input X. The output sign type is
always unsigned.

Function

For the integer value represented by the input parameter, $abs outputs the absolute value
of the input integer as an unsigned integer.

The value of the most significant bit determines whether negation takes place. The input
signal is always treated as signed regardless of the sign type of the input signal at the parent
module that calls this $abs() function.
Note: The absolute value is potentially mergeable inside a sum-of-product computation. As
shown in Example 3-1, a computation, such as $abs(X) +a*b, is potentially mergeable and
can be synthesized with just one carry-propagate adder.

Example 3-1 $abs Function Primitive


module ex_abs(X, result);
input signed [7:0] X;
output [7:0] result;

assign result = $abs(X);


endmodule

Example 3-2 shows the simulation model for the $abs primitive function. Change the
parameter values as needed.

Example 3-2 $abs Simulation Model


parameter wi = 8;

function [wi-1:0] abs;


input signed [wi-1:0] in;
begin
abs = (in >= 0) ? in : -in;
end
endfunction
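
For quick experiments outside a simulator, the same behavior can be mirrored in Python. This is a sketch based on the description above, not a Cadence-provided model, and abs_model is a hypothetical name. The most significant bit decides whether negation takes place, and the result is returned as an unsigned word of the same width.

```python
def abs_model(value, width=8):
    """Interpret `value` as a signed `width`-bit word and return its magnitude
    as an unsigned `width`-bit integer."""
    mask = (1 << width) - 1
    raw = value & mask                       # keep only `width` bits
    is_negative = (raw >> (width - 1)) & 1   # the MSB decides whether to negate
    return ((-raw) & mask) if is_negative else raw


# The most negative input keeps its bit pattern but is read as unsigned.
edge = abs_model(-128, 8)    # 128
small = abs_model(0xFF, 8)   # 1, because 0xFF is -1 as a signed 8-bit word
```

Because the output is unsigned and as wide as the input, the most negative value does not overflow: -128 in 8 bits yields 128.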


$blend

Syntax and Port Names


$blend(X0, X1, alpha, alpha1)

Output Format

The width and sign type of the $blend output are determined by the widths of the X0, X1, and
alpha inputs and the sign types of the X0 and X1 inputs, as shown in Table 3-2.

Table 3-2 Scenarios of $blend Outputs

X0 Signtype   X1 Signtype   Output Signtype   Output Width
unsigned      unsigned      unsigned          max(w(X0),w(X1)) + w(alpha)
unsigned      signed        signed            max(w(X0)+1,w(X1)) + w(alpha)
signed        unsigned      signed            max(w(X0),w(X1)+1) + w(alpha)
signed        signed        signed            max(w(X0),w(X1)) + w(alpha)

Function

The blend operation represents the interpolation between two numeric values to produce
another numeric value. Define \(\alpha\) to be the sum of the integer values of alpha and
alpha1. Let \(\alpha_{max}\) be \(2^{wa}\), where wa is the width of alpha. Let \(X_0\) and
\(X_1\) represent the integer values of X0 and X1. Then the blend function is mathematically
given by the expression:
\[ X_0 \cdot (\alpha_{max} - \alpha) + X_1 \cdot \alpha \]
Note: When the above expression is divided by \(\alpha_{max}\), it represents a value that
lies between the extreme values \(X_0\) and \(X_1\). Therefore, the above expression
represents an integer-normalized interpolation between X0 and X1.

The $blend output is a signed signal only if at least one of the X0 and X1 values is signed.

Internal to $blend, alpha1 is a single-bit unsigned input. Therefore, irrespective of the
width and sign type of the input connected to the alpha1 port, only the least significant bit of
the connected input is used internally as the value of alpha1.
Note: Regardless of the sign type of the signal connected to the alpha port, it is always
internally treated as an unsigned signal. Also, since $blend is essentially a sum-of-product
computation, it is mergeable with the surrounding computation, as long as the merged
computation is still a sum-of-product. The following, as in Example 3-3, is potentially a
mergeable computation needing only one carry-propagate addition:
$blend(x0, x1, a1, a10) + a*b + c

Example 3-3 $blend Function Primitive


module ex_blend(X0, X1, alpha, alpha1, result);
input signed [3:0] X0, X1;
input [3:0] alpha;
input alpha1;
output signed [7:0] result;

assign result = $blend(X0, X1, alpha, alpha1);


endmodule

Example 3-4 shows the simulation model for the $blend primitive function. Change the
parameter values as needed.

Example 3-4 $blend Simulation Model


`ifndef MAX
`define MAX(a, b) ((a) > (b) ? (a) : (b))
`endif

parameter wx0 = 2;   // width of X0
parameter sx0 = 0;   // 1 if X0 is signed
parameter wx1 = 2;   // width of X1
parameter sx1 = 0;   // 1 if X1 is signed
parameter walp = 2;  // width of alpha

function [(`MAX(wx0,wx1) + walp + 1)-1:0] blend;
input [wx0-1:0] X0;
input [wx1-1:0] X1;
input [walp-1:0] alpha;
input alpha1;

reg signed [wx0:0] X0_s;
reg signed [wx1:0] X1_s;
reg signed [walp+1:0] al;
reg signed [walp+1:0] alc;

begin
// Extend each operand by one bit: sign bit if signed, zero otherwise.
X0_s = {sx0 ? X0[wx0-1] : 1'b0, X0};
X1_s = {sx1 ? X1[wx1-1] : 1'b0, X1};
al = alpha + alpha1;
alc = (1'b1 << walp) - al;
blend = al*X1_s + alc*X0_s;
end
endfunction
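
The same arithmetic can be mirrored in Python to check expected values by hand. This is a sketch that follows the expression given for the blend function. blend_model and to_signed are hypothetical names, and the result is returned as an unbounded Python integer rather than being truncated to the output width.

```python
def to_signed(raw, width):
    """Interpret the low `width` bits of `raw` as a two's complement value."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw >> (width - 1) else raw


def blend_model(x0, x1, alpha, alpha1, wx0, wx1, walp, sx0=False, sx1=False):
    """Return X0 * (alpha_max - a) + X1 * a, where a = alpha + alpha1."""
    v0 = to_signed(x0, wx0) if sx0 else x0 & ((1 << wx0) - 1)
    v1 = to_signed(x1, wx1) if sx1 else x1 & ((1 << wx1) - 1)
    a = (alpha & ((1 << walp) - 1)) + (alpha1 & 1)   # alpha is unsigned, alpha1 is one bit
    a_max = 1 << walp
    return v0 * (a_max - a) + v1 * a


# 2-bit unsigned operands, 2-bit alpha (alpha_max = 4)
only_x0 = blend_model(3, 1, 0, 0, 2, 2, 2)   # 12 = 3 * 4
only_x1 = blend_model(3, 1, 3, 1, 2, 2, 2)   # 4  = 1 * 4
midpoint = blend_model(3, 1, 2, 0, 2, 2, 2)  # 8  = 2 * 4
```

Dividing each result by alpha_max (4 here) gives 3, 1, and 2, which shows the interpolation moving from X0 to X1 as alpha + alpha1 goes from 0 to alpha_max.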


$carrysave

Syntax and Port Names


$carrysave (X)

Output Format

The width of the carrysave output is twice the width of the input. The X input can contain a
constant, a single signal name, or an arithmetic expression. The width of the input is
determined by Verilog's rules of self-determination of expression width. The sign type of the
carrysave output is unsigned.

Function

The carrysave function assumes that the result of the input expression can be generated in
carrysave format. If this is the case, then the output signal will be a concatenation of the two
words of the carrysave result. When the input expression cannot produce the result in
carrysave format, the output remains twice as wide as the input expression width. The left half
is all zeroes and the right half is the binary representation of the input expression's value.
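
The output layout can be pictured with a short Python sketch. It is illustrative only: the primitive has no bit-accurate simulation model, so the actual pair of words produced by the tool may differ, and which half holds which word is an assumption here. All three function names are hypothetical.

```python
def pack_carrysave(low_word, high_word, width):
    """Concatenate two `width`-bit words as {high_word, low_word}."""
    mask = (1 << width) - 1
    return ((high_word & mask) << width) | (low_word & mask)


def fallback_carrysave(value, width):
    """Fallback layout from the text: left half all zeros, right half the value."""
    return pack_carrysave(value, 0, width)


def resolve_carrysave(cs, width):
    """Add the two halves with one carry-propagate addition, modulo 2**width."""
    mask = (1 << width) - 1
    return ((cs & mask) + ((cs >> width) & mask)) & mask


cs = fallback_carrysave(300, 10)
z = resolve_carrysave(cs, 10)     # 300
```

Whatever pair of words the output contains, adding the two halves recovers the value of the input expression, which is why the result can feed a single final adder.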

Example 3-5 $carrysave Function Primitive


module testAdd (a, b, c, z);
// Output z is the sum of the inputs a, b, and c.
input [7:0] a, b, c;
output [9:0] z;
wire [9:0] tmp = a + b + c;
wire [19:0] cs = $carrysave(tmp);
assign z = cs[9:0] + cs[19:10];
endmodule

Note: There is no bit-accurate simulation model for the $carrysave function primitive.

The use of the tmp variable in the example is needed to determine the bit width of the sum
a + b + c.

Without such an intermediary variable whose bit width is explicitly specified, the bit width of
the sum would follow the rules of self-determined bit width in Section 4.4, "Expression bit
lengths," in the Verilog-2001 LRM (IEEE Std 1364-2001).


For example,
input [7:0] a, b, c;
wire [19:0] cs = $carrysave(a + b + c);

is equivalent to
input [7:0] a, b, c;
wire [7:0] tmp = a + b + c; // bit width of sum is same as bit width of widest term
wire [19:0] cs = $carrysave(tmp);

and is different from:


input [7:0] a, b, c;
wire [9:0] tmp = a + b + c; // bit width of sum is explicitly set to 10
wire [19:0] cs = $carrysave(tmp);
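
The difference between the two declarations can be checked numerically. The following Python sketch is a minimal model, assuming only that an expression evaluated in a given bit width drops its upper bits; the helper name add3_truncated is hypothetical.

```python
def add3_truncated(a, b, c, width):
    """Model a + b + c evaluated in a `width`-bit context (upper bits are dropped)."""
    return (a + b + c) & ((1 << width) - 1)


a = b = c = 0xFF                           # three 8-bit operands at their maximum value
sum_8bit = add3_truncated(a, b, c, 8)      # tmp declared as [7:0]: result is 253
sum_10bit = add3_truncated(a, b, c, 10)    # tmp declared as [9:0]: result is 765
```

With an 8-bit intermediate width the two carry bits are lost before $carrysave sees the value, which is why the explicitly sized 10-bit tmp behaves differently.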


$intround

Syntax and Port Names


$intround (X, pos)

Output Format

The output width is the width of the X input. The output sign-type is the sign-type of the X
input.

Function

The X input is assumed to come from a sum-of-product tree. For each $intround()
operation, RTL Compiler traverses backward and identifies the product terms. A product term
is either a multiplication operator or a single operand.

The $intround() operation computes an approximate value of the X input by rounding
off the pos least-significant bits (LSBs) of each partial product. As a result, the pos
least-significant bits of the returned bit vector are 0. The approximation reduces area
at the cost of an inexact result.

The pos input must evaluate to a non-negative integer constant.

Example 3-6 $intround Function Primitive


module ex_intround(y, a, b, c);
parameter pos = 8;
input signed [7:0] a, b;
input signed [15:0] c;
wire signed [15:0] p, q;
output [15:0] y;
assign p = a * b;
assign q = p + c;
assign y = $intround (q, pos);
endmodule

Note: No bit-accurate simulation model is available for the $intround function primitive.


$lead0

Syntax and Port Names


$lead0(X)

Output Format

The output is always an unsigned value. Its width is 1 + log2(w(in)), where w(in) is the width of the input.

Function

Computes the number of leading zeroes in the input argument and outputs this count as an
unsigned binary number. If every input bit is 0, then every output bit is 1.

Example 3-7 $lead0 Function Primitive


module ex_lead0(x, result);
input signed [7:0] x;
output [3:0] result;
assign result = $lead0(x);
endmodule

Example 3-8 shows the simulation model for the $lead0 primitive function. Change the
parameter values as needed.

Example 3-8 $lead0 Simulation Model


parameter wi=8;
parameter wr=4;
function [wr-1:0] lead0;
input [wi-1:0] in;
reg [wi-1:0] sum;
integer i;
reg [wi-1:0] ibits;
begin
sum = 0;
for(i = wi-1; i >= 0; i = i - 1)
begin
ibits = ((({(wi-1+1){1'b1}} >> i) << i) & in) >> i;
sum = sum + ((~|ibits) ? 1 : 0);
end
lead0 = (~|in) ? {(wr-1+1){1'b1}} : sum;
end
endfunction


$lead1

Syntax and Port Names


$lead1(X)

Output format

The output is always an unsigned value. Its width is 1 + log2(w(in)), where w(in) is the width of the input.

Function

Computes the number of leading ones in the input argument and outputs this count as an
unsigned binary number. If every input bit is 1, then every output bit is 1.

Example 3-9 $lead1 Function Primitive


module ex_lead1(x, result);
input signed [7:0] x;
output [3:0] result;

assign result = $lead1(x);


endmodule

Example 3-10 shows the simulation model for the $lead1 primitive function. Change the
parameter values as needed.

Example 3-10 $lead1 Simulation Model


parameter wi=8;
parameter wr=4;

function [wr-1:0] lead1;


input [wi-1:0] in;
reg [wi-1:0] sum;
integer i;
reg [wi-1:0] ibits;
begin
sum = 0;
for(i = wi-1; i >= 0; i = i - 1)
begin
ibits = ((({(wi-1+1){1'b1}} >> i) << i) & in) |
({(wi-1+1){1'b1}} >> (wi-1+1-i));
sum = sum + ((&ibits) ? 1 : 0);
end
lead1 = (&in) ? {(wr-1+1){1'b1}} : sum;
end
endfunction
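
The results of the $lead0 and $lead1 simulation models (Example 3-8 and Example 3-10) can be cross-checked with a short Python model. This is a sketch under stated assumptions: it scans from the MSB instead of transcribing the Verilog loops, the defaults wi=8 and wr=4 mirror the parameters above, and lead1 is expressed through lead0 of the complemented input.

```python
def lead0(value, wi=8, wr=4):
    """Count the leading zeros of a wi-bit value.

    Returns the all-ones wr-bit pattern when every input bit is 0,
    which matches the special case of the simulation model.
    """
    value &= (1 << wi) - 1
    if value == 0:
        return (1 << wr) - 1
    count = 0
    for i in range(wi - 1, -1, -1):   # scan from the MSB down to the LSB
        if (value >> i) & 1:
            break
        count += 1
    return count


def lead1(value, wi=8, wr=4):
    """Count leading ones as the leading zeros of the bitwise complement."""
    return lead0(~value & ((1 << wi) - 1), wi, wr)
```

For example, lead0(0b00010000) returns 3 and lead1(0b11100000) returns 3, while lead0(0) and lead1(0xFF) both return 15, the all-ones 4-bit pattern.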


$rotatel

Syntax and Port Names


$rotatel(in, rotate)

Output Format

The output width and sign type are the same as the input width and sign type.

Functionality

Rotates the in input left by the number of bits given by the value of the rotate input. The
result has the same width as in.

If the width of rotate is larger than log2(inwidth) + 1, then only the log2(inwidth) + 1
least-significant bits of rotate are used.

Example 3-11 $rotatel Function Primitive


module ex_rotatel(in, rotate, result);
input [7:0] in;
input [4:0] rotate;
output [7:0] result;
assign result = $rotatel(in, rotate);
endmodule

Example 3-12 shows the simulation model for the $rotatel primitive function. Change the
parameter values as needed.

Example 3-12 $rotatel Simulation Model


parameter wi = 8;
parameter wc = 8;

function [wi-1:0] rotatel;


input [wi-1:0] in;
input [wc-1:0] count;
integer i;
reg [wi-1:0] tmp;
begin
if (wi == 1)
rotatel = in;
else
begin
tmp = in;
for (i = 0; i < count; i = i + 1)
tmp = { tmp[wi-2:0], tmp[wi-1] };
rotatel = tmp;
end
end
endfunction


$rotater

Syntax and Port Names


$rotater(in, rotate)

Output Format

The output width and sign type are the same as the input width and sign type.

Functionality

Rotates the in input right by the number of bits given by the value of the rotate input. The
result has the same width as in.

If the width of rotate is larger than log2(inwidth) + 1, then only the log2(inwidth) + 1
least-significant bits of rotate are used.

Example 3-13 $rotater Function Primitive


module ex_rotater(in, rotate, result);
input [7:0] in;
input [4:0] rotate;
output [7:0] result;
assign result = $rotater(in, rotate);
endmodule

Example 3-14 shows the simulation model for the $rotater primitive function. Change the
parameter values as needed.

Example 3-14 $rotater Simulation Model


parameter wi = 8;
parameter wc = 8;

function [wi-1:0] rotater;


input [wi-1:0] in;
input [wc-1:0] count;
integer i;
reg [wi-1:0] tmp;
begin
if (wi == 1)
rotater = in;
else
begin
tmp = in;
for (i = 0; i < count; i = i + 1)
tmp = { tmp[0], tmp[wi-1:1] };
rotater = tmp;
end
end
endfunction
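
The rotate models in Example 3-12 and Example 3-14 rotate one bit per loop iteration. The Python sketch below expresses the same result in closed form, assuming a wi-bit input and reducing the count modulo wi (rotating by wi bits returns the original value). The Python function names simply reuse the primitive names for readability.

```python
def rotatel(value, count, wi=8):
    """Rotate a wi-bit value left by `count` bit positions."""
    mask = (1 << wi) - 1
    value &= mask
    count %= wi                        # a full turn leaves the value unchanged
    return ((value << count) | (value >> (wi - count))) & mask


def rotater(value, count, wi=8):
    """Rotate a wi-bit value right by `count` bit positions."""
    mask = (1 << wi) - 1
    value &= mask
    count %= wi
    return ((value >> count) | (value << (wi - count))) & mask
```

For example, rotatel(0b10000001, 1) gives 0b00000011 and rotater(0b10000001, 1) gives 0b11000000. Rotating left and then right by the same amount returns the original value.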


$round

Syntax and Port Names


$round(in, pos)

Output Format

The output has the same width and sign type as the in input.

Functionality

The value on the in input is rounded at bit position pos: the output starts with the same
value as in, the pos least-significant bits are set to 0, and the bit at position pos-1 of in
is added at bit position pos.

Example 3-15 $round Function Primitive


module ex_round(q0, in);
parameter pos = 3;
output signed [9:0] q0;
input signed [7:0] in;
assign q0 = $round(in, pos);
endmodule

Example 3-16 shows the simulation model for the $round primitive function. Change the
parameter values as needed.

Example 3-16 $round Simulation Model


parameter wi = 8;
parameter pos = 2;

function [wi-1:0] round;


input [wi-1:0] in;
input unused;
reg [wi-1:0] tmp;
begin
if (pos >= wi)
round = 1'b0;
else if (pos <= 0)
round = in;
else
begin
tmp = (in >> pos) << pos;
round = tmp + (in[pos-1] << pos);
end
end
endfunction
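
To try the rounding rule of Example 3-16 on concrete values, the following Python sketch mirrors the simulation model step by step. It is an illustrative model only: the name round_model is hypothetical (chosen to avoid Python's built-in round), and the result is wrapped to wi bits as an assignment to a wi-bit function result would be.

```python
def round_model(value, pos, wi=8):
    """Clear the `pos` LSBs of a wi-bit value and add bit pos-1 at position pos."""
    mask = (1 << wi) - 1
    value &= mask
    if pos >= wi:
        return 0                                   # everything is rounded away
    if pos <= 0:
        return value                               # nothing to round
    truncated = (value >> pos) << pos              # set the pos LSBs to 0
    round_bit = (value >> (pos - 1)) & 1           # bit at position pos-1
    return (truncated + (round_bit << pos)) & mask
```

With pos = 2, the input 0b00000110 rounds up to 0b00001000 because bit 1 is set, while 0b00000101 rounds down to 0b00000100.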


Parameter Functions
Table 3-3 describes global system functions that assist in writing parameterized code. The
inputs to these functions must be constant expressions. Because these functions are
themselves constant expressions, they do not imply any hardware.
Table 3-3 Provided Global System Functions

Global System Function    Functionality
$log2(int)                Returns log2(int)
$max(int, int)            Returns the maximum of two integers
$min(int, int)            Returns the minimum of two integers

Note: The input arguments for $max, $min, and $log2 must evaluate to constants.

ChipWare Components
Some datapath functions can be conveniently described with ChipWare components. RTL
Compiler supports its proprietary ChipWare components, as well as DesignWare, GTECH,
and AmbitWare libraries.

See ChipWare in Encounter RTL Compiler for more information on ChipWare and
third-party libraries.


4
Datapath Coding Styles

Overview
Starting from the RTL
Importing the Gate-Level Netlist
Keeping Relevant Operators in the Same Hierarchy
Inferring Datapath Modules
Applying Carry-save Arithmetic Automatically
Inferring a Constant Multiplier
Using Signed Arithmetic
Using Signed Part-Select / Concatenation
Implementing Programmable Unsigned and Signed Multiplier
Mixing Unsigned and Signed Expressions
Avoiding Manual Signal Extension Techniques
VHDL Signed/Unsigned Type Conversion
Inferring a Square Automatically
Implying Upper-Bit Truncation
Following Self-Determined Bit Width Rules
Avoiding Instantiated Components
Arithmetically Complementing an Operand


Overview
Immediately after reading in the design, RTL Compiler separates datapath computations from
control-related logic and creates one datapath partition for each set of merged datapath
operators. When examining opportunities to merge operators, the highest priority is to
preserve the original functionality. More merging usually leads to better quality of silicon
(QoS). Based on a set of intelligent rules, RTL Compiler performs as much operator merging
as possible.
However, RTL Compiler does not understand the overall design specification behind the RTL
code. In some scenarios the RTL coding style imposes more restrictions on operator merging
than necessary, and therefore affects the QoS. The following sections discuss typical coding
scenarios that interfere with operator merging and give guidelines that support maximum
merging of operators.

Starting from the RTL


Always start from the RTL, because RTL Compiler prefers RTL code that infers arithmetic
operators like adders, subtractors, and multipliers. This way, RTL Compiler acquires a high-
level view of the design and performs operator-level optimization, which provides better QoS
benefits than gate-level optimization.

Importing the Gate-Level Netlist


Tip
Guideline: Do not import a gate-level netlist for arithmetic operators. Instead, infer
them.
Sometimes you may use a special datapath module generator to generate a gate-level netlist
for arithmetic operators. The netlist is then fed into the logic synthesis tool, along with the RTL
code of the non-datapath portion of the design. This may lead to an overall QoS that is worse
than what RTL Compiler can accomplish. Importing a gate-level netlist of an arithmetic
operator limits RTL Compiler to doing only gate-level logic optimization on the given netlist.
None of the built-in datapath techniques can be exercised.
RTL Compiler does not:
- Reverse-engineer the arithmetic functionality of a given netlist
- Perform operator merging on the netlist
- Change the architecture of the operators
- Refine the implementation of the operators to pursue a dramatic QoS improvement


Keeping Relevant Operators in the Same Hierarchy

Tip
Guideline: If two arithmetic operators are directly interacting with each other, keep
them at the same level of hierarchy, that is, in the same module.

A great deal of RTL code keeps an adder, subtractor, or multiplier in a module by itself for
various reasons. CSA transformation respects user-defined design hierarchies and does not
merge across hierarchical boundaries. Therefore, a standalone operator inside a module
cannot be merged with operators in other modules, which has a negative impact on QoS.

Note: Dissolving a hierarchy helps with CSA transformation decisions.

The design in Example 4-1 separates the interacting operators and operands into different
modules, thereby preventing any CSA opportunities.

The designs in Example 4-2 and Example 4-3 keep the interacting
operators and operands in the same module.

Example 4-1 Keeping Operators in Separate Levels of the Design Hierarchy

Verilog
module mult (y, a, b);
input [7:0] a, b;
output [15:0] y;
assign y = a * b;
endmodule
module add (y, a, b);
input [15:0] a, b;
output [15:0] y;
assign y = a + b;
endmodule
module test (y, a, b, c);
input [7:0] a, b;
input [15:0] c;
output [15:0] y;
wire [15:0] p;
mult U1 (p, a, b);
add U2 (y, p, c);
endmodule


VHDL
library ieee;
use ieee.numeric_std.all;

entity mult is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0)
);
end mult;
architecture rtl of mult is
begin
y <= a * b;
end rtl;
library ieee;
use ieee.numeric_std.all;
entity add is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (15 downto 0)
);
end add;
architecture rtl of add is
begin
y <= a + b;
end rtl;
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0);
c : in unsigned (15 downto 0)
);
end test;
architecture rtl of test is
component mult
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0)
);
end component;
component add
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (15 downto 0)
);
end component;
signal p : unsigned (15 downto 0);
begin
U1: mult port map ( p, a, b);
U2: add port map ( y, p, c);
end rtl;


Example 4-2 Inferring Operators at the Same Level of Design Hierarchy

Verilog
module test (y, a, b, c);
input [7:0] a, b;
input [15:0] c;
output [15:0] y;
wire [15:0] p;
assign p = a * b;
assign y = p + c;
endmodule

VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0);
c : in unsigned (15 downto 0)
);
end test;
architecture rtl of test is
signal p : unsigned (15 downto 0);
begin
p <= a * b;
y <= p + c;
end rtl;

Example 4-3 Inferring Operators at the Same Level of Design Hierarchy

Verilog
module test (y, a, b, c);
input [7:0] a, b;
input [15:0] c;
output [15:0] y;
assign y = a * b + c;
endmodule

VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0);
c : in unsigned (15 downto 0)
);
end test;
architecture rtl of test is
begin
y <= a * b + c;
end rtl;


Inferring Datapath Modules

Tip
Guideline: Infer adders, subtractors, or multipliers, rather than creating them
yourself.

You can often handcraft arithmetic operators. For example, instead of inferring a multiplier,
you may choose to devise a certain architecture for the multiplier and describe its
implementation in detail, including Booth encoding, partial product generation, carrysave
reduction, and so on. Sometimes, a part of the architecture may be described at an
abstraction level as low as logic equations.

Note: Although at quite a low level, this is still called RTL coding since it does not directly
instantiate gates in the target library.

Handcrafting a multiplier prevents RTL Compiler from recognizing it as a multiplier. Therefore,
RTL Compiler cannot use a better architecture to implement the multiplier, cannot refine the
given architecture, and cannot merge this multiplier with other arithmetic
operators. Handcrafting has a negative impact on QoS.

If you consider the performance of an individual adder, subtractor, or multiplier, then the
architectures built into the datapath engine are usually as good as what can be accomplished
by handcrafting. However, when you consider the overall design, carrysave transformation
becomes the differentiator between inferring and handcrafting.


Applying Carry-save Arithmetic Automatically

Tip
Guideline: Do not handcraft the carrysave technique. Let RTL Compiler apply the
carrysave technique through CSA transformations automatically.

Carrysave arithmetic is usually implemented using handcrafted arithmetic operators because
of the need to save the carry-propagate addition until later in the data flow and the lack of a
carrysave data type in standard HDL syntax.

Although this practice of handcrafting arithmetic operators does work, it limits RTL Compiler
to the architecture and the implementation described in the RTL code. It also makes the RTL
code difficult to read and maintain.

When using datapath synthesis, it is no longer necessary to handcraft arithmetic operators.
Each merged operator has only one final carry-propagate adder on the critical path. In addition, for
each merged operator, RTL Compiler selects the best architecture based on overall QoS
constraints. It also refines the implementation dynamically.


Inferring a Constant Multiplier

Tip
Guideline: Do not perform manual shift-and-add for a constant multiplication. Just
infer a multiplier.

The traditional method to synthesize constant multiplication is to start from a full multiplier and
later let logic optimization remove all of the redundant logic.

A better way is to decompose the multiplier into a sequence of shift-and-add operations. You
can do this manually in the RTL code, which is another form of handcrafted multiplier.

Just like other handcrafting, this manual shift-and-add approach hurts operator merging.
Datapath synthesis does shift-and-add whenever it helps QoS. Handcrafted shift-and-add is
no longer needed.

Example 4-4 shows a typical shift-and-add in Verilog-1995.

Example 4-4 Manual Shift-and-Add Operations in Verilog-1995


wire [15:0] a;
wire [23:0] a6;
wire [23:0] a3;
wire [23:0] a2;
wire [23:0] y;
assign a6 = {2'b0, a[15:0], 6'b0}; // (a * 2**6)
assign a3 = {5'b0, a[15:0], 3'b0}; // (a * 2**3)
assign a2 = {6'b0, a[15:0], 2'b0}; // (a * 2**2)
assign y = a6 + a3 + a2;

To give RTL Compiler more freedom to optimize, use the RTL coding style for unsigned
constant multiplication shown in Example 4-5.

Example 4-5 Recommended Coding for Unsigned Constant Multiplication

Verilog
wire [15:0] a;
wire [23:0] y;
assign y = a * 76; // i.e. a * 8'b01001100

VHDL
signal a : unsigned (15 downto 0);
signal y : unsigned (23 downto 0);

y <= a * to_unsigned(76,8); -- i.e. a * "01001100"
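
That the manual shift-and-add decomposition of Example 4-4 and the inferred multiplication of Example 4-5 compute the same value can be confirmed with a small Python model. This is a sketch for checking the arithmetic only; the function names are hypothetical and the 24-bit mask models the width of y.

```python
MASK24 = (1 << 24) - 1   # y is declared as a 24-bit vector


def shift_and_add_76(a):
    """Manual decomposition: a*2**6 + a*2**3 + a*2**2, truncated to 24 bits."""
    return ((a << 6) + (a << 3) + (a << 2)) & MASK24


def multiply_76(a):
    """Direct constant multiplication, truncated to 24 bits."""
    return (a * 76) & MASK24
```

Both functions agree for every 16-bit value of a, because 76 = 64 + 8 + 4 and the largest product, 65535 * 76, still fits in 24 bits. The inferred form leaves the choice of decomposition to the tool.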


Example 4-6 describes a manual shift-and-add sequence in Verilog-1995.

Example 4-6 Manual Shift-and-Add Sequence


wire [15:0] a; // to be used as signed
wire [23:0] a6; // to be used as signed
wire [23:0] a3; // to be used as signed
wire [23:0] a2; // to be used as signed
wire [23:0] y; // to be used as signed
assign a6 = {{2{a[15]}}, a[15:0], 6'b0}; // signed (a * 2**6)
assign a3 = {{5{a[15]}}, a[15:0], 3'b0}; // signed (a * 2**3)
assign a2 = {{6{a[15]}}, a[15:0], 2'b0}; // signed (a * 2**2)
assign y = a6 + a3 + a2;

To give RTL Compiler more freedom to optimize, use the coding style for signed constant
multiplication shown in Example 4-7.

Example 4-7 Recommended Coding Style for Signed Constant Multiplication

Verilog
wire signed [15:0] a;
wire signed [23:0] y;
assign y = a * 76; // i.e. a * 8'sb01001100

VHDL
signal a : signed (15 downto 0);
signal y : signed (23 downto 0);

y <= a * to_signed(76,8); -- i.e. a * "01001100"


Using Signed Arithmetic

Tip
Guideline: Use signed operators for any design that needs signed arithmetic.

Using unsigned operators to perform signed arithmetic requires lots of explicit, manual sign
extension, resulting in lengthy and error-prone RTL code.

Verilog-1995 does not support a signed data type. RTL Compiler supports Verilog-2001,
which has a signed data type and associated signed operators.

The coding styles shown in Example 4-8 and Example 4-9 both work for signed addition.
When in doubt, use the coding style shown in Example 4-9.

Example 4-8 Signed Addition in Verilog-1995


wire [15:0] a, b; // to be used as signed
wire [16:0] y; // to be used as signed
assign y = {a[15], a} + {b[15], b}; // infers a 17x17 unsigned adder

Example 4-9 Signed Addition

Verilog-2001
wire signed [15:0] a, b;
wire signed [16:0] y;
assign y = a + b; // infers a 16x16 signed adder

VHDL
signal a, b : signed (15 downto 0);
signal y : signed (16 downto 0);

y <= resize(a,17) + resize(b,17);

Example 4-10 and Example 4-11 both work for signed multiplication. When in doubt, use
Example 4-11.

Example 4-10 Signed Multiplication in Verilog-1995


wire [16:0] a; // to be used as signed
wire [14:0] b; // to be used as signed
wire [31:0] y; // to be used as signed
wire [31:0] ax, bx;
assign ax = {{15{a[16]}}, a};
assign bx = {{17{b[14]}}, b};
assign y = ax * bx; // a 32x32 unsigned multiplier


Example 4-11 Signed Multiplication

Verilog-2001
wire signed [16:0] a;
wire signed [14:0] b;
wire signed [31:0] y;
assign y = a * b; // infers a 17x15 signed multiplier

VHDL
signal a : signed (16 downto 0);
signal b : signed (14 downto 0);
signal y : signed (31 downto 0);

y <= a * b;
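
The two styles in Example 4-10 and Example 4-11 produce the same 32-bit result, which the following Python sketch checks on bit patterns. It is a minimal model under stated assumptions: operands are passed as raw 17-bit and 15-bit patterns, all helper names are hypothetical, and the results are compared modulo 2**32.

```python
MASK32 = (1 << 32) - 1


def to_signed(value, width):
    """Interpret a `width`-bit pattern as a two's complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mult_manual_extension(a17, b15):
    """Example 4-10 style: sign-extend both operands to 32 bits, multiply as unsigned."""
    ax = to_signed(a17, 17) & MASK32     # {{15{a[16]}}, a}
    bx = to_signed(b15, 15) & MASK32     # {{17{b[14]}}, b}
    return (ax * bx) & MASK32            # 32x32 unsigned product, truncated to 32 bits


def mult_signed(a17, b15):
    """Example 4-11 style: 17x15 signed multiplication, result taken as 32 bits."""
    return (to_signed(a17, 17) * to_signed(b15, 15)) & MASK32
```

The results match, but the manual style describes a 32x32 multiplier, while the signed declaration describes only the 17x15 multiplier that is actually needed.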


Using Signed Part-Select / Concatenation

Tip
Guideline: Do not use part-selects that specify an entire vector.

Part-select and concatenation result in an unsigned value, even when the operands are
signed, as shown in Example 4-12.

Example 4-13 shows the correct coding style.

Example 4-12 Part-Selects of Signed Vectors Become Unsigned


wire signed [15:0] a, b;
wire signed [30:0] out1, out2;
// a is signed; but a[15:0] is unsigned ==> zero-extension
assign out1 = a[15:0];
// a is signed; but a[14:0] is unsigned ==> unsigned multiplication
assign out2 = a[14:0] * b;

Example 4-13 Correct Coding Style Using Part-Select of Signed Vectors


wire signed [15:0] a, b;
wire [30:0] out1, out2;
// $signed(a[15:0]) is signed ==> sign-extension
assign out1 = $signed(a[15:0]);

// $signed(a[14:0]) is signed ==> signed multiplication


assign out2 = $signed(a[14:0]) * b;
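
The effect on out1 in Example 4-12 and Example 4-13 can be illustrated with a Python model of the two extension rules. This sketch assumes a holds the 16-bit pattern of -1; the helper names are hypothetical.

```python
def zero_extend(value, from_width, to_width):
    """Extension applied to an unsigned operand: upper bits are filled with 0."""
    return value & ((1 << from_width) - 1) & ((1 << to_width) - 1)


def sign_extend(value, from_width, to_width):
    """Extension applied to a signed operand: upper bits copy the sign bit."""
    value &= (1 << from_width) - 1
    if value >> (from_width - 1):
        value -= 1 << from_width
    return value & ((1 << to_width) - 1)


a = 0xFFFF                                   # 16-bit pattern of -1
out1_part_select = zero_extend(a, 16, 31)    # a[15:0] is unsigned: 0x0000FFFF
out1_cast_signed = sign_extend(a, 16, 31)    # $signed(a[15:0]): 0x7FFFFFFF (31-bit -1)
```

The same bit pattern therefore yields 65535 through the plain part-select and -1 (as a 31-bit value) through $signed.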


Implementing Programmable Unsigned and Signed Multiplier

Tip
Guideline: Selectively zero-extend or sign-extend the operands and use a single datapath to
implement a switchable unsigned and signed datapath, as shown in Example 4-15.

When both a signed and an unsigned version of an operator are needed, and there is a way to use
one bigger operator to implement both, using a single operator usually produces a better
QoS.

Example 4-14 is an easy way to implement a programmable signed/unsigned multiplier, but
it infers two multipliers. Example 4-15 is a better way to accomplish the same
functionality with a much better QoS.

Example 4-14 Using Two Operators to Implement Switchable Unsigned and Signed
Multiplier

Verilog
module ex_4_14 (a, b, tc, y); //bad QoS
parameter width = 16;
input [width-1:0] a, b;
input tc;
output [2*width-1:0] y;
wire signed [width-1:0] a_sgn, b_sgn;
wire signed [2*width-1:0] y_sgn;
wire [2*width-1:0] y_uns;
assign a_sgn = a;
assign b_sgn = b;

assign y_sgn = a_sgn * b_sgn; // a 16x16 signed multiplier


assign y_uns = a * b; // a 16x16 unsigned multiplier
assign y = tc ? y_sgn : y_uns;
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex_4_14 is
generic (width : natural := 16);
port ( a : in unsigned (width-1 downto 0);
b : in unsigned (width-1 downto 0);
tc : in std_logic;
y : out unsigned (width*2-1 downto 0) );
end ex_4_14;
architecture rtl of ex_4_14 is
signal a_sgn, b_sgn : signed (width-1 downto 0);
signal y_sgn : signed (width*2-1 downto 0);
signal y_uns : unsigned (width*2-1 downto 0);


begin
a_sgn <= signed(a);
b_sgn <= signed(b);
y_sgn <= a_sgn * b_sgn; -- a 16x16 signed multiplier
y_uns <= a * b; -- a 16x16 unsigned multiplier
y <= unsigned(y_sgn) when (tc = '1') else y_uns;
end rtl;

Example 4-15 Using a Single Operator to Implement Switchable Unsigned and Signed
Multiplier

Verilog
module ex_4_15 (a, b, tc, y); //good QoS
parameter width = 16;
input [width-1:0] a, b;
input tc;
output [2*width-1:0] y;
wire a_msb, b_msb;
wire signed [width:0] a_sgn, b_sgn;

assign a_msb = tc & a[width-1];


assign b_msb = tc & b[width-1];
assign a_sgn = {a_msb, a};
assign b_sgn = {b_msb, b};
assign y = a_sgn * b_sgn; //a 17x17 signed multiplier
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex_4_15 is
generic (width : natural := 16);
port ( a : in unsigned (width-1 downto 0);
b : in unsigned (width-1 downto 0);
tc : in std_logic;
y : out unsigned (width*2-1 downto 0) );
end ex_4_15;
architecture rtl of ex_4_15 is
signal a_msb, b_msb : std_logic;
signal a_sgn, b_sgn : signed (width downto 0);
signal y_extra : signed (width*2+1 downto 0);
begin
a_msb <= tc and a(width-1);
b_msb <= tc and b(width-1);
a_sgn (width) <= a_msb;
b_sgn (width) <= b_msb;
a_sgn (width-1 downto 0) <= signed(a);
b_sgn (width-1 downto 0) <= signed(b);
y_extra <= a_sgn * b_sgn; -- a 17x17 signed multiplier
y <= unsigned (y_extra (width*2-1 downto 0));
end rtl;
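
Why a single (width+1)-bit signed multiplier covers both modes in Example 4-15 can be checked with a Python model. This is a sketch of the arithmetic only, assuming tc is 0 or 1 and the operands are raw width-bit patterns; the name mult_switchable is hypothetical.

```python
def mult_switchable(a, b, tc, width=16):
    """Extend each operand by one bit (tc AND its MSB), then multiply as signed."""
    def extend(x):
        msb = tc & (x >> (width - 1)) & 1        # extra bit: 0 when unsigned, sign when tc=1
        ext = (msb << width) | x                 # (width+1)-bit pattern {msb, x}
        return ext - (1 << (width + 1)) if msb else ext   # read the pattern as signed
    return (extend(a) * extend(b)) & ((1 << (2 * width)) - 1)
```

With tc = 0 the extra bit is always 0, so both operands are non-negative and the result is the unsigned product: mult_switchable(0xFFFF, 0xFFFF, 0) gives 0xFFFE0001. With tc = 1 the extra bit copies the sign, so the same patterns are read as -1 and the result is 1.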


Table 4-1 Programmable Unsigned and Signed Datapath Results

Implementation                                                        Slack (ps)   Area (lib units)
Using Two Datapaths to Implement Switchable Unsigned and Signed       0            11426.92
Using a Single Datapath to Implement Switchable Unsigned and Signed   0            7008.58


Mixing Unsigned and Signed Expressions

Tip
Guideline: Do not unintentionally mix unsigned and signed expression types in one
expression.

Avoid mixing unsigned and signed signals in an arithmetic expression. VHDL does not allow
mixing signed and unsigned operands of an operator.

Verilog allows mixing signed and unsigned operands of an operator.
When this happens, as defined by the IEEE Standard, the operation is unsigned if any one of
its operands is unsigned.

Example 4-16 shows a coding style where you may not be aware that RTL Compiler is
synthesizing an unsigned multiplier. If the intention is to have a multiplier taking an unsigned
operand and a signed operand, then Example 4-17 shows the correct coding style.

Example 4-18 shows a coding style where you may not be aware that RTL Compiler is
synthesizing an unsigned constant multiplier. If the intention is to have a constant multiplier
taking a signed signal and an unsigned constant, then Example 4-19 shows the correct coding
style.

Example 4-16 Mixing Operands of Different Sign Type Results in Unsigned Functionality
wire [15:0] a; //unsigned
wire signed [15:0] b; //signed
wire signed [31:0] prod;
assign prod = a * b; //an unsigned multiplier

Example 4-17 Inferring a Signed Multiplier when Mixing Operands of Different Sign Type
wire [15:0] a;
wire signed [15:0] b;
wire signed [31:0] prod;
// Add 0 as signed bit, then cast to signed
assign prod = $signed ({1'b0, a}) * b; //a signed multiplier

Example 4-18 Mixing Operands of Different Sign Type Results in Unsigned Functionality
wire signed [15:0] a;
wire signed [19:0] prod;
assign prod = a * 4'b1011; //an unsigned constant multiplier
// 4'b1011 is an unsigned constant

Example 4-19 Inferring a Signed Multiplier when Mixing Operands of Different Sign Type
wire signed [15:0] a;
wire signed [19:0] prod;
assign prod = a * 5'sb01011; //a signed constant multiplier
// 5'sb01011 is a signed constant
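
The numerical difference between Example 4-18 and Example 4-19 shows up as soon as a is negative. The Python sketch below models both cases for the 20-bit prod. It is illustrative only: a is passed as a raw 16-bit pattern and the function names are hypothetical.

```python
MASK20 = (1 << 20) - 1   # prod is declared as a 20-bit vector


def mul_unsigned_context(a16, const):
    """Example 4-18 style: one unsigned operand makes the whole operation unsigned."""
    return ((a16 & 0xFFFF) * const) & MASK20        # a is zero-extended


def mul_signed_context(a16, const):
    """Example 4-19 style: both operands signed, so a is sign-extended."""
    a = a16 & 0xFFFF
    if a & 0x8000:
        a -= 0x10000                                # read the pattern as two's complement
    return (a * const) & MASK20
```

For a = 0xFFFF (that is, -1) and the constant 11, the unsigned version yields 0xAFFF5 (65535 * 11), whereas the signed version yields 0xFFFF5, the 20-bit pattern of -11. For non-negative values of a the two agree.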


Avoiding Manual Signal Extension Techniques

Tip
Guideline: Declare appropriate signal types to automatically handle signal
extension.

Manual signal extension, either zero-extension for unsigned data or sign-extension for
signed data, is popular in the RTL coding of real-world designs.

RTL Compiler performs behavioral analysis and understands the real intention behind the
concatenation. However, explicit manual signal extension, as shown in
Example 4-20 and Example 4-22, has the potential to create a scenario of mixed signed
and unsigned operands. Declaring the intended sign type, as shown in
Example 4-21 and Example 4-23, is usually the better practice.

Example 4-20 Manual Zero-Extension for Unsigned Signals


wire [15:0] a, b; // to be used as unsigned
wire [16:0] s; // to be used as unsigned
assign s = {1'b0, a} + {1'b0, b};

Example 4-21 Recommended Coding Style for Unsigned Signals


wire [15:0] a, b; // unsigned
wire [16:0] s; // unsigned
assign s = a + b;

Example 4-22 Manual Sign-Extension for Signed Signals


wire [15:0] a, b; // to be used as signed
wire [16:0] s; // to be used as signed
assign s = {a[15], a} + {b[15], b};

Example 4-23 Recommended Coding Style for Signed Signals


wire signed [15:0] a, b; // signed
wire signed [16:0] s; // signed
assign s = a + b;


VHDL Signed/Unsigned Type Conversion

Tip
Guideline: When using numeric packages, make sure you clearly specify whether
arithmetic operations are unsigned or signed.

Do not declare a signal type that is inconsistent with the intended operation, as shown in
Example 4-24.

Example 4-24 Inconsistency Between Declaration and Usage


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex_4_24 is
port( a,b : in std_logic_vector (15 downto 0);
z : out std_logic_vector (31 downto 0) );
end ex_4_24;
architecture str of ex_4_24 is
begin
-- on-the-fly cast to/from numeric types
z <= std_logic_vector(signed(a) * signed(b));
end str;

Declare the signal type that matches the intended operation, as shown in Example 4-25.
Use the ieee.numeric_std IEEE numeric package for numeric types and functions.
Use only one numeric package at a time.
Use the unsigned and signed numeric types in all arithmetic expressions.

Use the report datapath command to list the datapath blocks and the input and output
operand types.

Example 4-25 Using Consistent Numeric Types


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex_4_25 is
port( a,b : in signed (15 downto 0);
z : out signed (31 downto 0) );
end ex_4_25;
architecture str of ex_4_25 is
begin
-- consistent use of numeric types in datapath blocks
z <= a * b;
end str;


Inferring a Square Automatically


RTL Compiler recognizes and synthesizes the square (x^2) mathematical operator
automatically. Example 4-26 and Example 4-27 show cases in which squares are inferred.

Example 4-26 Inferring an Unsigned Square with an Unsigned Signal

Verilog
wire [15:0] a;
wire [31:0] y;
assign y = a * a;

VHDL
signal a : unsigned (15 downto 0);
signal y : unsigned (31 downto 0);

y <= a * a;

Example 4-27 Inferring a Signed Square with a Signed Signal

Verilog
wire signed [15:0] a;
wire [31:0] y;
assign y = a * a;

VHDL
signal a : signed (15 downto 0);
signal y : signed (31 downto 0);

y <= a * a;


Implying Upper-Bit Truncation

Tip
Guideline: Be aware of implied upper-bit truncation in addition and subtraction.
When there is a sequence of computation by addition, subtraction, or multiplication,
unless disallowed in the algorithm, keep full precision until the end of the data flow.
Do truncation at the end of the sequence, which facilitates the most operator
merging and usually leads to the best QoS.

Truncation potentially prevents merging. Upper-bit truncation is often subtle or unintentional,
but it can inadvertently affect the QoS.

Example 4-28 shows a scenario where implied upper-bit truncation does not affect operator
merging.

Example 4-28 Operator Merging is Allowed if Truncation Does Not Affect Final
Outcome

Verilog
wire [15:0] a, b, c, d; // operators merged
wire [15:0] p, q;
wire [15:0] y;
assign p = a + b; // implied upper-bit truncation
assign q = c + d; // implied upper-bit truncation
assign y = p + q; // implied upper-bit truncation

VHDL
signal a, b, c, d : unsigned (15 downto 0); -- operators merged
signal p, q : unsigned (15 downto 0);
signal y : unsigned (15 downto 0);

p <= a + b; -- implied upper-bit truncation
q <= c + d; -- implied upper-bit truncation
y <= p + q; -- implied upper-bit truncation

Because the final output y requires a precision of only 16 bits, the intermediate implied
truncations in generating p and q do not change the value of y: the lower 16 bits of a sum
depend only on the lower 16 bits of its operands. Therefore, the three additions are
mergeable in spite of implied upper-bit truncation.
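
The equivalence claimed for Example 4-28 can be checked in simulation. The following is a minimal sketch, assuming that a merged operator behaves like a single four-input 16-bit sum; the module name tb_trunc_merge and the y_split and y_merged signals are hypothetical and do not describe how RTL Compiler represents the merged operator internally.

```verilog
// Illustrative testbench; module and signal names are hypothetical.
module tb_trunc_merge;
  reg  [15:0] a, b, c, d;
  wire [15:0] p        = a + b;          // carry-out discarded
  wire [15:0] q        = c + d;          // carry-out discarded
  wire [15:0] y_split  = p + q;          // three separate adders
  wire [15:0] y_merged = a + b + c + d;  // one four-input 16-bit sum

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random; c = $random; d = $random;
      #1;
      if (y_split !== y_merged) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
  end
endmodule
```

Since every assignment is evaluated modulo 2^16, the run should report zero mismatches.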

Example 4-29 carries full precision everywhere, allowing the three adders to be merged
without introducing any mathematical error.


Example 4-29 Arithmetic With Full Precision Facilitates Operator Merging

Verilog
wire [15:0] a, b, c, d; // operators merged
wire [16:0] p, q;
wire [17:0] y;
assign p = a + b; // full precision
assign q = c + d; // full precision
assign y = p + q; // full precision

VHDL
signal a, b, c, d : unsigned (15 downto 0); -- operators merged
signal p, q : unsigned (16 downto 0);
signal y : unsigned (17 downto 0);

p <= resize(a,17) + resize(b,17); -- full precision
q <= resize(c,17) + resize(d,17); -- full precision
y <= resize(p,18) + resize(q,18); -- full precision

Note: Adding two 16-bit numbers with full precision leads to a 17-bit sum. Similarly, adding
two 17-bit numbers leads to an 18-bit sum.

Example 4-30 contains both implied upper-bit truncation and full precision. The calculations of
p and q throw away the carry-out. The calculation of y accommodates the carry-out. If the
three adders were merged, the calculations of p and q would be treated as full precision,
without throwing away the carry-out. This would make the merged operator mathematically
different from the original design. This is a case where the operators cannot be merged.

Example 4-30 Mixture of Implied Upper-Bit Truncation and Full Precision Arithmetic
May Hurt Operator Merging

Verilog
wire [15:0] a, b, c, d; // operators not merged
wire [15:0] p, q;
wire [17:0] y;
assign p = a + b; // implied upper-bit truncation
assign q = c + d; // implied upper-bit truncation
assign y = p + q; // full precision

VHDL
signal a, b, c, d : unsigned (15 downto 0); -- operators not merged
signal p, q : unsigned (15 downto 0);
signal y : unsigned (17 downto 0);

p <= a + b; -- implied upper-bit truncation
q <= c + d; -- implied upper-bit truncation
y <= resize(p,18) + resize(q,18); -- full precision
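
A single input vector is enough to show why the operators in Example 4-30 cannot be merged. The following is a minimal simulation sketch; the module name tb_mixed_precision and the y_rtl and y_merged signals are hypothetical, and y_merged only models what a full-precision merge would compute.

```verilog
// Illustrative testbench; module and signal names are hypothetical.
module tb_mixed_precision;
  reg  [15:0] a, b, c, d;
  wire [15:0] p        = a + b;          // carry-out discarded
  wire [15:0] q        = c + d;          // carry-out discarded
  wire [17:0] y_rtl    = p + q;          // behavior of the RTL as written
  wire [17:0] y_merged = a + b + c + d;  // hypothetical full-precision merge

  initial begin
    a = 16'hFFFF; b = 16'h0001; c = 16'h0000; d = 16'h0000;
    #1;
    $display("y_rtl = %0d, y_merged = %0d", y_rtl, y_merged);
    // expected: y_rtl = 0, y_merged = 65536
  end
endmodule
```

The carry-out of a + b is lost in p but would be kept by the merged sum, so the two results differ.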


Example 4-31 shows another scenario where it is safe to merge the three additions as one
operator.

Example 4-31 Mixture of Explicit Upper-Bit Truncation and Full-Precision Arithmetic
May Still Allow Operator Merging

Verilog
wire [15:0] a, b, c, d; // merged as one cluster
wire [16:0] p, q;
wire [15:0] y;
assign p = a + b; // full precision, no truncation
assign q = c + d; // full precision, no truncation
assign y = p + q; // explicit upper-bit truncation

VHDL
signal a, b, c, d : unsigned (15 downto 0); -- merged as one cluster
signal p, q : unsigned (16 downto 0);
signal y_extra : unsigned (16 downto 0);
signal y : unsigned (15 downto 0);
p <= resize(a,17) + resize(b,17); -- full precision, no truncation
q <= resize(c,17) + resize(d,17); -- full precision, no truncation
y_extra <= p + q;
y <= y_extra (15 downto 0); -- explicit upper-bit truncation


Following Self-Determined Bit Width Rules

Tip
Guideline: Do not use self-determined expressions. Make widths of arithmetic
expressions unambiguous using intermediate signals and additional assignments.

When manipulating fixed-point arithmetic algorithms, full-precision calculation is often
assumed:
If doing an addition such as y = a + b, then assume
width(y) = max(width(a), width(b)) + 1

The extra bit accommodates the carry if the addition overflows.


If doing a subtraction such as y = a - b, then assume
width(y) = max(width(a), width(b)) + 1

The extra bit accommodates the borrow if the subtraction underflows.


If doing a multiplication such as y = a * b, then assume
width(y) = width(a) + width(b)
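
The three full-precision width rules above can be written once in parameterized form. The following is a minimal sketch, not taken from the original text; the module name full_precision_ops and the parameters WA and WB are hypothetical.

```verilog
// Hypothetical module; names and default widths are illustrative only.
module full_precision_ops #(
  parameter WA = 8,   // width of operand a
  parameter WB = 6    // width of operand b
) (
  input  [WA-1:0]                  a,
  input  [WB-1:0]                  b,
  output [((WA > WB) ? WA : WB):0] sum,   // max(WA, WB) + 1 bits
  output [((WA > WB) ? WA : WB):0] diff,  // max(WA, WB) + 1 bits
  output [WA+WB-1:0]               prod   // WA + WB bits
);
  assign sum  = a + b;   // extra bit holds the carry
  assign diff = a - b;   // extra bit holds the borrow
  assign prod = a * b;   // no product bits are discarded
endmodule
```

Because each result is assigned to an explicitly sized signal, the expression width is set by the assignment target and no upper bits are dropped.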

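However, when the RTL code falls under the self-determined bit-width rules defined in the
Verilog LRM (IEEE Std 1364-1995), the width of y is as shown in Table 4-2. Because the LRM
widths are narrower than the full-precision widths, upper bits are implicitly truncated, which
can have a negative impact on overall QoS.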

Table 4-2 Rules of Self-determined Bit-Width in Verilog LRM

Expression    Bit-width according      Bit-width needed
              to Verilog LRM           for full precision
i + j         Max (L(i), L(j))         Max (L(i), L(j)) + 1
i - j         Max (L(i), L(j))         Max (L(i), L(j)) + 1
i * j         Max (L(i), L(j))         L(i) + L(j)

The following examples highlight the impact of self-determined rules.

Example 4-32 is a design that relies on the self-determined bit-width rule for the two adders
in the comparison. According to the LRM rule, Example 4-32 is equivalent to Example 4-33.
The three operators (two adders and one comparator) are not merged.


Many times, the needed functionality can be implemented by using either the coding style
shown in Example 4-33 or Example 4-34. However, the coding style shown in Example 4-34
leads to better QoS because it is more inclined to facilitate operator merging.

Example 4-32 Design That Triggers the Self-Determined Rule of Addition


wire [7:0] a, b, c, d; // operators not merged
reg [7:0] y;
always @ (a or b or c or d)
begin
if (a + b == c + d)
y <= a & b;
else
y <= a | b;
end

Example 4-33 LRM Interpretation of Example 4-32


wire [7:0] a, b, c, d; // operators not merged
reg [7:0] p, q; // implied upper-bit truncation
reg [7:0] y;
always @ (a or b or c or d or p or q)
begin
p <= a + b;
q <= c + d;
if (p == q)
y <= a & b;
else
y <= a | b;
end

Example 4-34 Merging-Inclined Variation of Example 4-32


wire [7:0] a, b, c, d; // three operators merged
reg [8:0] p, q; // full precision
reg [7:0] y;
always @ (a or b or c or d or p or q)
begin
p <= a + b;
q <= c + d;
if (p == q)
y <= a & b;
else
y <= a | b;
end
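
The styles of Example 4-32 and Example 4-34 agree only when neither addition overflows 8 bits, so the functionality must allow the full-precision form before it is adopted. The following is a minimal simulation sketch of one overflowing case; the module name tb_self_determined and the eq_lrm and eq_full signals are hypothetical.

```verilog
// Illustrative testbench; module and signal names are hypothetical.
module tb_self_determined;
  reg  [7:0] a, b, c, d;
  wire [8:0] p       = a + b;             // 9-bit sum, carry kept
  wire [8:0] q       = c + d;             // 9-bit sum, carry kept
  wire       eq_lrm  = (a + b == c + d);  // both sums sized to 8 bits
  wire       eq_full = (p == q);          // 9-bit comparison

  initial begin
    a = 8'hFF; b = 8'h01; c = 8'h00; d = 8'h00;
    #1;
    $display("eq_lrm = %b, eq_full = %b", eq_lrm, eq_full);
    // expected: eq_lrm = 1, eq_full = 0
  end
endmodule
```

In the 8-bit comparison, 255 + 1 wraps to 0 and compares equal to 0 + 0; with 9-bit intermediate signals the sums are 256 and 0.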

Example 4-35 is a design that relies on the self-determined bit-width rule for the
two multipliers in the comparison. According to the LRM rule, Example 4-35 is equivalent to
Example 4-36. The three operators (two multipliers and one comparator) are not
merged.

Many times, the needed functionality can be implemented by using either Example 4-36 or
Example 4-37. However, Example 4-37 is more inclined to facilitate operator
merging and will lead to a better QoS.


Example 4-35 Design That Triggers the Self-Determined Rule of Multiplication


wire [7:0] a, b, c, d; // operators not merged
reg [7:0] y;
always @ (a or b or c or d)
begin
if (a * b >= c * d)
y <= a & b;
else
y <= a | b;
end

Example 4-36 LRM Interpretation of Example 4-35


wire [7:0] a, b, c, d; // operators not merged
reg [7:0] p, q; // implied upper-bit truncation
reg [7:0] y;
always @ (a or b or c or d or p or q)
begin
p <= a * b;
q <= c * d;
if (p >= q)
y <= a & b;
else
y <= a | b;
end

Example 4-37 Merging-Inclined Variation of Example 4-35


wire [7:0] a, b, c, d; // three operators merged
reg [15:0] p, q; // full precision
reg [7:0] y;
always @ (a or b or c or d or p or q)
begin
p <= a * b;
q <= c * d;
if (p >= q)
y <= a & b;
else
y <= a | b;
end

Note: Be aware of the LRM self-determined bit-width rules. Avoid the self-determined rules
by explicitly declaring the width of intermediate signals. As long as functionality allows,
facilitate as much operator merging as possible.


Avoiding Instantiated Components

Tip
Guideline: Do not instantiate arithmetic DesignWare components because
instantiated components prevent datapath optimization and have a negative effect
on QoS, as shown in Example 4-38. Instead, write arithmetic expressions in the
RTL, as shown in Example 4-39.

Example 4-38 Instantiating Arithmetic DW Components Results in Bad QoS

Verilog
module gd11_1 (a, b, c, d0, d1, z0, z1); //instantiation of DW component
input [7:0] a, b, c;
input [15:0] d0, d1;
output [15:0] z0, z1;

wire [17:0] p0, p1;
wire [17:0] p2, p3;
wire [15:0] s00, s01, s10, s11;

//multiply with explicit carry-save output
DW02_multp # (8, 8, 18) mult0 (.a(a), .b(b), .tc(1'b0), .out0(p0), .out1(p1));
DW02_multp # (8, 8, 18) mult1 (.a(a), .b(c), .tc(1'b0), .out0(p2), .out1(p3));

//add with explicit carry-save output
DW01_csa # (16) csa0 (.a(p0[15:0]), .b(p1[15:0]), .c(d0), .ci(1'b0), .sum(s00), .carry(s01));
DW01_csa # (16) csa1 (.a(p2[15:0]), .b(p3[15:0]), .c(d1), .ci(1'b0), .sum(s10), .carry(s11));

//carry-save to binary conversion (final adder)
DW01_add # (16) add0 (.A(s00), .B(s01), .CI(1'b0), .SUM(z0));
DW01_add # (16) add1 (.A(s10), .B(s11), .CI(1'b0), .SUM(z1));
endmodule


VHDL
library ieee, dw01, dw02;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
use dw01.dw01_components.all;
use dw02.dw02_components.all;

entity gd11_1 is --instantiation of DW component
port ( a, b, c : in std_logic_vector(7 downto 0);
d0, d1 : in std_logic_vector(15 downto 0);
z0, z1 : out std_logic_vector(15 downto 0) );
end gd11_1;

architecture rtl of gd11_1 is
signal p0, p1, p2, p3 : std_logic_vector(17 downto 0);
signal s00, s01, s10, s11 : std_logic_vector(15 downto 0);
begin
-- multiply with explicit carry-save output
mult0: DW02_multp
generic map ( a_width => 8, b_width => 8, out_width => 18 )
port map ( a => a, b => b, tc => '0', out0 => p0, out1 => p1 );
mult1: DW02_multp
generic map ( a_width => 8, b_width => 8, out_width => 18 )
port map ( a => a, b => c, tc => '0', out0 => p2, out1 => p3 );

-- add with explicit carry-save output
csa0: DW01_csa
generic map ( width => 16 )
port map ( a => p0(15 downto 0), b => p1(15 downto 0),
c => d0, ci => '0', sum => s00, carry => s01 );
csa1: DW01_csa
generic map ( width => 16 )
port map ( a => p2(15 downto 0), b => p3(15 downto 0),
c => d1, ci => '0', sum => s10, carry => s11 );

-- carry-save to binary conversion (final adder)
add0: DW01_add
generic map ( width => 16 )
port map ( A => s00, B => s01, CI => '0', SUM => z0 );
add1: DW01_add
generic map ( width => 16 )
port map ( A => s10, B => s11, CI => '0', SUM => z1 );
end rtl;


Example 4-39 Writing Arithmetic Expressions

Verilog
module gd11_2 (a, b, c, d0, d1, z0, z1); //without instantiation of DW component
input [7:0] a, b, c;
input [15:0] d0, d1;
output [15:0] z0, z1;

//single datapath with:
//automatic sum-of-products
//implicit usage of carry-save internally

assign z0 = a * b + d0;
assign z1 = a * c + d1;
endmodule

VHDL
library ieee;
use ieee.numeric_std.all;
entity gd11_2 is -- without instantiation of DW component
port ( a, b, c : in unsigned (7 downto 0);
d0, d1 : in unsigned (15 downto 0);
z0, z1 : out unsigned (15 downto 0) );
end gd11_2;
architecture rtl of gd11_2 is
begin
-- single datapath with:
-- automatic sum-of-products
-- implicit usage of carry-save internally
z0 <= a * b + d0;
z1 <= a * c + d1;
end rtl;

Table 4-3 Component Instantiation Results

                                                  Slack (ps)    Area (lib units)
Instantiating arithmetic DesignWare components    -1635         8676
Writing arithmetic expressions                    -1504         8655


Arithmetically Complementing an Operand

Tip
Guideline: Complement operands arithmetically using the - operator, as shown in
Example 4-40.

Operands complemented this way can be extracted easily from a larger datapath, which has a
positive impact on QoS.

Do not manually negate operands because this prevents Sum-of-Products (SoP) extraction
and has a negative impact on QoS, as shown in Example 4-41.

Example 4-40 Arithmetically Complementing Operands

Verilog
module gd13_2 (a, b, c, sign, z); //good QoS
input signed [15:0] a, b;
input signed [31:0] c;
input sign;
output signed [31:0] z;

wire signed [16:0] a_int;

//complement multiplier instead of product
assign a_int = sign ? -a : a;
assign z = a_int * b + c;
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity gd13_2 is -- good QoS
port ( a, b : in signed (15 downto 0);
c : in signed (31 downto 0);
sign : in std_logic;
z : out signed (31 downto 0) );
end gd13_2;
architecture rtl of gd13_2 is
signal a_int : signed (16 downto 0);
signal z_extra : signed (32 downto 0);
begin
-- complement multiplier instead of product
a_int <= -resize(a,17) when (sign = '1') else resize(a,17);
z_extra <= a_int * b + c;
z <= z_extra (31 downto 0);
end rtl;


Example 4-41 Manually Complementing Operands Prevents SoP Extraction

Verilog
module gd13_1 (a, b, c, sign, z); //poor QoS
input signed [15:0] a, b;
input signed [31:0] c;
input sign;
output signed [31:0] z;

wire signed [31:0] p;

//manual complement prevents SoP extraction
assign p = a * b;
assign z = (sign ? ~p : p) + $signed ({1'b0, sign}) + c;
endmodule

VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity gd13_1 is -- poor QoS
port ( a, b : in signed (15 downto 0);
c : in signed (31 downto 0);
sign : in std_logic;
z : out signed (31 downto 0) );
end gd13_1;
architecture rtl of gd13_1 is
signal p, px : signed (31 downto 0);
constant CONSTVAL1 : signed (1 downto 0) := "01";
constant CONSTVAL2 : signed (1 downto 0) := "00";
begin
p <= a * b;
px <= (not p) when (sign = '1') else p;
z <= px + ('0' & sign) + c;
end rtl;

Table 4-4 Complementing an Operand Results

                                          Slack (ps)    Area (lib units)
Manually complementing operands           -3246         34133
Arithmetically complementing operands     -3091         30833
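
The two coding styles compared in Table 4-4 are intended to compute the same 32-bit result, since inverting a two's complement value and adding 1 negates it. The following is a minimal simulation sketch that compares the two expressions on random inputs; the module name tb_complement and the z_arith and z_manual signals are hypothetical.

```verilog
// Illustrative testbench; module and signal names are hypothetical.
module tb_complement;
  reg  signed [15:0] a, b;
  reg  signed [31:0] c;
  reg                sign;

  // Arithmetic complement of the multiplier operand (style of Example 4-40)
  wire signed [16:0] a_int   = sign ? -a : a;
  wire signed [31:0] z_arith = a_int * b + c;

  // Manual complement of the product: invert, then add 1 (style of Example 4-41)
  wire signed [31:0] p        = a * b;
  wire signed [31:0] z_manual = (sign ? ~p : p) + $signed({1'b0, sign}) + c;

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random; c = $random; sign = $random;
      #1;
      if (z_arith !== z_manual) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
  end
endmodule
```

The run should report zero mismatches, which suggests that the QoS difference in the table comes from how the expression is written, not from what it computes.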

