Physical Synthesis 2.0

Andrew B. Kahng
UCSD CSE and ECE Departments

abk@ucsd.edu
http://vlsicad.ucsd.edu

A. B. Kahng, Physical Synthesis 2.0, IWLS-2015 Keynote 1


Concept: Design Principles

Partition the problem: divide and conquer, hierarchy


Different abstraction levels: RT-level, gate-level, switch-level,
transistor-level

Orthogonalize concerns
Function vs. implementation
Logic vs. timing vs. embedding

Solve chicken-egg conundrums


Constrain the design space to simplify the design process
Balance between design complexity and performance
E.g., standard-cell methodology
freedom from choice

ECE 260B CSE 241A Intro and ASIC Flow 2 Andrew B. Kahng, UCSD
Concept: How the IC Design Flow is Evolving

Flow expands in two directions
System-Level Design
Design for Manufacturability (DFM)
More design care-abouts
Area, Timing, Power, Signal Integrity, Reliability, Cost
Key challenges: loops, chicken-egg
Design closure through tight integrations
RTL, GDSII signoffs = business structure of semiconductor creation
One-pass flow: required for Productivity, requires Predictability
By Guardbands?
By Unifications?
By Statistics?
By Methodology (to avoid issues)?

[Figure: design flow. Architecture Design -> High Level Synthesis -> RTL -> Logic Synthesis -> Gate Netlist -> FP, Place, CTS, Opt -> Updated Gate Netlist -> Routing -> GDSII -> Manufacturing; side boxes: Verification; Extraction, Timing, Physical Verification]

Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges / Stressors
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
New Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



Logic Design Needs Spatial Information
High aspect ratio floorplan: shift one macro block from left to right, and vary its shape (with constant area)
10% power range (post-route): center location, taller blockage = more power, more contribution of wire (delays)
Separation of logical, temporal, spatial must crumble

[Figure: post-route power (mW, axis 190 to 230) vs. shift of the blockage location (0% to 100%), for macro sizes 260µm x 65µm and 184µm x 92µm]



How Do We Predict Spatial Information ?
Predict by modeling
Machine learning, regression, etc.
(Don't dismiss this!)
[SLIP15] http://vlsicad.ucsd.edu/Publications/Conferences/325/c325.pdf
[DAC00] http://vlsicad.ucsd.edu/Publications/Conferences/112/c112.pdf
[DATE13] http://vlsicad.ucsd.edu/Publications/Conferences/296/c296.pdf
[SLIP13] http://vlsicad.ucsd.edu/Publications/Conferences/300/c300.pdf

Predict by assuming and enforcing


Make a prediction, then make the prediction come true
(Constant-delay methodology)

Predict by doing
Constructive prediction
(Run under the hood: quick and dirty, else no leverage)
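
As a minimal sketch of "predict by modeling", the following fits a one-variable least-squares line that predicts post-route wirelength from a pre-placement feature. The feature (net fanout), the sample values, and the function names are all hypothetical; they only illustrate the regression idea and are not taken from the cited papers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Evaluate the fitted line at x."""
    a, b = model
    return a * x + b


# Hypothetical training data: net fanout vs. post-route wirelength (um).
fanouts = [1, 2, 3, 4, 5]
wirelengths = [12.0, 22.0, 32.0, 42.0, 52.0]

model = fit_line(fanouts, wirelengths)
print(predict(model, 6))  # predicted wirelength for a fanout-6 net
```

A real predictor would use many more features and held-out validation; the point is only that the model is trained on past implementation outcomes and queried before placement exists.
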
Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
New Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



Synthesis vs. Physical Synthesis
Synthesis (DC, RC)
Elaboration, mapping to generic gates
Clock gating
Apply timing constraints, remap / optimize
Multibit FF optimization
MBIST insertion
Scan chain stitching
Further optimization, area recovery
Physical Synthesis (DCT/DCG, RCP)
LEF list
Tech file, map file
tluplus_{max,min}
floorplan DEF
{min,max}_routing_layer
Physical Synthesis
In
RTL + SDC + Library models + Floorplan DEF

Out
Better netlist (usually), at one (worst) corner
Better netlist (usually) + placed DEF (not legalized)
N.B.: very fast TAT required by customers

Netlist (+ placed DEF) is passed to P&R + signoff


Place, placeOpt, CTS, CTSOpt, route, routeOpt, leakage
recovery, timing closure
Different companies and tools in a long tool chain
Example

[Figure: example physical synthesis flow]
Inputs: floorplan information (specified by physical designers; floorplan in DEF or physical guidance), RC tech file (tluplus, captable), libraries, LEF, tech files
Physical synthesis (e.g., DCT)
Output: netlist + initial placement (physical information)
P&R flow
Routed results



Note: P&R + Signoff is Complicated!
N. MacDonald, Broadcom Corp., "Timing Closure in Deep Submicron Designs", 2010 DAC Knowledge Center article

[Figure: flow. TOP-LEVEL NETLIST / SPEF and BLOCK-LEVEL NETLIST / SPEF -> Static Timing Analysis for all Modes / Corners -> Timing Closed, or else Breakdown of Timing Violations on per Block Basis -> Manual Repair of Timing Failures; about 5 iterations]

Operations Permitted at Each Iteration (in order of preference)
(1) Vt Swap, Resizing, Buffer Insertion, NDR Changes, Useful Skew
(2) Vt Swap, Resizing, Buffer Insertion, NDR Changes
(3) Vt Swap, Resizing, Buffer Insertion
(4) Vt Swap, Resizing
(5) Vt Swap

Violation Classes Addressed for Each Iteration (in order of priority)
(1) Electrical Rule Violations
(2) Noise Violations
(3) Setup Violations
(4) Hold Violations

[DAC15]
Since That Article Was Written:
[Figure: timeline of signoff concerns added from 90nm through 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm and 7nm]
Temp inversion, Max trans, Dynamic IR, Multi-patterning, MOL/BEOL R
PBA, Fixed-margin spec, MIS, Noise, EM, Cell-POCV
MCMM, Phys-aware timing ECO, AOCV/POCV, Min implant, LVF
BTI
BEOL, MOL variations
Signoff criteria with AVS
SOC complexity
Fill effects
Layout rules



How Can Physical Synthesis Possibly Work?
If it sounds too good to be true, it usually is

What do we do with constraints at (physical) synthesis stage?
Overconstrain the clock period in synthesis (was by 20%, now by ~10%)
Utilization: 60% target in synthesis (sometimes 50%, 55%); 85+% post-placement
Which detailed placer, CTS tool, router, optimizer?
Complex tool sensitivities (noisy, chaotic behavior)
Information that is ignored (advanced manufacturing)
Information that is never available (CTS, SI)
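
The guardbands above reduce to simple arithmetic. The sketch below assumes that "overconstrain by x" means shrinking the synthesis clock period by the fraction x, and that the core area is unchanged between synthesis and placement, so that the utilization ratio reflects cell-area growth. Both readings, and the function names, are assumptions for illustration.

```python
def synthesis_clock_period(signoff_period_ns, overconstraint):
    """Clock period handed to synthesis.

    Assumption: overconstraining by a fraction x shrinks the period by x.
    """
    return signoff_period_ns * (1.0 - overconstraint)


def implied_cell_area_growth(synthesis_utilization, post_place_utilization):
    """Cell-area growth factor implied by the two utilization figures.

    Assumption: the core area is the same at both stages.
    """
    return post_place_utilization / synthesis_utilization


# 1.0ns signoff target with the older 20% and the current ~10% guardband.
period_20 = synthesis_clock_period(1.0, 0.20)
period_10 = synthesis_clock_period(1.0, 0.10)

# 60% utilization target in synthesis vs. 85% post-placement.
growth = implied_cell_area_growth(0.60, 0.85)

print(period_20, period_10, round(growth, 2))
```

Under these assumptions, roughly 40% more cell area appears between synthesis and placement, which is one way to quantify how much of the final netlist synthesis never sees.
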

What explains success? Guardbands, low expectations?
Designers' preoccupation with area and schedule helps
Challenges
FinFET, BEOL scaling effects
Drive
Resistivity
Gate-wire balance
Clock effects
Skew across corners
Top-level clock distribution (CGCs, muxes, dividers, etc.)
Useful skews = area vs. delay tradeoffs
Extreme localization effects
Advanced (multi-)patterning
Pin access, congestion, coupling
Breakdown of placement-optimization separation



Questions
If Logic Synthesis can't know outcomes at end of Physical Design, can it be doing the right thing?
(Simple information arguments) (What margin is left on the table? Are we seeing placebo effects (association vs. causation, etc.)?)

Can Logic Synthesis be made better aware of future Physical Design outcomes?

Is Logic Synthesis at risk of being eclipsed by Physical Design? (Venus-Mars, Sun-Moon, etc.)



Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
A Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



FinFET: Current Density + Discreteness
Better electrostatic control + continued gate length scaling
Higher drive current, lower cell height (e.g., 8.25T), better area density (with increasing fin height)
Effective width 1.6x at equivalent area, compared with planar devices
Higher current density, plus fin discreteness challenges
[Figure: multi-fin 3D FinFET]
http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtbfinfetjan2013.aspx
[Figure: FinFET standard-cell layout. Layers and labels: NWell, Poly, Fin, Active, M1, M2, MOL1, MOL2, VIA0 (MOLx-M1), VIA1 (M1-M2); dimension labels 1Pfin, 2Pfin, 3Pfin, 4Ppoly]
http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtbfinfetprocesssoc2015q1.aspx

FinFET: Aggressive Voltage Scaling
FinFET enables voltage scaling for reduced dynamic power
Better electrostatic control: better performance at low supply voltage
High-performance mode: wire-dominated
Low-performance mode: gate-dominated

C. H. Lin, VLSI-TSA, 2012, p. 12.
[DAC15]
Gate-Wire Balancing
Unbalanced gate-wire delay causes severe delay variation on data and clock paths across modes
Delay variation in clock paths == skew variation
Increased difficulty for timing closure (ping-pong effect)
Minimization of skew variation is important for timing closure
(Our work at DAC15 uses global-local optimization and achieves 22% skew variation reduction)

[Figure: launch path and capture path around a datapath; skew = -0.1/+0.2]
Corner            Clock latency (launch)   Clock latency (capture)   Skew
SS, 0.7V, 25C     1.0                      1.1                       -0.1
FF, 1.1V, 25C     0.9                      0.7                       +0.2
Low voltage: gate delay dominates
High voltage: wire delay dominates
Skew reversal
Power/area overheads

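
The skew reversal in the two-corner table can be checked with a few lines of arithmetic. This sketch uses the launch and capture clock latencies from the slide and assumes the sign convention skew = launch latency - capture latency, which reproduces the slide's -0.1 and +0.2. The slide gives no time unit, so the values are treated as unitless.

```python
# Clock latencies per corner, as listed on the slide.
corners = {
    "SS, 0.7V, 25C": {"launch": 1.0, "capture": 1.1},
    "FF, 1.1V, 25C": {"launch": 0.9, "capture": 0.7},
}


def skew(latency):
    """Assumed sign convention: launch latency minus capture latency."""
    return latency["launch"] - latency["capture"]


skews = {name: skew(lat) for name, lat in corners.items()}

# Skew variation across corners: the spread between largest and smallest skew.
skew_variation = max(skews.values()) - min(skews.values())

# A sign change between corners is the "skew reversal" named on the slide.
reversal = min(skews.values()) < 0 < max(skews.values())

for name, value in skews.items():
    print(f"{name}: skew = {value:+.1f}")
print(f"skew variation = {skew_variation:.1f}, reversal = {reversal}")
```

The same clock tree is early at one corner and late at the other, so a fix made at one corner can create a violation at the other, which is the ping-pong effect.
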
FinFET: Less Body Effect, Richer Libraries?
FinFET 4-input NAND ~ planar bulk 3-input NAND
More complex cells / higher fan-in cells could be
made available to synthesis

[Figure: with body effect, the number of fan-ins is limited by body effect]
Bulk FinFETs: Fundamentals, Modeling, and Application, Jong-Ho Lee, SNU


[DAC15]
Pin Accessibility Below 20nm
Routing challenged by complex rules for multi-patterning
[Figure: an inserted via blocks neighboring access points (< MinOverlap, < MinSpacing); metal pitch < via pitch]
Limited pin access with small track cells
Wider power rail for reliable connection: fewer pin access points
[Figure: 9T NAND2 layout showing M2, V1, M1, Poly and Fin layers, with access points marked]
Complex design rules + less pin access: pin accessibility problem
Difficulty in routing: conflict between area reduction and routability

Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
[ISQED02] http://vlsicad.ucsd.edu/Publications/Conferences/131/c131.pdf
[ISQED10] http://vlsicad.ucsd.edu/Publications/Conferences/267/c267.pdf
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
New Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



Slack vs. Layout Context
Layout knobs: SRAM pitches and buffer keepout distances
Post-P&R slacks of five embedded memories are chaotic
Physical synthesis challenge: Logic optimization given chaos

[Figure: floorplan with five SRAMs (1 to 5), blockages, buffer keepouts, sram_pitch, and a placement region for standard cells]
[Figure: WNS of paths through SRAMs (ns) vs. SRAM pitch (um, 0 to 30) for slack1 through slack5; delta slack > 300ps]

Testcase: Logic from OpenCores GPU THEIA + SRAMs


[SLIP15]
Slack vs. Clock Period
Max delta path slack (SI vs. non-SI) is 81ps at signoff clock period of 1.0ns
Changing clock period to 0.82ns changes max delta path slack to 143ps!

[Figure: max delta path slack (SI - non-SI) (ns, 0.06 to 0.15) vs. clock period (ns, 0.80 to 1.30); 81ps at the signoff clock period, 143ps at the tighter clock period]

[SLIP15]
Non-SI vs. SI
Top-1000 critical paths from Viterbi design (clock period = 1.0ns)
Slack diverges by 81ps!!! ~4 stages of logic at 28nm FDSOI
Unfortunately, we don't know coupling before routing!!!
[Figure: path slack in non-SI mode (ns) vs. path slack in SI mode (ns), with ideal-correlation line; divergence of 81ps]

[DAC15]
WLM, RC (Interconnect proxy) Effects

[Figure: 3DIC power (mW, 20.8 to 23) vs. WLM cap (pF, 0 to 1.2); range of 1.35mW (6.43%)]
Example: SOCE-based Shrunk2D (S2D) flow [1]
Perform synthesis with different WLM caps, P&R with S2D flow
Shown: total power (#buffers, #instances, instance area, WL similar)
[1] Panth et al., "Design and CAD Methodologies for Low Power Gate-Level Monolithic 3D ICs", Proc. ISLPED, 2014, pp. 171-176.

Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
A Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



[SLIP13]
Sensitivity of CTS Outcomes to Layout Contexts

[Figure: fall delay (ps, 0 to 800) vs. core aspect ratio (0.1 to 10.0) for clock entry point locations BL, BLM, B, RBM, R]

Delay varies by up to 43% with clock entry point locations
Delay varies by up to 45% with core aspect ratio
NDRs, fill, buffer sizes, max fanout / max trans rules: 100ps impacts on insertion delays, skew, slacks

[ISQED14]
Useful Skew Improves Timing
Useful skew optimization adjusts clock sink latencies to improve timing
Our predictive useful skew flow resolves the chicken-and-egg loop: further improved timing

[Figure: three flip-flops FF1, FF2, FF3 with delay/slack labels. Zero skew (clock latencies 5, 5, 5): 10/0, 7/3, 7/3. Useful skew (clock latencies 7, 6, 5): 10/2, 7/2, 7/2. Useful skew improves timing]
[Figure: total negative slack over 6 testcases {3 RTLs x 2 clock periods}: zero skew -893, typical useful skew -197, predictive useful skew -60]

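
The three-flop example can be reproduced with the usual setup-slack relation, slack = period + capture latency - launch latency - path delay. The sketch below is an illustration under assumptions: the clock period of 10 and the ring topology FF1 -> FF2 -> FF3 -> FF1 are inferred because they reproduce the slide's delay/slack labels, and flip-flop setup time is ignored.

```python
CLOCK_PERIOD = 10  # assumption: not stated on the slide

# (launch FF, capture FF, datapath delay); the ring topology is an assumption.
PATHS = [("FF1", "FF2", 7), ("FF2", "FF3", 7), ("FF3", "FF1", 10)]


def setup_slacks(latency, period=CLOCK_PERIOD):
    """Setup slack per path for the given clock latency at each flip-flop."""
    return {
        (src, dst): period + latency[dst] - latency[src] - delay
        for src, dst, delay in PATHS
    }


zero_skew = setup_slacks({"FF1": 5, "FF2": 5, "FF3": 5})
useful_skew = setup_slacks({"FF1": 7, "FF2": 6, "FF3": 5})

print("zero skew:  ", zero_skew, "worst =", min(zero_skew.values()))
print("useful skew:", useful_skew, "worst =", min(useful_skew.values()))
```

Skewing the latencies moves slack from the two short paths to the long one, so the worst slack rises from 0 to 2 with no change to the logic.
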
Conventional Useful Skew Optimization
Standard useful skew flow has chicken-egg problem
Netlist and placement assume zero skew
Useful skew optimization relies on placement
One solution: Back-annotation flows (large runtime)
Wang et al. in DAC06 propose to back-annotate useful skew from post-placement to before synthesis
[Figure: flow. RTL netlist -> Synthesis -> Placement / Place Opt. -> CTS / CTS Opt. (Skew_opt) -> Routing / Route Opt., with back annotation into Synthesis]

NOLO: No-Loop Useful Skew Optimization
Our work: Cure the chicken-egg problem with delay prediction
Use setup slacks from LVT-only synthesis: estimation of achievable slacks
Use hold slacks from multi-VT synthesis: reduce pessimism
Advantage: One-pass approach, not constrained by placement
[Figure: flow. RTL netlist -> Synthesis w/ Multi-Vt and Synthesis w/ LVT (LVT-only netlist) -> Predictive Useful Skew -> Placement / Place Opt. -> CTS / CTS Opt. -> Routing / Route Opt.]


Experimental Results
Predictive flow achieves similar or better timing with much smaller runtime
[Figure: four panels (aes_cipher, des_perf, jpeg_encoder, mpeg2), each plotting runtime (min) vs. TNS (ns). Legend: Back annotation (BA); Prediction (w/o LVT-only syn); Prediction (w/ LVT-only syn); Average of various BA flows]

Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
A Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



BEOL Multi-Patterning Impacts

[Figure: mandrel, spacer and Mx metal, with dimensions Mwidth, Mspace and Swidth]
Wire1width = Mwidth
Wire2width = Mspace - 2*Swidth
Line-end cuts
Line-end extensions
Floating fill wires
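
The width relations on this slide can be read as Wire1width = Mwidth and Wire2width = Mspace - 2*Swidth. The minus sign is an assumption, since the operator is missing in the extracted text; it matches spacer-defined patterning, where a spacer on each side narrows the gap between mandrels. The sketch below also assumes Mwidth and Mspace are the mandrel width and spacing and Swidth is the spacer width. The numeric values are hypothetical.

```python
def wire_widths(m_width, m_space, s_width):
    """Return (wire1_width, wire2_width) for spacer-defined metal.

    wire1 is drawn where the mandrel was, so it inherits the mandrel width.
    wire2 forms in the gap between two mandrels, narrowed by one spacer
    on each side (assumed reading of the slide's second relation).
    """
    wire1 = m_width
    wire2 = m_space - 2 * s_width
    if wire2 <= 0:
        raise ValueError("mandrel space too small for two spacers and a wire")
    return wire1, wire2


# Hypothetical dimensions in nm.
print(wire_widths(m_width=32, m_space=96, s_width=32))
```

Because both wire widths are fixed by mandrel and spacer dimensions, the router cannot choose widths freely, which is one source of the line-end cuts, extensions and floating fill listed above.
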


[ICCAD15]
Placement-Sizing Interference
New interferences between post-layout optimization and P&R
Rules for device layers (FEOL) become considerably more complex and restrictive
Minimum implant width rules for implant region
Minimum notch and jog width rule for oxide diffusion (OD)
[Figure: rows of cells with alternating HVT / LVT implant regions and oxide diffusion (OD) shapes across cell boundaries]


[ICCAD15]
Placement-Sizing Interference (cont.)
Drain-to-drain abutment (DDA)
[Figure: abutting cells with drain (D) and source (S) regions; legend: Poly, Active region, Cell boundary, Connection, Power/ground]
[Figure: example solution, with min-implant-width, DDA, and min-jog/notch-width violations marked]
Intertwine the historically separate tasks of P&R and post-route optimization


Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
A Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



[ISQED14]
I. Flexible Timing Models
Setup time, hold time and clock-to-q (c2q) delay of FF: values interdependent, but NOT fixed
Flexible FF timing model can exploit operating (function/test) modes
Free pessimism reduction in STA
Goal: Find best {setup, hold, c2q} for each FF instance
Sequential LP: setup-c2q opt, hold-c2q opt
[Figure: flexible model with multiple {setup, hold, c2q} points (c2q1 ... c2qn) vs. fixed model with a single {setup, hold, c2q} point]
[Figure: c2q-setup-hold surface, with c2q vs. setup and c2q vs. hold curves]

Flexible Timing Model Recover Margin
Independent datapaths in PBA: using fixed FF timing model loses performance optimization opportunity
[Figure: flip-flops FF1, FF2, FF3 connected by datapaths of 460ps, 470ps and 480ps; per-FF setup and c2q values of 10ps or 20ps; path totals of 500ps, with one path marked "500ps!" vs. "520ps?"]
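
To illustrate why a per-instance choice of {setup, c2q} can recover margin, the sketch below picks one operating point per flip-flop by exhaustive search. It is not the sequential LP of the slides. The ring topology, datapath delays, clock period, and the two (setup, c2q) points per flip-flop are all hypothetical, and hold time is ignored.

```python
from itertools import product

PERIOD = 500  # ps, hypothetical

# (launch FF, capture FF, datapath delay in ps); hypothetical ring of three FFs.
PATHS = [("FF1", "FF2", 470), ("FF2", "FF3", 460), ("FF3", "FF1", 480)]

# Hypothetical tradeoff: a smaller setup time costs a larger c2q delay.
OPTIONS = [(10, 20), (20, 10)]  # (setup, c2q) in ps
FFS = ["FF1", "FF2", "FF3"]


def worst_slack(choice):
    """Worst setup slack when choice maps each FF to a (setup, c2q) pair."""
    return min(
        PERIOD - (choice[src][1] + delay + choice[dst][0])
        for src, dst, delay in PATHS
    )


# Fixed model: every flip-flop uses the same (setup, c2q) point.
fixed = {ff: (10, 20) for ff in FFS}

# Flexible model: try every per-FF combination and keep the best one.
best = max(
    (dict(zip(FFS, combo)) for combo in product(OPTIONS, repeat=len(FFS))),
    key=worst_slack,
)

print("fixed model worst slack:   ", worst_slack(fixed))
print("flexible model worst slack:", worst_slack(best), best)
```

Exhaustive search only works for a handful of flip-flops; an LP formulation is what makes the same idea tractable on a full design.
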


Improved Timing Signoff Flow
[Figure: flow. Netlist (and SPEF, if routed) -> Extract path timing information -> LP formulation with flexible flip-flop timing model -> Solve Sequential LP (STA_FTmax, STA_FTmin) -> Solution -> Annotate new timing model for each flip-flop -> Timing signoff with annotated timing]

Takeaways
Fix timing violations for free
48ps average improvement of slack over 5 designs in a foundry 65nm technology

Next
Better exploitation of disjoint cycles/modes
More accurate modeling of setup-hold-c2q tradeoff
Circuit optimization should natively exploit FF timing model flexibility


[DATE13]
II. Signoff Definition (e.g., with AVS, Aging)
VBTI: Voltage for BTI aging estimation
Vlib: Supply voltage for timing library characterization
Vfinal: Vdd of a circuit with AVS at end of lifetime
[Figure: chicken-and-egg loop. VBTI (|Vt| shift) and Vlib give the derated library; circuit implementation and signoff depend on VBTI and Vlib; Vfinal depends on the circuit, on BTI degradation and on AVS; VBTI and Vlib depend on aging during AVS (Vfinal)]


Observations and Heuristics
Observation #1: Vfinal is not sensitive to cells along the timing-critical path
Heuristic #1: Use average of critical path replicas to estimate Vfinal (Vheur)

Observation #2: Vt shift with a constant Vfinal throughout lifetime is close to that with adaptive Vdd
Heuristic #2: approximate Vdd in AVS by constant Vheur

Solve Chicken & Egg Loop by having VBTI = Vlib = Vheur ≈ Vfinal


Knee Point for Signoff Definition
Optimistic aging library: large power penalty
Ignore AVS: larger area
Overly pessimistic aging library: large area penalty

             Low Vlib                      High Vlib
Low VBTI     Slower circuit, less aging    Faster circuit, less aging
High VBTI    Slower circuit, more aging    Faster circuit, more aging

Our method finds knee point for balanced area and power tradeoff
Experiment setup:
DC/AC BTI @ 125C
32nm PTM technology
4 benchmark circuit implementations

Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
A Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



[ICCAD15]
Mixed Cell Height Implementation (!)
Large cell height: better timing, but large area and power
Small cell height: smaller area/power per gate, but large delay and more #buffers
Mixing cell height enables tradeoffs between performance and area/power (recall FinFET introduction!): better design QoR
E.g., use large-height high-fanin cells to improve pin accessibility
Already have flop trays, etc. as problematic multi-height instances

Technology: 28nm LP
In red are 12T cells = larger area, smaller delay
In blue are 8T cells = smaller area, larger delay



Cost of Mixing Cell Heights
Breaker cells are required to align regions with different cell heights
Optimization must comprehend corresponding area cost

[Figure: 8T and 12T cells with cell boundaries and P/G rails. X directional shift: four sites. Y directional shift: one M2 pitch (assume M2 pitch = 64nm); dimension labels 64nm and 48nm. One alignment has no routing blockage, the other has routing blockage on M1/M2]

Optimization Flow

Synthesis
Initial placement: uses modified LEF to enable optimization with a conventional flow
Partitioning: slicing-based partition with DP to divide die area into regions with different cell heights
Legalization: internal-timer guided placement legalization
Floorplan Update: floorplan update with breaker cell penalty
Cell mapping: row-based cell mapping places cells onto rows with corresponding heights
Routing / RouteOpt


Example of Optimization Flow

[Figure: five panels. Initial placement (8T/12T cells are freely placed); Partitioning (yellow blocks = regions); Legalization; New floorplan; Mixed-height placement]
Technology: 28nm LP
Design: AES
8T cells are in blue
12T cells are in red

Benefits from Mixing Cell Heights
Technology: 28nm LP (12T/8T) Design: AES
25% area reduction as compared to 12T-only design
20% performance improvement compared to 8T-only design



Outline
Why Physical Synthesis
Physical Synthesis 1.0
Example Challenges
FinFET
Noise and Chaos
Clock Skew
Complexity and Hyperlocality
Better (and, more complex) Signoff
A Mixed-Height Sweet Spot?
Physical Synthesis 2.0 ?



Physical Synthesis 2.0
It's the predictability! (and, prediction is challenged)
New devices and patterning technologies
Complex PD tool chain; chaotic behavior of tools and flows
Oblivious to clocks, corners, coupling: how can Physical Synthesis be doing the right thing? (= target for margin recovery!)

What will Physical Synthesis 2.0 look like?
(1) Higher-level value: what Physical Design cannot do
Datapath architecture selection
Resource sharing
Mux mapping
(2) Other types of prediction (machine learning, big data, etc.)!
(3) Constructive prediction deeper into implementation flow (more integration)
Clock and MCMM awareness
Hyperlocality awareness: coloring, congestion, coupling, interactions

THANK YOU!