[Figure: delay degradation plotted against cell input slew (AU).]
I. INTRODUCTION
1063-8210 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 3, MARCH 2014
TABLE I
VARIOUS SCENARIOS OF ASYMMETRIC AGING IN POWER-MANAGED SoC
Columns: Power-Managed Mode | Transistor Stress Details | Impact to Transistor After a Sufficiently Long Time
Power-managed modes: voltage domain; standby/power gate; DVFS; clock gating; half-cycle paths
Transistor stress details (recovered entry): different stress voltage
Impact (recovered entries): significant difference in delay degradations (10% skew); additional 2× delay degradation at lower OPP
Fig. 2.
Schematics and degradation maps of (Clock A) free running,
(Clock B) gated-high, and (Clock C) a gated-low clock along with fresh/aged
waveforms for each case.
[Figure: series Rising Edge, Falling Edge, and Average, plotted for the clock conditions 100% Gated Low, ~70% Gated Low, ~Free Running, ~70% Gated High, and 100% Gated High.]
the cell delays (now separated as rise and fall) to rise and fall NBTI
parameters, as in (2) and (3)
As discussed, for all practical purposes, gated clocks are the ones
that are most affected by asymmetric aging. Notably, for a buffer-based
clock tree:
1) a clock gated-low or power-gated domain driving an active
(ungated or gated-high, but enabled) domain can experience
hold failure as the rising edge of the capture clock ages while
the launch's rising edge remains essentially unaged;
2) an active domain driving an inactive domain can experience
setup failure as the rising edge of the launch clock degrades,
whereas the rising edge of the capture clock remains unaged;
3) the falling edge logic has similar relationships, i.e., falling edge
launch risks setup and falling edge capture carries hold risk.
Theoretically, a multitude of aging combinations is possible
between two interacting clocks. In practice, however, simple guidance can be prepared for a limited set of cases for conventional STA-based
handling.
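The launch/capture rules above can be sketched numerically. This is a minimal illustration, assuming that aging simply adds a delay (in ps) to the rising edge of the launch or capture clock path; the function names and the sign convention (a negative value means slack is lost) are hypothetical and not taken from the paper.

```python
def slack_shift(launch_aging_ps, capture_aging_ps):
    """Return (setup_delta, hold_delta) caused by clock-path aging.

    Assumption: a later launch edge eats into setup slack, while a later
    capture edge eats into hold slack. Negative values mean lost slack.
    """
    setup_delta = capture_aging_ps - launch_aging_ps
    hold_delta = launch_aging_ps - capture_aging_ps
    return setup_delta, hold_delta


def at_risk(launch_aging_ps, capture_aging_ps):
    """Classify which check is degraded by the asymmetric aging, if any."""
    setup_delta, hold_delta = slack_shift(launch_aging_ps, capture_aging_ps)
    if hold_delta < 0:
        return "hold"
    if setup_delta < 0:
        return "setup"
    return "none"


# Case 1: inactive (unaged) launch domain driving an active (aged) capture domain.
print(at_risk(0.0, 20.0))
# Case 2: active (aged) launch domain driving an inactive (unaged) capture domain.
print(at_risk(20.0, 0.0))
```

When both clock paths age by the same amount, the two shifts cancel, which is why symmetric aging does not by itself create skew.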
B. SSTA-Based Problem Formulation and Desensitization
As a next step, we describe how the SSTA infrastructure can be
exploited for asymmetric aging analysis and, furthermore, for
desensitizing the design to extreme NBTI degradation.
1) Basic Formulation: In its simplest form, the basic formulation of SSTA can be extended to incorporate transistor degradation
by assigning a unit threshold voltage shift to the pMOS transistors and
computing the sensitivity of the delay to this shift, as in (1). Here, d
is the edge delay (rising/falling), var_i is a regular statistical variation
parameter, and k_i is the path's sensitivity to it. For example, k_n
indicates the path delay's sensitivity to the threshold voltage shift due to
NBTI
$$d = d_0 + k_1\,\mathrm{var}_1 + \cdots + k_n\,\Delta V_{th,\mathrm{NBTI}}. \tag{1}$$
The above method can easily be extended to the case where different
transistors within a circuit are assigned different threshold
shifts based on the circuit-level activity. Importantly, while doing
the statistical timing at the block level, the tool can readily print out
the cumulative sensitivity of the entire path [7]. The slack sensitivity
of the path to the NBTI parameter can be used to identify and fix
NBTI-sensitive paths. Changes in the operating conditions of the
circuit (due to voltage/temperature) can also be accounted for through
an equivalent-time computation model, which computes a stress time
under reference conditions such that the degradation induced is the
same as that induced by the original stress over the original stress
time [2].
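The linear model in (1) and the use of slack sensitivity to flag NBTI-sensitive paths can be sketched as follows. This is an illustrative sketch only: the function names, the units, and the threshold-based selection are assumptions, and a real SSTA tool reports these sensitivities itself.

```python
def edge_delay(d0, k, var, k_n, dvth_nbti):
    """Evaluate d = d0 + sum_i(k_i * var_i) + k_n * dVth_NBTI.

    d0        : nominal edge delay (assumed unit: ps)
    k, var    : sensitivities and values of the regular variation parameters
    k_n       : sensitivity of the delay to the NBTI threshold shift
    dvth_nbti : threshold shift in units of the assigned unit shift
    """
    return d0 + sum(ki * vi for ki, vi in zip(k, var)) + k_n * dvth_nbti


def nbti_sensitive(slack_sens, limit):
    """Return path names whose slack sensitivity to NBTI is at or below
    `limit` (most negative first), i.e. paths that lose the most slack."""
    ranked = sorted(slack_sens.items(), key=lambda kv: kv[1])
    return [path for path, sens in ranked if sens <= limit]


# Hypothetical numbers, chosen only to exercise the functions.
print(edge_delay(100.0, [2.0, -1.0], [0.5, 1.0], 8.0, 1.0))
print(nbti_sensitive({"p1": -12.0, "p2": -3.0, "p3": -20.0}, -10.0))
```

Ranking by sensitivity rather than by absolute slack picks out paths that are comfortable at time zero but degrade fastest under stress.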
2) Asymmetric Aging Analysis: The above formulation, however,
cannot be employed for asymmetric aging, wherein different circuit
elements undergo different degradation. This brings up a need to
differentiate the NBTI impact on the rise and fall delays of a cell when it
is in different aging states. Therefore, we extend the above method
to assign two variables for individually capturing the sensitivity of
the cell delays (now separated as rise and fall) to rise and fall NBTI
parameters, as in (2) and (3).
(2)
(3)
It can be noted that k_rn and k_fn are equal to the NBTI sensitivity
parameters obtained through (1) for the rising and falling edge delays,
respectively. Note that, for asymmetric aging analysis, the
NBTI_rise/fall parameters always take complementary values: high or
zero. For example, a standard buffer in the clock tree with gated-high
stress should have its NBTI_rise parameter set high, and conversely
for a buffer in a gated-low clock tree. The next step is to cluster the
circuit so as to group the identically degrading instances together. As
an example, unless an inverting logic exists between two clock gates,
the entire clock sub-tree between them degrades alike: only one of the
edges will degrade. On the other hand, if an inverting logic does exist,
the cluster needs to be split into two smaller clusters: from the first
clock gate to the inverting logic, and from the inverting logic to the
second clock gate.
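For reference, one plausible reading of the rise/fall split described above is given here. It assumes the two expressions keep the linear form of (1) with separate rise and fall terms; the symbols d_{r0}, d_{f0}, k_{ri}, and k_{fi} are illustrative notation and are not confirmed by the text.

```latex
% Assumed form: same structure as (1), split per edge.
d_r = d_{r0} + k_{r1}\,\mathrm{var}_1 + \cdots + k_{rn}\,\mathrm{NBTI\_rise}
\qquad
d_f = d_{f0} + k_{f1}\,\mathrm{var}_1 + \cdots + k_{fn}\,\mathrm{NBTI\_fall}

% Complementary assignment under asymmetric aging, e.g. a gated-high buffer:
\mathrm{NBTI\_rise} = \text{high}, \qquad \mathrm{NBTI\_fall} = 0
```

Under this reading, a gated-high buffer contributes only the k_{rn} term, so only its rising-edge delay shifts with aging.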
A sample application of the above method is shown through the fairly
complex timing circuit of Fig. 4. The circuit has been divided
into clusters that degrade in the same manner. An SSTA run with
this definition of clusters and parameter assignment provides slack
results in parameterized form.
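The clustering rule (split a clock sub-tree wherever inverting logic appears between two clock gates) can be sketched for a single chain of cells. This is a simplified illustration: the input format, the function name, and the choice to place the inverting cell at the end of the first cluster are assumptions, and a real clock tree is a graph rather than a chain.

```python
def split_clusters(cells):
    """Split a chain of clock cells into identically degrading clusters.

    cells: list of (name, is_inverting) tuples, ordered from the first
           clock gate to the second clock gate (hypothetical format).
    A new cluster starts after every inverting cell, because the edge
    that degrades flips from that point onward.
    """
    clusters, current = [], []
    for name, is_inverting in cells:
        current.append(name)
        if is_inverting:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


# No inverting logic: the whole sub-tree is one cluster.
print(split_clusters([("buf1", False), ("buf2", False)]))
# One inverter in the middle: two clusters.
print(split_clusters([("buf1", False), ("inv1", True), ("buf2", False)]))
```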
A path-based analysis can then be employed to determine the worst-case
assignment of NBTI parameter values that minimizes the slack,
as shown in Table II.
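The search for a worst-case assignment can be illustrated by brute-force enumeration over per-cluster aging states. This is a sketch under stated assumptions, not the paper's procedure: the four (R, F) states, the linear slack model, and all names and numbers below are hypothetical, and exhaustive enumeration is only practical for a handful of clusters.

```python
from itertools import product

# Assumed (R, F) aging states per cluster: rise/fall NBTI parameter values.
STATES = {
    "fresh": (0, 0),
    "free_running": (1, 1),
    "gated_high": (1, 0),
    "gated_low": (0, 1),
}


def worst_case_assignment(slack0, sens):
    """Find the state assignment that minimizes a parameterized slack.

    slack0: slack with no aging (assumed unit: ps)
    sens  : list of (k_rise, k_fall) slack sensitivities, one per cluster
    Model : slack = slack0 + sum(k_rise * R + k_fall * F) over clusters
    """
    best = None
    for combo in product(STATES.items(), repeat=len(sens)):
        slack = slack0 + sum(
            k_r * r + k_f * f
            for (k_r, k_f), (_, (r, f)) in zip(sens, combo)
        )
        names = tuple(name for name, _ in combo)
        if best is None or slack < best[0]:
            best = (slack, names)
    return best


# Two clusters with hypothetical sensitivities (launch side, capture side).
print(worst_case_assignment(50, [(-30, 5), (20, -10)]))
```

For the example values, the minimum slack is reached when the first cluster is gated-high and the second is gated-low, which is an asymmetric combination that a uniform aging analysis would not consider.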
The key advantage of using an SSTA formulation is that it
facilitates the option of a bounded graph-based analysis (apart from
parameter sensitivities being an analysis byproduct), thereby also
ensuring exhaustive coverage of all paths. Furthermore, Table III
compares the results as obtained from a regular aging analysis versus
the approach outlined above.
We note that, for a 1× frequency degradation estimate through
regular aging analysis, up to 3× frequency degradation can occur in
the asymmetric aging case. The linear relationship between insertion
delay and frequency degradation can also be seen. Summing up,
Fig. 5 highlights the overall analysis flow.
3) Results From Production Design: We now present the results
from using the above methods on several advanced power-managed
designs. As can be seen from Table IV, application of such a
method results in several new violations as compared to the regular
timing analysis. A detailed analysis of the violations and the clock tree was
separately done to understand the nature of the violations and the exact aging
of certain portions of the clock trees.
For one such design, pre- and post-BI shmoo plots are shown in Fig. 6,
which reveals a post-BI Vmin wall. Indeed, detailed analysis reveals
the impact of asymmetric aging during BI on the failing paths.
While Vmin walls are typically associated with hold failures, in
this particular instance, peculiarly, it is a setup failure created by
asymmetric aging that produces the wall. A detailed study revealed the case of a
TABLE II
VARIOUS COMBINATIONS OF AGE ASSIGNMENTS FOR OBTAINING PRACTICALLY WORST POSSIBLE AGING
Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
R1  F1    | R2  F2    | R3  F3    | R4  F4    | Comment
0   0     | 0   0     | 0   0     | 0   0     | All clusters at time zero (no aging)
1   1     | 1   1     | 1   1     | 1   1     | No gating enabled; free-running clock
1   0     | X   X     | X   X     | X   X     | Invalid as R1 and F1 are fully correlated
1   1     | 1   0     | 1   0     | 0   1     |
1   1     | 0   1     | X   X     | X   X     |
1   1     | 0   1     | 0   1     | 1   0     |
1   1     | 1   0     | X   X     | X   X     |
TABLE III
HIGHLIGHTING THE EXTENT OF ASYMMETRIC-AGING-INDUCED FREQUENCY DEGRADATION
Fig. 5.
Flops 1, 2 interact
[t = 0 500 ps]
Flops 2, 3 interact
[t = 0 1000 ps]
Fig. 6. Pre- and post-BI shmoo plots for a production design, highlighting
asymmetric-aging-induced setup failure and inadequacy of existing method.
TABLE IV
RESULTS FROM PRODUCTION DESIGNS
Design details (28 nm) | Violations with asymmetric aging
Design A               | 938
Design B               | 631
Design C               | 958
on one of the 28-nm design blocks for two aging cases: (a) free
running and (b) asymmetrically aged while being gated at logic 1.
Clearly, in such a case, large PW degradations can be observed for
the gated clock.
In reality, for a production design, the combined effects of the PW
requirement and PW degradation can occur. For instance,
Fig. 8 shows the post-BI shmoo data for a device under two
test conditions, while the pre-BI shmoo data for both patterns
remained standard, with a linear slope.
Fig. 8. Post-BI shmoo data for a production design with two test patterns.
(a) With high activity. (b) With low activity.
to effect similar degradation on both the edges. Needless to say, half-cycle
paths and inverter-based clock trees will need further special
considerations, and various implementations are possible to realize
the above intents.
VIII. CONCLUSION
In this paper, a detailed introduction to asymmetric NBTI aging
was given with special focus on clock skew and pulse width,
along with aspects of BI. We noted that various power management
techniques could be detrimental from the NBTI perspective. Analysis
and management techniques for the problem were shared, including
conventional STA procedures, and a detailed problem formulation
and desensitization scheme in SSTA framework was presented.
Additionally, data from production designs was shared to highlight
the inadequacy of existing analysis techniques, resulting in failures.
ACKNOWLEDGMENT
The authors would like to thank R. Venkatraman and P. Rana
(and the respective design teams) for helping with shmoo data from
different products.
REFERENCES