
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Sense-Amplifier-Based Flip-Flop With Transition Completion Detection for Low-Voltage Operation
Hanwool Jeong, Tae Woo Oh, Seung Chul Song, and Seong-Ook Jung, Senior Member, IEEE

Abstract: A novel high-speed and highly reliable sense-amplifier-based flip-flop with transition completion detection (SAFF-TCD) is proposed for low supply voltage (VDD) operation. The SAFF-TCD adopts an internally generated detection signal to indicate the completion of the sense-amplifier stage transition. The detection signal gates the pull-down path of the sense-amplifier stage and the slave latch, thus overcoming the operational yield degradation, current contention, and glitches of previous SAFFs. The operational yield, speed, hold time, energy consumption, and area of the proposed and previous FFs are quantitatively compared for a wide range of VDD with 22-nm FinFET technology. It is shown that the minimum VDD of the SAFF-TCD is 573 mV lower than that of previous SAFFs, which means the SAFF-TCD can operate even when VDD is in the near-threshold or subthreshold region. At 0.3–0.4 V, the SAFF-TCD operates twice as fast as the master–slave-based FF (MSFF) with a practical hold time. Even with these benefits, the energy consumption overhead is limited to less than 20% compared with that of MSFF, and the area is similar to that of previous SAFFs.

Index Terms: Flip-flop (FF), low-voltage circuit design, sense amplifier (SA).

Manuscript received July 17, 2017; revised October 16, 2017; accepted November 18, 2017. (Corresponding author: Seong-Ook Jung.)
H. Jeong, T. W. Oh, and S.-O. Jung are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, South Korea (e-mail: sjung@yonsei.ac.kr).
S. C. Song is with Qualcomm Inc., San Diego, CA 92121 USA.
Digital Object Identifier 10.1109/TVLSI.2017.2777788
1063-8210 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Demand for an ultralow-power system on chip (SoC) continues to increase because of the growing interest in highly energy-constrained mobile SoC applications. In particular, for applications where performance is of secondary importance, one of the simplest and most efficient methods to improve energy consumption is to reduce the supply voltage (VDD) at the expense of speed loss. As part of this trend, digital circuit design techniques for subthreshold or near-threshold voltage operation have received increased attention [1].

The flip-flop (FF) is a key element, as most modern microprocessors operate under the synchronous pipeline structure. In low VDD regions, to minimize speed degradation, it is preferable to use a fine-grained pipeline with less combinational logic between FFs [2]. This means that the relative portion of the power dissipation and clock cycle time of FFs is significant. Thus, the design of low-power FFs with small input (D) to output (Q) delay, tDQ, is essential.

In addition, the effect of process variation on the driving strength of a transistor dramatically increases as VDD decreases, leading to a large variation in gate delay. As a result, the setup time, tsetup, in master–slave-based edge-triggered FFs [3]–[6], which is determined by the worst case variation, is significantly increased [7]. In the pulse-triggered FFs proposed in [8]–[11], this problem is resolved. Input D of the pulse-triggered FFs starts to be sampled by the latch right after the clock rising edge, which results in near-zero or negative tsetup. However, these FFs suffer from conflicting requirements for the width of the sampling window. A very small width cannot guarantee that the input data value properly propagates into the latch, whereas a large width increases the hold time, thold. This so-called sizing problem becomes more severe as variation effects increase in low VDD regions, because the pulsewidth required to reliably propagate the input into the latch and thold are determined by the respective worst variation corners. There are also approaches to achieve low VDD operation of FFs by utilizing 28-nm fully depleted silicon on insulator (FD-SOI) with back biasing [12], [13]. With back biasing, circuit designers can control Vth dynamically, which widens the operating voltage range. In particular, it is demonstrated in [13] that a nonvolatile FF based on a magnetic tunnel junction can be operated with near-Vth FD-SOI circuits with the use of multiple VDD values.

The sense-amplifier-based FF (SAFF) [14], which is composed of a differential SA stage followed by a slave element of NAND-based reset–set (RS) latch, is relatively unencumbered by the aforementioned large tsetup and the sizing problem. For this reason, SAFF is regarded as an appropriate choice for low VDD applications. However, this conventional SAFF suffers from two major problems in low VDD environments. First, the NAND-based slave latch operates slowly, because the Q delay depends on the load on /Q and vice versa. Second, the SA stage may latch the wrong data because of the reduced voltage headroom and transistor mismatch. In this paper, a novel SAFF with transition completion detection (SAFF-TCD) is proposed in order to resolve these limitations at low VDD.

II. SENSE-AMPLIFIER-BASED FLIP-FLOPS

Fig. 1(a) and (b) shows the structure and operational waveforms, respectively, of the conventional SAFF proposed in [14]. The conventional SAFF is basically composed of two stages: an SA stage followed by a NAND2-based RS latch.

During the precharge phase, the clock signal (CLK) becomes low. Accordingly, MN1 is turned OFF, and MP1 and MP4 are turned ON. As a result, the two output nodes
of the SA stage, /S and /R, are precharged to high, which makes the RS slave latch hold the old state of Q and /Q. In addition, the internal nodes in the SA stage, X, Y, and Z, are also precharged to intermediate voltage levels determined by the threshold voltages (Vth) of MN2, MN3, MN4, MN5, and MN6. For example, in the ideal case in which there is no Vth variation and the Vth values of MN2, MN3, MN4, MN5, and MN6 are all equal to Vth,nMOS, X, Y, and Z are precharged to VDD − Vth,nMOS.

When the evaluation phase starts with CLK rising, the SA stage starts to transit by turning ON MN1 and turning OFF MP1 and MP4. This causes Z to be discharged to the ground voltage, VGND, by MN1. If D is low and /D is high, this discharged Z causes MN3 to turn ON, while MN2 stays turned OFF. As a result, Y falls, while X falls slower than Y, because X is discharged through the MN4–MN3 stacked path. Provided that the drive strength of MN4 is sufficiently weak, the voltage of Y is significantly lower than that of X. This voltage difference between X and Y, VXY, means that the drain current of MN6 is larger than that of MN5. This causes /R to be discharged more than /S, making VSG of MP2 larger than that of MP3, which further widens the gap between /S and /R. With this positive feedback operation of MP2–MP3–MN5–MN6, /S and /R finally become latched as VDD and VGND, respectively.

Fig. 1. (a) Structure and (b) operational waveforms of the conventional SAFF.

The purpose of always-turned-ON MN4 is to guarantee that, when CLK is high, either /R or /S can always be driven to VGND. For example, after /R becomes low in the SA stage transition in response to high /D at the CLK rising edge, /D can be changed to low during the high CLK phase, which turns OFF MN3. Even with this change, /R should be driven to VGND to guarantee static operation. This is guaranteed by MN4, which forms a path from /R to VGND. This static operation becomes more important in low VDD regions.

As a consequence of SA transition, either /R or /S becomes low. Then, Q and /Q are driven by the slave latch according to the state of /R and /S. For example, if /R changes to low under the condition that the previous states of Q and /Q are high and low, respectively, /Q is first changed to high by G3. This results in both inputs of G2 being high and Q is changed to low.

One advantage of the SAFF is its near-zero tsetup, which is attributed to the fact that D starts to be captured right after the rising edge of CLK. In the widely used master–slave-based FFs (MSFFs), however, D is sampled before the rising edge of CLK, and tsetup is greater than that of SAFF. Fig. 2 shows one example of an MSFF, PowerPC FF, which is proposed in [4]. In this structure, to sample D correctly, the internal nodes of the master latch should be well developed according to D before the rising edge of CLK. This means that D should be stable sufficiently long before the clock rising edge, which results in a large tsetup. The overall performance of the FF is determined by the worst case tDQ, which is the sum of tsetup and the clock-to-Q delay, tCQ. In particular, at low VDD, the tsetup portion of tDQ increases because of the large variation effects, as pointed out in [2]. This encourages the use of SAFFs instead of MSFFs.

Fig. 2. Structure of PowerPC FF.

Fig. 3. Structure of TGPL.
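The slave-latch example given earlier in this section, in which /Q must rise before Q can fall, can be illustrated with a small unit-delay logic simulation. This is only a behavioral sketch under stated assumptions: the function names are hypothetical, every gate is assumed to take exactly one unit delay, and the two NAND gates are assumed to be wired as Q = NAND(/S, /Q) and /Q = NAND(/R, Q). It does not model loading, sizing, or any analog behavior of the circuit in Fig. 1.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 logic levels."""
    return 0 if (a and b) else 1


def simulate_rs_latch(q, qb, sb, rb, max_steps=10):
    """Unit-delay simulation of a cross-coupled NAND2 RS latch.

    Each step models one gate delay: both gates evaluate using the
    node values from the previous step. Returns the list of (Q, /Q)
    states, starting with the initial state, until nothing changes.
    """
    trace = [(q, qb)]
    for _ in range(max_steps):
        new_q = nand(sb, qb)   # gate driving Q sees /S and /Q
        new_qb = nand(rb, q)   # gate driving /Q sees /R and Q
        if (new_q, new_qb) == (q, qb):
            break
        q, qb = new_q, new_qb
        trace.append((q, qb))
    return trace


# Previous state Q = 1, /Q = 0; the SA stage pulls /R low while /S stays high.
trace = simulate_rs_latch(q=1, qb=0, sb=1, rb=0)
for step, (q, qb) in enumerate(trace):
    print(f"after {step} gate delay(s): Q={q} /Q={qb}")
```

In this sketch /Q rises after one gate delay and Q falls only one gate delay later, which is the serial dependence between the two outputs that the text describes.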
Pulse-triggered FFs [8]–[13] are considered as alternatives
to MSFFs, because they do not suffer from the large tsetup
problem. Fig. 3 shows one example of a pulse-triggered FF,
the transmission gate-based pulsed latch (TGPL). In this
structure, a short pulse is generated at the rising edge of
CLK, which is activated for TON . During TON , the FF becomes
transparent and samples D. Because D starts to be sampled
right after the rising edge, tsetup becomes negative or close to
zero. However, there are two conflicting requirements for TON
that complicate the design of a pulse generation scheme. If TON
is too short, D cannot propagate into FF. However, if TON is
too long, D should be stable for a sufficient length of time after
the rising edge, which means that thold increases significantly.
In particular, the variation of TON , which is determined by
the delay in the inverter chain (G11 , G12 , . . . , G1n in Fig. 3),
becomes significantly large in low VDD regions. Whether
D can propagate into FF properly is determined by the
shortest TON , whereas thold is determined by the longest TON .
Thus, at low VDD , the design for balancing TON under large
variation effect is extremely challenging. The SAFF is free
from this problem, because the sampling window closes as soon as data are latched in the SA stage. Thus, SAFF has small tsetup and thold at the same time, which makes it an appropriate solution for low VDD.

However, there are two problems in using conventional SAFFs in low VDD regions. The first problem is the NAND2-based RS slave latch, where the low-to-high transition of /Q must occur first to enable the high-to-low transition of Q. This dependence of the Q delay on the /Q delay degrades the overall performance of the SAFF, especially when the load on Q or /Q is large and VDD is low.

To resolve this problem, various slave latches for the SAFF have been introduced, as shown in Fig. 4(a)–(c). The slave latch of Nikolic’s SAFF [15] is shown in Fig. 4(a). In this structure, R and S signals generated by two inverters are used to control the pull-down paths of Q and /Q, respectively. Thus, the speed of the high-to-low transition of Q (/Q) is determined by how fast R (S) rises in the SA stage, which means that the delay of Q is independent of the /Q (Q) delay. The slave latch of Kim’s SAFF [16], shown in Fig. 4(b), also provides the pull-down path for Q (/Q), which is controlled independently of /Q (Q). However, Kim’s SAFF suffers from a glitch on the nodes Q and /Q. This occurs when the previous states of both Q and the sampled D are high. At the rising edge of CLK, Q must stay high in this case. However, because /S has not yet been discharged by the SA stage, the pull-down paths of Q, MN7 and MN8, are enabled. This results in a discharge of Q until /S is fully discharged, leading to a glitch on the node Q. The amount of this glitch is significant, as discharging /S is a lengthy process. This is because MN5 in the discharging path of /S in the SA stage (MN5–MN2–MN1 in Fig. 1) is driven by the temporary decrease in /R from VDD at the rising edge of CLK, as shown in Fig. 1(b). In addition to this glitch problem, the contention current flows through Q and /Q when they have changed from the previous state because of the cross-coupled inverter latches (G1–G4). Consequently, tCQ and the static power consumption increase simultaneously.

Fig. 4. Structures of the slave latch in (a) Nikolic’s SAFF, (b) Kim’s SAFF, and (c) Strollo’s SAFF.
The slave latch of Strollo’s SAFF [17], which is shown in Fig. 4(c), combines Kim’s design with the conventional NAND2-based slave latch to overcome the limitations of the former and the conventional SAFFs. The pull-down path MN7–MN8–MN9 (MN11–MN12–MN13) is formed at Q (/Q), which is independent of /Q (Q). In addition, by inserting nMOSs gated by /D and D at the pull-down paths of Q and /Q, respectively, the glitch found in Kim’s SAFF is prevented at the expense of an increase in input loading and nMOS stack number on the pull-down paths of Q and /Q. However, this structure does not remove the current contention perfectly. For example, at the rising edge of CLK when /D is high and the previous state of /Q is low, MP6 is turned ON and the pull-down path of Q (MN7–MN8–MN9 in Fig. 4(c)) is enabled. Thus, the contention current flows until /Q is charged by MP8 and MP6 is turned OFF. The effect of this problem is insignificant at high VDD, but becomes severe in low VDD regions because of the large gate delay variation.

Fig. 5. Utilizing series stacked nMOS transistors to enhance the operational yield of the SA stage in previous SAFFs.
Besides the slave element, the second issue with conven-
tional SAFF at low VDD stems from the SA stage, where MN4
is always turned ON. During the transition of the SA stage after
the rising edge of CLK, VXY should be sufficiently large to
quickly and correctly develop /R and /S. However, when VDD
is low, the voltage headroom is decreased, making it hard to
obtain sufficiently large VXY , even when the minimum width
is used for MN4 . Moreover, the larger impact of variation
at low VDD exacerbates this problem, because the driving
strength of MN4 becomes more uncertain and the effect of
transistor mismatch between MN2 and MN3 or MN5 and MN6
on VXY becomes more severe. As a result, the delay in the
SA stage increases significantly. More importantly, the likeli-
hood of wrong data being latched in the SA stage becomes
significant, which degrades the operational yield of SAFF.
One intuitive solution for preventing this malfunction is
to increase the width of all transistors except MN4 in the
SA stage. In this way, the transistor mismatch is suppressed
and the relative strength of MN4 is decreased. However, this
simultaneously increases the delay, energy consumption, and
area. Another possible approach is to directly decrease the
driving strength of MN4 . Because it is insufficient to use the
minimum width for MN4, the gate length of MN4 must be increased. However, as pointed out in [18] for deep submicrometer technology using FinFET, the gate length is fixed to follow stringent design rules. Thus, merely increasing the gate length of MN4 may not be possible. Instead, the driving strength of MN4 needs to be suppressed by stacking multiple transistors in series, as shown in Fig. 5. Although this approach only increases the energy consumption slightly, it significantly increases the area of the FF, especially for low VDD operation. The cost of these transistor sizing and stacking approaches for enhancing the operational yield of the SA stage is evaluated in detail in Section IV.

III. PROPOSED SENSE-AMPLIFIER-BASED FLIP-FLOP

Fig. 6. Structure of SAFF-TCD.

The proposed SAFF-TCD is designed to resolve the problems of previous SAFFs at low VDD by adopting the transition complete detection signal, TC. Fig. 6 shows the structure of the SAFF-TCD. The TC signal is generated by a NAND2, whose two inputs are the outputs of the SA stage, /R and /S. According to the SA stage operation, both /R and /S are precharged to high when CLK is low, which means that TC stays low before the rising edge of CLK. When CLK becomes
high, the SA stage starts its transition, and if the transition finishes, one of /R or /S becomes low, which makes TC high. Because TC is connected to the gate of MN4, MN4 is turned OFF during the transition of the SA stage, and is only turned ON after the SA stage transition has finished. Thus, SAFF-TCD is free from the degradation in speed and operational yield of the SA stage at low VDD caused by the always-turned-ON MN4 in previous SAFFs. This is confirmed by the operational waveforms of SAFF-TCD, which are shown in Fig. 7.

Fig. 7. Operational waveforms of SAFF-TCD.

Fig. 8. Structure of (a) SA stage with DCLK gating MN4 and (b) DCLK generation circuit.
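The logical behavior of the TC signal described above can be summarized in a few lines of code. This is a minimal logic-level sketch, assuming ideal 0/1 levels and using hypothetical function names. It reflects only the stated relation TC = NAND(/R, /S) with TC driving the gate of the nMOS MN4, and it does not model timing or the analog behavior of the SA stage.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 logic levels."""
    return 0 if (a and b) else 1


def tc_signal(rb: int, sb: int) -> int:
    """Transition-completion signal: TC = NAND(/R, /S)."""
    return nand(rb, sb)


def mn4_state(tc: int) -> str:
    """MN4 is an nMOS gated by TC, so it conducts only when TC is high."""
    return "ON" if tc else "OFF"


# (description, /R, /S) for the phases discussed in the text.
phases = [
    ("precharge or transition in progress (/R = /S = 1)", 1, 1),
    ("transition finished, /R low", 0, 1),
    ("transition finished, /S low", 1, 0),
]

for name, rb, sb in phases:
    tc = tc_signal(rb, sb)
    print(f"{name}: TC={tc}, MN4 {mn4_state(tc)}")
```

The table this prints matches the description: MN4 stays OFF until one SA output has fallen, and only then provides the static path to ground.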
Another noticeable feature of the proposed SAFF-TCD is that X and Y are not equalized during the precharge phase (CLK = 0). In the previous SAFFs, X and Y are equalized during CLK = 0 and the voltages of X and Y are the same at the rising edge of CLK, which can alleviate the impact of the mismatch. However, in SAFF-TCD, X and Y can be different at the rising edge of CLK, and thus sensing stability can be degraded. Nevertheless, this negative effect of X ≠ Y at the rising edge of CLK in SAFF-TCD is overcome by the positive effect gained by turning OFF MN4 while CLK is high. This is because the proper voltage difference, which can be obtained between X and Y (according to D and /D) “during” the amplification phase with turn-OFF MN4, is much more important than X = Y “at” the beginning of the amplification phase with turn-ON MN4.

Instead of utilizing the internally generated TC, the gate of MN4 can be controlled by an external global signal, delayed CLK (DCLK), as shown in Fig. 8(a). In this case, MN4 is OFF at the rising edge of CLK, when D and /D start to be amplified by the SA stage of SAFF. Thus, a sufficiently larger VXY can be obtained compared with previous SAFFs, and the stability or speed problem caused by always-turned-ON MN4 can be mitigated. After a delay TD (after this CLK edge), DCLK rises, which turns ON MN4, and thus the ground path to /R or /S can be provided to guarantee the static operation. However, this approach has a critical limitation: it is highly challenging to achieve the noise immunity and operational yield at the same time. Due to the process, voltage, and temperature variations, the SA stage of some SAFFs can be very slow. Because DCLK is typically shared by a number of SAFFs for energy and area efficiency, TD should be set to a conservatively large value to guarantee a stable operation of all SAFFs, including the SAFF with the slowest SA stage, in order to achieve the target operational yield. This makes TD excessively large, which can significantly degrade the noise immunity of SAFF. Furthermore, using an additional global control signal significantly increases power consumption.

Fig. 9. (a) Cell level yield of an SA stage with DCLK with and without delay line variation. (b) Chip level yield for different values of Nshare.

Fig. 8(b) shows the structure of a DCLK generation circuit, which uses a delay line with Ninv inverters. With this structure, TD can be controlled by adjusting Ninv to achieve a certain target yield. Fig. 9(a) shows the cell level operational yield of the SAFF with DCLK gating MN4 for different Ninv values with and without process variation of the delay line inverters, when VDD = 0.35 V and temperature is at the worst corner, −40 °C. It is observed that Ninv should be larger than 10
to achieve the target operational yield, 3σ, and there is a
significant yield degradation due to the variation of the delay
line inverters. Fig. 9(b) also shows the chip level yield of
the SAFF with DCLK gating MN4 versus Ninv . It is assumed
that the statistical characteristics of parasitic RC follow the
results shown in [19], and 384 FFs are used in one chip,
as shown in [20]. As mentioned, multiple SAFFs can share one
DCLK generation circuit and the number of SAFFs sharing
one DCLK generation circuit, Nshare , affects the SoC level
yield of SAFFs. Fig. 9(b) shows the results for different values
of Nshare . SoC level yield is improved as Nshare is increased.
This is because, with a larger Nshare , a smaller number of
DCLK generation circuits can be used in the given chip, and
thus, the probability that DCLK generation circuit causes error
is reduced.
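The qualitative argument about Nshare can be made concrete with a simple independence model. This is a rough sketch under assumptions that are not taken from the paper: every FF and every DCLK generation circuit is assumed to fail independently, the chip is assumed to work only if all of them work, and the per-cell and per-generator yield values below are hypothetical placeholders. Only the 384-FF chip size and the 3σ target come from the text; the function names are illustrative.

```python
import math


def sigma_to_yield(n_sigma: float) -> float:
    """One-sided normal yield: P(Z < n_sigma) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(n_sigma / math.sqrt(2.0)))


def chip_yield(cell_yield: float, gen_yield: float, n_ff: int, n_share: int) -> float:
    """Chip-level yield under an independence assumption (illustrative only).

    n_ff flip-flops share DCLK generators in groups of n_share, so the chip
    needs ceil(n_ff / n_share) generators, all of which must work.
    """
    n_gen = math.ceil(n_ff / n_share)
    return (cell_yield ** n_ff) * (gen_yield ** n_gen)


N_FF = 384                 # number of FFs per chip, as assumed in the text
CELL_YIELD = 0.99999       # hypothetical per-FF yield
GEN_YIELD = 0.999          # hypothetical per-generator yield

print(f"3-sigma target yield: {sigma_to_yield(3.0):.5f}")
for n_share in (8, 32, 128):
    y = chip_yield(CELL_YIELD, GEN_YIELD, N_FF, n_share)
    print(f"Nshare={n_share:3d}: chip yield = {y:.4f}")
```

With these placeholder numbers the chip yield rises as Nshare grows, simply because fewer generator circuits are present to fail, which is the trend the text attributes to Fig. 9(b).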
In addition to MN4 gating, the TC signal is applied to the gate nodes for nMOSs forming the pull-down paths of Q and /Q in the slave latch. The structure of the slave latch in SAFF-TCD is similar to the slave latch in Strollo’s SAFF, but not exactly the same. The number of nMOS stacks in the slave latch is three, which is smaller than Strollo’s SAFF whose slave latch has four-stacked nMOS path. With this structure, the problems caused by the slave latches in previous SAFFs, which were mentioned in Section II, can be resolved as follows. First, unlike the conventional SAFF, Q (/Q) can be discharged independently of /Q (Q) with a three-stage gate delay: 1) /R (/S) is discharged; 2) TC is charged up by G1; and 3) Q (/Q) is discharged by the MN8–MN7 (MN11–MN10) path. Thus, even when the load on /Q (Q) is large, the slave latch can rapidly discharge Q (/Q). Second, unlike Kim’s SAFF, no glitches occur in the Q or /Q nodes at the rising edge of CLK. This is because the pull-down path of Q or /Q is not enabled until /S or /R is discharged, respectively. Third, unlike Strollo’s SAFF, no contention current occurs when Q or /Q transits. This is also attributed to the nature of TC, which becomes high only after /R or /S falls. In the example of falling Q, when the pull-down path of Q is enabled by TC, /R is already discharged and /Q is charging up, which means that MP6 is being turned OFF. In this way, the current contention between the pull-down paths of Q or /Q and latching pMOS transistors is prevented.

Fig. 10. Operational waveforms of Q and /Q in (a) Kim’s SAFF, (b) proposed SAFF-TCD without CLK gating slave latch, and (c) SAFF-TCD with CLK gating slave latch.

Fig. 11. Slave latch of SAFF-TCD for glitch protection.
In SAFF-TCD, on the other hand, a glitch occurs in the output nodes at the falling edge of CLK. For example, if Q is high during the evaluation phase (CLK is high), the lowered /S should be charged up at the falling edge of CLK. In this case, TC is only discharged after /S has charged up, which is delayed by G1. Thus, both TC and /S are high for a moment at the falling edge of CLK, and MN8 and MN7 are simultaneously turned ON, leading to a glitch at Q. However, this glitch is insignificant and far smaller than that arising in Kim’s SAFF. This is because, as soon as /S is charged, TC can be quickly discharged by two stacked nMOSs in G1, whose gates are driven by full VDD. Fig. 10(a) and (b) compares the Q and /Q waveforms in Kim’s SAFF and SAFF-TCD, respectively, which are obtained from Monte Carlo simulations. The glitch in SAFF-TCD can be suppressed by inserting CLK gating nMOSs in the pull-down paths of Q and /Q, as shown in Fig. 11, at the expense of an increase in tDQ (<5%) and area (∼10%). The waveforms of Q and /Q with this structure are also presented in Fig. 10(c), which shows that the glitch is significantly reduced.

IV. QUANTITATIVE COMPARISON

To quantitatively analyze various FFs, HSPICE Monte Carlo simulations were performed with the 22-nm BSIM-CMG FinFET model [21]. In particular, the 22-nm BSIM-CMG FinFET model was fit to the experimental data in [22], whose key parameters are shown in Table I, and the transistor parasitic capacitances are fit to the results shown in [23] and [24]. In addition, the parasitic resistance and capacitance of the wire are extracted from the layout and added in simulations, based on the parasitic resistance and capacitance values of metal wires reported in [19]. The process variation is considered using Pelgrom’s law [25] by setting the standard deviation of
the threshold voltage, σVth, as

$$\sigma_{V_{th}} = \frac{A_{Vt}}{\sqrt{L_g \times W_g}} \qquad (1)$$

where Lg and Wg are the gate length and width, respectively, and 1.8 mV · μm is used for the Pelgrom coefficient AVt [26]. In FinFET, Wg is equal to Nfin × (2Hfin + Tfin), where Nfin, Hfin, and Tfin denote the number of fins, fin height, and fin thickness, respectively.

TABLE I
TECHNOLOGY PARAMETERS FOR VDD = 0.8 V

TABLE II
DESIGN RULES FOR 22-nm FinFET

TABLE III
FIN NUMBERS FOR AN SA STAGE

Fig. 12. Layouts of (a) PowerPC FF and (b) Montanaro’s SAFF.
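Equation (1) and the FinFET width expression can be evaluated numerically to see how the fin count affects threshold-voltage variation. This is a minimal sketch in which only the Pelgrom coefficient of 1.8 mV · μm and the relation Wg = Nfin × (2Hfin + Tfin) come from the text. The gate length, fin height, and fin thickness below are hypothetical placeholders (the actual values belong to Tables I and II, which are not reproduced here), and the function names are illustrative.

```python
import math

A_VT_MV_UM = 1.8      # Pelgrom coefficient from the text, in mV·um

# Hypothetical device dimensions, for illustration only.
L_G_NM = 25.0         # gate length (assumed)
H_FIN_NM = 30.0       # fin height (assumed)
T_FIN_NM = 10.0       # fin thickness (assumed)


def finfet_gate_width_um(n_fin: int, h_fin_nm: float, t_fin_nm: float) -> float:
    """Effective gate width Wg = Nfin * (2*Hfin + Tfin), converted to um."""
    return n_fin * (2.0 * h_fin_nm + t_fin_nm) / 1000.0


def sigma_vth_mv(n_fin: int, l_g_nm: float, h_fin_nm: float, t_fin_nm: float) -> float:
    """Standard deviation of Vth in mV from Pelgrom's law, Eq. (1)."""
    w_g_um = finfet_gate_width_um(n_fin, h_fin_nm, t_fin_nm)
    l_g_um = l_g_nm / 1000.0
    return A_VT_MV_UM / math.sqrt(l_g_um * w_g_um)


for n_fin in (1, 2, 4):
    s = sigma_vth_mv(n_fin, L_G_NM, H_FIN_NM, T_FIN_NM)
    print(f"Nfin={n_fin}: sigma_Vth = {s:.1f} mV")
```

Because Wg scales linearly with Nfin, going from one fin to four fins halves σVth in this model, which is why using the maximum Nfin for the discharging-path transistors reduces the effect of mismatch.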
The design rules of 22-nm FinFET are presented in Table II [19], [22]. The layouts of the PowerPC FF and conventional SAFF for a nine-track cell height based on the determined design rules are shown in Fig. 12(a) and (b), respectively. The maximum Nfin per finger is four, which is determined by how many fins can be located within the cell height considering the design rules.

Fig. 13 compares the operational yield of the previous SAFFs with that of SAFF-TCD for the Nfin values listed in Table III. The minimum Nfin was used for the always-turned-ON transistor, MN4, whereas the maximum Nfin allowed for one finger was used for nMOS transistors forming discharging paths to suppress the relative strength of MN4 and minimize the effect of transistor mismatch. The results shown in Fig. 13 were obtained by performing 300 000 Monte Carlo simulations. This guarantees a 95% confidence level with 1% error criteria when the operational yield is 3σ (∼99.87%).

Fig. 13. Operational yield of the previous SAFFs and SAFF-TCD for different VDD values.

The minimum VDD of SAFF, VDD,min, is determined as VDD that satisfies the target operational yield 3σ at the worst temperature corner. According to Fig. 13, VDD,min is determined by the cold temperature condition for all SAFF structures. This can be explained by the dependence of the ON current (ION) of transistor on the temperature; at high VDD, ION is larger with lower temperature by mobility effect, whereas
at low VDD, ION becomes larger with higher temperature by Vth effect. For the previous SAFFs whose VDD,min is critically determined by high VDD operation, the yield becomes worse at the cold temperature because of higher ION of MN4. On the other hand, for the proposed SAFF-TCD, the effect of MN4 on the yield is negligible. Instead, the mismatch between the input transistors of the SA stage (MN2 and MN3 in Fig. 6) critically determines the yield. In the low VDD region, where VDD,min of SAFF-TCD is determined, the voltage difference between X and Y at the beginning of the sensing procedure, which is developed by the two input transistors, is smaller in cold temperature due to smaller ION. As a result, VDD,min is determined at the cold temperature condition in SAFF-TCD.

VDD,min of the previous SAFFs is 735 mV. In other words, the previous SAFFs cannot satisfy the target yield if VDD is less than 735 mV. However, VDD,min of SAFF-TCD is 162 mV, an improvement of 573 mV. This is attributed to the adaptive turn-ON of MN4 controlled by TC. Thus, when Nfin values for the transistors in the SA stage are set as shown in Table III, SAFF-TCD can operate in the near-threshold or subthreshold region, whereas previous SAFFs cannot. Because the variation effect is rapidly increased as VDD lowers in the sub-Vth region, the operational yield of SAFF drops steeply for VDD < 200 mV. Thus, it is highly unstable to operate SAFF-TCD near VDD,min, 175 mV, and it is more appropriate to determine VDD,min with a safety margin, for example, VDD,min = 200 mV, at which the operational yield is much higher than the target yield. Nevertheless, this conservative VDD,min of SAFF-TCD is still far better than VDD,min of previously proposed SAFFs.

Fig. 14 compares the waveforms of /R and /S in SAFF-TCD with those in the previous SAFFs when VDD is at the near-threshold voltage (400 mV) and the sampled D is high. It is observed that /R and /S are correctly developed in SAFF-TCD, unlike in the previous SAFFs.

Fig. 14. Comparison of /R and /S waveforms in SAFF-TCD and conventional SAFF at the near-threshold region.

Fig. 15 compares tDQ versus D-to-CLK delay (tDC) curves of various FFs at TT corner with VDD = 0.8 V. According to [2], tsetup is derived as tDC that minimizes tDQ. As explained in Section II, PowerPC FF has the largest tsetup, because D should be well copied into the master latch sufficiently long before the CLK rising edge, and TGPL has the smallest tsetup, because D can be captured for a positive time period, TON, after the CLK rising edge. Four previously proposed SAFFs (Montanaro’s SAFF, Nikolic’s SAFF, Kim’s SAFF, and Strollo’s SAFF) have the same tsetup, because these four previous SAFFs have identical SA stages. As also previously mentioned in Section II, tsetup of these four previously proposed SAFFs is near zero or negative, since the sampling of D starts at the CLK rising edge, and ends as soon as D is latched inside the SA stage. SAFF-TCD has a slightly smaller (better) tsetup than the previous SAFFs. This is because, in SAFF-TCD, even if D stabilizes slower than in the previous SAFFs and the difference between D and /D at the sampling edge is smaller, the SA stage of SAFF-TCD operates more quickly and correctly because of the turned-OFF MN4 during the transition.

Fig. 15. D to Q delay (tDQ) versus D to CLK delay (tDC) for various FFs at TT corner with VDD = 0.8 V.

Fig. 16 compares 3σ worst case values of tsetup in various FFs for different VDD values. As stated in Section II, tsetup of PowerPC FF is greatly increased in low VDD, which results in a significant performance degradation. For the previous SAFFs, tsetup can only be obtained when VDD ≥ 800 mV, because they have VDD,min of 735 mV. For the previous SAFFs and SAFF-TCD, tsetup is negative or close to zero, as explained previously.

Fig. 16. Comparison of 3σ worst case values of tsetup in various FFs for different VDD values.
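The tsetup extraction rule quoted from [2] (tsetup is the tDC that minimizes tDQ) can be sketched as a short post-processing step on a delay sweep. This is a minimal illustration under assumptions: tDQ is taken as tDC + tCQ at each sweep point, tDC is counted as positive when D arrives before the CLK edge, the sweep values are invented placeholders rather than data from Fig. 15, and the function name is hypothetical.

```python
def setup_time_from_sweep(t_dc, t_cq):
    """Return (tsetup, minimum tDQ) from a D-to-CLK sweep.

    t_dc: D-to-CLK delays of the sweep points (positive = D before CLK).
    t_cq: measured CLK-to-Q delay at each sweep point.
    tDQ is computed as tDC + tCQ, and tsetup is the tDC at its minimum.
    """
    if len(t_dc) != len(t_cq) or not t_dc:
        raise ValueError("t_dc and t_cq must be non-empty and equally long")
    t_dq = [dc + cq for dc, cq in zip(t_dc, t_cq)]
    best = min(range(len(t_dq)), key=t_dq.__getitem__)
    return t_dc[best], t_dq[best]


# Hypothetical sweep in ps: tCQ degrades as D moves closer to the clock edge.
t_dc_ps = [60, 40, 20, 10, 0, -10]
t_cq_ps = [50, 50, 52, 58, 75, 120]

t_setup, t_dq_min = setup_time_from_sweep(t_dc_ps, t_cq_ps)
print(f"tsetup = {t_setup} ps, minimum tDQ = {t_dq_min} ps")
```

In this placeholder sweep, moving D closer to the clock edge first shortens tDQ and then lengthens it once tCQ starts to degrade, so the minimum marks the setup time under this definition.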
The results shown in Fig. 16 are obtained at the respective worst temperature conditions. Fig. 17(a)–(d) shows the 3σ worst case values of tsetup versus VDD of various FFs at the cold and hot temperature corners. For the PowerPC FF shown in Fig. 17(a), the 3σ worst tsetup is larger at hot temperature for VDD ≥ 700 mV, because the lower ION at hot temperature increases the time needed for the internal nodes of the FF to develop before the clock edge. As VDD decreases, ION becomes smaller at the cold temperature condition, which results in a 3σ worst tsetup that is larger at cold temperature for VDD < 700 mV. For the TGPL case shown in Fig. 17(b), for VDD ≥ 700 mV, the fast operation of the pulse generator at cold temperature shortens the pulsewidth TON, which means that D should be stabilized earlier relative to the clock edge. Thus, tsetup is larger at cold temperature for VDD ≥ 700 mV, whereas tsetup is larger at hot temperature for VDD < 700 mV. For the SAFF results shown in Fig. 17(c) and (d), the fast latch operation of the SA stage at cold temperature closes the data sampling window more quickly for VDD ≥ 700 mV, which results in a larger tsetup, whereas, for VDD < 700 mV, tsetup is larger at hot temperature.

Fig. 17. 3σ worst tsetup versus VDD at −40° and 120° for (a) PowerPC FF, (b) TGPL, (c) SAFF-TCD, and (d) previous SAFFs.

Fig. 18 compares the 3σ worst case values of tDQ in various FFs for different VDD values, where tDQ is the sum of tsetup and tCQ. It is assumed that every FF drives an FO4 inverter, and, between tDQ for rising Q and falling Q, the larger tDQ is plotted. Because of its larger tsetup, the PowerPC FF has the largest tDQ at low VDD. Compared with the PowerPC FF, SAFF-TCD and TGPL are 45% and 64% faster at 300 mV and 30% and 20% faster at 400 mV, respectively. This is attributed to the near-zero tsetup. Even at high VDD, the proposed structure exhibits the best performance, except for TGPL, because it does not suffer from the always-turned-ON transistor issue in the SA stage and the current contention in the slave latch. Similar to the tsetup case, the temperature sensitivity of tDQ varies as VDD is changed. Fig. 19 shows the 3σ worst tDQ of SAFF-TCD at the cold and hot temperature corners. For VDD ≥ 700 mV, tDQ is larger at the hot temperature corner, because ION becomes smaller at hot temperature in this VDD region, whereas tDQ is larger at cold temperature in the lower VDD region. The results shown in Fig. 18 correspond to the values obtained at the respective worst temperature corner.

Fig. 18. 3σ worst case values of tDQ in various FFs.

Fig. 19. 3σ worst case values of tDQ in SAFF-TCD at cold and hot temperature corners.

Although TGPL is the fastest, owing to its small transistor count and small tsetup, it becomes less useful at low voltages because of the thold problem. Fig. 20 compares the 3σ worst case values of thold, thold,3σ, in Montanaro’s SAFF, SAFF-TCD, and TGPL for different VDD values, together with their portion of the typical clock period (Tclk), or thold,3σ/Tclk. It should be noted that, for each VDD, Tclk is set to 40 FO4 inverter delays, as in [27], and the number of inverters in the pulse generator of TGPL is adjusted so that the 3σ operational yield of TGPL is guaranteed. At low VDD, due to the enlarged variation effect, a large TON must be guaranteed to provide timing margin for the target operational yield. This in turn increases thold,3σ of TGPL significantly, becoming
over 44% of Tclk at 300 mV, as shown in Fig. 20. According to [28], thold should be limited to 10% of Tclk for the practical use of FFs. Thus, it can be concluded that TGPL is not applicable at low voltages. However, in SAFF-TCD, thold,3σ is limited to below 10% of Tclk even at low VDD, as the sampling window in the SA stage is shut as soon as the data have been latched. The results for Montanaro’s SAFF are included only for VDD ≥ 800 mV, where the target yield can be achieved. In this region, SAFF-TCD has a larger thold,3σ than Montanaro’s SAFF, because the NAND2 that implements the TCD scheme increases the capacitance of /R and /S. However, because of the low variation effect in the high VDD region, SAFF-TCD still has a small thold,3σ (<4% of Tclk), which is sufficiently lower than the practical limit. In Fig. 21(a) and (b), thold versus VDD is derived at the cold and hot temperature corners for SAFF-TCD and TGPL, respectively. Similar to tsetup and tDQ, for VDD ≥ 700 mV, thold is larger at the hot temperature corner, because ION becomes smaller at hot temperature, which increases the time required to sufficiently develop the internal nodes of SAFF-TCD and TGPL after the clock edge. On the contrary, for VDD < 700 mV, thold is larger at the cold temperature corner, in which ION is smaller than at the hot temperature corner.

Fig. 20. thold,3σ in various FFs and thold,3σ/Tclk.

Fig. 21. 3σ worst thold of (a) SAFF-TCD and (b) TGPL at cold and hot temperature corners.

Fig. 22 shows the energy consumption of various FFs for different values of the activity factor, α; the energy was measured by the method demonstrated in [29]. It is assumed that the short pulse enabling TGPL is generated implicitly as in [8]. As mentioned in Section II, a large number or a large size of transistors should be used in a pulse generator to obtain a sufficiently large TON that can tolerate the large variation effect at low VDD, which causes large power consumption. Thus, TGPL consumes more energy than any other FF.

By comparing the energy consumption for different values of α, it can be observed that the PowerPC FF is highly sensitive to α compared with the SAFFs. This is because, at the CLK rising edge, the internal nodes of the PowerPC FF switch only when D has changed from the previous value, while the internal nodes of the SAFFs switch at every CLK rising edge whether or not D switches. TGPL is also less sensitive to α, compared with PowerPC, because the energy consumption is dominated by the pulse generation, which is performed every CLK edge.

At high VDD, even though most SAFFs exhibit comparable results, Kim’s consumes the most energy because of the current contention at the output nodes. At low VDD values, where only the PowerPC FF and SAFF-TCD can operate, SAFF-TCD consumes 15%–20% more energy. This is because, even when D is the same as the previous Q, /R, /S, and TC must be switched ON every CLK cycle, whereas only CLK and /CLK switch in the PowerPC FF. However, this energy overhead seems tolerable, considering the 2 times speed benefit of SAFF-TCD and the unique advantages of SAFFs, such as level-shifting applications and differential signal availability.

Compared with Strollo’s SAFF, the proposed SAFF-TCD does not show a significant improvement in speed and energy. However, the proposed SAFF-TCD has the advantage in terms of operating voltage, as previously mentioned. In other words, SAFF-TCD can operate over a much wider supply voltage range than Strollo’s SAFF.

Fig. 23 shows the layout of SAFF-TCD based on the design rules listed in Table II, and Table IV compares the layout area of various FFs, normalized to that of the PowerPC FF. With the exception of the conventional SAFF, the areas are comparable.

As stated in Section II, the previous SAFFs can also achieve the target operational yields at low VDD if larger Nfin values are used in the SA stage, except for the always-turned-ON transistor MN4. Fig. 24 shows the ratio of the Nfin required to achieve the 3σ yield (Nfin,3σ) to the typical Nfin set (Nfin,typ) given in Table III for different VDD values. To compensate for the effect of variation and reduced voltage headroom at low VDD, this should be accompanied by an extremely large scaling-up using multiple fingers. This results in a large overhead in terms of area, performance, and energy. Stacking transistors to reduce the driving strength of MN4 may be another approach. However, the required stack number for the target yield is significant at low VDD (12 for 300 mV and 7 for 400 mV, according to simulations), which results in a large area overhead. More importantly, with large stack
numbers at low VDD, the noise immunity of the /R and /S nodes becomes degraded because of the weak pull-down path. Thus, a structural solution for enhancing the operational yield of the SA stage in SAFFs, such as offered by the proposed SAFF-TCD, is essential for operating SAFFs at low VDD.

Fig. 22. Energy consumption of various FFs for activity factors, α. (a) α = 0. (b) α = 0.2. (c) α = 0.4. (d) α = 0.6. (e) α = 0.8. (f) α = 1.

Fig. 23. Layout of SAFF-TCD.

Fig. 24. Required scale-up for low voltage to achieve target operational yield.

TABLE IV
AREA COMPARISON OF FFS

REFERENCES

[1] H. Kaul, M. Anders, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, “Near-threshold voltage (NTV) design—Opportunities and challenges,” in Proc. 49th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2012, pp. 1149–1154.
[2] D. Jeon, M. Seok, C. Chakrabarti, D. Blaauw, and D. Sylvester, “A super-pipelined energy efficient subthreshold 240 MS/s FFT core in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 23–34, Jan. 2012.
[3] Y. Suzuki, K. Odagawa, and T. Abe, “Clocked CMOS calculator circuitry,” IEEE J. Solid-State Circuits, vol. SSC-8, no. 6, pp. 462–469, Dec. 1973.
[4] G. Gerosa et al., “A 2.2 W, 80 MHz superscalar RISC microprocessor,” IEEE J. Solid-State Circuits, vol. 29, no. 12, pp. 1440–1454, Dec. 1994.
[5] D. Markovic, B. Nikolic, and R. W. Brodersen, “Analysis and design of low-energy flip-flops,” in Proc. Int. Symp. Low Power Electron. Design, 2001, pp. 52–55.
[6] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, “A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2011, pp. 338–340.
[7] N. Lotze, M. Ortmanns, and Y. Manoli, “Variability of flip-flop timing at sub-threshold voltages,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2008, pp. 221–224.
[8] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, “Flow-through latch and edge-triggered flip-flop hybrid elements,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 1996, pp. 138–139.
[9] Z. Peiyi, T. Darwish, and M. Bayoumi, “Low power and high speed explicit-pulsed flip-flops,” in Proc. 45th Midwest Symp. Circuits Syst. (MWSCAS), vol. 2. Aug. 2002, pp. II-477–II-480.
[10] F. Klass et al., “A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors,” IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 712–716, May 1999.
[11] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, “The implementation of the Itanium 2 microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1448–1460, Nov. 2002.
[12] S. Bernard, M. Belleville, J.-D. Legat, A. Valentian, and D. Bol, “Ultra-wide voltage range pulse-triggered flip-flops and register file with tunable energy-delay target in 28 nm UTBB-FDSOI,” Microelectron. J., vol. 57, pp. 76–86, Nov. 2016.
[13] H. Cai, Y. Wang, L. A. Naviner, W. Kang, and W. Zhao, “Energy efficient magnetic tunnel junction based hybrid LSI using multi-threshold UTBB-FD-SOI device,” in Proc. Great Lakes Symp. VLSI, 2017, pp. 23–28.
[14] J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
[15] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[16] J.-C. Kim, Y.-C. Jang, and H.-J. Park, “CMOS sense amplifier-based flip-flop with two N-C²MOS output latches,” Electron. Lett., vol. 36, no. 6, pp. 498–500, Mar. 2000.
[17] A. G. M. Strollo, D. De Caro, E. Napoli, and N. Petra, “A novel high-speed sense-amplifier-based flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 11, pp. 1266–1274, Nov. 2005.
[18] D. H. Saari and D. G. Nairn, “Analog integrated circuit design using fixed-length devices,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2016, pp. 1798–1801.
[19] D. Ingerly et al., “Low-k interconnect stack with metal-insulator-metal capacitors for 22 nm high volume manufacturing,” in Proc. IEEE Int. Interconnect Technol. Conf. (IITC), Jun. 2012, pp. 1–3.
[20] H.-T. Lin, Y.-L. Chuang, Z.-H. Yang, and T.-Y. Ho, “Pulsed-latch utilization for clock-tree power optimization,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 4, pp. 721–733, Apr. 2014.
[21] Y. S. Chauhan et al., “BSIM—Industry standard compact MOSFET models,” in Proc. (ESSCIRC), Sep. 2012, pp. 30–33.
[22] C. Auth et al., “A 22 nm high performance and low-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors,” in Proc. Symp. VLSI Technol. (VLSIT), Jun. 2012, pp. 131–132.
[23] R. W. Mann et al., “Impact of circuit assist methods on margin and performance in 6 T SRAM,” Solid-State Electron., vol. 54, no. 11, pp. 1398–1407, 2010.
[24] M. Guillorn et al., “FinFET performance advantage at 22 nm: An AC perspective,” in Proc. Int. Symp. VLSI Technol., Jun. 2008, pp. 12–13.
[25] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, “Matching properties of MOS transistors,” IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
[26] H. Kawasaki et al., “Challenges and solutions of FinFET integration in an SRAM cell and a logic circuit for 22-nm node and beyond,” in IEDM Tech. Dig., Dec. 2009, pp. 289–292.
[27] M. Alioto, E. Consoli, and G. Palumbo, “General strategies to design nanometer flip-flops in the energy-delay space,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 7, pp. 1583–1596, Jul. 2010.
[28] C. Chia-Hsiang, K. Bowman, C. Augustine, Z. Zhengya, and J. Tschanz, “Minimum supply voltage for sequential logic circuits in a 22 nm technology,” in Proc. IEEE Int. Symp. Low Power Electron. Design (ISLPED), Sep. 2013, pp. 181–186.
[29] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I—Methodology and design strategies,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
[30] S. Lin, H. Yang, and R. Luo, “High speed soft-error-tolerant latch and flip-flop design for multiple VDD circuit,” in Proc. IEEE Comput. Soc. Annu. Symp. Very Large Scale Integr., Mar. 2007, pp. 273–278.
[31] W. Wang and H. Gong, “Sense amplifier based RADHARD flip flop design,” IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3811–3815, Dec. 2004.

Hanwool Jeong was born in Seoul, South Korea, in 1987. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2012, where he is currently working toward the Ph.D. degree in electrical and electronic engineering.
His current research interests include near-threshold digital logic circuit design, low-voltage SRAM peripheral circuit design, and advanced device-based SRAM cell design.

Tae Woo Oh was born in Seoul, South Korea, in 1992. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2015, where he is currently working toward the Ph.D. degree in electrical and electronic engineering.
His current research interests include FinFET-based low-power and high-performance SRAM.

Seung Chul Song received the Ph.D. degree in solid-state electronics from The University of Texas at Austin, Austin, TX, USA, in 2000.
Since 2000, he has been in engineering and management positions in various organizations, involved in advanced CMOS process/device technology development. He is currently with Qualcomm Inc., San Diego, CA, USA, where he leads the 28-nm HK/MG technology development with leading foundries. He has contributed several key papers to high-profile journals and conferences on various topics of CMOS technology, including SiON, HK/MG, and FinFET. He is the holder of six U.S. patents.

Seong-Ook Jung (M’00–SM’03) received the B.S. and M.S. degrees in electronic engineering from Yonsei University, Seoul, South Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2002.
From 1989 to 1998, he was with Samsung Electronics, where he was involved in the specialty memories, such as video RAM, graphic RAM, and window RAM, and merged memory logic. From 2001 to 2003, he was with T-RAM Inc., Milpitas, CA, USA, where he was the Leader of the Thyristor-Based Memory Design Team. From 2003 to 2006, he was with Qualcomm Inc., San Diego, CA, USA, where he was involved in high-performance low-power embedded memories, process variation tolerant circuit design, and low-power circuit techniques. Since 2006, he has been a Professor with Yonsei University. His current research interests include process variation tolerant circuit design, low-power circuit design, mixed-mode circuit design, and future generation memory.
Dr. Jung is currently a Board Member of the IEEE SSCS Seoul Chapter.
