Académique Documents
Professionnel Documents
Culture Documents
I. I NTRODUCTION
Manuscript received October 18, 2012; revised February 15, 2013; accepted
March 30, 2013.
I. Levi and A. Belenky are with the Department of Electrical and computer Engineering, Ben-Gurion University, Beer-Sheva 84105, Israel (e-mail:
itamarlevi@gmail.com; belenky@ee.bgu.ac.il).
A. Fish is with the Faculty of Engineering, Bar-Ilan University, Ramat Gan
52900, Israel (e-mail: afish@ee.bgu.ac.il).
Digital Object Identifier 10.1109/TVLSI.2013.2257902
A basic DML gate is composed of any static logic family gate, which can be a conventional CMOS gate, and an
additional transistor. DML gates have a very simple and
intuitive structure, requiring unconventional sizing methodology to achieve the desired performance. Conventional LE
methodology cannot be used with the DML family as
it does not consider its unconventional sizing rules and
topology.
The objective of this paper is to develop a simple method
for minimizing delays and achieving an optimized number of
stages in logical paths containing CMOS-based DML gates.
A unified LE method is introduced for the delay evaluation
and optimization of logic paths constructed with DML logic
gates. DML-LE answers complete (unapproximate) design
problems, which can be solved numerically, and simplifies
these problems to a straightforward and easy computational
problem [approximate and semiapproximate (SA) solutions]
with a unified analytic model. With this model, we can
estimate the minimum to maximum error under delay approximation and the error in the target optimum number of stages
for a given logic function. The efficiency of the developed
method is shown by a comparison of the theoretical results,
achieved using the proposed method with simulation results of
the Cadence Virtuoso optimizer tool using a standard 40-nm
technology.
The rest of this paper is organized as follows: a review
of the DML family is described in Section II. DML-LE
model for simple inverter chains is developed in Section III
with three different levels of approximations. In Section IV
we compare the methods by simplicity and accuracy. The
dependence of the optimum number of stages is also described
in Section IV, which provides an intuitive graphical visualization of the problem. DML-LE is expanded to complex
nets containing branching in Section V. In Section VI, the
efficiency of the DML-LE theoretical optimization is examined
for a standard 40-nm process. A discussion and conclusions of the use and applicability of DML-LE are given in
Section VII.
II. DML OVERVIEW
As previously mentioned, a basic DML gate architecture is
composed of a static gate and an additional transistor M1,
whose gate is connected to a global clock signal [9][11]. In
this paper, we specifically focus on DML gates that utilize
conventional CMOS gates for the static gate implementation.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2
Fig. 1. DML topology: (a) Type A and (b) Type B. (c) CMOS-based DML
gate with sizing factors. (d) Standard CMOS gate with sizing factors.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEVI et al.: LE FOR CMOS-BASED DML GATES
(1)
(2)
ln(2).Rmin _A C D,min
t p0_DML
p_DML
LE_DML
R
(3)
Rinv C D,inv
Rinv C G,inv C G,gate
f _DML
Fig. 2.
follows:
tpd_i
D=
i
= t p0_DML
(2Si + 1) Si+1 + 1
+
+
3Si
2Si
odd_i
Type_A
.
(2Si + 1) Si+1 + 1
+
n/ p
3Si
2Si
even_i
Type_A
(5)
In the following sections, three different solutions to
the delay optimization problem are developed as follows:
1) complete unapproximated solution; 2) complete approximated solution; and 3) partially/SA solution. These solutions
are trading off complexity with accuracy.
C. Complete Unapproximated (CS) Method for the Sizing
Factors of DML Inverter Chain
To solve this problem we differentiate (5) by all Si factors
of the chain and equate to zero, i.e., d D/dsi = 0.
After simplification and substituting , the following
expression can be written for all odd i (6) and all even i (7):
( + 1 + Si+1 ) 1
Si
=
Si1
Si
n/ p
(6)
Si
( + 1 + Si+1 )
=
n/ p .
Si1
Si
(7)
(8)
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
4
N
(2S +1)
Si+1 +1
i
= t p0_DML
+
3Si
2Si
odd _i
Type _ A
(2Si +1) Si+1 +1
+
+
n/ p
. (9)
3Si
2Si
even_i
Type _ B
= t p0
_D M L
2
2 Si+1
Si+1
+
+
+
n/ p
.
3
2Si
3
2Si
odd _i
Type _ A
even_i
Type _ B
(10)
TABLE I
I NVERTER C HAIN S IZING FACTORS Si OF THE C A M ETHOD
S1
S2
n/ p A0.5
S3
S4
n/ p A1.5
S5
SN
S N +1
A2
n/ p A( 2 0.5)
A2
(11)
(12)
CLoad
N
= f N , f = F.
(13)
Cin,g
N
N
S N+1
= A 2 , FDML =
= f DML 2 ,
S1
N
A = f DML = 2 FDML (14)
In CMOS: F =
In DML:
SN+1
S1
(15)
1 n/p
.
2
2
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEVI et al.: LE FOR CMOS-BASED DML GATES
N
FDML ln (FDML )
N
0.5
0.5
=0
+n/ p
+ FDML
n/ p
N
C1
(16)
0.5 ) and the optimal sizing factor
where C1 = (0.5
+
n/ p
n/ p
can be numerically solved using: fDML_opt = exp(1 + C1 /
f DML_opt ), which leads to f DML_opt = 4.65. As developed in Section II, for large FDML , which means Nopt >
Nminimum , Nopt can be approximated by
Nopt
= log fDML_opt (FDML ).
(17)
As presented by Sutherland et al. [1][3], the delay deviation from the minimum delay, achieved under Nopt length
implementation, can be plotted as shown in Fig. 3.
Fig. 3 plots the expression given in
Dopt (Dev N )
C1 Dev + Dev f opt 1/Dev
=
C1 + f opt
Dopt Nopt
(18)
where Dev is the deviation factor from Nopt , the opt lowercase
is for optimal, Dopt (N) is the optimal delay for a given
general N, and Dopt (Nopt ) is the optimal delay with the
optimal N. The graph values are normalized to Dopt(Nopt ). The
sole difference from the CMOS solution is the constant C1 that
affects the graphs slopes.
As can be seen, the DML behavior is similar to the
CMOS solution, the overestimation in N is preferable over
underestimation by means of delay. For example, when N =
Nopt /2, 68.8% deviation in delay is observed. However, when
N = 2Nopt , a decreased deviation of 30.2% is achieved.
E. SA Method for the Sizing Factors of DML Inverter Chain
To compromise between the methods presented in
Sections III-C and III-D, a SA approach is introduced. The SA
approach, which is presented in detail in Appendix II, achieves
Fig. 4. Deviation of sizing factors from the complete solution: (a) CLoad =
50 Cin and (b)CLoad = 5 Cin .
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
6
Fig. 5.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEVI et al.: LE FOR CMOS-BASED DML GATES
TABLE II
G ENERALIZED DML D ELAY E XPRESSIONS FOR T YPE A G ATES ( FOR
TABLE III
P AND LE A PPROXIMATIONS FOR S EVERAL C OMPLEX G ATES
p and L E Approximations
NAND3_B,NOR3_ A
(4Si +1)/Si 4
OAI21_B,AOI21_A
(4Si +2)/Si 4
OAI21_A,AOI21_B
(5Si +2)/Si 5
NAND2_B,NOR2_ A
(3Si +1)/Si 3
(On+off) Path_i
X S_gate_on_path_i
(20)
N
D = t p0
DML
p_DML
2si
+
LE_DML
X S_gate Si +1
2 si
bi
CLoad_on
b_DM L
X S_gate Si +1
( f b)_D M L
EF_DML
(19)
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
8
i s in (26) as follows:
(i )X i+1
Si
Si+1
=
Si1
Si n/ p (i 1)X i
(i )X i+1
Si+1
Si
=
n/ p
Si1
Si
(i 1)X i
(25)
(26)
LEDML_i bDML_i
; X s_gate_i = X i .
X s_gate_i
i1
Si+1 X i+1
for odd i s :
Si+1 X i+1
Fig. 8. Six CMOS inverter chain delay as a function of the normalized load
capacitance for the DML CS method and for Cadence optimizer solution.
EFi
=
Si X i
LEi .bi . n/ p
EFi . n/ p
=
Si X i .
LEi .bi
Fig. 9. Six CMOS inverter chain delay as a function of the normalized load
capacitance for the LE solution and for Cadence optimizer solution.
(28)
N
PEDML ln(PEDML )
N
0.5
0.5
=0
(n/ p + n/ p ) + PEDML
N
C1
(29)
where the optimal sizing factor can be numerically solved
using: EFDML_OPT = e(1+C1/EFDML_OPT ) , which leads to
EFDML_opt = 4.65. Similar to the results presented in
Section II for inverters, for large FDML , Nopt can be approximated by
Nopt
= logEFopt_DML (PE) = logEFopt_DML (F LE B). (30)
Similarly to the simple inverter chain, the CA computation
effort and the final mathematical results for complex gates are
intuitive and handy as for the standard CMOS LE. It can be
shown, that the SA method, which can be also easily derived
for the complex gates, is also very simple and requires use of
an additional lookup table.
VI. DML-LE E VALUATION FOR 40-nm P ROCESS
Here, the proposed methodology is examined by a comparison between results of the LE optimization and that derived by
Cadence Virtuoso optimizer tool. The evaluation is performed
on two different logic networks, implemented in a low power
standard 40-nm technology.
Initially, a simple logic chain, identical to the chain theoretically analyzed in Section IV (N = 6), is examined. The
objective of this test is to compare between the results of the
delay optimization of all proposed LE methods to the results
of the Cadence Virtuoso Optimizer tool for different loads.
Simulated delay (SPICE) of the chain, sized according to the
Fig. 10. Six DML inverter chain delay as a function of the normalized load
capacitance of all the DML-LE solutions and for Cadence optimizer solution.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEVI et al.: LE FOR CMOS-BASED DML GATES
TABLE IV
C OMPLEX G ATES C HAIN S IZING FACTORS (Si ) FOR THE CA M ETHOD
S1
1
Fig. 11.
EF
S2
n/ p
1 X 2
S3
1
EF2
1 2 X 2 X 3
EF3
S4
n/ p
1 2 3 X 2 X 3 X 4
S N +1
1
EF N
1 2 3 . . . N X 2 X 3 . . . X N +1
Fig. 13. DML and CMOS complex network delays, optimized using DML
and CMOS LEs, accordingly. In addition, the optimizer results for both
methods are presented.
VII. C ONCLUSION
Fig. 12.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
10
(31)
hi
(32)
Con_path_i + Coff_path_i
.
Con_path_i
(33)
E Fi
= t p0 ( pi + EFi )
(36)
(37)
(38)
(39)
Rinv
Reqn .
(41)
2
p
(2S1 +1)
3S1
+ n/ p
Di = tp0
N
( pi + EFi ).
N
EFopt = PE = N F
LEi
bi
(43)
where PE is the path effort and F is the Cload to input
capacitance ratio. For a given chain, containing N CMOS
gates, N is not necessarily equal to the optimal number of
stages, Nopt . If N < Nopt , a number of inverters can be added
to better fit the stage effort of the path and therefore to improve
its delay. For N < Nopt , EFopt is given by
EFopt = 3.6 (for = 1).
(44)
Nopt
PE = EFopt .
For this case, Nopt is given by
For N > Nopt , EFopt can be approximated as follows:
N
PE = EFopt .
(45)
EFopt ( , Pinv dependent) may not be feasible in any path,
where N and PE are constrained.
odd_i > 1
(Si+1 +1)
(2Si +1)
+
+
3S1
2Si
Type_A : 3, 5, 7 . . .
%
$
even_i > 2
(2S2 +1)
(S3 +1)
+
+
n/ p (2S3Si +1)
3S2 + 2S2
i
Type_B : 4, 6, 8 . . .
+
$
(S2 +1)
2S1
(42)
hi
D = N Di = t p0_DML
N
A PPENDIX II
= t p0 pi + LEi bi
Cin_gate_i
D = tpd =
+
(Si+1 +1)
2Si
%
(46)
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEVI et al.: LE FOR CMOS-BASED DML GATES
11
TABLE V
I NVERTER C HAIN S IZING FACTORS Si OF THE SA M ETHOD
S1
1
S2
S2 ( A1 )
S3
A0.5
1
n/ p
S2 ( A1 )
S4
S5
A1 S2 ( A1 )
.
(49)
A1 =
S1 1 + 1 + 4( + 1)/A1
Calculation of A1 requires extraction of SN+1 from CLoad
(15). The set of (47) can be solved for any CLoad and any N
using M ATLAB or a similar tool to produce one lookup table
for increased user convenience/automation.
As can be seen, the difference between the SA and CA
methods is the addition of the 4( + 1)n/p term in (48).
To conclude, to utilize the SA method, the sizing factors
should be calculated from the FDML and A1 metrics as follows:
N
(50)
A1 = f DML = 2 FDML
N
2
S N+1
2
FDML =
= f DML
. (51)
S1 1 + 1 + 4( + 1)/A1
To calculate the optimal chain length Nopt , under a given
CLoad , (50) and (51) are substituted in (46) to obtain the delay
D as follows:
D = t0 p_D M L
(2s +1)
(2s2 +1) (s3 +1)
1
(s2 +1)
+
+n/ p
+
3s1
2s1
3s2
2s2
&
'
.
n/ p A10.5
(N 2)
(N 2) 2 3
(1+n/ p ) +
2
+
2 32
2
2
(52)
Consequently, S1 S3 from (48), Table V is substituted
in (52), which is then, using (53), N differentiated and zero
equated as follows:
2 ln S (1+2SAN+1
1
1 +4(1+ ))
N A1 =
.
(53)
ln(A1 )
Finally, we reach (54) that only contain A1 as follows:
(A0.5 + b) 1 + (1 + )
n/ p 1
$
%2
8
2S
(
A
)
2
1
n/ p
.
0.5
1
A1
0.5
A1 + N(A1 ).
&
'
n/ p A10.5
ln(A1 )
+ (1+n/ p )+
=
0
4b.(1+ )
N( A1 )
2
2
2
A1
A (1+b)
1
(54)
A1.5
1
n/ p
S2 ( A1 )
S N 1
N 1
A2 S
2 ( A1 )
SN
N1
A 2
1
n/ p
S2 ( A1 )
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
12