Hong Li, Jianping Hu, and Cheng Zhang Ningbo University Ningbo City, Zhejiang, China
nbhjpgyahoo.com.cn
Abstract: A register file is one of the most power-consuming blocks in a microprocessor because it contains large capacitances on its bit lines, word lines, address lines, and storage-cell array, and is frequently accessed. This paper presents a novel low-power register file that is realized entirely in adiabatic logic. The proposed register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers. An N-type dual transmission gate adiabatic logic (N-DTGAL) is used to drive the read/write bit lines and read word lines, which have large capacitances. A P-type dual transmission gate adiabatic logic (P-DTGAL), complementary to the N-DTGAL, is used to drive the write word lines and to power the storage cells, so that the energy of the storage cells can be well recovered before new values are written. HSPICE simulations indicate that the proposed register file achieves considerable energy savings over similar implementations.
I. INTRODUCTION
Power dissipation has become a critical concern in VLSI circuits as the density and operating speed of CMOS chips increase [1]. Adiabatic computing, which uses AC power supplies to recycle the energy stored on node capacitances, is a particularly attractive approach to reducing power dissipation. Over the past decade, several adiabatic logic families have been proposed that achieve considerable energy savings over conventional CMOS circuits [2-9]. Current adiabatic circuits can be classified into two types: fully adiabatic circuits, which have no non-adiabatic loss, and quasi-adiabatic circuits, which do. Fully adiabatic circuits are much more complex than quasi-adiabatic circuits. For example, the complexity of a 16-bit carry-lookahead adder (CLA) implemented in a fully reversible manner is about 32 times that of a static CMOS CLA [2]. Quasi-adiabatic circuits have a relatively simple architecture and power-clock system. Moreover, quasi-adiabatic logic can operate at a higher frequency, which makes it a promising scheme for practical applications such as multipliers [9] and register files [10]. Several quasi-adiabatic circuits have been reported, such
as the 2N-2N2P logic [3], the clocked CMOS adiabatic logic (CAL) [4], the efficient charge recovery logic (ECRL) [5], and the pass-transistor adiabatic logic with NMOS pull-down configuration (PAL-2N) [6]. Although these circuits consume less power than conventional CMOS, they have non-adiabatic energy loss on their output nodes, and this loss depends strongly on the load capacitance. Some quasi-adiabatic circuits can efficiently recover the charge of the output capacitances by using the bootstrapping technique [7], but the bootstrapping switch must be sufficiently large, and the non-adiabatic loss on internal nodes is not small [8].

Large macro blocks are necessary to realize a complete system such as a microprocessor. One of these blocks is the register file, which is among the most power-consuming blocks in a microprocessor because it contains large capacitances on its bit lines, word lines, storage-cell array, and address lines, and is frequently accessed [11]. In recent years, several adiabatic memories have been reported. In [10, 12, 13], adiabatic circuits such as ECRL, 2N-2N2P, and PAL-2N are used to drive the address lines, bit lines, and word lines. However, these designs have large non-adiabatic energy loss on the large-capacitance nodes. In [14, 15], the charge of the large-capacitance nodes can be well recovered by using the bootstrapping technique, but the storage cells are still powered by a DC supply, so the short-circuit power consumed in the storage cells during write operations is not small because of the gradually rising and falling clocked signals [15]. In [16], the charge of all nodes, including the storage cells, can be well recovered by using P-type and N-type dual transmission gate adiabatic logic circuits, but read and write operations cannot be completed within one clock period, and the control circuits are more complex than in the other implementations. This paper presents an improved version of the adiabatic register file in [16].
The N-type dual transmission gate adiabatic logic (N-DTGAL) is used to drive read/write bit lines and read word lines with large capacitances, while the P-type dual transmission gate adiabatic logic (P-DTGAL) is used to drive write word lines and power the storage cells. The storage cells are also modified for the proposed design.
Read and write operations can be completed within one clock period, and the control circuits are also simplified.
II. ADIABATIC DRIVER
The typical adiabatic circuit, 2N-2N2P, is shown in Fig. 1(a) [3]. Cascaded 2N-2N2P gates are driven by four-phase power-clocks, as shown in Fig. 1(b). A complementary logic (2P-2P2N) also exists, as shown in Fig. 1(b) [16]; its structure and operation are complementary to those of the 2N-2N2P. Cascaded 2N-2N2P and 2P-2P2N gates are driven by the same four-phase power-clocks, as shown in Fig. 1(c). The simulated waveforms for the two circuits are shown in Fig. 1(d). Both circuits have non-adiabatic energy loss on their output nodes, and this loss depends on the load capacitance. Therefore, if they are used to drive the large load capacitances on the bit-lines and word-lines of an SRAM, the non-adiabatic energy loss is large.

[Figure 1. 2N-2N2P and 2P-2P2N circuits with symbols, cascaded gates with four-phase power-clocks, and simulated waveforms.]
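As a rough illustration of the four-phase power-clocks mentioned above, the following sketch generates four sinusoidal clocks. The quarter-period spacing between phases, the 2.5 V swing, and the function name `four_phase_clocks` are assumptions made for illustration; the 100 MHz default matches the simulation frequency reported later in the paper.

```python
import math

def four_phase_clocks(t_ns, freq_mhz=100.0, vdd=2.5):
    """Return the four power-clock voltages (in volts) at time t_ns.

    Assumptions (not from the paper): each phase lags the previous one by a
    quarter period, and every clock swings sinusoidally between 0 and vdd.
    """
    period_ns = 1e3 / freq_mhz  # 100 MHz gives a 10 ns period
    clocks = []
    for k in range(4):
        angle = 2 * math.pi * (t_ns / period_ns) - k * math.pi / 2
        # Raised cosine: 0 V at angle 0, vdd at angle pi.
        clocks.append(0.5 * vdd * (1 - math.cos(angle)))
    return clocks

if __name__ == "__main__":
    for t in (0.0, 2.5, 5.0, 7.5):
        print(t, [round(v, 3) for v in four_phase_clocks(t)])
```

Because each clock ramps gradually instead of switching abruptly, the voltage across a conducting transistor stays small while a node charges or discharges, which is the basis of adiabatic operation.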
To overcome this disadvantage, an N-type dual transmission gate adiabatic logic (N-DTGAL) was presented in [3], as shown in Fig. 2. The power-clock φ charges the output (OUT or its complement) through N1 and P1 (or N1b and P2) under control of the inputs (IN and its complement). The energy of the output nodes is recovered to φ through N1 and P1 (or N2 and P2) under control of the feedback signals (FIN and its complement), which come from the outputs of the next-stage buffer. For the final-stage N-DTGAL gate in a pipelined chain, an additional 2N-2N2P (or ECRL) buffer is used, and its outputs (FIN4 and its complement) control the energy recovery of the final-stage N-DTGAL gate. A P-type DTGAL (P-DTGAL) also exists, as shown in Fig. 3 [16]; its structure and operation are complementary to those of the N-DTGAL. For the final-stage P-DTGAL gate in a pipelined chain, an additional 2P-2P2N buffer is used, and its outputs (FIN4 and its complement) control the energy recovery of the final-stage P-DTGAL gate. Cascaded P-DTGAL and N-DTGAL gates are driven by the same four-phase power-clocks as the 2N-2N2P. The simulated waveforms for the P-DTGAL and N-DTGAL circuits are shown in Fig. 4. N-DTGAL and P-DTGAL have no non-adiabatic loss on their output loads. Although the additional 2N-2N2P (or 2P-2P2N) buffer has a non-adiabatic energy loss of $CV_{TP}^2$ (or $CV_{TN}^2$), this loss is small, because the capacitance $C$, which consists mainly of the gate capacitance of the input transistors in the N-DTGAL and P-DTGAL buffers, is far smaller than the load capacitance $C_L$. Therefore, these drivers are suitable for driving large capacitances in power-efficient designs.
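The argument that the buffer loss is small can be checked with a back-of-the-envelope calculation. The sketch below assumes the residual loss on a node scales as capacitance times the square of a threshold voltage; the capacitance and threshold values are hypothetical and chosen only to show the scaling.

```python
def residual_loss(cap_farads, vth_volts):
    """Approximate non-adiabatic energy left on a node: C * Vth^2 (assumed form)."""
    return cap_farads * vth_volts ** 2

# Hypothetical values for illustration only; none of these come from the paper.
C_LOAD = 1e-12   # 1 pF bit-line or word-line load
C_GATE = 10e-15  # 10 fF input capacitance seen by the feedback buffer
VTH = 0.5        # threshold-voltage magnitude in volts

# A 2N-2N2P gate driving the large load directly loses energy on C_LOAD.
loss_direct = residual_loss(C_LOAD, VTH)
# With a DTGAL driver, only the small feedback buffer node loses energy.
loss_dtgal = residual_loss(C_GATE, VTH)
ratio = loss_direct / loss_dtgal

if __name__ == "__main__":
    print(f"direct drive: {loss_direct:.3e} J")
    print(f"DTGAL buffer: {loss_dtgal:.3e} J")
    print(f"ratio: {ratio:.1f}")
```

Under this assumed model the loss ratio equals the capacitance ratio, so the benefit grows with the size of the driven line.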
[Figure 2. N-type DTGAL. (b) N-type DTGAL adiabatic driver (figure labels: 2N-2N2P buffer, large-size transistors).]
[Figure 3. P-type DTGAL and P-DTGAL adiabatic driver using a 2P-2P2N buffer.]

[Figure 4. Simulated waveforms for the P-DTGAL and N-DTGAL circuits (time axis in ns).]

III. ADIABATIC REGISTER FILE
The adiabatic register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers.

A. Storage Cell
The storage-cell structure is similar to that of a conventional memory cell, as shown in Fig. 5. The supply of the storage cells is connected to ckcell, which is powered by the output of P-DTGAL circuits instead of a fixed DC supply. The two pairs of access transistors are enabled by WWL (write word-line) and RWL (read word-line) for write and read operations, respectively. The write access transistors are two PMOS devices, rather than the NMOS devices used in other implementations. The memory array is composed of a multiplicity of these cells arrayed horizontally and vertically. The WWL, RWL, and ckcell of a row are connected along the horizontal axis, while the read bit-lines (RBL and its complement) and the write bit-lines (WBL and its complement) are connected for all cells in a column.

[Figure 5. Storage cell, with WWL, RWL, WBL, RBL, and ckcell connections.]

B. Address Decoder
Separate address decoders are used for read and write operations, as shown in Fig. 6. The 6-bit address is decoded in two levels. First, the address is split into 3-bit groups that are pre-decoded. The read/write address-decoding signals are then produced by AND gates that combine the two pre-decoder outputs with the read/write enable signals (RE and WE). The two-level address decoders are realized using 2P-2P2N and 2N-2N2P for write and read address decoding, respectively. The write word-lines (WWL) are driven by P-DTGAL, while the read word-lines (RWL) are driven by N-DTGAL.

C. Sense Circuit and Read Driver
The sense amplifier is shown in Fig. 7. Its operation is similar to that of the N-DTGAL circuit. The charge of the read bit-lines is recovered to the power-clock (φ3) through N1 and P1 (or N2 and P2) under control of the outputs of the 2N-2N2P. The write bit-lines (WBL) are driven by N-DTGAL, because they have large capacitance. RBL is charged when the power-clock φ goes high, for correct write-operation timing.
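The two-level address decoding described for the address decoder can be sketched behaviorally as follows. This is a logic-level model only, not the adiabatic circuit; the assignment of the low and high address bits to the two pre-decoders and the function names are assumptions.

```python
def predecode(bits3):
    """First level: one-hot decode a 3-bit value into 8 lines."""
    return [1 if i == bits3 else 0 for i in range(8)]

def decode_word_lines(address, enable):
    """Second level: AND one line from each pre-decoder with the enable signal.

    `enable` stands for RE (read decoder) or WE (write decoder). Which address
    bits feed which pre-decoder is an assumption made for this sketch.
    """
    if not 0 <= address < 64:
        raise ValueError("address must fit in 6 bits")
    low = predecode(address & 0b111)
    high = predecode((address >> 3) & 0b111)
    en = 1 if enable else 0
    # 8 x 8 = 64 word-line selects; index = high * 8 + low.
    return [h & l & en for h in high for l in low]

if __name__ == "__main__":
    lines = decode_word_lines(42, enable=1)
    print(lines.index(1))  # index of the single selected word line
```

Splitting the decode into two 3-bit stages means each final AND gate needs only three inputs (two pre-decoded lines and the enable) instead of seven.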
[Figure 6. Read and write address decoders; inputs A0 to A5, WE, and RE.]

[Figure 8. Timing diagram for the read operation (signals include RE and Address).]
The timing diagram for the read operation is shown in Fig. 8. During T1, read address pre-decoding is processed, and the read enable signal (RE) is prepared. During T2, the read address-decoding signal is produced. During T3, the read word-line (RWL) is selected, and the read bit-line (RBL) either follows the RWL or stays at ground level. During T4, the read data (RD) is output. The read operation can thus be completed in one cycle.
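The read sequence above can be summarized as a small behavioral sketch. The phase descriptions paraphrase the text; the rule that RBL follows RWL for a stored 1 and stays at ground for a stored 0 is an assumption about polarity, and the names are hypothetical.

```python
# Phase-by-phase summary of the read operation described in the text.
READ_SCHEDULE = {
    "T1": "pre-decode read address and prepare RE",
    "T2": "produce read address-decoding signal",
    "T3": "select RWL; RBL follows RWL or stays at ground",
    "T4": "output read data (RD)",
}

def read_bit_line(rwl_level, stored_bit):
    """RBL level during T3.

    Assumed polarity: RBL follows RWL when the cell stores 1 and stays at
    ground when it stores 0.
    """
    return rwl_level if stored_bit else 0.0

if __name__ == "__main__":
    for phase, action in READ_SCHEDULE.items():
        print(phase, "-", action)
    print(read_bit_line(2.5, 1), read_bit_line(2.5, 0))
```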
E.
The write driver and the timing diagram for the write operation are shown in Fig. 9 and Fig. 10, respectively. During T1, write address pre-decoding is processed. During T2, the write address-decoding signal is produced. Also during T2, the cell supply ckcell is discharged by the P-DTGAL circuits, so that the charge stored in the storage cells is recovered before new values are written. During T3, the WWL is selected. During T4, the write operation is completed by the rising WBL and ckcell. The register file can execute one write and one read operation within a period.
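The write sequence can be traced in the same way. This is a minimal sketch of the order of events only; representing a discharged cell as `None` and the function name `write_cell` are illustrative choices, not part of the paper.

```python
# Phase-by-phase summary of the write operation described in the text.
WRITE_SCHEDULE = {
    "T1": ["pre-decode write address"],
    "T2": ["produce write address-decoding signal",
           "discharge cell supply to recover stored charge"],
    "T3": ["select WWL"],
    "T4": ["raise WBL and cell supply to store the new value"],
}

def write_cell(old_value, new_value):
    """Return the cell contents after each phase of one write cycle."""
    trace = []
    value = old_value
    for phase in WRITE_SCHEDULE:
        if phase == "T2":
            # Supply discharged: the old state is recovered and no longer held.
            value = None
        elif phase == "T4":
            # Supply and write bit-line rise together: the new value is latched.
            value = new_value
        trace.append((phase, value))
    return trace

if __name__ == "__main__":
    print(write_cell(0, 1))
```

The key ordering is that the old charge is recovered in T2, before the new value is driven in T4, so the cell is never overwritten while still powered.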
[Figures 9 and 10. Write driver and timing diagram for the write operation; signals include WWL, WD, and WBL over phases T1 to T4.]
Simulations
The register file is simulated using ideal four-phase sinusoidal power-clocks in a 0.25 µm TSMC process. Sinusoidal power-clocks have more practical significance, as they can be easily produced [17]. The simulation waveforms are shown in Fig. 11. To reduce simulation time, the simulations for energy consumption are carried out on a subset that includes an 8x8 cell array, 8-bit word-line drivers, 8-bit write-line drivers, 8 sense amplifiers, 8-bit read output drivers, and an address decoder. Table I shows a breakdown of the energy dissipation of the subset at 100 MHz. The proposed register file exhibits lower dissipation than the ECRL-based one. Compared with [16], the proposed register file exhibits lower dissipation in the address decoder, because the control circuits have been simplified.

[Figure 11. Simulation waveforms of adiabatic register file.]

TABLE I. ENERGY DISSIPATION BREAKDOWN

Function                   Proposed   ECRL-based   Reference [16]
Word-line driver           0.35 pJ    0.71 pJ      0.35 pJ
Write bit-line driver      0.23 pJ    2.1 pJ       0.23 pJ
Sense amp. output driver   0.36 pJ    2.6 pJ       0.36 pJ
Single memory cell         0.06 pJ    0.16 pJ      0.06 pJ
Address decoder            7.5 pJ     1 pJ         12 pJ

IV. CONCLUSION
A low-power register file based entirely on adiabatic logic is presented. The power consumption of the proposed register file is significantly reduced, because the charge of the large node capacitances on the storage cells, bit-lines, and word-lines is well recovered. One read and one write operation can be completed within a clock period.

ACKNOWLEDGMENT
This work is supported by the Zhejiang Science and Technology Project of China (No. 2006C31012), Zhejiang Provincial Natural Science Foundation of China under Grant No. Y104327, and Ningbo Natural Science Foundation (2006A610005).

REFERENCES
[1] J. M. Rabaey, M. Pedram, Low Power Design Methodologies, Kluwer Academic Publishers, Boston, 1996.
[2] J. Lim, D. G. Kim, S. I. Chae, A 16-bit carry-lookahead adder using reversible energy recovery logic for ultra-low-energy systems, IEEE Journal of Solid-State Circuits 34 (6) (1999) 898-903.
[3] A. Kramer, J. S. Denker, B. Flower, J. Moroney, 2nd order adiabatic computation with 2N-2P and 2N-2N2P logic circuits, Proceedings of International Symposium on Low Power Design, Dana Point, April 1995, pp. 191-196.
[4] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, K. W. Current, Clocked CMOS adiabatic logic with integrated single-phase power-clock supply, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 8 (4) (2000) 460-463.
[5] Y. Moon, D. Jeong, An efficient charge recovery logic circuit, IEEE Journal of Solid-State Circuits 31 (4) (1996) 514-521.
[6] F. Liu, K. T. Lau, Pass-transistor adiabatic logic with NMOS pull-down configuration, Electronics Letters 34 (8) (1998) 739-741.
[7] R. C. Chang, P.-C. Hung, I.-H. Wang, Complementary pass-transistor energy recovery logic for low-power applications, IEE Proceedings-Computers and Digital Techniques 149 (4) (2002) 146-151.
[8] J. P. Hu, W. J. Zhang, Y. S. Xie, Complementary pass-transistor adiabatic logic and sequential circuits using three-phase power supply, Proceedings of 47th Midwest Symposium on Circuits and Systems, Hiroshima, July 2004, pp. 201-204.
[9] S. Kim, C. H. Ziesler, M. C. Papaefthymiou, A true single-phase energy-recovery multiplier, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 11 (2) (2003) 194-207.
[10] Y. Moon, D. K. Jeong, A 32 x 32-b adiabatic register file with supply clock generator, IEEE Journal of Solid-State Circuits 33 (5) (1998) 696-701.
[11] J. Montanaro, R. T. Witek, K. Anne, et al., A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor, IEEE Journal of Solid-State Circuits 31 (11) (1996) 1703-1714.
[12] S. Avery, M. Jabri, A three-port adiabatic register file suitable for embedded applications, Proceedings of International Symposium on Low Power Design, Monterey, 1998, pp. 288-292.
[13] K. W. Ng, K. T. Lau, A novel adiabatic register file design, Journal of Circuits, Systems, and Computers 10 (1) (2000) 67-76.
[14] N. Tzartzanis, W. C. Athas, Energy recovery for the design of high-speed, low-power static RAMs, Proceedings of International Symposium on Low Power Design, Monterey, August 1996, pp. 55-60.
[15] J. P. Hu, T. F. Xu, H. Li, A Lower-power register file based on complementary pass-transistor adiabatic logic, IEICE Transactions on Information and Systems E88-D (7) (2005) 1479-1485.
[16] J. P. Hu, H. Li, H. Y. Domg, A low-power adiabatic register file with two types of energy-efficient line drivers, Proceedings of 48th Midwest Symposium on Circuits and Systems, August 7-10, 2005.
[17] D. Maksimovic, V. G. Oklobdzija, Integrated power clock generators for low energy logic, Proceedings of IEEE Power Electronics Specialists Conference, Atlanta, June 1995, pp. 61-67.