
Improved Design of Low-Power Register File Using P-type Adiabatic Line Drivers

Hong Li, Jianping Hu, and Cheng Zhang Ningbo University Ningbo City, Zhejiang, China

nbhjpgyahoo.com.cn

Abstract: A register file is one of the most power-consuming blocks in microprocessors because it contains large capacitances on bit lines, word lines, address lines, and the storage-cell array, and is frequently accessed. This paper presents a novel low-power register file that is realized entirely in adiabatic logic. The proposed register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers. An N-type dual transmission gate adiabatic logic (N-DTGAL) is used to drive read/write bit lines and read word lines with large capacitances. A P-type dual transmission gate adiabatic logic (P-DTGAL), complementary to the N-DTGAL, is used to drive write word lines and to power the storage cells, so that the energy of the storage cells can be well recovered before new values are written. HSPICE simulations indicate that the proposed register file achieves considerable energy savings over similar implementations.
I. INTRODUCTION

Power dissipation has become a critical concern in VLSI circuits as the density and operating speed of CMOS chips increase [1]. Adiabatic computing, which uses AC power supplies to recycle the energy of node capacitances, is a particularly attractive approach to reducing power dissipation. Over the past decade, several adiabatic logic families have been proposed that achieve considerable energy savings over conventional CMOS circuits [2-9]. Current adiabatic circuits can be classified into two types: fully adiabatic circuits, which have no non-adiabatic loss, and quasi-adiabatic circuits, which do. Fully adiabatic circuits are much more complex than quasi-adiabatic ones. For example, the complexity of a 16-bit carry-lookahead adder (CLA) implemented in a fully reversible manner is about 32 times that of a static CMOS CLA [2]. Quasi-adiabatic circuits have a relatively simple architecture and power-clock system. Moreover, quasi-adiabatic logic can operate at a higher frequency, so it is a promising scheme for practical applications such as multipliers [9] and register files [10].

Several quasi-adiabatic circuits have been reported, such as the 2N-2N2P logic [3], the clocked CMOS adiabatic logic (CAL) [4], the efficient charge recovery logic (ECRL) [5], and the pass-transistor adiabatic logic with NMOS pull-down configuration (PAL-2N) [6]. Although these circuits consume less power than conventional CMOS, they have non-adiabatic energy loss on their output nodes, and this loss depends strongly on the load capacitance. Some quasi-adiabatic circuits can efficiently recover the charge of output capacitances by using the bootstrapping technique [7], but the bootstrapping switch must be sufficiently large and the non-adiabatic loss of the internal nodes is not small [8].

Large macro blocks are necessary to realize a complete system such as a microprocessor. One of these blocks is the register file. A register file is one of the most power-consuming blocks in microprocessors because it contains large capacitances on bit lines, word lines, the storage-cell array, and address lines, and is frequently accessed [11]. In recent years, several adiabatic memories have been reported. In [10, 12, 13], adiabatic circuits such as ECRL, 2N-2N2P, and PAL-2N are used to drive address lines, bit lines, and word lines. However, these designs have large non-adiabatic energy loss on the large-capacitance nodes. In [14, 15], the charge of nodes with large capacitance can be well recovered by using the bootstrapping technique, but the storage cells are still powered by a DC supply, so the short-circuit power consumption in the storage cells during write operations is not small because of the gradually rising and falling clocked signals [15]. In [16], the charge of all nodes, including the storage cells, can be well recovered by using P-type and N-type dual transmission gate adiabatic logic circuits, but read and write operations cannot be completed within one clock period and the control circuits are more complex than in the other implementations.

This paper presents an improved design of the adiabatic register file in [16]. The N-type dual transmission gate adiabatic logic (N-DTGAL) is used to drive read/write bit lines and read word lines with large capacitances, while the P-type dual transmission gate adiabatic logic (P-DTGAL) is used to drive write word lines and to power the storage cells. The storage cells are also modified for the proposed design.

1-4244-0173-9/06/$20.00 2006 IEEE.


Read and write operations can be completed within one clock period, and the control circuits are also simplified.
II. ADIABATIC DRIVER

The typical adiabatic circuit 2N-2N2P is shown in Fig. 1(a) [3]. Cascaded 2N-2N2P gates are driven by four-phase power-clocks, as shown in Fig. 1(c). A complementary logic (2P-2P2N) also exists, as shown in Fig. 1(b) [16]. Its structure and operation are complementary to those of the 2N-2N2P. Cascaded 2N-2N2P and 2P-2P2N gates are driven by the same four-phase power-clocks, as shown in Fig. 1(c). The simulated waveforms for the two circuits are shown in Fig. 1(d). Both circuits have non-adiabatic energy loss on their output nodes, and this loss depends on the load capacitance. Therefore, if they are used to drive the large load capacitances on the bit-lines and word-lines of an SRAM, the non-adiabatic energy loss is large.
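As background for the loss terms discussed in this section, the following minimal sketch compares abrupt charging from a DC supply with gradual charging from a ramped power-clock. It uses the standard first-order expressions from the adiabatic-logic literature, which the paper itself does not state; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def step_charging_loss(c_load, v_dd):
    """Energy (J) dissipated when c_load is charged abruptly from a DC supply."""
    return 0.5 * c_load * v_dd ** 2


def ramp_charging_loss(r_on, c_load, v_dd, t_ramp):
    """First-order energy (J) dissipated when c_load is charged through a
    switch resistance r_on by a linear ramp lasting t_ramp seconds.
    The approximation only holds for t_ramp much larger than r_on * c_load."""
    return (r_on * c_load / t_ramp) * c_load * v_dd ** 2


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    c_load = 200e-15   # 200 fF line capacitance
    v_dd = 2.5         # supply / power-clock amplitude in volts
    r_on = 500.0       # switch on-resistance in ohms
    t_ramp = 2.5e-9    # ramp time in seconds

    print("step charging loss (J):", step_charging_loss(c_load, v_dd))
    print("ramp charging loss (J):", ramp_charging_loss(r_on, c_load, v_dd, t_ramp))
```

Under these assumptions the ramp loss falls as the ramp time grows, which is the basic reason a slowly varying power-clock can recycle energy that a DC supply would dissipate.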

(a) 2N-2N2P and symbol

To overcome this disadvantage, an N-type dual transmission gate adiabatic logic (N-DTGAL) was presented in [3], as shown in Fig. 2. The power-clock φ charges the output (OUT or its complement) through N1 and P1 (or N1b and P2) under control of the inputs (IN and its complement). The energy of the output nodes is recovered to φ through N1 and P1 (or N2 and P2) under control of the feedback signals (FIN and its complement), which come from the outputs of the next-stage buffer. For the final-stage N-DTGAL gate in a pipelined chain, an additional 2N-2N2P (or ECRL) buffer is used, and its outputs (FIN4 and its complement) control the energy recovery of the final-stage N-DTGAL gate.

A P-type DTGAL also exists, as shown in Fig. 3 [16]; its structure and operation are complementary to those of the N-DTGAL. For the final-stage P-DTGAL gate in a pipelined chain, an additional 2P-2P2N buffer is used, and its outputs (FIN4 and its complement) control the energy recovery of the final-stage P-DTGAL gate. Cascaded P-DTGAL and N-DTGAL gates are driven by the same four-phase power-clocks as the 2N-2N2P. The simulated waveforms for the P-DTGAL and N-DTGAL circuits are shown in Fig. 4. N-DTGAL and P-DTGAL have no non-adiabatic loss on their output loads. Although the additional 2N-2N2P (or 2P-2P2N) buffer has a non-adiabatic energy loss of $CV_{TP}^2$ (or $CV_{TN}^2$), this loss is small because the capacitance C, which consists mainly of the gate capacitance of the input transistors in the N-DTGAL and P-DTGAL buffers, is far smaller than the load capacitance CL. These drivers are therefore suitable for driving large capacitances in power-efficient designs.
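The capacitance argument above can be illustrated with a small sketch. It assumes the residual non-adiabatic loss is proportional to the capacitance of the node where it occurs times the square of a threshold voltage; the prefactor, the function names, and all numerical values are hypothetical and are not taken from the paper.

```python
def threshold_loss(capacitance, v_threshold, prefactor=1.0):
    """Residual non-adiabatic loss (J), modelled as prefactor * C * Vt^2.
    The prefactor is an assumption; it cancels in the ratio below."""
    return prefactor * capacitance * v_threshold ** 2


def loss_ratio(c_internal, c_load, v_threshold):
    """Residual loss on the small internal node C divided by the loss that
    would occur if the same residual appeared directly on the load CL."""
    return (threshold_loss(c_internal, v_threshold)
            / threshold_loss(c_load, v_threshold))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    c_internal = 5e-15   # gate capacitance seen by the small buffer (5 fF)
    c_load = 500e-15     # bit-line or word-line load (500 fF)
    v_threshold = 0.5    # threshold voltage magnitude in volts

    print("loss with residual on CL (J):", threshold_loss(c_load, v_threshold))
    print("loss with residual on C  (J):", threshold_loss(c_internal, v_threshold))
    print("ratio C / CL:", loss_ratio(c_internal, c_load, v_threshold))
```

Because the threshold voltage and prefactor cancel, the ratio reduces to C / CL, so moving the residual loss from the line to the small buffer input scales it down by the same factor as the capacitance.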
(b) 2P-2P2N and symbol; (c) buffer chain and power clock; (d) simulated waveforms (time axis in ns).
Figure 1. 2N-2N2P and 2P-2P2N buffers.

(a) Schematic and symbol; (b) N-type DTGAL adiabatic driver (N-DTGAL buffer using large-size transistors, with a 2N-2N2P buffer).
Figure 2. N-type DTGAL buffer.

(a) Schematic and symbol; (b) P-type DTGAL adiabatic driver (P-DTGAL buffer using large-size transistors, with a 2P-2P2N buffer).
Figure 3. P-type DTGAL buffer.

Figure 4. Simulated waveforms for N-DTGAL and P-DTGAL adiabatic drivers (time axis in ns).

III. ADIABATIC REGISTER FILE

The adiabatic register file consists of a storage-cell array, address decoders, read/write control circuits, sense amplifiers, and read/write drivers.

A. Storage Cell

The storage-cell structure is similar to that of a conventional memory cell, as shown in Fig. 5. The supply of the storage cells is connected to ckcell, which is powered by the output of P-DTGAL circuits instead of a fixed DC supply. The two pairs of access transistors are enabled by WWL (write word-line) and RWL (read word-line) for write and read operations, respectively. The write access transistors are PMOS devices rather than the NMOS devices used in other implementations. The memory array is composed of many of these cells arranged in rows and columns. The WWL, RWL, and ckcell of a row run along the horizontal axis, while the read bit-lines (RBL and its complement) and the write bit-lines (WBL and its complement) are shared by all cells in a column.

Figure 5. Storage cell.

B. Address Decoder

Separate address decoders are used for read and write operations, as shown in Fig. 6. Decoding of the 6-bit address is split into two levels. First, the 3-bit addresses are pre-decoded. The read/write address-decoding signals are then produced by AND gates that combine the two outputs of the pre-decoding with the read/write enable signals (RE and WE). The two-level address decoders are realized using 2P-2P2N for write address decoding and 2N-2N2P for read address decoding. The write word-lines (WWL) are driven by P-DTGAL, while the read word-lines (RWL) are driven by N-DTGAL.

C. Sense Circuit and Read Driver

The sense amplifier is shown in Fig. 7. Its operation is similar to that of the N-DTGAL circuit. The charge of the read bit-lines is recovered to the power-clock (φ3) through N1 and P1 (or N2 and P2) under control of the outputs of the 2N-2N2P circuit. The write bit-lines (WBL) are driven by N-DTGAL because they have large capacitance. RBL is charged when φ goes high for correct write operation timing.

Figure 6. Address decoder.
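The two-level decoding described in the Address Decoder subsection can be sketched behaviourally as follows. This is only a logic-level illustration, not the adiabatic circuit itself; the function names, the grouping of address bits into a low and a high 3-bit field, and the ordering of the output lines are assumptions made for the example.

```python
def predecode3(value):
    """First level: one-hot pre-decode of a 3-bit value into 8 lines."""
    if not 0 <= value < 8:
        raise ValueError("expected a 3-bit value")
    return [int(i == value) for i in range(8)]


def decode_word_lines(address, enable):
    """Second level: AND one line from each pre-decoder with the enable
    signal (RE for the read decoder, WE for the write decoder)."""
    if not 0 <= address < 64:
        raise ValueError("expected a 6-bit address")
    low = predecode3(address & 0b111)          # assumed grouping: A0..A2
    high = predecode3((address >> 3) & 0b111)  # assumed grouping: A3..A5
    en = int(bool(enable))
    # Line index is high * 8 + low, so it equals the address value.
    return [high[h] & low[l] & en for h in range(8) for l in range(8)]


if __name__ == "__main__":
    lines = decode_word_lines(0b101011, enable=True)
    print("selected word line:", lines.index(1))
    print("lines active when disabled:", sum(decode_word_lines(0b101011, False)))
```

Splitting the decode this way needs two 3-to-8 pre-decoders and one AND per word line, instead of a single wide gate per word line.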


Figure 7. Sense circuit and read driver.

Figure 8. Read operation timing (signals: RE, address, read address pre-decoding, read address decoding, RWL, RBL, RD).

D. Read Operation Timing

The timing diagram for the read operation is shown in Fig. 8. During T1, read address pre-decoding is performed and RE (the read enable signal) is prepared. During T2, the read address-decoding signal is produced. During T3, the RWL (read word line) is selected and the RBL (read bit line) either follows the RWL or stays at ground level. During T4, the RD (read data) is produced. The read operation can thus be completed in one cycle.
E. Write Driver and Write Operation Timing

The write driver and the timing diagram for the write operation are shown in Fig. 9 and Fig. 10, respectively. During T1, write address pre-decoding is performed. During T2, the write address-decoding signal is produced; also during T2, ckcell is discharged by the P-DTGAL circuits, so that the charge stored in the storage cells is recovered before new values are written. During T3, the WWL is selected. During T4, the write operation is completed by raising WBL and ckcell. The register file can execute one write and one read operation within one clock period.

Figure 9. Write driver.
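The read and write sequences described in the timing subsections can be laid side by side in a small sketch. It assumes that T1 to T4 are four equal slots of one clock period, which the text does not state explicitly; the names and the equal-slot duration are hypothetical and serve only to show that both operations fit in the same period.

```python
PHASES = ("T1", "T2", "T3", "T4")

# Steps as described in the text, keyed by phase.
READ_SCHEDULE = {
    "T1": ["read address pre-decoding", "RE prepared"],
    "T2": ["read address-decoding signal produced"],
    "T3": ["RWL selected", "RBL follows RWL or stays at ground"],
    "T4": ["RD produced"],
}
WRITE_SCHEDULE = {
    "T1": ["write address pre-decoding"],
    "T2": ["write address-decoding signal produced",
           "ckcell discharged by P-DTGAL (cell charge recovered)"],
    "T3": ["WWL selected"],
    "T4": ["WBL and ckcell rise, write completes"],
}


def fits_in_one_period(schedule):
    """True if every step is assigned to one of the four phases."""
    return set(schedule) <= set(PHASES)


def combined_schedule(read, write):
    """Pair the read and write steps phase by phase."""
    return [(p, read.get(p, []), write.get(p, [])) for p in PHASES]


def phase_duration(clock_hz, n_phases=4):
    """Duration of one phase in seconds, assuming equal-length phases."""
    return 1.0 / (clock_hz * n_phases)


if __name__ == "__main__":
    for phase, rd, wr in combined_schedule(READ_SCHEDULE, WRITE_SCHEDULE):
        print(phase, "| read:", "; ".join(rd), "| write:", "; ".join(wr))
    print("phase duration at 100 MHz (s):", phase_duration(100e6))
```

Listing the two schedules per phase makes it easy to see that the cell-charge recovery in T2 overlaps with address decoding rather than adding an extra phase.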

Figure 10. Write operation timing (signals shown include WWL, WD, and WBL over phases T1 to T4).

F. Simulations

The register file is simulated using ideal four-phase sinusoidal power-clocks in a 0.25 µm TSMC process. Sinusoidal power-clocks are of more practical significance because they can be generated easily [17]. The simulation waveforms are shown in Fig. 11. To reduce simulation time, the energy-consumption simulations are carried out on a subset that includes an 8x8 cell array, 8-bit word-line drivers, 8-bit write-line drivers, 8 sense amplifiers, 8-bit read output drivers, and the address decoder. Table I shows a breakdown of the energy dissipation of the subset at 100 MHz. The proposed register file exhibits lower dissipation than the ECRL-based one. Compared with [16], the register file exhibits lower dissipation in the address decoder, because the control circuits have been simplified.

Figure 11. Simulation waveforms of adiabatic register file.

TABLE I. ENERGY DISSIPATION BREAKDOWN

Function                   ECRL-based   Proposed   Reference [16]
Word-line driver           0.35 pJ      0.71 pJ    0.35 pJ
Write bit-line driver      0.23 pJ      2.1 pJ     0.23 pJ
Sense amp. output driver   0.36 pJ      2.6 pJ     0.36 pJ
Single memory cell         0.06 pJ      0.16 pJ    0.06 pJ
Address decoder            7.5 pJ       1 pJ       12 pJ

IV. CONCLUSION

A low-power register file based entirely on adiabatic logic is presented. The power consumption of the proposed register file is significantly reduced because the charge on the large node capacitances of the storage cells, bit-lines, and word-lines is well recovered. One read and one write operation can be completed within one clock period.

ACKNOWLEDGMENT

This work is supported by the Zhejiang Science and Technology Project of China (No. 2006C31012), Zhejiang Provincial Natural Science Foundation of China under Grant No. Y104327, and Ningbo Natural Science Foundation (2006A610005).

REFERENCES

[1] J. M. Rabaey, M. Pedram, Low Power Design Methodologies, Kluwer Academic Publishers, Boston, 1996.
[2] J. Lim, D. G. Kim, S. I. Chae, A 16-bit carry-lookahead adder using reversible energy recovery logic for ultra-low-energy systems, IEEE Journal of Solid-State Circuits 34 (6) (1999) 898-903.
[3] A. Kramer, J. S. Denker, B. Flower, J. Moroney, 2nd order adiabatic computation with 2N-2P and 2N-2N2P logic circuits, Proceedings of International Symposium on Low Power Design, Dana Point, April 1995, pp. 191-196.
[4] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, K. W. Current, Clocked CMOS adiabatic logic with integrated single-phase power-clock supply, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 8 (4) (2000) 460-463.
[5] Y. Moon, D. Jeong, An efficient charge recovery logic circuit, IEEE Journal of Solid-State Circuits 31 (4) (1996) 514-521.
[6] F. Liu, K. T. Lau, Pass-transistor adiabatic logic with NMOS pull-down configuration, Electronics Letters 34 (8) (1998) 739-741.
[7] R. C. Chang, P.-C. Hung, I.-H. Wang, Complementary pass-transistor energy recovery logic for low-power applications, IEE Proceedings-Computers and Digital Techniques 149 (4) (2002) 146-151.
[8] J. P. Hu, W. J. Zhang, Y. S. Xie, Complementary pass-transistor adiabatic logic and sequential circuits using three-phase power supply, Proceedings of 47th Midwest Symposium on Circuits and Systems, Hiroshima, July 2004, pp. 201-204.
[9] S. Kim, C. H. Ziesler, M. C. Papaefthymiou, A true single-phase energy-recovery multiplier, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 11 (2) (2003) 194-207.
[10] Y. Moon, D. K. Jeong, A 32 x 32-b adiabatic register file with supply clock generator, IEEE Journal of Solid-State Circuits 33 (5) (1998) 696-701.
[11] J. Montanaro, R. T. Witek, K. Anne, et al., A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor, IEEE Journal of Solid-State Circuits 31 (11) (1996) 1703-1714.
[12] S. Avery, M. Jabri, A three-port adiabatic register file suitable for embedded applications, Proceedings of International Symposium on Low Power Design, Monterey, 1998, pp. 288-292.
[13] K. W. Ng, K. T. Lau, A novel adiabatic register file design, Journal of Circuits, Systems, and Computers 10 (1) (2000) 67-76.
[14] N. Tzartzanis, W. C. Athas, Energy recovery for the design of high-speed, low-power static RAMs, Proceedings of International Symposium on Low Power Design, Monterey, August 1996, pp. 55-60.
[15] J. P. Hu, T. F. Xu, H. Li, A Lower-power register file based on complementary pass-transistor adiabatic logic, IEICE Transactions on Informations and Systems E88-D (7) (2005) 1479-1485.
[16] J. P. Hu, H. Li, H. Y. Domg, A low-power adiabatic register file with two types of energy-efficient line drivers, Proceedings of 48th Midwest Symposium on Circuits and Systems, August 7-10, 2005.
[17] D. Maksimovic, V. G. Oklobdzija, Integrated power clock generators for low energy logic, Proceedings of IEEE Power Electronics Specialists Conference, Atlanta, June 1995, pp. 61-67.
