A Low Power 900 MHZ Register File

A LOW POWER 900 MHz REGISTER FILE (8 PORTS, 32 words x 64 bits) IN 1.8 V, 0.25 µm SOI TECHNOLOGY

R. V. Joshi, W. Hwang, S. Wilson, G. Shahidi, and C. T. Chuang
Abstract: This paper shows full functionality of a low power 900 MHz dynamic register file with 6 read and 2 write ports, 32 wordlines x 64 bitlines. The register file is designed for bulk silicon technology but is fabricated in 0.25 µm silicon-on-insulator (SOI) technology without any body contacts. This paper also proposes a new method, based on internal pico-probe measurements along the critical path, to extract the performance gain of a register file in bulk and SOI technology when direct measurement is limited by the tester speed. Based on the hardware and simulation data, the register file is capable of functioning at 900 MHz for read and write operations in a single cycle. The register file can even function above 1 GHz for read operations. A power reduction of 8-12% is realized for SOI over bulk technology, especially at higher frequencies.
R. V. Joshi, W. Hwang, and C. T. Chuang are with the IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598. S. Wilson and *G. Shahidi are with the IBM Semiconductor Research and Development Center, Hopewell Junction, NY 12533.
I. INTRODUCTION
Multi-port register files play a significant role in the parallel processing of instructions in high speed microprocessors and digital signal processors (DSPs). These applications require low power, high performance, and low cost. In our recent publications we presented the architecture and performance of a multi-port self-resetting CMOS (SRCMOS) register file using "dual rail" input signals [1], [2]. It has been pointed out that race conditions in self-resetting CMOS circuits can cause "pulse collisions", which may result in non-functionality and high power consumption. This problem gets worse when external dual-rail signals are used, and worse still when SOI technology is used. In this paper we have modified the register file so that all external read port addresses are "single rail" signals interlocked with a global "read pointer". Since cycle time measurement is limited by the tester, we propose a method based on internal pico-probing to evaluate the cycle time of this single rail input register file. We also evaluate the performance gain due to junction capacitance reduction and dynamic threshold voltages when the bulk design is mapped into SOI technology, and we demonstrate that SOI reduces power consumption by 10-12% compared to bulk technology. A preliminary architecture of the dual rail input register file is described in a recent publication [1]. Here we concentrate mainly on modifying that architecture to make the register file even more robust by using single rail inputs. We also describe its hardware performance in SOI technology with an effective channel length (Leff) of 0.15 µm.
II. CIRCUITS
Figure 1 shows the forward read path for a read port. There are a total of 5 single rail read addresses, which are internally converted to dual rail using a read pointer (RSTRA), or "global clock" (Fig. 1(a)). This pointer synchronizes the single rail signals, which are in a DC state, and generates internal pulses with the appropriate states for proper functionality, i.e., "active" low and "standby" high. Figure 1(b) shows the forward path of the read port for a 1-bit cross-section using the dual rail signals generated in Fig. 1(a). To read data from the cell, the read port provides several stages: decoder, wordline driver, read mux, and output latches. A simplified multiple-chain reset path resets each node to the standby state once evaluation at that node is complete. Pulse widening of the reset signals is minimized by optimizing the gate dimensions of the inverters. Because the input addresses are synchronized with respect to the read pointer, and a delayed LSB, interlocked with the LSB itself, is used for reset generation (Fig. 2), pulse collisions are significantly reduced in both bulk and SOI technology.
Write port circuitry for ports A and B is shown in Fig. 3. The forward path for a 1-bit cross-section, shown in Fig. 3(a), has the following stages: a receiver for the five single rail addresses, generation of dual rail signals, and decoding of the write wordlines, with priority given to port B when both write enables (WEAIN, WEBIN) are active. The write address timing signal (WATS) synchronizes the timing of the write addresses of ports A and B, which reduces the pulse collisions commonly observed in self-resetting circuits. Fig. 3(b) shows the generation of the write enable and priority signals. To enable writing, a global write enable timing signal (WETS) is provided along with the individual write enables for port A (WEAIN) and port B (WEBIN).
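To make the read-path and write-port selection behavior described above concrete, the following Python sketch is a purely behavioral model under stated assumptions: signal polarity is abstracted to "asserted"/"not asserted" booleans (the actual circuit uses active-low pulses), timing and self-resetting behavior are ignored, and the function names (`to_dual_rail`, `decode`, `write_select`) are hypothetical rather than taken from the paper. It shows how five DC single-rail address bits become true/complement rail pairs only when a pointer such as RSTRA (or WATS on the write side) fires, how those pairs select exactly one of 32 wordlines, and how port B wins when both write enables are active.

```python
from typing import List, Optional, Tuple

N_ADDR_BITS = 5               # five single-rail address inputs
N_WORDS = 1 << N_ADDR_BITS    # 32 wordlines


def to_dual_rail(addr: int, pointer_active: bool) -> List[Tuple[bool, bool]]:
    """Convert a DC single-rail address into (true, complement) rail pairs.

    Both rails of every pair stay deasserted unless the pointer fires, which
    models synchronizing the DC address inputs to RSTRA (read) or WATS (write).
    Polarity is abstracted: True means "asserted", not a logic-high voltage.
    """
    rails = []
    for bit in range(N_ADDR_BITS):
        value = bool((addr >> bit) & 1)
        rails.append((pointer_active and value, pointer_active and not value))
    return rails


def decode(rails: List[Tuple[bool, bool]]) -> List[bool]:
    """One-hot decode: wordline w fires only if each rail pair matches w's bits."""
    return [
        all(true_rail if (w >> bit) & 1 else comp_rail
            for bit, (true_rail, comp_rail) in enumerate(rails))
        for w in range(N_WORDS)
    ]


def write_select(addr_a: int, addr_b: int, we_a: bool, we_b: bool,
                 wats: bool) -> Tuple[Optional[str], List[bool]]:
    """Pick the write port and its write wordlines; port B wins if both enable."""
    if we_b:                     # priority to port B (mirrors the WEBI rule)
        port, addr = "B", addr_b
    elif we_a:
        port, addr = "A", addr_a
    else:
        return None, [False] * N_WORDS
    return port, decode(to_dual_rail(addr, wats))


if __name__ == "__main__":
    rwl = decode(to_dual_rail(13, pointer_active=True))
    print(rwl.index(True), sum(rwl))                              # 13 1
    print(any(decode(to_dual_rail(13, pointer_active=False))))   # False
    port, wwl = write_select(3, 7, we_a=True, we_b=True, wats=True)
    print(port, wwl.index(True))                                  # B 7
```

With the pointer inactive, no wordline fires at all; this is the property that lets a single pointer synchronize otherwise static address inputs and keeps stray pulses off the decoder.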
WEBIN is also used to create a signal, WEBI, which ensures write priority to port B when both WEBIN and WEAIN are active. When either WEBIN or WEAIN alone is active, writing is done through the corresponding port (B or A). The signals WEA and WEB, generated from signals such as WEAIN and WEBIN, are used to decode and create complementary write wordlines. These wordlines trigger the transmission gates that write the data into the cell. This data is multiplexed with the read wordlines to generate the read output (Fig. 3(c)). The reset chain for the write ports, shown in Fig. 3(d), is used to chop or widen the incoming write signals using the "CUT" signal. In addition, it resets the evaluated nodes of the write paths (see the NOR and NAND decoders in Fig. 3(a) and (b)).
III. PROCESS TECHNOLOGY
The technology features are summarized in Table 1. The register file is fabricated in a 0.25 µm technology with an Leff of about 0.15 µm and a gate oxide of 40 nm. Partially depleted SOI devices are used in fabricating the register file. An optical micrograph of a fully functional register file chip, with an area of 1.5 mm x 1.8 mm, is shown in Fig. 4.
IV. HARDWARE AND SIMULATION RESULTS

Fig. 5 shows the timing diagram for critical signals: the read addresses RA(0:4), the read pointer RSTRA, the write addresses WBA(0:4), the write address timing signal WATS, and the write enable signals WEA and WEB for write ports A and B. Changing the timing of the RSTRA and WATS signals changes the timing of the read and write wordlines. If the read wordline precedes the write wordline, a read-before-write (RBW) operation is performed; if the sequence is reversed, a write-before-read (WBR) operation is performed. Because the tester cannot operate at frequencies above 666 MHz, the following method, based on this timing diagram, is adopted to evaluate the cycle time. For a WBR operation to function in one cycle, the WATS signal must first generate the write wordline, and the read wordline must then read the data just written into the cell. The total delay from WATS to the read output therefore sets the minimum cycle time. For an RBW operation, the data written in the previous cycle is read first, and writing is then done in the same cycle. Since the write port is slower than the read port, the minimum cycle time in this case is determined by the sum of how late the write wordline arrives within the cycle and the access time from the read wordline to the output (Fig. 6). Thus, the minimum cycle time for RBW = (delay from WATS to write wordline) + (delay from read wordline to data output). For the RBW operation, the delay from WATS to the write wordline is 740 ps and from the read wordline to the output is 270 ps. The minimum cycle time in this case is 1.1 ns, and the read access time is close to 600 ps. The cycle time for the WBR operation is obtained similarly; Fig. 7 shows the delays across the various paths. The delay from WATS to WWLB is 770 ps and from read wordline RWLA1 to output OA1 is 400 ps. The minimum cycle time for WBR, with a "1" written first and then read in the same cycle, is therefore 1.17 ns (770 ps + 400 ps). Once again, the access time is close to 600 ps.
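The extraction rule above reduces to adding two pico-probed path delays and converting the sum to a frequency. The following Python sketch applies it to the SOI WBR delays quoted above and to the bulk WBR delays (820 ps and 550 ps) given in the next paragraph. The helper names are hypothetical, and the sketch only restates the arithmetic of the method; it does not model the circuit.

```python
def min_cycle_time_ps(wats_to_wwl_ps: float, rwl_to_out_ps: float) -> float:
    """Minimum cycle time = WATS->write-wordline delay + read-wordline->output delay."""
    return wats_to_wwl_ps + rwl_to_out_ps


def max_frequency_mhz(cycle_time_ps: float) -> float:
    """Convert a cycle time in picoseconds to a frequency in MHz (1 us = 1e6 ps)."""
    return 1e6 / cycle_time_ps


if __name__ == "__main__":
    # WBR path delays quoted in the text, in ps: (WATS->WWLB, RWLA1->OA1).
    cases = {"SOI": (770, 400), "bulk": (820, 550)}
    for name, (write_ps, read_ps) in cases.items():
        t = min_cycle_time_ps(write_ps, read_ps)
        print(f"{name:4s} WBR: {t:.0f} ps -> {max_frequency_mhz(t):.0f} MHz")
```

Run as a script, it prints 1170 ps (about 855 MHz) for SOI and 1370 ps (about 730 MHz) for bulk; the bulk figure corresponds to the 730 MHz stated for bulk in the conclusions.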
The analysis for the bulk wafer is done in a similar fashion. The minimum cycle time for the WBR operation is 1.37 ns, the sum of the delays from WATS to WWLB (820 ps) and from RWLA1 to OA1 (550 ps). The access time for this operation is 920 ps (Fig. 8). For the RBW operation, the cycle time for the bulk case is 1.45 ns, with an access time of around 940 ps (Fig. 9). In all these cases it is apparent that the cycle time improvement for SOI over bulk is more than 15% and the access time improvement is more than 25%. This is related to the reduction in junction capacitance and to the variable threshold voltages observed during switching. Simulation data for multiple cycles show trends similar to those observed in the hardware. Proper waveform propagation at internal nodes is also shown in Figs. 10 and 11. For the WBR operation, the simulated minimum cycle time is close to 1.05 ns for SOI and close to 1.3 ns for bulk (Fig. 11); once again the improvement of SOI over bulk is more than 15%. With a single read operation per cycle, the operating frequency exceeds 1.2 GHz (not shown here). This performance improvement is attributed to capacitance reduction and dynamic threshold voltages. Power is plotted against frequency for SOI and bulk wafers in Fig. 12. Owing to the reduction in junction capacitance, SOI shows an 8-12% reduction in power over bulk at higher frequencies. The total worst case power, including all ports, is close to 1.368 W at 1.9 V and 666 MHz for SOI.
V. CONCLUSIONS
Functionality of a dynamic register file with single rail input addresses is demonstrated in 0.25 µm SOI and bulk technology. A method is proposed to evaluate operation at high frequencies that exceed the tester speed limit. Based on the hardware data and internal pico-probing, it is shown that the register file is capable of functioning at around 900 MHz in SOI and 730 MHz in bulk. Simulation data show similar trends in performance. SOI reduces power by 10-12% compared to bulk wafers. The performance gain in SOI is attributed to the reduction in capacitance and to dynamic threshold voltages. The circuit techniques implemented in this dynamic register file can be extended to multi-gigahertz microprocessor design.
REFERENCES
[1] W. H. Henkels, W. Hwang, and R. V. Joshi, "A 500 MHz 32-word x 64-bit 8-port self-resetting CMOS register file and associated dynamic-to-static latch," VLSI Design Symp., 1997.
[2] R. V. Joshi et al., "A 666 MHz self-resetting 8-port, 32x64 bits register file and latch in 0.25 µm SOI technology," Proc. 1998 IEEE Int. SOI Conf., Oct. 1998, pp. 131-132.
