EE577b Register File

By Joong-Seok Moon

Register File

A set of registers that store data
Consists of a small array of static memory cells
Smallest size and fastest access time in the memory hierarchy (Register File, On-chip Cache, Off-chip Cache, Main Memory, Disk)
Frequently used by microprocessors and DSPs
Permits multiple read and write ports

2-read/1-write: scalar microprocessor (e.g. DLX)
8-read/4-write: superscalar microprocessor (often more than that), VLIW
1-read/1-write: DSP data/coefficient memory

Register File Cell Single-ended Read/Write

Single-ended, 2-read/1-write ports (slow write)
Fully static, no precharge required
The NMOS of I1 should be sized larger because node A only reaches Vdd-Vth during a write
I2 should be weak so that N1-N2 can change the data
I3: buffer for the storage node

Register File Cell Single-ended Read/Dual-ended Write


[Figure: cell schematic. Labels: bitWR and its complement, bitRD1, bitRD2, wordRD1, wordRD2, wrEN; storage nodes A and B; inverters I1 and I2; transistors N1 to N6.]

Dual-ended write: either A or B is pulled low

But it is actually a single-ended operation (this is acceptable, since write is usually much faster than read)

Precharge required for read

B=1: discharge bitRD (slow read for large bitline capacitance)
B=0: hold the precharge value

No buffer inside the cell

Sense amplifier or skewed inverter to amplify the slow discharge

Two write bitline drivers

bitWR and its complement

Register File Cell Single-ended Read/Dual-ended Write

Further optimization

Only one write bitline driver

bitWR=1

N4, N6 on: node A pulled down
N5 on: node B pulled up
True dual-ended write

bitWR=0

N5 on: node B pulled down
One transistor on the pull-down path
Single-ended write with enhanced speed

Write Operation

Address Decoder Static


[Figure: static decoder schematic. Labels: address inputs A0, A1, A2, ..., AN-2, AN-1; outputs wordline0, wordline1, ..., wordlineN-1.]

Static N-to-2^N decoder

wordline0 = A0b · A1b · A2b · ... · A(N-1)b, where the suffix b marks a complemented input
More than 32 registers: a multi-level decoder is desirable
Works well with edge-triggered flip-flops for the address inputs
Can the decoder output be connected directly to drive the wordline?

Extremely dangerous. Why? Glitches
Read might be OK, but write can be problematic
Put latches at the decoder output
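
To make the decoding equation and the glitch hazard concrete, here is a minimal behavioral sketch in Python. It is not a circuit model: the function names `decode` and `transient_rows` are hypothetical, and the assumption that address bits arrive one at a time, lowest bit first, is only an illustration of input skew.

```python
def decode(address: int, n_bits: int) -> list[int]:
    """Return the one-hot vector of 2**n_bits wordlines for an address."""
    if not 0 <= address < (1 << n_bits):
        raise ValueError("address out of range")
    wordlines = []
    for row in range(1 << n_bits):
        # Wordline `row` is the AND of all address bits, each taken true or
        # complemented according to the corresponding bit of `row`.
        active = 1
        for bit in range(n_bits):
            a = (address >> bit) & 1
            want = (row >> bit) & 1
            active &= 1 if a == want else 0
        wordlines.append(active)
    return wordlines


def transient_rows(old: int, new: int, n_bits: int) -> list[int]:
    """Rows selected while the address moves from old to new.

    Assumption for illustration: differing bits switch one at a time,
    lowest bit first. The last entry is the intended row; earlier entries
    are rows that an unlatched static decoder would briefly select.
    """
    rows = []
    current = old
    for bit in range(n_bits):
        if ((old ^ new) >> bit) & 1:
            current ^= 1 << bit
            rows.append(current)
    return rows


print(decode(5, 3))             # one-hot vector with only wordline 5 high
print(transient_rows(3, 4, 3))  # intermediate rows visited on the way to row 4
```

Under this skew assumption, moving from address 3 to address 4 briefly selects rows 2 and 0. If a write were enabled during that interval, those rows could be corrupted, which is the motivation for latching the decoder output.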

Address Decoder Dynamic

Dynamic N-to-2^N decoder

Domino N-input AND gate
Charge-sharing problem for large N
A gate keeper may be required
Long NMOS chain for large N
No glitch at the output
Needs qualified address inputs

Two-phase latch
Dynamic flops

Address Decoder Dynamic (Revised)

Revised dynamic N-to-2^N decoder

[Figure: revised dynamic decoder schematic. Labels: Word[N-1], wordEN, inputs A0 to A3, transistor width W/2.]

Make NMOS half size
Reverse input sequence
Same active strength
Charge sharing reduced

Write Driver

Tri-state Buffer

Write operation requires full-swing bitline

Read-Out Circuitry

Small bitline capacitance
Single-ended sensing
May not need a sense amplifier

A skewed buffer is fine for the precharged scheme
Sensing the value only when the bitline goes to 0
Latching the old value (latch and sensing)

Read-Out Circuitry

Complete Static Circuit

Data is sensed by I1

During read

Nl is off
Pf is on only if the input of I1 is at Vdd-Vth (read 1)
Pf charges it back to Vdd
I1 must be sized with a higher beta

After read

RE=0, Nl is on
A latch is formed through I1 and I2

Architectural Consideration
Pipelined processor

Add R1,R2,R3    F D E M W
(instruction 2)   F D E M W
(instruction 3)     F D E M W
Sub R4,R1,R2          F D E M W

In the same cycle, read value just written

DLX assumes a write in the high phase of the clock and a read in the low phase: implicit bypassing
But only half of the clock cycle is available for the read
Explicit bypassing: compare read and write addresses

If same: bypass the write data directly to the read output, either without reading or by discarding the read value
If different: normal read
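
The explicit bypass rule can be sketched as a small behavioral model. This is an illustration only: the class name `RegisterFile`, the method `cycle`, and the one-write-per-cycle interface are assumptions, not part of the original slides.

```python
class RegisterFile:
    """Behavioral model of a register file with explicit write-read bypass."""

    def __init__(self, n_regs: int = 32):
        self.regs = [0] * n_regs

    def cycle(self, read_addrs, write_addr=None, write_data=None):
        """Perform the reads and the optional write of one clock cycle.

        Each read address is compared with the write address. On a match the
        write data is forwarded to the read output; otherwise the stored
        value is read normally.
        """
        outputs = []
        for ra in read_addrs:
            if write_addr is not None and ra == write_addr:
                outputs.append(write_data)     # same address: bypass
            else:
                outputs.append(self.regs[ra])  # different address: normal read
        if write_addr is not None:
            self.regs[write_addr] = write_data
        return outputs


rf = RegisterFile(8)
rf.regs[2] = 7
# Add writes R1 in the same cycle in which Sub reads R1 and R2.
print(rf.cycle(read_addrs=[1, 2], write_addr=1, write_data=42))
```

In this model the read of R1 returns the value being written in the same cycle, while the read of R2 comes from the array, so a full clock cycle remains available for the array read.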

Architectural Consideration

Read caching

Add R1,R2,R3
Sub R4,R1,R2

Compare read addresses
If same, do not read; use the cached value directly
As with write-read bypass, comparators are required
Makes sense only if the comparators consume less power than the register file
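
A minimal sketch of read caching on a single read port follows. The class `CachedReadPort` and its `notify_write` hook are hypothetical; the slides do not say how a cached value is kept consistent with later writes, so updating the cached value on a matching write is an assumption made here.

```python
class CachedReadPort:
    """One read port that skips the array access when the address repeats."""

    def __init__(self, regs):
        self.regs = regs          # shared storage array (a plain list here)
        self.last_addr = None     # address of the most recent array read
        self.last_value = None    # value returned by that read
        self.array_reads = 0      # counts actual array accesses

    def read(self, addr):
        if addr == self.last_addr:
            # Same address as the previous read: reuse the cached value.
            return self.last_value
        self.array_reads += 1
        self.last_addr = addr
        self.last_value = self.regs[addr]
        return self.last_value

    def notify_write(self, addr, data):
        # Assumption: a write to the cached address refreshes the cached copy.
        if addr == self.last_addr:
            self.last_value = data


regs = [0, 10, 20, 30]
port = CachedReadPort(regs)
port.read(2)   # Add reads R2: array access
port.read(2)   # Sub reads R2 again: served from the cached value
print(port.array_reads)
```

The counter `array_reads` stands in for the bitline activity that is saved; whether this is a net gain depends on the comparator power, as noted above.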

Precharge for 0 or 1 value?

In DSP, quantitative studies show that values contain more 0s than 1s
For a precharged register file design:

Value in memory = 0: preserve precharge
Value in memory = 1: discharge the precharged value on the bitlines
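
The effect of the precharge polarity can be illustrated by counting bitline discharges over a set of stored words. The function `count_discharges` is a hypothetical helper, and the sample words are made-up values chosen only to be zero-heavy; they are not measured data.

```python
def count_discharges(words, width, discharge_on=1):
    """Count bitline discharges when every word is read once.

    discharge_on=1 models the mapping above (a stored 1 discharges the
    bitline); discharge_on=0 models the opposite polarity for comparison.
    """
    mask = (1 << width) - 1
    total = 0
    for w in words:
        ones = bin(w & mask).count("1")
        total += ones if discharge_on == 1 else width - ones
    return total


# Made-up, zero-heavy 16-bit sample values (illustrative only).
sample = [0x0001, 0x0003, 0x0000, 0x00F0]
print(count_discharges(sample, 16, discharge_on=1))
print(count_discharges(sample, 16, discharge_on=0))
```

For data dominated by 0s, the mapping in which a stored 0 preserves the precharge gives far fewer discharges, and each discharge must later be paid for by a precharge.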

Some comments

Many designers choose a precharged design over a pure static design

The skewed inverter for the read-out circuit burns a lot of power (slow slew rate, reduced voltage level)
Precharge time and reading time should not overlap, to avoid short-circuit currents
Precharge on -> request read -> precharge off -> ack read -> request precharge -> read off ->
Asynchronous concepts are widely used in register file design
