Memory-Block Basics
• Uses:
Whenever a large collection of state elements is required.
– data & program storage
– general purpose registers
– data buffering
– table lookups
– CL implementation
M X N memory:
Depth = M, Width = N.
M words of memory, each word N bits wide.
Address is log2(M) bits wide.
• Basic Types:
– RAM - random access memory
• Non-volatile:
– Read Only Memory (ROM):
• Mask ROM "mask programmable"
• EPROM "erasable programmable"
• EEPROM "electrically erasable programmable"
• FLASH memory - similar to EEPROM with programmer integrated on chip
[Figure: SRAM cell array; each row of cells shares a word line, each column a pair of bit lines.]
Word line selects this cell, and all others in a row.
For write operation, column bit lines are driven differentially (0 on one, 1 on the other). Values overwrite cell state.
For read operation, column bit lines are equalized (set to same voltage), then released. Cell pulls down one bit line or the other.
Spring 2013 EECS150 - Lec11-sram Page 7
Cascading Memory-Blocks
How to make larger memory blocks out of smaller ones.
Increasing the depth. Example: given 1Kx8, want 2Kx8
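A minimal sketch of the depth increase, assuming synchronous-read 1Kx8 blocks. The module names, port names, and the `en` input are illustrative and not taken from the lecture. The extra address bit enables one block or the other, and a registered copy of that bit steers the output mux so it lines up with the one-cycle read latency.

```verilog
// Hypothetical 1Kx8 building block: synchronous write, synchronous read.
module ram_1kx8 (
    input            clk,
    input            en,      // block enable (acts as a chip select)
    input            we,      // write enable
    input      [9:0] addr,
    input      [7:0] din,
    output reg [7:0] dout
);
    reg [7:0] mem [0:1023];

    always @(posedge clk)
        if (en) begin
            if (we) mem[addr] <= din;
            dout <= mem[addr];        // returns the old word on a write
        end
endmodule

// 2Kx8 built from two 1Kx8 blocks: addr[10] picks the block.
module ram_2kx8 (
    input         clk,
    input         we,
    input  [10:0] addr,
    input  [7:0]  din,
    output [7:0]  dout
);
    wire       sel_hi = addr[10];     // 0: lower 1K, 1: upper 1K
    wire [7:0] dout_lo, dout_hi;
    reg        sel_hi_q;              // select bit delayed to match read latency

    always @(posedge clk)
        sel_hi_q <= sel_hi;

    ram_1kx8 lo (.clk(clk), .en(~sel_hi), .we(we),
                 .addr(addr[9:0]), .din(din), .dout(dout_lo));
    ram_1kx8 hi (.clk(clk), .en(sel_hi),  .we(we),
                 .addr(addr[9:0]), .din(din), .dout(dout_hi));

    assign dout = sel_hi_q ? dout_hi : dout_lo;
endmodule
```

With asynchronous-read blocks the `sel_hi_q` register would not be needed, and `addr[10]` could drive the output mux directly.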
Virtex-5 LX110T memory blocks.
Distributed RAM using LUTs.
A SLICEM 6-LUT ...
[Figure: SLICEM 6-LUT with its memory ports annotated.]
• Normal 6-LUT inputs (A1-A6).
• Normal 5/6-LUT outputs (O5, O6).
• Memory data input (DI1, DI2).
• Memory write address (WA1-WA8).
• Control output for chaining LUTs to make larger memories (MC31).
Synchronous write / asynchronous read.
A 1.1 Mb distributed RAM can be made if all SLICEMs of an LX110T are used as RAM.
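The synchronous-write / asynchronous-read behavior of a single 64-bit LUT RAM can be sketched behaviorally as below. The module and port names are assumptions chosen for illustration. Synthesis tools commonly map this coding pattern to distributed RAM, but the mapping is tool-dependent.

```verilog
// Behavioral model of one 64 x 1 LUT RAM (illustrative names).
module lutram_64x1 (
    input        wclk,   // write clock
    input        we,     // write enable
    input  [5:0] a,      // address (shared by read and write here)
    input        d,      // memory data input
    output       o       // read data
);
    reg mem [0:63];      // 64 one-bit cells, the capacity of a 6-LUT

    // Synchronous write: the cell changes only on a clock edge.
    always @(posedge wclk)
        if (we) mem[a] <= d;

    // Asynchronous read: output follows the address combinationally.
    assign o = mem[a];
endmodule
```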
SLICEL vs SLICEM ...
[Figure: SLICEL and SLICEM slice diagrams from the CLB Overview (UG190_5_03_041006).]
• SLICEL: each 6-LUT can be used as a LUT or ROM.
• SLICEM: each 6-LUT can also be used as RAM (SPRAM64/32, DPRAM64/32) or as a shift register (SRL16, SRL32).
Example configuration: RAM256X1S
Single-port 256b x 1, registered output.
[Figure: RAM256X1S built from SPRAM64 LUTs. Data input D drives DI1; A[7:0] supplies the LUT address A[6:1] and the write address WA[8:1]; WCLK and WE are shared; the O6 outputs are combined through F7AMUX (UG190_5_14_050506).]
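A behavioral sketch of this configuration, assuming four 64-bit banks selected by the upper two address bits. It loosely mirrors the figure and is not the Xilinx primitive itself; the module name, port names, and bank arrangement are illustrative.

```verilog
// 256 x 1 single-port RAM with registered output, modeled as
// four 64 x 1 banks plus two levels of 2:1 output muxes.
module ram256x1_reg (
    input        wclk,
    input        we,
    input  [7:0] a,
    input        d,
    output reg   q
);
    reg bank0 [0:63];
    reg bank1 [0:63];
    reg bank2 [0:63];
    reg bank3 [0:63];

    // Synchronous write: a[7:6] picks the bank, a[5:0] the cell.
    always @(posedge wclk)
        if (we)
            case (a[7:6])
                2'd0: bank0[a[5:0]] <= d;
                2'd1: bank1[a[5:0]] <= d;
                2'd2: bank2[a[5:0]] <= d;
                2'd3: bank3[a[5:0]] <= d;
            endcase

    // Asynchronous read of all four banks at the same cell address.
    wire o0 = bank0[a[5:0]];
    wire o1 = bank1[a[5:0]];
    wire o2 = bank2[a[5:0]];
    wire o3 = bank3[a[5:0]];

    // First mux level selects within a pair, second level between pairs.
    wire o_lo = a[6] ? o1 : o0;
    wire o_hi = a[6] ? o3 : o2;
    wire o    = a[7] ? o_hi : o_lo;

    // Output register makes the read data available one cycle later.
    always @(posedge wclk)
        q <= o;
endmodule
```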
endmodule

// "data.dat" contains initial RAM contents; it gets put into the bitfile
// and loaded at configuration time. (Remake bits to change contents.)
initial
begin
  $readmemb("data.dat", mem);
end

always @(posedge CLK)
  read_addr <= ADDR;
endmodule

assign q0 = mem[reg_waddr0];
assign q1 = mem[reg_waddr1];
endmodule
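Since only the tails of these modules survive, the following is a self-contained sketch of how the `$readmemb` fragment might sit inside a complete module. It reuses the names `CLK`, `ADDR`, `read_addr`, and `mem` from the fragment. The module name, the `DATA` port, and both width parameters are assumptions, so this is not the lecture's original code.

```verilog
// ROM with registered address, initialized from a file.
module rom_sync #(
    parameter AWIDTH = 8,             // assumed address width
    parameter WIDTH  = 8              // assumed word width
) (
    input               CLK,
    input  [AWIDTH-1:0] ADDR,
    output [WIDTH-1:0]  DATA
);
    reg [WIDTH-1:0]  mem [0:(1<<AWIDTH)-1];
    reg [AWIDTH-1:0] read_addr;

    // One binary word per line in data.dat; loaded at time zero in
    // simulation and folded into the bitfile by synthesis tools that
    // support initial blocks for memories.
    initial
    begin
        $readmemb("data.dat", mem);
    end

    // Registering the address gives a synchronous read.
    always @(posedge CLK)
        read_addr <= ADDR;

    assign DATA = mem[read_addr];
endmodule
```

Simulating this requires a `data.dat` file in the simulator's working directory with at most 2^AWIDTH lines of WIDTH binary digits each.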
Processor Design Considerations (1/2)
• Register File: Consider distributed RAM (LUT RAM)
– Size is close to what is needed: distributed RAM primitive
configurations are 32 or 64 bits deep. Extra width is easily
achieved by parallel arrangements.
– LUT-RAM configurations offer multi-porting options - useful for
register files.
– Asynchronous read may be useful, giving flexibility on where to place the register read in the pipeline.
• Instruction / Data Caches : Consider Block RAM
– Higher density, lower cost for large number of bits
– A single 36kbit Block RAM implements 1K 32-bit words.
– Configuration-stream-based initialization permits a simple “bootstrap” procedure.
• Other Memories? FIFOs? Video “Frame Buffer”? How big?
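As an illustration of the register-file point above, here is a minimal sketch with two asynchronous read ports and one synchronous write port, written in the style that tools commonly map to LUT RAM. The 32 x 32 size, module name, and port names are assumptions. Hardwiring register 0 to zero is ISA-specific and is left out.

```verilog
// Register file: 2 asynchronous read ports, 1 synchronous write port.
module regfile_2r1w #(
    parameter AWIDTH = 5,             // assumed: 32 registers
    parameter WIDTH  = 32             // assumed: 32-bit registers
) (
    input               clk,
    input               we,           // write enable
    input  [AWIDTH-1:0] wa,           // write address
    input  [WIDTH-1:0]  wd,           // write data
    input  [AWIDTH-1:0] ra0,          // read address, port 0
    input  [AWIDTH-1:0] ra1,          // read address, port 1
    output [WIDTH-1:0]  rd0,
    output [WIDTH-1:0]  rd1
);
    reg [WIDTH-1:0] rf [0:(1<<AWIDTH)-1];

    // Single write port, clocked.
    always @(posedge clk)
        if (we) rf[wa] <= wd;

    // Combinational reads, so the read can be placed in whichever
    // pipeline stage suits the design.
    assign rd0 = rf[ra0];
    assign rd1 = rf[ra1];
endmodule
```

Because each read is combinational, a write to the same address becomes visible on the read ports right after the clock edge, with no bypass logic inside the register file.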