Magnetic RAM
Luís Vitório Cargnini, Lionel Torres, Raphael Martins Brum, Sophiane Senni, Gilles Sassatelli
LIRMM - UMR CNRS 5506 - University of Montpellier 2
161 Rue Ada, Montpellier, 34095, France
E-mail(s): {Torres,cargnini,brum,senni,sassatelli}@lirmm.fr
I. INTRODUCTION
SRAM is currently the de facto technology for cache memories at levels 1 and 2 of the processor memory hierarchy. It is a fast, yet power-hungry, kind of memory. DRAM comes next in the hierarchy, serving as a larger but slower volatile memory; a further drawback is that the process to build DRAMs, while maintaining the roughly 30 fF of storage capacitance, becomes highly complex at sub-micron nodes. Finally, in embedded systems, secondary storage is usually built with solid-state devices based on Flash memory.
Many obstacles threaten the continued scaling of these three technologies. From increasing leakage power to lithography issues, it has been estimated that, by 2018, SRAM, DRAM and Flash technologies will likely have to be replaced if Moore's law is to hold [1]. This landscape has motivated the appearance of a number of non-volatile memory (NVM) technologies in the past years. Spin-Transfer Torque Magnetic RAM (STT-MRAM), Phase-Change RAM (PCM) and Resistive RAM (RRAM), among others, are considered by the ITRS as the most promising candidates to take over the mainstream market. Table I provides a quick comparison of these technologies. MRAM density (depending on the MRAM technology style) is around four to seven times higher than SRAM's, but its access time is between three and ten times higher. On the optimistic side, recent results from Toshiba [4] on perpendicular STT show access times of approximately 4 ns.
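As a back-of-the-envelope illustration of this density/latency trade-off, the sketch below converts an SRAM cache budget into an area-equivalent MRAM capacity. The factors are taken from the ranges quoted above; the example values are illustrative, not measurements:

```python
def mram_equivalent_capacity(sram_capacity_kb, density_factor=4):
    """Capacity of an MRAM array occupying roughly the same silicon
    area as the given SRAM array (density_factor in the 4x-7x range)."""
    return sram_capacity_kb * density_factor

def mram_access_time(sram_access_ns, latency_factor=3):
    """MRAM access time scaled from SRAM (latency_factor in the 3x-10x range)."""
    return sram_access_ns * latency_factor

# Example: a 64 KB, 2 ns SRAM budget
print(mram_equivalent_capacity(64))   # -> 256 (KB)
print(mram_access_time(2.0))          # -> 6.0 (ns)
```

The question the rest of the paper explores is precisely whether the extra capacity outweighs the longer access time.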
[Table I: comparison of memory technologies by minimum cell size (F²), endurance (cycles), read latency (ns) and write latency (ns)]
[Table: platform features]
Processor: 32-bit RISC, 8-11 stage pipeline, 2 instructions per cycle
L1 cache: 64 KB SRAM, 4-way set associative, 2 ns access latency, 32-byte cache lines
L2 cache: 2 MB SRAM, 8-way set associative, 20 ns access latency, 32-byte cache lines
[Tables IV and V: SRAM vs. STT-MRAM cache figures — per-access dynamic energy, leakage power, access counts and energy totals]
It is worth noting here that CMOS is used only for data decoding; the memory array itself no longer leaks, since the data are stored in the magnetic tunnel junctions.
In the current state of the technology, MRAM consumes more dynamic energy than SRAM, as shown in Table IV (at least for our particular case study, an x264 encoder, a workload present in virtually all embedded devices on the market). However, if we consider the total energy as the sum of dynamic and leakage energy, MRAM has the advantage, as shown in Table V. For write accesses, we observe that MRAM takes about 7.5 times more dynamic energy than SRAM, while read operations on SRAM take about 6 times more energy than on MRAM; overall, MRAM took about 1.25 times more dynamic energy than SRAM for this specific application.
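The overall figure follows directly from the read/write mix of the workload. The sketch below reproduces the arithmetic with illustrative per-access energies and access counts, chosen only so that the per-operation ratios match those quoted above; they are not our measured values:

```python
def dynamic_energy(n_reads, n_writes, e_read_nj, e_write_nj):
    """Total dynamic energy (nJ) for a given access mix."""
    return n_reads * e_read_nj + n_writes * e_write_nj

# Illustrative per-access energies (nJ): SRAM reads cost ~6x an MRAM
# read, MRAM writes cost ~7.5x an SRAM write, per the text above.
SRAM_READ, SRAM_WRITE = 0.90, 0.20
MRAM_READ, MRAM_WRITE = SRAM_READ / 6, SRAM_WRITE * 7.5

reads, writes = 1_000_000, 780_000   # hypothetical access counts
e_sram = dynamic_energy(reads, writes, SRAM_READ, SRAM_WRITE)
e_mram = dynamic_energy(reads, writes, MRAM_READ, MRAM_WRITE)
print(f"MRAM / SRAM dynamic energy: {e_mram / e_sram:.2f}x")  # -> 1.25x
```

A write-heavy mix thus penalizes MRAM on dynamic energy even though its reads are cheaper, which is why the overall ratio sits between the two per-operation ratios.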
In [6], for example, a 2 MB L2 SRAM cache was replaced with an 8 MB L2 MRAM cache occupying roughly the same silicon footprint. In that particular case, the increase in cache size was not enough to compensate for the penalty due to the longer cache access delay. By employing write buffers and a novel cache access policy, the authors managed to achieve similar performance while reducing the power consumption of the cache.
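The intuition behind the write buffers can be sketched with a toy model (our own illustration, not the mechanism of [6]): a small FIFO absorbs pending MRAM writes, so the processor stalls only when the buffer is full:

```python
from collections import deque

def stalled_cycles(write_times, depth=4, mram_write_lat=10):
    """Toy model: a FIFO write buffer of `depth` entries hides the long
    MRAM write latency; the CPU stalls only when the buffer is full.
    `write_times` are the cycles at which the CPU issues writes."""
    buf = deque()          # completion cycle of each in-flight write
    stalls = 0
    for t in write_times:
        t += stalls                    # issue time shifted by prior stalls
        while buf and buf[0] <= t:     # retire completed writes
            buf.popleft()
        if len(buf) == depth:          # buffer full: stall until a slot frees
            stalls += buf[0] - t
            t = buf.popleft()
        # writes serialize on the MRAM port behind the newest in-flight write
        start = max(t, buf[-1] if buf else t)
        buf.append(start + mram_write_lat)
    return stalls

print(stalled_cycles(range(0, 20), depth=4))       # bursty writes: stalls > 0
print(stalled_cycles(range(0, 200, 10), depth=4))  # spaced writes: fully hidden
```

Under this model, write bursts shorter than the buffer depth are absorbed for free, which is why buffering recovers most of the performance lost to the slow MRAM write.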
[Figure: system performance over the benchmark set (cjpeg, djpeg, epic, unepic, mpeg2enc, mpeg2dec, mipmap, osdemo, texgen) for 1 KB SRAM vs. 4 KB MRAM and 128 KB SRAM vs. 512 KB MRAM cache configurations]

[Table: 45 nm, 2 MB, 8-way set associative SRAM and MRAM caches compared on area (total, data array and tag array area), timing (cache hit, miss and write latency) and power (hit, miss and write dynamic energy; total, data array and tag array leakage power)]
To compare configurations occupying an equivalent silicon area, we define the CPI penalty as follows:

CPI penalty = CPI_MRAM / CPI_SRAM − 1.    (1)
Based on the CPI penalty, Figure 5 shows the best-case, worst-case and average performance over the benchmark set as a function of the cache capacity. Provided our assumptions are valid, MRAM presents a CPI gain rather than a CPI penalty in most cases. Once the cache capacity is large enough to contain the whole benchmark data set, however, the CPI gain turns into a penalty that can no longer be compensated unless specific techniques are employed.
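Reading the CPI penalty as the relative CPI increase of MRAM over the SRAM baseline, a negative value is a gain. The sketch below evaluates it for hypothetical per-benchmark CPIs (illustrative values, not our measurements):

```python
def cpi_penalty(cpi_mram, cpi_sram):
    """Relative CPI increase of the MRAM cache over the SRAM baseline;
    negative values mean the MRAM configuration outperforms SRAM."""
    return cpi_mram / cpi_sram - 1

# Hypothetical (benchmark: (CPI with MRAM, CPI with SRAM)) pairs:
results = {"bench_a": (1.10, 1.40), "bench_b": (1.60, 1.45)}
for name, (mram, sram) in results.items():
    p = cpi_penalty(mram, sram)
    print(f"{name}: {p:+.2%} ({'gain' if p < 0 else 'penalty'})")
```

Averaging this quantity over the benchmark set, per cache capacity, yields the curves discussed above.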
[Figure 5: CPI penalty (less means 'better than the reference') as a function of cache capacity, for SRAM:MRAM capacity pairs of 1:4, 2:8, 4:16, 8:32, 16:64, 32:128, 64:256 and 128:512 KB; worst-case and average curves over the benchmark set]
VI. CONCLUSION
In this paper, we presented our working methodology for memory hierarchy evaluation, along with the results obtained to corroborate our assertions. We also investigated possible applications of new memory technologies that can evolve together with advanced process nodes for embedded processors. The use of MRAM for level-1 or level-2 caches is being explored by several research groups, including ours. Current results indicate that it could be an attractive solution to the rising power consumption observed in CMOS circuits. The use of eNVMs opens a new paradigm for the implementation of power-saving mechanisms, as non-volatility can be exploited to power off the devices whenever they are idle.