Multilevel caches
Critical word first and early restart
Giving priority to read misses over writes
Merging write buffer
Victim Caches
Multilevel caches
CPUs keep getting faster:
cache speed needs to keep up with the CPU!
Caches also need to be BIGGER.
We need both, but it is hard to achieve both in a single cache; hence multiple levels of cache.
Miss rate
Local miss rate: the number of misses in a given cache divided by the total number of memory accesses to this cache.
For L1, this is simply the L1 miss rate (Miss rate_L1).
For L2, it is the L2 miss rate (Miss rate_L2), counted over only the accesses that reach L2.
The local miss rate is large for 2nd-level caches
because the 1st-level cache skims the juicy memory accesses.
Global miss rate is a more useful measure:
it is the fraction of all memory accesses that must go all the way to memory.
For L2: Global miss rate = Miss rate_L1 x Miss rate_L2.
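As a sketch, the local and global miss rates can be computed from hit/miss counters; the numbers below are made up purely for illustration:

```python
# Hypothetical counters from a two-level cache (made-up numbers).
l1_accesses = 1000      # every memory access from the CPU goes to L1
l1_misses   = 40        # accesses that miss in L1 and go on to L2
l2_misses   = 20        # accesses that also miss in L2 and go to memory

# Local miss rate: misses in a cache / accesses *to that cache*.
l1_local = l1_misses / l1_accesses        # 0.04
l2_local = l2_misses / l1_misses          # 0.5 -- large, L1 skimmed the easy hits

# Global miss rate: fraction of *all* memory accesses that reach memory.
l2_global = l2_misses / l1_accesses       # 0.02
# The global miss rate is the product of the local miss rates.
assert abs(l2_global - l1_local * l2_local) < 1e-12

print(l1_local, l2_local, l2_global)      # 0.04 0.5 0.02
```

Note how the L2 local miss rate (50%) looks alarming on its own, while the global miss rate (2%) shows the hierarchy is doing its job.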
L1 and L2
For data in the L1 cache, should it also be in the L2 cache?
Multilevel inclusion
Simplifies consistency between I/O and caches (or among caches): only L2 needs to be checked.
Drawback: statistics suggest it is a good idea to have small block sizes for the L1 cache and bigger block sizes for the L2 cache, which makes maintaining inclusion harder.
Still doable; the Pentium 4 does it.
Multilevel Exclusion
L1 data is NEVER found in the L2 cache.
If the designer can only afford an L2 cache that is slightly bigger than the L1 cache, they don't want to waste L2 space on duplicates.
The AMD Athlon uses exclusion.
Two methods
Critical word first: request the missed word from memory first and send it to the CPU as soon as it arrives;
let the CPU execute while the rest of the words in the block are being filled.
Early restart: fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the CPU and let it resume work.
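A minimal sketch of the two fetch orders, assuming an 8-word block and a miss on word 5 (the block size and function names are invented for illustration):

```python
BLOCK_WORDS = 8  # assumed block size, in words

def critical_word_first(missed: int):
    """Fetch the missed word first, then wrap around the rest of the block."""
    return [(missed + i) % BLOCK_WORDS for i in range(BLOCK_WORDS)]

def early_restart(missed: int):
    """Fetch in normal order; the CPU resumes as soon as `missed` arrives."""
    order = list(range(BLOCK_WORDS))
    fetches_before_resume = order.index(missed) + 1
    return order, fetches_before_resume

print(critical_word_first(5))   # [5, 6, 7, 0, 1, 2, 3, 4] -- CPU resumes after 1 fetch
order, resume = early_restart(5)
print(order, resume)            # [0, 1, 2, 3, 4, 5, 6, 7] -- CPU resumes after 6 fetches
```

Critical word first lets the CPU resume after a single fetch regardless of which word missed, at the cost of a memory system that supports out-of-order (wrapped) block fills.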
Drawback
These techniques only pay off when the block is large.
One problem is that with spatial locality, there is a better-than-random chance that the next access falls in the remainder of the block, before it has finished arriving.
The effective miss penalty is then the time from the miss until the second piece arrives.
Solution
Wait until the write buffer is empty,
OR check the contents of the write buffer on a read miss: if there is no conflict and the memory system is available, let the read miss continue.
This also saves cost in a write-back cache if the needed block is found in the write buffer.
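A sketch of the second option, checking the write buffer on a read miss; the data structures and names here are invented for illustration:

```python
# Pending writes waiting to drain to memory: address -> data (toy model).
write_buffer = {0x1000: 0xAA, 0x2040: 0xBB}

def read_miss(addr: int, memory: dict) -> int:
    """On a read miss, check the write buffer before going to memory."""
    if addr in write_buffer:
        # Conflict: the freshest copy is still in the buffer.
        # Return it directly instead of stalling until the buffer drains.
        return write_buffer[addr]
    # No conflict: the read miss may proceed to memory immediately.
    return memory.get(addr, 0)

memory = {0x3000: 0xCC}
print(hex(read_miss(0x2040, memory)))  # 0xbb, served from the write buffer
print(hex(read_miss(0x3000, memory)))  # 0xcc, served from memory
```

In hardware the "check" is an associative comparison against the buffer's address tags, not a dictionary lookup, but the decision logic is the same.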
Victim caches
Remember what was discarded in case it is needed again.
Since the discarded data has already been fetched, it can be used again at small cost.
Requires a small, fully associative cache between a cache and its refill path.
The AMD Athlon uses a victim cache with 8 entries.
Victim caches of 1 to 5 entries are effective at reducing misses, especially for small, direct-mapped data caches.
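A toy simulation of the idea, assuming a 4-set direct-mapped cache backed by a 2-entry victim cache (all sizes and names are made up; real victim caches hold full blocks with tags):

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Toy direct-mapped cache with a tiny fully associative victim cache."""

    def __init__(self, n_sets=4, victim_entries=2):
        self.sets = [None] * n_sets     # one block per set (tag only, no data)
        self.n_sets = n_sets
        self.victim = OrderedDict()     # recently evicted blocks, LRU order
        self.victim_entries = victim_entries

    def access(self, block: int) -> str:
        idx = block % self.n_sets
        if self.sets[idx] == block:
            return "hit"
        if block in self.victim:
            # Victim hit: swap the discarded block back in at small cost.
            del self.victim[block]
            evicted, self.sets[idx] = self.sets[idx], block
            if evicted is not None:
                self._stash(evicted)
            return "victim hit"
        # Full miss: fetch the block, stash the evicted one in the victim cache.
        evicted, self.sets[idx] = self.sets[idx], block
        if evicted is not None:
            self._stash(evicted)
        return "miss"

    def _stash(self, block):
        self.victim[block] = None
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)   # drop the oldest victim

cache = DirectMappedWithVictim()
# Blocks 1 and 5 conflict (both map to set 1) but ping-pong cheaply via the victim cache.
print([cache.access(b) for b in (1, 5, 1, 5)])
# ['miss', 'miss', 'victim hit', 'victim hit']
```

Without the victim cache, every one of those four accesses would be a full conflict miss; the victim cache turns the ping-ponging into cheap swaps.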