When a miss occurs, the cache controller must select a block to be replaced with
the desired data. A replacement policy determines which block should be replaced.
With direct-mapped placement the decision is simple because there is no choice:
only one block frame is checked for a hit and only that block can be replaced.
With fully-associative or set-associative placement, there is more than one block
to choose from on a miss.
Primary strategies:
Random - to spread allocation uniformly, candidate blocks are randomly selected.
Advantage: simple to implement in hardware
Disadvantage: ignores principle of locality
Least-Recently Used (LRU) - to reduce the chance of throwing out information that
will be needed soon, accesses to blocks are recorded. The block replaced is the one
that has been unused for the longest time.
Advantage: takes locality into account
Disadvantage: as the number of blocks to keep track of increases, LRU becomes
more expensive (harder to implement, slower, and often only approximated)
Other strategies:
First In First Out (FIFO)
Most-Recently Used (MRU)
Least-Frequently Used (LFU)
DAP Spr.‘98 ©UCB 2
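The LRU policy above can be sketched as a small simulation. This is an illustrative model (class and method names are my own, not from any real cache controller) that keeps one ordered dictionary per set, with insertion order tracking recency:

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Minimal set-associative cache model with LRU replacement (sketch)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set: keys are tags, order records recency.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_addr):
        """Return True on a hit, False on a miss (filling via LRU eviction)."""
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)      # mark as most recently used
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)   # evict the least recently used block
        s[tag] = True
        return False

cache = SetAssociativeCache(num_sets=2, ways=2)
hits = [cache.access(a) for a in [0, 2, 0, 4, 0, 2]]  # all map to set 0
```

In the trace, block 0 is re-referenced between the other accesses, so LRU keeps it resident while blocks 2 and 4 evict each other.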
[Figure: miss rate per type vs. cache size (1-128 KB); series for 4-way and
8-way associativity plus capacity and compulsory components. Note: compulsory
misses are small.]
2:1 Cache Rule:
miss rate (1-way associative cache of size X)
≈ miss rate (2-way associative cache of size X/2)
[Figure: miss rate per type (conflict for 1-, 2-, 4-, and 8-way associativity;
capacity; compulsory) vs. cache size, 1-128 KB.]
DAP Spr.‘98 ©UCB 7
How Can We Reduce Misses?
• 3 Cs: Compulsory, Capacity, Conflict
• In all cases, assume total cache size not changed:
• What happens if:
1) Change Block Size:
Which of 3Cs is obviously affected?
2) Change Associativity:
Which of 3Cs is obviously affected?
3) Change Compiler:
Which of 3Cs is obviously affected?
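One way to make the 3Cs concrete is to classify misses in a simulation: a miss is compulsory if the block has never been referenced, capacity if a fully-associative LRU cache of the same total size would also miss, and conflict otherwise. A minimal sketch (function and variable names are illustrative, not from the slides):

```python
from collections import OrderedDict

def classify_misses(trace, num_sets, ways):
    """Classify each miss in a block-address trace as compulsory,
    capacity, or conflict (sketch of the standard 3Cs definition)."""
    total_blocks = num_sets * ways
    seen = set()                            # blocks ever referenced
    fa = OrderedDict()                      # fully-associative LRU model
    sets = [OrderedDict() for _ in range(num_sets)]
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for addr in trace:
        # Would a fully-associative cache of the same size have hit?
        fa_hit = addr in fa
        if fa_hit:
            fa.move_to_end(addr)
        else:
            if len(fa) >= total_blocks:
                fa.popitem(last=False)
            fa[addr] = True

        # The actual set-associative cache with LRU replacement.
        s = sets[addr % num_sets]
        tag = addr // num_sets
        if tag in s:
            s.move_to_end(tag)
            continue                        # hit: nothing to classify
        if len(s) >= ways:
            s.popitem(last=False)
        s[tag] = True

        if addr not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(addr)
    return counts

# Two blocks ping-pong in one set of a direct-mapped cache: after the
# first touches, every miss is a conflict miss (a fully-associative
# cache of the same size would have held both blocks).
counts = classify_misses([0, 2, 0, 2, 0, 2], num_sets=2, ways=1)
```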
[Figure: miss rate vs. block size (16-128 bytes) for cache sizes 1 KB, 4 KB,
16 KB, 64 KB, and 256 KB; miss rates range from 0% to 20%.]
(Red means A.M.A.T. not improved by more associativity)
DAP Spr.‘98 ©UCB 11
3. Reducing Misses via a
“Victim Cache”
• How can we combine the fast hit time of direct-mapped placement
and still avoid conflict misses?
• Add a small buffer to hold data discarded from the cache
• Jouppi [1990]: 4-entry victim cache removed 20% to
95% of conflicts for a 4 KB direct mapped data cache
• Used in Alpha, HP machines
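Jouppi's idea can be sketched as a direct-mapped cache paired with a tiny fully-associative buffer of evicted blocks; on a main-cache miss, the buffer is checked, and a hit there is a conflict miss the victim cache has absorbed. An illustrative model only (names and structure are my own, not the actual hardware design):

```python
from collections import OrderedDict

class VictimCache:
    """Direct-mapped cache backed by a small fully-associative buffer
    of recently evicted blocks (sketch after Jouppi [1990])."""

    def __init__(self, num_lines, victim_entries=4):
        self.num_lines = num_lines
        self.lines = {}                     # line index -> block address
        self.victims = OrderedDict()        # small FIFO victim buffer
        self.victim_entries = victim_entries

    def access(self, addr):
        """Return 'hit', 'victim-hit', or 'miss'."""
        index = addr % self.num_lines
        if self.lines.get(index) == addr:
            return "hit"
        evicted = self.lines.get(index)     # block displaced by the fill
        self.lines[index] = addr
        if addr in self.victims:            # conflict caught by the buffer
            del self.victims[addr]
            result = "victim-hit"
        else:
            result = "miss"
        if evicted is not None:             # displaced block -> buffer
            if len(self.victims) >= self.victim_entries:
                self.victims.popitem(last=False)
            self.victims[evicted] = True
        return result

# Blocks 0 and 4 conflict in the direct-mapped cache; a 1-entry victim
# buffer turns every repeat of the ping-pong into a victim-buffer hit.
vc = VictimCache(num_lines=4, victim_entries=1)
results = [vc.access(a) for a in [0, 4, 0, 4]]
```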
• Definitions:
– Local miss rate — misses in this cache divided by the total
number of memory accesses to this cache (Miss Rate_L2)
– Global miss rate — misses in this cache divided by the total
number of memory accesses generated by the CPU
(Miss Rate_L1 x Miss Rate_L2)
– Global Miss Rate is what matters
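Numerically, using the rates from the worked example that follows (a 5% L1 miss rate, of which 40% also miss in L2):

```python
l1_miss_rate = 0.05        # misses per CPU access in L1
l2_local_miss_rate = 0.40  # fraction of L1 misses that also miss in L2
# Global miss rate: fraction of all CPU accesses that go to main memory.
global_miss_rate = l1_miss_rate * l2_local_miss_rate  # 2%
```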
An Example (pp. 576)
Q: Suppose we have a processor with a base CPI of 1.0, assuming all
references hit in the primary cache, and a clock rate of 500 MHz. The
main memory access time is 200 ns, and the miss rate per instruction is
5%. What is the revised CPI? How much faster will the machine run if we
add a secondary cache (with a 20 ns access time) that reduces the miss
rate to memory to 2%? Assume the same access time for a hit or a miss.
A: Miss penalty to main memory = 200 ns = 100 cycles (at a 2 ns clock).
Total CPI = base CPI + memory-stall cycles per instruction. Hence,
revised CPI = 1.0 + 5% x 100 = 6.0.
When an L2 with a 20 ns (10-cycle) access time is added, the miss rate
to memory drops to 2%. So, of the 5% of references that miss in L1, 3%
hit in L2 and 2% miss.
The CPI is reduced to 1.0 + 5% x (10 + 40% x 100) = 3.5. Thus, the
machine with the secondary cache is faster by 6.0/3.5 = 1.7
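The arithmetic above can be checked directly; every constant below comes straight from the example:

```python
cycle_ns = 1000 / 500            # 2 ns per cycle at 500 MHz
mem_penalty = 200 / cycle_ns     # 100 cycles to main memory
l2_penalty = 20 / cycle_ns       # 10 cycles to the L2

# Without L2: every L1 miss (5%) pays the full main-memory penalty.
cpi_l1_only = 1.0 + 0.05 * mem_penalty
# With L2: all 5% of L1 misses pay the L2 access; the 2% that also
# miss in L2 additionally pay the main-memory penalty.
cpi_with_l2 = 1.0 + 0.05 * l2_penalty + 0.02 * mem_penalty
speedup = cpi_l1_only / cpi_with_l2
```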
Technique                            Miss Rate   Miss Penalty   HW Complexity
Subblock Placement                       +            +               1
Early Restart & Critical Word 1st                     +               2
Non-Blocking Caches                                   +               3
Second-Level Caches                                   +               2