Abstract—The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit either a distant re-reference interval or a near re-reference interval perform badly under LRU. Such applications usually have a working set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, we emulate and evaluate cache replacement using Re-reference Interval Prediction (RRIP). We give a quantitative measure of the improvement of SRRIP over LRU and LFU, with detailed analysis over benchmarks.

Computer Science and Engineering, Texas A&M University

I. INTRODUCTION

In the LEAST RECENTLY USED (LRU) replacement policy, the LRU chain represents the recency of referenced cache blocks, with the MRU position holding the cache block that was most recently used and the LRU position holding the cache block that was least recently used.

In the LEAST FREQUENTLY USED (LFU) replacement policy, the LFU counter represents the access frequency of referenced cache blocks. In this replacement policy, when the cache is full and requires more room, the system purges the item with the lowest reference frequency.

RE-REFERENCE INTERVAL PREDICTION (RRIP) uses M bits per cache block to store one of 2^M possible Re-reference Prediction Values (RRPV). RRIP dynamically learns re-reference information for each block in the cache access pattern. Like NRU, an RRPV of zero implies that a cache block is predicted to be re-referenced in the near-immediate future, while a saturated RRPV (i.e., 2^M − 1) implies that a cache block is predicted to be re-referenced in the distant future. Since the re-reference predictions made by RRIP are statically determined on cache hits and misses, we refer to this replacement policy as STATIC RE-REFERENCE INTERVAL PREDICTION (SRRIP).

With only one bit of information, LRU/LFU can predict either a near-immediate re-reference interval or a distant re-reference interval for all blocks filled into the cache. Always predicting a near-immediate re-reference interval on all cache insertions limits cache performance for mixed access patterns, because scan blocks unnecessarily occupy cache space without receiving any cache hits. On the other hand, always predicting a distant re-reference interval significantly degrades cache performance for access patterns that predominantly have a near-immediate re-reference interval. Consequently, without any external information on the re-reference interval of every missing cache block, LRU and LFU cannot identify and preserve non-scan blocks in a mixed access pattern. Meanwhile, scan resistance using RRIP requires that the RRPV register be appropriately sized to avoid sources of performance degradation.

II. SRRIP TECHNIQUE

A. Short description of the SRRIP technique

The primary goal of RRIP is to prevent blocks with a distant re-reference interval from polluting the cache. In the absence of any external re-reference information, RRIP statically predicts the block's re-reference interval. Since always predicting a near-immediate or a distant re-reference interval at cache insertion time is not robust across all access patterns, RRIP always inserts new blocks with a long re-reference interval. A long re-reference interval is defined as an intermediate re-reference interval that is skewed towards a distant re-reference interval. We use an RRPV of 2^M − 2 to represent a long re-reference interval. The intuition behind always predicting a long re-reference interval on cache insertion is to prevent cache blocks with re-references in the distant future from polluting the cache. Additionally, always predicting a long re-reference interval instead of a distant re-reference interval allows RRIP more time to learn and improve the re-reference prediction. If the newly inserted cache block has a near-immediate re-reference interval, RRIP can then update the re-reference prediction to be shorter than the previous prediction. In effect, RRIP learns the block's re-reference interval.

B. Implementation details

On a cache miss, the RRIP victim selection policy selects the victim block by finding the first block that is predicted to be re-referenced in the distant future (i.e., the block whose RRPV is 2^M − 1). Like NRU, the victim selection policy breaks ties by always starting the victim search from a fixed location (the left in our studies). In the event that RRIP is unable to find a block with a distant re-reference interval, RRIP updates the re-reference predictions by incrementing the RRPVs of all blocks in the cache set and repeats the search until a block with a distant re-reference interval is found. Updating RRPVs at victim selection time allows RRIP to adapt to changes in the application working set by removing stale blocks from the cache. A natural opportunity to change the re-reference prediction of a block occurs on a hit to the block.
Fig. 2. An example of the SRRIP algorithm (2-bit)
III. METHODOLOGY

We use ZSim, a full-featured memory-system simulator with detailed cache models, to conduct our performance studies. Our baseline processor is a single-core Westmere system (or an 8-core processor for multi-threaded workloads) with a 64-bit word length and a three-level cache hierarchy. The L1 instruction cache is 4-way set-associative and 32 KB, and the L1 data cache is 8-way set-associative and 32 KB. The L2 cache is 256 KB and 8-way associative, and the L3 cache is 2 MB and 16-way associative. Only demand references to the cache update the replacement state, while non-demand references (e.g., write-back references) leave it unchanged. The load-to-use latencies for the L1, L2, and L3 caches are 1, 10, and 24 cycles, respectively.

Fig. 3. Internal breakup of array access functions

Thus we implement the update(), replaced() and rank() functions in our SRRIP implementation.

void update(uint32_t id, const MemReq* req) {
    // 'miss' is an internal variable that differentiates hits from
    // misses: replaced() sets it when the entry was filled on a miss.
    // Its default value is 0.
    if (!miss) {
        array[id] = 0; // SRRIP-HP: promote the RRPV to zero on a hit
    }
    miss = 0; // reset for future accesses
}

Underneath is the code snippet of the replaced() function.

void replaced(uint32_t id) {
    array[id] = rpvMax - 1; // insert with a long re-reference interval (RRPV = 2^M - 2)
    miss = 1; // tell update() that this access was a miss
}

Underneath is the code snippet of the rank() function, which identifies the victim for replacement by implementing the search from Sec. II-B.

uint32_t rank(const MemReq* req, SetAssocCands cands) {
    uint32_t bestCand = 0;
    int flag = 0; // set once a block with a distant prediction is found
    while (flag == 0) {
        // Find the first block predicted for the distant future
        // (RRPV == rpvMax), searching from a fixed end to break ties.
        for (auto ci = cands.begin(); ci != cands.end(); ci.inc()) {
            if (array[*ci] == rpvMax) {
                bestCand = *ci;
                flag = 1;
                break;
            }
        }
        // No distant block found: age every block in the set and retry.
        if (flag == 0) {
            for (auto ci = cands.begin(); ci != cands.end(); ci.inc()) {
                array[*ci]++;
            }
        }
    }
    if (flag == 1) {
        return bestCand;
    } else {
        return 0; // never reached; kept only for syntactic completeness
    }
}

IV. EVALUATION

We conduct our evaluation based on the following parameters, amongst LRU, LFU, SRRIP(2) and SRRIP(3).
• Number of Cycles
– Overall, with SPEC and PARSEC taken into consideration together, SRRIP executes 2.16% fewer cycles than LRU.
– Moreover, this improves only marginally with an increase in M, reaching 2.27% for SRRIP with M = 3.
– The percentage improvement is more significant in PARSEC (5.8%) than in SPEC.

Fig. 6. IPC comparison over Benchmarks

• IPC
– Here we observe that, overall across all benchmarks, the SRRIP policy performs better than LRU and LFU in terms of IPC. Fig. 6 and Fig. 7
– Individually, SPEC shows an inverse trend, with a decrement of 0.16% with SRRIP(2) vs. -0.4% with LFU. Thus in SPEC, both LFU and SRRIP have lower IPC than LRU, with SRRIP being slightly better of the two. This value starts to show improvement, reaching 0.21% with M = 3.
– Individually, PARSEC shows a 5.89% improvement over LRU with SRRIP(2) vs. 2.83% with LFU. This value reduces to 5.5% with M = 3.
– With the exception of hmmer in INT, and x264 and bodytrack in PARSEC, SRRIP seems to always perform better than the other two.
– Amongst SPEC, INT and FLOAT both show in-

Fig. 8. MPKI comparison across Benchmarks
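For reference, the metrics compared in this section are derived from raw simulator counters in the standard way. The helper functions below are an illustrative sketch, not part of our evaluation harness; the names ipc, mpki and cycleImprovementPct are invented here.

```cpp
#include <cstdint>

// Instructions per cycle from two simulator counters.
double ipc(uint64_t instrs, uint64_t cycles) {
    return static_cast<double>(instrs) / static_cast<double>(cycles);
}

// Misses per kilo-instruction: cache misses normalized per 1000 instructions.
double mpki(uint64_t misses, uint64_t instrs) {
    return 1000.0 * static_cast<double>(misses) / static_cast<double>(instrs);
}

// Percentage improvement of a policy over the LRU baseline in cycle count
// (fewer cycles is better, so a positive value means the policy is faster).
double cycleImprovementPct(uint64_t baseCycles, uint64_t newCycles) {
    return 100.0 * (static_cast<double>(baseCycles) - static_cast<double>(newCycles))
         / static_cast<double>(baseCycles);
}
```

For example, a run that retires two million instructions in one million cycles has an IPC of 2.0, and 5000 misses over one million instructions give an MPKI of 5.0.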