
Cache Memory

Chapter 17
S. Dandamudi

Outline

Introduction
How cache memory works
Why cache memory works
Cache design basics
Mapping function
Direct mapping
Associative mapping
Set-associative mapping

Types of cache misses


Types of caches
Example implementations
Pentium
PowerPC
MIPS

Cache operation summary


Design issues
Cache capacity
Cache line size
Degree of associativity

Replacement policies
Write policies
Space overhead
2003

S. Dandamudi

Chapter 17: Page 2

To be used with S. Dandamudi, Fundamentals of Computer Organization and Design, Springer, 2003.

Introduction
Memory hierarchy

Registers
Memory
Disk

Cache memory is a small amount of fast memory


Placed between two levels of memory hierarchy
To bridge the gap in access times
Between processor and main memory (our focus)
Between main memory and disk (disk cache)

Expected to behave like a large amount of fast memory



Introduction (contd)


How Cache Memory Works


Prefetch data into cache before the processor needs it
Need to predict processor future access requirements
Not difficult owing to locality of reference

Important terms

Miss penalty
Hit ratio
Miss ratio = (1 − hit ratio)
Hit time
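These terms combine into the effective (average) access time: every access pays the hit time, and a miss additionally pays the miss penalty. A minimal sketch of the standard formula (the numbers below are illustrative, not from the text):

```c
/* Effective (average) access time of a cache:
 * t_avg = hit_time + miss_ratio * miss_penalty.
 * Times are in arbitrary units (e.g., nanoseconds). */
double effective_access_time(double hit_time,
                             double miss_penalty,
                             double hit_ratio) {
    double miss_ratio = 1.0 - hit_ratio;   /* miss ratio = (1 - hit ratio) */
    return hit_time + miss_ratio * miss_penalty;
}
```

For example, a 1 ns hit time, 100 ns miss penalty, and 95% hit ratio give an effective access time of 6 ns.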


How Cache Memory Works (contd)

Cache read operation


How Cache Memory Works (contd)

Cache write operation


Why Cache Memory Works


Example
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        X[i][j] = X[i][j] + K;
Each element of X is double (eight bytes)
Loop is executed (M*N) times
Placing the code in cache avoids access to main memory
Repetitive use (one of the factors)
Temporal locality
Prefetching data
Spatial locality
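The role of spatial locality shows up when the same matrix is traversed in the two possible orders. A sketch (M and N are small illustrative sizes, not the 500-to-1000 element matrices used in the timing comparison):

```c
#define M 8
#define N 8

/* Row-order traversal: the inner loop walks consecutive doubles,
 * so every 8-byte element of a fetched cache line is used before
 * the line is evicted (spatial locality). */
void add_row_order(double X[M][N], double K) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            X[i][j] = X[i][j] + K;
}

/* Column-order traversal: consecutive accesses are N*8 bytes apart,
 * so each access may land in a different cache line, wasting most
 * of every line fetched. */
void add_col_order(double X[M][N], double K) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            X[i][j] = X[i][j] + K;
}
```

Both functions compute the same result; only the memory access order differs, which is exactly what the timing comparison measures.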

How Cache Memory Works (contd)


[Figure: execution time (ms) versus matrix size (500 to 1000) for row-order and column-order traversal; the column-order version is markedly slower.]


Cache Design Basics


On every read miss
A fixed number of bytes are transferred
More than what the processor needs
Effective due to spatial locality

Cache is divided into blocks of B bytes


b bits are needed as the offset into the block
b = log2 B
Blocks are called cache lines

Main memory is also divided into blocks of the same size

Address is divided into two parts

Cache Design Basics (contd)


B = 4 bytes
b = 2 bits
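For this B = 4, b = 2 case, the two address parts fall out with a mask and a shift (a minimal sketch):

```c
#include <stdint.h>

enum { B = 4, BLOCK_BITS = 2 };   /* b = log2(B) = 2 */

/* Low b bits: byte offset within the block. */
uint32_t byte_offset(uint32_t addr)  { return addr & (B - 1); }

/* Remaining high bits: the block number. */
uint32_t block_number(uint32_t addr) { return addr >> BLOCK_BITS; }
```

For example, address 13 (binary 1101) is byte 1 of block 3.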


Cache Design Basics (contd)


Transfer between main memory and cache
In units of blocks
Implements spatial locality

Transfer between cache and processor
In units of words

Need policies for

Block placement
Mapping function
Block replacement
Write policies

Cache Design Basics (contd)


Read cycle operations


Mapping Function
Determines how memory blocks are mapped to cache lines
Three types
Direct mapping
Specifies a single cache line for each memory block

Set-associative mapping
Specifies a set of cache lines for each memory block

Associative mapping
No restrictions
Any cache line can be used for any memory block


Mapping Function (contd)


Direct mapping example


Mapping Function (contd)


Implementing direct mapping
Easier than the other two
Maintains three pieces of information
Cache data
Actual data
Cache tag
Problem: more memory blocks than cache lines
Several memory blocks map to the same cache line
Tag stores the address of the memory block in the cache line
Valid bit
Indicates if cache line contains a valid block
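The three pieces of information combine into a lookup like the following sketch; the line count and block size here are illustrative, not from the text:

```c
#include <stdint.h>
#include <stdbool.h>

/* Direct-mapped cache sketch: 4-byte blocks, 8 lines. */
enum { BLOCK_BITS = 2, LINE_BITS = 3, LINES = 1 << LINE_BITS };

struct line { bool valid; uint32_t tag; uint8_t data[4]; };

static struct line cache[LINES];

/* Address layout: | tag | line index | block offset | */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> BLOCK_BITS) & (LINES - 1);
    uint32_t tag   =  addr >> (BLOCK_BITS + LINE_BITS);
    return cache[index].valid && cache[index].tag == tag;
}

/* On a miss, install the block in its (single) mapped line. */
void fill(uint32_t addr) {
    uint32_t index = (addr >> BLOCK_BITS) & (LINES - 1);
    cache[index].valid = true;
    cache[index].tag   = addr >> (BLOCK_BITS + LINE_BITS);
}
```

Note that two addresses with the same index but different tags evict each other, which is the source of the conflict misses discussed later.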

Mapping Function (contd)


Mapping Function (contd)

Direct mapping
Reference pattern:
0, 4, 0, 8, 0, 8,
0, 4, 0, 4, 0, 4
Hit ratio = 0%
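The 0% figure can be checked by simulation, assuming a small direct-mapped cache (four lines here) in which blocks 0, 4, and 8 all map to line 0:

```c
#include <stdbool.h>

enum { NLINES = 4 };   /* assumed cache size, chosen so 0, 4, 8 conflict */

/* Simulate a direct-mapped cache on a sequence of block
 * references; returns the number of hits. */
int count_hits(const int *blocks, int n) {
    bool valid[NLINES] = { false };
    int tag[NLINES] = { 0 };
    int hits = 0;
    for (int i = 0; i < n; i++) {
        int line = blocks[i] % NLINES;     /* direct-mapped index */
        if (valid[line] && tag[line] == blocks[i]) {
            hits++;                        /* block already cached */
        } else {
            valid[line] = true;            /* miss: install, evicting */
            tag[line] = blocks[i];         /* whatever was there */
        }
    }
    return hits;
}
```

The same routine also reproduces the 67% hit ratio of the pattern 0, 7, 9, 10, ... (8 hits in 12 references), since those four blocks map to four different lines.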


Mapping Function (contd)

Direct mapping
Reference pattern:
0, 7, 9, 10, 0, 7,
9, 10, 0, 7, 9, 10
Hit ratio = 67%


Mapping Function (contd)


Associative mapping


Mapping Function (contd)

Associative
mapping
Reference pattern:
0, 4, 0, 8, 0, 8,
0, 4, 0, 4, 0, 4

Hit ratio = 75%


Mapping Function (contd)

Address match logic for associative mapping


Mapping Function (contd)


Associative cache with address match logic


Mapping Function (contd)


Set-associative mapping


Mapping Function (contd)


Address partition in set-associative mapping
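The set-associative partition adds a set index between the tag and the offset, and the lookup compares tags in every way of the indexed set. A sketch with illustrative sizes (2-way, four sets, 4-byte blocks):

```c
#include <stdint.h>
#include <stdbool.h>

enum { BLOCK_BITS = 2, SET_BITS = 2, SETS = 1 << SET_BITS, WAYS = 2 };

struct way { bool valid; uint32_t tag; };
static struct way cache[SETS][WAYS];

/* Address layout: | tag | set index | block offset | */
bool lookup(uint32_t addr) {
    uint32_t set = (addr >> BLOCK_BITS) & (SETS - 1);
    uint32_t tag =  addr >> (BLOCK_BITS + SET_BITS);
    for (int w = 0; w < WAYS; w++)         /* compare tags in all ways */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;
    return false;
}

/* Install a block into a chosen way of its set (way selection
 * would normally be made by the replacement policy). */
void fill(uint32_t addr, int way) {
    uint32_t set = (addr >> BLOCK_BITS) & (SETS - 1);
    cache[set][way].valid = true;
    cache[set][way].tag   = addr >> (BLOCK_BITS + SET_BITS);
}
```

Unlike direct mapping, two blocks that map to the same set can now be cached at the same time, one per way.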


Mapping Function (contd)

Set-associative
mapping
Reference pattern:
0, 4, 0, 8, 0, 8,
0, 4, 0, 4, 0, 4
Hit ratio = 67%

Replacement Policies
We invoke the replacement policy when there is no place in the cache to load the memory block

Depends on the actual placement policy in effect


Direct mapping does not need a special replacement policy
Replace the mapped cache line

Several policies for the other two mapping functions


Popular: LRU (least recently used)
Random replacement
Of less interest: FIFO, LFU

Replacement Policies (contd)


LRU
Expensive to implement
Particularly for set sizes more than four

Implementations resort to approximation


Pseudo-LRU
Partitions each set into two groups
Keeps track of which group was accessed more recently
Each such decision requires only one bit
In total requires only (W−1) bits (W = degree of associativity)
PowerPC is an example
Details later
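The (W−1)-bit scheme can be sketched as a binary tree of bits for an eight-way set (seven bits, matching the PowerPC example discussed later); the layout below is one common tree-PLRU convention, not necessarily the PowerPC's exact encoding:

```c
#include <stdbool.h>

/* Tree pseudo-LRU for one 8-way set: 7 bits arranged as a
 * binary tree. Node 0 is the root; the children of node n
 * are nodes 2n+1 and 2n+2; leaves correspond to ways 0..7. */
enum { WAYS = 8 };
static bool plru[WAYS - 1];

/* On an access to `way`, set each bit on the root-to-leaf
 * path to point AWAY from the subtree just used. */
void touch(int way) {
    int node = 0;
    for (int level = 0; level < 3; level++) {
        int bit = (way >> (2 - level)) & 1;  /* branch taken here */
        plru[node] = !bit;                   /* victim is elsewhere */
        node = 2 * node + 1 + bit;
    }
}

/* Follow the bits from the root to find the victim way. */
int victim(void) {
    int node = 0;
    for (int level = 0; level < 3; level++)
        node = 2 * node + 1 + plru[node];
    return node - (WAYS - 1);                /* leaf index -> way */
}
```

This only approximates true LRU, but each access updates just three bits, versus the full ordering information true LRU would need for eight ways.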

Replacement Policies (contd)


Pseudo-LRU implementation


Write Policies
Memory write requires special attention
We have two copies
A memory copy
A cached copy

Write policy determines how a memory write operation is handled
Two policies
Write-through
Update both copies
Write-back
Update only the cached copy
The memory copy must be updated later

Write Policies (contd)


Cache hit in a write-through cache

Figure 17.3a

Write Policies (contd)


Cache hit in a write-back cache


Write Policies (contd)


Write-back policy
Updates the memory copy when the cache copy is being replaced
We first write the cache copy back to update the memory copy

Number of write-backs can be reduced if we write back only when the cache copy differs from the memory copy
Done by associating a dirty bit or update bit
Write back only when the dirty bit is 1
Write-back caches thus require two bits
A valid bit
A dirty or update bit
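The bookkeeping above amounts to very little logic. A sketch (field names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* A write-back cache line carries a valid bit and a dirty bit
 * in addition to the tag and data. */
struct line { bool valid; bool dirty; uint32_t tag; };

/* Write hit: update only the cached copy and mark it dirty,
 * recording that it now differs from the memory copy. */
void write_hit(struct line *l) {
    l->dirty = true;
}

/* At eviction time, the block must be written back to memory
 * only if it is valid and dirty. */
bool needs_writeback(const struct line *l) {
    return l->valid && l->dirty;
}
```

A clean line (dirty bit 0) can simply be overwritten on replacement, with no memory traffic.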

Write Policies (contd)


Needed only in write-back caches


Write Policies (contd)


Other ways to reduce write traffic
Buffered writes
Especially useful for write-through policies
Writes to memory are buffered and written at a later time
Allows write combining
Catches multiple writes in the buffer itself

Example: Pentium
Uses a 32-byte write buffer
Buffer is written at several trigger points
An example trigger point: the buffer-full condition

Write Policies (contd)


Write-through versus write-back
Write-through
Advantage
Both cache and memory copies are consistent
Important in multiprocessor systems
Disadvantage
Tends to waste bus and memory bandwidth

Write-back
Advantage
Reduces write traffic to memory
Disadvantages
Takes longer to load new cache lines
Requires additional dirty bit

Space Overhead
The three mapping functions introduce different space overheads
Overhead decreases with increasing degree of associativity
Several examples in the text (4 GB address space, 32 KB cache)
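For instance, with 32-bit addresses (4 GB space) and a 32 KB direct-mapped cache, the tag width follows from the address partition; the 32-byte line size below is an assumed value for illustration, not one of the text's examples:

```c
/* Tag bits of a direct-mapped cache:
 * tag = address bits - index bits - offset bits. */
int tag_bits(int addr_bits, int cache_bytes, int line_bytes) {
    int offset_bits = 0, index_bits = 0;
    for (int b = line_bytes; b > 1; b >>= 1)
        offset_bits++;                      /* log2(line size) */
    for (int n = cache_bytes / line_bytes; n > 1; n >>= 1)
        index_bits++;                       /* log2(number of lines) */
    return addr_bits - index_bits - offset_bits;
}
```

Here 32 − 10 − 5 = 17 tag bits; with one valid bit that is 18 overhead bits per 256 data bits, about 7%.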


Types of Cache Misses


Three types
Compulsory misses
Due to first-time access to a block
Also called cold-start misses or compulsory line fills

Capacity misses
Induced due to cache capacity limitation
Can be avoided by increasing cache size

Conflict misses
Due to conflicts caused by direct and set-associative mappings
Can be completely eliminated by fully associative mapping
Also called collision misses

Types of Cache Misses (contd)


Compulsory misses
Reduced by increasing block size
We prefetch more
Cannot increase beyond a limit
Cache misses increase

Capacity misses
Reduced by increasing cache size
Law of diminishing returns

Conflict misses
Reduced by increasing degree of associativity
Fully associative mapping: no conflict misses

Types of Caches
Separate instruction and data caches
Initial cache designs used unified caches
Current trend is to use separate caches (for level 1)


Types of Caches (contd)


Several reasons for preferring separate caches
Locality tends to be stronger
Can use different designs for data and instruction caches
Instruction caches
Read only, dominant sequential access
No need for write policies
Can use a simple direct mapped cache implementation
Data caches
Can use a set-associative cache
Appropriate write policy can be implemented

Disadvantage
Rigid boundaries between data and instruction caches

Types of Caches (contd)


Number of cache levels
Most use two levels
Primary (level 1 or L1)
On-chip
Secondary (level 2 or L2)
Off-chip

Examples
Pentium
L1: 32 KB
L2: up to 2 MB
PowerPC
L1: 64 KB
L2: up to 1 MB

Types of Caches (contd)


Two-level caches work as follows:
First attempts to get data from the L1 cache
If present in L1, gets data from L1 cache (L1 cache hit)
If not, data must come from L2 cache or main memory (L1 cache miss)

In case of an L1 cache miss, tries to get data from the L2 cache
If data are in L2, gets data from L2 cache (L2 cache hit)
Data block is also written to the L1 cache
If not, data comes from main memory (L2 cache miss)
Main memory block is written into both L1 and L2 caches

Variations on this basic scheme are possible
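The flow above can be sketched as follows; the tiny per-level tables stand in for real tag stores, and the sizes are illustrative:

```c
#include <stdbool.h>

/* Toy model of a two-level cache: each level is a small
 * direct-mapped table of block numbers (illustrative only). */
enum { L1_SLOTS = 4, L2_SLOTS = 16 };
static bool l1_valid[L1_SLOTS]; static int l1_tag[L1_SLOTS];
static bool l2_valid[L2_SLOTS]; static int l2_tag[L2_SLOTS];

/* Returns 0 on an L1 hit, 1 on an L2 hit, 2 on an L2 miss
 * (fetch from main memory); fills the levels that missed. */
int read_block(int block) {
    int i1 = block % L1_SLOTS, i2 = block % L2_SLOTS;
    if (l1_valid[i1] && l1_tag[i1] == block)
        return 0;                               /* L1 hit */
    if (l2_valid[i2] && l2_tag[i2] == block) {  /* L2 hit */
        l1_valid[i1] = true; l1_tag[i1] = block;
        return 1;
    }
    /* L2 miss: fetch from memory, fill both levels */
    l1_valid[i1] = true; l1_tag[i1] = block;
    l2_valid[i2] = true; l2_tag[i2] = block;
    return 2;
}
```

Because L1 is smaller, a block can be evicted from L1 while it still resides in L2, which is exactly the case an L2 hit serves.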



Types of Caches (contd)

Virtual and physical caches


Example Implementations
We look at three processors
Pentium
PowerPC
MIPS

Pentium implementation
Two levels
L1 cache
Split cache design
Separate data and instruction caches
L2 cache
Unified cache design

Example Implementations (contd)


Pentium allows each page/memory region to have its own caching attributes
Uncacheable
All reads and writes go directly to the main memory
Useful for
Memory-mapped I/O devices
Large data structures that are read once
Write-only data structures

Write combining
Not cached
Writes are buffered to reduce access to main memory
Useful for video buffer frames

Example Implementations (contd)


Write-through
Uses write-through policy
Writes are delayed as they go through a write buffer, as in write-combining mode

Write back
Uses write-back policy
Writes are delayed as in the write-through mode

Write protected
Inhibits cache writes
Writes are done directly to memory

Example Implementations (contd)


Two bits in control register CR0 determine the mode
Cache disable (CD) bit
Not write-through (NW) bit
CD = 0, NW = 0: normal operation with write-back caching

Example Implementations (contd)


PowerPC cache implementation
Two levels
L1 cache
Split cache
Each: 32 KB, eight-way set associative
Uses pseudo-LRU replacement
Instruction cache: read-only
Data cache: read/write
Choice of write-through or write-back
L2 cache
Unified cache as in the Pentium
Two-way set associative

Example Implementations (contd)


Write policy type and caching attributes can be set by the OS at the block or page level

L2 cache requires only a single bit to implement LRU
Because it is two-way set associative

L1 cache implements a pseudo-LRU
Each set maintains seven PLRU bits (B0–B6)


Example Implementations (contd)


PowerPC placement policy (incl. PLRU)


Example Implementations (contd)


MIPS implementation
Two-level cache
L1 cache
Split organization
L1 line size: 16 or 32 bytes
Instruction cache
Virtual cache
Direct mapped
Read-only
Data cache
Virtual cache
Direct mapped
Uses write-back policy

Example Implementations (contd)


L2 cache
Physical cache
Either unified or split
Configured at boot time
Direct mapped
Uses write-back policy
Cache block size
16, 32, 64, or 128 bytes
Set at boot time
L1 cache line size ≤ L2 cache block size

Direct mapping simplifies replacement
No need for a complex LRU-type implementation

Cache Operation Summary


Various policies used by cache
Placement of a block
Direct mapping
Fully associative mapping
Set-associative mapping

Location of a block
Depends on the placement policy

Replacement policy
LRU is the most popular
Pseudo-LRU is often implemented

Write policy
Write-through
Write-back

Design Issues
Several design issues
Cache capacity
Law of diminishing returns
Cache line size/block size
Degree of associativity
Unified/split
Single/two-level
Write-through/write-back
Logical/physical


