
Cache Memory

Chapter 17
S. Dandamudi

Outline

Introduction
How cache memory works
Why cache memory works
Cache design basics
Mapping function
Direct mapping
Associative mapping
Set-associative mapping

Types of cache misses


Types of caches
Example implementations
Pentium
PowerPC
MIPS

Cache operation summary


Design issues
Cache capacity
Cache line size
Degree of associativity

Replacement policies
Write policies
Space overhead
2003

S. Dandamudi

Chapter 17: Page 2

To be used with S. Dandamudi, Fundamentals of Computer Organization and Design, Springer, 2003.

Introduction
Memory hierarchy

Registers
Memory
Disk

Cache memory is a small amount of fast memory


Placed between two levels of memory hierarchy
To bridge the gap in access times
Between processor and main memory (our focus)
Between main memory and disk (disk cache)

Expected to behave like a large amount of fast memory



Introduction (contd)


How Cache Memory Works


Prefetch data into cache before the processor needs it
Need to predict processor future access requirements
Not difficult owing to locality of reference

Important terms

Miss penalty
Hit ratio
Miss ratio = (1 − hit ratio)
Hit time
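These terms combine into the effective (average) access time: every access pays the hit time, and a miss additionally pays the miss penalty. A minimal sketch of the standard formula (the numbers below are illustrative, not from the text):

```c
/* Effective (average) access time of a cache:
 * t_avg = hit_time + miss_ratio * miss_penalty.
 * Times are in arbitrary units (e.g., nanoseconds). */
double effective_access_time(double hit_time,
                             double miss_penalty,
                             double hit_ratio) {
    double miss_ratio = 1.0 - hit_ratio;   /* miss ratio = (1 - hit ratio) */
    return hit_time + miss_ratio * miss_penalty;
}
```

For example, a 1 ns hit time, 100 ns miss penalty, and 95% hit ratio give an effective access time of 6 ns.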


How Cache Memory Works (contd)

Cache read operation


How Cache Memory Works (contd)

Cache write operation


Why Cache Memory Works


Example
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        X[i][j] = X[i][j] + K;
Each element of X is double (eight bytes)
Loop is executed (M*N) times
Placing the code in cache avoids access to main memory
Repetitive use (one of the factors)
Temporal locality
Prefetching data
Spatial locality
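The role of spatial locality shows up when the same matrix is traversed in the two possible orders. A sketch (M and N are small illustrative sizes, not the 500-to-1000 element matrices used in the timing comparison):

```c
#define M 8
#define N 8

/* Row-order traversal: the inner loop walks consecutive doubles,
 * so every 8-byte element of a fetched cache line is used before
 * the line is evicted (spatial locality). */
void add_row_order(double X[M][N], double K) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            X[i][j] = X[i][j] + K;
}

/* Column-order traversal: consecutive accesses are N*8 bytes apart,
 * so each access may land in a different cache line, wasting most
 * of every line fetched. */
void add_col_order(double X[M][N], double K) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            X[i][j] = X[i][j] + K;
}
```

Both functions compute the same result; only the memory access order differs, which is exactly what the timing comparison measures.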

How Cache Memory Works (contd)


[Figure: execution time (ms) versus matrix size (500 to 1000) for row-order and column-order traversal; the column-order version is markedly slower.]


Cache Design Basics


On every read miss
A fixed number of bytes are transferred
More than what the processor needs
Effective due to spatial locality

Cache is divided into blocks of B bytes


b bits are needed as the offset into the block
b = log2 B
Blocks are called cache lines

Main memory is also divided into blocks of the same size

Address is divided into two parts

Cache Design Basics (contd)


B = 4 bytes
b = 2 bits
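For this B = 4, b = 2 case, the two address parts fall out with a mask and a shift (a minimal sketch):

```c
#include <stdint.h>

enum { B = 4, BLOCK_BITS = 2 };   /* b = log2(B) = 2 */

/* Low b bits: byte offset within the block. */
uint32_t byte_offset(uint32_t addr)  { return addr & (B - 1); }

/* Remaining high bits: the block number. */
uint32_t block_number(uint32_t addr) { return addr >> BLOCK_BITS; }
```

For example, address 13 (binary 1101) is byte 1 of block 3.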


Cache Design Basics (contd)


Transfer between main memory and cache
In units of blocks
Implements spatial locality

Transfer between cache and processor
In units of words

Need policies for

Block placement
Mapping function
Block replacement
Write policies

Cache Design Basics (contd)


Read cycle operations


Mapping Function
Determines how memory blocks are mapped to cache lines
Three types
Direct mapping
Specifies a single cache line for each memory block

Set-associative mapping
Specifies a set of cache lines for each memory block

Associative mapping
No restrictions
Any cache line can be used for any memory block


Mapping Function (contd)


Direct mapping example


Mapping Function (contd)


Implementing direct mapping
Easier than the other two
Maintains three pieces of information
Cache data
Actual data
Cache tag
Problem: more memory blocks than cache lines
Several memory blocks map to the same cache line
Tag stores the address of the memory block in the cache line
Valid bit
Indicates if cache line contains a valid block
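The three pieces of information combine into a lookup like the following sketch; the line count and block size here are illustrative, not from the text:

```c
#include <stdint.h>
#include <stdbool.h>

/* Direct-mapped cache sketch: 4-byte blocks, 8 lines. */
enum { BLOCK_BITS = 2, LINE_BITS = 3, LINES = 1 << LINE_BITS };

struct line { bool valid; uint32_t tag; uint8_t data[4]; };

static struct line cache[LINES];

/* Address layout: | tag | line index | block offset | */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> BLOCK_BITS) & (LINES - 1);
    uint32_t tag   =  addr >> (BLOCK_BITS + LINE_BITS);
    return cache[index].valid && cache[index].tag == tag;
}

/* On a miss, install the block in its (single) mapped line. */
void fill(uint32_t addr) {
    uint32_t index = (addr >> BLOCK_BITS) & (LINES - 1);
    cache[index].valid = true;
    cache[index].tag   = addr >> (BLOCK_BITS + LINE_BITS);
}
```

Note that two addresses with the same index but different tags evict each other, which is the source of the conflict misses discussed later.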

Mapping Function (contd)


Mapping Function (contd)

Direct mapping
Reference pattern:
0, 4, 0, 8, 0, 8,
0, 4, 0, 4, 0, 4
Hit ratio = 0%
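The 0% figure can be checked by simulation, assuming a small direct-mapped cache (four lines here) in which blocks 0, 4, and 8 all map to line 0:

```c
#include <stdbool.h>

enum { NLINES = 4 };   /* assumed cache size, chosen so 0, 4, 8 conflict */

/* Simulate a direct-mapped cache on a sequence of block
 * references; returns the number of hits. */
int count_hits(const int *blocks, int n) {
    bool valid[NLINES] = { false };
    int tag[NLINES] = { 0 };
    int hits = 0;
    for (int i = 0; i < n; i++) {
        int line = blocks[i] % NLINES;     /* direct-mapped index */
        if (valid[line] && tag[line] == blocks[i]) {
            hits++;                        /* block already cached */
        } else {
            valid[line] = true;            /* miss: install, evicting */
            tag[line] = blocks[i];         /* whatever was there */
        }
    }
    return hits;
}
```

The same routine also reproduces the 67% hit ratio of the pattern 0, 7, 9, 10, ... (8 hits in 12 references), since those four blocks map to four different lines.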


Mapping Function (contd)

Direct mapping
Reference pattern:
0, 7, 9, 10, 0, 7,
9, 10, 0, 7, 9, 10
Hit ratio = 67%


Mapping Function (contd)


Associative mapping


Mapping Function (contd)

Associative
mapping
Reference pattern:
0, 4, 0, 8, 0, 8,
0, 4, 0, 4, 0, 4

Hit ratio = 75%


Mapping Function (contd)

Address match logic for associative mapping


Mapping Function (contd)


Associative cache with address match logic


Mapping Function (contd)


Set-associative mapping


Mapping Function (contd)


Address partition in set-associative mapping
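The set-associative partition adds a set index between the tag and the offset, and the lookup compares tags in every way of the indexed set. A sketch with illustrative sizes (2-way, four sets, 4-byte blocks):

```c
#include <stdint.h>
#include <stdbool.h>

enum { BLOCK_BITS = 2, SET_BITS = 2, SETS = 1 << SET_BITS, WAYS = 2 };

struct way { bool valid; uint32_t tag; };
static struct way cache[SETS][WAYS];

/* Address layout: | tag | set index | block offset | */
bool lookup(uint32_t addr) {
    uint32_t set = (addr >> BLOCK_BITS) & (SETS - 1);
    uint32_t tag =  addr >> (BLOCK_BITS + SET_BITS);
    for (int w = 0; w < WAYS; w++)         /* compare tags in all ways */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;
    return false;
}

/* Install a block into a chosen way of its set (way selection
 * would normally be made by the replacement policy). */
void fill(uint32_t addr, int way) {
    uint32_t set = (addr >> BLOCK_BITS) & (SETS - 1);
    cache[set][way].valid = true;
    cache[set][way].tag   = addr >> (BLOCK_BITS + SET_BITS);
}
```

Unlike direct mapping, two blocks that map to the same set can now be cached at the same time, one per way.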


Mapping Function (contd)

Set-associative
mapping
Reference pattern:
0, 4, 0, 8, 0, 8,
0, 4, 0, 4, 0, 4
Hit ratio = 67%

Replacement Policies
We invoke the replacement policy when there is no place in the cache to load the memory block

Depends on the actual placement policy in effect


Direct mapping does not need a special replacement policy
Replace the mapped cache line

Several policies for the other two mapping functions


Popular: LRU (least recently used)
Random replacement
Of less interest: FIFO, LFU

Replacement Policies (contd)


LRU
Expensive to implement
Particularly for set sizes more than four

Implementations resort to approximation


Pseudo-LRU
Partitions each set into two groups
Keeps track of which group was accessed more recently
Each such decision requires only one bit
In total requires only (W−1) bits (W = degree of associativity)
PowerPC is an example
Details later
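The (W−1)-bit scheme can be sketched as a binary tree of bits for an eight-way set (seven bits, matching the PowerPC example discussed later); the layout below is one common tree-PLRU convention, not necessarily the PowerPC's exact encoding:

```c
#include <stdbool.h>

/* Tree pseudo-LRU for one 8-way set: 7 bits arranged as a
 * binary tree. Node 0 is the root; the children of node n
 * are nodes 2n+1 and 2n+2; leaves correspond to ways 0..7. */
enum { WAYS = 8 };
static bool plru[WAYS - 1];

/* On an access to `way`, set each bit on the root-to-leaf
 * path to point AWAY from the subtree just used. */
void touch(int way) {
    int node = 0;
    for (int level = 0; level < 3; level++) {
        int bit = (way >> (2 - level)) & 1;  /* branch taken here */
        plru[node] = !bit;                   /* victim is elsewhere */
        node = 2 * node + 1 + bit;
    }
}

/* Follow the bits from the root to find the victim way. */
int victim(void) {
    int node = 0;
    for (int level = 0; level < 3; level++)
        node = 2 * node + 1 + plru[node];
    return node - (WAYS - 1);                /* leaf index -> way */
}
```

This only approximates true LRU, but each access updates just three bits, versus the full ordering information true LRU would need for eight ways.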

Replacement Policies (contd)


Pseudo-LRU implementation


Write Policies
Memory write requires special attention
We have two copies
A memory copy
A cached copy

Write policy determines how a memory write operation is handled
Two policies
Write-through
Update both copies
Write-back
Update only the cached copy
The memory copy must be updated later

Write Policies (contd)


Cache hit in a write-through cache

Figure 17.3a

Write Policies (contd)


Cache hit in a write-back cache


Write Policies (contd)


Write-back policy
Updates the memory copy when the cache copy is being replaced
We first write the cache copy back to update the memory copy

Number of write-backs can be reduced if we write back only when the cache copy differs from the memory copy
Done by associating a dirty bit or update bit
Write back only when the dirty bit is 1
Write-back caches thus require two bits
A valid bit
A dirty or update bit
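The bookkeeping above amounts to very little logic. A sketch (field names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* A write-back cache line carries a valid bit and a dirty bit
 * in addition to the tag and data. */
struct line { bool valid; bool dirty; uint32_t tag; };

/* Write hit: update only the cached copy and mark it dirty,
 * recording that it now differs from the memory copy. */
void write_hit(struct line *l) {
    l->dirty = true;
}

/* At eviction time, the block must be written back to memory
 * only if it is valid and dirty. */
bool needs_writeback(const struct line *l) {
    return l->valid && l->dirty;
}
```

A clean line (dirty bit 0) can simply be overwritten on replacement, with no memory traffic.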

Write Policies (contd)


Needed only in write-back caches


Write Policies (contd)


Other ways to reduce write traffic
Buffered writes
Especially useful for write-through policies
Writes to memory are buffered and written at a later time
Allows write combining
Catches multiple writes in the buffer itself

Example: Pentium
Uses a 32-byte write buffer
Buffer is written at several trigger points
An example trigger point: the buffer-full condition

Write Policies (contd)


Write-through versus write-back
Write-through
Advantage
Both cache and memory copies are consistent
Important in multiprocessor systems
Disadvantage
Tends to waste bus and memory bandwidth

Write-back
Advantage
Reduces write traffic to memory
Disadvantages
Takes longer to load new cache lines
Requires additional dirty bit

Space Overhead
The three mapping functions introduce different space overheads
Overhead decreases with increasing degree of associativity
Several examples in the text (4 GB address space, 32 KB cache)
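For instance, with 32-bit addresses (4 GB space) and a 32 KB direct-mapped cache, the tag width follows from the address partition; the 32-byte line size below is an assumed value for illustration, not one of the text's examples:

```c
/* Tag bits of a direct-mapped cache:
 * tag = address bits - index bits - offset bits. */
int tag_bits(int addr_bits, int cache_bytes, int line_bytes) {
    int offset_bits = 0, index_bits = 0;
    for (int b = line_bytes; b > 1; b >>= 1)
        offset_bits++;                      /* log2(line size) */
    for (int n = cache_bytes / line_bytes; n > 1; n >>= 1)
        index_bits++;                       /* log2(number of lines) */
    return addr_bits - index_bits - offset_bits;
}
```

Here 32 − 10 − 5 = 17 tag bits; with one valid bit that is 18 overhead bits per 256 data bits, about 7%.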


Types of Cache Misses


Three types
Compulsory misses
Due to first-time access to a block
Also called cold-start misses or compulsory line fills

Capacity misses
Induced due to cache capacity limitation
Can be avoided by increasing cache size

Conflict misses
Due to conflicts caused by direct and set-associative mappings
Can be completely eliminated by fully associative mapping
Also called collision misses

Types of Cache Misses (contd)


Compulsory misses
Reduced by increasing block size
We prefetch more
Cannot increase beyond a limit
Cache misses increase

Capacity misses
Reduced by increasing cache size
Law of diminishing returns

Conflict misses
Reduced by increasing degree of associativity
Fully associative mapping: no conflict misses

Types of Caches
Separate instruction and data caches
Initial cache designs used unified caches
Current trend is to use separate caches (for level 1)


Types of Caches (contd)


Several reasons for preferring separate caches
Locality tends to be stronger
Can use different designs for data and instruction caches
Instruction caches
Read only, dominant sequential access
No need for write policies
Can use a simple direct mapped cache implementation
Data caches
Can use a set-associative cache
Appropriate write policy can be implemented

Disadvantage
Rigid boundaries between data and instruction caches

Types of Caches (contd)


Number of cache levels
Most use two levels
Primary (level 1 or L1)
On-chip
Secondary (level 2 or L2)
Off-chip

Examples
Pentium
L1: 32 KB
L2: up to 2 MB
PowerPC
L1: 64 KB
L2: up to 1 MB

Types of Caches (contd)


Two-level caches work as follows:
First attempts to get data from the L1 cache
If present in L1, gets data from L1 cache (L1 cache hit)
If not, data must come from L2 cache or main memory (L1 cache miss)

In case of an L1 cache miss, tries to get data from the L2 cache
If data are in L2, gets data from L2 cache (L2 cache hit)
Data block is also written to the L1 cache
If not, data comes from main memory (L2 cache miss)
Main memory block is written into both L1 and L2 caches

Variations on this basic scheme are possible
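The flow above can be sketched as follows; the tiny per-level tables stand in for real tag stores, and the sizes are illustrative:

```c
#include <stdbool.h>

/* Toy model of a two-level cache: each level is a small
 * direct-mapped table of block numbers (illustrative only). */
enum { L1_SLOTS = 4, L2_SLOTS = 16 };
static bool l1_valid[L1_SLOTS]; static int l1_tag[L1_SLOTS];
static bool l2_valid[L2_SLOTS]; static int l2_tag[L2_SLOTS];

/* Returns 0 on an L1 hit, 1 on an L2 hit, 2 on an L2 miss
 * (fetch from main memory); fills the levels that missed. */
int read_block(int block) {
    int i1 = block % L1_SLOTS, i2 = block % L2_SLOTS;
    if (l1_valid[i1] && l1_tag[i1] == block)
        return 0;                               /* L1 hit */
    if (l2_valid[i2] && l2_tag[i2] == block) {  /* L2 hit */
        l1_valid[i1] = true; l1_tag[i1] = block;
        return 1;
    }
    /* L2 miss: fetch from memory, fill both levels */
    l1_valid[i1] = true; l1_tag[i1] = block;
    l2_valid[i2] = true; l2_tag[i2] = block;
    return 2;
}
```

Because L1 is smaller, a block can be evicted from L1 while it still resides in L2, which is exactly the case an L2 hit serves.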



Types of Caches (contd)

Virtual and physical caches


Example Implementations
We look at three processors
Pentium
PowerPC
MIPS

Pentium implementation
Two levels
L1 cache
Split cache design
Separate data and instruction caches
L2 cache
Unified cache design

Example Implementations (contd)


Pentium allows each page/memory region to have its own caching attributes
Uncacheable
All reads and writes go directly to the main memory
Useful for
Memory-mapped I/O devices
Large data structures that are read once
Write-only data structures

Write combining
Not cached
Writes are buffered to reduce access to main memory
Useful for video buffer frames

Example Implementations (contd)


Write-through
Uses write-through policy
Writes are delayed as they go through a write buffer, as in write-combining mode

Write back
Uses write-back policy
Writes are delayed as in the write-through mode

Write protected
Inhibits cache writes
Writes are done directly to memory

Example Implementations (contd)


Two bits in control register CR0 determine the mode
Cache disable (CD) bit
Not write-through (NW) bit
CD = 0, NW = 0: normal operation with write-back caching

Example Implementations (contd)


PowerPC cache implementation
Two levels
L1 cache
Split cache
Each: 32 KB, eight-way set associative
Uses pseudo-LRU replacement
Instruction cache: read-only
Data cache: read/write
Choice of write-through or write-back
L2 cache
Unified cache as in the Pentium
Two-way set associative

Example Implementations (contd)


Write policy type and caching attributes can be set by the OS at the block or page level

L2 cache requires only a single bit to implement LRU
Because it is two-way set associative

L1 cache implements a pseudo-LRU
Each set maintains seven PLRU bits (B0–B6)


Example Implementations (contd)


PowerPC placement policy (incl. PLRU)


Example Implementations (contd)


MIPS implementation
Two-level cache
L1 cache
Split organization
L1 line size: 16 or 32 bytes
Instruction cache
Virtual cache
Direct mapped
Read-only
Data cache
Virtual cache
Direct mapped
Uses write-back policy

Example Implementations (contd)


L2 cache
Physical cache
Either unified or split
Configured at boot time
Direct mapped
Uses write-back policy
Cache block size
16, 32, 64, or 128 bytes
Set at boot time
L1 cache line size ≤ L2 cache block size

Direct mapping simplifies replacement
No need for a complex LRU-type implementation

Cache Operation Summary


Various policies used by cache
Placement of a block
Direct mapping
Fully associative mapping
Set-associative mapping

Location of a block
Depends on the placement policy

Replacement policy
LRU is the most popular
Pseudo-LRU is often implemented

Write policy
Write-through
Write-back

Design Issues
Several design issues
Cache capacity
Law of diminishing returns
Cache line size/block size
Degree of associativity
Unified/split
Single/two-level
Write-through/write-back
Logical/physical


