
ECE 2300

Digital Logic & Computer Organization


Spring 2018

Caches

Lecture 20: 1
Announcements
• HW7 will be posted tonight

• Instructor OH cancelled today

• Lab sessions resume next week

Lecture 20: 2
Course Content
• Binary numbers and logic gates
• Boolean algebra and combinational logic
• Sequential logic and state machines
• Binary arithmetic
• Memories
• Instruction set architecture
• Processor organization
• Caches and virtual memory
• Input/output
• Advanced topics

Lecture 20: 3
Review: Pipelined Microprocessor
[Figure: pipelined datapath. PC with +2 adder and Inst RAM in IF; Decoder and register file (RF) in ID; ALU with operand muxes in EX; Data RAM in MEM; write-back mux in WB. Sign extend (SE) and control signals LD, MB, MD, MW, SA, SB, DR, PCJ/PCL are shown, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers between stages.]

Lecture 20: 4
Example: Data Hazards with Forwarding
• Assume HW forwarding and NO delay slot for load

• Identify all data hazards in the following instruction sequence by circling each source register that is read before the updated value is written back

    LW   R2, 0(R1)
X:  SW   R2, 4(R1)
    ADDI R3, R2, 1
    BEQ  R3, R1, X

Lecture 20: 5
We Need Fast and Large Memory
[Figure: the five pipeline stages IF, ID, EX, MEM, WB, with the Instruction RAM, register read, ALU, Data RAM, and register write-back, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers]

• Processor cycle time: ~300ps-2ns (~3GHz-500MHz)


• DRAM
– Slow (10-50 ns for a read or write)
– Cheap (1 transistor + capacitor per bit cell)
• SRAM
– Fast (100s of ps to a few ns for a read/write)
– Expensive (6 transistors per bit cell)
Lecture 20: 6
Using Caches in the Pipeline
[Figure: the same five-stage pipeline, with the Instruction RAM replaced by an Instruction Cache (SRAM) and the Data RAM replaced by a Data Cache (SRAM); both caches are backed by Main Memory (DRAM)]

Lecture 20: 7
Cache
• Small SRAM memory that permits rapid access to a subset of instructions or data
– If the data is in the cache (cache hit), we retrieve it without slowing down the pipeline
– If the data is not in the cache (cache miss), we retrieve it from the main memory (penalty incurred in accessing DRAM)

• The hit rate is the fraction of memory accesses found in the cache
– The miss rate is (1 – hit rate)

Lecture 20: 8
Memory Access with Cache
• Average memory access time with cache:
Hit time + Miss rate * Miss penalty

• Example
– Main memory access time = 50ns
– Cache hit time = 2ns
– Miss rate = 10%

Average mem access time w/o cache = 50ns

Average mem access time w/ cache = 2 + 0.1*50 = 7ns
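As a quick check of the formula, a minimal Python sketch (the function name and the second set of values are illustrative, not from the lecture):

    def amat(hit_time_ns, miss_rate, miss_penalty_ns):
        # AMAT = hit time + miss rate * miss penalty
        return hit_time_ns + miss_rate * miss_penalty_ns

    print(amat(2, 0.10, 50))   # 7.0 ns, matching the example above
    print(amat(2, 0.05, 50))   # 4.5 ns if the miss rate were halved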

Lecture 20: 9
Why Caches Work: Principle of Locality
• Temporal locality
– If memory location X is accessed, then it is likely to
be accessed again in the near future
• Caches exploit temporal locality by keeping a referenced
instruction or data in the cache

• Spatial locality
– If memory location X is accessed, then locations near
X are likely to be accessed in the near future
• Caches exploit spatial locality by bringing in a block of
instructions or data into the cache on a miss

Lecture 20: 10
Some Important Terms
• Cache is partitioned into blocks
– Each cache block (or cache line) typically contains multiple bytes of data
– A whole block is read or written during data transfer between the cache and main memory

• Each cache block is associated with a tag and a valid bit
– Tag: a unique ID that differentiates between the different memory blocks that may be mapped into the same cache block
– Valid bit: indicates whether the data in a cache block is valid (1) or not (0)

Lecture 20: 11
Direct Mapped Cache Concepts
• A given memory block is mapped to one and only one cache block

Example: a cache with 8 blocks; assume the main memory is 4 times larger than the cache (i.e., 32 blocks). Block addresses are in decimal.

Cache Block | Memory Blocks
     0      | 0, 8, 16, 24
     1      | 1, 9, 17, 25
     2      | 2, 10, 18, 26
     3      | 3, 11, 19, 27
     4      | 4, 12, 20, 28
     5      | 5, 13, 21, 29
     6      | 6, 14, 22, 30
     7      | 7, 15, 23, 31

Lecture 20: 12
Direct Mapped (DM) Cache Concepts
Same example, with block addresses in binary

[Figure: the 32 memory blocks; blocks 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101 are labeled, with arrows showing how memory blocks map onto the 8 cache blocks]

• Cache has 8 blocks and main memory has 32 blocks
• Block addresses are in binary
• 4 different memory blocks may be mapped to the same cache location
Lecture 20: 13

Address Translation for DM Cache

• Breakdown of an n-bit memory address for cache use (see the sketch below)

  [ (n-i-b) tag bits | i index bits | b byte offset bits ]

• DM cache parameters
– Size of each cache block is 2^b bytes
• "cache block" and "cache line" are synonymous
– Number of blocks is 2^i
– Total cache size is 2^b × 2^i = 2^(b+i) bytes
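A small Python sketch of this field extraction (the helper name is mine; b and i are example values chosen to match the next slide):

    B, I = 2, 10          # byte offset bits and index bits (example values)

    def split_address(addr):
        offset = addr & ((1 << B) - 1)            # low b bits
        index = (addr >> B) & ((1 << I) - 1)      # next i bits
        tag = addr >> (B + I)                     # remaining n-i-b bits
        return tag, index, offset

    print(split_address(0x12345678))   # tag=0x12345, index=0x19E, offset=0 (printed in decimal)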

Lecture 20: 14
DM Cache Organization
DM cache parameters for a 32-bit memory address
• 2 byte offset bits
• 10 index bits
• 20 tag bits
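With these parameters, the data capacity follows directly from the formula on the previous slide:

  Total cache data = 2^i blocks × 2^b bytes/block = 2^10 × 2^2 = 4096 bytes (4 KB)
  Per-block overhead (not counted above) = 20 tag bits + 1 valid bit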

Lecture 20: 15
Reading DM Cache
• Use the index bits to retrieve the tag, data, and valid bit

• Compare the tag from the address with the retrieved tag

• If valid & a match in tag (hit), select the desired data using the byte offset

• Otherwise (miss)
– Bring the memory block into the cache (also set valid)
– Store the tag from the address with the block
– Select the desired data using the byte offset (see the sketch below)
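A minimal, illustrative Python model of this read procedure (the class and method names are mine, not the course's code; 'memory' is assumed to be a simple bytearray backing store):

    class DirectMappedCache:
        def __init__(self, index_bits, offset_bits, memory):
            self.i, self.b = index_bits, offset_bits
            self.block_size = 1 << offset_bits
            # one line per index: a valid bit, a tag, and the block data
            self.lines = [{"valid": False, "tag": 0, "data": None}
                          for _ in range(1 << index_bits)]
            self.memory = memory

        def read_byte(self, addr):
            # returns (byte, hit?) for a read at address addr
            offset = addr & ((1 << self.b) - 1)
            index = (addr >> self.b) & ((1 << self.i) - 1)
            tag = addr >> (self.b + self.i)
            line = self.lines[index]
            if line["valid"] and line["tag"] == tag:              # hit
                return line["data"][offset], True
            # miss: bring the whole memory block in, set valid, store the tag
            base = addr & ~(self.block_size - 1)
            line.update(valid=True, tag=tag,
                        data=bytearray(self.memory[base:base + self.block_size]))
            return line["data"][offset], False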

Lecture 20: 16
Writing DM Cache
• Use the index bits to retrieve the tag and valid bit

• Compare the tag from the address with the retrieved tag

• If valid & a match in tag (hit), write the data into the cache location

• Otherwise (miss), one option
– Bring the memory block into the cache (also set valid)
– Store the tag from the address with the block
– Write the data into the cache location (see the sketch below)
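A sketch of that miss option (write-allocate), reusing the DirectMappedCache fields from the read sketch; how and when main memory itself is updated (write-through vs. write-back) is not modeled here:

    def write_byte(cache, addr, value):
        offset = addr & ((1 << cache.b) - 1)
        index = (addr >> cache.b) & ((1 << cache.i) - 1)
        tag = addr >> (cache.b + cache.i)
        line = cache.lines[index]
        if not (line["valid"] and line["tag"] == tag):            # miss: allocate the block
            base = addr & ~(cache.block_size - 1)
            line.update(valid=True, tag=tag,
                        data=bytearray(cache.memory[base:base + cache.block_size]))
        line["data"][offset] = value                               # write into the cache line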

Lecture 20: 17
Direct Mapped Cache Example
• Size of each block is 4 bytes
• Cache holds 4 blocks
• Memory holds 16 blocks
• Memory address has 6 bits
Address breakdown: 2 tag bits | 2 index bits | 2 byte offset bits

Cache (initially empty): four lines with index 00, 01, 10, 11, each holding a V bit, a tag, and one 4-byte block of data

Lecture 20: 18
Direct Mapped Cache Example
Access 1: R1 <= M[000000]  ->  miss (index 00 is invalid)

Cache before the access:
  00: V=0
  01: V=0
  10: V=0
  11: V=0

Registers R0-R3: all empty

Memory: block addresses 0000-1111 (binary) hold the data values 100, 110, 120, ..., 250 (decimal)

Lecture 20: 19
Direct Mapped Cache Example
After access 1 (miss): memory block 0000 has been brought into the cache

Cache:
  00: V=1, tag=00, data=100
  01: V=0
  10: V=0
  11: V=0

Registers: R1 = 100

Lecture 20: 20
Direct Mapped Cache Example
Access 2: R2 <= M[000100]  ->  miss (index 01 is invalid)

Cache:
  00: V=1, tag=00, data=100
  01: V=0
  10: V=0
  11: V=0

Registers: R1 = 100

Lecture 20: 21
Direct Mapped Cache Example
After access 2 (miss): memory block 0001 has been brought into the cache

Cache:
  00: V=1, tag=00, data=100
  01: V=1, tag=00, data=110
  10: V=0
  11: V=0

Registers: R1 = 100, R2 = 110

Lecture 20: 22
Direct Mapped Cache Example
Access 3: R3 <= M[010000]  ->  miss (index 00 holds tag 00, but this address needs tag 01)

Cache:
  00: V=1, tag=00, data=100
  01: V=1, tag=00, data=110
  10: V=0
  11: V=0

Registers: R1 = 100, R2 = 110

Lecture 20: 23
Direct Mapped Cache Example
After access 3 (miss): memory block 0100 replaces block 0000 at index 00

Cache:
  00: V=1, tag=01, data=140
  01: V=1, tag=00, data=110
  10: V=0
  11: V=0

Registers: R1 = 100, R2 = 110, R3 = 140

Lecture 20: 24
Direct Mapped Cache Example
Access 4: R2 <= M[011100]  ->  miss (index 11 is invalid)

Cache:
  00: V=1, tag=01, data=140
  01: V=1, tag=00, data=110
  10: V=0
  11: V=0

Registers: R1 = 100, R2 = 110, R3 = 140

Lecture 20: 25
Direct Mapped Cache Example
After access 4 (miss): memory block 0111 has been brought into the cache

Cache:
  00: V=1, tag=01, data=140
  01: V=1, tag=00, data=110
  10: V=0
  11: V=1, tag=01, data=170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 26
Direct Mapped Cache Example
Access 5: R1 <= M[000000]  ->  miss (index 00 now holds tag 01, but this address needs tag 00)

Cache:
  00: V=1, tag=01, data=140
  01: V=1, tag=00, data=110
  10: V=0
  11: V=1, tag=01, data=170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 27
Direct Mapped Cache Example
After access 5 (miss): memory block 0000 replaces block 0100 at index 00

Cache:
  00: V=1, tag=00, data=100
  01: V=1, tag=00, data=110
  10: V=0
  11: V=1, tag=01, data=170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 28
Direct Mapped Cache Example
Access 6: R1 <= M[000100]  ->  hit (index 01 is valid and its tag matches)

Cache:
  00: V=1, tag=00, data=100
  01: V=1, tag=00, data=110
  10: V=0
  11: V=1, tag=01, data=170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 29
Direct Mapped Cache Example
After access 6 (hit): R1 is loaded directly from the cache

Cache (unchanged):
  00: V=1, tag=00, data=100
  01: V=1, tag=00, data=110
  10: V=0
  11: V=1, tag=01, data=170

Registers: R1 = 110, R2 = 170, R3 = 140

Lecture 20: 30
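Taken together, the trace with 4-byte blocks is five misses and one hit. As an illustrative check, the DirectMappedCache sketch from the "Reading DM Cache" slide replays it (2 index bits, 2 offset bits; the memory contents do not affect hit/miss):

    mem = bytearray(64)                      # toy 64-byte backing memory
    c = DirectMappedCache(index_bits=2, offset_bits=2, memory=mem)
    trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
    print(["hit" if c.read_byte(a)[1] else "miss" for a in trace])
    # ['miss', 'miss', 'miss', 'miss', 'miss', 'hit']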
Doubling the Block Size
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
• Memory address has 6 bits
Address breakdown: 2 tag bits | 1 index bit | 3 byte offset bits

Cache (initially empty): two lines, index 0 and 1, each holding a V bit, a tag, and one 8-byte block of data

Lecture 20: 31
Doubling the Block Size
Access 1: R1 <= M[000000]  ->  miss (index 0 is invalid)

Cache:
  0: V=0
  1: V=0

Registers R0-R3: all empty

Memory: block addresses 000-111 (binary), each 8-byte block holding two words: 100,110 / 120,130 / 140,150 / 160,170 / 180,190 / 200,210 / 220,230 / 240,250

Lecture 20: 32
Doubling the Block Size
After access 1 (miss): the whole 8-byte block 000 has been brought into the cache

Cache:
  0: V=1, tag=00, data = 100, 110 (words at byte offsets 0 and 4)
  1: V=0

Registers: R1 = 100

Lecture 20: 33
Doubling the Block Size
Access 2: R2 <= M[000100]  ->  hit (same block as M[000000], already in the cache)

Cache:
  0: V=1, tag=00, data = 100, 110
  1: V=0

Registers: R1 = 100

Lecture 20: 34
Doubling the Block Size
After access 2 (hit): R2 is loaded directly from the cache

Cache (unchanged):
  0: V=1, tag=00, data = 100, 110
  1: V=0

Registers: R1 = 100, R2 = 110

Lecture 20: 35
Doubling the Block Size
Access 3: R3 <= M[010000]  ->  miss (index 0 holds tag 00, but this address needs tag 01)

Cache:
  0: V=1, tag=00, data = 100, 110
  1: V=0

Registers: R1 = 100, R2 = 110

Lecture 20: 36
Doubling the Block Size
After access 3 (miss): block 010 replaces block 000 at index 0

Cache:
  0: V=1, tag=01, data = 140, 150
  1: V=0

Registers: R1 = 100, R2 = 110, R3 = 140

Lecture 20: 37
Doubling the Block Size
Access 4: R2 <= M[011100]  ->  miss (index 1 is invalid)

Cache:
  0: V=1, tag=01, data = 140, 150
  1: V=0

Registers: R1 = 100, R2 = 110, R3 = 140

Lecture 20: 38
Doubling the Block Size
After access 4 (miss): block 011 has been brought into the cache

Cache:
  0: V=1, tag=01, data = 140, 150
  1: V=1, tag=01, data = 160, 170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 39
Doubling the Block Size
Access 5: R1 <= M[000000]  ->  miss (index 0 now holds tag 01, but this address needs tag 00)

Cache:
  0: V=1, tag=01, data = 140, 150
  1: V=1, tag=01, data = 160, 170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 40
Doubling the Block Size
After access 5 (miss): block 000 replaces block 010 at index 0

Cache:
  0: V=1, tag=00, data = 100, 110
  1: V=1, tag=01, data = 160, 170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 41
Doubling the Block Size
Access 6: R1 <= M[000100]  ->  hit (index 0 is valid and its tag matches)

Cache:
  0: V=1, tag=00, data = 100, 110
  1: V=1, tag=01, data = 160, 170

Registers: R1 = 100, R2 = 170, R3 = 140

Lecture 20: 42
Doubling the Block Size
After access 6 (hit): R1 is loaded directly from the cache

Cache (unchanged):
  0: V=1, tag=00, data = 100, 110
  1: V=1, tag=01, data = 160, 170

Registers: R1 = 110, R2 = 170, R3 = 140

Lecture 20: 43
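With 8-byte blocks the same trace gets two hits instead of one, showing the benefit of spatial locality. The illustrative replay, now with 1 index bit and 3 offset bits:

    mem = bytearray(64)
    c = DirectMappedCache(index_bits=1, offset_bits=3, memory=mem)
    trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
    print(["hit" if c.read_byte(a)[1] else "miss" for a in trace])
    # ['miss', 'hit', 'miss', 'miss', 'miss', 'hit']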
Block Size Considerations
• Larger blocks may reduce the miss rate due to spatial locality

• But in a fixed-size cache
– Larger blocks => fewer of them => increased miss rate due to conflicts
– Larger blocks => data fetched along with the requested data may not be used

• Larger blocks increase the miss penalty
– It takes longer to transfer a larger block from memory (see the illustration below)
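A purely hypothetical illustration of the tradeoff using the AMAT formula from slide 9 (these numbers are made up, not from the lecture):

  Smaller blocks: AMAT = 2 ns + 0.10 × 50 ns = 7.0 ns
  Larger blocks:  AMAT = 2 ns + 0.07 × 60 ns = 6.2 ns   (lower miss rate, but higher miss penalty)

Whichever effect dominates for a given workload decides whether the larger block is a win.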

Lecture 20: 44
Next Time

More Caches

Lecture 20: 45
