
Computer Architecture: Memory Management

Memory, Paging, Segmentation, Virtual Memory, Caches


Computer Architecture WS 06/07 Dr.-Ing. Stefan Freinatis

Memory Hierarchy
The farther away from the CPU, the larger and slower the memory. The hierarchy is a consequence of locality.

Memory hierarchy levels in typical desktop / server computers, figure from [HP06 p.288]


Locality Principle
Programs tend to reuse data and instructions. Rule of thumb [HP06 p.38]:

A program spends 90% of its execution time in only 10% of the code.

Temporal locality: recently accessed items are likely to be accessed in the near future.
Spatial locality: items whose addresses are near one another tend to be referenced close together in time.

Locality Principle

Example of a memory-access trace of a process

Figure from [Sil00 p.327]

Caches
Cache: "a safe place for hiding or storing things"
Webster's Dictionary [HP06 p. C-1]

Here: fast memory that stores copies of data from the most frequently used main memory locations. Used by the CPU to reduce the average time to access memory. Effect: instructions in execution can proceed more quickly:

Instruction fetch is quicker
Memory operands are accessed quicker

(from the CPU's point of view)

Result: faster program execution and improved system performance.



Cached Memory Access


Steps in accessing memory (here: reading from memory), simplified:

1. CPU requests content from a memory location
2. Cache is checked for this datum
3. When present, deliver datum from cache
4. When not, transfer datum from main memory to cache
5. Then deliver from cache to CPU
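The read sequence above can be sketched in a few lines of Python (a toy model, not from the lecture: the cache is a dict of whole addresses, main memory a list):

```python
main_memory = [f"datum@{addr}" for addr in range(32)]  # toy main memory
cache = {}                                             # toy cache

def read(addr):
    if addr in cache:                  # 2. cache is checked for the datum
        return cache[addr]             # 3. hit: deliver from cache
    cache[addr] = main_memory[addr]    # 4. miss: transfer memory -> cache
    return cache[addr]                 # 5. then deliver from cache to CPU

print(read(5))   # miss on the first access: fetched from memory, now cached
print(read(5))   # hit on the second access: delivered from the cache
```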


Caches
To take advantage of spatial locality, a cache contains blocks of data rather than individual bytes. A block is a contiguous run of processor words; it is also called a cache line. Common block sizes: 8 ... 128 bytes. Whole blocks are transferred between main memory and cache (block transfer); the CPU reads and writes individual words (word transfer).

Cache components: data area, tag area, attribute area.

Data Area
All blocks in the cache make up the data area.

With B blocks (cache lines, numbered 0 ... B − 1) of N bytes each:

Cache capacity = B · N bytes

Tag Area
The block addresses of the cached blocks make up the tags¹ of the cache lines. All tags form the tag area.

¹ The statement is slightly simplified. In real caches, often just a fraction of the block address is used as tag.

Attribute Area
The attribute area contains attribute bits for each cache line.

Validity bit V: indicates whether the cache line holds valid data.
V = 1: data is valid
V = 0: data is invalid

Dirty bit D: indicates whether the cache line data is modified with respect to main memory.
D = 1: data is modified
D = 0: data is not modified

Caches
Each cache line plus its tag plus its attributes forms a slot.

Caches
How to find a certain byte in the cache?

The address generated by the CPU (m bits) is divided into two fields:
High-order bits (m − n bits): the block address
Low-order bits (n bits): the offset within that block

The block address is compared against all tags simultaneously. In case of a match (cache hit), the offset selects the byte.

Remark: CPU address space = 2^m bytes, cache line size (block size) = 2^n bytes.

Block Address
Memory can be considered as an array of blocks. Example with 4 bytes per block:

block address   memory address (decimal)   memory address (binary)
0               0                          000000
1               4                          000100
2               8                          001000
3               12                         001100
4               16                         010000
5               20                         010100
6               24                         011000
7               28                         011100
8               32                         100000
9               36                         100100

The block address should not be confused with the memory address at which the block starts. The block address is a block number:

block address = memory address DIV block size
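The relationship above can be checked with a short Python sketch (constants chosen to match the 4-byte blocks of the table; not from the lecture):

```python
BLOCK_SIZE = 4  # bytes per block, as in the table above

def block_address(addr):
    return addr // BLOCK_SIZE   # DIV: the block number

def block_offset(addr):
    return addr % BLOCK_SIZE    # MOD: byte position inside the block

# Because 4 = 2**2, DIV and MOD reduce to a shift and a mask:
assert block_address(36) == 36 >> 2 == 9
assert block_offset(23) == 23 & 0b11 == 3
```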


Caches

Cache mechanism: the CPU memory address is divided into block address and offset. The block address is fed to a comparator that checks it against all tags (of valid lines); the comparator signals hit or miss. On a hit, the offset selects the requested datum from the matching cache line (data out).

Hit Rate
Cache capacity is smaller than the capacity of main memory. Consequently, not all memory locations can be mirrored in the cache. When a required datum is found in the cache, we have a cache hit, otherwise a cache miss.

The hit rate is the fraction of cache accesses that result in a hit:

hit rate = (number of hits) / (number of memory accesses)

The miss rate is the fraction of cache accesses that result in a miss (miss rate = 1 − hit rate).

Amdahl's Law
Used to find the maximum expected improvement to an overall system when a part of the system is improved.
The law is a general law, not restricted to caches or computers.

I = 1 / ((1 − P) + P / S)

I: maximum expected improvement, I > 0 (usually I > 1)
P: proportion of the system improved, 0 ≤ P ≤ 1
S: speedup of that proportion, S > 0, usually S > 1

Amdahl's Law
Example: 30% of the computations can be made twice as fast. P = 0.3, S = 2.

I = 1 / ((1 − 0.3) + 0.3 / 2) = 1 / (0.7 + 0.15) ≈ 1.176

Amdahl's Law in the special case of parallelization:

I = 1 / (F + (1 − F) / N)

F: proportion of sequential calculations (no speedup possible), 0 ≤ F ≤ 1
N: grade of parallelism (e.g. N processors), N > 0

See lecture Advanced Computer Architecture.
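Both forms of the law can be written as small Python helpers (a sketch, not from the lecture; the function names are mine):

```python
def amdahl(P, S):
    """Maximum overall improvement when a proportion P is sped up by S."""
    return 1.0 / ((1.0 - P) + P / S)

def amdahl_parallel(F, N):
    """Special case: proportion F stays sequential, the rest runs on N processors."""
    return 1.0 / (F + (1.0 - F) / N)

print(round(amdahl(0.3, 2), 3))    # the example above: 1.176
print(amdahl_parallel(0.0, 8))     # fully parallelizable work on 8 CPUs: 8.0
```

Note that even with S = 2, the overall gain stays below 18% because 70% of the work is untouched; this is the essential message of the law.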

Caches

Typical memory hierarchy parameters:

Level                  Memory space   Access time
CPU (registers)        500 Byte       250 ps
Cache (SRAM)           64 kB          1 ns
Main memory (DRAM)     1 GB           100 ns
I/O devices (disks)    1 TB           10 ms

Example: assume cache access = 1 ns, main memory access = 100 ns, 90% hit rate. What is the overall improvement?

P = 0.9, S = 100 ns / 1 ns = 100

I = 1 / ((1 − 0.9) + 0.9 / 100) = 1 / 0.109 ≈ 9.17

Memory accesses (as seen by the CPU) are now more than 9 times as fast as without a cache.


Read Access
Reading from memory (improved scheme):

CPU requests datum
Search the cache while fetching the block from memory
Cache hit: deliver datum, discard the fetched block
Cache miss: put the block in the cache and deliver the datum

In case of a hit, the datum is available quickly. In case of a miss there is no benefit from the cache, but also no harm. Things are not that easy when writing to memory. Let's look at the cases of a write hit and a write miss.

Write Hit Policy


Assume a write hit. How to keep cache and main memory consistent on write accesses?

Write through
The datum is written to both the block in the cache and the block in memory (a write buffer between cache and memory can hide part of the write latency).
Cache always clean (no dirty bit required)
CPU write stall (problem reduced through the write buffer)
Main memory always has the most current copy (cache coherency in multi-processor systems)

Write back
The datum is only written to the cache (dirty bit is set). The modified block is written to main memory once it is evicted from the cache.
Write speed = cache speed
Multiple writes to the same block still result in only one write to memory
Less memory bandwidth needed

Write Miss Policy


Assume a write miss. What to do?

Write allocate
The block containing the referenced datum is transferred from main memory to the cache. Then one of the write hit policies is applied. Normally used with write-back caches.

No-write allocate
Write misses do not affect the cache. Instead the datum is modified only in main memory. Write hits, however, do affect the cache. Normally used with write-through caches.


Write Miss Policy


Assume an empty cache and the following sequence of memory operations.
WriteMem[100] WriteMem[100] ReadMem[200] WriteMem[200] WriteMem[100]

How many hits and misses occur when using no-write allocate versus write allocate?
                 No-write allocate   Write allocate
WriteMem[100]    miss                miss
WriteMem[100]    miss                hit
ReadMem[200]     miss                miss
WriteMem[200]    hit                 hit
WriteMem[100]    miss                hit
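The trace above can be replayed with a small simulator (a sketch, not from the lecture: a fully associative, unbounded cache, with addresses 100 and 200 assumed to lie in different blocks):

```python
def simulate(trace, write_allocate):
    """Count (hits, misses) of a trace of ('read'/'write', address) pairs."""
    cached, hits = set(), 0
    for op, addr in trace:
        if addr in cached:
            hits += 1                    # read hit or write hit
        elif op == "read" or write_allocate:
            cached.add(addr)             # miss: the block is brought in
        # no-write allocate: a write miss leaves the cache untouched
    return hits, len(trace) - hits

trace = [("write", 100), ("write", 100), ("read", 200),
         ("write", 200), ("write", 100)]

print(simulate(trace, write_allocate=False))  # (1, 4)
print(simulate(trace, write_allocate=True))   # (3, 2)
```

The counts reproduce the table: one hit for no-write allocate, three for write allocate.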

Caches

Where exactly are the blocks placed in the cache? → Cache Organization
What if the cache is full? → Replacement Strategies

Cache Organization
Where can a block be placed in the cache?

Direct Mapped
With this mapping scheme a memory block can be placed in only one particular slot. The slot number is calculated from
((memory address) DIV (blocksize)) MOD (slots in cache).

Fully Associative
The block can be placed in any slot.

Set Associative
The block can be placed in a restricted set of slots. A set is a group of slots. The block is first mapped onto the set and can then be placed anywhere within the set. The set number is calculated from
((memory address) DIV (block size)) MOD (number of sets in cache).
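The index calculations for the three schemes can be sketched in Python (hypothetical parameters, not from the lecture: 4-byte blocks, 8 slots grouped into 4 sets of 2):

```python
BLOCK_SIZE, NUM_SLOTS, NUM_SETS = 4, 8, 4  # hypothetical cache geometry

def direct_mapped_slot(addr):
    """Direct mapped: the block can go into exactly one slot."""
    return (addr // BLOCK_SIZE) % NUM_SLOTS

def set_associative_set(addr):
    """Set associative: the block can go anywhere within one set."""
    return (addr // BLOCK_SIZE) % NUM_SETS

# Fully associative needs no index calculation: any slot may hold the block.

print(direct_mapped_slot(36))   # block 9 -> slot 1
print(set_associative_set(36))  # block 9 -> set 1
```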

Direct Mapped
Each memory block is mapped to exactly one slot in the cache (many-to-one mapping).
Example: block size = 4 bytes, cache capacity = 4 × 4 = 16 bytes (4 slots). The memory blocks starting at addresses 0, 16, 32, ... map to slot 0; those at 4, 20, 36, ... to slot 1; and so on. If the target slot is already occupied (V = 1), the resident cache line is evicted.

Direct Mapped
Slot = ((memory address) DIV (block size)) MOD (slots in cache)
Offset within slot = (memory address) MOD (block size)

Examples
In which slot goes the block located at address 12D? 12 DIV 4 = 3, 3 MOD 4 = 3 → slot 3
In which slot goes the block located at address 20D? 20 DIV 4 = 5, 5 MOD 4 = 1 → slot 1
Where goes the byte located at address 23D? 23 DIV 4 = 5, 5 MOD 4 = 1, 23 MOD 4 = 3 → cache line (slot) 1, offset 3
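The worked examples can be checked directly (a sketch assuming the 4-byte blocks and 4-slot cache used above):

```python
BLOCK_SIZE, NUM_SLOTS = 4, 4   # the example cache: 4-byte blocks, 4 slots

def slot(addr):
    return (addr // BLOCK_SIZE) % NUM_SLOTS  # DIV then MOD

def offset_in_slot(addr):
    return addr % BLOCK_SIZE

print(slot(12))                        # 3
print(slot(20))                        # 1
print(slot(23), offset_in_slot(23))    # 1 3
```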

Direct Mapped
Extracting slot number and offset directly from memory address
The m-bit address is divided into tag, slot and offset fields; tag and slot together form the block address (m − n bits), the offset is n bits.

The lower bits of the block address select the slot. The size of the slot field depends on the number of slots: size = ld(number of slots), where ld = logarithmus dualis (base 2).

Example
Where goes the byte located at address 23D?

23D = 1 01 11B → tag = 1B, slot = 01B, offset = 11B

Slot 1, offset 3.

Direct Mapped
[Figure: 64 kByte direct-mapped cache using four-word (16 Byte) blocks. The 32-bit address (bit positions 31 ... 0) is split into a 16-bit tag (bits 16-31), a 12-bit slot index (bits 4-15) selecting one of 4K lines, a word offset (bits 2-3) and a byte offset (bits 0-1). Each of the 4K entries holds a valid bit, a 16-bit tag and 128 bits of data; a comparator produces the hit signal and a MUX selects the addressed 32-bit word. Figure from lecture CA WS05/06.]

Direct Mapped
Explanations for previous slide

Logical address space of the CPU: 2^32 byte. Number of cache slots: 64 kB / 16 Byte = 4K = 4096 slots. Bits 0 and 1 determine the position of the selected byte within a word; however, as the CPU uses 4-byte words as its smallest entity, the byte offset is not used. Bits 2 and 3 determine the position of the word within a cache line. Bits 4 to 15 (12 bits) determine the slot: 2^12 = 4K = number of slots. Bits 16 to 31 are compared against the tags to see whether or not the block is in the cache.
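The field extraction for this 64 kB cache can be sketched with shifts and masks (not from the lecture; the function name is mine):

```python
# 32-bit address = tag (16 bits) | slot (12 bits) | word (2 bits) | byte (2 bits)

def decode(addr):
    byte = addr & 0x3            # bits 0-1: byte within the word (unused here)
    word = (addr >> 2) & 0x3     # bits 2-3: word within the 16-byte line
    slot = (addr >> 4) & 0xFFF   # bits 4-15: one of the 4096 slots
    tag  = addr >> 16            # bits 16-31: compared against the stored tag
    return tag, slot, word, byte

print(decode(0x0001_2340))   # (1, 564, 0, 0)
```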
