CSCE430/830
[Figure: virtual memory overview — a memory map translates individual virtual pages into frames in physical memory or locations on disk.]
Memory: Virtual Memory
The size of frames/pages is defined by the hardware (a power of 2, to ease address calculations).
An address is determined by:
page number (index into the page table) + offset
---> mapping into --->
frame base address (from the page table) + offset.
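Because the page size is a power of 2, this split is just a shift and a mask. A minimal sketch in Python, assuming a hypothetical 4 KiB page size and a made-up page table:

```python
# Hypothetical 4 KiB pages: the split into page number and offset is a
# shift and a mask -- no division needed, which is why hardware uses
# power-of-2 page sizes.
PAGE_SIZE = 4096                           # 2**12, assumed for illustration
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12

def translate(virtual_addr, page_table):
    page_number = virtual_addr >> OFFSET_BITS      # index into the table
    offset = virtual_addr & (PAGE_SIZE - 1)        # low 12 bits
    base = page_table[page_number] << OFFSET_BITS  # frame base address
    return base + offset

# Made-up page table: page 0 -> frame 3, page 1 -> frame 7
table = {0: 3, 1: 7}
print(hex(translate(0x1ABC, table)))   # page 1, offset 0xABC -> 0x7ABC
```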
[Figure: paging example — a 16-byte logical memory holding bytes a–p on four 4-byte pages is mapped through a page table (page 0 → frame 5, page 1 → frame 6, page 2 → frame 1, page 3 → frame 2) into a 32-byte physical memory.]
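The mapping in this example can be traced in a short Python sketch, using the page table and 4-byte pages from the figure:

```python
# The 16-byte logical memory a..p, 4-byte pages, and the page table
# from the figure: page 0 -> frame 5, 1 -> frame 6, 2 -> frame 1, 3 -> frame 2.
PAGE_SIZE = 4
page_table = [5, 6, 1, 2]

logical = "abcdefghijklmnop"
physical = [None] * 32                       # 8 frames of 4 bytes each
for addr, byte in enumerate(logical):        # lay each page into its frame
    frame = page_table[addr // PAGE_SIZE]
    physical[frame * PAGE_SIZE + addr % PAGE_SIZE] = byte

# Logical address 5 ('f') is on page 1, offset 1 -> frame 6, offset 1,
# i.e., physical address 6*4 + 1 = 25.
print(physical[20:24])   # frame 5 holds page 0: ['a', 'b', 'c', 'd']
```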
[Figure: virtual-to-physical address translation — a 32-bit virtual address is split into a 20-bit virtual page number (bits 31–12) and a 12-bit page offset (bits 11–0). The virtual page number indexes the page table, whose entries hold a valid bit and an 18-bit physical page number; if the valid bit is 0, the page is not present in memory. The physical page number concatenated with the page offset forms the 30-bit physical address.]
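A sketch of this translation with the 20/12-bit split from the figure (the function name and the fault handling are illustrative, not from the slides):

```python
# Translation per the figure: a 32-bit VA = 20-bit virtual page number +
# 12-bit page offset; each page-table entry holds a valid bit and a
# physical page number. A clear valid bit means the page is not in memory.

def va_to_pa(va, page_table):
    vpn, offset = va >> 12, va & 0xFFF
    valid, ppn = page_table[vpn]
    if not valid:
        # In real hardware this raises a page-fault exception handled by the OS.
        raise RuntimeError("page fault: page %d not in memory" % vpn)
    return (ppn << 12) | offset

table = {0x12345: (1, 0xABC)}              # one resident page, made-up numbers
print(hex(va_to_pa((0x12345 << 12) | 0x678, table)))   # -> 0xabc678
```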
TLB Structure
[Figure: the TLB holds a small set of recent translations (valid bit, tag, physical page address) indexed by virtual page number; on a TLB miss, the full page table is consulted, each of whose entries holds a valid bit and either a physical page number or a disk address for pages not in physical memory.]
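A toy model of the TLB's role, with made-up names and a FIFO replacement policy (a real TLB is a small associative hardware structure, not a dictionary):

```python
# Toy TLB: a small cache of recent VPN -> PPN translations, backed by
# the full page table on a miss. FIFO replacement is assumed here for
# simplicity; real TLBs typically use (pseudo-)LRU or random replacement.

class TLB:
    def __init__(self, page_table, capacity=16):
        self.page_table = page_table   # the full translation
        self.capacity = capacity
        self.entries = {}              # vpn -> ppn, the cached subset
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:        # TLB hit: no page-table access
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1               # TLB miss: walk the page table
        ppn = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest entry
        self.entries[vpn] = ppn
        return ppn
```

Repeated lookups of the same page hit in the TLB and skip the page table, which is the whole point of the structure.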
[Figure: TLB and cache access in series — the 20-bit virtual page number is compared against the TLB tags (each TLB entry carries valid and dirty bits); on a TLB hit, the 20-bit physical page number is concatenated with the 12-bit page offset to form the physical address, which is then split into a cache tag, index, and 2-bit byte offset and looked up in the cache, producing a cache hit and 32 bits of data.]
Translation with a TLB
[Figure: the CPU issues a virtual address (VA); the TLB lookup (about 1/2 t) yields the translation on a hit, while a miss requires a page-table walk; the translated address is then presented to the cache, and a cache miss goes on to main memory (about 20 t).]
[Figure: address translation without a TLB — the CPU issues a virtual address (VA), which must first be translated to a physical address (PA); only then can the cache be accessed, with a cache miss going to main memory.]
It takes an extra memory access to translate a VA to a PA. This makes every cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.
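A back-of-envelope comparison makes the cost concrete. The timings below are illustrative assumptions (not from the slides), in the spirit of the 1/2 t TLB and 20 t memory annotations:

```python
# Illustrative access times (assumed, not measured): without a TLB,
# every access pays a full page-table read from memory before the cache
# can even be consulted.
cache_t = 1.0    # assumed cache access time
mem_t   = 20.0   # assumed main-memory access time
tlb_t   = 0.5    # assumed TLB lookup time
tlb_hit = 0.98   # assumed TLB hit rate

no_tlb   = mem_t + cache_t   # translate via page table, then access cache
with_tlb = tlb_t + tlb_hit * cache_t + (1 - tlb_hit) * (mem_t + cache_t)
print(no_tlb, round(with_tlb, 2))   # 21.0 vs 1.9
```

Even with these rough numbers, a high TLB hit rate cuts the effective access time by roughly an order of magnitude.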
Virtual Memory
Crosscutting Issues: The Design of Memory Hierarchies
Superscalar CPU and Number of Ports to the Cache
The cache must provide sufficient peak bandwidth to benefit from multiple issue. Some processors increase the complexity of instruction fetch by allowing the group of instructions issued together to start on any boundary instead of, say, a multiple of 4 words.
Speculative Execution and the Memory System
Speculative and conditional instructions generate exceptions (by generating invalid addresses) that would otherwise not occur, and the exception-handling overhead can overwhelm the benefits of speculation. Such CPUs must be matched with non-blocking caches and should speculate only past L1 misses (the penalty of stalling for L2 is too large to bear).
Combining Instruction Cache with Instruction Fetch and Decode
Mechanisms
Increasing demand for ILP and clock rate has led to merging the first part of instruction execution with the instruction cache, by incorporating a trace cache (which combines branch prediction with instruction fetch) and storing the decoded internal RISC operations in the trace cache (e.g., the Pentium 4's NetBurst microarchitecture). A hit in the merged cache saves a portion of the instruction execution cycle.
Embedded Computer Caches and Real-Time Performance
In real-time applications, variation in performance matters much more than average performance, so caches that offer only an average performance enhancement have to be used carefully. Instruction caches are often used because of the high predictability of instruction streams, whereas data caches are locked down, forcing them to act as small scratchpad memories under program control.
Embedded Computer Caches and Power
It is much more power-efficient to access on-chip memory than off-chip memory (which requires driving the pins and buses and activating external memory chips). Other techniques, such as way prediction, can also be used to save power (e.g., by powering only half of a two-way set-associative cache).
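A toy sketch of way prediction for a two-way set-associative cache, counting probed ways as a rough proxy for energy (all names and the last-used prediction policy are illustrative assumptions):

```python
# Way-prediction sketch: keep one predicted way per set and probe only
# that way first; the second way is powered up only on a mispredict.
# Probing one way instead of two is where the power saving comes from.

class WayPredictedCache:
    def __init__(self, num_sets):
        self.tags = [[None, None] for _ in range(num_sets)]  # 2 ways per set
        self.predicted = [0] * num_sets    # last-used way per set (assumed policy)
        self.ways_probed = 0               # proxy for energy spent

    def access(self, set_index, tag):
        ways = self.tags[set_index]
        guess = self.predicted[set_index]
        self.ways_probed += 1              # power only the predicted way
        if ways[guess] == tag:
            return True                    # hit on the first probe
        other = 1 - guess
        self.ways_probed += 1              # mispredict: probe the other way
        if ways[other] == tag:
            self.predicted[set_index] = other
            return True
        ways[other] = tag                  # miss: fill and predict that way
        self.predicted[set_index] = other
        return False
```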
I/O and Consistency of Cached Data
The cache coherence problem must be addressed when I/O devices also share the same cached data.