
2/18/2009

Quick word about Branch Likely


Normal conditional branches:
1 delay slot
Delay-slot instruction is always executed
If the delay slot cannot be filled, a NOP (no-op) instruction has to be inserted: one-cycle penalty

EE 560 Lecture 4
MIPS Instruction Set and MMU Samuel H. Russ
EE 560 Lecture 4 - Russ 1

Likely conditional branches:


The instruction after the conditional branch is treated as the first instruction after taking the branch
If the branch is not taken, it is flushed
Used by the compiler when it cannot fill the delay slot
Since taking the branch is likely, it is usually better than executing a NOP

Branch Likely example


Consider the following piece of code
        Add  $8,$4,$6
        Add  $7,$2,$3
        Beq  $7,$8,BLAH   ;If (Reg7=Reg8) jump to BLAH
        Nop               ;Must put a NOP here: the contents of register 6
                          ;are directly impacted by the branch
        Add  $6,$8,$4
        ...
BLAH:   Add  $6,$8,$5
        Add  $3,$2,$1
        ...

Now do a branch likely

Consider the following piece of code
        Add  $8,$4,$6
        Add  $7,$2,$3
        Beql $7,$8,BLAH
        Add  $6,$8,$5     ;First instruction at BLAH is moved here; flushed if branch is not taken
        Add  $6,$8,$4     ;First branch-not-taken instruction moves down one
        ...
BLAH:   Add  $3,$2,$1
        ...

Most of the time the branch is taken and the delay-slot instruction is executed normally.
Sometimes the branch is not taken, but the flush is no worse than the original NOP.
Usually better and never worse: that works!
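
The "usually better and never worse" argument can be checked with a little arithmetic. The sketch below is a simplified cost model, not a pipeline simulation: it assumes the delay slot cannot be filled with useful work from before the branch, and that a flushed delay slot costs exactly one cycle, the same as a NOP. The function name and the `p_taken` parameter are illustrative only.

```python
def wasted_cycles_per_branch(p_taken: float) -> dict:
    """Expected wasted delay-slot cycles per branch under the two strategies.

    Assumption: a flushed delay slot costs one cycle, the same as a NOP.
    """
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must be a probability between 0 and 1")
    return {
        # Normal branch with a NOP: the slot is wasted every time.
        "nop": 1.0,
        # Branch likely: the slot is wasted only when the branch is not taken.
        "branch_likely": 1.0 - p_taken,
    }
```

For any `p_taken` the branch-likely cost is at most the NOP cost, and the two are equal only when the branch is never taken.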


Quick MMU Review


The MMU translates a virtual address to a physical address
Virtual address = Address used by the programmer
Physical address = Address used to access physical memory

Typical instruction-address example


Generate virtual instruction address
Perform virtual-to-physical translation
If it misses (address is not in physical memory), stop everything and page it in
Use physical address to access cache and/or main memory
Receive the fetched instruction and begin decoding it

Question: Which address does cache use?



TLB Implementation

Different processors have different mixes of software and hardware to do virtual-to-physical translations
In the MIPS, the processor has a small number of TLB entries cached on-chip
  Called the Micro-TLB
The rest of the TLB can reside in memory
More details to follow on the MIPS TLB

MIPS Address Space

MIPS makes provisions for operating systems that run in a protected kernel mode
Active processes have 2GB of virtual address space
The kernel has its own 2GB of virtual address space
  1GB is mapped in the TLB and cached
  0.5GB is not mapped but cached
  0.5GB is not mapped and not cached
  (R4000 is a little different)
The kernel can select whether items are mapped in virtual memory and/or cached by mapping the address of the item
  Common example: Map I/O devices (which need to be uncached) to the uncached area of the address space
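
The 2GB / 0.5GB / 0.5GB / 1GB split can be sketched as a decode of the top three bits of a 32-bit virtual address. This is a minimal illustration for the R2000/3000 layout only. The segment names (kuseg, kseg0, kseg1, kseg2) are the conventional MIPS names and do not appear in the slides, and `classify_address` is a hypothetical helper.

```python
def classify_address(vaddr: int) -> dict:
    """Classify a 32-bit virtual address by its top three bits (R2000/3000 layout)."""
    vaddr &= 0xFFFFFFFF
    top3 = vaddr >> 29  # bits 31..29
    if top3 < 0b100:
        # Bit 31 = 0: lower 2GB, user space, mapped through the TLB and cached.
        return {"segment": "kuseg", "mode": "user", "mapped": True, "cached": True}
    if top3 == 0b100:
        # 0.5GB of kernel space: not mapped, cached.
        return {"segment": "kseg0", "mode": "kernel", "mapped": False, "cached": True}
    if top3 == 0b101:
        # 0.5GB of kernel space: not mapped, not cached (e.g. I/O devices).
        return {"segment": "kseg1", "mode": "kernel", "mapped": False, "cached": False}
    # Bits 31..30 = 11: 1GB of kernel space, mapped through the TLB and cached.
    return {"segment": "kseg2", "mode": "kernel", "mapped": True, "cached": True}
```

For example, an I/O device register placed at `0xA0000000` or above (and below `0xC0000000`) falls in the uncached, unmapped region, which is why such addresses suit device access.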


MIPS address space continued

MIPS R2000/3000/6000 mapping:

Bit 31  Bit 30  Bit 29  Type    Cached  Mapped in VM
0       X       X       User    Y       Y
1       0       0       Kernel  Y       N
1       0       1       Kernel  N       N
1       1       X       Kernel  Y       Y

MIPS R4000 mapping:
Note extra region for supervisor state between user and kernel

Bit 31  Bit 30  Bit 29  Type    Cached  Mapped in VM
0       X       X       User    Y       Y
1       0       0       Kernel  Y       N
1       0       1       Kernel  N       N
1       1       0       Supr    Y       Y
1       1       1       Kernel  Y       Y

ASIDs and Unmapped Address Spaces

Common problem in virtual-memory systems: Context switches and exceptions cause the TLB to become polluted
  Root cause: Every context and every event handler potentially has shared virtual addresses
  Example: Two programs may have a variable located at address $0000f800
Solution: Create an Address Space Identifier
  Identifies which task or exception handler each TLB entry is associated with
ASID permits different tasks with the same virtual addresses to have separate TLB entries
Kernel accesses can be global if desired
  The ASID is ignored

R2000/3000 Address Mapping


Low 12 bits are an offset into the 4kB page
Next 20 bits (bits 31 to 12) are the virtual page number (VPN)
Most significant bits of the VPN flag whether it is kernel / user, mapped / unmapped, etc. as described above
If the virtual address is mapped in VM, the TLB searches for a match to the VPN and 6-bit ASID
  A matching entry contains the 20 bits of the physical address (plus some flags) used to access cache and RAM
  If no entry matches, it is a TLB miss, which requires additional software action
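
The R2000/3000 lookup described above can be sketched in a few lines. This is a software model under stated assumptions, not the hardware design: the real TLB compares all entries at once, while the loop below searches them one by one. `TlbEntry`, `translate`, and the field name `global_` are hypothetical names chosen for the illustration, and the other TLB flags are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12                       # 4kB pages: low 12 bits are the offset
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class TlbEntry:
    vpn: int               # 20-bit virtual page number
    asid: int              # 6-bit address space identifier
    pfn: int               # 20-bit page frame number
    global_: bool = False  # if set, the ASID is ignored on lookup


def translate(tlb: List[TlbEntry], vaddr: int, asid: int) -> Optional[int]:
    """Return the physical address, or None on a TLB miss."""
    vaddr &= 0xFFFFFFFF
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & OFFSET_MASK
    for entry in tlb:
        if entry.vpn == vpn and (entry.global_ or entry.asid == asid):
            # The TLB replaces the VPN with the PFN; the offset passes through.
            return (entry.pfn << PAGE_SHIFT) | offset
    return None  # TLB miss: software has to take over
```

With two entries that share VPN `0xF` but carry different ASIDs, the same virtual address `0x0000F800` translates to a different physical address for each task, which is the situation the ASID was introduced to handle.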

R4000 and R6000 Address Mapping


R4000:
Variable page size (4k to 16M), so the page offset field (least significant address bits) is variable size
ASID is 8 bits

R6000:
16k page size, so the low 14 bits are the page offset and the most significant 18 are the VPN
ASID is 8 bits

A quick picture

[Figure: the program's ASID and virtual address feed the Micro-TLB. The virtual address splits into a 20-bit Virtual Page Number (top 3 bits flag the type of address) and a 12-bit Page Offset. The Micro-TLB supplies a 20-bit Page Frame Number, which is combined with the unchanged 12-bit Page Offset to form the Physical Address.]

Example TLB Entry (R2000/R3000)

Inputs to TLB: Virtual Page Number (20 bits), ASID (6 bits)
Outputs from TLB: Match, Page Frame Number (20 bits), Flags N D V G (4 bits)

Noncacheable Flag: Page is marked noncacheable
Dirty Flag: Dirty actually means the page is writable (able to be dirty)
Valid Flag: Indicates if the entry is valid
Global Flag: If set, ignore the ASID

Differences in R4000 and R6000

R4000:
  Variable VPN / PFN / Offset size
  8-bit ASID
  3-bit C flag: Specifies caching algorithm for the page; supports multiple cache policies
  24-bit PFN: Supports 36-bit physical address
R6000:
  Micro-TLB is not on chip
  3-bit CCA field: cache-coherency algorithm

Example TLB Entry (R4000)

Inputs to TLB: Virtual Page Number (up to 20 bits), ASID (8 bits)
Outputs from TLB: Match, Page Frame Number (up to 24 bits), Flags C D V G (6 bits)

Larger virtual memory page size shrinks the size of the page number fields
C field (3 bits): Specifies if page is cacheable and, if so, which policy to use
Dirty Flag: Dirty actually means the page is writable (able to be dirty)
Valid Flag: Indicates if the entry is valid (that is, actually resident in RAM)
Global Flag: If set, ignore the ASID


What happens when there is a TLB miss?

First event: Need to map the new address into a new on-chip Micro-TLB entry
  R2000/3000 and R4000 use a random replacement policy: a TLB entry is randomly selected for eviction
  The randomly selected TLB entry and the new TLB entry are swapped
  Not the same as a page fault: simply means that the TLB entry itself is not on-chip
Second event: New TLB entry is checked to see if it is in physical RAM
  If not, page fault: Must swap the page of memory in from disk and swap out some other page back to disk
  Selection process of page to swap to disk is left to the operating system

What else can go wrong?

User-mode process tries to access kernel address space: Address error exception
Valid bit of the entry is not set: the entry has been marked as invalid, same as a TLB miss
Dirty bit is clear, meaning the page is not writable: creates a TLB Mod exception
  Example: trying to write to the instruction area of a program

More details
A small number of Micro-TLB entries cannot be evicted
R2000/3000: 8 entries
R4000: Controlled by a register
  Wired register marks the entry number of the first entry that is allowed to be evicted
  Example: Wired=8 means that TLB entries 8 to 47 are allowed to be evicted
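
Random replacement with wired entries can be sketched as picking a random index at or above the Wired value. This is a minimal model, assuming a 48-entry TLB (inferred from the "8 to 47" example above) and a uniform random choice; the hardware's actual source of randomness is not described in the slides. `pick_victim` is a hypothetical helper.

```python
import random


def pick_victim(wired: int, num_entries: int = 48, rng=random) -> int:
    """Pick a TLB entry to evict, never touching entries below `wired`.

    Assumptions: 48 entries (indices 0..47) and a uniform random choice.
    """
    if not 0 <= wired < num_entries:
        raise ValueError("wired must leave at least one evictable entry")
    # Entries 0 .. wired-1 are protected; wired .. num_entries-1 may be evicted.
    return rng.randrange(wired, num_entries)
```

With `wired=8`, every call returns an index from 8 to 47, so entries 0 to 7 stay resident no matter how many misses occur.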

How does the processor control the TLB?


4 special instructions can manipulate TLB entries
Probe: Tries to find a match for an indicated virtual address
Read: Finds the virtual address of the indicated entry
Write Index: Writes a complete TLB entry to the entry indicated by the index register
Write Random: Writes a complete TLB entry to an entry selected at random

The processor can also read/write various status registers



What lies beyond the micro-TLB?

Up until now we have been discussing the micro-TLB
  A small cache of page-table entries on-chip with associated hardware acceleration for virtual-to-physical translation
Two questions
  What if there is not a micro-TLB entry for the page: what does the O/S do?
  How does (or how can) the micro-TLB interact with cache?

O/S level

The O/S must have a much larger translation structure
  The MIPS micro-TLBs only hold 64 pages of mapping
Two methods are commonly used
  Forward page table
  Inverted or hashed page table

Forward Page Table

In the forward case, there is a table entry for every piece of virtual memory that is in use
  Table size = f(Virtual memory in use)

Forward Page Table Continued

[Figure: multi-level table lookup ending in the final physical address]

Add a piece of the virtual address to a base
Use the sum to perform a table lookup for the next base address
Repeat the process a few times
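
The "add a piece of the address to a base, look up the next base, repeat" procedure can be sketched as a two-level walk. The sketch rests on several assumptions the slides do not specify: a 10/10/12 split of the 32-bit virtual address, exactly two levels, and a Python dict standing in for memory with one table entry per address. `walk` and `memory` are illustrative names.

```python
from typing import Dict, Optional


def walk(memory: Dict[int, int], root_base: int, vaddr: int) -> Optional[int]:
    """Two-level forward page-table walk (assumed 10/10/12 address split)."""
    vaddr &= 0xFFFFFFFF
    idx1 = (vaddr >> 22) & 0x3FF   # top 10 bits index the first-level table
    idx2 = (vaddr >> 12) & 0x3FF   # next 10 bits index the second-level table
    offset = vaddr & 0xFFF         # low 12 bits pass through unchanged

    # Step 1: base + piece of the virtual address -> next base address.
    level2_base = memory.get(root_base + idx1)
    if level2_base is None:
        return None                # this region of virtual memory is not in use
    # Step 2: repeat with the next piece of the virtual address.
    frame_base = memory.get(level2_base + idx2)
    if frame_base is None:
        return None
    # Final step: frame base + page offset is the physical address.
    return frame_base + offset
```

Only the tables for regions that are actually in use need to exist, which is why the table size tracks the virtual memory in use.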


Inverted Page Table

Hashing function is used so that all active virtual addresses can map to a small number of table entries
Size of table only has to be as large as physical memory
  Table size = f(Physical memory size)
Virtual page number is hashed and used to calculate an offset into a table
  A hash is a non-linear function to convert one number into a different, nearly random, but repeatable, number
Table contains a small number of entries that must be searched for a hit

Inverted Page Table, ctd.

Hashing function can cause more virtual addresses to map to the same location than the capacity of the table
  In example above, may have more than 4 virtual addresses map to that row
  Need a way to handle table overflow
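
A hashed page table with four entries per row can be sketched as below. Everything beyond the slide text is an assumption made for illustration: the hash function is a simple stand-in, and overflow is handled with a separate dictionary, which is only one of several possible schemes. `HashedPageTable` and its methods are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class HashedPageTable:
    """Hashed (inverted) page table: each row holds up to `ways` (vpn, pfn) pairs."""

    def __init__(self, num_rows: int = 8, ways: int = 4) -> None:
        self.num_rows = num_rows
        self.ways = ways
        self.rows: List[List[Tuple[int, int]]] = [[] for _ in range(num_rows)]
        self.overflow: Dict[int, int] = {}  # assumed overflow scheme

    def _row(self, vpn: int) -> int:
        # Stand-in hash: repeatable, and spreads nearby VPNs across rows.
        return (vpn ^ (vpn >> 7)) % self.num_rows

    def insert(self, vpn: int, pfn: int) -> None:
        row = self.rows[self._row(vpn)]
        for i, (v, _) in enumerate(row):
            if v == vpn:
                row[i] = (vpn, pfn)  # update an existing mapping
                return
        if vpn in self.overflow or len(row) >= self.ways:
            self.overflow[vpn] = pfn  # row is full: spill to the overflow area
        else:
            row.append((vpn, pfn))

    def lookup(self, vpn: int) -> Optional[int]:
        # Search the small number of entries in the hashed row for a hit.
        for v, p in self.rows[self._row(vpn)]:
            if v == vpn:
                return p
        return self.overflow.get(vpn)
```

The number of rows is chosen from the amount of physical memory, not from the size of the virtual address space, so the table stays small even when many virtual pages are active.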

How do other processors do it?

MIPS (and SPARC and Alpha) have a hardware micro-TLB and then hand off all the mapping to software
  Software TLB miss handler
PowerPC and IA-32 also call out an upper-level hardware system to do the mapping
  Hardware TLB miss handler

Interaction of TLB and Cache

Result of TLB lookup is an address to present to cache
Naïve implementation: Just send it to cache


Alternate TLB/Cache arrangement

Offset is always sent to cache directly (TLB does not alter the offset)
D-cache contains the physical page number as well as the data
TLB lookup is in parallel with cache lookup
Then check to see if you had a hit
Parallel cache and TLB access can be a big advantage
Some MIPS family members do this
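
The parallel arrangement can be sketched as a cache that is indexed by page-offset bits and tagged with the physical page number. The model below is sequential software, so the two lookups only stand in for what hardware does at the same time. The 4kB page, 16-byte line, direct-mapped organization, and the names `parallel_lookup`, `tlb`, and `cache` are all assumptions made for the illustration.

```python
from typing import Dict, List, Optional, Tuple

PAGE_SHIFT = 12                                # 4kB pages
LINE_SHIFT = 4                                 # assumed 16-byte cache lines
NUM_LINES = 1 << (PAGE_SHIFT - LINE_SHIFT)     # index fits inside the page offset


def parallel_lookup(
    tlb: Dict[int, int],
    cache: List[Optional[Tuple[int, str]]],
    vaddr: int,
) -> Tuple[str, Optional[str]]:
    """Index the cache with the untranslated offset while the TLB translates the VPN."""
    vaddr &= 0xFFFFFFFF
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT

    # Cache side: the offset is not altered by the TLB, so it can be used at once.
    line = cache[offset >> LINE_SHIFT]          # (physical page number, data) or None
    # TLB side: in hardware this happens in parallel with the cache read above.
    pfn = tlb.get(vpn)

    if pfn is None:
        return ("tlb_miss", None)
    # Then check for a hit: the stored physical page number must match the TLB result.
    if line is not None and line[0] == pfn:
        return ("hit", line[1])
    return ("cache_miss", None)
```

Because the cache index comes entirely from the page offset, the cache read does not have to wait for translation; only the final tag comparison needs the TLB result.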
