Computer Architecture
Spring 2004
Harvard University
Instructor: Prof. David Brooks
dbrooks@eecs.harvard.edu
Lecture 5: Exceptions, Multicycle Ops,
Dynamic Scheduling
Computer Science 146
David Brooks
Lecture Outline
Homework 1 Questions?
Finish up on control hazards
Multicycle Operations
Hazards
Precise Exceptions
Control Hazards
Cycle          1    2    3      4      5      6    7    8
Branch Instr.  IF   ID   EX     MEM    WB
Instr +1            IF   stall  stall  IF     ID   EX   MEM
Instr +2                 stall  stall  stall  IF   ID   EX

Cycle          1    2      3      4    5    6    7    8
Branch Instr.  IF   ID     EX     MEM  WB
Instr +1            stall  IF     ID   EX   MEM  WB
Instr +2                   stall  IF   ID   EX   MEM  WB
Control Hazards:
Branch Characteristics
Integer Benchmarks: 14-16% of instructions are
conditional branches
FP: 3-12%
On Average:
3. Assume taken
4. Delay Branches
Control Hazards:
Assume Not Taken
Cycle             1    2    3    4    5    6    7
Untaken Branch    IF   ID   EX   MEM  WB
Instr +1               IF   ID   EX   MEM  WB
Instr +2                    IF   ID   EX   MEM  WB

Cycle             1    2    3      4      5      6      7    8
Taken Branch      IF   ID   EX     MEM    WB
Instr +1               IF   flush  flush  flush  flush
Branch Target               IF     ID     EX     MEM    WB
Branch Target +1                   IF     ID     EX     MEM  WB
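The cost of these schemes can be compared with the usual stall-cycle accounting: CPI = base CPI + (stall cycles per instruction). The sketch below assumes a base CPI of 1. Only the 14-16% branch frequency comes from the slides; the penalty lengths and the 60% taken fraction are illustrative assumptions.

```python
def branch_cpi(branch_freq, penalty_cycles, base_cpi=1.0):
    """CPI when every branch pays a fixed stall penalty."""
    return base_cpi + branch_freq * penalty_cycles


def predict_not_taken_cpi(branch_freq, taken_frac, taken_penalty, base_cpi=1.0):
    """Assume-not-taken: only taken branches pay the flush penalty."""
    return base_cpi + branch_freq * taken_frac * taken_penalty


# 15% branches (middle of the 14-16% integer range quoted on the slides).
# The 3-cycle and 1-cycle penalties and the 0.6 taken fraction are assumed values.
stall_late = branch_cpi(0.15, 3)                  # branch resolved late in the pipeline
stall_early = branch_cpi(0.15, 1)                 # branch resolved one cycle after fetch
not_taken = predict_not_taken_cpi(0.15, 0.6, 1)   # flush only on taken branches
print(f"{stall_late:.2f} {stall_early:.2f} {not_taken:.2f}")
```

Under these assumed numbers, resolving the branch earlier helps more than the prediction itself, and the two improvements combine.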
Control Hazards:
Branch Delay Slots
Find one instruction that will be executed no matter which
way the branch goes
Now we don't care which way the branch goes!
Harder than it sounds to find instructions
Interrupt Taxonomy
Synchronous vs. Asynchronous (HW error, I/O)
User Request (exception?) vs. Coerced
User maskable vs. Nonmaskable (Ignorable)
Within vs. Between Instructions
Resume vs. Terminate
The difficult exceptions are resumable interrupts
within instructions
Save the state, correct the cause, restore the state,
continue execution
Interrupt Taxonomy
Exception Type          Sync vs. Async  User Request vs. Coerced  Maskable vs. Nonmask  Within vs. Between Insn  Resume vs. Terminate
[label missing]         Async           Coerced                   Nonmask               Between                  Resume
Invoke O/S              Sync            User                      Nonmask               Between                  Resume
Tracing Instructions    Sync            User                      Maskable              Between                  Resume
Breakpoint              Sync            User                      Maskable              Between                  Resume
Arithmetic Overflow     Sync            Coerced                   Maskable              Within                   Resume
[label missing]         Sync            Coerced                   Nonmask               Within                   Resume
Misaligned Memory       Sync            Coerced                   Maskable              Within                   Resume
[label missing]         Sync            Coerced                   Nonmask               Within                   Resume
[label missing]         Sync            Coerced                   Nonmask               Within                   Terminate
Hardware/Power Failure  Async           Coerced                   Nonmask               Within                   Terminate
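The taxonomy can be treated as a small lookup table. The sketch below encodes only the rows whose exception names survive in the table above and filters for the "difficult" case named earlier (resumable, within an instruction). The field names are hypothetical, chosen for illustration.

```python
from collections import namedtuple

# Field names are illustrative; values are copied from the labeled table rows.
Exc = namedtuple("Exc", "name sync request maskable timing outcome")

TABLE = [
    Exc("Invoke O/S", "Sync", "User", "Nonmask", "Between", "Resume"),
    Exc("Tracing Instructions", "Sync", "User", "Maskable", "Between", "Resume"),
    Exc("Breakpoint", "Sync", "User", "Maskable", "Between", "Resume"),
    Exc("Arithmetic Overflow", "Sync", "Coerced", "Maskable", "Within", "Resume"),
    Exc("Misaligned Memory", "Sync", "Coerced", "Maskable", "Within", "Resume"),
    Exc("Hardware/Power Failure", "Async", "Coerced", "Nonmask", "Within", "Terminate"),
]


def difficult(rows):
    """Exceptions that occur within an instruction and must be resumed."""
    return [r.name for r in rows if r.timing == "Within" and r.outcome == "Resume"]


print(difficult(TABLE))
```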
IF
ID
EXE
Arithmetic Overflow
MEM
WB
Misaligned Memory
IF  ID  EX  MEM  WB
    IF  ID  EX   MEM  WB
Summary of Exceptions
Multicycle Operations
Basic RISC pipeline
All operations take 1 cycle
Difficulties
Hard to pipeline
Differ in number of clock cycles
Number of operands varies
Multicycle Operations
For example, longer latency in FP unit
EX may continue for as long as FP takes to finish
Assume four separate ALUs
Integer unit
FP/Integer Multiplier
FP Adder
FP/Integer Divider
Multicycle Metrics
Initiation Interval
Number of cycles that must elapse between issuing 2
operations of a given type
Latency
Number of cycles between an instruction that produces
a result and an instruction that uses the result
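A small sketch of how the two metrics constrain issue timing. It reads latency as the number of intervening cycles, so a latency of 0 lets the consumer issue in the very next cycle; that reading is an assumption. The interval of 24 is used as an assumed value for a non-pipelined unit.

```python
def issue_cycles(n_ops, initiation_interval, first_cycle=1):
    """Earliest issue cycles for n back-to-back operations on one functional unit."""
    return [first_cycle + i * initiation_interval for i in range(n_ops)]


def consumer_issue_cycle(producer_issue, latency):
    """Earliest issue cycle for an instruction that uses the producer's result.

    Assumes latency counts the cycles that must separate producer and
    consumer, so latency 0 means the consumer can issue one cycle later.
    """
    return producer_issue + 1 + latency


# Fully pipelined unit (interval 1) versus an assumed non-pipelined unit (interval 24).
print(issue_cycles(3, 1))
print(issue_cycles(3, 24))
print(consumer_issue_cycle(1, 3))
```

The initiation interval limits independent operations competing for the same unit, while latency limits dependent operations regardless of which unit they use.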
Functional Unit
Latency
Initiation Interval
Integer ALU
Data Memory
FP Add
FP Multiply
FP Divide
24
24
Multicycle Example
IF  ID  EX  MEM  WB
    IF  ID  E1   E2   E3   ...  E25  MEM  WB
        IF  ID   ------Stall-------  E1   E2
Pipelining a FP Multiplier
Pipelining a FP Multiplier:
Limits to Pipelining
[Figure: Cumulative Number of Latches; series labeled 10FO4, 13FO4, 16FO4, 19FO4]
Multicycle: Hazards/Exceptions
New Issues
Structural Hazards
FP Divide not pipelined
Too much hardware needed
WAW Hazards
Why weren't they a problem before?
MULD F0, F4, F6   IF  ID  M1  M2  M3   M4  M5  M6  M7   MEM  WB
                      IF  ID  EX  MEM  WB
                          IF  ID  EX   MEM WB
                              IF  ID   A1  A2  A3  A4   MEM  WB
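With every operation taking one EX cycle, instructions reach WB in program order, so write ordering was never an issue. Once EX lengths differ, two things can go wrong: two instructions can reach WB in the same cycle, and a later instruction can write a register before an earlier one does (WAW). The sketch below checks for both. It assumes one instruction is fetched per cycle with no stalls and that every instruction passes through MEM. The instruction names INT1/INT2 and all destination registers other than MULD's F0 are hypothetical.

```python
def writeback_cycle(fetch_cycle, ex_cycles):
    """Cycle in which WB occurs: IF, ID, ex_cycles of EX, MEM, then WB."""
    return fetch_cycle + ex_cycles + 3


def find_hazards(program):
    """program: list of (name, dest, ex_cycles), fetched in consecutive cycles from 1.

    Returns (port_conflicts, waw): pairs sharing a WB cycle, and pairs where
    the later instruction writes the same register no later than the earlier one.
    """
    wb = [(name, dest, writeback_cycle(i + 1, ex))
          for i, (name, dest, ex) in enumerate(program)]
    port_conflicts, waw = [], []
    for a in range(len(wb)):
        for b in range(a + 1, len(wb)):
            (n1, d1, c1), (n2, d2, c2) = wb[a], wb[b]
            if c1 == c2:
                port_conflicts.append((n1, n2, c1))
            if d1 is not None and d1 == d2 and c2 <= c1:
                waw.append((n1, n2, d1))
    return port_conflicts, waw


# Timing pattern of the diagram above: a 7-stage multiply, two 1-cycle ops, a 4-stage add.
diagram = [("MULD", "F0", 7), ("INT1", "R1", 1), ("INT2", "R2", 1), ("ADDD", "F2", 4)]
print(find_hazards(diagram))

# A hypothetical WAW case: a shorter add writing F0 right behind the multiply.
print(find_hazards([("MULD", "F0", 7), ("ADDD", "F0", 4)]))
```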
DIVD F0, F2, F4
ADDD F10, F10, F8
SUBD F12, F12, F14
Solutions
1.
2.
3.
4.
Dynamic Scheduling
Scheduling tries to re-arrange instructions to
improve performance
Previously:
We assumed that when ID detects a hazard that cannot be
hidden by bypassing/forwarding, the pipeline stalls
AND, we assumed the compiler tries to reduce this
Now:
Re-arrange instructions at runtime to reduce stalls
Why hardware and not compiler?
Goals of Scheduling
Goal of Static Scheduling
Compiler tries to avoid/reduce dependencies
Code Portability
More information available dynamically (run-time)
ISA can limit register ID space
Speculation sometimes needs hardware to work well
Not everyone uses gcc -O3
Dependency
Independent
How to implement?
Previously:
Instruction Decode and Operand Fetch are a single cycle
Now:
Split ID into two phases
Issue: decode instructions, check for structural hazards
Read Operands: wait until no data hazards, then read operands
WAR
WAW
Scoreboarding Basics
Previously:
Instruction Decode and Operand Fetch are a single cycle
Now:
To support multiple instructions in ID stage, we need two
things
Buffered storage (Instruction Buffer/Window/Queue)
Split ID into two phases
Standard Pipeline: IF ID EX WB
New Pipeline (I-Buffer/Scoreboard): IF IS Rd EX WB
Scoreboarding
Centralized scheme
No bypassing
WAR/WAW hazards are a problem
Originally proposed in CDC6600 (S. Cray, 1964)
Scoreboarding Stages
Read Operands (Or Issue!)
Read Operands (Check Data Hazards)
Check scoreboard for whether source operands are
available
Available?
No earlier issued active instructions will write register
No currently active FU is going to write it
Scoreboarding Stages
Execution/Write Result
Execution
Execute/Update scoreboard
Write Result
Scoreboard checks for WAR hazards and stalls the completing
instruction, if necessary
Before, stalls occurred only at the beginning of instructions;
now they can occur at the end as well
Can happen if:
The completing instruction's destination register matches a source register of an older
instruction that has not yet read its source operands
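The per-stage checks can be sketched as predicates over the functional-unit status fields used in the example that follows (Busy, Fi, Fj, Fk, Qj, Qk, Rj, Rk). This follows the usual textbook formulation of a scoreboard rather than anything specified on these slides. The class and function names are hypothetical, and the issue-time WAW check in particular is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FU:
    name: str
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # unit producing fj, None if no pending producer
    qk: Optional[str] = None   # unit producing fk, None if no pending producer
    rj: bool = False           # fj is ready and has not been read yet
    rk: bool = False           # fk is ready and has not been read yet


def can_issue(fu, dest, units):
    """Issue: the unit is free (structural) and no active unit will write dest (WAW)."""
    return not fu.busy and all(not (u.busy and u.fi == dest) for u in units)


def can_read_operands(fu):
    """Read Operands: both sources are available (no outstanding RAW hazard)."""
    return fu.rj and fu.rk


def can_write_result(fu, units):
    """Write Result: stall while another active unit still has to read fu.fi (WAR)."""
    for u in units:
        if u is fu or not u.busy:
            continue
        if (u.fj == fu.fi and u.rj) or (u.fk == fu.fi and u.rk):
            return False
    return True


# ADDD F6,F8,F2 has finished executing, but DIVD F10,F0,F6 has not read F6 yet.
mult = FU("Mult1", True, "Mult", "F0", "F2", "F4")
div = FU("Divide", True, "Div", "F10", "F0", "F6", qj="Mult1", rj=False, rk=True)
add = FU("Add", True, "Add", "F6", "F8", "F2")
units = [mult, div, add]
print(can_write_result(add, units))   # ADDD must wait for DIVD to read F6
```

Rj/Rk are set to "No" once an operand has been read, which is what releases a writer waiting on a WAR hazard.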
Scoreboard Example
Instruction status:
Instruction  dest  j    k
LD           F6    34+  R2
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status:
Name     Busy  Op  dest Fi  S1 Fj  S2 Fk  FU Qj  FU Qk  Fj? Rj  Fk? Rk
Integer  No
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status:
     F0  F2  F4  F6  F8  F10  F12  ...  F30
FU
Example courtesy of Prof. Broderson, CS152, UCB, Copyright (C) 2001 UCB
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Busy
Op
dest
Fi
Yes
No
No
No
No
Load
F6
F2
F4
S1
Fj
S2
Fk
FU
Qj
FU
Qk
Fj?
Rj
R2
F6
F8
Fk?
Rk
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
Integer
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Busy
Op
dest
Fi
Yes
No
No
No
No
Load
F6
F2
F4
FU
S1
Fj
S2
Fk
FU
Qj
FU
Qk
R2
F6
F8
Yes
F10 F12
...
F30
Integer
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Busy
Op
dest
Fi
Yes
No
No
No
No
Load
F6
F2
F4
S1
Fj
S2
Fk
FU
Qj
FU
Qk
Fj?
Rj
R2
F6
F8
Fk?
Rk
No
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
...
F30
Integer
Issue MULT?
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Busy
Op
dest
Fi
S1
Fj
S2
Fk
F2
F4
F6
F8
FU
FU
Qk
No
No
No
No
No
FU
Qj
F10 F12
Integer
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Busy
Op
dest
Fi
S1
Fj
Yes
No
No
No
No
Load
F2
F2
F4
S2
Fk
FU
Qj
FU
Qk
Fj?
Rj
R3
F6
F8
Fk?
Rk
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
Integer
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
Busy
Op
dest
Fi
S1
Fj
S2
Fk
FU
Qj
Yes
Yes
No
No
No
Load
Mult
F2
F0
F2
R3
F4
Integer
No
Yes
Yes
F2
F4
F6
F8
F10 F12
...
F30
FU
Qk
FU Mult1 Integer
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
3
7
Busy
Op
dest
Fi
S1
Fj
S2
Fk
FU
Qj
Yes
Yes
No
Yes
No
Load
Mult
F2
F0
F2
R3
F4
Integer
Sub
F8
F6
F2
F2
F4
F6
FU Mult1 Integer
F8
FU
Qk
Integer
F10 F12
Fj?
Rj
Fk?
Rk
No
No
Yes
Yes
No
...
F30
Add
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
3
7
Busy
Op
dest
Fi
S1
Fj
S2
Fk
FU
Qj
Yes
Yes
No
Yes
Yes
Load
Mult
F2
F0
F2
R3
F4
Integer
Sub
Div
F8
F10
F6
F0
F2
F6
F2
F4
F6
FU Mult1 Integer
F8
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Yes
Mult1
Yes
No
No
Yes
F10 F12
...
F30
Integer
Add Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
3
7
4
8
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Sub
Div
F8
F10
F6
F0
F2
F6
F2
F4
F6
No
Yes
No
Yes
Yes
F8
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
Yes
Yes
Mult1
Yes
No
Yes
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
Yes
Yes
Mult1
Yes
No
Yes
Yes
F10 F12
...
F30
Add Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Sub
Div
F8
F10
F6
F0
F2
F6
F2
F4
F6
Time Name
Integer
10 Mult1
Mult2
2 Add
Divide
No
Yes
No
Yes
Yes
FU Mult1
F8
FU
Qj
FU
Qk
Add Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Sub
Div
F8
F10
F6
F0
F2
F6
F2
F4
F6
No
Yes
No
Yes
Yes
10
F8
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Add Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
11
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Sub
Div
F8
F10
F6
F0
F2
F6
F2
F4
F6
No
Yes
No
Yes
Yes
3
7
4
8
FU Mult1
F8
FU
Qj
FU
Qk
Add Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Div
F10
F0
F6
F2
F4
F6
F8
No
Yes
No
No
Yes
12
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
No
No
Mult1
Yes
No
Yes
Yes
F10 F12
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
FU Mult1
Add
FU
Qj
FU
Qk
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
14
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
14
Add
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Mult1
Yes
No
Yes
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
14
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
FU Mult1
Add
FU
Qj
FU
Qk
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
14
16
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
16
Add
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
14
16
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
FU Mult1
WAR Hazard!
Add
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
4
8
11
12
14
16
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
18
Add
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Fj?
Rj
Fk?
Rk
No
No
Mult1
No
No
No
Yes
F10 F12
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
19
11
14
16
Busy
Op
dest
Fi
S1
Fj
S2
Fk
Mult
F0
F2
F4
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
No
Yes
No
Yes
Yes
FU Mult1
4
8
12
Add
FU
Qj
FU
Qk
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
3
7
19
11
14
16
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
No
Yes
Yes
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
FU
20
4
8
20
12
Add
FU
Qj
FU
Qk
F10 F12
Fj?
Rj
Fk?
Rk
No
Yes
No
Yes
...
F30
Fj?
Rj
Fk?
Rk
No
Yes
No
Yes
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
21
14
16
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
No
Yes
Yes
Add
Div
F6
F10
F8
F0
F2
F6
F2
F4
F6
F8
21
FU
3
7
19
11
4
8
20
12
Add
FU
Qj
FU
Qk
F10 F12
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
21
14
3
7
19
11
4
8
20
12
16
22
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
No
No
Yes
Div
F10
F0
F6
F2
F4
F6
F8
22
FU
FU
Qj
FU
Qk
F10 F12
Fj?
Rj
Fk?
Rk
No
No
...
F30
Divide
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
2
6
9
9
21
14
3
7
19
11
61
16
22
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
No
No
Yes
Div
F10
F0
F6
F2
F4
F6
F8
4
8
20
12
FU
61
FU
Qj
FU
Qk
F10 F12
Fj?
Rj
Fk?
Rk
No
No
...
F30
Fj?
Rj
Fk?
Rk
...
F30
Divide
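Instruction status:
Instruction  dest  j    k    Read Operands  Execution Complete  Write Result
LD           F6    34+  R2   2              3                   4
LD           F2    45+  R3   6              7                   8
MULTD        F0    F2   F4   9              19                  20
SUBD         F8    F6   F2   9              11                  12
DIVD         F10   F0   F6   21             61                  62
ADDD         F6    F8   F2   14             16                  22

Functional unit status:
Busy  Op  dest Fi  S1 Fj  S2 Fk  FU Qj  FU Qk
No
No
No
No
No

Register result status:
     F2  F4  F6  F8  F10  F12
FU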
Scoreboarding Limitations
Number and type of functional units
Number of instruction buffer entries (scoreboard size)
Amount of application ILP (RAW hazards)
Presence of antidependencies (WAR) and output dependencies (WAW)