[Figure: single-cycle MIPS datapath. Control signals shown: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg. Instruction fields shown: Instruction [31-0], Instruction [25-21], Instruction [15-0], Instruction [5-0].]
Pipeline stages: IF, ID, EX, MEM, WB
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch; Instruction Decode/Register Fetch; Execute/Address Calculation; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB.]
Data Hazards
[Figure: pipeline diagram, CC1 to CC8, marking the cycle in which R2 is needed; surviving instruction labels: or $13, $6, $2 and sw $15, 100($2)]
[Figure: pipeline diagram, CC1 to CC8, marking the cycle in which R2 becomes available; surviving instruction label: or $13, $6, $8]
In Software
In Hardware
insert bubbles (i.e. stall the pipeline)
data forwarding
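The software option can be sketched as a small pass that pads a dependent instruction with nops. This is only an illustration under stated assumptions: a five-stage pipeline with no forwarding, where the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots behind its producer. The names Instr, insert_nops, and MIN_DISTANCE are hypothetical, and the add that produces $2 is an assumed producer instruction.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                # assembly text, for display only
    dest: Optional[int]      # register number written, or None
    srcs: Tuple[int, ...]    # register numbers read


NOP = Instr("nop", None, ())

# Assumption: without forwarding, a consumer must issue at least this many
# slots after its producer (WB write and ID read share one cycle).
MIN_DISTANCE = 3


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Pad the program with nops so no instruction reads a register too early."""
    out: List[Instr] = []
    for ins in program:
        needed = 0
        # Check the most recently emitted slots for a producer of our sources.
        for back in range(1, MIN_DISTANCE):
            if back > len(out):
                break
            prev = out[-back]
            if prev.dest not in (None, 0) and prev.dest in ins.srcs:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([NOP] * needed)
        out.append(ins)
    return out


program = [
    Instr("add $2, $1, $3", 2, (1, 3)),      # assumed producer of $2
    Instr("or $13, $6, $2", 13, (6, 2)),
    Instr("sw $15, 100($2)", None, (15, 2)),
]
scheduled = [ins.text for ins in insert_nops(program)]
for line in scheduled:
    print(line)
```

With these assumptions the or is pushed two slots back, and the sw then needs no padding of its own because it is already far enough from the add.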
[Figure: pipeline diagram, CC1 to CC8, with two nop instructions inserted into the instruction sequence]
[Figure: pipeline diagram, CC2 to CC8, with bubbles inserted ahead of or $13, $6, $2]
[Figure: pipeline diagram, CC1 to CC11, with bubbles inserted around or $13, $6, $2]
Pipeline Stalls
The Pipeline
Insert nops: set all control signals propagating to EX/MEM/WB to zero.
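A rough sketch of what "set all control signals to zero" means in practice. The signal names follow the labels on the datapath figure, while the Control class and the helper functions are hypothetical names introduced only for illustration. The idea shown is that an all-zero control word can neither write a register nor write memory, so it travels down the pipeline as a harmless bubble.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """Control word carried in ID/EX (all fields default to 0)."""
    reg_dst: int = 0
    alu_src: int = 0
    alu_op: int = 0
    mem_read: int = 0
    mem_write: int = 0
    reg_write: int = 0
    mem_to_reg: int = 0


def bubble() -> Control:
    # A bubble is simply the all-zero control word.
    return Control()


def id_stage(decoded: Control, stall: bool) -> Control:
    # On a stall, pass zeros to EX/MEM/WB instead of the decoded signals.
    return bubble() if stall else decoded


def changes_state(c: Control) -> bool:
    # Only register writes and memory writes alter architectural state.
    return bool(c.reg_write or c.mem_write)


# Assumed control settings for a lw instruction.
lw_control = Control(alu_src=1, mem_read=1, reg_write=1, mem_to_reg=1)

print(changes_state(id_stage(lw_control, stall=False)))
print(changes_state(id_stage(lw_control, stall=True)))
```

A full stall would also have to hold the PC and the IF/ID register so the stalled instruction is decoded again; that part is not modeled here.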
[Figure: pipeline diagram for or $5, $3, $2 together with a datapath excerpt showing Registers, ID/EX, ALU, EX/MEM, Data Memory, and MEM/WB]
We could avoid stalling if we could route the ALU output of the add directly to the ALU input of the or.
EX Hazard:
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
(similar for the MEM stage)
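The EX hazard conditions above can be sketched as a small function. The EX/MEM case mirrors the slide. The MEM/WB case, its 01 encoding, and the rule that the EX/MEM result wins when both match are assumptions taken from the usual textbook convention, since the slide only says the MEM stage is "similar". StageReg and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageReg:
    """The fields of a pipeline register that the forwarding unit inspects."""
    reg_write: bool
    rd: int


NO_FORWARD = 0b00    # operand comes from the register file
FROM_MEM_WB = 0b01   # assumed encoding for the MEM hazard
FROM_EX_MEM = 0b10   # encoding given on the slide for the EX hazard


def forward_select(src: int, ex_mem: StageReg, mem_wb: StageReg) -> int:
    # EX hazard: the previous instruction's ALU result is the newest value.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return FROM_EX_MEM
    # MEM hazard (assumed form): used only when there is no EX hazard.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return FROM_MEM_WB
    return NO_FORWARD


def forwarding_unit(rs: int, rt: int, ex_mem: StageReg, mem_wb: StageReg):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(rs, ex_mem, mem_wb),
            forward_select(rt, ex_mem, mem_wb))


# or $5, $3, $2 in EX while an instruction writing $2 sits in EX/MEM.
forward_a, forward_b = forwarding_unit(
    rs=3, rt=2,
    ex_mem=StageReg(reg_write=True, rd=2),
    mem_wb=StageReg(reg_write=False, rd=0),
)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

The check against register 0 matters because $0 is hardwired to zero in MIPS, so a "write" to it must never be forwarded.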
Data Forwarding
[Figure: pipeline diagram, CC1 to CC8, with forwarding; surviving instruction labels: or $13, $6, $2 and sw $15, 100($2)]
[Figure: pipeline diagram, CC1 to CC8, beginning with lw $2, 10($1); surviving instruction labels: or $13, $6, $2 and sw $15, 100($2)]
[Figure: pipeline diagram, CC1 to CC8, with stage labels IF, ID, Exe, MEM, WB, beginning with lw $2, 10($1) and with a bubble inserted after the load; surviving instruction labels: or $13, $6, $2 and sw $15, 100($2)]
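A load followed immediately by an instruction that uses the loaded register still needs one bubble, because the loaded value only exists after the MEM stage. The check below is a sketch of how a hazard detection unit might decide to stall. It assumes the textbook pipeline-register fields ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt, none of which are named in the slides, and load_use_stall is a hypothetical function name.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID must wait one cycle for a load in EX."""
    # The instruction in EX is a load (MemRead set) and its destination
    # register (rt for lw) is a source of the instruction being decoded.
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 10($1) in EX, with a hypothetical dependent instruction reading $2
# and $5 in ID: one bubble is required.
print(load_use_stall(True, 2, 2, 5))

# Same load, but the instruction in ID reads $6 and $7: no stall.
print(load_use_stall(True, 2, 6, 7))
```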
Control hazards
Dependences
control dependence
Branch Hazards
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch; Instruction Decode; Execute/Address Calculation; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB.]
Branch Hazards
[Figure: pipeline diagram, CC1 to CC8, for a branch to the label here; surviving instruction labels: add ..., sub ..., lw ..., here: lw ...]
The instructions fetched after the branch (add ..., sub ..., lw ...) should not be executed!
Hardware solutions
Software/hardware solution
Delayed branch: always execute the instruction after the branch.
The compiler puts something useful (or a no-op) there.
[Figure: pipeline diagram, CC1 to CC8, for beq $4, $0, there followed by or ..., add ..., sw ..., with bubbles inserted after the branch]
[Figure: pipeline diagram, CC1 to CC8, for or ..., add ..., sw ... with no bubbles]
[Figure: pipeline diagram, CC1 to CC8, for or ..., add ... in which the fetched instructions are flushed]
[Figure: pipeline diagram, CC1 to CC8, for or ..., add ..., sw ... with a single bubble]
[Figure: pipeline diagram, CC1 to CC8, for add ..., sw ... with no bubbles]
Pentium 4: 20-cycle branch misprediction penalty!
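To get a feel for what a penalty of this size costs, the usual back-of-the-envelope model adds the expected stall cycles per instruction to the base CPI. The sketch below uses that model. The base CPI, branch fraction, and misprediction rate are hypothetical numbers chosen only for illustration, and only the 20-cycle penalty comes from the slide.

```python
import math


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Base CPI plus the average misprediction stall cycles per instruction."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 10% of them mispredicted.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.20,
                    mispredict_rate=0.10, penalty_cycles=20)
print(round(cpi, 2))   # about 1.4 under these assumed rates

# Sanity check on the arithmetic: 0.20 * 0.10 * 20 = 0.4 extra cycles.
assert math.isclose(cpi, 1.4)
```

Under these assumed rates, mispredictions alone would add roughly 40% to execution time, which is why deep pipelines depend on accurate branch prediction.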
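[Figure: scheduling the branch delay slot. Panels: a. From before; b. From target. Each panel shows a code sequence with an empty delay slot and what it becomes after scheduling. Surviving fragments: add $s1, $s2, $s3 followed by if $s1 = 0 then with an empty delay slot; if $s2 = 0 then followed by add $s1, $s2, $s3 in the delay slot.]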
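The "from before" case can be checked mechanically: the instruction ahead of the branch may move into the delay slot only if the branch does not read the register it writes. The sketch below tests just that one condition. Instr and can_fill_from_before are hypothetical names, and a real compiler would check more, for example that the candidate is not itself a branch.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # source text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def can_fill_from_before(prev: Instr, branch: Instr) -> bool:
    """True if prev can be moved from before the branch into its delay slot."""
    # Moving prev after the branch is safe only if the branch condition
    # does not depend on the value prev produces.
    return prev.dest is None or prev.dest not in branch.srcs


add = Instr("add $s1, $s2, $s3", "$s1", ("$s2", "$s3"))
branch_on_s2 = Instr("if $s2 = 0 then", None, ("$s2",))
branch_on_s1 = Instr("if $s1 = 0 then", None, ("$s1",))

print(can_fill_from_before(add, branch_on_s2))   # branch does not read $s1
print(can_fill_from_before(add, branch_on_s1))   # branch needs the add result
```

When the branch tests $s1, the add has to stay ahead of it, so the slot must be filled some other way, such as from the branch target.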
Branch Prediction
for (i=0;i<10;i++) {
    ...
    ...
}

loop: ...
      ...
      add $i, $i, #1
      bne $i, #10, loop
Problems?
Indirect branches?
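For the ten-iteration loop, the loop-closing branch is taken nine times and falls through once. The simulation below compares two common dynamic prediction schemes on that pattern, a 1-bit last-outcome predictor and a 2-bit saturating counter. Neither scheme is spelled out on the slides, so both are assumptions, as are the function names and the starting states.

```python
from typing import List


def loop_outcomes(iterations: int, runs: int) -> List[bool]:
    """Branch outcomes (True = taken) for a loop that is executed `runs` times."""
    out: List[bool] = []
    for _ in range(runs):
        out.extend([True] * (iterations - 1) + [False])
    return out


def mispredicts_1bit(outcomes: List[bool], state: bool = False) -> int:
    """Predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken
    return misses


def mispredicts_2bit(outcomes: List[bool], counter: int = 0) -> int:
    """2-bit saturating counter: predict taken when the counter is 2 or 3."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


outcomes = loop_outcomes(iterations=10, runs=3)
print(mispredicts_1bit(outcomes), "of", len(outcomes))
print(mispredicts_2bit(outcomes), "of", len(outcomes))
```

Once warmed up, the 1-bit scheme misses twice per pass through the loop (at the exit and again on re-entry), while the 2-bit counter misses only at the exit. Neither helps with indirect branches, where the difficulty is predicting the target rather than the direction.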