
CS M151B / EE M116C

Computer Systems Architecture

Data and Control Hazards

Instructor: Prof. Lei He


<LHE@ee.ucla.edu>

Some notes adapted from Glenn Reinman

Review -- Single Cycle CPU

Single Cycle Datapath Partitioning

[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, ALU, data memory, and the control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg), partitioned into five stages: IF, ID, EX, Mem, WB.]

Goal is to balance work done in each cycle - minimize cycle time!

Review: Dealing with Data Hazards

In Software

insert independent instructions (or no-ops)

In Hardware
insert bubbles (i.e. stall the pipeline)
data forwarding

Review: Pipeline with Control Logic

Pipelined Implementation Datapath

[Figure: pipelined datapath with five stages (Instruction Fetch, Instruction Decode/Register Fetch, Execute/Address Calculation, Memory Access, Write Back) separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]

Data Hazards

When a result is needed in the pipeline before it is available, a data hazard occurs.

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the sequence above over CC1 to CC8, marking where R2 becomes available and where it is needed.]

Dealing With Data Hazards

Register file bypass eliminates one hazard.


First half-cycle of cycle 5: register 2 written
Second half-cycle: new value is read

sub $2, $1, $3
and $12, $6, $5
or $13, $6, $8
add $14, $2, $2

[Figure: pipeline diagram for the sequence above over CC1 to CC8, marking R2 available in CC5.]

Dealing with Data Hazards

In Software

insert independent instructions (or no-ops)

In Hardware
insert bubbles (i.e. stall the pipeline)
data forwarding

Dealing with Data Hazards in Software

sub $2, $1, $3
nop
nop
add $12, $2, $5

[Figure: pipeline diagram for the sequence above over CC1 to CC8.]

Insert enough no-ops (or other instructions that don't use register 2) so that the data hazard doesn't occur.

Where are No-ops needed?

sub $2, $1, $3
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

Are no-ops really necessary?
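
One way to check a sequence like this mechanically is sketched below. It assumes a pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second, so a consumer must sit at least two instructions behind its producer. The helper names (regs, insert_nops) and the restriction to three-register instructions are assumptions made for illustration.

```python
import re

def regs(instr):
    """Return (dest, sources) for a three-register instruction such as 'sub $2, $1, $3'."""
    _, operands = instr.split(None, 1)
    names = re.findall(r"\$\w+", operands)
    return names[0], names[1:]

def insert_nops(program, gap=2):
    """Pad with nops so each consumer is at least `gap` instructions after its producer.

    gap=2 is an assumption: no forwarding, register file written before it is read
    within the same cycle.
    """
    out = []
    last_write = {}  # register name -> position in `out` of its most recent producer
    for instr in program:
        dest, srcs = regs(instr)
        for src in srcs:
            if src in last_write and src != "$0":
                # Count the instructions already between producer and this consumer.
                while len(out) - last_write[src] - 1 < gap:
                    out.append("nop")
        last_write[dest] = len(out)
        out.append(instr)
    return out

program = [
    "sub $2, $1, $3",
    "and $4, $2, $5",
    "or $8, $2, $6",
    "add $9, $4, $2",
    "slt $1, $6, $7",
]
for line in insert_nops(program):
    print(line)
```

Under these assumptions the sketch pads only where a dependent instruction follows too closely. Reordering independent instructions into those slots, where the program allows it, would avoid the wasted cycles.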

Handling Data Hazards in Hardware

Stall the pipeline

sub $2, $1, $3
add $12, $2, $5
or $13, $6, $2
add $14, $2, $2

[Figure: pipeline diagram for the sequence above over CC1 to CC8, with two bubbles inserted.]

Handling Data Hazards in Hardware

sub $2, $1, $3
add $12, $3, $5
or $13, $6, $2
add $14, $12, $2
sw $14, 100($2)

[Figure: pipeline diagram for the sequence above over CC1 to CC11, with three bubbles inserted.]

Pipeline Stalls

To ensure proper pipeline execution in light of register dependences, we must:
Detect the hazard
Stall the pipeline
prevent the IF and ID stages from making progress
the ID stage because the dependent instruction can't go on until the instruction it depends on has produced its result
the IF stage because we do not want to lose any instructions.

The Pipeline

What comparisons tell us when to stall?

Stalling the Pipeline

Prevent the IF and ID stages from proceeding
don't write the PC (PCWrite = 0)
don't rewrite the IF/ID register (IF/IDWrite = 0)

Insert nops
set all control signals propagating to EX/MEM/WB to zero
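
The stall logic above can be sketched as a small function. This is an illustration only, assuming a pipeline without forwarding whose register file is written before it is read within a cycle, so an instruction in ID must wait only while its producer is in EX or MEM. The class PipeReg, the function names, and the ZeroControl signal name are hypothetical; PCWrite and IF/IDWrite follow the slide.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """The fields of a pipeline register that matter for hazard detection."""
    reg_write: bool = False  # will the instruction held here write a register?
    rd: int = 0              # its destination register number

def writes(stage: PipeReg, reg: int) -> bool:
    # Register $0 is hardwired to zero, so it never creates a dependence.
    return stage.reg_write and stage.rd != 0 and stage.rd == reg

def stall_controls(if_id_rs: int, if_id_rt: int, id_ex: PipeReg, ex_mem: PipeReg) -> dict:
    """Control signals for one cycle, assuming NO forwarding hardware."""
    hazard = any(
        writes(stage, src)
        for stage in (id_ex, ex_mem)
        for src in (if_id_rs, if_id_rt)
    )
    return {
        "PCWrite": 0 if hazard else 1,      # hold the PC during a stall
        "IFIDWrite": 0 if hazard else 1,    # hold the instruction in IF/ID
        "ZeroControl": 1 if hazard else 0,  # send a bubble (all-zero control) to EX
    }

# sub $2, $1, $3 is in EX while add $12, $2, $5 is in ID: stall.
print(stall_controls(2, 5, PipeReg(True, 2), PipeReg()))
```

For simplicity the sketch compares both source fields of every instruction, which can stall instructions that do not actually read rt; a real design would qualify the comparison by instruction type.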

The Pipeline

Reducing Data Hazards Through Forwarding

add $2, $3, $4
or $5, $3, $2

[Figure: pipeline diagram for the two instructions above, alongside the datapath (Registers, ALU, Data Memory) with the ID/EX, EX/MEM, and MEM/WB pipeline registers.]

We could avoid stalling if we could get the ALU output from add to the ALU input for or.

Reducing Data Hazards Through Forwarding

EX Hazard:
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
(similar for the MEM stage)
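
The conditions above translate almost directly into code. The sketch below is illustrative: the function and parameter names are made up, the MEM-hazard encoding 01 follows the common textbook convention rather than anything stated on the slide, and giving the EX hazard priority over the MEM hazard is likewise an assumption.

```python
def forwarding_unit(ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd,
                    id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) mux selects for the two ALU inputs."""

    def select(src):
        # EX hazard: the instruction one ahead is about to write this source.
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
            return 0b10
        # MEM hazard: the instruction two ahead writes it (assumed encoding 01).
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
            return 0b01
        # No hazard: use the value read from the register file.
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)

# add $2, $3, $4 followed by or $5, $3, $2: the or's rt comes from EX/MEM.
forward_a, forward_b = forwarding_unit(True, 2, False, 0, id_ex_rs=3, id_ex_rt=2)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

Checking the EX/MEM register first matters when both pipeline registers hold a write to the same register, because the newer value is the one in EX/MEM.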

Data Forwarding

Forwarding (just shown) handles two types of data hazards
EX hazard
MEM hazard

We've already handled the third type (WB) hazard by using a transparent reg file
if the register file is asked to read and write the same register in the same cycle, the reg file allows the write data to be forwarded to the output.

Eliminating Data Hazards via Forwarding

sub $2, $1, $3
and $6, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

[Figure: pipeline diagram for the sequence above over CC1 to CC8.]

Does Forwarding Eliminate All Hazards?

lw $2, 10($1)
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

[Figure: pipeline diagram for the sequence above over CC1 to CC8.]

You may need to stall after loads

lw $2, 10($1)
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

[Figure: pipeline diagram (IF, ID, Exe, MEM, WB) for the sequence above over CC1 to CC8; a bubble delays and $12, $2, $5 and the or behind it by one cycle.]
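
The load-use case can be detected with a single comparison in the ID stage. The sketch below assumes the forwarding hardware is present, so only a load immediately followed by a dependent instruction needs a stall. The function name and argument names are hypothetical, modeled on the pipeline register fields used earlier in these notes.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID must wait one cycle for a load in EX.

    id_ex_mem_read: the instruction in EX is a load (it will read data memory)
    id_ex_rt:       the load's destination register
    if_id_rs/rt:    the source registers of the instruction being decoded
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))

# lw $2, 10($1) is in EX while and $12, $2, $5 is in ID: one bubble is needed.
print(load_use_stall(True, 2, 2, 5))
# Had the instruction in EX been an ALU operation, forwarding would suffice.
print(load_use_stall(False, 2, 2, 5))
```

After the one-cycle stall the loaded value is available in MEM/WB and can be forwarded to the ALU input like any other MEM hazard.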

Try this one...

Show stalls and forwarding for this code


add $3, $2, $1
lw $4, 100($3)
and $6, $4, $3
sub $7, $6, $2

Data Hazard Key Points

Pipelining provides high throughput, but does not handle data dependences easily.
Data dependences cause data hazards.
Data hazards can be solved by:
software (no-ops)
hardware stalling
hardware forwarding

Our processor, and indeed all modern processors, use a combination of forwarding and stalling.

Control hazards

Dependences

Data dependence: one instruction is dependent on another instruction to provide its operands.
Control dependence (aka branch dependence): one instruction determines whether another gets executed or not.
particularly critical with conditional branches.

add $5, $3, $2
sub $6, $5, $2
beq $6, $7, somewhere
and $9, $3, $1

[Figure annotations: data dependences (sub on add through $5, beq on sub through $6); control dependence (and on beq).]

Branch Hazards

Branch dependences can result in branch hazards (aka control hazards) when they are too close to be handled correctly in the pipeline.

When are branches resolved?

[Figure: pipelined datapath with stages Instruction Fetch, Instruction Decode, Execute/Address Calculation, Memory Access, and Write Back, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]

Branch target address is put in PC during Mem stage.
Correct instruction is fetched during branch's WB stage.

Branch Hazards

beq $2, $1, here
add ...
sub ...
lw ...
here: lw ...

[Figure: pipeline diagram for the sequence above over CC1 to CC8. The add, sub, and lw that follow the beq are marked "These instructions should not be executed!"; the lw at here is marked "the correct instruction".]

Dealing With Branch Hazards

Hardware solutions
stall until you know which direction branch goes
guess which direction, start executing chosen path (but be prepared to undo any mistakes!)
static branch prediction: base guess on instruction type
dynamic branch prediction: base guess on execution history
reduce the branch delay

Software/hardware solution
delayed branch: Always execute instruction after branch.
compiler puts something useful (or a no-op) there

Stalling for Branch Hazards

beq $4, $0, there
and $12, $2, $5
or ...
add ...
sw ...

[Figure: pipeline diagram for the sequence above over CC1 to CC8, with three bubbles inserted after the beq.]

Stalling for Branch Hazards

All branches waste 3 cycles.

Seems wasteful, particularly when the branch isn't taken.

It's better to guess branch direction
Easiest guess is branch is not taken
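
The cost of these two policies can be compared with a back-of-the-envelope calculation. In the sketch below the base CPI of 1, the 20% branch frequency, and the 60% taken rate are made-up numbers chosen only for illustration; the 3-cycle penalty is the one stated above.

```python
def effective_cpi(branch_frac, penalty, wasted_frac=1.0, base_cpi=1.0):
    """Average cycles per instruction once branch penalties are included.

    branch_frac: fraction of instructions that are branches (assumed)
    penalty:     cycles lost each time a branch costs us (3 in this pipeline)
    wasted_frac: fraction of branches that actually pay the penalty
    """
    return base_cpi + branch_frac * wasted_frac * penalty

# Always stalling: every branch pays the 3 cycles.
stall_cpi = effective_cpi(branch_frac=0.20, penalty=3)
# Predict not taken: only taken branches (assumed 60%) pay the 3 cycles.
not_taken_cpi = effective_cpi(branch_frac=0.20, penalty=3, wasted_frac=0.60)
print(round(stall_cpi, 2), round(not_taken_cpi, 2))
```

With these illustrative numbers the guess saves cycles whenever any branch falls through, and never costs more than stalling.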

Assume Branch Not Taken

works pretty well when the prediction is right
no wasted cycles

beq $4, $0, there
and $12, $2, $5
or ...
add ...
sw ...

[Figure: pipeline diagram for the sequence above over CC1 to CC8, with no bubbles.]

Assume Branch Not Taken

same performance as stalling when you're wrong

beq $4, $0, there
and $12, $2, $5
or ...
add ...
there: sub $12, $4, $2

[Figure: pipeline diagram for the sequence above over CC1 to CC8; the three instructions fetched after the beq are flushed. Annotation: none of these instructions have changed memory or registers.]

Some other static strategies

Assume backwards branch is always taken, forward branch never is
backwards = negative displacement field
loops (which branch backwards) are usually executed multiple times.
if-then-else often takes the then (no branch) clause.

Compiler makes educated guess
sets predict taken/not taken bit in instruction
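
The backward-taken, forward-not-taken rule needs only the sign of the displacement field. The sketch below assumes the 16-bit signed branch offset of the MIPS I-type format; the function names are hypothetical.

```python
def sign_extend16(field: int) -> int:
    """Interpret the low 16 bits of an instruction as a signed displacement."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def predict_taken(instruction: int) -> bool:
    """Static prediction: backward branches taken, forward branches not taken."""
    return sign_extend16(instruction) < 0

# A loop-closing branch with displacement -4 (0xFFFC) is predicted taken.
print(predict_taken(0xFFFC))
# A forward branch with displacement +4 is predicted not taken.
print(predict_taken(0x0004))
```

In hardware this amounts to reading one bit, the sign bit of the displacement, which is why the scheme costs almost nothing.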

Reducing the Branch Delay

it's easy to reduce stall to 2 cycles

One-Cycle Branch Misprediction Penalty


Target computation & equality check in ID
This figure also shows flushing hardware

Branch Hazard Stalls with ID Stage Branching

beq $4, $0, there
and $12, $2, $5
or ...
add ...
sw ...

[Figure: pipeline diagram for the sequence above over CC1 to CC8, with one bubble inserted after the beq.]

Eliminating the Branch Stall

There's no rule that says we have to branch immediately. We could wait an extra instruction before branching.
The original SPARC and MIPS processors used a branch delay slot to eliminate single-cycle stalls after branches.
The instruction after a conditional branch is always executed in those machines, whether the branch is taken or not!

Branch Delay Slot

beq $4, $0, there
and $12, $2, $5
there: xor ...
add ...
sw ...

[Figure: pipeline diagram for the sequence above over CC1 to CC8, with no bubbles.]

Branch delay slot instruction (next instruction after a branch) is executed even if the branch is taken.

Filling the branch delay slot

The branch delay slot is only useful if you can find something to put there.
Need earlier instruction that doesn't affect the branch

If you can't find anything, you must put a nop to ensure correctness.
Worked well for early RISC machines
Doesn't help recent processors much
E.g., the MIPS R10000 has a 5-cycle branch penalty and executes 4 instructions per cycle.

Pentium 4
20-cycle branch misprediction penalty!

Filling the Branch Delay Slot

a. From before

add $s1, $s2, $s3
if $s2 = 0 then
    Delay slot

Becomes

if $s2 = 0 then
    add $s1, $s2, $s3

b. From target

sub $t4, $t5, $t6
add $s1, $s2, $s3
if $s1 = 0 then
    Delay slot

Becomes

add $s1, $s2, $s3
if $s1 = 0 then
    sub $t4, $t5, $t6

c. From fall through

add $s1, $s2, $s3
if $s1 = 0 then
    Delay slot
sub $t4, $t5, $t6

Becomes

add $s1, $s2, $s3
if $s1 = 0 then
    sub $t4, $t5, $t6

Filling the Branch Delay Slot

add $5, $3, $7
sub $6, $1, $4
and $7, $8, $2
beq $6, $7, there
nop /* branch delay slot */
add $9, $1, $2
sub $2, $9, $5
...
there:
mult $2, $10, $11

Branch Prediction

Static branch prediction isn't good enough when mispredicted branches waste 10 or 20 instructions
Dynamic branch prediction keeps a brief history of what happened at each branch

Branch Prediction

Branch history table


[Figure: the program counter indexes a branch history table of one-bit entries (1, 1, 0, 1, 1, 0 shown).]

for (i=0;i<10;i++) {
...
...
}
...
...
add $i, $i, #1
beq $i, #10, loop
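
A one-bit branch history table can be sketched in a few lines. The table size, the indexing by word-aligned PC bits, the initial not-taken state, and the branch address 0x400 are all assumptions made for illustration. The outcome list models the loop-closing branch of a ten-iteration loop: taken nine times, then not taken.

```python
class OneBitPredictor:
    """Branch history table with one bit per entry: the last outcome seen."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [False] * entries  # False = predict not taken (assumed start)

    def _index(self, pc):
        # Drop the two byte-offset bits, then keep the low-order bits of the PC.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        self.table[self._index(pc)] = taken

def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

loop_branch = [True] * 9 + [False]  # one full execution of the 10-iteration loop
bht = OneBitPredictor()
# Execute the whole loop twice, as if it were nested inside an outer loop.
print(count_mispredictions(bht, 0x400, loop_branch * 2))
```

The one-bit scheme is wrong twice per execution of the loop: once on exit, and once more on re-entry because the exit flipped the bit.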

Two-bit predictors are even better

[Figure: state diagram of the two-bit predictor. One state is annotated "the last two branches at this location were taken"; another is annotated "the last two branches at this location were not taken".]
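
A common way to realize a two-bit predictor is a saturating counter, sketched below. This is one variant and its transitions may differ from the state diagram on the slide. The initial counter value, the class name, and the loop outcome pattern (taken nine times, then not taken) are assumptions made for illustration.

```python
class TwoBitCounter:
    """Two-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state  # assumed to start at strongly not taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

def count_mispredictions(counter, outcomes):
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses

loop_branch = [True] * 9 + [False]  # loop-closing branch of a 10-iteration loop
counter = TwoBitCounter()
# Three executions of the loop: warm-up costs extra, then one miss per execution.
print(count_mispredictions(counter, loop_branch * 3))
```

Once warmed up, a single not-taken outcome at loop exit no longer flips the prediction, so re-entering the loop is predicted correctly and only the exit itself is mispredicted.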

Problems?

We know the branch direction; what about the address?
Branch Target Buffer (BTB)

Procedure calls and returns?
Return Address Stack (RAS)

Indirect branches?

Control Hazards -- Key Points

Control (or branch) hazards arise because we must fetch the next instruction before we know if we are branching or where we are branching
Control hazards are detected in hardware
We can reduce the impact of branch hazards through:
early detection of branch address and condition
branch delay slots
branch prediction: static or dynamic
