
Lecture 11

Dynamic Branch Prediction
Dynamic Scheduling
(Tomasulo's Approach)

Dynamic Branch Prediction (contd.)

EE/CS520 Comp.Archi.

10/4/2012

2-bit / (0,2) Local Predictor with 4K Branches

Accuracy varies from 99% to 82%, and is notably lower for integer programs

2-bit / (0,2) Local Predictor with Unlimited Branches

Two-Level Global Predictor
An (m,n) predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each one an n-bit branch predictor
A (1,2) predictor uses the behavior of the last 1 branch to choose between two 2-bit branch predictors
[Figure: the last branch outcome, NT (0) or T (1), selects one of the 2 x 2-bit predictors for the current branch]
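
The (m,n) scheme above can be sketched in a few lines. This is only an illustrative model, not the hardware organization: the class name `GlobalPredictor`, the `index_bits` parameter, and the helper `accuracy` are assumptions introduced here, and the table is indexed by the low bits of a made-up branch address.

```python
class GlobalPredictor:
    """(m, n) predictor: m bits of global history pick one of 2**m n-bit counters."""

    def __init__(self, m=1, n=2, index_bits=4):
        self.m = m
        self.index_mask = (1 << index_bits) - 1
        self.max_count = (1 << n) - 1
        self.history = 0  # outcomes of the last m branches, newest in bit 0
        # One row per branch-address index, one n-bit counter per history pattern.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def predict(self, pc):
        counter = self.table[pc & self.index_mask][self.history]
        return counter > self.max_count // 2  # upper half of the range means "taken"

    def update(self, pc, taken):
        row = self.table[pc & self.index_mask]
        if taken:
            row[self.history] = min(row[self.history] + 1, self.max_count)
        else:
            row[self.history] = max(row[self.history] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def accuracy(predictor, pc, outcomes):
    """Fraction of correct predictions for one branch over a list of outcomes."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)
```

On a branch that strictly alternates T, NT, T, NT, the (0,2) configuration in this sketch gets only half of its predictions right, while the (1,2) configuration mispredicts only during warm-up.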

2-bit Predictor vs. Global Predictor

Still Not Good Enough!
No predictor is clearly the best
Different branches exhibit different behaviors
  Some constant, some global, some local

Idea:
Let's combine local and global behaviors
Design a predictor that predicts which predictor (local/global) will predict better

Tournament Predictor
Uses two predictors
  P1 = Local Predictor
  P2 = Global Predictor
Uses a 2-bit saturating counter to select between the two
[State diagram: four states, two selecting P1 and two selecting P2; the counter moves toward P2 on "P2 correct" and toward P1 on "P1 correct"]
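
A minimal sketch of the 2-bit selector follows. The slide only labels the transitions "P1 correct" and "P2 correct"; this sketch assumes the common convention that the counter moves only when exactly one of the two predictors is right. The class name `TournamentSelector` and the initial state are assumptions made for illustration.

```python
class TournamentSelector:
    """2-bit saturating chooser: states 0-1 pick P1 (local), states 2-3 pick P2 (global)."""

    def __init__(self):
        self.state = 0  # start by strongly preferring P1 (an arbitrary choice)

    def choose(self, p1_prediction, p2_prediction):
        return p2_prediction if self.state >= 2 else p1_prediction

    def update(self, p1_prediction, p2_prediction, taken):
        p1_ok = p1_prediction == taken
        p2_ok = p2_prediction == taken
        if p2_ok and not p1_ok:
            self.state = min(self.state + 1, 3)  # move toward P2
        elif p1_ok and not p2_ok:
            self.state = max(self.state - 1, 0)  # move toward P1
        # Both right or both wrong: the counter does not move (assumed convention).
```

Because the counter saturates, a single disagreement does not flip the choice once a predictor has been favored twice in a row.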

Tournament Predictor
Conditional branch misprediction rates for SPEC89 benchmarks
[Figure: misprediction rate vs. total predictor size]

Two-Level Predictor (revisited)
2-Level Global Predictor:
  An (m,n) global predictor uses the behavior of the last m other branches to choose from 2^m branch predictors, each one an n-bit branch predictor
  Spatial correlation
2-Level Local Predictor:
  An (m,n) local predictor uses the behavior of the last m occurrences of the same branch to choose from 2^m branch predictors, each one an n-bit branch predictor
  Temporal correlation
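
To make the local variant concrete, here is a small sketch in which every branch keeps its own m-bit history. It uses unbounded Python dictionaries instead of fixed-size tables, which is a simplification; the class name `LocalPredictor` and the example branch address are assumptions.

```python
class LocalPredictor:
    """(m, n) local predictor: each branch is predicted from its own last m outcomes."""

    def __init__(self, m=2, n=2):
        self.m = m
        self.max_count = (1 << n) - 1
        self.histories = {}  # pc -> last m outcomes of that branch, newest in bit 0
        self.counters = {}   # (pc, history) -> n-bit saturating counter

    def predict(self, pc):
        key = (pc, self.histories.get(pc, 0))
        return self.counters.get(key, 0) > self.max_count // 2

    def update(self, pc, taken):
        hist = self.histories.get(pc, 0)
        count = self.counters.get((pc, hist), 0)
        count = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        self.counters[(pc, hist)] = count
        self.histories[pc] = ((hist << 1) | int(taken)) & ((1 << self.m) - 1)
```

With m = 2, a loop branch that repeats T, T, NT is predicted perfectly after warm-up in this model, since the branch's last two outcomes determine the next one.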

Tournament Predictor: Alpha 21264
4K 2-bit counters to choose between a global predictor and a local predictor
Local predictor consists of a 2-level predictor:
  Top level: a local history table consisting of 1K 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the branch
  Next level: the selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters that provide the local prediction
Global predictor:
  4K entries, indexed by the global history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor
Total size: 4K*2 + 1K*10 + 1K*3 + 4K*2 = 29K bits
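
The total can be checked term by term. The variable names below are only labels for the four tables listed on the slide.

```python
K = 1024

selector_bits = 4 * K * 2        # 4K 2-bit chooser counters
local_history_bits = 1 * K * 10  # 1K entries of 10-bit local history
local_counter_bits = 1 * K * 3   # 1K 3-bit saturating counters
global_bits = 4 * K * 2          # 4K 2-bit global predictor entries

total_bits = selector_bits + local_history_bits + local_counter_bits + global_bits
print(total_bits // K, "Kbits")  # 29 Kbits
```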


Dynamic Scheduling
Rearranges instruction execution to reduce stalls
Allows out-of-order execution of instructions

Advantages:
Handles dependences that are unknown at compile time
Simplifies the compiler (no static scheduling required)
Allows the processor to handle unpredictable delays
  e.g. cache misses
  Executes other code while waiting for the cache miss to resolve

Cost:
Significant increase in hardware complexity
Exceptions may become imprecise

Quick Review of Dependences
RAW Hazard (Data Dependence, True)
  ADD.D F3,F1,F2
  SUB.D F5,F6,F3

WAW Hazard (Name Dependence, False)
  DIV.D F3,F1,F2
  SUB.D F3,F6,F5

WAR Hazard (Name Dependence, False)
  DIV.D F3,F1,F2
  SUB.D F5,F6,F3
  ADD.D F3,F6,F7
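
The three cases can be told apart mechanically by comparing the registers each instruction reads and writes. The sketch below is a simplified pairwise check (it ignores any instructions between the pair); the `Inst` tuple and the function name `hazards` are assumptions made for illustration.

```python
from collections import namedtuple

# dest is the register written (or None); srcs are the registers read.
Inst = namedtuple("Inst", "op dest srcs")


def hazards(earlier, later):
    """Return the hazard types the later instruction has with the earlier one."""
    found = []
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.append("RAW")  # later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.append("WAR")  # later overwrites what earlier still has to read
    if earlier.dest is not None and earlier.dest == later.dest:
        found.append("WAW")  # both write the same register
    return found


add = Inst("ADD.D", "F3", ("F1", "F2"))
sub = Inst("SUB.D", "F5", ("F6", "F3"))
print(hazards(add, sub))  # ['RAW']
```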

False Dependences: Solution
1. DIV.D  F0,F2,F4
2. ADD.D  F6,F0,F8
3. S.D    F6,0(R1)
4. SUB.D  F8,F10,F14
5. MUL.D  F6,F10,F8

Type    Between  Reg/FU
RAW     1,2      F0
RAW     2,3      F6
WAR     2,4      F8
Struct  2,4      ADDER
WAW     2,5      F6
WAR     3,5      F6
RAW     4,5      F8

Both WAW and WAR can be removed by renaming the registers, either by the compiler (statically) and/or by the processor (dynamically)

Static Register Renaming: Compiler
Before renaming:
1. DIV.D  F0,F2,F4
2. ADD.D  F6,F0,F8
3. S.D    F6,0(R1)
4. SUB.D  F8,F10,F14
5. MUL.D  F6,F10,F8

After renaming:
1. DIV.D  F0,F2,F4
2. ADD.D  F6,F0,F8
3. S.D    F6,0(R1)
4. SUB.D  F12,F10,F14
5. MUL.D  F16,F10,F12
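
One way a renaming pass could arrive at this result is sketched below. It is not presented as the compiler's actual algorithm: it simply gives a destination a fresh name whenever an earlier instruction has already read or written that register. The function name `rename` and the free-register pool `["F12", "F16"]` are assumptions, and the sketch assumes those registers are unused by the surrounding code.

```python
def rename(program, free_regs):
    """Rename a destination only when an earlier instruction read or wrote it."""
    free = list(free_regs)
    mapping = {}  # architectural name -> name currently holding its value
    seen = set()  # registers already read or written by earlier instructions
    renamed = []
    for op, dest, srcs in program:
        new_srcs = tuple(mapping.get(r, r) for r in srcs)
        seen.update(srcs)
        new_dest = dest
        if dest is not None:
            if dest in seen:
                new_dest = free.pop(0)  # raises IndexError if the pool runs out
            mapping[dest] = new_dest
            seen.add(dest)
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),      # a store writes no register
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
renamed = rename(program, ["F12", "F16"])
```

With this pool, instruction 4 becomes SUB.D F12,F10,F14 and instruction 5 becomes MUL.D F16,F10,F12, matching the renamed listing. Any later code that reads F8 or F6 would have to use the new names as well.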

Scoreboard
Instruction Status
Instruction  Issue  Read Operand  Execution Complete  Write Result  Comments

Functional Unit Status
FU no.  Name  Busy  Op  Dest (Fi)  Src1 (Fj)  Src2 (Fk)  FU1 (Qj)  FU2 (Qk)  Src1 rdy (Rj)  Src2 rdy (Rk)

Register Status
R0  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10  R11  R12  R13  R14  R15

Scoreboard Implications
Out-of-order completion => WAR, WAW hazards?
Solution for WAR:
  Stall instruction in WR until registers have been read
Solution for WAW:
  Stall instruction in IS stage until the previous instruction completes
Greatly reduces pipeline efficiency
WAR and WAW are false dependences
  Can be removed by register renaming


Tomasulo's Approach
Dynamic register renaming
  Use some architecture-invisible registers for renaming
  Called rename registers, used to avoid WAW
Read and keep a copy of available operands at IS stage
  Avoids WAR
  Values are stored in reservation stations

Tomasulo's Approach
Used in IBM 360/91 (in the 60s)
Tracks when operands are available
  Satisfies data dependences (RAW)
Removes name dependences (WAR, WAW)
  Uses register renaming
Very similar to what is used today
  Almost all modern high-performance processors use a derivative of Tomasulo's HW
  Much of the terminology survives today

MIPS FP Unit using Tomasulo's Algo
[Figure: MIPS floating-point unit datapath, annotated with the IS, EX, and WR stages]

Issue Stage (Arithmetic Insts)
What to do?
Get next inst from IQ
Struct. Hazard:
  Find free reservation station
RAW Hazard:
  Read operands from RF
  Record source of other operands
Update source mapping (RAT)

Instruction Queue
3. F1=F2+F3
2. F4=F1-F2
1. F1=F2/F3

Reg File
F1  3.14
F2  1.00
F3  2.718
F4  0.707

RAT
F1  0 -> C1 (4) -> A3 (3)
F2  A1 (1)
F3
F4  0 -> A2 (2)

Reservation stations (FP Adder)
A1 (1)  F2=F4+F1  0.707   3.14
A2 (2)  F4=F1-F2  C1 (4)  A1 (1)
A3 (3)  F1=F2+F3  A1 (1)  2.718

Reservation stations (FP Divider)
C1 (4)  F1=F2/F3  A1 (1)  2.718
C2 (5)
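
The issue steps can be sketched as follows. This is a simplified model under stated assumptions, not the real control logic: the `Station` class, `read_operand`, and `issue` are names introduced here, station names are used directly as tags, and the operator of instruction 2 is taken to be a subtraction because it occupies an adder station.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None  # operand values, valid when the matching tag is None
    vk: Optional[float] = None
    qj: Optional[str] = None    # tags: stations that will produce the operands
    qk: Optional[str] = None


def read_operand(src, regfile, rat):
    """Return (value, tag): a value from the RF, or the tag of the pending producer."""
    tag = rat.get(src)
    return (regfile[src], None) if tag is None else (None, tag)


def issue(inst, stations, regfile, rat):
    """Issue (op, dest, src1, src2) to a free station; return None to stall."""
    op, dest, src1, src2 = inst
    station = next((s for s in stations if not s.busy), None)
    if station is None:
        return None  # structural hazard: no free reservation station
    station.busy, station.op = True, op
    station.vj, station.qj = read_operand(src1, regfile, rat)
    station.vk, station.qk = read_operand(src2, regfile, rat)
    rat[dest] = station.name  # later readers of dest now wait on this station
    return station


regfile = {"F1": 3.14, "F2": 1.00, "F3": 2.718, "F4": 0.707}
rat = {}
adders = [Station("A1"), Station("A2"), Station("A3")]
dividers = [Station("C1"), Station("C2")]
issue(("+", "F2", "F4", "F1"), adders, regfile, rat)    # 0. F2=F4+F1
issue(("/", "F1", "F2", "F3"), dividers, regfile, rat)  # 1. F1=F2/F3
issue(("-", "F4", "F1", "F2"), adders, regfile, rat)    # 2. F4=F1-F2
issue(("+", "F1", "F2", "F3"), adders, regfile, rat)    # 3. F1=F2+F3
```

After these four calls the stations hold the same values and tags as the slide: A2 waits on C1 and A1, A3 and C1 wait on A1, and the RAT maps F1 to A3, F2 to A1, and F4 to A2.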

Execute Stage (Arithmetic Insts)
What to do?
Wait for operands to be ready
Compete for FUs
Execute the operation on FUs

Instruction Queue
3. F1=F2+F3
2. F4=F1-F2
1. F1=F2/F3

Result: A1 (1) = 3.84

Reservation stations (Adder)
A1 (1)  F2=F4+F1
A2 (2)  F4=F1-F2  C1 (4)  3.84
A3 (3)  F1=F2+F3  3.84    2.718

Reservation stations (FP Divider)
C1 (4)  F1=F2/F3  3.84    2.718
C2 (5)

Write Result (Arithmetic Insts)
What to do?
Broadcast result on CDB
Write back to RF
Update mapping
Free reservation station

Only update mapping (and RF) if RAT still contains your mapping!

Inst Sequence
0. F2=F4+F1
1. F1=F2/F3
2. F4=F1-F2
3. F1=F2+F3

Reg File
F1  3.14 -> 6.558
F2  1.0 -> 3.84
F3  2.718
F4  0.707

RAT (successive values shown on the slide)
F1  0  3
F2  1  0
F3
F4

Reservation stations (Adder)
A1 (1)  F2=F4+F1  0.7071    3.14
A2 (2)  F4=F1-F2  (4)       3.84 (1)
A3 (3)  F1=F2+F3  3.84 (1)  2.718

Reservation stations (FP Divider)
C1 (4)  F1=F2/F3  3.84 (1)  2.718
C2 (5)

WAW avoided
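
The rule "only update the mapping (and RF) if the RAT still contains your mapping" is what avoids the WAW hazard, and a short sketch makes it visible. The dictionary layout and the function name `write_result` are assumptions; the values are the rounded ones shown on the slide, and the sketch assumes the add in A3 finishes before the divide in C1.

```python
def write_result(tag, value, stations, regfile, rat):
    """Broadcast (tag, value) on the CDB, then free the producing station."""
    for s in stations.values():
        if s["qj"] == tag:
            s["vj"], s["qj"] = value, None
        if s["qk"] == tag:
            s["vk"], s["qk"] = value, None
    dest = stations[tag]["dest"]
    if rat.get(dest) == tag:  # a younger writer may have taken over the mapping
        regfile[dest] = value
        rat[dest] = None
    stations[tag] = {"dest": None, "vj": None, "vk": None, "qj": None, "qk": None}


regfile = {"F1": 3.14, "F2": 1.0, "F3": 2.718, "F4": 0.707}
rat = {"F1": "A3", "F2": "A1", "F4": "A2"}
stations = {
    "A1": {"dest": "F2", "vj": 0.707, "vk": 3.14, "qj": None, "qk": None},
    "A2": {"dest": "F4", "vj": None, "vk": None, "qj": "C1", "qk": "A1"},
    "A3": {"dest": "F1", "vj": None, "vk": 2.718, "qj": "A1", "qk": None},
    "C1": {"dest": "F1", "vj": None, "vk": 2.718, "qj": "A1", "qk": None},
}
write_result("A1", 3.84, stations, regfile, rat)          # F2=F4+F1
write_result("A3", 6.558, stations, regfile, rat)         # F1=F2+F3 (younger writer of F1)
write_result("C1", 3.84 / 2.718, stations, regfile, rat)  # F1=F2/F3 (older writer of F1)
```

The divide result is still forwarded to A2, which was waiting on tag C1, but it is not written to F1 because the RAT no longer maps F1 to C1. F1 keeps 6.558, the value from the younger instruction.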

Data Structure
Qj, Qk: Rsrv. Stations for Src1 and Src2
Vj, Vk: Values of the two operands. Values are valid if Qj and Qk are zero
Busy: Rsrv. Station and FU are busy
Opcode: Operation to be performed
A: Memory address for LD/SD (initially the imm value)
Qi: Contained in the Reg file, indicates the src Rsrv. Station (like scoreboard)

Reservation Stations
Is  Ex  W  Name  Busy  Op  Vj  Vk  Qj  Qk
Names: LD1, LD2, AD1, AD2, AD3, ML1, ML2, SD1, SD2

Register Status: Qi
F0  F2  F4  F6  F8  F10  F12


How does L.D go!
Issue: (1 CC)
  Check for structural hazard on load buffers
  Put imm value in A field of load buffer
Ex: (1-2 CC)
  Calculate effective address A = imm + Rbase
  Update A field in load buffer
  If memory unit is free, fetch data and put it on CDB
WR: (1 CC)
  Data is available on CDB
  Load it in register
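
The Issue and Ex steps of a load can be sketched as two small functions. This is a simplified model: it assumes the base register is already available and the memory unit is free, and the names `load_issue`, `load_execute`, and the dictionary fields other than `A` are assumptions introduced here.

```python
def load_issue(load_buffers, imm, base_reg):
    """Issue: claim a free load buffer and put the immediate in its A field."""
    for buf in load_buffers:
        if not buf["busy"]:
            buf.update(busy=True, A=imm, base=base_reg)
            return buf
    return None  # structural hazard on the load buffers: stall


def load_execute(buf, int_regs, memory):
    """Ex: form the effective address in A, then fetch the data for the CDB."""
    buf["A"] = buf["A"] + int_regs[buf["base"]]  # A = imm + Rbase
    return memory[buf["A"]]


load_buffers = [{"busy": False}]
int_regs = {"R1": 100}
memory = {108: 2.5}
buf = load_issue(load_buffers, 8, "R1")       # e.g. L.D F6,8(R1)
value = load_execute(buf, int_regs, memory)   # value that would go on the CDB
```

In the WR step the value is broadcast on the CDB and written to the destination register, subject to the same mapping check as for arithmetic instructions.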

What happens in S.D!
Issue: (1 CC)
  Check for structural hazard on store buffers
  Put imm value in A field of store buffer
Ex: (1-2 CC)
  Calculate effective address A = imm + Rbase
  Update A field in store buffer
  Wait for src operand to be ready (if RAW hazard)
  Put it in store buffer as well
WR: (1 CC)
  If memory unit is free, store data into mem

Quiz 3: Tuesday, 09-10-2012
Lectures 9-11
Topics covered: 2.1, 2.2, 2.3, 2.4 (4th Ed.)
There will be problems and objective-type questions (TF and/or MCQs)
