Dynamic Branch Prediction
Dynamic Scheduling
(Tomasulo's Approach)
Dynamic Branch Prediction (contd.)
EE/CS520 Comp.Archi.
10/4/2012
2-bit/(0,2) Local Predictor with 4K Branches
Accuracy varies from 99% to 82%, particularly lower for integer programs
2-bit/(0,2) Local Predictor with Unlimited Branches
Two-Level Global Predictor
An (m,n) predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each one an n-bit branch predictor
A (1,2) predictor uses the behavior of the last 1 branch to choose between 2 x 2-bit predictors for the current branch
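The (m,n) definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the table size of 4096 entries, the `pc >> 2` indexing, and the initial counter value of 0 are hypothetical choices, not taken from the slides.

```python
class CorrelatingPredictor:
    """(m, n) predictor: the last m branch outcomes pick one of 2**m n-bit counters."""

    def __init__(self, m, n, entries=4096):       # table size is an assumption
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                           # last m outcomes, newest in bit 0
        self.max_count = (1 << n) - 1
        # One row per table entry, 2**m saturating counters per row, all starting at 0.
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc):
        return self.table[(pc >> 2) % self.entries]   # indexing scheme is an assumption

    def predict(self, pc):
        # Predict taken when the selected counter is in its upper half.
        return self._row(pc)[self.history] >= (1 << (self.n - 1))

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        if self.m:
            # Shift the outcome into the m-bit global history.
            self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_mispredictions(predictor, outcomes, pc=0x40):
    """Run one branch (hypothetical address pc) through the predictor."""
    wrong = 0
    for taken in outcomes:
        wrong += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return wrong
```

With `m = 0` the history is unused and the class degenerates to the plain 2-bit/(0,2) predictor. On a strictly alternating taken/not-taken branch, the (1,2) configuration stops mispredicting after a short warm-up, while the (0,2) configuration keeps missing every taken outcome.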
2-bit predictor vs. Global Predictor
Still Not Good Enough!!!!
No predictor is clearly the best
  Different branches exhibit different behaviors
  Some constant, some global, some local
Idea:
  Let's combine Local and Global behaviors
  Design a predictor that predicts which predictor (local/global) will predict better
Tournament Predictor
Uses two predictors
  P1 = Local Predictor
  P2 = Global Predictor
Uses a 2-bit saturating counter to select between the two
[Figure: state diagram of the 2-bit selector counter. States favor P1 or P2; transitions move toward P2 when P2 is correct and toward P1 when P1 is correct]
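A minimal sketch of the 2-bit selector, assuming that counter values 0 and 1 pick P1, values 2 and 3 pick P2, the counter starts at 0, and the counter only moves when exactly one of the two predictors was correct. These encodings and the class name are illustrative assumptions; the slides only give the transition labels.

```python
class TournamentSelector:
    """2-bit saturating counter that chooses between P1 (local) and P2 (global)."""

    def __init__(self):
        self.counter = 0              # assumed encoding: 0,1 -> use P1; 2,3 -> use P2

    def choose(self, p1_prediction, p2_prediction):
        return p2_prediction if self.counter >= 2 else p1_prediction

    def update(self, p1_prediction, p2_prediction, taken):
        p1_correct = p1_prediction == taken
        p2_correct = p2_prediction == taken
        if p2_correct and not p1_correct:
            self.counter = min(self.counter + 1, 3)   # transition on P2 correct
        elif p1_correct and not p2_correct:
            self.counter = max(self.counter - 1, 0)   # transition on P1 correct
        # Both right or both wrong: no information, counter unchanged (assumption).
```

Because the counter saturates, a single wrong guess by the currently favored predictor does not flip the selection when the counter sits at 0 or 3.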
Tournament Predictor
Conditional Branch Misprediction Rates for SPEC89 Benchmarks
[Figure: misprediction rate vs. total predictor size]
Two-Level Predictor (revisited)
2-Level Local Predictor:
  An (m,n) local predictor uses the behavior of the last m occurrences of the same branch to choose from 2^m branch predictors, each one an n-bit branch predictor
  Temporal Correlation
Tournament Predictor: Alpha 21264
4K 2-bit counters to choose from among a global predictor and a local predictor
Local predictor consists of a 2-level predictor:
  Top level: a local history table consisting of 1K 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the branch.
  Next level: the selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters that provide the local prediction
Global predictor:
  4K entries, indexed by the global history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor
Total Size: 4K*2 + 1K*10 + 1K*3 + 4K*2 = 29K bits
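The total-size arithmetic can be checked directly. The dictionary keys below are descriptive labels chosen for this sketch; the entry counts and widths are the ones listed on the slide.

```python
K = 1024

# (number of entries, bits per entry) for each table of the predictor
tables = {
    "chooser counters":        (4 * K, 2),
    "local history table":     (1 * K, 10),
    "local 3-bit counters":    (1 * K, 3),
    "global 2-bit predictors": (4 * K, 2),
}

total_bits = sum(entries * width for entries, width in tables.values())
print(f"{total_bits // K}K bits")
```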
Dynamic Scheduling
Rearranges instruction execution to reduce stalls
Allows out-of-order execution of instructions
Advantages:
  Handles dependences that are unknown at compile time
  Simplifies the compiler (no static scheduling required)
  Allows the processor to handle unpredictable delays
    e.g. cache misses
    Executes other code while waiting for the cache miss to resolve
Cost:
  Significant increase in hardware complexity
  Exceptions may become imprecise
Quick Review of Dependences
RAW Hazard (Data Dependence, True)
  ADD.D F3,F1,F2
  SUB.D F5,F6,F3
WAW Hazard (Name Dependence, False)
  DIV.D F3,F1,F2
  SUB.D F3,F6,F5
WAR Hazard (Name Dependence, False)
  DIV.D F3,F1,F2
  SUB.D F5,F6,F3
  ADD.D F3,F6,F7
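The three hazard types can be told apart mechanically from the destination and source registers of two instructions. The sketch below assumes a hypothetical representation of an instruction as a `(dest, sources)` pair; it is an illustration, not part of the slides.

```python
def hazards(earlier, later):
    """Classify the dependences of `later` on `earlier` (program order)."""
    dest1, srcs1 = earlier
    dest2, srcs2 = later
    found = set()
    if dest1 is not None and dest1 in srcs2:
        found.add("RAW")    # later reads what earlier writes (true dependence)
    if dest2 is not None and dest2 in srcs1:
        found.add("WAR")    # later overwrites what earlier still has to read
    if dest1 is not None and dest1 == dest2:
        found.add("WAW")    # both write the same register
    return found


# The examples from the slide, as (dest, sources) pairs
add_f3 = ("F3", ["F1", "F2"])      # ADD.D F3,F1,F2
sub_f5 = ("F5", ["F6", "F3"])      # SUB.D F5,F6,F3
div_f3 = ("F3", ["F1", "F2"])      # DIV.D F3,F1,F2
sub_f3 = ("F3", ["F6", "F5"])      # SUB.D F3,F6,F5
add_f3_b = ("F3", ["F6", "F7"])    # ADD.D F3,F6,F7
```

For instance, `hazards(add_f3, sub_f5)` reports the RAW case, `hazards(div_f3, sub_f3)` the WAW case, and `hazards(sub_f5, add_f3_b)` the WAR case.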
False Dependences: Solution
1. DIV.D  F0,F2,F4
2. ADD.D  F6,F0,F8
3. S.D    F6,0(R1)
4. SUB.D  F8,F10,F14
5. MUL.D  F6,F10,F8

Type    Between  Reg/FU
RAW     1,2      F0
RAW     2,3      F6
WAR     2,4      F8
Struct  2,4      ADDER
WAW     2,5      F6
WAR     3,5      F6
RAW     4,5      F8
Static Register Renaming: Compiler
Before renaming:
1. DIV.D  F0,F2,F4
2. ADD.D  F6,F0,F8
3. S.D    F6,0(R1)
4. SUB.D  F8,F10,F14
5. MUL.D  F6,F10,F8
After renaming:
1. DIV.D  F0,F2,F4
2. ADD.D  F6,F0,F8
3. S.D    F6,0(R1)
4. SUB.D  F12,F10,F14
5. MUL.D  F16,F10,F12
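One way a compiler pass could arrive at the renamed code is sketched below. It is a simplified illustration: instructions are hypothetical `(op, dest, sources)` tuples, a destination is renamed only if that register was already read or written earlier in the sequence, and the pool of free registers (`F12`, `F16`) is supplied by the caller to match the slide.

```python
def rename(program, free_registers):
    """Rename destinations that would create WAR or WAW hazards."""
    free = list(free_registers)
    mapping = {}       # original register name -> current (renamed) name
    seen = set()       # original names already read or written
    renamed = []
    for op, dest, sources in program:
        # Sources read the most recent name of each register.
        new_sources = [mapping.get(src, src) for src in sources]
        seen.update(sources)
        new_dest = dest
        if dest is not None:
            if dest in seen:
                # Earlier reader (WAR) or writer (WAW): give this write a fresh register.
                mapping[dest] = free.pop(0)
                new_dest = mapping[dest]
            seen.add(dest)
        renamed.append((op, new_dest, new_sources))
    return renamed


program = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),     # store: no register destination
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
renamed = rename(program, ["F12", "F16"])
```

Only the RAW dependences survive the renaming. Any code after this sequence that reads F6 or F8 would have to read F16 or F12 instead, which is why the mapping has to be carried forward.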
Scoreboard
[Table (empty template): Instruction Status with columns Instruction, Issue, Read Operand, Execution Complete, Write Result, Comments; Functional Unit Status with columns FU no., Name, Busy; Register Status for R0 to R9]
Scoreboard Implications
Out-of-order completion => WAR, WAW hazards?
Solution for WAR:
  Stall inst in WR until registers have been read
Solution for WAW:
  Stall inst in IS stage until previous inst completes
Greatly reduces the pipeline efficiency
WAR and WAW are false dependences
  Can be removed by register renaming
Tomasulo's Approach
Dynamic register renaming
  Uses some architecture-invisible registers for renaming
  Called rename registers, to avoid WAW
Reads and keeps a copy of available operands at IS stage
  Avoids WAR
  Values are stored in reservation stations
Tomasulo's Approach
Used in IBM 360/91 (in the 60s)
Tracks when operands are available
  Satisfies data dependences (RAW)
Removes name dependences (WAR, WAW)
  Uses register renaming
Very similar to what is used today
  Almost all modern high-performance processors use a derivative of Tomasulo's HW
  Much of the terminology survives today.
MIPS FP Unit using Tomasulo's Algo
[Figure: block diagram of the MIPS floating-point unit, annotated with the IS, EX and WR stages]
Issue Stage (Arithmetic Insts)
What to do?
  Get next inst from IQ
  Struct. Hazard:
    Find free reservation station
  RAW Hazard:
    Read operands from RF
    Record source of other operands
  Update source mapping (RAT)
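The issue steps above can be sketched as follows, using the register values and station names (A1 to A3 for the adder, C1 and C2 for the divider) from the accompanying example. The `Station` class and the `issue` function are hypothetical names for this illustration; the RAT is modeled as a dictionary in which a missing entry means the value is in the register file, and the operator of instruction 2 is assumed to be a subtraction because it occupies an adder station.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None    # value of source 1, once available
    vk: Optional[float] = None    # value of source 2, once available
    qj: Optional[str] = None      # station that will produce source 1
    qk: Optional[str] = None      # station that will produce source 2


def issue(inst, stations, regfile, rat):
    """Issue (op, dest, src1, src2); return the station used, or None on a stall."""
    op, dest, src1, src2 = inst
    rs = next((s for s in stations if not s.busy), None)
    if rs is None:
        return None                       # structural hazard: no free station
    rs.busy, rs.op = True, op
    # RAW check: take the value from the RF, or record the producing station.
    rs.qj = rat.get(src1)
    rs.vj = None if rs.qj else regfile[src1]
    rs.qk = rat.get(src2)
    rs.vk = None if rs.qk else regfile[src2]
    rat[dest] = rs.name                   # update source mapping (RAT)
    return rs


regfile = {"F1": 3.14, "F2": 1.00, "F3": 2.718, "F4": 0.707}
rat = {}
adders = [Station("A1"), Station("A2"), Station("A3")]
dividers = [Station("C1"), Station("C2")]
program = [
    ("+", "F2", "F4", "F1"),   # 0. F2 = F4 + F1
    ("/", "F1", "F2", "F3"),   # 1. F1 = F2 / F3
    ("-", "F4", "F1", "F2"),   # 2. F4 = F1 - F2 (operator assumed)
    ("+", "F1", "F2", "F3"),   # 3. F1 = F2 + F3
]
for inst in program:
    issue(inst, dividers if inst[0] == "/" else adders, regfile, rat)
```

After the loop, F1 maps to A3 rather than C1: the later writer has replaced the earlier one in the RAT, which is what lets the write-result stage discard the stale divide result for F1.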
[Figure: issue example.
Instruction Queue: 1. F1=F2/F3, 2. F4=F1-F2, 3. F1=F2+F3 (F2=F4+F1 already issued).
RegFile: F1 = 3.14, F2 = 1.00, F3 = 2.718, F4 = 0.707.
RAT: F1 -> C1(4), later A3(3); F2 -> A1(1); F3 -> 0; F4 -> A2(2).
FP Adder stations: A1(1) F2=F4+F1 with operands 0.707, 3.14; A2(2) F4=F1-F2 with operands C1(4), A1(1); A3(3) F1=F2+F3 with operands A1(1), 2.718.
FP Divider stations: C1(4) F1=F2/F3 with operands A1(1), 2.718; C2(5) free]
Execute Stage (Arithmetic Insts)
What to do?
  Wait for operands to be ready
  Compete for FUs
  Execute the operation on FUs
[Figure: execute example. Instruction Queue: 1. F1=F2/F3, 2. F4=F1-F2, 3. F1=F2+F3. A1(1) finishes F2=F4+F1 with result 3.84; stations C1(4) (F1=F2/F3), A2(2) (F4=F1-F2) and A3(3) (F1=F2+F3) receive 3.84 in place of the A1(1) tag. C2(5) is free]
Write Result (Arithmetic Insts)
What to do?
  Broadcast result on CDB
  Write back to RF
  Update mapping
  Free reservation station
Only update mapping (and RF) if RAT still contains your mapping!
[Figure: write-result example. Inst sequence: 0. F2=F4+F1, 1. F1=F2/F3, 2. F4=F1-F2, 3. F1=F2+F3. RegFile: F1 3.14 then 6.558, F2 1.0 then 3.84, F3 2.718, F4 0.707. RAT entries shown: F1: 0, 3; F2: 1, 0. Stations A1(1), A2(2), A3(3) on the adder and C1(4), C2(5) on the FP divider. WAW avoided]
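The write-result rules, including the check that the RAT still contains the writer's mapping, can be sketched as below. Stations are modeled as plain dictionaries and the starting state is the one reached after all four example instructions have been issued; the function and helper names are hypothetical.

```python
def write_result(tag, value, dest, stations, regfile, rat):
    """Broadcast (tag, value) on the CDB, then update RF/RAT and free the station."""
    for rs in stations.values():
        # Waiting stations capture the value and clear the matching tag.
        if rs["busy"] and rs["qj"] == tag:
            rs["vj"], rs["qj"] = value, None
        if rs["busy"] and rs["qk"] == tag:
            rs["vk"], rs["qk"] = value, None
    if rat.get(dest) == tag:
        # Only the latest writer of `dest` may update the RF and clear the mapping.
        regfile[dest] = value
        del rat[dest]
    stations[tag]["busy"] = False


def station(qj=None, vj=None, qk=None, vk=None):
    return {"busy": True, "qj": qj, "vj": vj, "qk": qk, "vk": vk}


regfile = {"F1": 3.14, "F2": 1.00, "F3": 2.718, "F4": 0.707}
rat = {"F1": "A3", "F2": "A1", "F4": "A2"}
stations = {
    "A1": station(vj=0.707, vk=3.14),    # 0. F2 = F4 + F1
    "C1": station(qj="A1", vk=2.718),    # 1. F1 = F2 / F3
    "A2": station(qj="C1", qk="A1"),     # 2. waits on C1 and A1
    "A3": station(qj="A1", vk=2.718),    # 3. F1 = F2 + F3
}

write_result("A1", 0.707 + 3.14, "F2", stations, regfile, rat)
quotient = stations["C1"]["vj"] / stations["C1"]["vk"]
write_result("C1", quotient, "F1", stations, regfile, rat)   # RAT says A3: RF untouched
f1_after_divide = regfile["F1"]
total = stations["A3"]["vj"] + stations["A3"]["vk"]
write_result("A3", total, "F1", stations, regfile, rat)
```

The divide result never reaches the register file because F1 is mapped to A3 by the time C1 writes; it is still forwarded to A2 over the CDB. The final F1 computed here is 3.847 + 2.718; the slide shows 6.558 because it carries the intermediate sum as 3.84.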
Data Structure
Qj, Qk: Rsrv. Stations for Src1 and Src2
Vj, Vk: Values of the two operands. Values are valid if Qj and Qk are zero
Busy: Rsrv. Station and FU are busy
Opcode: Operation to be performed
A: Memory address for LD/SD (initially the imm value)
Qi: Contained in the register file; indicates the source Rsrv. Station (like scoreboard)
[Table (empty template): instruction status columns Is, Ex, W; Reservation Stations with columns Busy, Op, Vj, Vk, Qj, Qk and rows LD1, LD2, AD1, AD2, AD3, ML1, ML2, SD1, SD2; Register Status Qi for F0, F2, F4, F6, F8, F10, F12]
How does L.D go?
Issue: (1 CC)
  Check for structural hazard on load buffers
  Put imm value in A field of load buffer
Ex: (1-2 CC)
  Calculate effective address A = imm + Rbase
  Update A field in load buffer
  If memory unit is free, fetch data and put it on CDB
WR: (1 CC)
  Data is available on CDB
  Load it in register
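The three L.D steps can be traced with a small sketch. Load buffers are modeled as dictionaries, memory as a dictionary keyed by address, and the memory unit is assumed to be free; all function names, the register contents and the memory contents are made up for the illustration.

```python
def issue_load(buffers, imm, base_reg, dest):
    """Issue: claim a free load buffer and put the immediate in its A field."""
    slot = next((b for b in buffers if not b["busy"]), None)
    if slot is None:
        return None                      # structural hazard on load buffers: stall
    slot.update(busy=True, A=imm, base=base_reg, dest=dest)
    return slot


def execute_load(slot, int_regs, memory):
    """Ex: A = imm + Rbase, then fetch the data (memory unit assumed free)."""
    slot["A"] = slot["A"] + int_regs[slot["base"]]
    return memory[slot["A"]]             # this value goes onto the CDB


def write_load(slot, value, fp_regs):
    """WR: take the value from the CDB, load it in the register, free the buffer."""
    fp_regs[slot["dest"]] = value
    slot["busy"] = False


# L.D F6, 8(R1) with hypothetical register and memory contents
buffers = [{"name": "LD1", "busy": False}, {"name": "LD2", "busy": False}]
int_regs = {"R1": 100}
memory = {108: 2.5}
fp_regs = {}

slot = issue_load(buffers, 8, "R1", "F6")
effective_value = execute_load(slot, int_regs, memory)
write_load(slot, effective_value, fp_regs)
```

Keeping the immediate in the A field at issue and overwriting it with the effective address in Ex mirrors the "initially imm value" note on the data-structure slide.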
What happens in S.D?
Issue: (1 CC)
  Check for structural hazard on store buffers
  Put imm value in A field of store buffer
Ex: (1-2 CC)
  Calculate effective address A = imm + Rbase
  Update A field in store buffer
  Wait for src operand to be ready (if RAW hazard)
  Put it in store buffer as well
WR: (1 CC)
  If memory unit is free, store data into mem
Quiz 3: Tuesday, 09-10-2012
Lecture 9-11
Topics covered: 2.1, 2.2, 2.3, 2.4 (4th Ed.)
There will be problems and objective-type questions (TF and/or MCQs)