Speed Trends
Processor speeds double every two years.
Speed Limits
To reach super high speeds, a sequential computer would have to be really tiny.
Space Limits
At some point, the physical limits of size will be reached, so locality will have to be considered to increase speed.
Balance in Time
There is a growing gap between compute speed and communication speed.
R = [ops]/[time] is the compute rate; this is related to transistor density.
Computation has gotten very fast.
Stream (β) is the speed of data transfer between slow and fast memory. This has doubled every 2.9 years.
Stream is much slower than computation.
B == R/β is the balance point; this doubles every 5.5 years.
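The doubling time of a ratio follows from the doubling times of its two parts. The sketch below is only an illustration of that arithmetic; the function name `ratio_doubling_time` is made up for this example. With the rounded figures quoted in these notes (2 and 2.9 years) it gives roughly 6.4 years, while a compute doubling time nearer 1.9 years gives about 5.5 years.

```python
def ratio_doubling_time(t_fast: float, t_slow: float) -> float:
    """Doubling time of (fast / slow) when both quantities grow exponentially.

    A quantity that doubles every t years grows like 2 ** (years / t), so the
    ratio grows like 2 ** (years * (1 / t_fast - 1 / t_slow)).
    """
    if t_fast >= t_slow:
        raise ValueError("the ratio only grows when t_fast < t_slow")
    return 1.0 / (1.0 / t_fast - 1.0 / t_slow)


# Rounded figures from the notes: compute doubles every 2 years,
# stream bandwidth every 2.9 years.
print(round(ratio_doubling_time(2.0, 2.9), 2))

# Assumed, slightly faster compute trend (1.9 years), for comparison.
print(round(ratio_doubling_time(1.9, 2.9), 2))
```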
Balance Principles
This implies trading less communication for more computation.
Look at a DAG model:
Work W = W(n) == total operations
Span D = D(n) [ops]
Transfers Q = Q(n; Z, L) [transactions], for a fast memory of size Z and transactions of L words
The machine model:
P = # of processors
Transactions are transfers of data from slow memory to fast memory.
Each transaction initiates a transfer of data over the wires, in parallel. The time it takes words to cross the wire is set by the bandwidth $\beta_0$ [words/time].
Recall that W, D, and Q count the number of operations and ignore the costs of transportation.
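The time to process the given DAG example, where $R_0$ is the compute rate of one processor and $\beta_0$ is the memory bandwidth:
$T_P \ge \max\left(\frac{D}{R_0},\ \frac{W}{P R_0},\ \frac{Q L}{\beta_0}\right)$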
The right-hand side is minimized when $W/P \gg D$.
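As a rough sketch, the three terms of the time bound can be evaluated directly to see which one dominates. The function name and all numeric values below are hypothetical, chosen only to make the arithmetic easy to follow; `beta0` stands for the memory bandwidth in words per unit time.

```python
def time_lower_bound(W, D, Q, P, R0, beta0, L):
    """Return (bound, terms) for T_P >= max(D/R0, W/(P*R0), Q*L/beta0)."""
    terms = {
        "span": D / R0,               # critical path, cannot be parallelized
        "work": W / (P * R0),         # total work spread over P processors
        "transfer": Q * L / beta0,    # Q transactions of L words each
    }
    return max(terms.values()), terms


# Hypothetical machine and algorithm parameters.
bound, terms = time_lower_bound(
    W=1e12, D=1e6, Q=1e9, P=8, R0=1e9, beta0=1e9, L=8
)
print(bound)                          # the largest of the three terms
print(max(terms, key=terms.get))      # which term dominates
```

With these made-up values the work term dominates, which is the compute-bound situation the notes aim for.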
To benefit from transistor trends, we want the compute time to dominate the communication time:
The Balance Principle: $\frac{W}{P R_0} \ge \frac{Q L}{\beta_0}$
Achieving balance is the best shot at scaling in the future.
Rearranging gives:
$\frac{W}{Q} \ge \frac{R_0}{\beta_0} \cdot P L$
Therefore the left-hand side ($W/Q$) should be as large as possible, since the right-hand side ($(R_0/\beta_0) \cdot P L$) will grow over time.
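A small sketch of the balance check, assuming the inequality $W/Q \ge (R_0/\beta_0) \cdot P L$. The helper names and numbers are hypothetical; the point is only that an algorithm balanced on a few cores can fall out of balance as P grows.

```python
def required_intensity(P, R0, beta0, L):
    """Right-hand side of the balance principle: (R0 / beta0) * P * L."""
    return (R0 / beta0) * P * L


def is_balanced(W, Q, P, R0, beta0, L):
    """True when the algorithm's W/Q meets the machine's requirement."""
    return W / Q >= required_intensity(P, R0, beta0, L)


# Hypothetical algorithm: 1000 operations per memory transaction.
W, Q = 1e12, 1e9

# Same hypothetical machine, with a growing number of processors.
for P in (8, 32, 128):
    need = required_intensity(P, R0=1e9, beta0=1e9, L=8)
    print(P, need, is_balanced(W, Q, P, R0=1e9, beta0=1e9, L=8))
```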
Double, Double, Toil and Trouble Quiz
Suppose a machine is perfectly balanced for sorting large arrays. If the number of cores doubles, how can balance be maintained?
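For sorting: $W/Q \sim L \log(Z/L)$
Recall:
$\frac{W}{Q} \ge \frac{R_0}{\beta_0} \cdot P L$
$L \log(Z/L) \ge \frac{R_0}{\beta_0} \cdot P L$
$\log(Z/L) \ge \frac{R_0}{\beta_0} \cdot P$
Then double the cores:
$\log(Z/L) \ge \frac{R_0}{\beta_0} \cdot 2P$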
The answer: square Z and square L, or double the bandwidth.
Squaring both Z and L doubles log(Z/L), but it is a very expensive way to maintain balance.
But bandwidth does not grow fast.
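The quiz answer can be checked numerically. This is a minimal sketch with made-up sizes (Z and L in words, both powers of two so the logarithms are exact); `log_term` is a name invented for the example.

```python
import math


def log_term(Z, L):
    """The log(Z/L) factor from the sorting balance condition (base 2)."""
    return math.log2(Z / L)


Z, L = 2**20, 2**3           # hypothetical fast-memory and transaction sizes

before = log_term(Z, L)
after = log_term(Z**2, L**2)  # square both Z and L

print(before, after)          # the log term doubles
print(Z**2 // Z)              # factor by which the fast memory had to grow
```

Doubling the left-hand side this way requires the fast memory to grow by a factor of Z itself, which is why the notes call it very expensive.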
Power Limits
Power = Energy / Time
Increasing clock frequency makes the power consumption skyrocket, which is why multicores are prevalent.
Constant power = static power and idle power
Power = Constant Power + Dynamic Power
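$P = P_0 + \Delta P$, where $P_0$ is the constant power and $\Delta P$ is the dynamic power
Energy per gate = $C V^2$
f = clock frequency, which is the maximum number of times a circuit can switch per unit time
a = activity factor, the number of switches per cycle
The Dynamic Power Equation
Dynamic Power = $C V^2 f a$
$f \propto V$: frequency and voltage must change together to maintain the stability and reliability of the circuit.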
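A minimal sketch of the dynamic power equation, assuming capacitance C in farads, voltage V in volts, and frequency f in hertz. The numeric values are hypothetical. Because voltage and frequency move together, doubling both multiplies dynamic power by 2² · 2 = 8.

```python
def dynamic_power(C, V, f, a):
    """Dynamic power = C * V^2 * f * a."""
    return C * V**2 * f * a


# Hypothetical circuit: 1 nF, 1 V, 1 GHz, activity factor 0.5.
base = dynamic_power(C=1e-9, V=1.0, f=1e9, a=0.5)

# Double the frequency, and the voltage along with it.
scaled = dynamic_power(C=1e-9, V=2.0, f=2e9, a=0.5)

print(scaled / base)   # ratio of the two dynamic powers
```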
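Power Motivates Parallelism
Given the following processors:
Processor 1: $f_1$ = 4 GHz, $P_1$ = 64 watts, $T_1$ = time to run program
Processor 2: $f_2$ = 1 GHz, $P_2$ = 1 watt, $T_2$ = time to run program = $4 T_1$
Dynamic power is $C V^2 f a$, and since $f \propto V$ it scales as $f^3$.
So by reducing the frequency by a factor of 4, the power is reduced to 1/64.
This is a good argument for using parallelism to speed up instead of clock frequency.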
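The comparison can be worked through in a few lines. This sketch assumes dynamic power scales with the cube of frequency and, as a loud extra assumption not stated in the notes, that the program parallelizes perfectly across four slow cores. Times are in units of T1.

```python
def scaled_power(p_ref, f_ref, f_new):
    """Dynamic power after a frequency change, assuming power ~ f^3."""
    return p_ref * (f_new / f_ref) ** 3


T1 = 1.0                                  # run time on the 4 GHz processor
p_fast = 64.0                             # watts at 4 GHz
p_slow = scaled_power(p_fast, 4.0, 1.0)   # watts at 1 GHz

# Energy = power * time for each option.
energy_fast = p_fast * T1                 # one fast core
energy_slow = p_slow * 4 * T1             # one slow core takes 4x longer
energy_four = 4 * p_slow * T1             # four slow cores, ideal speedup

print(p_slow, energy_fast, energy_slow, energy_four)
```

Under these assumptions, four 1 GHz cores match the run time of the 4 GHz processor while drawing 4 watts instead of 64.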
Power Knobs
Recall the dynamic power equation: $C V^2 f a$
There are 4 factors that can change dynamic power:
1. Capacitance
2. Supply voltage
3. Clock frequency
4. Activity factor
Which of the four factors can be controlled by software?
Voltage, frequency, activity
Capacitance: it is a geometric characteristic.
Dynamic Voltage and Frequency Scaling (DVFS): some processors allow this to be controlled by software.
Activity factor: if you know there are large portions of the algorithm that do not require certain hardware, that part of the processor can be turned off.
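As one concrete illustration of software-visible DVFS, Linux systems with the cpufreq subsystem expose per-core settings as files under sysfs. The sketch below only reads them and assumes such a system; on machines without cpufreq (or on other operating systems) the helper simply returns None. The helper name `read_cpufreq` is made up for this example.

```python
from pathlib import Path
from typing import Optional

# sysfs directory for core 0 on Linux systems that provide cpufreq.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq(name: str) -> Optional[str]:
    """Return the contents of one cpufreq file, or None if unavailable."""
    try:
        return (CPUFREQ_DIR / name).read_text().strip()
    except OSError:
        # Missing directory, missing file, or no permission to read it.
        return None


print("governor:", read_cpufreq("scaling_governor"))
print("current frequency (kHz):", read_cpufreq("scaling_cur_freq"))
```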
Powerless to Choose
System A has lower average power.
Avg Power = E/T.
So power corresponds to the slope of the lines.
Exploiting DVFS
Given two systems:
System A and System B
$E_B = 2 E_A$ and $T_B = T_A$
Suppose you use DVFS to rescale B, so that its power matches A.
Will B still be faster than A? Yes.
B was rescaled to match A; call it C.
Recall that time is inversely proportional to frequency.
Since the result is > 1, system B is faster than system A.
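The rescaling argument can be sketched under a simple model: dynamic power scales as f³ (because voltage tracks frequency), time scales as 1/f, and constant power is ignored. The inputs below are hypothetical, in particular the assumption that B runs in one third of A's time; they are chosen only to show the mechanics, and `rescale_to_power` is a name invented for this example.

```python
def rescale_to_power(E, T, target_power):
    """Rescale a system's frequency so its average power hits target_power.

    Model assumptions: power ~ f^3 and time ~ 1/f.
    Returns the new run time and the new energy.
    """
    power = E / T
    freq_scale = (target_power / power) ** (1.0 / 3.0)
    new_time = T / freq_scale
    return new_time, target_power * new_time


# Hypothetical systems, in units where E_A = 1 and T_A = 1.
E_A, T_A = 1.0, 1.0
E_B, T_B = 2.0 * E_A, T_A / 3.0   # assumed: twice the energy, a third of the time

T_C, E_C = rescale_to_power(E_B, T_B, target_power=E_A / T_A)

print(T_A / T_C)   # greater than 1 means the rescaled B is still faster than A
```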
Algorithmic Energy
Time: can be reduced or hidden by overlap (parallelism)
Energy: you must pay the energy cost for every operation
$T_1$ = the time of the parallel algorithm when run on a single processor.
Work W(n) best quantifies energy: for both energy and work you must account for EVERY operation.
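Algorithmic Dynamic Power
Self-speedup best expresses dynamic power. Power is energy over time; energy is proportional to the work W, which is proportional to $T_1$, so dynamic power is proportional to $T_1 / T_P$, the self-speedup.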