OpenMP

Author: Blaise Barney, Lawrence Livermore National Laboratory
UCRL-MI-133316
Table of Contents

1. Abstract
2. Introduction
3. OpenMP Programming Model
4. OpenMP API Overview
5. Compiling OpenMP Programs
6. OpenMP Directives
   1. Directive Format
   2. C/C++ Directive Format
   3. Directive Scoping
   4. PARALLEL Construct
   5. Exercise 1
   6. Work-Sharing Constructs
      1. DO / for Directive
      2. SECTIONS Directive
      3. WORKSHARE Directive
      4. SINGLE Directive
   7. Combined Parallel Work-Sharing Constructs
   8. TASK Construct
   9. Exercise 2
   10. Synchronization Constructs
      1. MASTER Directive
      2. CRITICAL Directive
      3. BARRIER Directive
      4. TASKWAIT Directive
      5. ATOMIC Directive
      6. FLUSH Directive
      7. ORDERED Directive
   11. THREADPRIVATE Directive
   12. Data Scope Attribute Clauses
      1. PRIVATE Clause
      2. SHARED Clause
      3. DEFAULT Clause
      4. FIRSTPRIVATE Clause
      5. LASTPRIVATE Clause
      6. COPYIN Clause
      7. COPYPRIVATE Clause
      8. REDUCTION Clause
   13. Clauses / Directives Summary
   14. Directive Binding and Nesting Rules
7. Run-Time Library Routines
8. Environment Variables
9. Thread Stack Size and Thread Binding
10. Monitoring, Debugging and Performance Analysis Tools for OpenMP
11. Exercise 3
12. References and More Information
13. Appendix A: Run-Time Library Routines
Abstract

OpenMP is an Application Program Interface (API), jointly defined by a group of major computer hardware and software vendors. OpenMP provides a portable, scalable model for developers of shared memory parallel applications. The API supports C/C++ and Fortran on a wide variety of architectures. This tutorial covers most of the major features of OpenMP 3.1, including its various constructs and directives for specifying parallel regions, work sharing, synchronization and data environment. Runtime library functions and environment variables are also covered. This tutorial includes both C and Fortran example codes and a lab exercise.

Level/Prerequisites: This tutorial is ideal for those who are new to parallel programming with OpenMP. A basic understanding of parallel programming in C or Fortran is required. For those who are unfamiliar with parallel programming in general, the material covered in EC3500: Introduction to Parallel Computing would be helpful.
Introduction

What is OpenMP?

OpenMP Is:
An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.
Comprised of three primary API components:
   Compiler Directives
   Runtime Library Routines
   Environment Variables
An abbreviation for: Open Multi-Processing

OpenMP Is Not:
Meant for distributed memory parallel systems (by itself)
Necessarily implemented identically by all vendors
Guaranteed to make the most efficient use of shared memory
Required to check for data dependencies, data conflicts, race conditions, deadlocks, or code sequences that cause a program to be classified as non-conforming
Designed to handle parallel I/O. The programmer is responsible for synchronizing input and output.
Goals of OpenMP:

Standardization:
Provide a standard among a variety of shared memory architectures/platforms
Jointly defined and endorsed by a group of major computer hardware and software vendors

Lean and Mean:
Establish a simple and limited set of directives for programming shared memory machines. Significant parallelism can be implemented by using just 3 or 4 directives.
This goal is becoming less meaningful with each new release, apparently.

Ease of Use:
Provide capability to incrementally parallelize a serial program, unlike message passing libraries which typically require an all or nothing approach
Provide the capability to implement both coarse-grain and fine-grain parallelism

Portability:
The API is specified for C/C++ and Fortran
Public forum for API and membership
Most major platforms have been implemented including Unix/Linux platforms and Windows
History:
In the early 90's, vendors of shared memory machines supplied similar, directive-based Fortran programming extensions:
The user would augment a serial Fortran program with directives specifying which loops were to be parallelized.
The compiler would be responsible for automatically parallelizing such loops across the SMP processors.
Implementations were all functionally similar, but were diverging (as usual).
First attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning interest as distributed memory machines became popular.
However, not long after this, newer shared memory machine architectures started to become prevalent, and interest resumed.
The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off.
Led by the OpenMP Architecture Review Board (ARB). Original ARB members and contributors are shown below. (Disclaimer: all partner names derived from the OpenMP web site.)
ARB Members:
   Compaq / Digital
   Hewlett-Packard Company
   Intel Corporation
   International Business Machines (IBM)
   Kuck & Associates, Inc. (KAI)
   Silicon Graphics, Inc.
   Sun Microsystems, Inc.
   U.S. Department of Energy ASCI program

Endorsing Application Developers:
   ADINA R&D, Inc.
   ANSYS, Inc.
   Dash Associates
   Fluent, Inc.
   ILOG CPLEX Division
   Livermore Software Technology Corporation (LSTC)
   MECALOG SARL
   Oxford Molecular Group PLC
   The Numerical Algorithms Group Ltd. (NAG)

Endorsing Software Vendors:
   Absoft Corporation
   Edinburgh Portable Compilers
   GENIAS Software GmBH
   Myrias Computer Technologies, Inc.
   The Portland Group, Inc. (PGI)

For more news and membership information about the OpenMP ARB, visit: openmp.org/wp/about-openmp.
Release History

OpenMP continues to evolve - new constructs and features are added with each release.
Initially, the API specifications were released separately for C and Fortran. Since 2005, they have been released together.
The table below chronicles the OpenMP API release history.

   Date       Version
   Oct 1997   Fortran 1.0
   Oct 1998   C/C++ 1.0
   Nov 1999   Fortran 1.1
   Nov 2000   Fortran 2.0
   Mar 2002   C/C++ 2.0
   May 2005   OpenMP 2.5
   May 2008   OpenMP 3.0
   Jul 2011   OpenMP 3.1
   Jul 2013   OpenMP 4.0
   Nov 2015   OpenMP 4.5

This tutorial refers to OpenMP version 3.1. Syntax and features of newer releases are not currently covered.
References:
OpenMP website: openmp.org
   API specifications, FAQ, presentations, discussions, media releases, calendar, membership application and more...
Wikipedia: en.wikipedia.org/wiki/OpenMP
OpenMP Programming Model

Shared Memory Model:
OpenMP is designed for multi-processor/core, shared memory machines. The underlying architecture can be shared memory UMA or NUMA:
   Uniform Memory Access (UMA)
   Non-Uniform Memory Access (NUMA)

Thread Based Parallelism:
OpenMP programs accomplish parallelism exclusively through the use of threads.
A thread of execution is the smallest unit of processing that can be scheduled by an operating system. The idea of a subroutine that can be scheduled to run autonomously might help explain what a thread is.
Threads exist within the resources of a single process. Without the process, they cease to exist.
Typically, the number of threads matches the number of machine processors/cores. However, the actual use of threads is up to the application.

Explicit Parallelism:
OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization.
Parallelization can be as simple as taking a serial program and inserting compiler directives...
Or as complex as inserting subroutines to set multiple levels of parallelism, locks and even nested locks.

Fork-Join Model:
OpenMP uses the fork-join model of parallel execution:
All OpenMP programs begin as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered.
FORK: the master thread then creates a team of parallel threads.
The statements in the program that are enclosed by the parallel region construct are then executed in parallel among the various team threads.
JOIN: When the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.
The number of parallel regions and the threads that comprise them are arbitrary.

Compiler Directive Based:
Most OpenMP parallelism is specified through the use of compiler directives which are embedded in C/C++ or Fortran source code.

Nested Parallelism:
The API provides for the placement of parallel regions inside other parallel regions.
Implementations may or may not support this feature.

Dynamic Threads:
The API provides for the run-time environment to dynamically alter the number of threads used to execute parallel regions. Intended to promote more efficient use of resources, if possible.
Implementations may or may not support this feature.

I/O:
OpenMP specifies nothing about parallel I/O. This is particularly important if multiple threads attempt to write/read from the same file.
If every thread conducts I/O to a different file, the issues are not as significant.
It is entirely up to the programmer to ensure that I/O is conducted correctly within the context of a multi-threaded program.

Memory Model: FLUSH Often?
OpenMP provides a "relaxed consistency" and "temporary" view of thread memory (in their words). In other words, threads can "cache" their data and are not required to maintain exact consistency with real memory all of the time.
When it is critical that all threads view a shared variable identically, the programmer is responsible for ensuring that the variable is FLUSHed by all threads as needed.
More on this later...
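To make the relaxed-consistency idea concrete, here is a minimal C sketch (not from the tutorial; the function names and the value 42 are invented) of the common publish-through-a-flag pattern. The paired flushes order the producer's writes so that a consumer which flushes before reading, and sees the flag set, also sees the payload:

```c
static int data = 0;   /* the payload                  */
static int flag = 0;   /* "payload is ready" indicator */

/* Producer side: write the payload, flush it, then raise the flag. */
void publish(void) {
    data = 42;
    #pragma omp flush(data)   /* payload visible before the flag */
    flag = 1;
    #pragma omp flush(flag)
}

/* Consumer side: refresh this thread's view of memory, then read. */
int read_after_publish(void) {
    #pragma omp flush(flag, data)
    return flag ? data : -1;
}
```

Calling publish() then read_after_publish() from one thread trivially returns 42; the flushes matter when the two run on different threads of a parallel region.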
OpenMP API Overview

Three Components:
The OpenMP API is comprised of three distinct components:
   Compiler Directives (44)
   Runtime Library Routines (35)
   Environment Variables (13)
The application developer decides how to employ these components. In the simplest case, only a few of them are needed.
Implementations differ in their support of all API components. For example, an implementation may state that it supports nested parallelism, but the API makes it clear that support may be limited to a single thread - the master thread. Not exactly what the developer might expect.

Compiler Directives:
Compiler directives appear as comments in your source code and are ignored by compilers unless you tell them otherwise - usually by specifying the appropriate compiler flag, as discussed in the Compiling section later.
OpenMP compiler directives are used for various purposes:
   Spawning a parallel region
   Dividing blocks of code among threads
   Distributing loop iterations between threads
   Serializing sections of code
   Synchronization of work among threads
Compiler directives have the following syntax:

   sentinel  directive-name  [clause, ...]

For example:

   Fortran:   !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)
   C/C++:     #pragma omp parallel default(shared) private(beta,pi)

Compiler directives are covered in detail later.
Run-time Library Routines:
The OpenMP API includes an ever-growing number of run-time library routines.
These routines are used for a variety of purposes:
   Setting and querying the number of threads
   Querying a thread's unique identifier (thread ID), a thread's ancestor's identifier, the thread team size
   Setting and querying the dynamic threads feature
   Querying if in a parallel region, and at what level
   Setting and querying nested parallelism
   Setting, initializing and terminating locks and nested locks
   Querying wall clock time and resolution
For C/C++, all of the run-time library routines are actual subroutines. For Fortran, some are actually functions, and some are subroutines. For example:

   Fortran:   INTEGER FUNCTION OMP_GET_NUM_THREADS()
   C/C++:     #include <omp.h>
              int omp_get_num_threads(void)

Environment Variables:
OpenMP environment variables control the execution of parallel code at run time. For example, to set the number of threads:

   sh/bash:   export OMP_NUM_THREADS=8

OpenMP environment variables are discussed in the Environment Variables section later.
Example OpenMP Code Structure:

Fortran General Code Structure:

          PROGRAM HELLO

          INTEGER VAR1, VAR2, VAR3

          Serial code
                .
                .
                .

          Beginning of parallel region. Fork a team of threads.
          Specify variable scoping

    !$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)

          Parallel region executed by all threads
                .
          Other OpenMP directives
                .
          Run-time Library calls
                .
          All threads join master thread and disband

    !$OMP END PARALLEL

          Resume serial code
                .
                .
                .

          END
C/C++ General Code Structure:

    #include <omp.h>

    main ()  {

      int var1, var2, var3;

      Serial code
            .
            .
            .

      Beginning of parallel region. Fork a team of threads.
      Specify variable scoping

      #pragma omp parallel private(var1, var2) shared(var3)
      {

        Parallel region executed by all threads
              .
        Other OpenMP directives
              .
        Run-time Library calls
              .
        All threads join master thread and disband

      }

      Resume serial code
            .
            .
            .

    }
Compiling OpenMP Programs

LC OpenMP Implementations:
As of June 2016, the documentation sources for LC's default compilers claim the following OpenMP support:

   Compiler                            Version   Supports
   Intel C/C++, Fortran                14.0.3    OpenMP 3.1
   GNU C/C++, Fortran                  4.4.7     OpenMP 3.0
   PGI C/C++, Fortran                  8.0.1     OpenMP 3.0
   IBM BlueGene C/C++                  12.1      OpenMP 3.1
   IBM BlueGene Fortran                14.1      OpenMP 3.1
   IBM BlueGene GNU C/C++, Fortran     4.4.7     OpenMP 3.0

OpenMP 4.0 Support (according to vendor and openmp.org documentation):
GNU: supported in 4.9 for C/C++ and 4.9.1 for Fortran
Intel: 14.0 has "some" support; 15.0 supports "most features"; version 16 supported
PGI: not currently available
IBM BG/Q: not currently available

OpenMP 4.5 Support:
Not currently supported on any of LC's production cluster compilers.
Supported in a beta version of the Clang compiler on the non-production rzmist and rzhasgpu clusters (June 2016).

To view all LC compiler versions, use the command "use -l compilers" to view compiler packages by version.
To view LC's default compiler versions see: https://computing.llnl.gov/?set=code&page=compilers
Best place to view OpenMP support by a range of compilers: http://openmp.org/wp/openmp-compilers/.
Compiling:
All of LC's compilers require you to use the appropriate compiler flag to "turn on" OpenMP compilations. The table below shows what to use for each compiler.

   Compiler / Platform                   Compiler                     Flag
   Intel   (Linux Opteron/Xeon)          icc, icpc, ifort             -openmp
   PGI     (Linux Opteron/Xeon)          pgcc, pgCC, pgf77, pgf90     -mp
   GNU     (Linux Opteron/Xeon,          gcc, g++, g77, gfortran      -fopenmp
            IBM Blue Gene)
   IBM     (Blue Gene)                   bgxlc_r, bgcc_r              -qsmp=omp
                                         bgxlC_r, bgxlc++_r
                                         bgxlc89_r, bgxlc99_r
                                         bgxlf_r, bgxlf90_r
                                         bgxlf95_r, bgxlf2003_r

   * Be sure to use a thread-safe IBM compiler - its name ends with _r

Compiler Documentation:
IBM BlueGene: www-01.ibm.com/software/awdtools/fortran/ and www-01.ibm.com/software/awdtools/xlcpp
Intel: www.intel.com/software/products/compilers/
PGI: www.pgroup.com
GNU: gnu.org
All: See the relevant man pages and any files that might relate in /usr/local/docs
OpenMP Directives

Fortran Directives Format

Format: (case insensitive)

   sentinel  directive-name  [clause ...]

sentinel: All Fortran OpenMP directives must begin with a sentinel. The accepted sentinels depend upon the type of Fortran source. Possible sentinels are: !$OMP, C$OMP, *$OMP

directive-name: A valid OpenMP directive. Must appear after the sentinel and before any clauses.

[clause ...]: Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

Example:

   !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)

Fixed Form Source:
!$OMP, C$OMP and *$OMP are accepted sentinels and must start in column 1.
All Fortran fixed form rules for line length, white space, continuation and comment columns apply for the entire directive line.
Initial directive lines must have a space/zero in column 6.
Continuation lines must have a non-space/zero in column 6.

Free Form Source:
!$OMP is the only accepted sentinel. Can appear in any column, but must be preceded by white space only.
All Fortran free form rules for line length, white space, continuation and comment columns apply for the entire directive line.
Initial directive lines must have a space after the sentinel.
Continuation lines must have an ampersand as the last non-blank character in a line. The following line must begin with a sentinel and then the continuation directives.

General Rules:
Comments can not appear on the same line as a directive.
Only one directive-name may be specified per directive.
Fortran compilers which are OpenMP enabled generally include a command line option which instructs the compiler to activate and interpret all OpenMP directives.
Several Fortran OpenMP directives come in pairs and have the form shown below. The "end" directive is optional but advised for readability.

   !$OMP  directive

       [ structured block of code ]

   !$OMP  end  directive
OpenMP Directives

C/C++ Directives Format

Format:

   #pragma omp  directive-name  [clause, ...]  newline

#pragma omp: Required for all OpenMP C/C++ directives.

directive-name: A valid OpenMP directive. Must appear after the pragma and before any clauses.

[clause, ...]: Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

newline: Required. Precedes the structured block which is enclosed by this directive.

Example:

   #pragma omp parallel default(shared) private(beta,pi)

General Rules:
Case sensitive.
Directives follow conventions of the C/C++ standards for compiler directives.
Only one directive-name may be specified per directive.
Each directive applies to at most one succeeding statement, which must be a structured block.
Long directive lines can be "continued" on succeeding lines by escaping the newline character with a backslash ("\") at the end of a directive line.
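The continuation rule can be sketched as follows (the function and variable names are invented for illustration): the directive below is one logical directive split across three physical lines, each line except the last ending with a backslash.

```c
/* Directive continuation: one logical directive on three source lines. */
int continued_directive_demo(void) {
    int a = 0, b = 0, c = 0;

    #pragma omp parallel default(shared) \
                         private(a, b)   \
                         num_threads(2)
    {
        a = 1;               /* each thread sets its private copies */
        b = 2;
        #pragma omp single   /* one thread records the result       */
        c = a + b;
    }
    return c;   /* 3, whether compiled with OpenMP or not */
}
```

Without an OpenMP flag the pragmas are ignored and the body runs serially, so the function returns the same value either way.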
OpenMP Directives

Directive Scoping

Do we do this now... or do it later? Oh well, let's get it over with early...

Static (Lexical) Extent:
The code textually enclosed between the beginning and the end of a structured block following a directive.
The static extent of a directive does not span multiple routines or code files.

Orphaned Directive:
An OpenMP directive that appears independently from another enclosing directive is said to be an orphaned directive. It exists outside of another directive's static (lexical) extent.
Will span routines and possibly code files.

Dynamic Extent:
The dynamic extent of a directive includes both its static (lexical) extent and the extents of its orphaned directives.

Example:

          PROGRAM TEST
          ...
    !$OMP PARALLEL
          ...
    !$OMP DO
          DO I=...
          ...
          CALL SUB1
          ...
          ENDDO
          ...
          CALL SUB2
          ...
    !$OMP END PARALLEL

          SUBROUTINE SUB1
          ...
    !$OMP CRITICAL
          ...
    !$OMP END CRITICAL
          END

          SUBROUTINE SUB2
          ...
    !$OMP SECTIONS
          ...
    !$OMP END SECTIONS
          ...
          END

STATIC EXTENT: The DO directive occurs within an enclosing parallel region.
ORPHANED DIRECTIVES: The CRITICAL and SECTIONS directives occur outside an enclosing parallel region.
DYNAMIC EXTENT: The CRITICAL and SECTIONS directives occur within the dynamic extent of the DO and PARALLEL directives.

Why Is This Important?
OpenMP specifies a number of scoping rules on how directives may associate (bind) and nest within each other.
Illegal and/or incorrect programs may result if the OpenMP binding and nesting rules are ignored.
See Directive Binding and Nesting Rules for specific details.
OpenMP Directives

PARALLEL Region Construct

Purpose:
A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.

Format:

   Fortran:

   !$OMP PARALLEL [clause ...]
                  IF (scalar_logical_expression)
                  PRIVATE (list)
                  SHARED (list)
                  DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)
                  FIRSTPRIVATE (list)
                  REDUCTION (operator: list)
                  COPYIN (list)
                  NUM_THREADS (scalar-integer-expression)

      block

   !$OMP END PARALLEL

   C/C++:

   #pragma omp parallel [clause ...]  newline
                        if (scalar_expression)
                        private (list)
                        shared (list)
                        default (shared | none)
                        firstprivate (list)
                        reduction (operator: list)
                        copyin (list)
                        num_threads (integer-expression)

      structured_block
Notes:
When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team. The master is a member of that team and has thread number 0 within that team.
Starting from the beginning of this parallel region, the code is duplicated and all threads will execute that code.
There is an implied barrier at the end of a parallel region. Only the master thread continues execution past this point.
If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.
How Many Threads?
The number of threads in a parallel region is determined by the following factors, in order of precedence:
1. Evaluation of the IF clause
2. Setting of the NUM_THREADS clause
3. Use of the omp_set_num_threads() library function
4. Setting of the OMP_NUM_THREADS environment variable
5. Implementation default - usually the number of CPUs on a node, though it could be dynamic (see next bullet).
Threads are numbered from 0 (master thread) to N-1.
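The precedence order can be sketched in C (the function name is invented; serial stubs are supplied so the sketch also compiles without OpenMP): the NUM_THREADS clause on the directive overrides the earlier omp_set_num_threads() call, which in turn would override the OMP_NUM_THREADS environment variable.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int  omp_get_num_threads(void)  { return 1; }   /* serial stubs */
static void omp_set_num_threads(int n) { (void)n; }
#endif

int team_size_with_clause(void) {
    int team = 0;

    omp_set_num_threads(8);   /* lower-precedence request: 8 threads */

    #pragma omp parallel num_threads(3)   /* clause wins: team of 3   */
    {
        #pragma omp single
        team = omp_get_num_threads();   /* 3 with OpenMP, 1 serially */
    }
    return team;
}
```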
Dynamic Threads:
Use the omp_get_dynamic() library function to determine if dynamic threads are enabled.
If supported, the two methods available for enabling dynamic threads are:
1. The omp_set_dynamic() library routine
2. Setting of the OMP_DYNAMIC environment variable to TRUE
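A minimal query-and-enable sketch in C (function name invented; whether the request takes effect is implementation dependent, so the code only reports what it actually got):

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
static int  omp_get_dynamic(void)  { return 0; }   /* serial stubs */
static void omp_set_dynamic(int d) { (void)d; }
#endif

int report_dynamic(void) {
    omp_set_dynamic(1);            /* ask for dynamic thread adjustment */
    int on = omp_get_dynamic();    /* then check whether it is enabled  */
    printf("dynamic threads: %s\n", on ? "enabled" : "disabled");
    return on;
}
```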
Nested Parallel Regions:
Use the omp_get_nested() library function to determine if nested parallel regions are enabled.
The two methods available for enabling nested parallel regions (if supported) are:
1. The omp_set_nested() library routine
2. Setting of the OMP_NESTED environment variable to TRUE
If not supported, a parallel region nested within another parallel region results in the creation of a new team, consisting of one thread, by default.
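A C sketch of enabling and checking nesting (the function name is invented; serial stubs are supplied). Only the outer master spawns the inner region; if nesting is unsupported the inner team falls back to a single thread:

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int  omp_get_num_threads(void) { return 1; }   /* serial stubs */
static void omp_set_nested(int n)     { (void)n; }
#endif

int inner_team_size(void) {
    int inner = 0;

    omp_set_nested(1);   /* request support for nested regions */

    #pragma omp parallel num_threads(2)      /* outer team */
    {
        #pragma omp master                   /* only one inner region */
        {
            #pragma omp parallel num_threads(2)   /* nested region */
            {
                #pragma omp single
                inner = omp_get_num_threads();    /* 2 if nested, else 1 */
            }
        }
    }
    return inner;
}
```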
Clauses:
IF clause: If present, it must evaluate to .TRUE. (Fortran) or non-zero (C/C++) in order for a team of threads to be created. Otherwise, the region is executed serially by the master thread.
The remaining clauses are described in detail later, in the Data Scope Attribute Clauses section.
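A C sketch of the IF clause (the function and threshold are invented illustrations): with a false condition the region still executes, but serially, by a team of one thread. This is commonly used to skip thread startup cost on small problem sizes.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_num_threads(void) { return 1; }   /* serial stub */
#endif

/* Parallelize only when the problem size justifies thread startup cost. */
int team_size_for(int n) {
    int team = 0;

    #pragma omp parallel if(n > 1000)   /* serial region when n is small */
    {
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;   /* always 1 when n <= 1000 */
}
```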
Restrictions:
A parallel region must be a structured block that does not span multiple routines or code files.
It is illegal to branch (goto) into or out of a parallel region.
Only a single IF clause is permitted.
Only a single NUM_THREADS clause is permitted.
A program must not depend upon the ordering of the clauses.
Example: Parallel Region

Simple "Hello World" program
Every thread executes all code enclosed in the parallel region
OpenMP library routines are used to obtain thread identifiers and total number of threads

Fortran Parallel Region Example:

           PROGRAM HELLO

           INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
         +   OMP_GET_THREAD_NUM

    !      Fork a team of threads with each thread having a private TID variable
    !$OMP PARALLEL PRIVATE(TID)

    !      Obtain and print thread id
           TID = OMP_GET_THREAD_NUM()
           PRINT *, 'Hello World from thread = ', TID

    !      Only master thread does this
           IF (TID .EQ. 0) THEN
             NTHREADS = OMP_GET_NUM_THREADS()
             PRINT *, 'Number of threads = ', NTHREADS
           END IF

    !      All threads join master thread and disband
    !$OMP END PARALLEL

           END

C/C++ Parallel Region Example:

    #include <stdio.h>
    #include <omp.h>

    main (int argc, char *argv[])  {

      int nthreads, tid;

      /* Fork a team of threads with each thread having a private tid variable */
      #pragma omp parallel private(tid)
      {

        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0)
          {
          nthreads = omp_get_num_threads();
          printf("Number of threads = %d\n", nthreads);
          }

      }  /* All threads join master thread and terminate */

    }
OpenMP Exercise 1

Getting Started

Overview:
Login to the workshop cluster using your workshop username and OTP token
Copy the exercise files to your home directory
Familiarize yourself with LC's OpenMP environment
Write a simple "Hello World" OpenMP program
Successfully compile your program
Successfully run your program
Modify the number of threads used to run your program

GO TO THE EXERCISE HERE

Approx. 20 minutes
OpenMP Directives

Work-Sharing Constructs

A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it.
Work-sharing constructs do not launch new threads.
There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of a work-sharing construct.

Types of Work-Sharing Constructs:

NOTE: The Fortran workshare construct is not shown here, but is discussed later.

DO / for - shares iterations of a loop across the team. Represents a type of "data parallelism".
SECTIONS - breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".
SINGLE - serializes a section of code.

Restrictions:
A work-sharing construct must be enclosed dynamically within a parallel region in order for the directive to execute in parallel.
Work-sharing constructs must be encountered by all members of a team or none at all.
Successive work-sharing constructs must be encountered in the same order by all members of a team.
OpenMP Directives

Work-Sharing Constructs

DO / for Directive

Purpose:
The DO / for directive specifies that the iterations of the loop immediately following it must be executed in parallel by the team. This assumes a parallel region has already been initiated, otherwise it executes in serial on a single processor.

Format:

   Fortran:

   !$OMP DO [clause ...]
            SCHEDULE (type [,chunk])
            ORDERED
            PRIVATE (list)
            FIRSTPRIVATE (list)
            LASTPRIVATE (list)
            SHARED (list)
            REDUCTION (operator | intrinsic : list)
            COLLAPSE (n)

      do_loop

   !$OMP END DO [ NOWAIT ]

   C/C++:

   #pragma omp for [clause ...]  newline
                   schedule (type [,chunk])
                   ordered
                   private (list)
                   firstprivate (list)
                   lastprivate (list)
                   shared (list)
                   reduction (operator: list)
                   collapse (n)
                   nowait

      for_loop
Clauses:
SCHEDULE: Describes how iterations of the loop are divided among the threads in the team. The default schedule is implementation dependent. For a discussion on how one type of scheduling may be more optimal than others, see http://openmp.org/forum/viewtopic.php?f=3&t=83.

   STATIC: Loop iterations are divided into pieces of size chunk and then statically assigned to threads. If chunk is not specified, the iterations are evenly (if possible) divided contiguously among the threads.

   DYNAMIC: Loop iterations are divided into pieces of size chunk, and dynamically scheduled among the threads; when a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1.

   GUIDED: Iterations are dynamically assigned to threads in blocks as threads request them until no blocks remain to be assigned. Similar to DYNAMIC except that the block size decreases each time a parcel of work is given to a thread. The size of the initial block is proportional to:

      number_of_iterations / number_of_threads

   Subsequent blocks are proportional to:

      number_of_iterations_remaining / number_of_threads

   The chunk parameter defines the minimum block size. The default chunk size is 1.

   RUNTIME: The scheduling decision is deferred until runtime by the environment variable OMP_SCHEDULE. It is illegal to specify a chunk size for this clause.

   AUTO: The scheduling decision is delegated to the compiler and/or runtime system.

NOWAIT / nowait: If specified, then threads do not synchronize at the end of the parallel loop.

ORDERED: Specifies that the iterations of the loop must be executed as they would be in a serial program.

COLLAPSE: Specifies how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.

Other clauses are described in detail later, in the Data Scope Attribute Clauses section.
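The SCHEDULE and COLLAPSE clauses can be combined on a single loop nest. A C sketch (array size and chunk are arbitrary illustrations, not from the tutorial): the two loops collapse into one 2500-iteration space handed out in dynamic chunks of 10, and the reduction keeps the sum race-free, so the result matches the serial sum.

```c
static double grid[50][50];

double collapsed_sum(void) {
    double sum = 0.0;
    int i, j;

    /* 50 x 50 nest collapsed to 2500 iterations, dealt in chunks of 10.
       With collapse(2), both loop variables are predetermined private. */
    #pragma omp parallel for collapse(2) schedule(dynamic, 10) \
                             reduction(+:sum)
    for (i = 0; i < 50; i++)
        for (j = 0; j < 50; j++) {
            grid[i][j] = i + j;
            sum += grid[i][j];
        }
    return sum;   /* 122500.0, serial or parallel */
}
```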
Restrictions:
The DO loop can not be a DO WHILE loop, or a loop without loop control. Also, the loop iteration variable must be an integer and the loop control parameters must be the same for all threads.
Program correctness must not depend upon which thread executes a particular iteration.
It is illegal to branch (goto) out of a loop associated with a DO / for directive.
The chunk size must be specified as a loop invariant integer expression, as there is no synchronization during its evaluation by different threads.
ORDERED, COLLAPSE and SCHEDULE clauses may appear once each.
See the OpenMP specification document for additional restrictions.
Example: DO / for Directive

Simple vector-add program
Arrays A, B, C, and variable N will be shared by all threads.
Variable I will be private to each thread; each thread will have its own unique copy.
The iterations of the loop will be distributed dynamically in CHUNK sized pieces.
Threads will not synchronize upon completing their individual pieces of work (NOWAIT).

Fortran DO Directive Example:

          PROGRAM VEC_ADD_DO

          INTEGER N, CHUNKSIZE, CHUNK, I
          PARAMETER (N=1000)
          PARAMETER (CHUNKSIZE=100)
          REAL A(N), B(N), C(N)

    !     Some initializations
          DO I = 1, N
            A(I) = I * 1.0
            B(I) = A(I)
          ENDDO
          CHUNK = CHUNKSIZE

    !$OMP PARALLEL SHARED(A,B,C,CHUNK) PRIVATE(I)

    !$OMP DO SCHEDULE(DYNAMIC,CHUNK)
          DO I = 1, N
             C(I) = A(I) + B(I)
          ENDDO
    !$OMP END DO NOWAIT

    !$OMP END PARALLEL

          END

C/C++ for Directive Example:

    #include <omp.h>
    #define N 1000
    #define CHUNKSIZE 100

    main (int argc, char *argv[])  {

      int i, chunk;
      float a[N], b[N], c[N];

      /* Some initializations */
      for (i=0; i < N; i++)
        a[i] = b[i] = i * 1.0;
      chunk = CHUNKSIZE;

      #pragma omp parallel shared(a,b,c,chunk) private(i)
      {

        #pragma omp for schedule(dynamic,chunk) nowait
        for (i=0; i < N; i++)
          c[i] = a[i] + b[i];

      }   /* end of parallel region */

    }
OpenMP Directives

Work-Sharing Constructs

SECTIONS Directive

Purpose:
The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.
Independent SECTION directives are nested within a SECTIONS directive. Each SECTION is executed once by a thread in the team. Different sections may be executed by different threads. It is possible for a thread to execute more than one section if it is quick enough and the implementation permits such.

Format:

   Fortran:

   !$OMP SECTIONS [clause ...]
                  PRIVATE (list)
                  FIRSTPRIVATE (list)
                  LASTPRIVATE (list)
                  REDUCTION (operator | intrinsic : list)

   !$OMP  SECTION

      block

   !$OMP  SECTION

      block

   !$OMP END SECTIONS [ NOWAIT ]

   C/C++:

   #pragma omp sections [clause ...]  newline
                        private (list)
                        firstprivate (list)
                        lastprivate (list)
                        reduction (operator: list)
                        nowait
     {

     #pragma omp section   newline

        structured_block

     #pragma omp section   newline

        structured_block

     }
Clauses:
There is an implied barrier at the end of a SECTIONS directive, unless the NOWAIT/nowait clause is used.
Clauses are described in detail later, in the Data Scope Attribute Clauses section.

Questions:
What happens if the number of threads and the number of SECTIONs are different? More threads than SECTIONs? Fewer threads than SECTIONs?
Which thread executes which SECTION?

Restrictions:
It is illegal to branch (goto) into or out of section blocks.
SECTION directives must occur within the lexical extent of an enclosing SECTIONS directive (no orphan SECTIONs).
Example:SECTIONSDirective
Simpleprogramdemonstratingthatdifferentblocksofworkwillbedonebydifferentthreads.
Fortran SECTIONS Directive Example

      PROGRAM VEC_ADD_SECTIONS

      INTEGER N, I
      PARAMETER (N=1000)
      REAL A(N), B(N), C(N), D(N)

!     Some initializations
      DO I = 1, N
        A(I) = I * 1.5
        B(I) = I + 22.35
      ENDDO

!$OMP PARALLEL SHARED(A,B,C,D), PRIVATE(I)

!$OMP SECTIONS

!$OMP SECTION
      DO I = 1, N
         C(I) = A(I) + B(I)
      ENDDO

!$OMP SECTION
      DO I = 1, N
         D(I) = A(I) * B(I)
      ENDDO

!$OMP END SECTIONS NOWAIT

!$OMP END PARALLEL

      END
C/C++ sections Directive Example

#include <omp.h>
#define N     1000

main (int argc, char *argv[]) {

int i;
float a[N], b[N], c[N], d[N];

/* Some initializations */
for (i=0; i < N; i++) {
  a[i] = i * 1.5;
  b[i] = i + 22.35;
  }

#pragma omp parallel shared(a,b,c,d) private(i)
  {

  #pragma omp sections nowait
    {

    #pragma omp section
    for (i=0; i < N; i++)
      c[i] = a[i] + b[i];

    #pragma omp section
    for (i=0; i < N; i++)
      d[i] = a[i] * b[i];

    }  /* end of sections */

  }  /* end of parallel region */

}
OpenMP Directives

Work-Sharing Constructs

WORKSHARE Directive

Purpose:

Fortran only.

The WORKSHARE directive divides the execution of the enclosed structured block into separate units of work, each of which is executed only once.

The structured block must consist of only the following:
   array assignments
   scalar assignments
   FORALL statements
   FORALL constructs
   WHERE statements
   WHERE constructs
   atomic constructs
   critical constructs
   parallel constructs

See the OpenMP API documentation for additional information, particularly for what comprises a "unit of work".
Format:

Fortran

!$OMP WORKSHARE

   structured block

!$OMP END WORKSHARE [ NOWAIT ]
Restrictions:

The construct must not contain any user-defined function calls unless the function is ELEMENTAL.

Example: WORKSHARE Directive

Simple array and scalar assignments shared by the team of threads. A unit of work would include:
   Any scalar assignment
   For array assignment statements, the assignment of each element is a unit of work
Fortran WORKSHARE Directive Example

      PROGRAM WORKSHARE

      INTEGER N, I, J
      PARAMETER (N=100)
      REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N), FIRST, LAST

!     Some initializations
      DO I = 1, N
        DO J = 1, N
          AA(J,I) = 1.0
          BB(J,I) = 1.0
        ENDDO
      ENDDO

!$OMP PARALLEL SHARED(AA,BB,CC,DD,FIRST,LAST)

!$OMP WORKSHARE
      CC = AA * BB
      DD = AA + BB
      FIRST = CC(1,1) + DD(1,1)
      LAST = CC(N,N) + DD(N,N)
!$OMP END WORKSHARE NOWAIT

!$OMP END PARALLEL

      END
OpenMP Directives

Work-Sharing Constructs

SINGLE Directive

Purpose:

The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team.

May be useful when dealing with sections of code that are not thread safe (such as I/O).
Format:

Fortran

!$OMP SINGLE [clause ...]
      PRIVATE (list)
      FIRSTPRIVATE (list)

   block

!$OMP END SINGLE [ NOWAIT ]

C/C++

#pragma omp single [clause ...]  newline
      private (list)
      firstprivate (list)
      nowait

     structured_block
Clauses:

Threads in the team that do not execute the SINGLE directive wait at the end of the enclosed code block, unless a NOWAIT/nowait clause is specified.

Clauses are described in detail later, in the Data Scope Attribute Clauses section.

Restrictions:

It is illegal to branch into or out of a SINGLE block.
OpenMP Directives

Combined Parallel Work-Sharing Constructs

OpenMP provides three directives that are merely conveniences:
   PARALLEL DO / parallel for
   PARALLEL SECTIONS
   PARALLEL WORKSHARE (Fortran only)

For the most part, these directives behave identically to an individual PARALLEL directive being immediately followed by a separate work-sharing directive.

Most of the rules, clauses and restrictions that apply to both directives are in effect. See the OpenMP API for details.

An example using the PARALLEL DO / parallel for combined directive is shown below.
Fortran PARALLEL DO Directive Example

      PROGRAM VECTOR_ADD

      INTEGER N, I, CHUNKSIZE, CHUNK
      PARAMETER (N=1000)
      PARAMETER (CHUNKSIZE=100)
      REAL A(N), B(N), C(N)

!     Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = A(I)
      ENDDO
      CHUNK = CHUNKSIZE

!$OMP PARALLEL DO
!$OMP& SHARED(A,B,C,CHUNK) PRIVATE(I)
!$OMP& SCHEDULE(STATIC,CHUNK)

      DO I = 1, N
         C(I) = A(I) + B(I)
      ENDDO

!$OMP END PARALLEL DO

      END
C/C++ parallel for Directive Example

#include <omp.h>
#define N         1000
#define CHUNKSIZE  100

main (int argc, char *argv[]) {

int i, chunk;
float a[N], b[N], c[N];

/* Some initializations */
for (i=0; i < N; i++)
  a[i] = b[i] = i * 1.0;
chunk = CHUNKSIZE;

#pragma omp parallel for \
   shared(a,b,c,chunk) private(i) \
   schedule(static,chunk)
  for (i=0; i < N; i++)
    c[i] = a[i] + b[i];

}
OpenMP Directives

TASK Construct

Purpose:

The TASK construct defines an explicit task, which may be executed by the encountering thread, or deferred for execution by any other thread in the team.

The data environment of the task is determined by the data-sharing attribute clauses.

Task execution is subject to task scheduling; see the OpenMP 3.1 specification document for details.

Also see the OpenMP 3.1 documentation for the associated taskyield and taskwait directives.
Format:

Fortran

!$OMP TASK [clause ...]
      IF (scalar logical expression)
      FINAL (scalar logical expression)
      UNTIED
      DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)
      MERGEABLE
      PRIVATE (list)
      FIRSTPRIVATE (list)
      SHARED (list)

   block

!$OMP END TASK

C/C++

#pragma omp task [clause ...]  newline
      if (scalar expression)
      final (scalar expression)
      untied
      default (shared | none)
      mergeable
      private (list)
      firstprivate (list)
      shared (list)

     structured_block
Clauses and Restrictions:

Please consult the OpenMP 3.1 specifications document for details.
OpenMP Exercise 2

Work-Sharing Constructs

Overview:

Log into the LC workshop cluster, if you are not already logged in
Work-Sharing DO/for construct examples: review, compile and run
Work-Sharing SECTIONS construct example: review, compile and run

GO TO THE EXERCISE HERE

Approx. 20 minutes
OpenMP Directives

Synchronization Constructs
Consider a simple example where two threads on two different processors are both trying to increment a variable x at the same time (assume x is initially 0):

THREAD 1:                          THREAD 2:

increment(x)                       increment(x)
{                                  {
   x = x + 1;                         x = x + 1;
}                                  }

THREAD 1:                          THREAD 2:

10  LOAD A, (x address)            10  LOAD A, (x address)
20  ADD A, 1                       20  ADD A, 1
30  STORE A, (x address)           30  STORE A, (x address)
One possible execution sequence:

1. Thread 1 loads the value of x into register A.
2. Thread 2 loads the value of x into register A.
3. Thread 1 adds 1 to register A.
4. Thread 2 adds 1 to register A.
5. Thread 1 stores register A at location x.
6. Thread 2 stores register A at location x.

The resultant value of x will be 1, not 2 as it should be.

To avoid a situation like this, the incrementing of x must be synchronized between the two threads to ensure that the correct result is produced.

OpenMP provides a variety of Synchronization Constructs that control how the execution of each thread proceeds relative to other team threads.
OpenMP Directives

Synchronization Constructs

MASTER Directive

Purpose:

The MASTER directive specifies a region that is to be executed only by the master thread of the team. All other threads on the team skip this section of code.

There is no implied barrier associated with this directive.
Format:

Fortran

!$OMP MASTER

   block

!$OMP END MASTER

C/C++

#pragma omp master  newline

   structured_block
Restrictions:

It is illegal to branch into or out of a MASTER block.
OpenMP Directives

Synchronization Constructs

CRITICAL Directive

Purpose:

The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.
Format:

Fortran

!$OMP CRITICAL [ name ]

   block

!$OMP END CRITICAL [ name ]

C/C++

#pragma omp critical [ name ]  newline

   structured_block
Notes:

If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL region and attempts to execute it, it will block until the first thread exits that CRITICAL region.

The optional name enables multiple different CRITICAL regions to exist:
   Names act as global identifiers. Different CRITICAL regions with the same name are treated as the same region.
   All CRITICAL sections which are unnamed are treated as the same section.

Restrictions:

It is illegal to branch into or out of a CRITICAL block.

Fortran only: The names of critical constructs are global entities of the program. If a name conflicts with any other entity, the behavior of the program is unspecified.

Example: CRITICAL Construct

All threads in the team will attempt to execute in parallel; however, because of the CRITICAL construct surrounding the increment of x, only one thread will be able to read/increment/write x at any time.
Fortran CRITICAL Directive Example

      PROGRAM CRITICAL

      INTEGER X
      X = 0

!$OMP PARALLEL SHARED(X)

!$OMP CRITICAL
      X = X + 1
!$OMP END CRITICAL

!$OMP END PARALLEL

      END
C/C++ critical Directive Example

#include <omp.h>

main (int argc, char *argv[]) {

int x;
x = 0;

#pragma omp parallel shared(x)
  {

  #pragma omp critical
  x = x + 1;

  }  /* end of parallel region */

}
OpenMP Directives

Synchronization Constructs

BARRIER Directive

Purpose:

The BARRIER directive synchronizes all threads in the team.

When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached that barrier. All threads then resume executing in parallel the code that follows the barrier.
Format:

Fortran

!$OMP BARRIER

C/C++

#pragma omp barrier  newline
Restrictions:

All threads in a team (or none) must execute the BARRIER region.

The sequence of work-sharing regions and barrier regions encountered must be the same for every thread in a team.
OpenMP Directives

Synchronization Constructs

TASKWAIT Directive

Purpose:

OpenMP 3.1 feature

The TASKWAIT construct specifies a wait on the completion of child tasks generated since the beginning of the current task.
Format:

Fortran

!$OMP TASKWAIT

C/C++

#pragma omp taskwait  newline
Restrictions:

Because the taskwait construct does not have a C language statement as part of its syntax, there are some restrictions on its placement within a program. The taskwait directive may be placed only at a point where a base language statement is allowed. The taskwait directive may not be used in place of the statement following an if, while, do, switch, or label. See the OpenMP 3.1 specifications document for details.
OpenMP Directives

Synchronization Constructs

ATOMIC Directive

Purpose:

The ATOMIC directive specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it. In essence, this directive provides a mini-CRITICAL section.
Format:

Fortran

!$OMP ATOMIC

   statement_expression

C/C++

#pragma omp atomic  newline

   statement_expression
Restrictions:

The directive applies only to a single, immediately following statement.

An atomic statement must follow a specific syntax. See the most recent OpenMP specs for this.
OpenMP Directives

Synchronization Constructs

FLUSH Directive

Purpose:

The FLUSH directive identifies a synchronization point at which the implementation must provide a consistent view of memory. Thread-visible variables are written back to memory at this point.

There is a fair amount of discussion on this directive within OpenMP circles that you may wish to consult for more information. Some of it is hard to understand. Per the API:

   If the intersection of the flush-sets of two flushes performed by two different threads is non-empty, then the two flushes must be completed as if in some sequential order, seen by all threads.

Say what?

To quote from the openmp.org FAQ:

Q17: Is the !$omp flush directive necessary on a cache coherent system?

A17: Yes the flush directive is necessary. Look in the OpenMP specifications for examples of its uses. The directive is necessary to instruct the compiler that the variable must be written to/read from the memory system, i.e. that the variable cannot be kept in a local CPU register over the flush "statement" in your code.

Cache coherency makes certain that if one CPU executes a read or write instruction from/to memory, then all other CPUs in the system will get the same value from that memory address when they access it. All caches will show a coherent value. However, in the OpenMP standard there must be a way to instruct the compiler to actually insert the read/write machine instruction and not postpone it. Keeping a variable in a register in a loop is very common when producing efficient machine language code for a loop.

Also see the most recent OpenMP specs for details.
Format:

Fortran

!$OMP FLUSH (list)

C/C++

#pragma omp flush (list)  newline
Notes:

The optional list contains a list of named variables that will be flushed in order to avoid flushing all variables. For pointers in the list, note that the pointer itself is flushed, not the object it points to.

Implementations must ensure any prior modifications to thread-visible variables are visible to all threads after this point; i.e. compilers must restore values from registers to memory, hardware might need to flush write buffers, etc.

The FLUSH directive is implied for the directives shown in the table below. The directive is not implied if a NOWAIT clause is present.
Fortran                          C/C++
BARRIER                          barrier
END PARALLEL                     parallel - upon entry and exit
CRITICAL and END CRITICAL        critical - upon entry and exit
END DO                           ordered - upon entry and exit
END SECTIONS                     for - upon exit
END SINGLE                       sections - upon exit
ORDERED and END ORDERED          single - upon exit
OpenMP Directives

Synchronization Constructs

ORDERED Directive

Purpose:

The ORDERED directive specifies that iterations of the enclosed loop will be executed in the same order as if they were executed on a serial processor.

Threads will need to wait before executing their chunk of iterations if previous iterations haven't completed yet.

Used within a DO/for loop with an ORDERED clause.

The ORDERED directive provides a way to "fine tune" where ordering is to be applied within a loop. Otherwise, it is not required.
Format:

Fortran

!$OMP DO ORDERED [clauses...]
   (loop region)

!$OMP ORDERED

   (block)

!$OMP END ORDERED

   (end of loop region)
!$OMP END DO

C/C++

#pragma omp for ordered [clauses...]
   (loop region)

#pragma omp ordered  newline

   structured_block

   (end of loop region)
Restrictions:

An ORDERED directive can only appear in the dynamic extent of the following directives:
   DO or PARALLEL DO (Fortran)
   for or parallel for (C/C++)

Only one thread is allowed in an ordered section at any time.

It is illegal to branch into or out of an ORDERED block.

An iteration of a loop must not execute the same ORDERED directive more than once, and it must not execute more than one ORDERED directive.

A loop which contains an ORDERED directive must be a loop with an ORDERED clause.
OpenMP Directives

THREADPRIVATE Directive

Purpose:

The THREADPRIVATE directive is used to make global file-scope variables (C/C++) or common blocks (Fortran) local and persistent to a thread through the execution of multiple parallel regions.
Format:

Fortran

!$OMP THREADPRIVATE (/cb/, ...)

   cb is the name of a common block

C/C++

#pragma omp threadprivate (list)
Notes:

The directive must appear after the declaration of listed variables/common blocks. Each thread then gets its own copy of the variable/common block, so data written by one thread is not visible to other threads. For example:
Fortran THREADPRIVATE Directive Example

      PROGRAM THREADPRIV

      INTEGER A, B, I, TID, OMP_GET_THREAD_NUM
      REAL*4 X
      COMMON /C1/ A

!$OMP THREADPRIVATE(/C1/, X)

!     Explicitly turn off dynamic threads
      CALL OMP_SET_DYNAMIC(.FALSE.)

      PRINT *, '1st Parallel Region:'
!$OMP PARALLEL PRIVATE(B, TID)
      TID = OMP_GET_THREAD_NUM()
      A = TID
      B = TID
      X = 1.1 * TID + 1.0
      PRINT *, 'Thread', TID, ':   A,B,X=', A, B, X
!$OMP END PARALLEL

      PRINT *, '************************************'
      PRINT *, 'Master thread doing serial work here'
      PRINT *, '************************************'

      PRINT *, '2nd Parallel Region: '
!$OMP PARALLEL PRIVATE(TID)
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Thread', TID, ':   A,B,X=', A, B, X
!$OMP END PARALLEL

      END
Output:

1st Parallel Region:
Thread 0 :   A,B,X= 0 0 1.000000000
Thread 1 :   A,B,X= 1 1 2.099999905
Thread 3 :   A,B,X= 3 3 4.300000191
Thread 2 :   A,B,X= 2 2 3.200000048
************************************
Master thread doing serial work here
************************************
2nd Parallel Region:
Thread 0 :   A,B,X= 0 0 1.000000000
Thread 2 :   A,B,X= 2 2 3.200000048
Thread 3 :   A,B,X= 3 3 4.300000191
Thread 1 :   A,B,X= 1 1 2.099999905
C/C++ threadprivate Directive Example

#include <omp.h>

int a, b, i, tid;
float x;

#pragma omp threadprivate(a, x)

main (int argc, char *argv[]) {

/* Explicitly turn off dynamic threads */
omp_set_dynamic(0);

printf("1st Parallel Region:\n");
#pragma omp parallel private(b, tid)
  {
  tid = omp_get_thread_num();
  a = tid;
  b = tid;
  x = 1.1 * tid + 1.0;
  printf("Thread %d:   a,b,x= %d %d %f\n", tid, a, b, x);
  }  /* end of parallel region */

printf("************************************\n");
printf("Master thread doing serial work here\n");
printf("************************************\n");

printf("2nd Parallel Region:\n");
#pragma omp parallel private(tid)
  {
  tid = omp_get_thread_num();
  printf("Thread %d:   a,b,x= %d %d %f\n", tid, a, b, x);
  }  /* end of parallel region */

}
Output:

1st Parallel Region:
Thread 0:   a,b,x= 0 0 1.000000
Thread 2:   a,b,x= 2 2 3.200000
Thread 3:   a,b,x= 3 3 4.300000
Thread 1:   a,b,x= 1 1 2.100000
************************************
Master thread doing serial work here
************************************
2nd Parallel Region:
Thread 0:   a,b,x= 0 0 1.000000
Thread 3:   a,b,x= 3 3 4.300000
Thread 1:   a,b,x= 1 1 2.100000
Thread 2:   a,b,x= 2 2 3.200000
On first entry to a parallel region, data in THREADPRIVATE variables and common blocks should be assumed undefined, unless a COPYIN clause is specified in the PARALLEL directive.

THREADPRIVATE variables differ from PRIVATE variables (discussed later) because they are able to persist between different parallel regions of a code.

Restrictions:

Data in THREADPRIVATE objects is guaranteed to persist only if the dynamic threads mechanism is "turned off" and the number of threads in different parallel regions remains constant. The default setting of dynamic threads is undefined.

The THREADPRIVATE directive must appear after every declaration of a thread private variable/common block.

Fortran: only named common blocks can be made THREADPRIVATE.
OpenMP Directives

Data Scope Attribute Clauses

Also called Data-sharing Attribute Clauses.

An important consideration for OpenMP programming is the understanding and use of data scoping.

Because OpenMP is based upon the shared memory programming model, most variables are shared by default.

Global variables include:
   Fortran: COMMON blocks, SAVE variables, MODULE variables
   C: File scope variables, static

Private variables include:
   Loop index variables
   Stack variables in subroutines called from parallel regions
   Fortran: Automatic variables within a statement block

The OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped. They include:
   PRIVATE
   FIRSTPRIVATE
   LASTPRIVATE
   SHARED
   DEFAULT
   REDUCTION
   COPYIN

Data Scope Attribute Clauses are used in conjunction with several directives (PARALLEL, DO/for, and SECTIONS) to control the scoping of enclosed variables.

These constructs provide the ability to control the data environment during execution of parallel constructs.

They define how and which data variables in the serial section of the program are transferred to the parallel regions of the program (and back).

They define which variables will be visible to all threads in the parallel regions and which variables will be privately allocated to all threads.

Data Scope Attribute Clauses are effective only within their lexical/static extent.

Important: Please consult the latest OpenMP specs for important details and discussion on this topic.

A Clauses / Directives Summary Table is provided for convenience.
PRIVATE Clause

Purpose:

The PRIVATE clause declares variables in its list to be private to each thread.

Format:

Fortran

PRIVATE (list)

C/C++

private (list)
Notes:

PRIVATE variables behave as follows:
   A new object of the same type is declared once for each thread in the team
   All references to the original object are replaced with references to the new object
   Variables declared PRIVATE should be assumed to be uninitialized for each thread

Comparison between PRIVATE and THREADPRIVATE:

                  PRIVATE                              THREADPRIVATE

Data Item         C/C++: variable                      C/C++: variable
                  Fortran: variable or common block    Fortran: common block

Where Declared    At start of region or work-sharing   In declarations of each routine using
                  group                                block or global file scope

Persistent?       No                                   Yes

Extent            Lexical only - unless passed as an   Dynamic
                  argument to subroutine

Initialized       Use FIRSTPRIVATE                     Use COPYIN
SHARED Clause

Purpose:

The SHARED clause declares variables in its list to be shared among all threads in the team.

Format:

Fortran

SHARED (list)

C/C++

shared (list)

Notes:

A shared variable exists in only one memory location and all threads can read or write to that address.

It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such as via CRITICAL sections).
DEFAULT Clause

Purpose:

The DEFAULT clause allows the user to specify a default scope for all variables in the lexical extent of any parallel region.

Format:

Fortran

DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)

C/C++

default (shared | none)

Notes:

Specific variables can be exempted from the default using the PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses.

The C/C++ OpenMP specification does not include private or firstprivate as a possible default. However, actual implementations may provide this option.

Using NONE as a default requires that the programmer explicitly scope all variables.

Restrictions:

Only one DEFAULT clause can be specified on a PARALLEL directive.
FIRSTPRIVATE Clause

Purpose:

The FIRSTPRIVATE clause combines the behavior of the PRIVATE clause with automatic initialization of the variables in its list.

Format:

Fortran

FIRSTPRIVATE (list)

C/C++

firstprivate (list)

Notes:

Listed variables are initialized according to the value of their original objects prior to entry into the parallel or work-sharing construct.
LASTPRIVATE Clause

Purpose:

The LASTPRIVATE clause combines the behavior of the PRIVATE clause with a copy from the last loop iteration or section to the original variable object.

Format:

Fortran

LASTPRIVATE (list)

C/C++

lastprivate (list)

Notes:

The value copied back into the original variable object is obtained from the last (sequentially) iteration or section of the enclosing construct.

For example, the team member which executes the final iteration for a DO section, or the team member which does the last SECTION of a SECTIONS context, performs the copy with its own values.
COPYIN Clause

Purpose:

The COPYIN clause provides a means for assigning the same value to THREADPRIVATE variables for all threads in the team.

Format:

Fortran

COPYIN (list)

C/C++

copyin (list)

Notes:

List contains the names of variables to copy. In Fortran, the list can contain both the names of common blocks and named variables.

The master thread variable is used as the copy source. The team threads are initialized with its value upon entry into the parallel construct.
COPYPRIVATE Clause

Purpose:

The COPYPRIVATE clause can be used to broadcast values acquired by a single thread directly to all instances of the private variables in the other threads.

Associated with the SINGLE directive.

See the most recent OpenMP specs document for additional discussion and examples.

Format:

Fortran

COPYPRIVATE (list)

C/C++

copyprivate (list)
REDUCTION Clause

Purpose:

The REDUCTION clause performs a reduction on the variables that appear in its list.

A private copy for each list variable is created for each thread. At the end of the reduction, the reduction operation is applied to all private copies of the shared variable, and the final result is written to the global shared variable.

Format:

Fortran

REDUCTION (operator | intrinsic : list)

C/C++

reduction (operator: list)

Example: REDUCTION Vector Dot Product:

Iterations of the parallel loop will be distributed in equal-sized blocks to each thread in the team (SCHEDULE STATIC).

At the end of the parallel loop construct, all threads will add their values of "result" to update the master thread's global copy.
Fortran REDUCTION Clause Example

      PROGRAM DOT_PRODUCT

      INTEGER N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=100)
      PARAMETER (CHUNKSIZE=10)
      REAL A(N), B(N), RESULT

!     Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = I * 2.0
      ENDDO
      RESULT = 0.0
      CHUNK = CHUNKSIZE

!$OMP PARALLEL DO
!$OMP& DEFAULT(SHARED) PRIVATE(I)
!$OMP& SCHEDULE(STATIC,CHUNK)
!$OMP& REDUCTION(+:RESULT)

      DO I = 1, N
        RESULT = RESULT + (A(I) * B(I))
      ENDDO

!$OMP END PARALLEL DO

      PRINT *, 'Final Result= ', RESULT
      END
C/C++ reduction Clause Example

#include <omp.h>

main (int argc, char *argv[]) {

int i, n, chunk;
float a[100], b[100], result;

/* Some initializations */
n = 100;
chunk = 10;
result = 0.0;
for (i=0; i < n; i++) {
  a[i] = i * 1.0;
  b[i] = i * 2.0;
  }

#pragma omp parallel for      \
  default(shared) private(i)  \
  schedule(static,chunk)      \
  reduction(+:result)

  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

printf("Final result= %f\n", result);

}
Restrictions:

Variables in the list must be named scalar variables. They cannot be array or structure type variables. They must also be declared SHARED in the enclosing context.

Reduction operations may not be associative for real numbers.

The REDUCTION clause is intended to be used on a region or work-sharing construct in which the reduction variable is used only in statements which have one of the following forms:
Fortran

   x = x operator expr
   x = expr operator x (except subtraction)
   x = intrinsic(x, expr)
   x = intrinsic(expr, x)

   x is a scalar variable in the list
   expr is a scalar expression that does not reference x
   intrinsic is one of MAX, MIN, IAND, IOR, IEOR
   operator is one of +, *, -, .AND., .OR., .EQV., .NEQV.

C/C++

   x = x op expr
   x = expr op x (except subtraction)
   x binop = expr
   x++
   ++x
   x--
   --x

   x is a scalar variable in the list
   expr is a scalar expression that does not reference x
   op is not overloaded, and is one of +, *, -, /, &, ^, |, &&, ||
   binop is not overloaded, and is one of +, *, -, /, &, ^, |
OpenMP Directives

Clauses / Directives Summary

The table below summarizes which clauses are accepted by which OpenMP directives.

Clause          PARALLEL   DO/for   SECTIONS   SINGLE   PARALLEL DO/for   PARALLEL SECTIONS
IF                 x                                           x                  x
PRIVATE            x          x         x         x            x                  x
SHARED             x                                           x                  x
DEFAULT            x                                           x                  x
FIRSTPRIVATE       x          x         x         x            x                  x
LASTPRIVATE                   x         x                      x                  x
REDUCTION          x          x         x                      x                  x
COPYIN             x                                           x                  x
COPYPRIVATE                                       x
SCHEDULE                      x                                x
ORDERED                       x                                x
NOWAIT                        x         x         x

The following OpenMP directives do not accept clauses:
   MASTER
   CRITICAL
   BARRIER
   ATOMIC
   FLUSH
   ORDERED
   THREADPRIVATE

Implementations may (and do) differ from the standard in which clauses are supported by each directive.
OpenMP Directives

Directive Binding and Nesting Rules

This section is provided mainly as a quick reference on rules which govern OpenMP directives and binding. Users should consult their implementation documentation and the OpenMP standard for other rules and restrictions.

Unless indicated otherwise, rules apply to both Fortran and C/C++ OpenMP implementations.

Note: the Fortran API also defines a number of Data Environment rules. Those have not been reproduced here.

Directive Binding:

The DO/for, SECTIONS, SINGLE, MASTER and BARRIER directives bind to the dynamically enclosing PARALLEL, if one exists. If no parallel region is currently being executed, the directives have no effect.

The ORDERED directive binds to the dynamically enclosing DO/for.

The ATOMIC directive enforces exclusive access with respect to ATOMIC directives in all threads, not just the current team.

The CRITICAL directive enforces exclusive access with respect to CRITICAL directives in all threads, not just the current team.

A directive can never bind to any directive outside the closest enclosing PARALLEL.

Directive Nesting:

A work-sharing region may not be closely nested inside a work-sharing, explicit task, critical, ordered, atomic, or master region.

A barrier region may not be closely nested inside a work-sharing, explicit task, critical, ordered, atomic, or master region.

A master region may not be closely nested inside a work-sharing, atomic, or explicit task region.

An ordered region may not be closely nested inside a critical, atomic, or explicit task region.

An ordered region must be closely nested inside a loop region (or parallel loop region) with an ordered clause.

A critical region may not be nested (closely or otherwise) inside a critical region with the same name. Note that this restriction is not sufficient to prevent deadlock.

parallel, flush, critical, atomic, taskyield, and explicit task regions may not be closely nested inside an atomic region.
Run-Time Library Routines

Overview:

The OpenMP API includes an ever-growing number of run-time library routines.

These routines are used for a variety of purposes as shown in the table below:
Routine                          Purpose

OMP_SET_NUM_THREADS              Sets the number of threads that will be used in the next parallel region

OMP_GET_NUM_THREADS              Returns the number of threads that are currently in the team executing the parallel region from which it is called

OMP_GET_MAX_THREADS              Returns the maximum value that can be returned by a call to the OMP_GET_NUM_THREADS function

OMP_GET_THREAD_NUM               Returns the thread number of the thread, within the team, making this call

OMP_GET_THREAD_LIMIT             Returns the maximum number of OpenMP threads available to a program

OMP_GET_NUM_PROCS                Returns the number of processors that are available to the program

OMP_IN_PARALLEL                  Used to determine if the section of code which is executing is parallel or not

OMP_SET_DYNAMIC                  Enables or disables dynamic adjustment (by the run-time system) of the number of threads available for execution of parallel regions

OMP_GET_DYNAMIC                  Used to determine if dynamic thread adjustment is enabled or not

OMP_SET_NESTED                   Used to enable or disable nested parallelism

OMP_GET_NESTED                   Used to determine if nested parallelism is enabled or not

OMP_SET_SCHEDULE                 Sets the loop scheduling policy when "runtime" is used as the schedule kind in the OpenMP directive

OMP_GET_SCHEDULE                 Returns the loop scheduling policy when "runtime" is used as the schedule kind in the OpenMP directive

OMP_SET_MAX_ACTIVE_LEVELS        Sets the maximum number of nested parallel regions

OMP_GET_MAX_ACTIVE_LEVELS        Returns the maximum number of nested parallel regions

OMP_GET_LEVEL                    Returns the current level of nested parallel regions

OMP_GET_ANCESTOR_THREAD_NUM      Returns, for a given nested level of the current thread, the thread number of the ancestor thread

OMP_GET_TEAM_SIZE                Returns, for a given nested level of the current thread, the size of the thread team

OMP_GET_ACTIVE_LEVEL             Returns the number of nested, active parallel regions enclosing the task that contains the call

OMP_IN_FINAL                     Returns true if the routine is executed in the final task region; otherwise it returns false

OMP_INIT_LOCK                    Initializes a lock associated with the lock variable

OMP_DESTROY_LOCK                 Disassociates the given lock variable from any locks

OMP_SET_LOCK                     Acquires ownership of a lock

OMP_UNSET_LOCK                   Releases a lock

OMP_TEST_LOCK                    Attempts to set a lock, but does not block if the lock is unavailable

OMP_INIT_NEST_LOCK               Initializes a nested lock associated with the lock variable

OMP_DESTROY_NEST_LOCK            Disassociates the given nested lock variable from any locks

OMP_SET_NEST_LOCK                Acquires ownership of a nested lock

OMP_UNSET_NEST_LOCK              Releases a nested lock

OMP_TEST_NEST_LOCK               Attempts to set a nested lock, but does not block if the lock is unavailable

OMP_GET_WTIME                    Provides a portable wall clock timing routine

OMP_GET_WTICK                    Returns a double-precision floating point value equal to the number of seconds between successive clock ticks
For C/C++, all of the run-time library routines are actual functions. For Fortran, some are functions and some are subroutines. For example:

Fortran: INTEGER FUNCTION OMP_GET_NUM_THREADS()

C/C++:   #include <omp.h>
         int omp_get_num_threads(void)
Environment Variables

OpenMP provides the following environment variables for controlling the execution of parallel code.
All environment variable names are uppercase. The values assigned to them are not case sensitive.
OMP_SCHEDULE

Applies only to DO, PARALLEL DO (Fortran) and for, parallel for (C/C++) directives which have their schedule clause set to RUNTIME. The value of this variable determines how iterations of the loop are scheduled on processors. For example:

    setenv OMP_SCHEDULE "guided, 4"
    setenv OMP_SCHEDULE "dynamic"
OMP_NUM_THREADS

Sets the maximum number of threads to use during execution. For example:

    setenv OMP_NUM_THREADS 8

OMP_DYNAMIC

Enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. Valid values are TRUE or FALSE. For example:

    setenv OMP_DYNAMIC TRUE

Implementation notes:
Your implementation may or may not support this feature.

OMP_PROC_BIND

Enables or disables threads binding to processors. Valid values are TRUE or FALSE. For example:

    setenv OMP_PROC_BIND TRUE

Implementation notes:
Your implementation may or may not support this feature.

OMP_NESTED

Enables or disables nested parallelism. Valid values are TRUE or FALSE. For example:

    setenv OMP_NESTED TRUE

Implementation notes:
Your implementation may or may not support this feature. If nested parallelism is supported, it is often only nominal, in that a nested parallel region may only have one thread.
OMP_STACKSIZE

Controls the size of the stack for created (non-master) threads. Examples:

    setenv OMP_STACKSIZE 2000500B
    setenv OMP_STACKSIZE "3000 k "
    setenv OMP_STACKSIZE 10M
    setenv OMP_STACKSIZE " 10 M "
    setenv OMP_STACKSIZE "20 m "
    setenv OMP_STACKSIZE " 1G"
    setenv OMP_STACKSIZE 20000

Implementation notes:
Your implementation may or may not support this feature.
OMP_WAIT_POLICY

Provides a hint to an OpenMP implementation about the desired behavior of waiting threads. A compliant OpenMP implementation may or may not abide by the setting of the environment variable. Valid values are ACTIVE and PASSIVE. ACTIVE specifies that waiting threads should mostly be active, i.e., consume processor cycles, while waiting. PASSIVE specifies that waiting threads should mostly be passive, i.e., not consume processor cycles, while waiting. The details of the ACTIVE and PASSIVE behaviors are implementation defined. Examples:

    setenv OMP_WAIT_POLICY ACTIVE
    setenv OMP_WAIT_POLICY active
    setenv OMP_WAIT_POLICY PASSIVE
    setenv OMP_WAIT_POLICY passive

Implementation notes:
Your implementation may or may not support this feature.

OMP_MAX_ACTIVE_LEVELS

Controls the maximum number of nested active parallel regions. The value of this environment variable must be a non-negative integer. The behavior of the program is implementation defined if the requested value of OMP_MAX_ACTIVE_LEVELS is greater than the maximum number of nested active parallel levels an implementation can support, or if the value is not a non-negative integer. Example:

    setenv OMP_MAX_ACTIVE_LEVELS 2

Implementation notes:
Your implementation may or may not support this feature.

OMP_THREAD_LIMIT

Sets the number of OpenMP threads to use for the whole OpenMP program. The value of this environment variable must be a positive integer. The behavior of the program is implementation defined if the requested value of OMP_THREAD_LIMIT is greater than the number of threads an implementation can support, or if the value is not a positive integer. Example:

    setenv OMP_THREAD_LIMIT 8

Implementation notes:
Your implementation may or may not support this feature.
Thread Stack Size and Thread Binding

Thread Stack Size:

The OpenMP standard does not specify how much stack space a thread should have. Consequently, implementations will differ in the default thread stack size.
The default thread stack size can be easy to exhaust. It can also be non-portable between compilers. Using past versions of LC compilers as an example:

Compiler              Approx. Stack Limit   Approx. Array Size (doubles)
Linux icc, ifort      4 MB                  700 x 700
Linux pgcc, pgf90     8 MB                  1000 x 1000
Linux gcc, gfortran   2 MB                  500 x 500
Threads that exceed their stack allocation may or may not seg fault. An application may continue to run while data is being corrupted.
Statically linked codes may be subject to further stack restrictions.
A user's login shell may also restrict stack size.
If your OpenMP environment supports the OpenMP 3.0 OMP_STACKSIZE environment variable (covered in the previous section), you can use it to set the thread stack size prior to program execution. For example:

    setenv OMP_STACKSIZE 2000500B
    setenv OMP_STACKSIZE "3000 k "
    setenv OMP_STACKSIZE 10M
    setenv OMP_STACKSIZE " 10 M "
    setenv OMP_STACKSIZE "20 m "
    setenv OMP_STACKSIZE " 1G"
    setenv OMP_STACKSIZE 20000
Otherwise, at LC, you should be able to use the method below for Linux clusters. The example shows setting the thread stack size to 12 MB and, as a precaution, setting the shell stack size to unlimited.

csh/tcsh:
    setenv KMP_STACKSIZE 12000000
    limit stacksize unlimited

ksh/sh/bash:
    export KMP_STACKSIZE=12000000
    ulimit -s unlimited
Thread Binding:

In some cases, a program will perform better if its threads are bound to processors/cores.
"Binding" a thread to a processor means that the operating system will schedule the thread to always run on the same processor. Otherwise, threads can be scheduled to execute on any processor and "bounce" back and forth between processors with each time slice.
Also called "thread affinity" or "processor affinity".
Binding threads to processors can result in better cache utilization, thereby reducing costly memory accesses. This is the primary motivation for binding threads to processors.
Depending upon your platform, operating system, compiler and OpenMP implementation, binding threads to processors can be done in several different ways.
The OpenMP version 3.1 API provides an environment variable to turn processor binding "on" or "off". For example:

    setenv OMP_PROC_BIND TRUE
    setenv OMP_PROC_BIND FALSE

At a higher level, processes can also be bound to processors.
Detailed information about process and thread binding to processors on LC Linux clusters can be found HERE.
Monitoring, Debugging and Performance Analysis Tools for OpenMP

Monitoring and Debugging Threads:

Debuggers vary in their ability to handle threads. The TotalView debugger is LC's recommended debugger for parallel programs. It is well suited for both monitoring and debugging threaded programs.
An example screenshot from a TotalView session using an OpenMP code is shown below.
1. Master thread Stack Trace Pane showing original routine
2. Process/thread status bars differentiating threads
3. Master thread Stack Frame Pane showing shared variables
4. Worker thread Stack Trace Pane showing outlined routine
5. Worker thread Stack Frame Pane
6. Root Window showing all threads
7. Threads Pane showing all threads plus selected thread
See the TotalView Debugger tutorial for details.
The Linux ps command provides several flags for viewing thread information. Some examples are shown below. See the man page for details.
% ps -Lf
UID     PID   PPID   LWP  C NLWP STIME TTY      TIME     CMD
blaise 22529 28240 22529  0    5 11:31 pts/53 00:00:00 a.out
blaise 22529 28240 22530 99    5 11:31 pts/53 00:01:24 a.out
blaise 22529 28240 22531 99    5 11:31 pts/53 00:01:24 a.out
blaise 22529 28240 22532 99    5 11:31 pts/53 00:01:24 a.out
blaise 22529 28240 22533 99    5 11:31 pts/53 00:01:24 a.out

% ps -T
  PID  SPID TTY          TIME CMD
22529 22529 pts/53   00:00:00 a.out
22529 22530 pts/53   00:01:44 a.out
22529 22531 pts/53   00:01:44 a.out
22529 22532 pts/53   00:01:44 a.out
22529 22533 pts/53   00:01:44 a.out

% ps -Lm
  PID   LWP TTY        TIME CMD
22529     - pts/53 00:18:56 a.out
    - 22529 -      00:00:00 -
    - 22530 -      00:04:49 -
    - 22531 -      00:04:49 -
    - 22532 -      00:04:49 -
    - 22533 -      00:04:49 -
Performance Analysis Tools:

There are a variety of performance analysis tools that can be used with OpenMP programs. Searching the web will turn up a wealth of information.
At LC, the list of supported computing tools can be found at: computing.llnl.gov/code/content/software_tools.php.
These tools vary significantly in their complexity, functionality and learning curve. Covering them in detail is beyond the scope of this tutorial.
Some tools worth investigating, specifically for OpenMP codes, include:
Open|SpeedShop
TAU
PAPI
Intel VTune Amplifier
ThreadSpotter
OpenMP Exercise 3

Assorted

Overview:

Log into the workshop cluster, if you are not already logged in
Orphaned directive example: review, compile, run
Get OpenMP implementation environment information
Check out the "bug" programs

GO TO THE EXERCISE HERE

This completes the tutorial.
Please complete the online evaluation form, unless you are doing the exercise, in which case please complete it at the end of the exercise.
Where would you like to go now?
Exercise
Agenda
Back to the top
References and More Information

Author: Blaise Barney, Livermore Computing.
The OpenMP web site, which includes the C/C++ and Fortran Application Program Interface documents: www.openmp.org
Appendix A: Run-Time Library Routines

OMP_SET_NUM_THREADS

Purpose:
Sets the number of threads that will be used in the next parallel region. Must be a positive integer.

Format:
Fortran: SUBROUTINE OMP_SET_NUM_THREADS(scalar_integer_expression)
C/C++:   #include <omp.h>
         void omp_set_num_threads(int num_threads)

Notes & Restrictions:
The dynamic threads mechanism modifies the effect of this routine.
  Enabled: specifies the maximum number of threads that can be used for any parallel region by the dynamic threads mechanism.
  Disabled: specifies the exact number of threads to use until the next call to this routine.
This routine can only be called from the serial portions of the code.
This call has precedence over the OMP_NUM_THREADS environment variable.
OMP_GET_NUM_THREADS

Purpose:
Returns the number of threads that are currently in the team executing the parallel region from which it is called.

Format:
Fortran: INTEGER FUNCTION OMP_GET_NUM_THREADS()
C/C++:   #include <omp.h>
         int omp_get_num_threads(void)

Notes & Restrictions:
If this call is made from a serial portion of the program, or a nested parallel region that is serialized, it will return 1.
The default number of threads is implementation dependent.

OMP_GET_MAX_THREADS

Purpose:
Returns the maximum value that can be returned by a call to the OMP_GET_NUM_THREADS function.

Format:
Fortran: INTEGER FUNCTION OMP_GET_MAX_THREADS()
C/C++:   #include <omp.h>
         int omp_get_max_threads(void)

Notes & Restrictions:
Generally reflects the number of threads as set by the OMP_NUM_THREADS environment variable or the OMP_SET_NUM_THREADS() library routine.
May be called from both serial and parallel regions of code.

OMP_GET_THREAD_NUM

Purpose:
Returns the thread number of the thread, within the team, making this call. This number will be between 0 and OMP_GET_NUM_THREADS-1. The master thread of the team is thread 0.

Format:
Fortran: INTEGER FUNCTION OMP_GET_THREAD_NUM()
C/C++:   #include <omp.h>
         int omp_get_thread_num(void)

Notes & Restrictions:
If called from a nested parallel region, or a serial region, this function will return 0.

Examples:
Example 1 is the correct way to determine the number of threads in a parallel region.
Example 2 is incorrect: the TID variable must be PRIVATE.
Example 3 is incorrect: the OMP_GET_THREAD_NUM call is outside the parallel region.
Fortran example: determining the number of threads in a parallel region

Example 1: Correct

      PROGRAM HELLO
      INTEGER TID, OMP_GET_THREAD_NUM
!$OMP PARALLEL PRIVATE(TID)
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread ', TID
      ...
!$OMP END PARALLEL
      END

Example 2: Incorrect

      PROGRAM HELLO
      INTEGER TID, OMP_GET_THREAD_NUM
!$OMP PARALLEL
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread ', TID
      ...
!$OMP END PARALLEL
      END

Example 3: Incorrect

      PROGRAM HELLO
      INTEGER TID, OMP_GET_THREAD_NUM
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread ', TID
!$OMP PARALLEL
      ...
!$OMP END PARALLEL
      END
OMP_GET_THREAD_LIMIT

Purpose:
Returns the maximum number of OpenMP threads available to a program.

Format:
Fortran: INTEGER FUNCTION OMP_GET_THREAD_LIMIT()
C/C++:   #include <omp.h>
         int omp_get_thread_limit(void)

Notes:
Also see the OMP_THREAD_LIMIT environment variable.

OMP_GET_NUM_PROCS

Purpose:
Returns the number of processors that are available to the program.

Format:
Fortran: INTEGER FUNCTION OMP_GET_NUM_PROCS()
C/C++:   #include <omp.h>
         int omp_get_num_procs(void)

OMP_IN_PARALLEL

Purpose:
May be called to determine if the section of code which is executing is parallel or not.

Format:
Fortran: LOGICAL FUNCTION OMP_IN_PARALLEL()
C/C++:   #include <omp.h>
         int omp_in_parallel(void)

Notes & Restrictions:
For Fortran, this function returns .TRUE. if it is called from the dynamic extent of a region executing in parallel, and .FALSE. otherwise. For C/C++, it will return a non-zero integer if parallel, and zero otherwise.
OMP_SET_DYNAMIC

Purpose:
Enables or disables dynamic adjustment (by the run-time system) of the number of threads available for execution of parallel regions.

Format:
Fortran: SUBROUTINE OMP_SET_DYNAMIC(scalar_logical_expression)
C/C++:   #include <omp.h>
         void omp_set_dynamic(int dynamic_threads)

Notes & Restrictions:
For Fortran, if called with .TRUE. then the number of threads available for subsequent parallel regions can be adjusted automatically by the run-time environment. If called with .FALSE., dynamic adjustment is disabled.
For C/C++, if dynamic_threads evaluates to non-zero, then the mechanism is enabled; otherwise it is disabled.
The OMP_SET_DYNAMIC subroutine has precedence over the OMP_DYNAMIC environment variable.
The default setting is implementation dependent.
Must be called from a serial section of the program.

OMP_GET_DYNAMIC

Purpose:
Used to determine if dynamic thread adjustment is enabled or not.

Format:
Fortran: LOGICAL FUNCTION OMP_GET_DYNAMIC()
C/C++:   #include <omp.h>
         int omp_get_dynamic(void)

Notes & Restrictions:
For Fortran, this function returns .TRUE. if dynamic thread adjustment is enabled, and .FALSE. otherwise.
For C/C++, non-zero will be returned if dynamic thread adjustment is enabled, and zero otherwise.

OMP_SET_NESTED

Purpose:
Used to enable or disable nested parallelism.

Format:
Fortran: SUBROUTINE OMP_SET_NESTED(scalar_logical_expression)
C/C++:   #include <omp.h>
         void omp_set_nested(int nested)

Notes & Restrictions:
For Fortran, calling this function with .FALSE. will disable nested parallelism, and calling with .TRUE. will enable it.
For C/C++, if nested evaluates to non-zero, nested parallelism is enabled; otherwise it is disabled.
The default is for nested parallelism to be disabled.
This call has precedence over the OMP_NESTED environment variable.

OMP_GET_NESTED

Purpose:
Used to determine if nested parallelism is enabled or not.

Format:
Fortran: LOGICAL FUNCTION OMP_GET_NESTED()
C/C++:   #include <omp.h>
         int omp_get_nested(void)

Notes & Restrictions:
For Fortran, this function returns .TRUE. if nested parallelism is enabled, and .FALSE. otherwise.
For C/C++, non-zero will be returned if nested parallelism is enabled, and zero otherwise.
OMP_SET_SCHEDULE

Purpose:
This routine sets the schedule type that is applied when the loop directive specifies a runtime schedule.

Format:
Fortran: SUBROUTINE OMP_SET_SCHEDULE(KIND, MODIFIER)
         INTEGER (KIND=OMP_SCHED_KIND) KIND
         INTEGER MODIFIER
C/C++:   #include <omp.h>
         void omp_set_schedule(omp_sched_t kind, int modifier)

OMP_GET_SCHEDULE

Purpose:
This routine returns the schedule that is applied when the loop directive specifies a runtime schedule.

Format:
Fortran: SUBROUTINE OMP_GET_SCHEDULE(KIND, MODIFIER)
         INTEGER (KIND=OMP_SCHED_KIND) KIND
         INTEGER MODIFIER
C/C++:   #include <omp.h>
         void omp_get_schedule(omp_sched_t *kind, int *modifier)
OMP_SET_MAX_ACTIVE_LEVELS

Purpose:
This routine limits the number of nested active parallel regions.

Format:
Fortran: SUBROUTINE OMP_SET_MAX_ACTIVE_LEVELS(MAX_LEVELS)
         INTEGER MAX_LEVELS
C/C++:   #include <omp.h>
         void omp_set_max_active_levels(int max_levels)

Notes & Restrictions:
If the number of parallel levels requested exceeds the number of levels of parallelism supported by the implementation, the value will be set to the number of parallel levels supported by the implementation.
This routine has the described effect only when called from the sequential part of the program. When called from within an explicit parallel region, the effect of this routine is implementation defined.

OMP_GET_MAX_ACTIVE_LEVELS

Purpose:
This routine returns the maximum number of nested active parallel regions.

Format:
Fortran: INTEGER FUNCTION OMP_GET_MAX_ACTIVE_LEVELS()
C/C++:   #include <omp.h>
         int omp_get_max_active_levels(void)

OMP_GET_LEVEL

Purpose:
This routine returns the number of nested parallel regions enclosing the task that contains the call.

Format:
Fortran: INTEGER FUNCTION OMP_GET_LEVEL()
C/C++:   #include <omp.h>
         int omp_get_level(void)

Notes & Restrictions:
The omp_get_level routine returns the number of nested parallel regions (whether active or inactive) enclosing the task that contains the call, not including the implicit parallel region. The routine always returns a non-negative integer, and returns 0 if it is called from the sequential part of the program.
OMP_GET_ANCESTOR_THREAD_NUM

Purpose:
This routine returns, for a given nested level of the current thread, the thread number of the ancestor or the current thread.

Format:
Fortran: INTEGER FUNCTION OMP_GET_ANCESTOR_THREAD_NUM(LEVEL)
         INTEGER LEVEL
C/C++:   #include <omp.h>
         int omp_get_ancestor_thread_num(int level)

Notes & Restrictions:
If the requested nest level is outside the range of 0 and the nest level of the current thread, as returned by the omp_get_level routine, the routine returns -1.

OMP_GET_TEAM_SIZE

Purpose:
This routine returns, for a given nested level of the current thread, the size of the thread team to which the ancestor or the current thread belongs.

Format:
Fortran: INTEGER FUNCTION OMP_GET_TEAM_SIZE(LEVEL)
         INTEGER LEVEL
C/C++:   #include <omp.h>
         int omp_get_team_size(int level)

Notes & Restrictions:
If the requested nested level is outside the range of 0 and the nested level of the current thread, as returned by the omp_get_level routine, the routine returns -1. Inactive parallel regions are regarded like active parallel regions executed with one thread.

OMP_GET_ACTIVE_LEVEL

Purpose:
The omp_get_active_level routine returns the number of nested, active parallel regions enclosing the task that contains the call.

Format:
Fortran: INTEGER FUNCTION OMP_GET_ACTIVE_LEVEL()
C/C++:   #include <omp.h>
         int omp_get_active_level(void)

Notes & Restrictions:
The routine always returns a non-negative integer, and returns 0 if it is called from the sequential part of the program.

OMP_IN_FINAL

Purpose:
This routine returns true if the routine is executed in a final task region; otherwise, it returns false.

Format:
Fortran: LOGICAL FUNCTION OMP_IN_FINAL()
C/C++:   #include <omp.h>
         int omp_in_final(void)
OMP_INIT_LOCK
OMP_INIT_NEST_LOCK

Purpose:
This subroutine initializes a lock associated with the lock variable.

Format:
Fortran: SUBROUTINE OMP_INIT_LOCK(var)
         SUBROUTINE OMP_INIT_NEST_LOCK(var)
C/C++:   #include <omp.h>
         void omp_init_lock(omp_lock_t *lock)
         void omp_init_nest_lock(omp_nest_lock_t *lock)

Notes & Restrictions:
The initial state is unlocked.
For Fortran, var must be an integer large enough to hold an address, such as INTEGER*8 on 64-bit systems.

OMP_DESTROY_LOCK
OMP_DESTROY_NEST_LOCK

Purpose:
This subroutine disassociates the given lock variable from any locks.

Format:
Fortran: SUBROUTINE OMP_DESTROY_LOCK(var)
         SUBROUTINE OMP_DESTROY_NEST_LOCK(var)
C/C++:   #include <omp.h>
         void omp_destroy_lock(omp_lock_t *lock)
         void omp_destroy_nest_lock(omp_nest_lock_t *lock)

Notes & Restrictions:
It is illegal to call this routine with a lock variable that is not initialized.
For Fortran, var must be an integer large enough to hold an address, such as INTEGER*8 on 64-bit systems.

OMP_SET_LOCK
OMP_SET_NEST_LOCK

Purpose:
This subroutine forces the executing thread to wait until the specified lock is available. A thread is granted ownership of a lock when it becomes available.

Format:
Fortran: SUBROUTINE OMP_SET_LOCK(var)
         SUBROUTINE OMP_SET_NEST_LOCK(var)
C/C++:   #include <omp.h>
         void omp_set_lock(omp_lock_t *lock)
         void omp_set_nest_lock(omp_nest_lock_t *lock)

Notes & Restrictions:
It is illegal to call this routine with a lock variable that is not initialized.
For Fortran, var must be an integer large enough to hold an address, such as INTEGER*8 on 64-bit systems.

OMP_UNSET_LOCK
OMP_UNSET_NEST_LOCK

Purpose:
This subroutine releases the lock held by the executing thread.

Format:
Fortran: SUBROUTINE OMP_UNSET_LOCK(var)
         SUBROUTINE OMP_UNSET_NEST_LOCK(var)
C/C++:   #include <omp.h>
         void omp_unset_lock(omp_lock_t *lock)
         void omp_unset_nest_lock(omp_nest_lock_t *lock)

Notes & Restrictions:
It is illegal to call this routine with a lock variable that is not initialized.
For Fortran, var must be an integer large enough to hold an address, such as INTEGER*8 on 64-bit systems.

OMP_TEST_LOCK
OMP_TEST_NEST_LOCK

Purpose:
This routine attempts to set a lock, but does not block if the lock is unavailable.

Format:
Fortran: LOGICAL FUNCTION OMP_TEST_LOCK(var)
         INTEGER FUNCTION OMP_TEST_NEST_LOCK(var)
C/C++:   #include <omp.h>
         int omp_test_lock(omp_lock_t *lock)
         int omp_test_nest_lock(omp_nest_lock_t *lock)

Notes & Restrictions:
For Fortran, .TRUE. is returned if the lock was set successfully; otherwise .FALSE. is returned.
For C/C++, non-zero is returned if the lock was set successfully; otherwise zero is returned.
It is illegal to call this routine with a lock variable that is not initialized.
For Fortran, var must be an integer large enough to hold an address, such as INTEGER*8 on 64-bit systems.
OMP_GET_WTIME

Purpose:
Provides a portable wall clock timing routine.
Returns a double-precision floating point value equal to the number of elapsed seconds since some point in the past. Usually used in "pairs", with the value of the first call subtracted from the value of the second call to obtain the elapsed time for a block of code.
Designed to be "per thread" times, and therefore may not be globally consistent across all threads in a team; this depends upon what a thread is doing compared to other threads.
Format:
Fortran: DOUBLE PRECISION FUNCTION OMP_GET_WTIME()
C/C++:   #include <omp.h>
         double omp_get_wtime(void)

OMP_GET_WTICK

Purpose:
Provides a portable wall clock timing routine.
Returns a double-precision floating point value equal to the number of seconds between successive clock ticks.

Format:
Fortran: DOUBLE PRECISION FUNCTION OMP_GET_WTICK()
C/C++:   #include <omp.h>
         double omp_get_wtick(void)

https://computing.llnl.gov/tutorials/openMP/
Last Modified: 06/07/2016 01:08:02 blaiseb@llnl.gov
UCRL-MI-133316