OpenMP Language features


By Sridhar Ranganathan B.E., M.Tech., M.Phil., PGSEM

Features provided by OpenMP


OpenMP provides
    Directives
    Library functions
    Environment variables
to create and control the execution of parallel programs

Constructs - I
Parallel construct
Work sharing constructs
    Loop construct
    Sections construct
    Single construct
Clauses
    Data sharing
    Nowait
    Schedule

Constructs - II
These constructs enable the programmer to orchestrate the actions of different threads
    Barrier construct
    Critical construct
    Atomic construct
    Locks
    Master construct

Terminology
OpenMP directive: in C/C++, a #pragma that specifies OpenMP program behaviour
Executable directive: an OpenMP directive that is NOT declarative; that is, it may be placed in an executable context
Construct: an OpenMP executable directive and the associated statement, loop or structured block (the lexical extent of an executable directive)

Requirements for OpenMP


OpenMP requires well structured programs
Constructs are associated with statements, loops or structured blocks
In C/C++, a structured block is defined to be an executable statement, possibly a compound statement, with a single entry at the top and a single exit at the bottom
The point of entry cannot be a labelled statement
The point of exit cannot be a branch of any type

Parallel construct
This is specified as
    #pragma omp parallel [clause1 clause2 ...]
        structured block

Use of parallel construct


This construct is used to specify the computations that should be executed in parallel
Parts of the program that are NOT enclosed by the construct are executed serially
When a thread encounters this construct
    A team of threads is created to execute the associated parallel region
    The construct does NOT distribute the work
    To distribute the work we need work sharing constructs
    Without them, the same work is done by every thread in the team

At the end of the parallel region there is an implied barrier, which makes all threads wait until the work inside the region is completed
Only the thread that encountered the construct continues execution after the end of the parallel region

Explanation of the parallel construct contd..


The thread that encounters the parallel construct becomes the master thread of the new team
Each thread is assigned a unique thread id
Thread ids range from zero for the master thread to one less than the number of threads in the team
Each thread may follow a different path of execution, for example by testing its thread id in an if statement
The thread id is returned by the omp_get_thread_num() function
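The points above can be sketched in C as follows. This is only an illustration: the function name count_team_members is hypothetical, and the serial fallback definitions are an assumption added so that the sketch also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the team size, or -1 if the counts disagree. */
int count_team_members(void)
{
    int members = 0;     /* shared: incremented once by every thread */
    int team_size = 0;   /* shared: written by the master thread only */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();    /* 0 for the master thread */
        if (id == 0)                      /* only the master takes this path */
            team_size = omp_get_num_threads();

        #pragma omp atomic
        members += 1;                     /* the same work, done by all threads */
    }
    /* Implied barrier: all updates are complete at this point. */
    return (members == team_size) ? team_size : -1;
}
```

Every thread runs the whole block, so members ends up equal to the team size; only the branch on the thread id differs between threads.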

Clauses supported by parallel construct


if(scalar-expression)
num_threads(integer-expression)
private(list)
firstprivate(list)
shared(list)
default(none | shared)
copyin(list)
reduction(operator: list)

Restrictions on the parallel construct


A program should NOT branch into or out of the parallel region; if it does, the behaviour is undefined
A program should NOT depend on any ordering of the evaluations of the clauses, or on any side effects of those evaluations
At most one if clause can appear on the directive
At most one num_threads clause can appear on the directive
The expression of the num_threads clause must evaluate to a positive integer

Sharing the work among threads


A work sharing construct specifies a region of code whose work is to be distributed among the threads of a team, and also specifies the manner in which the work is parceled out
There are three such constructs
    #pragma omp for
    #pragma omp sections
    #pragma omp single

Rules for work sharing constructs


Each work sharing region must be encountered by all threads in a team or by none at all
The sequence of work sharing regions and barrier regions encountered must be the same for every thread in the team
A work sharing construct does NOT launch any new threads
It does NOT have a barrier on entry
It has an implied barrier at the end, unless a nowait clause is specified

Loop construct
#pragma omp for
for (init-expr; var relop b; incr-expr)

init-expr must initialize the integer loop variable var with an integer expression
b must also be an integer expression
incr-expr must change var by an integer amount using ++, +=, -- or -=
Alternatively it can have the form var = var + expr

Restrictions for the loop construct


Use of this construct is limited to loops whose number of iterations can be counted
Example: for loops in which an integer variable is used as a counter whose value is changed by a fixed amount on each iteration until an upper or lower bound is reached
This means the compiler must be able to count the number of iterations in order to distribute the load
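A countable loop of this kind might look like the sketch below. The function name fill_squares and its parameters are hypothetical; the loop has an integer counter, a fixed bound and a fixed increment, so its iterations can be divided among the team.

```c
/* Hypothetical helper: stores i * i in out[i] for 0 <= i < n. */
void fill_squares(int *out, int n)
{
    #pragma omp parallel
    {
        /* The loop variable i is private to each thread. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = i * i;    /* each iteration writes a distinct element */
    }
}
```

Because every iteration writes a different array element, no synchronization is needed inside the loop.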

section/sections construct
Using the sections construct, we can assign different threads to carry out different kinds of work
We specify different code regions, each of which will be executed by one of the threads
There are two directives
    #pragma omp sections
    #pragma omp section

Example of section/sections code


#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
            structured block
        #pragma omp section
            structured block
    }
}

Explanation of the section/sections construct


#pragma omp sections indicates the start of the construct
#pragma omp section marks each independent section
At run time, the specified code blocks are executed by the threads in the team
Each thread executes one code block at a time
Each code block is executed exactly once

Explanation of the section/sections contd..


If there are fewer threads than code blocks, some or all threads may execute multiple code blocks
If there are fewer code blocks than threads, some threads will be idle
The most common use of the section/sections construct is to execute functions in parallel
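As an illustration of executing functions in parallel, the sketch below runs two independent functions in separate sections. All function names here (sum_to, product_to, run_sections) are hypothetical.

```c
/* Two independent pieces of work (hypothetical examples). */
static int sum_to(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++) s += i;
    return s;
}

static int product_to(int n)
{
    int p = 1;
    for (int i = 1; i <= n; i++) p *= i;
    return p;
}

/* Each section is executed exactly once, by some thread of the team. */
void run_sections(int n, int *sum, int *product)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            *sum = sum_to(n);

            #pragma omp section
            *product = product_to(n);
        }
    }
}
```

With only two sections, any additional threads in the team simply wait at the barrier at the end of the sections construct.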

Single construct
The single construct specifies that exactly one thread must execute the associated block
We do not care which thread actually executes it
The executing thread can differ from run to run
A typical use is the initialization of shared variables

Example use of single construct


#pragma omp single [clause1 clause2 ...]
    structured block
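A possible use of single for initializing a shared variable is sketched below. The function name scale_all and the value 2.5 are illustrative assumptions only.

```c
/* Hypothetical helper: multiplies every element of data by a shared factor. */
void scale_all(double *data, int n)
{
    double factor = 0.0;    /* shared by all threads in the team */

    #pragma omp parallel shared(factor)
    {
        #pragma omp single
        factor = 2.5;       /* exactly one thread performs the initialization */
        /* Implied barrier at the end of single: factor is set for everyone. */

        #pragma omp for
        for (int i = 0; i < n; i++)
            data[i] *= factor;
    }
}
```

The implied barrier at the end of the single construct is what makes it safe for the other threads to read factor in the loop that follows.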

Master construct
This is similar to the single construct
It guarantees that the work is done by the master thread
The master construct does NOT have an implied barrier at entry or exit
This may create problems when other threads use data that the master thread has not yet finished writing
The solution is to add an explicit barrier

Example use of Master construct


#pragma omp master
    structured block
#pragma omp barrier

Clauses to control parallel and worksharing constructs


shared
private
lastprivate
firstprivate
default
nowait
schedule

Shared clause
The shared clause specifies which data will be shared among the threads executing the region it is associated with
There is a unique instance of each shared variable
Each thread can freely read and modify its value
A note of caution: multiple threads may try to update the same variable simultaneously
Synchronization constructs are available to resolve this issue
Sharing is safest when the threads only read the variable

Private clause
The private clause ensures that each thread is given a private copy of the variable
Each variable in the private list is replicated so that each thread gets its own copy
The private copies are NOT initialized on entry to the region

Firstprivate clause
This is used when private variables must be initialized on entry to the region in which they are used
Variables listed in firstprivate are private variables, but each copy is initialized with the value that the original variable of the same name has just before entry into the construct

Lastprivate clause
If the value of a private variable is needed after the region is over, this clause is used
In the case of a work-shared loop, the variable gets the value from the iteration of the loop that would be last in a sequential execution
In the case of a sections construct, the variable gets the value that it has at the end of the lexically last section
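The difference between firstprivate and lastprivate can be sketched as follows. The function name last_offset_value and the starting value 100 are hypothetical, and the sketch assumes n is at least 1.

```c
/* Hypothetical helper: returns the value assigned in the final iteration. */
int last_offset_value(int n)
{
    int offset = 100;   /* firstprivate: each private copy starts at 100 */
    int last = -1;      /* lastprivate: receives the value from i == n - 1 */

    #pragma omp parallel
    {
        #pragma omp for firstprivate(offset) lastprivate(last)
        for (int i = 0; i < n; i++)
            last = offset + i;
    }
    return last;        /* same result as a sequential run of the loop */
}
```

With a plain private(offset) the copies would be uninitialized inside the loop, and without lastprivate(last) the original variable would not receive the final value.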

Default clause
The default clause gives variables a default data sharing attribute
In C/C++, the argument is none or shared
If default(shared) is given, all variables not explicitly listed otherwise are shared
If default(none) is given, the programmer is forced to think about every variable and to list each one in a data sharing clause such as private or shared
default(none) is recommended

Nowait clause
The nowait clause allows the programmer to fine tune a program's performance
When we add this clause to a construct, the barrier at the end of the associated construct is suppressed
Usage: once a parallel program runs correctly, we identify places where a barrier is not necessary and introduce this clause
When a thread has finished the work associated with the parallel loop, it continues without waiting for the others to complete their work
Example: #pragma omp for nowait
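A case where the barrier is not necessary is sketched below: the second loop does not read anything written by the first, so threads may start it early. The function name fill_two is hypothetical.

```c
/* Hypothetical helper: fills two independent arrays. */
void fill_two(int *a, int *b, int n)
{
    #pragma omp parallel
    {
        #pragma omp for nowait   /* no barrier: a thread moves on when done */
        for (int i = 0; i < n; i++)
            a[i] = i;

        #pragma omp for          /* touches only b[], so no wait was needed */
        for (int i = 0; i < n; i++)
            b[i] = 2 * i;
    }
    /* The barrier at the end of the parallel region still applies. */
}
```

If the second loop read a[], removing the barrier would be a bug, which is why nowait is best added only after the program is known to be correct.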

Schedule clause
This is supported on the loop construct only, as follows
    #pragma omp for schedule(kind[, chunk_size])
There are four kinds of scheduling
    static
    dynamic
    guided
    runtime

Schedule clause contd..


Schedule kind: static
Description: Iterations are divided into chunks of size chunk_size; the chunks are assigned to the threads statically in a round robin fashion; the last chunk to be assigned may have a smaller number of iterations; when no chunk size is specified, the iteration space is divided into chunks that are approximately equal in size and each thread is assigned at most one chunk

Schedule kind: dynamic
Description: Iterations are assigned to the threads as the threads request them; a thread executes a chunk of iterations whose size is controlled through the chunk_size parameter, then requests another chunk until there are no more chunks to work on; the last chunk may have fewer iterations; when no chunk size is specified, it defaults to 1

Schedule clause contd further


Schedule kind: guided
Description: Iterations are assigned to the threads as the threads request them; a thread executes a chunk whose size is controlled through the chunk_size parameter and then requests another chunk, until there are no more chunks; for a chunk_size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads, decreasing to 1; for a chunk_size of k, the size of each chunk is determined in the same way, with the restriction that the chunks do not contain fewer than k iterations (except the last chunk); when no chunk size is specified, it defaults to 1

Schedule kind: runtime
Description: The decision is made at run time; the schedule and chunk size are set through the environment variable OMP_SCHEDULE
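The sketch below shows where a dynamic schedule may help: the cost of an iteration grows with i, so handing out small chunks on request tends to balance the load. The function name prefix_totals and the chunk size of 4 are illustrative assumptions.

```c
/* Hypothetical helper: out[i] = 0 + 1 + ... + i, with uneven work per i. */
void prefix_totals(long *out, int n)
{
    #pragma omp parallel
    {
        /* Chunks of 4 iterations are handed out as threads request them. */
        #pragma omp for schedule(dynamic, 4)
        for (int i = 0; i < n; i++) {
            long total = 0;
            for (int j = 0; j <= i; j++)   /* inner work grows with i */
                total += j;
            out[i] = total;
        }
    }
}
```

Replacing the clause with schedule(runtime) would defer the choice to the OMP_SCHEDULE environment variable without any other change to the code.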

OpenMP synchronization constructs


Barrier construct
#pragma omp barrier

Ordered construct
#pragma omp ordered
Allows a structured block within a parallel loop to be executed in sequential order
The enclosing loop construct must carry the ordered clause

Critical construct
This provides a means to ensure that multiple threads do not attempt to update the same shared data simultaneously; the protected code is called a critical region
An optional name may be given; critical region names are global to the program
When a thread encounters a critical region, it waits until no other thread is executing a critical region with the same name
    #pragma omp critical (name)
        structured block
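One way to use a named critical region is sketched below: each thread computes a private maximum and only the final merge into the shared result is protected. The function name find_max and the region name update_best are hypothetical, and the sketch assumes n is at least 1.

```c
/* Hypothetical helper: returns the largest element of v[0..n-1]. */
int find_max(const int *v, int n)
{
    int best = v[0];                /* shared result */

    #pragma omp parallel
    {
        int local = v[0];           /* private running maximum */

        #pragma omp for nowait
        for (int i = 1; i < n; i++)
            if (v[i] > local)
                local = v[i];

        /* Only one thread at a time may compare and update best. */
        #pragma omp critical (update_best)
        {
            if (local > best)
                best = local;
        }
    }
    return best;
}
```

Keeping the critical region outside the loop means each thread enters it once, instead of once per iteration.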

OpenMP synchronization constructs contd..


Atomic construct
This is an efficient alternative to a critical region
It applies only to the single assignment statement that immediately follows it
    #pragma omp atomic
    count += 1;   // example

Operations allowed in the assignment statement following atomic are


+, *, -, /, &, ^, |, << and >> only
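A shared counter is the typical case for atomic, as in the sketch below. The function name count_even is hypothetical.

```c
/* Hypothetical helper: counts the even elements of v[0..n-1]. */
int count_even(const int *v, int n)
{
    int count = 0;                  /* shared counter */

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (v[i] % 2 == 0) {
                /* Protects only the single update that follows. */
                #pragma omp atomic
                count += 1;
            }
        }
    }
    return count;
}
```

Only the update of count is protected; the test v[i] % 2 == 0 runs without any synchronization.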

OpenMP Locks
These behave like binary semaphores
OpenMP provides a set of low-level, general purpose locking routines
They provide greater flexibility for synchronization
Nested locks are also possible
Definition: omp_lock_t *var1;   (var1 must point to a valid omp_lock_t object)
Routines for simple locks
    Initialization: omp_init_lock(var1);
    Set lock: omp_set_lock(var1);
    Test lock: omp_test_lock(var1);
    Unset lock: omp_unset_lock(var1);
    Destroy lock: omp_destroy_lock(var1);
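The life cycle of a simple lock (initialize, set, unset, destroy) can be sketched as follows. The function name locked_sum is hypothetical, and the serial stand-ins for the lock routines are an assumption added so that the sketch also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) for builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical helper: returns 1 + 2 + ... + n, guarding the sum with a lock. */
long locked_sum(int n)
{
    long total = 0;
    omp_lock_t lock;

    omp_init_lock(&lock);            /* must happen before first use */

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 1; i <= n; i++) {
            omp_set_lock(&lock);     /* blocks until the lock is available */
            total += i;
            omp_unset_lock(&lock);   /* lets the next waiting thread in */
        }
    }

    omp_destroy_lock(&lock);         /* the lock must be unset at this point */
    return total;
}
```

For an update this simple, atomic or critical would be the more natural choice; locks become useful when the protected region does not match a single lexical block.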

Interaction with environment


OpenMP defines internal control variables
They govern the behaviour of the program at run time
They cannot be accessed or modified directly at the application level
They can be queried or set through OpenMP library functions and environment variables
These variables are the following
    nthreads-var
    dyn-var
    nest-var
    run-sched-var
    def-sched-var

Environment variables
OMP_NUM_THREADS
    omp_set_num_threads()
    omp_get_num_threads()

OMP_DYNAMIC (boolean)
    omp_set_dynamic()
    omp_get_dynamic()

OMP_NESTED
    omp_set_nested()
    omp_get_nested()

OMP_SCHEDULE
