
CONCURRENT AND PARALLEL PROGRAMMING
Part I
OpenMP

Wojciech Mikanik, PhD
wojciech.mikanik@polsl.pl
room 503

February 2008


Outline

1. Matrix multiplication example
   - sequential solution
   - pthreads solution
   - OpenMP solution
2. Introduction to OpenMP
3. Expressing parallel execution
4. Work sharing
5. Data storage attributes
6. Synchronization
7. Nested parallelism
8. Using third party libraries


Matrix multiplication
Sequential version: pseudocode

for i := 1 to n do
  for j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end for;
end for;


Matrix multiplication
Sequential version: C implementation

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-seq.c


Matrix multiplication
PRAM CREW

parfor i := 1 to n do
  parfor j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end parfor;
end parfor;

Matrix multiplication
(AGAIN) Sequential version: pseudocode

for i := 1 to n do
  for j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end for;
end for;


Matrix multiplication
Parallel version: C + pthread library

Facts
  - Reasonable size of matrices (n ≥ 64)
  - Thread creation is not free
  - Computing power comes from a CPU, not from a thread
  - Computing system with a small to medium number of CPUs

Decision
  Thread i computes rows i, i + NT, i + 2NT, i + 3NT, ... of the
  result matrix C
  (NT = number of threads)

Matrix multiplication
Parallel version: C + pthread library (part I). File: matrix-pthreads.c.

typedef struct {
  double (*a) [N];
  double (*b) [N];
  double (*c) [N];
  sem_t * s;
  int id;
} thread_args;

void * thread_proc (void * a){
  thread_args * arg = a;
  int i, j, s;
  for (i = arg->id; i < N; i += NT)
    for (j = 0; j < N; j ++){
      arg->c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        arg->c[i][j] += arg->a[i][s] * arg->b[s][j];
    }
  assert (sem_post (arg->s) == 0);
  return NULL;
}


Matrix multiplication
Parallel version: C + pthread library (part II)

void matrix_mult (double a[N][N], double b[N][N], double c[N][N]){
  int i;
  thread_args ta[NT];
  sem_t s;
  pthread_t p;

  assert (sem_init (&s, 0, 0) == 0);
  for (i = 0; i < NT; i ++){
    thread_args x = { a, b, c, &s, i};
    ta[i] = x;
    assert (0 == pthread_create (&p, NULL, thread_proc, ta + i));
  }
  for (i = 0; i < NT; i ++)
    assert (sem_wait (&s) == 0);
}

Note that completion is signalled through the semaphore s rather than
pthread_join, which is why the single pthread_t variable p can be reused
for every created thread.

Matrix multiplication
Sequential version: C implementation

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-seq.c


Matrix multiplication
Parallel version: C + OpenMP

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

#pragma omp parallel for private(j, s)
  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-omp.c

OpenMP

Goals
  - Scalable parallel applications for shared memory computers
  - Portability of performance
  - Fast and gradual development of parallel versions of existing
    sequential code
  - Easy simultaneous maintenance of parallel and sequential
    source code
  - Development of new parallel applications from scratch

OpenMP defines
  - a set of compiler directives
  - a (small) set of subroutines
  - bindings for programming languages (Fortran, C, C++)

OpenMP extends, but does not modify, a base programming language
  - A way to do multithreaded programming
  - Focus: parallel execution of loops


OpenMP
History
  - Nov 1996: ANSI X3H5 spec.
  - Oct 1997: OpenMP spec. 1.0 (FORTRAN)
  - 1998: OpenMP spec. 1.0 (C/C++)
  - 1999: OpenMP spec. 1.1 (FORTRAN)
  - 2006: spec. 2.5; compilers available (FORTRAN, C, C++)
  - Feb 2008: spec. 3.0 defined

OpenMP homepage: www.openmp.org
OpenMP user group: www.cOMPunity.org


OpenMP enabled compilers
  - Intel (Linux, MSWindows)
    - C++ compiler ver. 9.x: OpenMP spec. 2.5
    - FORTRAN compiler
  - Microsoft Visual Studio 2005, Visual .NET: OpenMP spec. 2.0
    (MSWindows)
  - GNU C++ ver. 4.2.0 (TBA)
  - Sun Studio ver. 11 (Solaris)
  - PGF77 and PGF90 compilers from The Portland Group, Inc. (PGI)
    (Intel Linux, Intel Solaris, Intel Windows/NT)
  - IBM XL Fortran and C (IBM AIX)
  - HP, SGI, Fujitsu


Execution of OpenMP programs

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

#pragma omp parallel for private(j, s)
  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-omp.c
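
As a usage sketch (not from the original slides): with GCC 4.2 or later the
example can be built and run as shown below; -fopenmp enables OpenMP support,
and OMP_NUM_THREADS selects the team size. Intel compilers of that era used
the -openmp flag instead.

gcc -fopenmp matrix-omp.c -o matrix-omp   # enable OpenMP pragmas
OMP_NUM_THREADS=4 ./matrix-omp            # request a team of 4 threads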

Coding OpenMP programs

  - Compiler directives
    - Format: #pragma omp directive [clause ...]
    - Parallel region: #pragma omp parallel [clause ...]
    - Work sharing
    - Synchronization
  - Runtime library
    - #include <omp.h>
    - Query functions
    - Lock functions
    - Dynamic adjustment of number of threads
    - Nested parallelism
  - Environment variables
    - OMP_DYNAMIC
    - OMP_NUM_THREADS
    - OMP_NESTED
    - OMP_SCHEDULE


Expressing parallelism

parallel directive
#pragma omp parallel
{
  printf("hello world\n");
}

Useful query functions
  - Getting current thread number: omp_get_thread_num()
  - Getting number of threads: omp_get_num_threads()
  - Detecting parallel region: omp_in_parallel()

Setting number of threads
  - application wide
    - OMP_NUM_THREADS environment variable
    - function omp_set_num_threads()
  - region wide
    - num_threads clause

Dynamic adjustment of the number of threads
  - function omp_set_dynamic()
  - function omp_get_dynamic()
  - OMP_DYNAMIC environment variable
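
A minimal sketch combining the query functions with both ways of setting the
team size (only the OpenMP API calls are from the slides; the program around
them is illustrative):

#include <stdio.h>
#include <omp.h>

int main (void){
  omp_set_num_threads (4);              /* application-wide request */
#pragma omp parallel num_threads (3)    /* clause overrides it for this region */
  printf ("hello from thread %d of %d (in parallel: %d)\n",
          omp_get_thread_num (), omp_get_num_threads (), omp_in_parallel ());
  return 0;
}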

Work sharing directives
  - sections directive
  - combined parallel sections directive
  - for directive
  - combined parallel for directive
  - single directive

    #pragma omp single
    {
      doSth (defaultDelay);
      single2 = omp_get_thread_num();
    }

  - master directive

    #pragma omp master
    master = omp_get_thread_num();

(single is executed by exactly one, unspecified, thread of the team;
master is executed only by the master thread.)


sections directive
Syntax

#pragma omp sections [clause]
{
  [ #pragma omp section ]
    block1
  [ #pragma omp section
    block2 ]
  [ #pragma omp section
    block3 ]
  ...
}

The pseudocode equivalent:

cobegin
  block1
  block2
  block3
  ...
coend;

sections directive
An example

#pragma omp parallel
{
#pragma omp sections
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
    {
      printf ("This code has been executed by thread %d of %d\n",
              omp_get_thread_num (), omp_get_num_threads());
    }
#pragma omp section
    {
      printf ("This code has been executed by thread %d of %d\n",
              omp_get_thread_num (), omp_get_num_threads());
    }
  }
}


sections directive
sections vs pure parallel

#pragma omp parallel
{
  printf ("This code has been executed by thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads());
}

  - sections
    - Exactly 4 messages
    - The thread numbers printed are any 4-element permutation
      with repetitions of the integers [0 ... teamSize - 1]
  - parallel
    - Exactly teamSize messages
    - The thread numbers printed are a permutation without
      repetitions of the integers [0 ... teamSize - 1]

parallel sections directive

#pragma omp parallel sections
{
  printf ("This code has been executed by thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
  printf ("This code has been executed by thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
  }
#pragma omp section
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
  }
}


for directive

Syntax
#pragma omp for [clause]
  for-loop

Example
int a[ARR_SIZE];
#pragma omp parallel
{
#pragma omp for
  for (i = 0; i < ARR_SIZE; i ++)
    a[i] = 1;
}

parallel for directive

Syntax
#pragma omp parallel for [clause]
  for-loop

Example
double a [N][N], b [N][N], c [N][N];
int i;
...
#pragma omp parallel for
for (i = 0; i < N; i ++){
  int j, s;
  for (j = 0; j < N; j ++){
    c[i][j] = 0.0;
    for (s = 0; s < N; s ++)
      c[i][j] += a[i][s] * b[s][j];
  }
}


parallel for directive
schedule clause

  - Default scheduling: implementation dependent
  - schedule(static, chunk_size)
    - chunks of chunk_size iterations, or of roughly equal size
      (if chunk_size is not provided)
    - assigned round-robin in the order of the thread number
  - schedule(dynamic, chunk_size)
    - chunks of chunk_size iterations, or 1 (if chunk_size is
      not provided)
    - a queue of chunks
  - schedule(guided, chunk_size)
    - chunks of decreasing size, down to chunk_size, or 1 (if
      chunk_size is not provided)
    - a queue of chunks
  - schedule(runtime)
    - scheduling according to the value of the OMP_SCHEDULE
      environment variable
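
A brief sketch of the clause in use (work(i) stands in for real
per-iteration work and is not from the slides):

int i;
...
#pragma omp parallel for schedule(dynamic, 8)
for (i = 0; i < N; i ++)
  work (i);   /* iterations are handed out to idle threads in chunks of 8 */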

Data model

Variables can be
  - Shared
  - Private
  - Subject to reduction

Defaults
  - Usually: shared
  - Private:
    - Local (not static) variables of subroutines called by a thread
    - Automatic variables within a block of thread's code
    - Control variable of a parallel for loop


Data model
Declaration

Clauses
  - shared ( list )
  - private ( list )
  - default ( shared | none )
  - firstprivate ( list )
  - lastprivate ( list )
  - reduction ( operator : list )

Thread private variables
  - threadprivate directive
  - copyin clause
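
A small sketch of firstprivate and lastprivate (the variables are
illustrative, not from the slides): x enters each thread initialized from the
original, and last leaves the loop with the value from the sequentially final
iteration.

int i, x = 10, last = 0;
#pragma omp parallel for firstprivate(x) lastprivate(last)
for (i = 0; i < 100; i ++){
  /* each thread's private copy of x starts at 10 */
  last = x + i;   /* lastprivate: the value from iteration 99 survives */
}
/* here last == 10 + 99 */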

Data model
Example (from matrix-omp.c)

void matrix_mult2(){
  double a [N][N], b [N][N], c [N][N];
  int i;
  ...
#pragma omp parallel
  {
#pragma omp for
    for (i = 0; i < N; i ++){
      int j, s;
      for (j = 0; j < N; j ++){
        c[i][j] = 0.0;
        for (s = 0; s < N; s ++)
          c[i][j] += a[i][s] * b[s][j];
      }
    }
  }
}


Data model
Example (from matrix-omp.c)

void matrix_mult3(){
  double a [N][N], b [N][N], c [N][N];
  int i, j, s;
  ...
#pragma omp parallel
  {
#pragma omp for private(j, s)
    for (i = 0; i < N; i ++){
      for (j = 0; j < N; j ++){
        c[i][j] = 0.0;
        for (s = 0; s < N; s ++)
          c[i][j] += a[i][s] * b[s][j];
      }
    }
  }
}

In matrix_mult2, j and s are automatic variables declared inside the loop
body, so they are private by default; in matrix_mult3 they are declared
outside the parallel region and must be listed in a private clause. The loop
control variable i is private in both cases.

reduction clause

Syntax: reduction ( operator : list )

List of operators: +, *, -, &, ^, |, &&, ||

Example:
#define ARR_SIZE 100000
int a [ARR_SIZE], i, sum;
...
sum = 0;
#pragma omp parallel
{
#pragma omp for reduction (+: sum)
  for (i = 0; i < ARR_SIZE; i ++)
    sum += a[i];
}


Consistency of shared data

  - Sequential consistency
    - benefits
    - drawbacks
  - Relaxed consistency
    - temporary view
    - synchronization points
      - barriers (both explicit and implicit)
      - entry to and exit from critical regions
      - entry to and exit from lock routines
  - directive flush ( list )
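
A hedged sketch of flush used for pairwise synchronization, in the style of
the examples in the OpenMP specification (variable names are illustrative; it
needs two threads actually executing, hence num_threads(2)):

int data = 0, flag = 0;
#pragma omp parallel sections num_threads (2) shared(data, flag)
{
#pragma omp section
  {                                /* producer */
    data = 42;
#pragma omp flush (flag, data)     /* make data visible ...         */
    flag = 1;
#pragma omp flush (flag)           /* ... before the flag is raised */
  }
#pragma omp section
  {                                /* consumer */
    for (;;){
#pragma omp flush (flag)           /* re-read flag from memory      */
      if (flag) break;
    }
#pragma omp flush (flag, data)     /* make the producer's data visible */
    printf ("%d\n", data);
  }
}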

Synchronization

  - Implicit synchronization
    - for directive
    - sections directive
    - single directive
    - nowait clause (removes the implied barrier)
  - Explicit synchronization (directives)
    - master
    - critical
    - atomic
    - barrier
    - ordered directive and ordered clause
  - Explicit synchronization (locks)


nowait example

#pragma omp parallel
{
#pragma omp for nowait
  for (i = 0; i < numThreads; i++)
    doSth(i * delay);

#pragma omp sections
  {
    printf("Thread #%d in sections\n", omp_get_thread_num());
#pragma omp section
    printf("Thread #%d in sections\n", omp_get_thread_num());
#pragma omp section
    printf("Thread #%d in sections\n", omp_get_thread_num());
  }
}
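
The explicit barrier directive listed above can restore the synchronization
that nowait removes; a minimal sketch (doSth and doSthElse are illustrative
helpers in the style of the example above):

#pragma omp parallel
{
  doSth (omp_get_thread_num ());
#pragma omp barrier              /* wait until every thread has finished doSth */
  doSthElse (omp_get_thread_num ());
}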

critical directive

Syntax
#pragma omp critical [ (name) ]
  block

Example
#define ARR_SIZE 100000
int a [ARR_SIZE], i, sum;
...
sum = 0;
#pragma omp parallel for
for (i = 0; i < ARR_SIZE; i ++)
#pragma omp critical
  sum += a[i];
printf ("Sum == %d\n", sum);


Named critical directive

int cs_even = 0, cs_odd = 0;
...
#pragma omp parallel for num_threads (numThreads)
for (i = 0; i < numThreads; i ++)
  if (i % 2)
#pragma omp critical (odd)
    cs_odd ++;
  else
#pragma omp critical (even)
    cs_even ++;
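
For a single scalar update such as the sum above, the atomic directive
(listed among the explicit synchronization directives) is typically cheaper
than critical; a sketch of the same loop:

sum = 0;
#pragma omp parallel for
for (i = 0; i < ARR_SIZE; i ++)
#pragma omp atomic
  sum += a[i];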

ordered example

#pragma omp parallel for ordered
for (i = 0; i < numThreads; i ++)
{
  do_sth_in_any_order();
#pragma omp ordered
  doSth(); // it will be executed in loop order
}


Remaining issues

Nested parallelism
  - function omp_set_nested ()
  - function omp_get_nested ()
  - OMP_NESTED environment variable
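
A minimal sketch of enabling nested regions (whether the inner region
actually gets extra threads is implementation dependent):

#include <stdio.h>
#include <omp.h>

int main (void){
  omp_set_nested (1);                  /* allow nested parallel regions */
#pragma omp parallel num_threads (2)   /* outer team of 2 */
  {
#pragma omp parallel num_threads (2)   /* each outer thread may start an inner team */
    printf ("inner thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads ());
  }
  return 0;
}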

Using third party libraries

