
CONCURRENT AND PARALLEL PROGRAMMING
Part I
OpenMP

Wojciech Mikanik, PhD
wojciech.mikanik@polsl.pl
room 503

February 2008


Outline

1. Matrix multiplication example
   - sequential solution
   - pthreads solution
   - OpenMP solution
2. Introduction to OpenMP
3. Expressing parallel execution
4. Work sharing
5. Data storage attributes
6. Synchronization
7. Nested parallelism
8. Using third party libraries


Matrix multiplication
Sequential version: pseudocode

for i := 1 to n do
  for j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end for;
end for;


Matrix multiplication
Sequential version: C implementation

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-seq.c


Matrix multiplication
PRAM CREW

parfor i := 1 to n do
  parfor j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end parfor;
end parfor;

Matrix multiplication
(AGAIN) Sequential version: pseudocode

for i := 1 to n do
  for j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end for;
end for;


Matrix multiplication
Parallel version: C + pthread library

Facts
  - Reasonable size of matrices (n ≥ 64)
  - Thread creation is not free
  - Computing power comes from a CPU, not from a thread
  - Computing system with a small to medium number of CPUs

Decision
  Thread i computes rows i, i + NT, i + 2NT, i + 3NT, ... of the
  result matrix C
  (NT = number of threads)

Matrix multiplication
Parallel version: C + pthread library (part I). File: matrix-pthreads.c.

typedef struct {
  double (*a) [N];
  double (*b) [N];
  double (*c) [N];
  sem_t * s;
  int id;
} thread_args;

void * thread_proc (void * a){
  thread_args * arg = a;
  int i, j, s;
  for (i = arg->id; i < N; i += NT)
    for (j = 0; j < N; j ++){
      arg->c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        arg->c[i][j] += arg->a[i][s] * arg->b[s][j];
    }
  assert (sem_post (arg->s) == 0);
  return NULL;
}


Matrix multiplication
Parallel version: C + pthread library (part II)

void matrix_mult (double a[N][N], double b[N][N], double c[N][N]){
  int i;
  thread_args ta[NT];
  sem_t s;
  pthread_t p;

  assert (sem_init (&s, 0, 0) == 0);
  for (i = 0; i < NT; i ++){
    thread_args x = { a, b, c, &s, i};
    ta[i] = x;
    assert (0 == pthread_create (&p, NULL, thread_proc, ta + i));
  }
  for (i = 0; i < NT; i ++)
    assert (sem_wait (&s) == 0);
}

Note that completion is signalled through the semaphore s rather than
pthread_join, which is why the single pthread_t variable p can be reused
for every created thread.

Matrix multiplication
Sequential version: C implementation

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-seq.c


Matrix multiplication
Parallel version: C + OpenMP

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

#pragma omp parallel for private(j, s)
  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-omp.c

OpenMP

Goals
  - Scalable parallel applications for shared memory computers
  - Portability of performance
  - Fast and gradual development of parallel versions of existing
    sequential code
  - Easy simultaneous maintenance of parallel and sequential
    source code
  - Development of new parallel applications from scratch

OpenMP defines
  - a set of compiler directives
  - a (small) set of subroutines
  - bindings for programming languages (Fortran, C, C++)

OpenMP extends, but does not modify, a base programming language
  - A way to do multithreaded programming
  - Focus: parallel execution of loops


OpenMP
History
  - Nov 1996: ANSI X3H5 spec.
  - Oct 1997: OpenMP spec. 1.0 (FORTRAN)
  - 1998: OpenMP spec. 1.0 (C/C++)
  - 1999: OpenMP spec. 1.1 (FORTRAN)
  - 2006: spec. 2.5; compilers available (FORTRAN, C, C++)
  - Feb 2008: spec. 3.0 defined

OpenMP homepage: www.openmp.org
OpenMP user group: www.cOMPunity.org


OpenMP enabled compilers
  - Intel (Linux, MSWindows)
    - C++ compiler ver. 9.x: OpenMP spec. 2.5
    - FORTRAN compiler
  - Microsoft Visual Studio 2005, Visual .NET: OpenMP spec. 2.0
    (MSWindows)
  - GNU C++ ver. 4.2.0 (TBA)
  - Sun Studio ver. 11 (Solaris)
  - PGF77 and PGF90 compilers from The Portland Group, Inc. (PGI)
    (Intel Linux, Intel Solaris, Intel Windows/NT)
  - IBM XL Fortran and C (IBM AIX)
  - HP, SGI, Fujitsu


Execution of OpenMP programs

void matrix_mult (double a[N][N],
                  double b[N][N],
                  double c[N][N]){
  int i, j, s;

#pragma omp parallel for private(j, s)
  for (i = 0; i < N; i ++){
    for (j = 0; j < N; j ++){
      c[i][j] = 0.0;
      for (s = 0; s < N; s ++)
        c[i][j] += a[i][s] * b[s][j];
    }
  }
}

File: matrix-omp.c
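
As a usage sketch (not from the original slides): with GCC 4.2 or later the
example can be built and run as shown below; -fopenmp enables OpenMP support,
and OMP_NUM_THREADS selects the team size. Intel compilers of that era used
the -openmp flag instead.

gcc -fopenmp matrix-omp.c -o matrix-omp   # enable OpenMP pragmas
OMP_NUM_THREADS=4 ./matrix-omp            # request a team of 4 threads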

Coding OpenMP programs

  - Compiler directives
    - Format: #pragma omp directive [clause ...]
    - Parallel region: #pragma omp parallel [clause ...]
    - Work sharing
    - Synchronization
  - Runtime library
    - #include <omp.h>
    - Query functions
    - Lock functions
    - Dynamic adjustment of number of threads
    - Nested parallelism
  - Environment variables
    - OMP_DYNAMIC
    - OMP_NUM_THREADS
    - OMP_NESTED
    - OMP_SCHEDULE


Expressing parallelism

parallel directive
#pragma omp parallel
{
  printf("hello world\n");
}

Useful query functions
  - Getting current thread number: omp_get_thread_num()
  - Getting number of threads: omp_get_num_threads()
  - Detecting parallel region: omp_in_parallel()

Setting number of threads
  - application wide
    - OMP_NUM_THREADS environment variable
    - function omp_set_num_threads()
  - region wide
    - num_threads clause

Dynamic adjustment of the number of threads
  - function omp_set_dynamic()
  - function omp_get_dynamic()
  - OMP_DYNAMIC environment variable
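
A minimal sketch combining the query functions with both ways of setting the
team size (only the OpenMP API calls are from the slides; the program around
them is illustrative):

#include <stdio.h>
#include <omp.h>

int main (void){
  omp_set_num_threads (4);              /* application-wide request */
#pragma omp parallel num_threads (3)    /* clause overrides it for this region */
  printf ("hello from thread %d of %d (in parallel: %d)\n",
          omp_get_thread_num (), omp_get_num_threads (), omp_in_parallel ());
  return 0;
}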

Work sharing directives
  - sections directive
  - combined parallel sections directive
  - for directive
  - combined parallel for directive
  - single directive

    #pragma omp single
    {
      doSth (defaultDelay);
      single2 = omp_get_thread_num();
    }

  - master directive

    #pragma omp master
    master = omp_get_thread_num();

(single is executed by exactly one, unspecified, thread of the team;
master is executed only by the master thread.)


sections directive
Syntax

#pragma omp sections [clause]
{
  [ #pragma omp section ]
    block1
  [ #pragma omp section
    block2 ]
  [ #pragma omp section
    block3 ]
  ...
}

The pseudocode equivalent:

cobegin
  block1
  block2
  block3
  ...
coend;

sections directive
An example

#pragma omp parallel
{
#pragma omp sections
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
    {
      printf ("This code has been executed by thread %d of %d\n",
              omp_get_thread_num (), omp_get_num_threads());
    }
#pragma omp section
    {
      printf ("This code has been executed by thread %d of %d\n",
              omp_get_thread_num (), omp_get_num_threads());
    }
  }
}


sections directive
sections vs pure parallel

#pragma omp parallel
{
  printf ("This code has been executed by thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads());
}

  - sections
    - Exactly 4 messages
    - The thread numbers printed are any 4-element permutation
      with repetitions of the integers [0 ... teamSize - 1]
  - parallel
    - Exactly teamSize messages
    - The thread numbers printed are a permutation without
      repetitions of the integers [0 ... teamSize - 1]

parallel sections directive

#pragma omp parallel sections
{
  printf ("This code has been executed by thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
  printf ("This code has been executed by thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads());
#pragma omp section
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
  }
#pragma omp section
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads());
  }
}


for directive

Syntax
#pragma omp for [clause]
  for-loop

Example
int a[ARR_SIZE];
#pragma omp parallel
{
#pragma omp for
  for (i = 0; i < ARR_SIZE; i ++)
    a[i] = 1;
}

parallel for directive

Syntax
#pragma omp parallel for [clause]
  for-loop

Example
double a [N][N], b [N][N], c [N][N];
int i;
...
#pragma omp parallel for
for (i = 0; i < N; i ++){
  int j, s;
  for (j = 0; j < N; j ++){
    c[i][j] = 0.0;
    for (s = 0; s < N; s ++)
      c[i][j] += a[i][s] * b[s][j];
  }
}


parallel for directive
schedule clause

  - Default scheduling: implementation dependent
  - schedule(static, chunk_size)
    - chunks of chunk_size iterations, or of roughly equal size
      (if chunk_size is not provided)
    - assigned round-robin in the order of the thread number
  - schedule(dynamic, chunk_size)
    - chunks of chunk_size iterations, or 1 (if chunk_size is
      not provided)
    - a queue of chunks
  - schedule(guided, chunk_size)
    - chunks of decreasing size, down to chunk_size, or 1 (if
      chunk_size is not provided)
    - a queue of chunks
  - schedule(runtime)
    - scheduling according to the value of the OMP_SCHEDULE
      environment variable
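
A brief sketch of the clause in use (work(i) stands in for real
per-iteration work and is not from the slides):

int i;
...
#pragma omp parallel for schedule(dynamic, 8)
for (i = 0; i < N; i ++)
  work (i);   /* iterations are handed out to idle threads in chunks of 8 */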

Data model

Variables can be
  - Shared
  - Private
  - Subject to reduction

Defaults
  - Usually: shared
  - Private:
    - Local (not static) variables of subroutines called by a thread
    - Automatic variables within a block of thread's code
    - Control variable of a parallel for loop


Data model
Declaration

Clauses
  - shared ( list )
  - private ( list )
  - default ( shared | none )
  - firstprivate ( list )
  - lastprivate ( list )
  - reduction ( operator : list )

Thread private variables
  - threadprivate directive
  - copyin clause
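
A small sketch of firstprivate and lastprivate (the variables are
illustrative, not from the slides): x enters each thread initialized from the
original, and last leaves the loop with the value from the sequentially final
iteration.

int i, x = 10, last = 0;
#pragma omp parallel for firstprivate(x) lastprivate(last)
for (i = 0; i < 100; i ++){
  /* each thread's private copy of x starts at 10 */
  last = x + i;   /* lastprivate: the value from iteration 99 survives */
}
/* here last == 10 + 99 */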

Data model
Example (from matrix-omp.c)

void matrix_mult2(){
  double a [N][N], b [N][N], c [N][N];
  int i;
  ...
#pragma omp parallel
  {
#pragma omp for
    for (i = 0; i < N; i ++){
      int j, s;
      for (j = 0; j < N; j ++){
        c[i][j] = 0.0;
        for (s = 0; s < N; s ++)
          c[i][j] += a[i][s] * b[s][j];
      }
    }
  }
}


Data model
Example (from matrix-omp.c)

void matrix_mult3(){
  double a [N][N], b [N][N], c [N][N];
  int i, j, s;
  ...
#pragma omp parallel
  {
#pragma omp for private(j, s)
    for (i = 0; i < N; i ++){
      for (j = 0; j < N; j ++){
        c[i][j] = 0.0;
        for (s = 0; s < N; s ++)
          c[i][j] += a[i][s] * b[s][j];
      }
    }
  }
}

In matrix_mult2, j and s are automatic variables declared inside the loop
body, so they are private by default; in matrix_mult3 they are declared
outside the parallel region and must be listed in a private clause. The loop
control variable i is private in both cases.

reduction clause

Syntax: reduction ( operator : list )

List of operators: +, *, -, &, ^, |, &&, ||

Example:
#define ARR_SIZE 100000
int a [ARR_SIZE], i, sum;
...
sum = 0;
#pragma omp parallel
{
#pragma omp for reduction (+: sum)
  for (i = 0; i < ARR_SIZE; i ++)
    sum += a[i];
}


Consistency of shared data

  - Sequential consistency
    - benefits
    - drawbacks
  - Relaxed consistency
    - temporary view
    - synchronization points
      - barriers (both explicit and implicit)
      - entry to and exit from critical regions
      - entry to and exit from lock routines
  - directive flush ( list )
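
A hedged sketch of flush used for pairwise synchronization, in the style of
the examples in the OpenMP specification (variable names are illustrative; it
needs two threads actually executing, hence num_threads(2)):

int data = 0, flag = 0;
#pragma omp parallel sections num_threads (2) shared(data, flag)
{
#pragma omp section
  {                                /* producer */
    data = 42;
#pragma omp flush (flag, data)     /* make data visible ...         */
    flag = 1;
#pragma omp flush (flag)           /* ... before the flag is raised */
  }
#pragma omp section
  {                                /* consumer */
    for (;;){
#pragma omp flush (flag)           /* re-read flag from memory      */
      if (flag) break;
    }
#pragma omp flush (flag, data)     /* make the producer's data visible */
    printf ("%d\n", data);
  }
}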

Synchronization

  - Implicit synchronization
    - for directive
    - sections directive
    - single directive
    - nowait clause (removes the implied barrier)
  - Explicit synchronization (directives)
    - master
    - critical
    - atomic
    - barrier
    - ordered directive and ordered clause
  - Explicit synchronization (locks)


nowait example

#pragma omp parallel
{
#pragma omp for nowait
  for (i = 0; i < numThreads; i++)
    doSth(i * delay);

#pragma omp sections
  {
    printf("Thread #%d in sections\n", omp_get_thread_num());
#pragma omp section
    printf("Thread #%d in sections\n", omp_get_thread_num());
#pragma omp section
    printf("Thread #%d in sections\n", omp_get_thread_num());
  }
}
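
The explicit barrier directive listed above can restore the synchronization
that nowait removes; a minimal sketch (doSth and doSthElse are illustrative
helpers in the style of the example above):

#pragma omp parallel
{
  doSth (omp_get_thread_num ());
#pragma omp barrier              /* wait until every thread has finished doSth */
  doSthElse (omp_get_thread_num ());
}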

critical directive

Syntax
#pragma omp critical [ (name) ]
  block

Example
#define ARR_SIZE 100000
int a [ARR_SIZE], i, sum;
...
sum = 0;
#pragma omp parallel for
for (i = 0; i < ARR_SIZE; i ++)
#pragma omp critical
  sum += a[i];
printf ("Sum == %d\n", sum);


Named critical directive

int cs_even = 0, cs_odd = 0;
...
#pragma omp parallel for num_threads (numThreads)
for (i = 0; i < numThreads; i ++)
  if (i % 2)
#pragma omp critical (odd)
    cs_odd ++;
  else
#pragma omp critical (even)
    cs_even ++;
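
For a single scalar update such as the sum above, the atomic directive
(listed among the explicit synchronization directives) is typically cheaper
than critical; a sketch of the same loop:

sum = 0;
#pragma omp parallel for
for (i = 0; i < ARR_SIZE; i ++)
#pragma omp atomic
  sum += a[i];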

ordered example

#pragma omp parallel for ordered
for (i = 0; i < numThreads; i ++)
{
  do_sth_in_any_order();
#pragma omp ordered
  doSth(); // it will be executed in loop order
}


Remaining issues

Nested parallelism
  - function omp_set_nested ()
  - function omp_get_nested ()
  - OMP_NESTED environment variable
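
A minimal sketch of enabling nested regions (whether the inner region
actually gets extra threads is implementation dependent):

#include <stdio.h>
#include <omp.h>

int main (void){
  omp_set_nested (1);                  /* allow nested parallel regions */
#pragma omp parallel num_threads (2)   /* outer team of 2 */
  {
#pragma omp parallel num_threads (2)   /* each outer thread may start an inner team */
    printf ("inner thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads ());
  }
  return 0;
}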

Using third party libraries

