Homework # 5

Serhat Can

Task 1
For this task we were required to parallelize the given Jacobi method and compare its
results (computation time, speed-up, etc.) with those of the serial version. First, the
theoretical speed-up is presented.
Theoretical speed-up according to Amdahl is defined as follows:

$$S(p) = \frac{T_s + T_p}{T_s + T_p / p}$$

where $T_s$ is the serial and $T_p$ is the parallelizable part of the code, and $p$ denotes the
number of processors. From the given formula, the theoretical speed-up can be calculated.
The graph below represents this theoretical formula:

Fig. 1 Theoretical speed-up vs. number of processors (times used: 803.275198 s and 803.230844 s)
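The theoretical curve can be reproduced with a few lines of Python. The following is a minimal sketch of Amdahl's formula with the serial time and the parallelizable time as inputs; the function name `amdahl_speedup` and the timings in the demo loop are hypothetical and are not the measured values of this report.

```python
def amdahl_speedup(t_serial, t_parallel, p):
    """Theoretical speed-up S(p) = (T_s + T_p) / (T_s + T_p / p)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return (t_serial + t_parallel) / (t_serial + t_parallel / p)


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    t_s, t_p = 1.0, 9.0
    for p in (1, 2, 4, 8, 16):
        print(p, round(amdahl_speedup(t_s, t_p, p), 3))
```

However many processors are used, the speed-up stays below (T_s + T_p) / T_s, which is why the curve flattens as p grows.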

As required in the task, matrix A and vector b are obtained from the method.
Unfortunately, I do not have a properly running code. Nevertheless, my intent was to
distribute the vector x by rows, compute the partial sums of the matrix-vector multiplication
on each processor, and then gather the partial sums on root processor 0, which then checks
whether the tolerance is met.

Depending on this check, the code either continues to iterate or exits from jacobi_solve().
My implementation is able to scatter the partial x vectors among the processors correctly.
The partial sums are calculated, but gathering these sums fails, and every calculation after
this point is wrong. After the partial sums are summed and the current row value of xNew is
found, the code checks the tolerance. But since the summation has a wrong value, the
tolerance check is also wrong. After the tolerance check, this Boolean value (either 1 or 0)
should be scattered from the root processor to the other processors, but this fails for reasons
that I could not determine. I also tried to broadcast the tolerance value, but that did not work
either. If all these steps ran correctly and the tolerance was not met, the x value would be
scattered to the other processors again.
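The intended scheme (row distribution, gathering on the root, tolerance check, redistribution of x) can be sketched serially in Python. This is a minimal sketch and not the attached implementation: the processors are simulated by a loop, list concatenation stands in for the gather, and the names `split_rows`, `jacobi_rows` and `jacobi_solve_sim`, the max-norm stopping test and the 4x4 test system are assumptions made for illustration.

```python
def split_rows(n, nprocs):
    """Return one (start, stop) row range per simulated processor."""
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for rank in range(nprocs):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def jacobi_rows(A, b, x, start, stop):
    """Jacobi update for rows start..stop-1 (the work of one processor)."""
    x_part = []
    for i in range(start, stop):
        # Sum over the off-diagonal entries of row i.
        s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
        x_part.append((b[i] - s) / A[i][i])
    return x_part


def jacobi_solve_sim(A, b, nprocs, tol=1e-10, max_iter=1000):
    """Simulated row-distributed Jacobi; returns (x, iterations used)."""
    n = len(b)
    x = [0.0] * n
    ranges = split_rows(n, nprocs)
    for iteration in range(1, max_iter + 1):
        x_new = []
        for start, stop in ranges:
            # Each simulated processor computes its block of rows;
            # extending the list plays the role of the gather on the root.
            x_new.extend(jacobi_rows(A, b, x, start, stop))
        # Tolerance check on the root (max-norm of the update, an assumption).
        converged = max(abs(u - v) for u, v in zip(x_new, x)) < tol
        x = x_new  # stands in for sending the new x back to every processor
        if converged:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical diagonally dominant system with solution [1, 2, 3, 4].
    A = [[10, 1, 0, 2], [1, 12, 3, 0], [0, 2, 9, 1], [1, 0, 2, 8]]
    b = [20, 34, 35, 39]
    print(jacobi_solve_sim(A, b, nprocs=2))
```

In this sketch the convergence flag is a single value that all blocks see; in a real message-passing code every processor would need to receive the same flag before deciding whether to leave the loop.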
The implemented code can be found in the attachment of the e-mail.

Task 2
For this task we were required to parallelize a 2-D surface heat problem by domain
decomposition.
Again, I do not have a running code. Nevertheless, my intent was to divide the
domain into sub-regions with equal areas. To do that, I made the following assumptions:
the number of processors is always a power of 2, and the domain is a square whose side
length is exactly divisible by the number of processors (no remainder). These assumptions
are important because they allow the equally sized sub-regions to be generated automatically
by a formulaic for-loop in the solving method. The formulation is as follows:
$k = \log_2 p$

If $k$ is exactly divisible by 2, do the following: divide both the rows and the columns into
$\sqrt{p}$ equal parts.

If the remainder of dividing $k$ by 2 is 1, do the following: divide the rows into
$\sqrt{p/2}$ equal parts and divide the columns into $(\sqrt{p/2}) \cdot 2$ equal parts.
The logic behind this formulation is to divide first the columns and then the rows, alternately.
For example, if the number of processors is 8, the rows are divided by 2 and the columns
by 4, giving 8 equally sized sub-regions. If it were 16, both rows and columns would be
divided into 4 equal parts, resulting in 16 equally sized sub-regions.
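The splitting rule can be checked with a short Python sketch. It assumes, as stated above, that the number of processors is a power of 2; the function name `grid_shape` is hypothetical, and the sketch uses integer bit operations instead of a square root so that both part counts are integers by construction.

```python
def grid_shape(p):
    """Return (row_parts, col_parts) for p processors, p a power of two."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    k = p.bit_length() - 1          # k = log2(p), exact for powers of two
    row_parts = 1 << (k // 2)       # rows get the smaller half of the splits
    col_parts = 1 << (k - k // 2)   # columns get the extra split when k is odd
    return row_parts, col_parts


if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16):
        print(p, grid_shape(p))
```

For 8 processors this gives 2 row parts and 4 column parts, and for 16 processors 4 of each, matching the examples in the text.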
After this formulation, the for-loops are set up accordingly. Each sub-region is
labelled with a processor number, and each processor's loops jump directly to its own
region by means of that number.
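One way to let the loops jump to the right region is to compute the index bounds directly from the processor number. The sketch below assumes row-major numbering of the sub-regions and a square n x n domain; the function name `subregion` and its parameters are hypothetical, and the numbering in the attached code may differ.

```python
def subregion(rank, n, row_parts, col_parts):
    """Return (row_start, row_stop, col_start, col_stop) for one processor.

    Assumes row-major numbering of the sub-regions and an n x n domain
    whose side is divisible by both part counts.
    """
    if n % row_parts or n % col_parts:
        raise ValueError("n must be divisible by both part counts")
    if not 0 <= rank < row_parts * col_parts:
        raise ValueError("rank out of range")
    block_row, block_col = divmod(rank, col_parts)
    height = n // row_parts
    width = n // col_parts
    return (block_row * height, (block_row + 1) * height,
            block_col * width, (block_col + 1) * width)


if __name__ == "__main__":
    # Hypothetical 16 x 16 domain on 8 processors (2 row parts, 4 column parts).
    for rank in range(8):
        print(rank, subregion(rank, 16, 2, 4))
```

Each processor would then loop only over `range(row_start, row_stop)` and `range(col_start, col_stop)`.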
After decomposing the domain comes the most important part: data exchange among the
regions. Unfortunately, I could not finish this before the deadline, but my intent was to use
simple send and receive functions, since only neighbouring regions exchange data with each
other. After all regions complete their partial solutions, the partial solutions are gathered on
the root processor, which then returns the iteration number and prints one frame. The whole
process is then repeated until the maximum iteration number is reached.
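Before any send or receive can be issued, each region has to know which processors own its neighbouring regions. The following Python sketch computes them under the assumption of row-major numbering of the sub-regions; the function name `neighbours` and the use of `None` for a missing neighbour at the domain boundary are choices made for illustration only.

```python
def neighbours(rank, row_parts, col_parts):
    """Return the neighbouring processor numbers of a sub-region.

    Assumes row-major numbering; None marks a side that lies on the
    physical boundary of the domain, where no exchange takes place.
    """
    if not 0 <= rank < row_parts * col_parts:
        raise ValueError("rank out of range")
    block_row, block_col = divmod(rank, col_parts)
    return {
        "north": rank - col_parts if block_row > 0 else None,
        "south": rank + col_parts if block_row < row_parts - 1 else None,
        "west": rank - 1 if block_col > 0 else None,
        "east": rank + 1 if block_col < col_parts - 1 else None,
    }


if __name__ == "__main__":
    # Hypothetical layout: 8 processors as 2 row parts and 4 column parts.
    for rank in range(8):
        print(rank, neighbours(rank, 2, 4))
```

A side marked `None` is where the physical boundary condition would be applied instead of exchanged data.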
There were also a couple of errors during compilation. For example, the index value
for the x vector has type double in my formulation, because I divide the sub-regions and the
corresponding vector parts with the help of the square root function, which returns a
double. Due to the time limit, I could not find a way to obtain an int result from the
square root function.
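The same issue can be illustrated in Python, where a floating-point square root also cannot be used as an index. The sketch below uses the standard-library function `math.isqrt`, which returns an integer; the wrapper name `int_sqrt_exact` is hypothetical, and the check assumes that the argument is expected to be a perfect square.

```python
import math


def int_sqrt_exact(n):
    """Return the integer square root of n, requiring n to be a perfect square."""
    root = math.isqrt(n)  # integer result, no floating-point rounding
    if root * root != n:
        raise ValueError("n is not a perfect square")
    return root


if __name__ == "__main__":
    parts = int_sqrt_exact(16)  # hypothetical processor count
    x = list(range(32))
    print(parts, x[parts])      # an int can be used directly as an index
```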
Also, the incomplete parts (such as the boundary conditions, which are determined by a
ternary conditional assignment through communication between processors) cause compilation
errors as well.
The implemented code can be found in the attachment of the e-mail.