
Introduction to High Performance Computing

Problem Set #5

The assignment is built upon a PETSc example that is intended to illustrate how multicomponent systems of nonlinear elliptic PDEs can be solved implicitly, in parallel. Such systems arise
in engineering, physics, physical chemistry, biomechanics, ecology, finance, image processing, and
other endeavors.
The two-dimensional driven cavity studied herein is a celebrated test problem in computational
fluid dynamics. Though no exact solutions can be written down, highly accurate solutions have been
tabulated and may be consulted in the engineering literature to test the convergence of the many
different discretizations that have been proposed for many different formulations. The system below
arises in a tall aspect ratio in the application of thermally insulated storm windows (for buildings).
It arises in a squat aspect ratio in the application of ocean circulation.
Popular formulations of the PDEs include primitive variables (velocities only), streamfunction-vorticity (no velocities), streamfunction only (a fourth-order system from which both velocities
and the vorticity have been eliminated), and velocity-vorticity. This example employs the last
formulation. It leads to a set of four equations, each of which is led by a Laplacian in one of
the four variables. It is instructive, especially if you are a mechanical engineer or an applied
mathematician, to start with the primitive variable formulation and derive all of the others, and to
consult the literature on the pros and cons of each formulation for numerical computation. However,
such an exercise is beyond the scope of a course that emphasizes the high performance computing
aspect of the problem. It is good for an independent project, but we treat equations (1)-(4) below
as a celebrated black box gift from applied mechanics.
Note that two of the four equations are linear and two are quadratically nonlinear, and that the
equations are coupled together in interesting ways. Vorticity gradients appear as source terms for
the velocities. A temperature gradient appears as a source term for the vorticity equation. Finally,
the velocities appear as convective coefficients for gradients of both vorticity and temperature,
introducing nonlinearity. In Unit 6, Parts 2 and 3, we looked at nonlinear preconditionings of a
system closely related to this one, but this exercise uses only linear preconditioning of the Jacobian
of a full Newton approach in the original variables.
The source code, cavity_flow.c, for this problem set can be found in the ps5 directory, which
is installed inside the course project directory of Shaheen or Blue Waters. Follow the instructions
that were provided in Problem Set #4 to copy the source code directory to your working directory.
You are free to use any number of MPI processes to perform the following experiments, unless
otherwise mentioned. [Note: Do not oversubscribe a compute node by requesting more MPI ranks
than it has cores. A Shaheen or Blue Waters node has 32 compute cores, so you can allocate
at most 32 MPI processes per node; this calculation assumes that hyperthreading is disabled
on the system on which you are working.]
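The node arithmetic in the note can be sketched in a few lines of C. This is a minimal illustration, not part of the course code; the function names nodes_needed and oversubscribed are hypothetical.

```c
/* Smallest number of nodes that can host `ranks` MPI processes without
   placing more than `cores_per_node` ranks on any one node. */
int nodes_needed(int ranks, int cores_per_node)
{
    return (ranks + cores_per_node - 1) / cores_per_node;
}

/* Returns 1 if `ranks` processes on `nodes` nodes would oversubscribe
   at least one node, and 0 otherwise. */
int oversubscribed(int ranks, int nodes, int cores_per_node)
{
    return ranks > nodes * cores_per_node;
}
```

With 32 cores per node, nodes_needed(64, 32) gives 2 and nodes_needed(256, 32) gives 8.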
If you have PETSc built on your workstation, you may employ PETSc's interactive graphics to
make contour plots of the fields in this example. This interactive capability of the package does not
make sense, however, in the traditional asynchronous batched environment of the supercomputer.
1) The Lid- and Thermally-driven Cavity
cavity_flow.c approximates the solution of the following system of four elliptic partial differential
equations in two dimensions (x, y) on a uniform grid, assuming that there is a steady state to this
nonlinear system at the parameters chosen:

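\begin{align}
-\nabla^2 u - \frac{\partial \omega}{\partial y} &= 0, \tag{1}\\
-\nabla^2 v + \frac{\partial \omega}{\partial x} &= 0, \tag{2}\\
-\nabla^2 \omega + u\,\frac{\partial \omega}{\partial x} + v\,\frac{\partial \omega}{\partial y} - \mathrm{Gr}\,\frac{\partial T}{\partial x} &= 0, \tag{3}\\
-\nabla^2 T + \mathrm{Pr}\left(u\,\frac{\partial T}{\partial x} + v\,\frac{\partial T}{\partial y}\right) &= 0, \tag{4}
\end{align}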

where (u(x, y), v(x, y)) are the components of the velocity field in the (x, y) directions,
$\omega(x, y) = -\partial u/\partial y + \partial v/\partial x$ is the normal component of the
vorticity, and T(x, y) is the temperature.
These equations are subject to boundary conditions:
along the bottom (0 < x < 1, y = 0): $u = v = 0$, $\partial T/\partial y = 0$
along the top (0 < x < 1, y = 1): $u = V_{\mathrm{lid}}$, $v = 0$, $\partial T/\partial y = 0$
along the left (x = 0, 0 < y < 1): $u = v = 0$, $T = 0$
along the right (x = 1, 0 < y < 1): $u = v = 0$, $T = 1$ if $\mathrm{Gr} > 0$ or 0 otherwise
On each boundary, $\omega$ is given by its definition,
$\omega(x, y) = -\partial u/\partial y + \partial v/\partial x$, based on the normal
gradients of either u or v (the tangential gradients being zero, according to the underlying velocity
boundary conditions, which are constant along each boundary in this example).
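One way to evaluate the wall vorticity is sketched below. It assumes first-order one-sided differences between a wall node and its nearest interior neighbor; the assignment does not state which difference formula cavity_flow.c uses, and the four function names are hypothetical. On the horizontal walls only the term with the y-derivative of u survives, and on the vertical walls only the term with the x-derivative of v.

```c
/* Wall vorticity from omega = -du/dy + dv/dx, using first-order one-sided
   differences (an assumption). The tangential derivative is zero on each
   wall, so only the normal derivative contributes. */

/* Bottom wall (y = 0): omega = -du/dy; the interior node lies above. */
double omega_bottom(double u_wall, double u_in, double hy)
{
    return -(u_in - u_wall) / hy;
}

/* Top wall (y = 1): omega = -du/dy; the interior node lies below. */
double omega_top(double u_wall, double u_in, double hy)
{
    return -(u_wall - u_in) / hy;
}

/* Left wall (x = 0): omega = dv/dx; the interior node lies to the right. */
double omega_left(double v_wall, double v_in, double hx)
{
    return (v_in - v_wall) / hx;
}

/* Right wall (x = 1): omega = dv/dx; the interior node lies to the left. */
double omega_right(double v_wall, double v_in, double hx)
{
    return (v_wall - v_in) / hx;
}
```

For a lid moving at unit speed over fluid at rest, omega_top(1.0, 0.0, hy) is negative and grows in magnitude as the mesh is refined, which is one reason the lid velocity influences the initial residual.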
Play with the three dimensionless parameters switchable with the command-line flags -grashof,
-prandtl, and -lidvelocity to develop an intuition for how the problem difficulty varies with
their relative magnitudes. (Pay attention to the initial residual norms in addition to the rate at
which they are reduced.) The first two parameters are the dimensionless Grashof and Prandtl
numbers, and the lid velocity is a proxy for the Reynolds number. When the Grashof number is
high, the free convection is more important. When the Reynolds number is high, the forced
convection is more important, and when the Prandtl number is high, viscosity is more important
than thermal conductivity, independent of the magnitudes of the forcing caused by increasing the
other two numbers. (Prandtl is near unity for air. Prandtl is very large for oil and very small
for liquid metals, like mercury. Prandtl determines the relative thickness of the boundary layers
caused by the viscous and thermal forcing.)

(a) Nonlinear algebraic convergence


Using the default PETSc options for the nonlinear solver SNES and the linear solver components
KSP and PC, make a table on a fixed mesh of 256 × 256 uniformly spaced points of at least 27
triples of -grashof, -prandtl, and -lidvelocity (three values for each, varied independently), and
tabulate:
- the initial residual norm on the fine grid,
- the number of nonlinear iterations on the fine grid,
- the total execution time reported by the PETSc profiler, and
- whether the nonlinear iteration converges or diverges.

For every test case, you should investigate the convergence of the nonlinear iteration. If a test
case diverges, do the following:
- Mark that case in your table.
- Try to resolve the convergence failure using PETSc's powerful grid sequencing
  continuation method for nonlinear problems. Mesh sequencing is like a one-way form of
  multigrid. The problem is first solved on a coarse grid and the results are interpolated
  (automatically by PETSc) to the next finer level. This continues in factors of 2, say, in mesh
  spacing, until the finest mesh is reached. The finest mesh, on which iterations are expensive,
  should require only a few iterations, since it is initialized from a set of fields that were already
  solutions at the previous refinement. Make a new table for the cases that initially
  diverged, and tabulate:
  - the initial residual norm of the first CONVERGED refinement sequence,
  - the sum of all nonlinear iterations on the fine grid,
  - the maximum number of refinement steps you employed,
  - the total execution time reported by the PETSc profiler, and
  - whether the nonlinear iteration converges or diverges after grid sequencing.
Please document the set of command lines that you used to employ mesh sequencing.
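When planning a refinement sequence, it helps to know which coarse grids lead to a given fine grid. The sketch below assumes a vertex-centered grid that includes both boundaries, so that halving the mesh spacing doubles the number of intervals (points minus one). Whether the grids produced for cavity_flow.c follow exactly this rule should be checked against the grid sizes reported at run time. The function names are hypothetical.

```c
/* Points per direction after `levels` refinements of a grid with `coarse`
   points, when each refinement halves the mesh spacing: the number of
   intervals (points - 1) doubles at every level. */
long refined_points(long coarse, int levels)
{
    return ((coarse - 1) << levels) + 1;
}

/* Coarse point count that reaches exactly `fine` points after `levels`
   halvings of the spacing, or -1 if no such coarse grid exists. */
long coarse_points(long fine, int levels)
{
    long intervals = fine - 1;
    long factor = 1L << levels;
    if (intervals % factor != 0) return -1;
    return intervals / factor + 1;
}
```

Under this assumption, coarse_points(257, 4) returns 17, while coarse_points(256, 1) returns -1 because 255 intervals cannot be halved evenly. A fine grid close to, but not exactly, the target size may therefore be the practical choice when sequencing.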
You may use any parameter triples that you wish, or consider some defaults. Suggested values
for Grashof are 1, 100, and 1000. Suggested values for Prandtl are 0.1, 1, and 10. Suggested values
for the lid velocity (Reynolds) are $10^{-4}$, 0.1, and 10.
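The 27 triples are the Cartesian product of three values per parameter. A minimal sketch of the enumeration follows; the Triple type and the function name are hypothetical, and the values themselves are supplied by the caller.

```c
/* One parameter combination for the -grashof, -prandtl, and -lidvelocity
   flags. */
typedef struct {
    double grashof;
    double prandtl;
    double lidvelocity;
} Triple;

/* Fill out[0..26] with every combination of three values per parameter.
   The lid velocity varies fastest and the Grashof number slowest. */
void enumerate_triples(const double gr[3], const double pr[3],
                       const double lid[3], Triple out[27])
{
    int n = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++) {
                out[n].grashof = gr[i];
                out[n].prandtl = pr[j];
                out[n].lidvelocity = lid[k];
                n++;
            }
}
```

With this ordering, entry 13 (counting from zero) holds the combination in which all three parameters take their middle values.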
The initial condition of the velocity is zero everywhere (at rest). When the problem starts
up, there is an impulse transmitted from the top boundary for the lid velocity and from the right
boundary for the Grashof. Note that when the Reynolds number is low, the Grashof number has a
strong influence on the initial residual. When the Grashof number is low, the Reynolds number has
a strong influence on the initial residual. Does Prandtl have an influence on the initial residual?
(b) Linear algebraic convergence using domain-decomposed preconditioners
Use the middle element of your table from Part (a), which is the case where the three parameters
are all at their intermediate values, on a fixed mesh of 256 × 256 uniformly spaced points.
Use an additive Schwarz domain-decomposition preconditioner, and vary the subdomain overlap and
the ILU fill level to find the pair that gives the shortest execution time.
Experiment with at least two different distributed-memory settings for every case (e.g., 64 MPI ranks
on 2 nodes, and 256 MPI ranks on 8 nodes).
Note at least one trend you find interesting, relative to the default with one level of overlap
and no extra fill. Document the set of command lines that led you to it.
Tabulate the following five performance characteristics for each parametrically defined method:
(1) nonlinear solution time, (2) linear solution time, (3) nonlinear iterations, (4) the sum of the
linear subiterations over all nonlinear iterations, and (5) the computational rate (flops/sec).
2) Scalability Study
Use the best pair in terms of the execution time of Schwarz overlap and ILU fill to perform the
following scalability studies.
(a) Strong scalability

Perform a strong scalability study up to 4096 cores on a mesh of 4096 × 4096 points.
(b) Weak scalability
Conduct a weak scalability study by solving a 32 × 32 problem on a single core of Blue Waters or
Shaheen, then scale the problem up to the full number of cores so that the number of grid points
per process remains the same.
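The global grid size for each step of the weak scaling study follows from the per-process block size. The sketch below assumes a square global grid divided among a square arrangement of processes, so only perfect-square process counts are handled; the function names are hypothetical.

```c
/* Exact integer square root by simple search; returns -1 if p is not a
   perfect square. */
long exact_sqrt(long p)
{
    long r = 0;
    while (r * r < p) r++;
    return (r * r == p) ? r : -1;
}

/* Points per direction of a square global grid that keeps a base-by-base
   block on each of `procs` processes, or -1 if `procs` is not a perfect
   square. */
long weak_scaling_side(long base, long procs)
{
    long r = exact_sqrt(procs);
    return (r < 0) ? -1 : base * r;
}
```

For a 32 × 32 block per process, 4 processes give a 64 × 64 grid and 4096 processes give a 2048 × 2048 grid.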
(c) Questions for this section
What is the parallel efficiency at the full number of cores for the strong and weak scaling studies?
How many GFlop/s is the code performing? What fraction of the peak performance of your
allocation are you achieving? How do you explain the relatively low fraction of peak performance
that you obtain?
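The quantities asked for here can be computed from measured times and flop rates with the usual definitions, sketched below. The function names are hypothetical, and the per-core peak rate must be taken from the documentation of the machine you use.

```c
/* Strong scaling (fixed problem size): t_ref is the time on p_ref cores and
   t_p the time on p cores. Ideal scaling gives 1.0. */
double strong_efficiency(double t_ref, int p_ref, double t_p, int p)
{
    return (t_ref * p_ref) / (t_p * p);
}

/* Weak scaling (fixed work per core): ideal scaling keeps the time
   constant, which gives 1.0. */
double weak_efficiency(double t_ref, double t_p)
{
    return t_ref / t_p;
}

/* Achieved rate divided by the aggregate peak rate of the allocation. */
double fraction_of_peak(double achieved_gflops, int cores,
                        double peak_gflops_per_core)
{
    return achieved_gflops / (cores * peak_gflops_per_core);
}
```

For example, a run that takes 100 s on 1 core and 25 s on 8 cores has a strong scaling efficiency of 0.5.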
3) Coding exercise: the student-driven cavity
Use the cavity_flow.c code to augment the test suite as follows. Invent a new property; call it
S(x, y) for "Student." [Hint: S could be a form of salinity, which is also capable of setting up
a buoyant convection through a density gradient.] Study the structure of the example code and
consult the PETSc manual in order to modify this C source code as follows. Add a fifth component
to the problem, with an equation for S exactly like the fourth component, which is the energy
equation. You will need to define a new parameter like the Prandtl number (call it A) and a new
parameter like the Grashof number (call it B). Hook these parameters into the command line by
modifying the code. You should be solving the following system:

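\begin{align}
-\nabla^2 u - \frac{\partial \omega}{\partial y} &= 0, \tag{5}\\
-\nabla^2 v + \frac{\partial \omega}{\partial x} &= 0, \tag{6}\\
-\nabla^2 \omega + u\,\frac{\partial \omega}{\partial x} + v\,\frac{\partial \omega}{\partial y} - \mathrm{Gr}\,\frac{\partial T}{\partial x} + B\,\frac{\partial S}{\partial x} &= 0, \tag{7}\\
-\nabla^2 T + \mathrm{Pr}\left(u\,\frac{\partial T}{\partial x} + v\,\frac{\partial T}{\partial y}\right) &= 0, \tag{8}\\
-\nabla^2 S + A\left(u\,\frac{\partial S}{\partial x} + v\,\frac{\partial S}{\partial y}\right) &= 0 \tag{9}
\end{align}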
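As a starting point for the fifth component, the residual of the S equation at an interior grid point might look like the sketch below. It assumes a five-point Laplacian and first-order upwinding of the convective terms, with the residual scaled by hx*hy. The discretization actually used in cavity_flow.c should be read from its FormFunction and mirrored instead. The Field layout, the row-by-row storage, and the name s_residual are hypothetical.

```c
/* Five unknowns per grid point; `s` is the added scalar. */
typedef struct {
    double u, v, omega, temp, s;
} Field;

/* Residual of  -laplacian(S) + A (u dS/dx + v dS/dy) = 0  at interior node
   (i, j) of a grid with mx points per row, stored row by row, scaled by
   hx*hy. Diffusion is centered; convection is first-order upwind. */
double s_residual(const Field *x, int mx, int i, int j,
                  double hx, double hy, double A)
{
    const Field *c = &x[j * mx + i];
    double s  = c->s;
    double sw = x[j * mx + (i - 1)].s, se = x[j * mx + (i + 1)].s;
    double ss = x[(j - 1) * mx + i].s, sn = x[(j + 1) * mx + i].s;

    /* Second differences, each already multiplied by hx*hy. */
    double sxx = (2.0 * s - sw - se) * (hy / hx);
    double syy = (2.0 * s - ss - sn) * (hx / hy);

    /* Split each velocity into its positive and negative parts so that the
       difference is always taken from the upwind side. */
    double au = c->u < 0.0 ? -c->u : c->u;
    double av = c->v < 0.0 ? -c->v : c->v;
    double up = 0.5 * (c->u + au), um = 0.5 * (c->u - au);
    double vp = 0.5 * (c->v + av), vm = 0.5 * (c->v - av);

    double conv = (up * (s - sw) + um * (se - s)) * hy
                + (vp * (s - ss) + vm * (sn - s)) * hx;

    return sxx + syy + A * conv;
}
```

A profile of S that is linear in y with the fluid at rest gives a zero residual, which is a quick sanity check before coupling S back into the vorticity equation.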

Set up boundary conditions for S as follows:


along the bottom (0 < x < 1, y = 0): S = 1 (S is created/injected)
along the top (0 < x < 1, y = 1): S = -1 (S is destroyed/removed)
along the sides (x = 0, 1, 0 < y < 1): $\partial S/\partial x = 0$ (side walls are impermeable with respect to S)

Set up the initial iterate for S(x, y) to linearly interpolate these boundary conditions.
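Because the side walls carry a zero normal gradient, a profile that depends only on y and varies linearly between the bottom and top wall values satisfies all four boundary conditions. A minimal sketch follows, with the wall values passed in as arguments; the function name and the row-by-row array layout are hypothetical.

```c
/* Fill s[j*mx + i] with the linear profile in y that equals s_bottom at
   y = 0 and s_top at y = 1 on an mx-by-my grid (my >= 2). The profile does
   not depend on x, so dS/dx = 0 holds on the side walls. */
void s_initial_guess(double *s, int mx, int my, double s_bottom, double s_top)
{
    for (int j = 0; j < my; j++) {
        double y = (double)j / (double)(my - 1);
        for (int i = 0; i < mx; i++)
            s[j * mx + i] = s_bottom + (s_top - s_bottom) * y;
    }
}
```

In the actual code, the same formula would be applied inside the routine that sets the initial guess, using the global index j of each locally owned grid point.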
(a) Addition of decoupled (passive) scalar field (four-plus-one)
Choose A = 1 and B = 0, and run the simulation with: -lidvelocity 10, -prandtl 1, and
-grashof 1000.

Capture the contour plot of S using the -contours runtime option, and save it. Since the S-equation
does not couple back into any of the other equations, the other four fields should not change as you
vary A. Verify that this is true.
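One simple way to carry out this check is to compare each of the four original fields between two runs in the maximum norm. The sketch below assumes the field values from each run have been gathered into plain arrays; how they are extracted from the solution vector is left to your code, and the function name is hypothetical.

```c
/* Largest absolute pointwise difference between two arrays of length n. */
double max_abs_diff(const double *a, const double *b, int n)
{
    double m = 0.0;
    for (int k = 0; k < n; k++) {
        double d = a[k] - b[k];
        if (d < 0.0) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

Because each run stops at a finite solver tolerance, expect differences on the order of that tolerance rather than exact zeros.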
(b) A coupled five-component system
Now, put a source term like the Grashof term into the vorticity equation, using parameter B and
a gradient of S. Play with B, keeping the other parameters fixed at the center values you chose in
Part 1. Interpret your results.
(c) Code documentation
Submit a listing of your FormFunction in your homework solution report, the edited C code, and
the makefile.
