
Parallelization of a Navier-Stokes Solver for Two-phase Fluid Flows

CS240A Final Project Spring 2010


Peter Burns, Maged Ismail, Nagy Mostafa

1. Introduction
Two-phase fluid flows are encountered in a wide variety of natural processes and industrial applications, e.g. ocean waves, oil/gas processing and transport, combustion devices, and nuclear reactors. Numerical simulation of these problems is a complex and challenging task, and several approaches to simulating such flows have been developed over the last two decades. This project takes an existing level-set Navier-Stokes solver and aims to increase its speed and efficiency by parallelizing the bottleneck of the code, namely the solution of the pressure Poisson equation.

2. Problem Description
The physical problem that we investigated is the so-called "Bretherton problem", a classical problem in fluid dynamics involving a finger of a less viscous fluid displacing a more viscous fluid between two infinite parallel plates; a schematic is shown in Figure 1. It is a model problem relevant to two-phase flows in porous media, with important applications in oil recovery. The effect of the Capillary number, which represents the ratio between the viscous force and the force due to surface tension, on the evolution of the interface is shown in Figure 2.
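For reference, the Capillary number is commonly defined as follows; the exact choice of scales depends on the solver's nondimensionalization, which we do not restate here:

\[ Ca = \frac{\mu U}{\sigma} \]

where \mu is the dynamic viscosity of the more viscous (displaced) fluid, U a characteristic velocity of the advancing finger, and \sigma the surface tension coefficient.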

Figure 1. Schematic of the Bretherton problem.

3. Governing Equations
The governing equations are the incompressible Navier-Stokes equations, which express the conservation of mass and momentum and can be written in dimensionless form as:

\[ \nabla \cdot \mathbf{u} = 0 \]

Figure 2. Evolution of the interface at t = 0, 0.5, 1, 1.5, and 2 sec. for Capillary numbers Ca = 1, 0.1, and 0.01.

\[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u} = -\nabla p + \frac{1}{Re}\,\nabla^{2}\mathbf{u} + \mathbf{f} \]

where Re is the Reynolds number (Re = UL/\nu, with \nu the kinematic viscosity) and \mathbf{f} represents external forces (e.g. surface tension, gravity, etc.). The velocity field obtained from the Navier-Stokes equations is then used to advance the level-set function \phi according to the advection equation:

\[ \frac{\partial \phi}{\partial t} + \mathbf{u} \cdot \nabla \phi = 0 \]

4. Numerical Methods
The numerical solution of the governing equations is carried out using the following steps:

- Discretization using finite differences, defining the scalar and vector fields on the grid. A staggered grid (Figure 3) is used to avoid oscillations in the solution due to odd-even decoupling of the velocity and pressure.
- Operator splitting to handle the velocity-pressure coupling.
- A projection method to enforce the divergence-free constraint on the velocity field (a sketch of one projection time step is given below).
- Using the calculated velocity field to advance the level-set function, whose zero isocontour represents the interface between the two fluids.
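As a concrete illustration of the splitting and projection steps, a standard first-order-in-time variant of one time step can be written as follows (the code's actual discretization, with TVD time stepping and ENO convective terms, is higher order):

\[ \mathbf{u}^{*} = \mathbf{u}^{n} + \Delta t \left[ -(\mathbf{u}^{n} \cdot \nabla)\,\mathbf{u}^{n} + \frac{1}{Re}\,\nabla^{2}\mathbf{u}^{n} + \mathbf{f}^{n} \right] \]
\[ \nabla^{2} p^{\,n+1} = \frac{1}{\Delta t}\, \nabla \cdot \mathbf{u}^{*} \]
\[ \mathbf{u}^{n+1} = \mathbf{u}^{*} - \Delta t\, \nabla p^{\,n+1} \]

The second equation is the pressure Poisson equation; discretized on the staggered grid it yields the large sparse linear system that dominates the run time and is the target of our parallelization.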

Figure 3. Staggered grid.

5. Implementation
The existing serial code is based on a finite-difference discretization of the continuity, momentum, and level-set advection equations in two-dimensional Cartesian coordinates. It supports uniform and non-uniform spatial grids, and uses adaptive time stepping to ensure the stability of the algorithm. Time integration uses a third-order TVD (total-variation diminishing) scheme, and the convective terms are discretized with a second-order ENO (essentially non-oscillatory) scheme. The code is written in C and relies on the PETSc package for solving the resulting linear systems.
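Third-order TVD time stepping is typically the strong-stability-preserving Runge-Kutta scheme of Shu and Osher; writing L(u) for the discretized spatial operator, it reads (we did not verify that the code uses exactly these coefficients, but this is the usual scheme of that name):

\[ \mathbf{u}^{(1)} = \mathbf{u}^{n} + \Delta t\, L(\mathbf{u}^{n}) \]
\[ \mathbf{u}^{(2)} = \tfrac{3}{4}\,\mathbf{u}^{n} + \tfrac{1}{4}\,\mathbf{u}^{(1)} + \tfrac{1}{4}\,\Delta t\, L(\mathbf{u}^{(1)}) \]
\[ \mathbf{u}^{n+1} = \tfrac{1}{3}\,\mathbf{u}^{n} + \tfrac{2}{3}\,\mathbf{u}^{(2)} + \tfrac{2}{3}\,\Delta t\, L(\mathbf{u}^{(2)}) \]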

5.1 Serial Code Analysis


We successfully installed and configured PETSc on Triton, deployed the existing serial code there, and performed a time-breakdown analysis of the code. As Table 1 and Figure 4 show, our experiments confirmed that the solution of the linear system arising from the pressure Poisson equation is the main bottleneck of the code. We also observed that the fraction of the computational time spent in the pressure Poisson solver grows as the Cartesian mesh is refined, i.e. as the number of grid points increases: from about 39% of the total run time on the 120x20 grid to about 60% on the 240x40 grid and about 64% on the 480x80 grid.

problem size   pressure     u_vel     v_vel      other      total
120x20            18.65      1.98      1.91      24.921     47.461
240x40           196.96     22.77     22.86      86.63     329.22
480x80          2908.09    280.61    314.99    1072.77    4576.46

Table 1. Time breakdown of the sequential code (times in seconds).

[Stacked-bar chart: percentage of total run time spent in pressure, u_vel, v_vel, and other for the 120x20, 240x40, and 480x80 grids.]

Figure 4. Time breakdown of the sequential code.

5.2 PETSc Overview


PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines for the parallel solution of scientific applications modeled by partial differential equations. It is built on the MPI standard for communication, but keeps the MPI communication calls largely transparent to the developer, combining powerful parallel data structures and linear system solvers with ease of use through a simple, well-defined API.
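As an illustration of the API (a minimal sketch, not taken from the project code), a linear system A x = b is solved through PETSc's KSP interface roughly as follows; the KSPSetOperators signature shown is the one from the 2.3.x series used here, and later releases change some of these calls:

#include <petscksp.h>
/* ... A is an assembled parallel Mat, b and x are distributed Vecs ... */
KSP ksp;
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
KSPSetType(ksp, KSPGMRES);                /* or KSPBCGS, KSPCG, KSPCGS     */
KSPSetTolerances(ksp, 1.0e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
KSPSetFromOptions(ksp);                   /* honor -ksp_type, -pc_type ... */
KSPSolve(ksp, b, x);                      /* collective over all processes */
KSPDestroy(ksp);

The tolerance of 1.0e-8 above is only an illustrative value, not the one used in our runs.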

5.3 Parallelization Plan


The serial code proceeds by solving two linear systems for the velocity components; from their solution, the right-hand side of the pressure linear system is formed. As shown in Figure 4, solving the pressure system constitutes the majority of the work, so we focus on parallelizing the pressure solve as a first step toward parallelizing the whole program. Figure 5 shows a schematic of how the parallel program proceeds. Initially, all processors initialize the PETSc structures needed to solve for pressure. All other computations are carried out sequentially on the root processor. Once the right-hand side of the pressure system is ready on the root processor, it is broadcast to all processors and plugged into their solvers. The pressure solve then proceeds in parallel. After a solution is found, it is gathered back onto the root processor to be used in the next step. We chose this approach to keep the implementation simple given the time frame of the course project. A sketch of the per-step data flow is given below.
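The following sketch illustrates this flow for one simulation step. It assumes A, ksp, b, and x have been created as in the examples above, n is the number of pressure unknowns, and rank is this process's MPI rank; build_pressure_rhs, gather_on_root, and p_root are hypothetical names, not taken from the actual code:

PetscScalar *rhs;                               /* full RHS, built on rank 0 */
PetscMalloc(n * sizeof(PetscScalar), &rhs);
if (rank == 0)
    build_pressure_rhs(rhs);                    /* sequential work on root   */
MPI_Bcast(rhs, (int)n, MPIU_SCALAR, 0, PETSC_COMM_WORLD);

/* each rank inserts only the entries of b that it owns */
PetscInt lo, hi, i;
VecGetOwnershipRange(b, &lo, &hi);
for (i = lo; i < hi; i++)
    VecSetValue(b, i, rhs[i], INSERT_VALUES);
VecAssemblyBegin(b);
VecAssemblyEnd(b);

/* the Krylov solve runs in parallel on all ranks */
KSPSolve(ksp, b, x);

/* gather the distributed solution back onto rank 0 for the remaining
   sequential computations (custom hierarchical gather, see Section 5.5) */
PetscScalar *xloc;
VecGetArray(x, &xloc);
gather_on_root(xloc, (int)(hi - lo), (int)n, p_root, PETSC_COMM_WORLD);
VecRestoreArray(x, &xloc);
PetscFree(rhs);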

5.4 Methodology
We rely on PETSc version 2.3.3 with Hypre preconditioners for the solvers. We compiled PETSc against MPICH using the Intel C Compiler (ICC) 11.1. We ran our experiments on the Triton cluster, which features Appro gB222X Blade Server nodes with dual quad-core Intel Xeon E5530 (Nehalem) processors running at 2.40 GHz. Each node has 24 gigabytes of memory, an 8-megabyte cache, and a 10-gigabit Myrinet connection.
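One convenience of this setup is that the Krylov method and Hypre preconditioner can be selected at run time through PETSc command-line options rather than by recompiling, e.g. -ksp_type gmres (or bcgs, cg, cgs) together with -pc_type hypre -pc_hypre_type boomeramg; the boomeramg choice here is only an example, as we do not restate which Hypre preconditioner was used in our runs.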

5.5 Issues
In this section we describe two issues that we faced during this project and how we overcame them. The first was installing PETSc on Triton. We were able to download, configure, and build PETSc and OpenMPI 1.4 locally on Triton without administrator rights, but we could not get the code to run on more than 9 processors; we suspect a bug in that OpenMPI build. After several trials, we overcame this by switching to the Intel C Compiler and MPICH. The second issue is that the division of the grid among processors is not uniform, which is a problem when gathering the pressure solution with MPI_Gather, since that call assumes equal-sized contributions from every rank. We therefore wrote our own custom gather, which collects the unequal blocks in a hierarchical fashion using MPI sends and receives.
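A sketch of such a routine is shown below. This is an illustrative reimplementation (here called gather_on_root), not the project's exact code, and it assumes a real double-precision PETSc build in which PetscScalar is double. Each rank owns a contiguous block of the solution in rank order, and in each round the rank one stride above a receiver forwards everything it has accumulated, so rank 0 ends up with all blocks concatenated in global order:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* local: this rank's contiguous block (local_n entries);
   out: buffer of global_n entries, filled on rank 0 only. */
void gather_on_root(const double *local, int local_n, int global_n,
                    double *out, MPI_Comm comm)
{
    int rank, nprocs, have = local_n, step;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    /* every rank accumulates a contiguous buffer of the blocks it has seen */
    double *buf = (double *)malloc((size_t)global_n * sizeof(double));
    memcpy(buf, local, (size_t)local_n * sizeof(double));

    for (step = 1; step < nprocs; step *= 2) {
        if (rank % (2 * step) == 0 && rank + step < nprocs) {
            /* receive the partner's accumulated block and append it */
            int incoming;
            MPI_Recv(&incoming, 1, MPI_INT, rank + step, 0, comm,
                     MPI_STATUS_IGNORE);
            MPI_Recv(buf + have, incoming, MPI_DOUBLE, rank + step, 1, comm,
                     MPI_STATUS_IGNORE);
            have += incoming;
        } else if (rank % (2 * step) == step) {
            /* forward everything accumulated so far, then leave the tree */
            MPI_Send(&have, 1, MPI_INT, rank - step, 0, comm);
            MPI_Send(buf, have, MPI_DOUBLE, rank - step, 1, comm);
            break;
        }
    }
    if (rank == 0)
        memcpy(out, buf, (size_t)have * sizeof(double));
    free(buf);
}

Because PETSc assigns each rank a contiguous range of rows in rank order, appending a partner's buffer after the local one preserves the global ordering of the entries.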
/* Create a parallel (distributed) sparse matrix */
Mat A;
PetscInt start, end;
MatCreate(PETSC_COMM_WORLD, &A);
MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, size - 1, size - 1);
MatSetType(A, MATAIJ);
/* preallocate about 5 nonzeros per row in the diagonal and off-diagonal blocks */
MatMPIAIJSetPreallocation(A, 5, PETSC_NULL, 5, PETSC_NULL);
MatGetOwnershipRange(A, &start, &end);

/* Set values */
if (myrank == 0) {
    /* local entry, owned by this process */
    MatSetValue(A, 0, 0, value, ADD_VALUES);
    /* remote entry (cached locally, communicated during assembly); the last
       global index of a (size - 1) x (size - 1) matrix is size - 2 */
    MatSetValue(A, size - 2, size - 2, value, ADD_VALUES);
}

/* perform the necessary communication */
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

Table 2. PETSc example: creating and assembling a distributed sparse matrix.

Figure 5. Code parallelization diagram.

6. Results
6.1 Strong Scaling
In this section we investigate the scalability of our implementation as the number of processors is doubled from 1 up to 32. We tested a 2400x400 grid (960,000 variables) with four PETSc solvers: BiConjugate Gradient Stabilized (BCGS), Generalized Minimal Residual (GMRES), Conjugate Gradient (CG), and Conjugate Gradient Squared (CGS). Figure 6 shows the time breakdown of the different steps of the solution for each solver and processor count. One can see that, as speedup is achieved in the pressure solver, the bottleneck shifts to the computations performed in each simulation step in preparation for solving the linear systems. The trend is nearly identical for the different solvers. Figure 7 examines this further by showing the speedup of the pressure solver for each solver type. Again, the behavior is nearly identical at low processor counts; beyond 8 processors the speedup increases markedly and CGS starts to show better scalability. Figure 8 completes the picture by showing the absolute running time in seconds for every experiment. GMRES is the fastest at small scale (1-8 processors), and the execution times of all solvers converge to less than 1000 seconds at 32 processors. Finally, Figure 9 shows the overall program speedup. Naturally, the overall speedup is smaller than the pressure-solver speedup, and CGS remains the best-scaling solver. In conclusion, we find GMRES to be the best-performing solver, especially at small scale, and CGS to be the best-scaling solver.
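Throughout this section we take the speedup on p processors to be the run time on one processor divided by the run time on p processors:

\[ S_p = \frac{t_1}{t_p} \]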

[Stacked-bar chart: time breakdown for BCGS, GMRES, CG, and CGS on 1, 2, 4, 8, 16, and 32 processors.]
Figure 6. Time breakdown of the parallel code with different pressure solvers.
[Line chart: pressure-solver speedup (up to about 13x) versus number of processors (1-32) for BCGS, GMRES, CG, and CGS.]
Figure 7. Effect of the linear system solver on the speedup of the pressure linear system.

[Line chart: pressure-solver run time in seconds (up to 10,000 s) versus number of processors (1-32) for BCGS, GMRES, CG, and CGS.]
Figure 8. Total run time of the pressure solvers.


[Line chart: overall program speedup (up to about 3x) versus number of processors (1-32) for BCGS, GMRES, CG, and CGS.]
Figure 9. Effect of the linear system solver for pressure on the overall speedup.

6.2 Weak Scaling


The idea of weak-scaling analysis is to test the performance of the code while keeping the problem size proportional to the number of processors. We define our weak-scaling efficiency, \eta_p, as

\[ \eta_p = \frac{t_1}{t_p}, \qquad \text{with } \frac{N}{p} = \text{constant}, \]

where t_p is the run time on p processors and N is the problem size.

In our experiments, we kept the ratio of problem size to number of processors, N/P, constant at 50,000 unknowns per processor (i.e. from 50,000 unknowns on one processor up to 800,000 on 16) and examined the performance of the aforementioned linear system solvers. In Figure 11, we see that GMRES and CGS are the most efficient methods for solving the pressure linear systems, with efficiencies of 0.86 and 0.92 respectively on two processors, declining to about 0.25 on 16 processors. In general, the efficiency of all four methods shows a steady decline as the number of processors goes up. Figure 12 shows the total time needed to solve the pressure linear system. This figure is important because it shows that although the CGS method is the most efficient in parallel, it is also the slowest; the GMRES method, by contrast, is both the fastest of the four methods tested and has near-top efficiency. The parallel performance of each method is a balance between how much communication the method uses and the number of iterations it needs to converge. On average, the GMRES method needed 10 more iterations to converge than BCGS but still performed better in both overall time and parallel efficiency. A deeper comparison of the communication and computation costs of the two methods would require access to the algorithms as implemented in PETSc.
[Stacked-bar chart: time breakdown for BCGS, GMRES, CG, and CGS on 1, 2, 4, 8, and 16 processors, with the problem size scaled in proportion to the number of processors.]
Figure 10. Time breakdown of the parallel code with different pressure solvers for different problem sizes. The problem size scales with number of processors.

[Line chart: weak-scaling efficiency (0 to 1) versus number of processors (1-16) for BCGS, GMRES, CG, and CGS.]
Figure 11. Efficiency of the pressure solvers with different problem sizes.

[Line chart: pressure-solver run time in seconds (roughly 200-1400 s) versus number of processors (1-16) for BCGS, GMRES, CG, and CGS.]

Figure 12. Total run time of the pressure solvers with different problem sizes.

References
S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. Curfman McInnes, B. F. Smith, and H. Zhang, PETSc Users Manual, ANL-95/11 Revision 2.3.2, Argonne National Laboratory, September 2006.

F. P. Bretherton, "The motion of long bubbles in tubes," J. Fluid Mech., Vol. 10, 1961, p. 166.
