
FOAMDSMC - A DSMC SOLVER FOR RAREFIED FLOW APPLICATIONS BASED ON OPENFOAM

THOMAS HAUSER AND JEFFREY ALLEN

ABSTRACT. The prevalence of applications involving the simulation of rarefied gas flows continues to increase, and new and innovative means for their solution have become necessary. Presented herein is a new, parallel, steady/unsteady direct simulation Monte Carlo solver, foamDSMC, based on object-oriented programming practice. Its development and validation are presented, along with single- and multiple-processor performance characteristics. The validation results, for both the hypersonic corner flow and the sphere flow, show the accuracy of the solver to be comparable to commercial solvers. The foamDSMC solver was additionally applied to a sounding rocket flight, demonstrating its applicability to practical simulations. The single- and multiple-processor performance results demonstrate good scalability with increased problem size and indicate clear avenues for future improvement.

1. INTRODUCTION

A rarefied gas flow may be classified into several different flow regimes according to its level of rarefaction, as quantified by the Knudsen number (Kn). A significantly large number of flows fall within the transition regime (0.1 < Kn < 10) and lie well outside the limits of conventional, continuum-based solvers. Relevant applications within this regime include upper-atmospheric simulations such as the Space Shuttle Orbiter [1], the Magellan spacecraft [2], the Stardust Sample Return Capsule [3], and the Mars Pathfinder [4]. Additional applications outside upper-atmospheric studies include chemical vapor deposition [5], micro-filters [6], and micro-electro-mechanical systems (MEMS) [7].

Traditionally, the Boltzmann equation, based on kinetic theory, remained the only appropriate option for the solution of these high-Kn flows. The inherent difficulties associated with solutions of the Boltzmann equation, however, including the large number of independent variables required (up to seven), the modeling of the collision term (including inverse collisions), and the modeling of chemical and thermal non-equilibrium effects, have motivated more direct and simplified methods of solving these flows. The direct simulation Monte Carlo (DSMC) method of G. A. Bird is one such method, and may be regarded as a numerical solution of the Boltzmann equation in the limit of very large numbers of simulated molecules [8, 9]. Unlike the Boltzmann or Navier-Stokes equations, the method does not rely upon the discretization and solution of a set of partial differential equations combined with appropriate initial, boundary, and closure conditions. Rather, as its name implies, the method directly models the interactions of a small subset of molecules, each representing a statistically large number of real molecules. The method thus avoids many of the problems associated with the Boltzmann equation. DSMC, for example, facilitates the modeling of chemical reactions by treating the species on a particle-by-particle basis, and completely eliminates the need for inverse collisions, the latter being particularly problematic (for the Boltzmann equation) with respect to modeling recombination reactions involving ternary interactions.

Until 1975, due to the large computational costs, the method was limited to big-budget aerospace industries [10]. Since 1975, in parallel with increasing computational efficiency, the DSMC method has been used both by large agencies and by individuals running their own personal computers; Bird's open-source DS2V and DS3V programs [11] are examples applicable to the latter category. The first parallel implementations of the DSMC method were created in the late 1980s and early 1990s, using structured grids and static domain decomposition; among others, these included the works of Ota [12], Nance [13], and Matsumoto [14]. In the mid 1990s, the parallel, unstructured-grid DSMC solver MONACO was developed by Dietrich and Boyd [15]. Additional parallel implementations include the works of Wu [16] and LeBeau [17].
Although the DSMC method has evolved substantially since its early development, to include such features as unstructured grids, dynamic load balancing, and adaptive grid refinement, few parallel implementations accommodate unsteady simulations with the capabilities needed for rapidly changing flow and species properties, or allow the flexibility and maintainability that an object-oriented programming style facilitates. The authors' main objective in this research is therefore to outline the development, validation, and performance of a new parallel, steady/unsteady DSMC solver, foamDSMC. The solver incorporates an object-oriented approach along with capabilities for the solution of unsteady flow applications involving rapidly changing flow and species properties.

1.1. The Direct Simulation Monte Carlo Method. The DSMC method, as its name implies, is a Monte Carlo method in that it makes extensive use of random number generation. Its primary objective is to model a rarefied gas flow by using a large number of simulated molecules to represent real gas behavior. The idea is to track the motion and interactions of these simulated molecules such that their positions, velocities, internal energies, and chemical compositions are correctly modified over small time steps. Since the tracking and molecular interactions are conducted on a particle-by-particle basis, the conservation of mass, momentum, and energy may be enforced to machine accuracy [18]. The primary assumption upon which the DSMC method relies is that the deterministic molecular motions and the probabilistic intermolecular collisions are independent. This independence assumption is satisfied only for a sufficiently small time step, the determination of which is based on the mean collision time.

The relatively large computational costs associated with the method, particularly for three-dimensional applications, have given rise to applications for which symmetry simplifications become appropriate. Although such symmetry simplifications in physical space reduce the grid dimensions, the collision modeling is always three-dimensional and not susceptible to such simplifications. Recently, parallel implementations of the DSMC method have rendered many of these larger, three-dimensional applications tenable.

Because the DSMC method is statistically based, and depends on the simulation of thousands to millions of simulated molecules, it may be subject to significant statistical errors or fluctuations. Although it is customary to use several million simulated molecules for most standard three-dimensional applications, this is still a relatively small fraction of the number of real molecules that would physically occupy the domain. Since the continuum, macroscopic quantities are computed from either time or ensemble averages associated with the particles, a certain amount of statistical error is produced; several numerical studies have shown this error to be inversely proportional to the square root of the number of particles. An accurate DSMC simulation must therefore contain a sufficient number of particles to represent a statistically significant sample size and thus keep this potentially significant error small.

The primary steps of the DSMC method are: 1) particle initialization; 2) movement and boundary-interaction computations; 3) intermolecular collisions; 4) sampling; and 5) macroscopic variable output. The main loop executed at each time step comprises steps two and three, while steps four and five are conducted at user-defined intervals (a minimal code sketch of this loop follows below). Steady-state results are obtained by time-averaging the macroscopic quantities over sufficiently long time periods; unsteady results are usually obtained from ensemble averages over specific user-defined intervals. A more detailed treatment of the DSMC method may be found in [19].
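The five-step structure can be made concrete with a short sketch. The C++ fragment below is illustrative only: the type and function names (Particle, moveAndReflect, collide, and so on) are hypothetical stand-ins rather than foamDSMC's actual API, and the physics inside each step is elided.

```cpp
#include <vector>

// Illustrative particle state (cf. Section 1.1): position, velocity,
// internal energy, species, and owning cell.
struct Particle {
    double x[3], v[3], eInternal;
    int species, cell;
};

// Hypothetical step routines with bodies elided; in a real solver these
// hold the motion, boundary, NTC collision, and sampling physics.
void initialize(std::vector<Particle>&) {}               // step 1: particle initialization
void moveAndReflect(std::vector<Particle>&, double) {}   // step 2: motion + boundary interaction
void collide(std::vector<Particle>&, double) {}          // step 3: intermolecular collisions
void sample(const std::vector<Particle>&) {}             // step 4: accumulate cell averages
void writeMacroscopicFields() {}                         // step 5: macroscopic variable output

int main() {
    std::vector<Particle> particles;
    const double dt = 1.0e-6;  // must stay below the local mean collision time
    const int nSteps = 10000, sampleEvery = 2, outputEvery = 300;  // illustrative intervals

    initialize(particles);
    for (int step = 1; step <= nSteps; ++step) {
        moveAndReflect(particles, dt);  // deterministic motion, decoupled from ...
        collide(particles, dt);         // ... probabilistic collisions (independence assumption)
        if (step % sampleEvery == 0) sample(particles);  // statistical scatter ~ 1/sqrt(N)
        if (step % outputEvery == 0) writeMacroscopicFields();
    }
    return 0;
}
```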

2. foamDSMC ALGORITHM DEVELOPMENT

A generalized flow chart of the parallel, steady/unsteady method is shown in Figure 1. As indicated, the primary DSMC routines, including particle initialization, movement, collision, and sampling, are maintained as central elements of the algorithm and are implemented in accordance with [19]. In an effort to focus this study on object orientation, parallel development, and the unsteady implementation, not all of the steps of the method are covered. Furthermore, typical assumptions concerning boundary interactions and molecular collision models are maintained throughout this study: diffuse reflections with complete thermal accommodation, and the Variable Hard Sphere (VHS) collision model used in conjunction with Bird's no-time-counter (NTC) technique [19]. Additionally, both monatomic and polyatomic molecules may be modeled, the latter using the phenomenological inelastic collision model of Larsen and Borgnakke [20].

2.1. Object Oriented Baseline Development. The foamDSMC algorithm uses the Open Source Field Operation and Manipulation (OpenFOAM) package [21] for its baseline set of input-output (I/O) and particle-tracking routines. The OpenFOAM package consists of a vast collection of open-source, object-oriented (C++) routines applicable to the set-up and solution of a large number of serial and parallelizable CFD-related applications. Of particular interest to the present authors were the excellent pre- and post-processing functionality and the established, although limited, particle-tracking capabilities. The object-oriented programming (OOP) approach to DSMC allows for certain advantages, in terms of code maintenance, expandability, and management, over traditional, procedural DSMC developments. Class objects, such as the particle or PCloud objects defined in foamDSMC, allow simplified manipulation of underlying quantities such as a particle's position, velocity, and internal energy. The OOP approach additionally allows inheritance relationships among classes: the foamDSMC PCloud class, for example, is derived from the OpenFOAM Cloud base class (class PCloud : public Cloud), and thus inherits all of the public functionality of the Cloud class. Future extensions to foamDSMC will also greatly benefit from this inheritance feature; expanding the collision routine beyond the current Variable Hard Sphere (VHS) model [19] to the Generalized Hard Sphere (GHS) and Generalized Soft Sphere (GSS) models [19] is one such example.
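The inheritance relationship can be sketched as follows. This is a minimal, self-contained illustration of the pattern described above, assuming a simplified, non-templated Cloud stand-in; OpenFOAM's actual Cloud class is templated and considerably richer, and the member names here are hypothetical.

```cpp
#include <cstddef>
#include <list>

struct Particle {
    double position[3], velocity[3];
    double eInternal;   // internal energy (Larsen-Borgnakke exchange)
    int species, cell;
};

// Stand-in for the OpenFOAM Cloud base class: generic particle bookkeeping.
class Cloud {
public:
    void append(const Particle& p) { particles_.push_back(p); }
    std::size_t size() const { return particles_.size(); }
protected:
    std::list<Particle> particles_;  // dynamically linked storage
};

// PCloud inherits all public Cloud functionality (cf. "class PCloud :
// public Cloud") and layers the DSMC-specific operations on top.
class PCloud : public Cloud {
public:
    void move(double /*dt*/) {}     // convect particles, handle boundary interactions
    void collide(double /*dt*/) {}  // VHS/NTC collisions; GHS or GSS variants could slot in here
    void sample() {}                // accumulate per-cell macroscopic averages
};
```

A derived routine such as a GHS collision model would then override or extend collide() without touching the inherited list management, which is the maintenance benefit the text describes.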

FIGURE 1. A generalized foamDSMC steady/unsteady flowchart.

TABLE 1. Input conditions required for foamDSMC

    Input Property   Description
    MNSP             Maximum number of species (limit = 3)
    MNC              Maximum number of cells
    MNM              Maximum number of molecules
    IIS              If IIS = 0, no stream; if IIS = 1, uniform stream
    FTMP             Free-stream temperature (K)
    FND              Free-stream number density (1/m^3)
    FNUM             Ratio of the number of real molecules to the number of simulated molecules
    NIS              Number of time steps before taking a sample
    NSP              Number of samples before an output file is created
    vel1             Free-stream velocity vector, 3 Cartesian components (m/s)
    vIndex           Viscosity index for each species
    diameter         Species diameter at a given reference temperature (m)
    mass             Species mass (kg)
    Fsp              Species fraction of free-stream number density
    ISPR             Number of internal degrees of freedom for each species

foamDSMC, in conjunction with OpenFOAM, uses a dynamically linked list to keep track of all of the particles within the domain. The list is composed of PCloud particle objects and holds the position, velocity, species type, internal energy, and cell number of each simulated particle. The list is particularly advantageous for parallel applications (using domain decomposition), since the migration of particles from one processor domain to another is conducted on a particle-by-particle rather than a cell-by-cell basis; the migration is often random, and its frequency depends on the application and the number of processors. However, since the DSMC method relies on collisions of particles taking place within a given cell, a bare list is extremely inefficient for the selection of cell collision partners. The primary goal was thus to maintain the linked list for parallel applications while also enabling the sorting of particles according to cell. The solution implemented in foamDSMC was to create pointers to the particles within the list and to store these pointers, according to cell, within a Standard Template Library (STL) vector (a one-dimensional, dynamically allocatable array) whose size is based on the number of cells (see the code sketch below). The result, as expected, was a dramatic increase in efficiency in the selection of collision partners within a given cell. The vector container thus aided all of the functions within foamDSMC that rely on the particles of a given cell, including the sampling routine.

2.2. Pre-processing. As stated previously, one particular advantage of using OpenFOAM as the baseline code for foamDSMC is its well-established pre-processing capabilities. OpenFOAM is strictly an unstructured-mesh solver, and this, combined with the option of using hexahedral or tetrahedral cell elements (or their combination), greatly facilitates the modeling and solution of applications involving relatively complex geometries. The program is compatible with several formats of grid-creation software, including GAMBIT [22], CFX [23], and STAR-CD [24]. The present study exclusively uses Fluent .msh files created with GAMBIT [22], with either hexahedral cells (for the simplest test cases) or tetrahedral cells (for the more complex cases). The size of the computational domain is generally determined by the physics of the problem, and should be large enough to ensure unperturbed mean flow at the inlet boundaries [19]; this is important because particles injected at the boundaries provide an input flux appropriate to equilibrium flow conditions. The cell size, as is customary, is governed by the local mean free path, and is set (particularly important for non-adaptive grid solvers) from a conservative estimate of the local flow gradients. The DSMC method, unlike continuum solvers, does not rely upon boundary conditions for the explicit solution of the macroscopic variables. The method does, however, require the user to distinguish between surface, symmetry, and stream boundary conditions. These specifications are made at the time of mesh creation and are read into an appropriately formatted OpenFOAM file. Several of the user-input conditions are shown in Table 1; among these are the number of species, various free-stream values, the ratio of the number of real to simulated molecules, and various species-dependent properties.
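The following fragment sketches the particle bookkeeping described above: one linked list owning the particles (cheap insertion and removal as particles migrate between processor domains), plus a per-cell vector of raw pointers rebuilt as needed for collision-partner selection. The names are illustrative, not foamDSMC's actual ones.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Minimal particle record; only the owning cell matters for this sketch.
struct Particle {
    int cell;
    // position, velocity, internal energy, species ... elided
};

// Rebuild the per-cell index: the std::list owns the particles, while
// cellOccupancy[c] holds raw pointers to the particles currently in
// cell c, giving direct access for collision-partner selection.
void indexParticlesByCell(std::list<Particle>& particles,
                          std::vector<std::vector<Particle*>>& cellOccupancy,
                          std::size_t nCells)
{
    cellOccupancy.assign(nCells, {});            // one (initially empty) slot per cell
    for (Particle& p : particles)
        cellOccupancy[p.cell].push_back(&p);     // store a pointer, not a copy
}
```

The NTC collision routine can then draw random candidate pairs directly from cellOccupancy[c] for each cell c, and the sampling routine can traverse the same per-cell arrays.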
2.3. Parallel Implementation. At the pre-processing stage, the mesh and associated fields are decomposed. The primary goal is to partition the domain with minimal effort while still providing an economical solution [21]. OpenFOAM is equipped with several geometric decomposition options, including the directional simple and hierarchical methods, METIS, and manual decomposition [21]. All subsequent validation and user-level applications within this study were conducted using the directional simple method, wherein the domain is partitioned along a user-specified direction.
Once the domain is decomposed, each processor executes the core DSMC routines in serial for all particles and cells within its subdomain; parallel communication occurs only when particles cross inter-processor boundaries. As stated previously, each processor maintains a dynamically linked list of the molecules local to its subdomain, and this list is adjusted as molecules enter or leave. Each particle is moved to its new position according to its velocity and the time step. Parallel communication of particles (along with their respective properties) is conducted via standard send and receive functions of the Message Passing Interface (MPI); a sketch of such an exchange is given at the end of this section. In addition to parallelism through domain decomposition, unsteady applications using ensemble averaging admit a second level of parallelism: since ensemble averaging combines the solutions of independent realizations, these realizations may be performed on different parallel machines and later combined to render appropriately averaged unsteady results. This combination makes unsteady applications particularly attractive, since it exploits both embarrassingly parallel and domain-decomposition methods. Upon completion of a steady or unsteady parallel run, the results are recombined onto a single processor and made available to the several post-processing utilities with which OpenFOAM is compatible, including VTK [25], ParaView [26], Fluent [22], Fieldview [27], and EnSight [28].

2.4. Unsteady Implementation. As stated previously, unsteady DSMC applications are typically carried out using ensemble rather than time averaging. Although both techniques require more computer time and resources than their steady-state counterparts, the primary disadvantage of time averaging is the excessive number of simulated molecules per cell that it requires; the DS3V algorithm [11], for example, implements time averaging and requires upwards of 500 molecules per cell, which necessarily limits the domain size of most unsteady simulations, particularly three-dimensional applications. Ensemble averaging, in contrast, may be carried out with cell populations comparable to those of steady-state simulations (10-30 molecules per cell). The drawback, however, is the increased time associated with carrying out numerous ensembles for each time interval; implementing the dual levels of parallelism suggested above can significantly reduce these runtimes. Ensemble averaging is conducted over a user-specified number of independent realizations. These ensembles may be repeated for each time step (in the case of highly transient flows) or, for more relaxed unsteady conditions, after several time steps have elapsed. Depending on the duration and complexity of the flow simulation, the computational processing and storage requirements for unsteady flows are clearly much greater than those for steady flows. Unsteady flows often require time-varying species and flow properties as input, and the foamDSMC solver is unique among unsteady parallel DSMC solvers in this regard. Changing flow and species properties, such as flow concentrations, particle velocities, species fractions, and temperatures, influence the overall flow field through the incoming molecules.
The foamDSMC solver incorporates these changing properties within the particle injection function, and allows the user to specify them, prior to running a simulation, within an input file.
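As an illustration of the particle-migration step in Section 2.3, the fragment below exchanges packed particle payloads with a neighbouring rank. It is a sketch under stated assumptions: the Particle layout, the all-double packing, and the use of MPI_Sendrecv (a combined, deadlock-avoiding form of the standard send/receive calls) are illustrative choices, not foamDSMC's actual implementation.

```cpp
#include <mpi.h>
#include <vector>

// Packed particle record: 8 doubles, so a whole array of records can be
// shipped as MPI_DOUBLE (assumes no padding; all members are doubles).
// The species index is stored as a double purely for uniform packing.
struct Particle {
    double x[3], v[3];
    double eInternal;
    double species;
};
constexpr int kDoublesPerParticle = 8;

// Exchange migrating particles with one neighbouring rank.
void exchangeParticles(const std::vector<Particle>& leaving, int neighbour,
                       std::vector<Particle>& arriving)
{
    int nOut = static_cast<int>(leaving.size());
    int nIn = 0;

    // Swap counts first so each side can size its receive buffer.
    MPI_Sendrecv(&nOut, 1, MPI_INT, neighbour, 0,
                 &nIn,  1, MPI_INT, neighbour, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    arriving.resize(nIn);
    MPI_Sendrecv(leaving.data(),  nOut * kDoublesPerParticle, MPI_DOUBLE, neighbour, 1,
                 arriving.data(), nIn  * kDoublesPerParticle, MPI_DOUBLE, neighbour, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```

Received particles would then be appended to the local linked list, and departed ones removed, before the next move/collide cycle.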

3. VALIDATION CASES

3.1. Steady Hypersonic Corner Flow. The hypersonic corner flow case has become a standard against which several parallel implementations of DSMC have been validated [17, 19]. The case involves two flat plates oriented perpendicular to each other and running parallel to the free stream. The computational domain, shown in Figure 2, consists of a parallelepiped that extends 0.1 m along the x direction and 0.06 m in both the y and z directions. The domain is composed of 10 x 6 x 6 uniform hexahedral cells, each with side length 0.01 m. The free-stream conditions consist of a number density of 1.0E20 m^-3 and a velocity of (1936, 0, 0) m/s. The plate wall temperatures were set to 1000 K and were modeled as diffusely reflecting with complete thermal accommodation. A fixed time step of 1.3E-6 seconds was used, and steady-state conditions were obtained after approximately 0.1 seconds. The ratio of real to simulated molecules was 1.2E13, resulting in an average of approximately 3600 molecules, or 10 molecules per cell. A multi-species, polyatomic gas composed of N2, O2, and O was used, and the phenomenological inelastic collision model of Larsen and Borgnakke [20] was implemented. The specific gas properties, taken at standard conditions (101.3 kPa, 0 °C), are given in Table 2 and were obtained from Appendix A of [19]. The approximate value of the mean free path was calculated as 0.0068 m, resulting in a Knudsen number of 0.113. The foamDSMC algorithm was validated, both in serial and in parallel (using up to eight processors), against Bird's three-dimensional, structured algorithm [19]. Parallel decompositions were performed in accordance with the simple method of OpenFOAM [21], wherein the domain was uniformly partitioned according to the number of processors specified by the user.
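As a consistency check on the quoted rarefaction level, taking the 0.06 m transverse plate extent as the characteristic length (an assumption, chosen because it reproduces the stated value):

```latex
\mathrm{Kn} \;=\; \frac{\lambda}{L} \;=\; \frac{0.0068\ \mathrm{m}}{0.06\ \mathrm{m}} \;\approx\; 0.113,
```

which places the flow in the transition regime (0.1 < Kn < 10) identified in the Introduction.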



FIGURE 2. Hypersonic corner flow geometry composed of 360 hexahedral elements, and an 8-processor parallel decomposition.

TABLE 2. foamDSMC input properties of N2, O2, and O at standard conditions.

    Gas   DOF   Mol. mass (x1E-27 kg)   Viscosity index   Diameter (x1E-10 m)   Species fraction
    N2    2     46.5                    0.74              4.17                  0.777
    O2    2     53.12                   0.77              4.07                  0.184
    O     0     26.58                   0.75              3.0                   0.0391

FIGURE 3. Hypersonic sphere flow tetrahedral mesh along the z = 0.04 m plane, and a 32-processor parallel domain decomposition.

The results of Figure ?? show contours of number density, temperature, and velocity magnitude on the x = 0.05 m and z = 0.03 m planes, together with line plots comparing the commercial DS3V solver [11] with the foamDSMC solver. As indicated, the agreement between the DS3V and foamDSMC solvers, in serial and in parallel (using up to 8 processors), is very good. Note that at present the foamDSMC solver does not implement surface sampling, and thus results for surface pressure and shear stress were not available.

3.2. Steady Hypersonic Sphere Flow. A second steady-state validation test of the foamDSMC solver was conducted on a hypersonic sphere flow application, using Bird's three-dimensional, commercial DSMC solver, DS3V [11]. The case consists of a 0.02 m diameter sphere centered in the y-z plane and located just aft of center in the x direction (see Figure 3). The computational domain is a parallelepiped extending 0.1 m in the x direction and 0.08 m in both the y and z directions. A multi-species, polyatomic gas composed of N2, O2, and O was used, with free-stream flow conditions and species properties identical to those of the corner flow application. A mean free path (λ) of 0.0068 m results in a Knudsen number (based on the sphere diameter) of 0.34. The domain is composed of 9,200 unstructured tetrahedral cells (see Figure 3), with average cell widths ranging from approximately 0.33λ (adjacent to the sphere surface) to λ. The sphere wall temperature was set to 1000 K and modeled as diffuse with complete thermal accommodation. A time step of 6.5E-7 seconds was used, and steady-state conditions were obtained after approximately 0.01 seconds. The ratio of real to simulated molecules (FNUM) was 1.0E12, resulting in approximately 64,000 simulated molecules. The results are shown in Figure 4: contour comparisons of DS3V and foamDSMC for number density, temperature, and velocity magnitude on the z = 0.04 m plane, along with line plots along the stagnation streamline located at y = 0.04 m, z = 0.04 m, 0 m <= x <= 0.05 m. As indicated, the agreement, using up to 32 processors, is very good.
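The quoted molecule count is consistent with the free-stream loading of the domain; taking the free-stream number density as representative:

```latex
N_{\mathrm{sim}} \;=\; \frac{n_{\infty} V}{\mathrm{FNUM}}
\;=\; \frac{10^{20}\,\mathrm{m^{-3}} \times (0.1 \times 0.08 \times 0.08)\,\mathrm{m^{3}}}{10^{12}}
\;=\; 6.4 \times 10^{4},
```

matching the approximately 64,000 simulated molecules reported above; likewise Kn = 0.0068 m / 0.02 m = 0.34 for the sphere diameter.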


FIGURE 4. Hypersonic sphere flow simulation of a multi-species, polyatomic gas composed of N2, O2, and O at z = 0.04 m. Shown are contours of number density, temperature, and velocity magnitude comparing the commercial DS3V and foamDSMC algorithms. Also shown are comparison plots (using up to 32 processors) of these respective quantities along the stagnation streamline at y = 0.04 m, z = 0.04 m, 0 m <= x <= 0.05 m.

3.3. Unsteady Sphere Flow. The foamDSMC solver was also validated with respect to unsteady applications. The following unsteady validation case was performed on the hypersonic sphere flow application (initially at rest, but with the remaining flow conditions as prescribed above), and was compared with the unsteady DS3V solver. DS3V was initialized with a uniform distribution of approximately 3.0E6 argon molecules, which resulted in approximately 500 molecules per cell. An initial time step (Δt) of 1.0E-7 seconds was used, with sampling conducted every 2Δt. The foamDSMC algorithm was initialized with approximately 3.18E5 argon molecules, or an average of 34 molecules per cell; for each 2Δt sampling interval, 35 independent ensembles were computed. Figure 5 shows number density contours from DS3V and foamDSMC at t = 5.0E-7, t = 1.1E-6, and t = 2.1E-6 seconds. As indicated, the wake quickly becomes more elongated behind the trailing edge, until it completely fills the entire aft shadow of the sphere; this unsteady wake phenomenon is a result of entrainment by the surrounding high-speed flow. The leading-edge number density reveals the evolving shock layer, whose thickness continues to increase until the near steady-state thickness is achieved at t = 2.1E-6 seconds. Figure 5 also shows comparison plots of number density from foamDSMC and DS3V along the stagnation line at y = 0.04 m, z = 0.04 m, 0 m <= x <= 0.05 m, at the three noted times. The foamDSMC results were computed using four processors and show good agreement with the DS3V algorithm.

4. APPLICATION OF foamDSMC TO THE CODA II SOUNDING ROCKET FLIGHT

Subsequent to its successful validation, the foamDSMC solver was applied to the CODA II sounding rocket flight. The mission details of the flight are excluded here for length, but may be found in [29, 30]. In brief summary, the launch was conducted in order to investigate the atomic oxygen (AO) concentration within the Mesosphere and Lower Thermosphere (MALT). Past investigations [31, 32] have revealed that substantial external influences, primarily aerodynamic, serve to inhibit the accurate measurement of AO with in-situ measurement sensors. Numerical simulation via DSMC of the compressible flow surrounding the rocket at various up-leg

and down-leg points in its trajectory has served to significantly reduce these effects. The following is a brief summary of the setup and results obtained using the foamDSMC solver.

FIGURE 5. Unsteady contour plots of number density comparing foamDSMC and Bird's DS3V algorithm using argon gas, at t = 5.0E-7 s, t = 1.1E-6 s, and t = 2.1E-6 s. Also shown are comparison plots of the commercial DS3V and foamDSMC algorithms along the stagnation streamline at y = 0.04 m, z = 0.04 m, 0 m <= x <= 0.05 m.

The foamDSMC steady-state solver was applied to 25 different altitudes separated by 2 km intervals. Each altitude was solved for both the up-leg and the down-leg trajectory, resulting in a total of 49 simulations (the apogee case is common to both legs). Several different grids were required, each with cell concentrations appropriate to the local mean free path; as a general rule, a different grid was required for each 10 km interval. The number of cells ranged from over 1.1E6 for the lower, 90 km altitudes to approximately 3.0E4 for the apogee and near-apogee cases. The number of simulated molecules also varied in accordance with the number of cells: in order to maintain a statistically significant sample size and reduce potential statistical fluctuation, the average number of molecules per cell was kept no lower than ten, so the number of simulated molecules from 90 km to apogee ranged from approximately 11.0E6 to 3.0E5. The macroscopic variables, consisting of number density, temperature, and velocity, were in all cases sampled every two time steps, while output files were written every 300 time steps. The simulations were all conducted in parallel on Utah State University's Uinta cluster supercomputer; the number of processors varied from 4 to 16, resulting in average steady-state run times of between 6 and 12 hours respectively. Additional input conditions may be found in [jeffs diss]. Figure 6 shows contours of number density and velocity magnitude for various up-leg and down-leg locations of the CODA II trajectory, along with the representative grids used for the collision domains and macroscopic variable sampling. As indicated, the grid refinement decreases dramatically with increasing altitude, due to the growth of the mean free path (ranging from nearly 2 cm to more than 10 m).

5. BENCHMARKING RESULTS

The following benchmarking results for foamDSMC were obtained on the Uinta supercomputer.

FIGURE 6. foamDSMC results of number density and velocity contours at specified locations along the up-leg and down-leg CODA II trajectory. Also shown are the changing angle of attack and the coarsening of the grid/cell concentration with increasing altitude.

This Linux Networx cluster was installed in September 2005 at the Center for High Performance Computing at Utah State University (HPC@USU) and consists of one server node, two interactive login nodes, and 62 compute nodes. Each compute node has two dual-core AMD Opteron 265 processors and 4 GBytes of main memory. The cluster has three networks: a GBit switched Ethernet, a Flat Neighborhood Network built from GBit Ethernet, and a Myrinet interconnect. Myrinet's one-way and two-way (summed bidirectional) data rates are 1.98 Gb/s (248 MBytes/s) and 3.92 Gb/s (490 MBytes/s), respectively. The Scientific Linux distribution was used for the operating system.

5.1. Single Processor Performance. To evaluate the single-processor performance of foamDSMC, the PerfSuite command-line utility psrun [33] was used. The specific results presented here are for the single 110 km, up-leg CODA II case, composed of 5.03E4 cells and initialized with 2.71E5 molecules. All profiles were computed over 50 time steps, each of duration 1.0E-5 seconds, giving an approximate wall time of 415.9 seconds. Table 3 shows the functions within foamDSMC that consume the largest percentage of the runtime. Clearly the majority of the time, 55.01% and 30.97%, is spent finding the nearest cell locations and identifying cell faces, respectively; a significantly smaller proportion of the total time, 3.90% and 1.58%, is spent in the respective movement and tracking of particles. The remaining functions (runtime < 1.5%) are not shown. Table 4 shows the foamDSMC functions with the largest percentage of L2 data cache misses. As indicated, findFaces suffers the majority of the misses, with 17.40% of the total, followed by the 10.65% and 10.23% miss percentages of the sample and inject functions, respectively; the move function, also included, shows a 7.87% L2 miss percentage. Combining Tables 3 and 4, we see that findFaces both takes up a considerable amount of the runtime and suffers a large number of L2 data cache misses; this function thus clearly represents an ideal candidate for possible performance improvement. The move function also appears in both tables and is therefore also worth considering. Table 5 provides an estimate of the total L1 and L2 data cache misses as a function of problem size, as quantified by the number of simulated molecules. As shown, the L1 miss count is fairly constant, changing by a mere 3.7% when the number of molecules is doubled; in contrast, the L2 miss count increases by more than 51%.

5.2. Parallel Performance. Figure 7 shows the parallel speedup and efficiency results for the two validation cases and three CODA II applications. Specifically, the CODA II results correspond to the steady-state 110 km up-leg and apogee cases, as well as the unsteady apogee case. As indicated, the best results are obtained for the 110 km up-leg case, with its relatively large problem size of 5.4E4 cells and 4.5E5 molecules; the 110 km results further show super-linear scaling for fewer than 16 processors. The steady corner results exhibit the worst parallel performance, attributable to the use of only 360 cells and 3.6E3 molecules.

TABLE 3. Profile of foamDSMC functions sorted by time

    Time (s)   Time %   Function          Description
    220.93     55.01    findNearestCell   Find nearest cell location
    124.38     30.97    findFaces         Identify cell faces
    15.66      3.90     move              Move particles
    6.34       1.58     trackToFace       Track particle to nearest face location

TABLE 4. L2 data cache misses

    L2 misses   % of total   Function    Description
    1,131,874   17.40        findFaces   Identify cell faces
    692,439     10.65        sample      Sample the cell for macroscopic quantities
    665,049     10.23        inject      Inject particles into domain
    511,749     7.87         move        Move particles

TABLE 5. Total number of L1 and L2 data cache misses with increased problem size

    L1 misses   L2 misses   Number of molecules
    5.3E9       0.29E9      2.72E5
    5.5E9       0.49E9      5.44E5
FIGURE 7. Parallel speedup and efficiencies of foamDSMC applied to the corner and sphere flow validation cases, as well as selected CODA II applications. [Plotted against 2 to 32 processors, with ideal scaling for reference: Steady Corner (360 cells, 3.6E3 molecules); Steady Sphere (9.2E3 cells, 6.4E4 molecules); Steady CODA 110 km Up (5.03E4 cells, 4.54E5 molecules); Steady CODA Apogee (1.75E4 cells, 2.44E5 molecules); Unsteady CODA Apogee (1.75E4 cells, 3.0E5 molecules).]

The parallel efficiency plot, defined as the ratio of speedup to the number of processors, further illustrates these findings, showing super-linear scaling with efficiencies greater than unity. Clearly, the scalability of the application depends on the problem size. Finally, Figure 8 shows the scaled speedup for the 110 km up-leg CODA II case. The scaling was based on a linear, 1:1 ratio of problem size (as quantified by the number of molecules) to the number of processors. As indicated, the results show a maximum scaled speedup of approximately 1.9 using 4 processors, sloping downward thereafter to approximately 1.1 using 16 processors.

6. CONCLUSIONS

The development, validation, and initial performance evaluation of foamDSMC, an object-oriented, parallel, steady/unsteady DSMC solver, were shown to be successful. The validation results for the hypersonic corner and sphere flows showed the solver to be comparable in accuracy to existing commercial codes. Furthermore, the solver demonstrated credible results when applied to practical applications, including the CODA II flight profile. Serial benchmarking revealed that possible performance-related improvements are confined to certain specific functions of the solver. Parallel benchmarking revealed that the solver's scalability depends on problem size, and showed super-linear scaling effects for certain sizable applications on lower numbers of processors. Although utilitarian, the solver still requires several future modifications for increased capability, including: 1) the development of surface sampling routines; 2) an adaptive grid capability; 3) accommodation of incompressible flows, with applications at the micro and nano scales; and 4) hybridization of the solver to accommodate rarefied as well as continuum-based flows.

FIGURE 8. Scaled speedup of foamDSMC applied to the 110 km up-leg, CODA II case.

ACKNOWLEDGMENTS

The authors would like to acknowledge the Space Dynamics Laboratory's enabling technologies program, and the developers of OpenFOAM. Computer time from the Center for High Performance Computing at Utah State University is gratefully acknowledged. The computational resource, the Uinta cluster supercomputer, was provided through the National Science Foundation under Grant No. CTS-0321170, with matching funds provided by Utah State University.

REFERENCES

[1] Rault, D., "Aerodynamics of the Shuttle Orbiter at High Altitudes," Journal of Spacecraft and Rockets, Vol. 31, No. 6, 1994, pp. 944-952.
[2] Haas, B. L. and Schmitt, D. A., "Simulated Rarefied Aerodynamics of the Magellan Spacecraft during Aerobraking," Journal of Spacecraft and Rockets, Vol. 31, No. 6, 1994, pp. 980-985.
[3] Wilmoth, R., Mitcheltree, R., and Moss, J., "Low-Density Aerodynamics of the Stardust Sample Return Capsule," AIAA Paper 97-2510, 1997.
[4] Moss, J., Blanchard, R., Wilmoth, R., and Braun, R., "Mars Pathfinder Rarefied Aerodynamics: Computations and Measurements," Journal of Spacecraft and Rockets, Vol. 36, No. 3, 1999, pp. 330-339.
[5] Plimpton, S. and Bartel, T., "Parallel particle simulation of low-density fluid flows," US Department of Energy Report No. DE94-007858, 1993.
[6] Yang, X. and Yang, J., "Micromachined membrane particle filters," Sensors and Actuators, 1999, pp. 184-191.
[7] Piekos, E. and Breuer, K., "Numerical modeling of micromechanical devices using the direct simulation Monte Carlo method," Journal of Fluids Engineering, Vol. 118, 1996, pp. 464-469.
[8] Nanbu, K., "Theoretical basis of the direct simulation Monte Carlo method," Journal of the Physical Society of Japan, 1982.
[9] Wagner, W., "A convergence proof for Bird's direct simulation Monte Carlo method for the Boltzmann equation," Journal of Statistical Physics, Vol. 66, No. 3/4, 1992, pp. 1011-1044.
[10] Bird, G., "Recent Advances and Current Challenges for DSMC," Computers and Mathematics with Applications, Vol. 35, No. 1, 1998, pp. 1-14.
[11] Bird, G., The DS3V Program User's Guide, Version 1.1, Killara, New South Wales, Australia, 2003.
[12] Ota, M., Taniguchi, H., and Aritomi, M., "Parallel processing for the direct simulation Monte Carlo method," Japan Society of Mechanical Engineering, Vol. 61, pp. 496-502.
[13] Nance, R., Wilmoth, R., Moon, B., Hassan, H., and Saltz, J., "Parallel solution of three-dimensional flow over a finite flat plate," AIAA Paper 94-0219, 1994.
[14] Matsumoto, Y. and Tokumasu, T., "Parallel computing of diatomic molecular rarefied gas flows," Parallel Computing, Vol. 23, pp. 1249-1260.
[15] Dietrich, S. and Boyd, I., "Scalar and parallel optimized implementation of the direct simulation Monte Carlo method," Journal of Computational Physics, Vol. 126, 1996, pp. 328-342.
[16] Wu, J. and Lian, Y., "Parallel three-dimensional direct simulation Monte Carlo method and its applications," Computers and Fluids, Vol. 32, 2003, pp. 1133-1160.
[17] LeBeau, G., "A parallel implementation of the direct simulation Monte Carlo method," Computer Methods in Applied Mechanics and Engineering, Vol. 174, 1999, pp. 319-337.
[18] Oran, E., Oh, C., and Cybyk, B., "Direct Simulation Monte Carlo: Recent Advances and Applications," Annual Review of Fluid Mechanics, Vol. 30, 1998, pp. 403-441.
[19] Bird, G., Molecular Gas Dynamics and the Direct Simulation of Gas Flows, Oxford University Press, Sydney, 1994.
[20] Borgnakke, C. and Larsen, P., "Statistical collision model for Monte Carlo simulation of polyatomic gas mixture," Journal of Computational Physics, Vol. 18, No. 4, 1975, pp. 405-420.
[21] OpenFOAM, The Mews, Picketts Lodge, Surrey RH1 5RG, UK, 2006.
[22] Fluent 6.1 User's Guide, Lebanon, NH, 1998.
[23] ANSYS CFX V5.7 User's Manual, Canonsburg, PA, 2004.
[24] STAR-CD V3.20, Tustin, CA, 2006.
[25] VTK 4.4 User's Guide, Clifton Park, NY, 2004.
[26] ParaView Guide, Clifton Park, NY, 2004.
[27] FIELDVIEW 8.0 User's Guide, Lyndhurst, NJ, 2001.
[28] EnSight 7.6 User Manual, Apex, NC, 2003.


[29] Allen, J. and Hauser, T., "Aerodynamic Influences on Atomic Oxygen Sensors from Sounding Rockets," 35th AIAA Fluid Dynamics Conference and Exhibit, AIAA, Toronto, Canada, June 2005.
[30] Allen, J. and Hauser, T., "Unsteady DSMC Simulations of the Aerodynamics of Sounding Rockets," 44th AIAA Aerospace Sciences Meeting and Exhibit, AIAA, Reno, Nevada, Jan. 2006.
[31] Patterson, P., "In Situ Measurements of Upper Atmospheric Atomic Oxygen: The ATOX Resonant Fluorescence/Absorption Sensor," Ph.D. thesis, Utah State University, 2005.
[32] Patterson, P., Swenson, C., Clemmons, J., Christensen, A., and Gregory, J., "Atomic oxygen erosion observations in a diffuse aurora," EOS Trans. AGU, Fall Meet. Suppl., Abstract SA21A-02, AGU, San Francisco, CA, Dec. 2003.
[33] Kufrin, R., "PerfSuite: An Accessible Open Source Performance Analysis Environment for Linux," 6th Annual International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, NC, April 2005.
