
Performance analysis tools applied to a finite adaptive mesh free boundary seepage parallel algorithm

S. Boeriu (1) and J.C. Bruch, Jr. (2)
(1) Center for Computational Science and Engineering
(2) Department of Mechanical and Environmental Engineering and Department of Mathematics
University of California, Santa Barbara
http://www.engineering.ucsb.edu/~hpscicom

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant #0086262. This research was supported in part by NSF cooperative agreement ACI-9619020 through computing resources provided by the National Partnership for Advanced Computational Infrastructure at the San Diego Supercomputer Center.
http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html

Outline of Presentation
1. Introduction (Physical problem)
2. Problem formulation
3. Fixed domain formulation
4. Numerical algorithm
5. Test case
6. Performance tools and considerations
a. VAMPIR
b. PARAVER
7. Diagnostic example
8. Conclusions
Physical problem

Figure 1. Seepage through a rectangular dam.

Simplifying assumptions
i. The soil in the flowfield is homogeneous and
isotropic

ii. Capillary and evaporation effects are neglected

iii. The flow obeys Darcy’s Law

iv. The flow is two-dimensional

v. The flow is steady state
Mathematical formulation
Darcy's Law: $q = -K\,\mathrm{grad}\,h = -K\,\mathrm{grad}\left(\dfrac{p}{\rho g} + y\right)$
Potential Function: $\phi = K\left(\dfrac{p}{\rho g} + y\right)$
Velocity Components: $u = -\phi_x$, $v = -\phi_y$
Continuity Equation: $u_x + v_y = 0$
Irrotationality Condition: $u_y - v_x = 0$
Cauchy-Riemann Equations: $\phi_x = \psi_y$, $\phi_y = -\psi_x$
Laplace's Equations: $\nabla^2 \phi = 0$, $\nabla^2 \psi = 0$

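The step from the continuity and Cauchy-Riemann relations to Laplace's equations can be written out explicitly. The following is a short derivation sketch, not part of the original slides. It assumes the sign convention $u = -\phi_x$, $v = -\phi_y$ (the opposite sign leads to the same result) and that $\phi$ and $\psi$ are smooth enough for mixed partial derivatives to commute.

```latex
\begin{align*}
0 &= u_x + v_y = -(\phi_{xx} + \phi_{yy})
  &&\Longrightarrow\quad \nabla^2 \phi = 0, \\
0 &= (\phi_x)_y - (\phi_y)_x = \psi_{yy} + \psi_{xx}
  &&\Longrightarrow\quad \nabla^2 \psi = 0 .
\end{align*}
```

The second line uses $\phi_x = \psi_y$ and $\phi_y = -\psi_x$ together with $\phi_{xy} = \phi_{yx}$.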
Problem formulation

Figure 2. Mathematical formulation of physical problem.


Extension of solution domain
The solution domain  is extended to the known
region D  ( x, y) : 0  x  xF ,0  y   ( x), xF  x  xC ,0  y  yF 
' ' '

Then extend  continuously to be defined on D


1 1

by setting
 ( x, y ) in 
 ( x, y )  { y in D  

This yields

 y   x  (  D ) y in D

in the sense of distributions where

 D  1 in D   and  D  0 in 

Fixed domain formulation

Figure 3. Fixed domain mathematical formulation.
Numerical Algorithm
A minimization problem can be formulated in terms of the functional

$$J(w) = a(w, w) - 2(f, w), \quad w \in K,$$

where $a$ is a bilinear form, continuous, symmetric, and positive definite on $\mathbb{R}^N$, and $f \in \mathbb{R}^N$, i.e.,

$$a(w, w) = \iint_D \nabla w \cdot \nabla w \; dx\,dy, \qquad (f, w) = \iint_D f\, w \; dx\,dy .$$

The functional $J$ has one and only one minimum on a closed convex set. The minimum is found using the following algorithm:

$$w_i^{(n+1/2)} = -\frac{1}{a_{ii}} \left( \sum_{j=1}^{i-1} a_{ij}\, w_j^{(n+1)} + \sum_{j=i+1}^{N} a_{ij}\, w_j^{(n)} - f_i \right)$$

$$w_i^{(n+1)} = P_i\!\left( w_i^{(n)} + \omega \left( w_i^{(n+1/2)} - w_i^{(n)} \right) \right) = \max\!\left( 0,\; w_i^{(n)} + \omega \left( w_i^{(n+1/2)} - w_i^{(n)} \right) \right)$$

where $a_{ij} = a(N_i, N_j)$, $f_i = (f, N_i)$, $N_i$ is the canonical basis of $\mathbb{R}^N$, $P_i$ is the projection on the convex set, $i = 1, \ldots, N$, and $N$ is the number of nodes.
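The iteration above is a successive over-relaxation (SOR) sweep followed by a projection. A minimal serial sketch in Python follows. It is an illustration under stated assumptions, not the authors' parallel code: the unknown is called `w`, the convex set is taken to be `w >= 0` (matching the `max(0, ...)` form), the matrix is stored densely, and the sweep stops when the largest nodal change falls below `tol`. The defaults 1.85 and 0.0001 are the relaxation factor and stopping criterion listed for the test case. The function name `projected_sor` is hypothetical.

```python
import numpy as np


def projected_sor(A, f, omega=1.85, tol=1e-4, max_iter=10_000):
    """Projected SOR sketch: minimize a(w, w) - 2(f, w) subject to w >= 0.

    A is assumed symmetric positive definite with entries a_ij, f holds f_i.
    Returns the iterate and the number of sweeps performed.
    """
    A = np.asarray(A, dtype=float)
    f = np.asarray(f, dtype=float)
    n = f.size
    w = np.zeros(n)
    for sweep in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(n):
            # Entries j < i already hold iterate n+1, entries j > i hold iterate n.
            sigma = A[i, :i] @ w[:i] + A[i, i + 1:] @ w[i + 1:]
            w_half = (f[i] - sigma) / A[i, i]
            # Over-relax, then project onto the convex set w_i >= 0.
            w_new = max(0.0, w[i] + omega * (w_half - w[i]))
            largest_change = max(largest_change, abs(w_new - w[i]))
            w[i] = w_new
        if largest_change < tol:
            return w, sweep
    return w, max_iter


if __name__ == "__main__":
    # Tiny 2-node example in which the constraint is active at the second node.
    w, sweeps = projected_sor([[2.0, -1.0], [-1.0, 2.0]], [1.0, -3.0], tol=1e-10)
    print(w, sweeps)
```

In the two-node example the unconstrained solution would be negative at both nodes, so the projection holds the second node at zero and the first node settles at 0.5. A production code would use a sparse matrix and a stopping test chosen to match the application.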
Finite Element Error Analysis
Adaptive Mesh Finite Element Analysis (FEA)

General Equation for FEA:

$K u = f$

Error Analysis
Error Definition:

where $q^{*}$ is the approximation of the exact solution $q$;
$\hat{q}$ is the calculated $q$ of an element (constant);
$N$ is the shape function;
and $S = \left[ \dfrac{\partial}{\partial x} \;\; \dfrac{\partial}{\partial y} \right]^{T}$

Averaging Technique:

Error Estimate in an Element:

$e_q \approx q^{*} - \hat{q} = \hat{e}_q$, where $q^{*} = N \bar{q}^{*}$

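Error Norm of the Whole Computation Domain:

$$\|\hat{e}_q\|_{L_2} = \left[ \int_R (\hat{e}_q)^{T} (\hat{e}_q) \, dR \right]^{1/2}$$

$$\|\hat{e}_q\|_{L_2}^{2} = \sum_{i=1}^{N_e} \|\hat{e}_q\|_{i}^{2}$$

Percentage Error:

$$\eta = \frac{\|\hat{e}_q\|}{\|q\|} \times 100\%, \quad \text{where } \|q\|^{2} = \int_R q^{2} \, dR = \sum_{i=1}^{N_e} \|q\|_{i}^{2}$$
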
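Once the element norms are available, the global norm and the percentage error reduce to simple arithmetic. The sketch below illustrates this in Python. It assumes the per-element norms have already been computed by the finite element code; the function names `global_norm` and `percentage_error` are hypothetical and do not come from the original program.

```python
import math


def global_norm(element_norms):
    """Combine element L2 norms: ||.||^2 = sum over elements of ||.||_i^2."""
    return math.sqrt(sum(value * value for value in element_norms))


def percentage_error(error_norms, flux_norms):
    """Percentage error: 100 * ||e_q|| / ||q||, both taken over the whole domain."""
    return 100.0 * global_norm(error_norms) / global_norm(flux_norms)


if __name__ == "__main__":
    # Made-up element values, chosen only so the arithmetic is easy to follow.
    print(percentage_error([0.3, 0.4], [3.0, 4.0]))
```

With the made-up values shown, the error norm is 0.5 and the flux norm is 5, which gives a percentage error of 10.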
Local Mesh Refinement
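Desired Criteria:

$\eta \le \eta_{\max}$, where $\eta_{\max}$ is the desired error

Desired Local Error Criteria:

$$\|\hat{e}_q\|_{i} \le \eta_{\max} \left( \frac{\|q\|^{2}}{N_e} \right)^{1/2} = e_{\max}$$

where $e_{\max}$ is the maximum allowable element error

Error Ratio:

$$\xi_i = \frac{\|\hat{e}_q\|_{i}}{e_{\max}}, \qquad \xi_i > 1: \text{ refine the element}$$

New Element Size:

$$(A_i)_{\text{new}} = \frac{A_i}{\xi_i}$$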
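The refinement rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: `eta_max` is passed as a fraction (0.05 for 5 %), elements whose error ratio does not exceed 1 keep their current area, and the function name `refinement_plan` is hypothetical.

```python
import math


def refinement_plan(error_norms, flux_norm, areas, eta_max):
    """Return (e_max, error ratios, new element areas).

    error_norms : per-element error norms ||e_q||_i
    flux_norm   : global norm ||q||
    areas       : current element areas A_i
    eta_max     : desired relative error as a fraction (assumption)
    """
    n_elements = len(error_norms)
    # Maximum allowable element error: eta_max * sqrt(||q||^2 / Ne).
    e_max = eta_max * math.sqrt(flux_norm ** 2 / n_elements)
    ratios = [error / e_max for error in error_norms]
    # Only elements with ratio > 1 are refined; the others are left unchanged.
    new_areas = [area / ratio if ratio > 1.0 else area
                 for area, ratio in zip(areas, ratios)]
    return e_max, ratios, new_areas


if __name__ == "__main__":
    e_max, ratios, new_areas = refinement_plan(
        error_norms=[0.05, 0.2, 0.1, 0.4],
        flux_norm=2.0,
        areas=[8.0, 8.0, 8.0, 8.0],
        eta_max=0.1,
    )
    print(e_max, ratios, new_areas)
```

In the example only the second and fourth elements exceed the allowable error, so only their areas shrink.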
Mesh Refinement

Test case
$x_1 = 40$
$y_1 = 10$
$y_2 = 3$
$\omega = 1.85$
Stopping error criterion $= 0.0001$

Results

Figure 4. Domain decomposition for Pass 4 of Case 1.
Figure 5. Speedup for Case 1.
Performance tools and
considerations
The parallel program is monitored while
it is executed. Monitoring produces
performance data that is interpreted in
order to reveal areas of poor performance.
The program is then altered and the
process is repeated until an acceptable
level of performance is reached.

VAMPIR (Visualization and Analysis of MPI Resources – 2.0)
• VAMPIR 2.0 is a post-mortem trace visualization tool from Pallas GmbH
http://www.pallas.com

• It uses the profiling extensions to MPI and permits analysis of the message events in which data is transmitted between processors during execution of a parallel program. It has a convenient user interface and excellent zooming and filtering capabilities. Global displays show all selected processes.
• Global Timeline: detailed application execution over the time axis
• Activity Chart: presents per-process profiling information
• Summaric Chart: aggregated profiling information
• Communication Statistics: message statistics for each process pair
• Global Communication Statistics: collective operations statistics
• I/O Statistics: MPI I/O operation statistics
• Calling Tree: global dynamic calling tree
PARAVER (Parallel Program Visualization and Analysis Tool)
• PARAVER is a flexible parallel program visualization and analysis tool based on an easy-to-use Motif GUI (graphical user interface).

• PARAVER was developed to respond to the basic need to have a qualitative perception of the application behavior by visual inspection and then to be able to focus on the detailed quantitative analysis of the problems.

• Developed by: European Center for Parallelism of Barcelona (CEPBA), Universitat Politecnica de Catalunya
http://www.cepba.upc.es/

• Paraver is designed to visualize and analyze
  - Communication and load balance
  - Combining OpenMP and MPI
  - Hardware performance and counters

• Usage
  - Compile programs with special libraries
  - Run programs to produce trace files
  - View and analyze traces
  - Designed to help in program understanding and optimization
Inefficient programming example
• Load imbalance (inefficient memory use)
• TLB (translation lookaside buffer) misses

Figure 6. Stage 1 – Processor 0 – Mesh Map
Figure 7. Stage 1 – Processor 3 – Mesh Map
Figure 8. Stage 1 – VAMPIR – Activity Chart
Figure 9. Stage 1 – PARAVER – Global Display
Figure 10. Stage 4 – VAMPIR – Activity Chart
Figure 11. Stage 4 – VAMPIR – Display Chart
Table 8. TLB misses.

Stage   Proc. 0   Proc. 3
1       9,464     7,870
4       12,210    208,341

Figure 12. Stage 4 – Processor 0 – Mesh Map
Figure 13. Stage 4 – Processor 3 – Mesh Map
Table 9. Stage 4 timing of the SOR module.

Processor   Time spent in SOR
0           0.3671
1           0.4068
2           0.6940
3           0.8393

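The values in Tables 8 and 9 can be condensed into a few ratios that make the imbalance easier to see. The short Python sketch below does this arithmetic. The numbers are copied from the two tables; the choice of metrics (max/mean and max/min of the SOR times, and the ratio of TLB misses between processors 3 and 0) and the helper names `imbalance` and `tlb_ratio` are illustrative assumptions and do not come from the original analysis.

```python
# Values copied from Table 8 (TLB misses) and Table 9 (time spent in SOR, Stage 4).
tlb_misses = {1: {0: 9_464, 3: 7_870}, 4: {0: 12_210, 3: 208_341}}
sor_time = {0: 0.3671, 1: 0.4068, 2: 0.6940, 3: 0.8393}


def imbalance(times):
    """Return (max / mean, max / min) of the per-processor times."""
    values = list(times.values())
    mean = sum(values) / len(values)
    return max(values) / mean, max(values) / min(values)


def tlb_ratio(stage, numerator=3, denominator=0):
    """Ratio of TLB misses between two processors at one stage."""
    return tlb_misses[stage][numerator] / tlb_misses[stage][denominator]


max_over_mean, max_over_min = imbalance(sor_time)
print(f"SOR time, max/mean: {max_over_mean:.2f}, max/min: {max_over_min:.2f}")
print(f"TLB misses, proc 3 / proc 0: stage 1 {tlb_ratio(1):.2f}, stage 4 {tlb_ratio(4):.2f}")
```

For these values the slowest processor spends a little more than twice as long in SOR as the fastest one, and at Stage 4 processor 3 records roughly 17 times as many TLB misses as processor 0.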
Figure 14. Stage 4 – VAMPIR – Activity Chart
Figure 15. Stage 4 – VAMPIR – Display Chart
Figure 16. Stage 4 – PARAVER – Global Display
Conclusions
A significant factor that affects the performance of a
parallel application is the balance between communication
and workload. The challenge of the message passing
model is in reducing message traffic over the
interconnection network. To fully understand the
performance behavior of such applications, analysis and
visualization tools are needed. Two such tools, VAMPIR
and PARAVER, were used to analyze the performance of
the seepage application. It was seen that optimization of
the parallel code can be carried out in an iterative process
involving these tools to investigate performance issues.

Web Sites
• Project site
http://www.engineering.ucsb.edu/~hpscicom
• San Diego Supercomputer Center
http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html
• VAMPIR
http://www.pallas.com
• PARAVER
http://www.cepba.upc.es/