Journal: Computational Mechanics
Manuscript ID: CM-03-0003
Manuscript Type: Original Paper
Date Submitted by the Author: 29-Nov-2003
Keywords: finite element, Galerkin, variational method
PARALLEL IMPLEMENTATION OF THE EFG METHOD FOR HEAT TRANSFER AND FLUID
FLOW PROBLEMS
I. V. Singh
Mechanical Engineering Group
Birla Institute of Technology and Science
Pilani, 333 031, Rajasthan, India
E-mail: iv_singh@hotmail.com, ivsingh@bits-pilani.ac.in
ABSTRACT
The parallel implementation of the element free Galerkin (EFG) method for heat transfer and fluid flow problems on a MIMD-type parallel computer is presented. A new parallel algorithm is proposed in which parallelization is performed by row-wise data distribution among the processors. The codes have been developed in FORTRAN using the MPI message passing library. Two model problems (heat transfer and fluid flow) have been solved to validate the proposed algorithm, and the total time, communication time, user time, speedup and efficiency have been estimated for each. For 8 processors, the speedup and efficiency obtained are 6.86 and 85.81% respectively for the heat transfer problem with data size N = 1229, and 7.20 and 90.00% respectively for the fluid flow problem with data size N = 1462.
Keywords: meshless method; EFG method; parallel computing; heat transfer; fluid flow
1 INTRODUCTION
In the last two decades, meshless methods have been developed as an effective tool for solving boundary value problems. The essential feature of these methods is that they require only a set of nodes to construct the interpolation functions. In contrast to the conventional finite element method, they avoid the tedious task of mesh generation, since no elements are required anywhere in the model. Furthermore, re-meshing becomes easier because nodes can be freely added or removed in the analysis domain. A large variety of meshless methods have been developed so far, including: smooth particle hydrodynamics (SPH) [1], the diffuse element method (DEM) [2], the element free Galerkin (EFG) method [3], the reproducing kernel particle method (RKPM) [4], the partition of unity method (PUM) [5], the H-p cloud method [6], the free mesh method (FMM) [7], the natural element method (NEM) [8], the local boundary integral equation (LBIE) method [9], the meshless local Petrov-Galerkin (MLPG) method [10], the method of finite spheres [11], the local radial point interpolation method (LRPIM) [12] and the regular hybrid boundary node method (RHBNM) [13]. It has been observed that the results obtained by most of these meshless methods are competitive with the FEM in different areas of engineering.
The main barrier to the wide adoption of these meshless methods is their high computational cost. To reduce it, a few researchers have parallelized the free mesh method (FMM) [14], smooth particle hydrodynamics (SPH) [15] and the partition of unity method (PUM) [16]. Continuing this line of work, a parallel algorithm is proposed here to reduce the computational cost of the EFG method. The parallel code has been written in FORTRAN using the MPI message passing library and executed on a PARAM 10000 supercomputer. The code has been validated by solving two model problems, and the total time, communication time, user time, speedup and efficiency have been calculated for the heat transfer and fluid flow problems.
2 REVIEW OF THE EFG METHOD
The discretization of the governing equations by the EFG method requires moving least square (MLS) approximants, which are made up of three components: a weight function associated with each node, a basis function, and a set of non-constant coefficients. Using the MLS approximation, the unknown function $T(\mathbf{x})$ or $u(\mathbf{x})$ is approximated by $T^h(\mathbf{x})$ or $u^h(\mathbf{x})$ over the solution domain [3, 16] as

$$ T^h(\mathbf{x}) = \sum_{I=1}^{n} \Phi_I(\mathbf{x})\, T_I \qquad (1a) $$

or

$$ u^h(\mathbf{x}) = \sum_{I=1}^{n} \Phi_I(\mathbf{x})\, u_I \qquad (1b) $$

where $\Phi_I(\mathbf{x})$ is the shape function and $T_I$ (or $u_I$) is the nodal parameter.
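For orientation, the standard MLS construction behind Eq. (1) can be sketched as follows (a textbook summary of [3]; the linear basis shown is only an assumed example, since the basis actually used is not restated in this section):

$$ \Phi_I(\mathbf{x}) = \mathbf{p}^{T}(\mathbf{x})\,\mathbf{A}^{-1}(\mathbf{x})\,\mathbf{B}_I(\mathbf{x}), \qquad \mathbf{A}(\mathbf{x}) = \sum_{I=1}^{n} w(\mathbf{x}-\mathbf{x}_I)\,\mathbf{p}(\mathbf{x}_I)\,\mathbf{p}^{T}(\mathbf{x}_I), \qquad \mathbf{B}_I(\mathbf{x}) = w(\mathbf{x}-\mathbf{x}_I)\,\mathbf{p}(\mathbf{x}_I) $$

with, for example, the linear basis $\mathbf{p}^{T}(\mathbf{x}) = [1 \;\; x \;\; y]$ in two dimensions and $w(\mathbf{x}-\mathbf{x}_I)$ the nodal weight function defined in Eq. (2) below.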
In the present analysis, the cubic spline weight function [16] is used:

$$ w(\mathbf{x}-\mathbf{x}_I) = w(r) = \begin{cases} \dfrac{2}{3} - 4r^2 + 4r^3 & r \le \dfrac{1}{2} \\[4pt] \dfrac{4}{3} - 4r + 4r^2 - \dfrac{4}{3}r^3 & \dfrac{1}{2} < r \le 1 \\[4pt] 0 & r > 1 \end{cases} \qquad (2) $$

where $r = \dfrac{\|\mathbf{x}-\mathbf{x}_I\|}{d_{mI}}$, $d_{mI} = d_{\max}\, c_I$, and $d_{\max}$ is a scaling parameter.
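Purely as an illustration, a minimal Fortran function evaluating Eq. (2) for the normalised distance r might look as follows (the function name is hypothetical and not taken from the author's code):

   pure function cubic_spline_weight(r) result(w)
      ! Cubic spline weight of Eq. (2); r = ||x - x_I|| / d_mI
      implicit none
      double precision, intent(in) :: r
      double precision :: w
      if (r <= 0.5d0) then
         w = 2.0d0/3.0d0 - 4.0d0*r*r + 4.0d0*r**3
      else if (r <= 1.0d0) then
         w = 4.0d0/3.0d0 - 4.0d0*r + 4.0d0*r*r - (4.0d0/3.0d0)*r**3
      else
         w = 0.0d0
      end if
   end function cubic_spline_weight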
2.1 The EFG for Heat Transfer Problems (Example-I)
The governing equation for two-dimensional steady-state heat transfer in an isotropic material is given as

$$ k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right) + Q = 0 \qquad (3a) $$
The boundary conditions are given as:

$$ T = T_e \quad \text{at edge } \Gamma_1 \qquad (3b) $$

$$ k\,\frac{\partial T}{\partial y} = 0 \quad \text{at edge } \Gamma_2 \qquad (3c) $$

$$ -k\,\frac{\partial T}{\partial x} = h\,(T - T_\infty) \quad \text{at edge } \Gamma_3 \qquad (3d) $$

$$ -k\,\frac{\partial T}{\partial y} = h\,(T - T_\infty) \quad \text{at edge } \Gamma_4 \qquad (3e) $$
Enforcing the essential boundary conditions by the Lagrange multiplier method, the following set of linear equations is obtained using Eq. (1a):

$$ \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{T} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{R} \\ \mathbf{q} \end{Bmatrix} \qquad (4a) $$

where

$$ K_{IJ} = \int_{\Omega} \begin{bmatrix} \Phi_{I,x} & \Phi_{I,y} \end{bmatrix} \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix} \begin{bmatrix} \Phi_{J,x} \\ \Phi_{J,y} \end{bmatrix} d\Omega + \int_{\Gamma_3} h\,\Phi_I\,\Phi_J\, d\Gamma + \int_{\Gamma_4} h\,\Phi_I\,\Phi_J\, d\Gamma \qquad (4b) $$

$$ R_I = \int_{\Omega} \Phi_I\, Q\, d\Omega + \int_{\Gamma_3} h\,T_\infty\,\Phi_I\, d\Gamma + \int_{\Gamma_4} h\,T_\infty\,\Phi_I\, d\Gamma \qquad (4c) $$

$$ G_{IK} = \int_{\Gamma_1} \Phi_I\, N_K\, d\Gamma \qquad (4d) $$

$$ q_K = \int_{\Gamma_1} N_K\, T_e\, d\Gamma \qquad (4e) $$
Eq. (4a) can be further written as

$$ [\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\} \qquad (5a) $$

where

$$ [\mathbf{A}] = \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix}_{N \times N} \qquad (5b) $$

$$ \{\mathbf{U}\} = \begin{Bmatrix} \mathbf{T} \\ \boldsymbol{\lambda} \end{Bmatrix}_{N \times 1} \qquad (5c) $$

$$ \{\mathbf{F}\} = \begin{Bmatrix} \mathbf{R} \\ \mathbf{q} \end{Bmatrix}_{N \times 1} \qquad (5d) $$
2.2 The EFG for Fluid Flow Problems (Example-II)
The momentum equation for a viscous incompressible fluid flowing through a long, uniform duct is given as
$$ \mu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - \frac{\partial p}{\partial z} = 0 \qquad (6a) $$
The essential boundary conditions are

$$ u = u_{S1} = 0 \quad \text{at the surface } \Gamma_1 \qquad (6b) $$

$$ u = u_{S2} = 0 \quad \text{at the surface } \Gamma_2 \qquad (6c) $$

$$ u = u_{S3} = 0 \quad \text{at the surface } \Gamma_3 \qquad (6d) $$

$$ u = u_{S4} = 0 \quad \text{at the surface } \Gamma_4 \qquad (6e) $$
Enforcing the essential boundary conditions by the Lagrange multiplier method, the following set of linear equations is obtained using Eq. (1b):

$$ \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix} \qquad (7a) $$

where

$$ K_{IJ} = \int_{\Omega} \begin{bmatrix} \Phi_{I,x} & \Phi_{I,y} \end{bmatrix} \begin{bmatrix} \mu & 0 \\ 0 & \mu \end{bmatrix} \begin{bmatrix} \Phi_{J,x} \\ \Phi_{J,y} \end{bmatrix} d\Omega \qquad (7b) $$

$$ f_I = \int_{\Omega} \Phi_I\, M\, d\Omega \qquad (7c) $$

$$ G_{IK} = \int_{\Gamma_1} \Phi_I\, N_K\, d\Gamma + \int_{\Gamma_2} \Phi_I\, N_K\, d\Gamma + \int_{\Gamma_3} \Phi_I\, N_K\, d\Gamma + \int_{\Gamma_4} \Phi_I\, N_K\, d\Gamma \qquad (7d) $$

$$ q_K = \int_{\Gamma_1} N_K\, u_{S1}\, d\Gamma + \int_{\Gamma_2} N_K\, u_{S2}\, d\Gamma + \int_{\Gamma_3} N_K\, u_{S3}\, d\Gamma + \int_{\Gamma_4} N_K\, u_{S4}\, d\Gamma \qquad (7e) $$
Eq. (7a) can be written as

$$ [\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\} \qquad (8a) $$

where

$$ [\mathbf{A}] = \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix}_{N \times N} \qquad (8b) $$

$$ \{\mathbf{U}\} = \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix}_{N \times 1} \qquad (8c) $$

$$ \{\mathbf{F}\} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix}_{N \times 1} \qquad (8d) $$
3 PARALLEL IMPLEMENTATION
Parallel implementation on distributed memory systems differs from that on shared memory systems. On distributed memory parallel computers, each processor has its own local memory only. Data exchange between processors is done by message passing, and the time needed for this interprocessor communication has to be taken into account. For parallelization, the data are distributed among the processors in such a way that the communication cost is as low as possible. One approach is the domain decomposition method, in which the whole domain is divided into small subdomains and each processor performs the work on one subdomain. This gives good results if it is possible to divide the domain into subdomains such that each processor gets a nearly equal amount of work. In this paper, a data decomposition approach is utilized in the parallel algorithm. The EFG code for solving heat transfer and fluid flow problems consists of two parts:
(i) node generation and assembly of the system matrix
(ii) solution of the linear system of equations
Basically, there are two strategies for parallelizing the EFG sequential code. The first parallelizes the whole sequential code, while the second carefully analyzes the sequential code and selects only those portions where parallel programming will reduce the computational cost most, both in terms of time and complexity. Therefore, a careful analysis of the EFG code has first been performed, and it has been found that the time required for solving the linear system of equations (i.e. the inversion time) grows with the data size (number of equations), as shown in Table 1 and Fig. 1 for the heat transfer problem and in Table 2 and Fig. 2 for the fluid flow problem. In other words, the major part of the total computational time is spent in solving the linear system of equations. Therefore, the parallel code has been developed only for the solution of the system of linear equations, not for the whole EFG sequential code.
Table 1: Variation of total time and solution (inversion) time with data size (number of equations) for the heat transfer problem

Data size (no. of equations) | Total time t_t (sec) | Solution (inverse) time t_s (sec) | (t_s / t_t) x 100 (%)
89   |    0.6431 |    0.3641 | 56.62
131  |    1.6158 |    1.1312 | 70.00
281  |   11.3075 |    9.8999 | 87.55
461  |   55.9784 |   52.7400 | 94.21
701  |  194.5410 |  188.2192 | 96.75
991  |  598.8565 |  587.2674 | 98.06
1229 | 1179.5400 | 1161.9026 | 98.50
Fig. 1: Percentage variation of solution (inverse) time with data size (no. of equations) for the heat transfer problem
Table 2: Variation of total time and solution (inversion) time with data size (number of equations) for the fluid flow problem

Data size (no. of equations) | Total time t_t (sec) | Solution (inverse) time t_s (sec) | (t_s / t_t) x 100 (%)
113  |    0.8595 |    0.7168 | 83.39
161  |    2.3036 |    2.0446 | 88.76
316  |   16.6898 |   15.8126 | 94.75
521  |   78.2076 |   75.9189 | 97.07
776  |  266.1130 |  260.6112 | 97.95
1126 |  838.3200 |  826.9423 | 98.64
1462 | 1973.2000 | 1953.8189 | 99.02
Fig. 2: Percentage variation of solution (inverse) time with data size (no. of equations) for the fluid flow problem
3.1 Parallel Algorithm for the Solution of the Linear System of Equations
Matrix inversion is one of the common methods adopted for obtaining the solution of the system of linear equations [A]{U} = {F}. In this method, the inverse of matrix [A] is calculated first, and the solution is then computed as {U} = [A]^(-1){F}. In the present work, a parallel algorithm based on this matrix inversion technique is proposed to reduce the computational cost of the EFG method. In the implementation of this algorithm on the PARAM 10000 supercomputer, a row-wise data distribution is carried out first. After the data have been distributed among the processors, an identity matrix of the same size as [A] is generated by each processor. The matrix inversion then proceeds by row-wise operations: every non-diagonal element of matrix [A] is reduced to zero and every diagonal element of [A] is reduced to unity. Whatever operations are carried out on matrix [A] are also carried out on matrix [I]. Each processor operates only on its own rows, which reduces the computational time. After the inverse of matrix [A] has been found, the unknown vector {U} is calculated as {U} = [A]^(-1){F}.
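To give a concrete sense of the row-wise distribution (assuming a contiguous block partition, which the paper does not spell out): for N = 1229 equations on 8 processors, each processor receives either 1229/8 = 153 rows (rounded down) or 154 rows; five processors hold 154 rows and three hold 153, since 5 x 154 + 3 x 153 = 1229.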
Parallel Algorithm

Global: Numprocs   Number of processors
        N          Number of equations
        MyRank     Rank of each processor
        Rank       Rank of the processor holding the current row
        [A]        Input matrix
        {F}        Input column vector
        [I]        Inverse matrix of [A]
        i          Variable indicating the current row
        start      Starting row number for each processor
        end        Ending row number for each processor

do p = 0 to Numprocs - 1
    Set start and end for processor p
end do

do i = 1 to N
    Find the Rank of the processor holding row i
    if (MyRank = Rank) then
        Scale row i of [A] so that its diagonal element becomes 1.0
        Apply the same scaling to row i of [I]
        Broadcast row i of [A] and row i of [I]
    end if
    do j = start to end, j /= i
        Reduce the non-diagonal element A(j, i) to 0.0 by subtracting a multiple of the broadcast row
        Apply the same operation to row j of [I]
    end do
end do

do i = start to end
    Compute {U}_i from row i of [I] and {F}
end do

do p = 1 to Numprocs - 1
    Send {U}_start..end to the master processor
end do
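A minimal Fortran 90 / MPI sketch of such a row-wise distributed Gauss-Jordan inversion is given below purely for illustration. It is not the author's PARAM 10000 code: the system size, the test matrix and all names (aloc, invloc, prow, owner_of, ...) are hypothetical, and only standard MPI calls (MPI_Bcast, MPI_Gatherv) are used.

   program parallel_gauss_jordan
      ! Illustrative sketch only: row-wise Gauss-Jordan inversion of a small
      ! diagonally dominant test matrix, followed by {U} = [A]^(-1){F}.
      use mpi
      implicit none
      integer, parameter :: n = 6
      integer :: ierr, nprocs, myrank, owner, istart, iend, nloc, base, rem, i, j, p
      double precision, allocatable :: aloc(:,:), invloc(:,:), uloc(:), u(:)
      double precision :: f(n), prow(n), pinv(n), fac
      integer, allocatable :: counts(:), displs(:)

      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

      ! Row-wise block distribution: this processor owns global rows istart..iend
      base = n / nprocs
      rem  = mod(n, nprocs)
      istart = myrank*base + min(myrank, rem) + 1
      iend   = istart + base - 1
      if (myrank < rem) iend = iend + 1
      nloc = iend - istart + 1

      allocate(aloc(nloc, n), invloc(nloc, n), uloc(nloc), u(n))
      allocate(counts(nprocs), displs(nprocs))

      ! Assemble the local rows of a test matrix [A], the local rows of the
      ! identity [I], and the right-hand side {F}
      do i = 1, nloc
         do j = 1, n
            aloc(i, j) = 1.0d0 / dble(istart + i - 1 + j)
         end do
         aloc(i, istart + i - 1) = aloc(i, istart + i - 1) + dble(n)
         invloc(i, :) = 0.0d0
         invloc(i, istart + i - 1) = 1.0d0
      end do
      f = 1.0d0

      ! Gauss-Jordan elimination: the owner scales the pivot row, broadcasts it,
      ! and every processor eliminates column i from its own rows
      do i = 1, n
         owner = owner_of(i)
         if (myrank == owner) then
            j = i - istart + 1
            fac = aloc(j, i)
            aloc(j, :)   = aloc(j, :)   / fac
            invloc(j, :) = invloc(j, :) / fac
            prow = aloc(j, :)
            pinv = invloc(j, :)
         end if
         call MPI_Bcast(prow, n, MPI_DOUBLE_PRECISION, owner, MPI_COMM_WORLD, ierr)
         call MPI_Bcast(pinv, n, MPI_DOUBLE_PRECISION, owner, MPI_COMM_WORLD, ierr)
         do j = 1, nloc
            if (istart + j - 1 == i) cycle
            fac = aloc(j, i)
            aloc(j, :)   = aloc(j, :)   - fac*prow
            invloc(j, :) = invloc(j, :) - fac*pinv
         end do
      end do

      ! Each processor computes its block of {U} = [A]^(-1){F}; gather on rank 0
      do j = 1, nloc
         uloc(j) = dot_product(invloc(j, :), f)
      end do
      do p = 0, nprocs - 1
         counts(p+1) = base
         if (p < rem) counts(p+1) = counts(p+1) + 1
      end do
      displs(1) = 0
      do p = 2, nprocs
         displs(p) = displs(p-1) + counts(p-1)
      end do
      call MPI_Gatherv(uloc, nloc, MPI_DOUBLE_PRECISION, u, counts, displs, &
                       MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
      if (myrank == 0) print *, 'U =', u

      call MPI_Finalize(ierr)

   contains

      integer function owner_of(row)
         ! Rank of the processor that stores global row "row"
         integer, intent(in) :: row
         integer :: r, s, e
         do r = 0, nprocs - 1
            s = r*base + min(r, rem) + 1
            e = s + base - 1
            if (r < rem) e = e + 1
            if (row >= s .and. row <= e) then
               owner_of = r
               return
            end if
         end do
         owner_of = nprocs - 1
      end function owner_of

   end program parallel_gauss_jordan

In a scheme of this kind, the communication consists essentially of the pivot-row broadcasts (one per equation for [A] and one for [I]) plus a final gather of {U}, which is consistent with the small communication times reported in Tables 5-7 and 10-12.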
3.2 Hardware and Software Used
The hardware used for the numerical solution is a PARAM 10000 supercomputer developed by C-DAC, Pune, India. The PARAM 10000 is a 6.4 GFLOPS, RISC-based, distributed memory multiprocessor system categorized as a multiple instruction multiple data (MIMD) computer. It has a total of four nodes (three compute nodes and one server node). Each compute node has two 400 MHz UltraSparc II 64-bit RISC CPUs, 512 MB of main memory, two Ultra SCSI HDDs of 9.1 GB each and one 10/100 Fast Ethernet card, while the server node has two 400 MHz UltraSparc II 64-bit RISC CPUs, 1 GB of main memory, four Ultra SCSI HDDs of 9.1 GB each and one 10/100 Fast Ethernet card. The PARAM 10000 machine thus has a total of 8 processors (two per node), provides Sun Sparc compilers (F90 Compiler Version 2.0, F77 Compiler Version 5.0, C Compiler Version 5.0, C++ Compiler Version 5.0) and supports both the MPI and PVM message passing environments.
4 NUMERICAL RESULTS AND DISCUSSION
The parallel code implementing the proposed algorithm has been developed in FORTRAN. The EFG results have been obtained for the model heat transfer and fluid flow problems. The computational time components (total time, user time and communication time), speedup and efficiency (see the Appendix) have been calculated for the whole code on the PARAM 10000 supercomputer.
4.1 Example-I: Heat Transfer Problem
The parallel EFG results have been obtained for a model heat transfer problem. The parameters used for the analysis of the model shown in Fig. 3 are tabulated in Table 3. Table 4 compares the temperature values obtained by the EFG method with those obtained by FEM for 121 nodes. From Table 4, it is clear that the temperature values obtained by the EFG method are in good agreement with those obtained by FEM.
Table 5 shows the variation of total time, communication time, user time, speedup and efficiency with the number of processors for N = 701. The same quantities are presented in Table 6 for N = 991 and in Table 7 for N = 1229. Fig. 4 and Fig. 5 show the variation of speedup and efficiency with the number of processors and the data size (number of equations). Using 8 processors, the maximum speedup and efficiency obtained are 6.86 and 85.81% respectively for the data size N = 1229.
From the above analysis, it is observed that as the data size (number of equations) increases, the results improve in terms of both speedup and efficiency. The contribution of the communication time to the total time is almost negligible. Moreover, for larger data sizes the speedup and efficiency hold up better as the number of processors increases.
Fig. 3: Model for the heat transfer problem [rectangular domain of length L and width W in the x-y plane, with boundary edges Γ1, Γ2, Γ3 and Γ4]
Table 3: Data for the model shown in Fig. 3

Parameter                              | Value
Length (L)                             | 1 m
Width (W)                              | 1 m
Thermal conductivity (k)               | 400 W/m °C
Rate of internal heat generation (Q)   | 0 W/m³
Heat transfer coefficient (h)          | 100 W/m² °C
Surrounding fluid temperature (T∞)     | 20 °C
Temperature at edge Γ1, x = 0 (T_e)    | 200 °C
Table 4: Comparison of EFG results with FEM at a few typical locations for 121 nodes

x (m) | y (m) | EFG temperature (°C) | FEM temperature (°C)
0.5 | 1.0 | 160.8224 | 160.8950
0.5 | 0.5 | 172.2274 | 172.2670
0.5 | 0.0 | 175.0713 | 175.1240
1.0 | 1.0 | 140.9820 | 141.0610
1.0 | 0.5 | 151.8434 | 151.8510
1.0 | 0.0 | 155.0760 | 155.1100
Table 5: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 701

Number of processors | Total time (sec) | Communication time (sec) | User time (sec) | Speedup | Efficiency (%)
1 | 212.2315 | 0.0000 | 211.6500 | 1.00 | 100.00
2 | 111.4260 | 0.0671 | 110.4400 | 1.92 | 95.82
3 |  78.0770 | 0.1053 |  76.3050 | 2.77 | 92.45
4 |  60.6514 | 0.5706 |  58.2800 | 3.63 | 90.75
5 |  55.0408 | 0.8323 |  48.6950 | 4.34 | 86.92
6 |  54.5207 | 6.0414 |  42.8500 | 4.94 | 82.32
7 |  46.3948 | 1.9330 |  36.8950 | 5.74 | 81.95
8 |  50.2148 | 5.5403 |  33.4350 | 6.33 | 79.13
Table 6: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 991

Number of processors | Total time (sec) | Communication time (sec) | User time (sec) | Speedup | Efficiency (%)
1 | 598.8565 | 0.0000 | 597.2500 | 1.00 | 100.00
2 | 310.1355 | 0.1924 | 307.7700 | 1.94 | 97.03
3 | 214.2525 | 0.2256 | 211.3900 | 2.82 | 94.18
4 | 164.6720 | 0.9013 | 161.3050 | 3.70 | 92.56
5 | 141.6955 | 1.7398 | 133.4550 | 4.47 | 89.50
6 | 134.5425 | 6.8044 | 114.3500 | 5.22 | 87.05
7 | 114.9740 | 3.0855 |  98.7850 | 6.05 | 86.37
8 | 122.0950 | 6.2373 |  89.4100 | 6.68 | 83.50
Table 7: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1229

Number of processors | Total time (sec) | Communication time (sec) | User time (sec) | Speedup | Efficiency (%)
1 | 1179.5400 | 0.0000 | 1176.5700 | 1.00 | 100.00
2 |  606.6460 | 0.2567 |  600.4800 | 1.95 | 97.97
3 |  417.6550 | 0.7822 |  411.4100 | 2.86 | 95.32
4 |  320.2650 | 0.5918 |  315.1500 | 3.73 | 93.33
5 |  282.8400 | 2.6019 |  261.6100 | 4.50 | 89.95
6 |  254.3290 | 4.7861 |  221.0900 | 5.32 | 88.69
7 |  222.0310 | 4.9631 |  191.8700 | 6.13 | 87.60
8 |  216.9900 | 3.8176 |  171.3900 | 6.86 | 85.81
[Plot: speedup versus number of processors (1-8) for N = 701, 991 and 1229, together with the ideal speedup line]
Fig. 4: Variation of speedup with number of processors and data size (no. of equations)
[Plot: efficiency (%) versus number of processors (1-8) for N = 701, 991 and 1229]
Fig. 5: Variation of efficiency with number of processors and data size (no. of equations)
4.2 Example-II: Fluid Flow Problem
The parallel EFG results have been obtained for a model fluid flow problem. The parameters used for the analysis of the model shown in Fig. 6 are tabulated in Table 8. Table 9 compares the velocity values obtained by the EFG method with those obtained by FEM for 121 nodes. From Table 9, it is clear that the velocity values obtained by the EFG method are in good agreement with those obtained by FEM.
Table 10 shows the variation of total time, communication time, user time, speedup and efficiency with the number of processors for N = 776. Table 11 and Table 12 show the same quantities for N = 1126 and N = 1462 respectively. Fig. 7 and Fig. 8 show the variation of speedup and efficiency with the number of processors and the data size (number of equations). Using 8 processors, the maximum speedup and efficiency obtained are 7.20 and 90.00% respectively for the data size N = 1462.
From the above analysis, it is observed that with increasing data size (number of equations) the results improve in terms of both speedup and efficiency. The contribution of the communication time to the total time is almost negligible. Moreover, for larger data sizes the speedup and efficiency hold up better as the number of processors increases.
Fig. 6: Model cross-section of the fluid flowing through a duct [cross-section of depth D and length L in the x-y plane, flow along z, with boundary surfaces Γ1, Γ2, Γ3 and Γ4]
Table 8: Data for the model shown in Fig. 6

Parameter                      | Value
Depth (D)                      | 0.25 m
Length (L)                     | 0.25 m
Pressure gradient (∂P/∂z)      | 5000 N/m²/m
Dynamic viscosity (μ)          | 5 Ns/m²
All surface velocities (u_S)   | 0 m/sec
Table 9: Comparison of EFG results with FEM at a few typical locations for 121 nodes

x (m) | y (m)  | EFG velocity (m/sec) | FEM velocity (m/sec)
0     | 0.125  | 0.0000 | 0.0000
0     | 0.100  | 1.8670 | 1.8280
0     | 0.075  | 3.1882 | 3.1304
0     | 0.050  | 3.9319 | 3.9929
0     | 0.025  | 4.4598 | 4.4827
0     | 0.000  | 4.6417 | 4.6412
Table 10: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 776

Number of processors | Total time (sec) | Communication time (sec) | User time (sec) | Speedup | Efficiency (%)
1 | 266.1130 | 0.0000 | 265.4600 | 1.00 | 100.00
2 | 141.0705 | 0.0445 | 138.8950 | 1.91 | 95.56
3 |  99.6623 | 0.1466 |  95.5150 | 2.77 | 92.64
4 |  76.6645 | 0.3459 |  72.6950 | 3.65 | 91.29
5 |  66.8084 | 0.8534 |  59.8550 | 4.43 | 88.70
6 |  67.3349 | 6.1760 |  52.2250 | 5.08 | 84.72
7 |  59.5002 | 1.7815 |  44.4450 | 5.97 | 85.32
8 |  65.2026 | 5.3795 |  40.9100 | 6.48 | 81.11
Table 11: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1126

Number of processors | Total time (sec) | Communication time (sec) | User time (sec) | Speedup | Efficiency (%)
1 | 838.3200 | 0.0000 | 836.4050 | 1.00 | 100.00
2 | 433.8130 | 0.1940 | 427.2300 | 1.95 | 97.88
3 | 299.6620 | 0.2509 | 291.9100 | 2.86 | 95.50
4 | 230.6550 | 1.1592 | 221.5500 | 3.77 | 94.38
5 | 201.4360 | 1.6366 | 182.4800 | 4.58 | 91.67
6 | 181.6840 | 4.9364 | 155.1100 | 5.39 | 89.87
7 | 163.2360 | 3.4810 | 134.4500 | 6.22 | 88.87
8 | 163.3560 | 5.1744 | 121.9800 | 6.85 | 85.71
Table 12: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1462

Number of processors | Total time (sec) | Communication time (sec) | User time (sec) | Speedup | Efficiency (%)
1 | 1973.2000 | 0.0000 | 1968.5500 | 1.00 | 100.00
2 |  994.7770 | 0.2786 |  983.7400 | 2.00 | 100.00
3 |  697.1760 | 0.9614 |  664.7600 | 2.96 | 98.71
4 |  534.6850 | 0.3451 |  502.8700 | 3.91 | 97.86
5 |  449.4020 | 2.6545 |  422.2300 | 4.66 | 93.24
6 |  400.0850 | 5.8641 |  354.2450 | 5.55 | 92.61
7 |  357.6345 | 4.3058 |  308.5900 | 6.39 | 91.35
8 |  335.0405 | 6.8723 |  273.5350 | 7.20 | 90.00
[Plot: speedup versus number of processors (1-8) for N = 776, 1126 and 1462, together with the ideal speedup line]
Fig. 7: Variation of speedup with number of processors and data size (no. of equations)
[Plot: efficiency (%) versus number of processors (1-8) for N = 776, 1126 and 1462]
Fig. 8: Variation of efficiency with number of processors and data size (no. of equations)
5 CONCLUSIONS
In this paper, a new parallel algorithm has been proposed for the EFG method. The parallel EFG code has been written in FORTRAN using the MPI message passing library and validated by solving two model problems. The analysis shows that both speedup and efficiency improve with increasing data size (number of equations). Moreover, for larger data sizes the reported quantities (total time, communication time, user time, speedup and efficiency) scale better as the number of processors increases. The parallel EFG results presented in this paper indicate that the proposed algorithm works well for the EFG method.
NOTATIONS
d_max               Scaling parameter
Q                   Rate of internal heat generation per unit volume
h                   Convective heat transfer coefficient
k                   Coefficient of thermal conductivity
T_e                 Edge temperature
T∞                  Surrounding fluid temperature
w(x − x_I)          Weight function
Γ                   Boundary of the domain
M                   Pressure gradient (∂P/∂z)
n                   Number of nodes in the domain of influence
N                   Number of equations
N_K                 Lagrange interpolant
T^h(x) or u^h(x)    Moving least square approximant
λ                   Lagrange multiplier
μ                   Dynamic viscosity
Ω                   Domain of the problem
Φ(x)                Shape function
REFERENCES
1. J. J. Monaghan, An introduction to SPH, Computer Physics Communications, Vol. 48, pp. 89-96, 1988.
2. B. Nayroles, G. Touzot and P. Villon, Generalizing the finite element method: diffuse approximation and
diffuse elements, Computational Mechanics, Vol. 10, pp. 307-318, 1992.
3. T. Belytschko, Y. Y. Lu and L. Gu, Element free Galerkin methods, International Journal for Numerical
Methods in Engineering, Vol. 37, pp. 229-256, 1994.
4. W. K. Liu, S. Jun and Y. F. Zhang, Reproducing kernel particle methods, International Journal for
Numerical Methods in Engineering, Vol. 20, pp. 1081-1106, 1995.
5. I. Babuska and J. M. Melenk, The partition of unity method, International Journal for Numerical Methods
in Engineering, Vol. 40, 727-758, 1997.
6. C. A. Duarte and J. T. Oden, An h-p adaptive method using clouds, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 237-262, 1996.
7. G. Yagawa and T. Yamada, Free mesh method, a new meshless finite element method, Computational
Mechanics, Vol. 18, pp. 383-386, 1996.
8. N. Sukumar, B. Moran and T. Belytschko, The natural element method in solid mechanics, International Journal for Numerical Methods in Engineering, Vol. 43, pp. 839-887, 1998.
9. T. Zhu, J. D. Zhang and S. N. Atluri, A meshless local boundary integral equation (LBIE) method for
solving nonlinear problems, Computational Mechanics, Vol. 22, pp. 174-186, 1998.
10. S. N. Atluri and T. Zhu, A new Meshless Local Petrov-Galerkin (MLPG) approach in computational
mechanics, Computational Mechanics, Vol. 22, pp. 117-127, 1998.
11. S. De and K. J. Bathe, The method of finite spheres, Computational Mechanics, Vol. 25, pp. 329-345, 2000.
12. G. R. Liu and Y. T. Gu, A local radial point interpolation method (LRPIM) for free vibration analysis of 2-
D solids, Journal of Sound and Vibration, Vol. 246(1), pp. 29-46, 2001.
13. J. Zhang, Z. Yao and M. Tanaka, The meshless regular hybrid boundary node method for 2-D linear
elasticity, Engineering Analysis with Boundary Elements, Vol. 27, pp. 259-268, 2003.
14. M. Shirazaki and G. Yagawa, Large-scale parallel flow analysis based on free mesh method: a virtually
meshless method, Computer Methods in Applied Mechanics and Engineering, Vol. 174, pp. 419-431, 1999.
15. D. F. Medina and J. K. Chen, Three-dimensional simulations of impact induced damage in composite
structures using the parallelized SPH method, Composites: Part-A, Vol. 31, pp. 853-860, 2000.
16. I. V. Singh, K. Sandeep and R. Prakash, Heat transfer analysis of two-dimensional fins using meshless
element-free Galerkin method, Numerical Heat Transfer-Part A, Vol. 44, pp. 73-84, 2003.
APPENDIX
1 COMPUTATIONAL TIME COMPONENTS
The different components of the computational time include real time, system time, user time, CPU time, total time and communication time. Among these, emphasis has been placed on the total time, the communication time and the user time.
1.1 Total Time
The total time (run time) is the time from the moment at which the parallel computation starts to the moment at which the last processor finishes its execution. It is measured by the MPI wall-clock timers built into the program itself.
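Such wall-clock measurements are typically taken with MPI_Wtime; a minimal sketch of how this could be done is shown below (the program structure and names are illustrative, not the author's code):

   program timing_sketch
      use mpi
      implicit none
      integer :: ierr
      double precision :: t_start, t_end
      call MPI_Init(ierr)
      call MPI_Barrier(MPI_COMM_WORLD, ierr)      ! synchronise before timing
      t_start = MPI_Wtime()
      ! ... parallel solution of [A]{U} = {F} would run here ...
      call MPI_Barrier(MPI_COMM_WORLD, ierr)      ! wait for the last processor to finish
      t_end = MPI_Wtime()
      print *, 'total (run) time on this processor (sec):', t_end - t_start
      call MPI_Finalize(ierr)
   end program timing_sketch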
1.2 Communication Time
The communication time is the time required to transfer data from one processor to another processor or processors.
1.3 User Time
The user time is the time spent executing the program's own code.
2 PERFORMANCE METRICS
2.1 Speedup
A measure of the relative performance of a multiprocessor system against a single-processor system is the speedup factor, defined as:

Speedup = (user time using one processor (single-processor system)) / (user time using a number of processors (multiprocessor system))

2.2 Efficiency

Efficiency = (user time using one processor (single-processor system)) / (user time using a number of processors x number of processors)
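As a worked check against the reported results (taking the user times from Table 7 for the heat transfer problem with N = 1229): Speedup = 1176.57 / 171.39 ≈ 6.86, and Efficiency = 1176.57 / (171.39 x 8) ≈ 0.858, i.e. about 85.81%, which are the values quoted in Section 4.1.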