Fem Solver Paper

An Efficient Parallel Solver using Matrix Inversion Method for Linear
and Non-linear Finite Element Problems
P K Gupta, Associate Member

R N Khapre, Non-member
This paper presents a modified parallel matrix inversion algorithm for solving a set of linear equations and discusses its suitability for finite element analysis. The algorithm is implemented on the supercomputer PARAM 10000. Computational time results were obtained by solving a problem of analysis of the anchorage zone in a prestressed post-tensioned concrete beam. These results are compared with the existing results of a parallel matrix inversion algorithm available in the literature. The algorithm is then implemented in computer codes for linear and non-linear finite element analysis. One typical problem from each category is solved, and the variation in computational time is obtained and discussed.
Keywords: Parallel solver; Finite element method; Matrix inversion; PARAM 10000
P K Gupta and R N Khapre are with the Civil Engineering Group, Birla Institute of Technology and Science, Pilani, Rajasthan 333 031. This paper was received on September 1, 2004. Written discussion on the paper will be entertained till August 31, 2005.
IE (I) Journal-CV

NOTATION
BW : bandwidth
e : eccentricity
[I] : identity matrix
[K] : global stiffness matrix
Pk : prestressing force

INTRODUCTION
Structural analysis using the finite element method is one of the areas in which a huge amount of data has to be handled during computation. A conventional computer takes significant computational time to complete such an analysis, so parallel computing can be a better option. In finite element analysis, the major portion of the computational time is spent in solving the linear equations. Therefore, parallel solvers can be employed to reduce the computational time. Different mathematical methods are available for solving a set of linear equations, and these are already in use in the development of parallel solvers [1-7].

Shah and Kant [1] used a parallel Cholesky solver to solve the set of linear equations obtained during the finite element analysis of fibre-reinforced polymer shells. Khan and Topping [2] presented a modified parallel Jacobi-conditioned conjugate gradient method. They discussed and implemented element-by-element and diagonally conditioned approaches on distributed memory MIMD architectures. Thiagarajan and Aravamuthan [3] presented a preconditioned conjugate gradient finite element solver on 32-node Pentium II 350 MHz clusters. They also presented and discussed the components of computational time. Ramesh and Shah [4] made a similar attempt to implement a parallel preconditioned conjugate gradient method. They used a Master-Slave approach on the transputer-based machine PARAM with 256 nodes having 4 MB of RAM, and employed a ring topology for the sub workers (Slaves).

Gupta and Khapre [5,6] developed three parallel solvers, using the matrix inversion method, the Gauss-Seidel method and the Gauss elimination method, to solve a set of linear equations on the supercomputer PARAM 10000. These solvers were used to solve the set of linear equations generated during the analysis of a linear elastic structural problem by the finite element method. After comparing the computational time variation obtained with the different solvers, it was found that the solver developed using the matrix inversion method is suitable for supercomputers, while the Gauss elimination solver is suitable for conventional computers. They also compared blocking and non-blocking communication mechanisms and found that both are equally effective when incorporated in parallel solvers [6]. They further studied the effect of user activities on computational time and found that as the user activities increase, the Real time also increases [6]. They introduced the term User time, which remains unaffected by user activities, and recommended that it be used to evaluate the performance of parallel programs [6]. A study was also carried out by Gupta, et al [7] to explore the suitability of the C and FORTRAN 77 languages on supercomputers. A parallel solver using the matrix inversion method was developed on the PARAM 10000 machine in both C and FORTRAN 77. After comparing the computational time results, it was found that the program written in C takes less time than the program written in FORTRAN 77. It was also found that the percentage difference in
User time obtained by both the programs was nearly 25% for every number of processors.

An attempt has been made in this paper to improve the previous work [5,6]. The parallel solver using the matrix inversion method, implemented on PARAM 10000 in C language and presented by Gupta and Khapre [5,6], is improved. The suitability of the matrix inversion method is also discussed specifically for finite element analysis. A set of linear equations taken from the literature [5] was solved using the developed solver. The computational time results obtained with the developed solver are discussed and compared with the previous literature [5,6]. The solver is then implemented in two finite element codes to solve the resulting set of linear equations. A typical problem in each category is solved and the variation in computational time is obtained. The speed-up achieved by both finite element codes is calculated and discussed.

PARAM 10000 ARCHITECTURE OVERVIEW
PARAM 10000 has a MIMD distributed memory machine architecture, developed by the Centre for Development of Advanced Computing (C-DAC). The machine has four nodes, each having two UltraSPARC-II 64-bit RISC CPUs running at 400 MHz, with 2 MB external cache. Each processor has 512 MB of main memory, extendable to 2 GB. The PARAM 10000 has two interconnection networks, namely, PARAMNet and Fast Ethernet [8].

MATRIX INVERSION METHOD
The matrix inversion method is one of the basic methods of solving a system of linear equations [A][x] = [B]. In this method, the inverse of matrix [A] is computed and then multiplied with the vector [B] to get the unknown vector [x]. The mathematical relation [A][A]^{-1} = [I] is used to generate the matrix [A]^{-1}. In the process of matrix inversion, an identity matrix [I] is generated and row-wise operations are carried out on matrix [A] and matrix [I] such that matrix [A] takes the form of matrix [I] and matrix [I] is converted to matrix [A]^{-1}.

In the finite element method, matrix [A] represents the global stiffness matrix, and vectors [B] and [x] represent the global force vector and the global displacement vector, respectively. The global stiffness matrix is a banded matrix in which the elements inside the bandwidth (BW) are generally non-zero and the remaining elements in the upper and lower triangles of the matrix are zero. The appearance of the global stiffness matrix and the identity matrix is quite similar: both matrices contain zero elements in their upper and lower triangles (Figure 1). Hence, if this method is used, computational effort can be saved. Further, zero elements also exist inside the bandwidth of the stiffness matrix; therefore, fewer computations are required to invert the global stiffness matrix.

Figure 1 Stiffness matrix and identity matrix [the bandwidth BW is marked on the stiffness matrix]

    Global  P       {Number of Processors}
            n       {Number of Equations}
            MyRank  {Rank of the Processor}
            Rank    {Rank of processor holding current row}
            start   {Flag indicating starting row number for each processor}
            end     {Flag indicating ending row number for each processor}
            i       {Variable indicating current row}
            [I]     {Matrix indicating inverse of matrix [A]}

    for all Pi where 0 ≤ i < P do
        Set start
        Set end
    endfor
    for i = 0 to n - 1 step 1
        if diagonal of [A]i = 1.0
            continue
        else
            Set diagonal element of [A]i = 1.0
            Change elements of matrix [I]i
        endif
        for all Pi where 0 ≤ i < P do
            Find the Rank of current row
            if MyRank = Rank
                Broadcast current row
            endif
        endfor
        for j = start to end step 1
            if [A]ij ≠ 0.0
                Change non-diagonal element of [A]ij = 0.0
                Change elements of matrix [I]ij
            endif
        endfor
    endfor
    for i = start to end step 1
        Compute [x]i
    endfor
    for all Pi where 0 ≤ i < P do
        Broadcast [x]i to All Processor
    endfor

Figure 2 Parallel algorithm for matrix inversion method
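The steps of Figure 2 can be illustrated with a short serial sketch in Python. This is not the authors' C code for PARAM 10000: the function names row_range and solve_by_inversion are hypothetical, the loop over ranks only stands in for processors working at the same time, reading the shared pivot row stands in for the broadcast, and no pivoting is attempted (an assumption that is reasonable only for well-conditioned stiffness matrices).

```python
def row_range(rank, num_procs, n):
    """Half-open row range [start, end) owned by `rank`.

    Leftover rows are given to the lower ranks, as the paper describes.
    """
    base, extra = divmod(n, num_procs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def solve_by_inversion(a, b, num_procs=1):
    """Solve [A][x] = [B] by Gauss-Jordan inversion (no pivoting).

    Illustrative assumption: the ranks are simulated one after another.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy of [A]
    inv = [[float(i == j) for j in range(n)] for i in range(n)]  # starts as [I]
    ranges = [row_range(r, num_procs, n) for r in range(num_procs)]

    for i in range(n):
        pivot = a[i][i]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch has no pivoting")
        if pivot != 1.0:  # a diagonal that is already unity is left alone
            a[i] = [v / pivot for v in a[i]]
            inv[i] = [v / pivot for v in inv[i]]
        # Row i is now the "broadcast" row; each rank updates its own rows.
        for start, end in ranges:
            for j in range(start, end):
                factor = a[j][i]
                if j == i or factor == 0.0:  # zeros need no elimination
                    continue
                a[j] = [x - factor * y for x, y in zip(a[j], a[i])]
                inv[j] = [x - factor * y for x, y in zip(inv[j], inv[i])]

    # [x] = [A]^-1 [B]: each rank computes its share, then all shares are merged.
    x = [0.0] * n
    for start, end in ranges:
        for i in range(start, end):
            x[i] = sum(inv[i][k] * b[k] for k in range(n))
    return x


if __name__ == "__main__":
    # Small banded system used only as an example.
    k = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    f = [5.0, 6.0, 5.0]
    print(solve_by_inversion(k, f, num_procs=2))
```

The test on factor == 0.0 is where the banded structure pays off: rows whose entry in the current column is already zero are skipped entirely.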
Vol 86, May 2005 45

Table 1 Comparison of computational time components

            Developed solver              Original solver               Saving, %
Processors  Real, s   User, s   Comm, s   Real, s   User, s   Comm, s   Real    User
1           189.86    188.51    0.00      578.16    575.42     0.00     67.16   67.23
2           173.76    109.30    1.59      299.72    294.14     1.50     42.02   62.84
3           136.06     90.94    3.57      208.52    200.37     3.61     34.75   54.61
4           128.05     76.19    4.01      163.36    153.00     4.18     21.61   50.20
5           110.73     65.25    4.97      155.09    127.17    12.96     28.60   48.69
6           105.03     56.98    5.34      141.84    108.41    15.61     25.95   47.44
7            92.35     50.65    6.61      125.84     92.24    14.62     26.61   45.08
8            94.88     47.11    3.01      134.43     82.95    16.15     29.42   43.20

Note: Comm, s = Communication time, s
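The Saving columns of Table 1 appear to follow from the Real and User times as (original - developed) / original × 100. The sketch below recomputes them under that assumption; the names TABLE_1 and saving_percent are illustrative, and the recomputed values may differ from the printed ones in the second decimal place, which suggests the printed figures were truncated rather than rounded.

```python
# (processors, developed Real, developed User, original Real, original User),
# all times in seconds, copied from Table 1.
TABLE_1 = [
    (1, 189.86, 188.51, 578.16, 575.42),
    (2, 173.76, 109.30, 299.72, 294.14),
    (3, 136.06, 90.94, 208.52, 200.37),
    (4, 128.05, 76.19, 163.36, 153.00),
    (5, 110.73, 65.25, 155.09, 127.17),
    (6, 105.03, 56.98, 141.84, 108.41),
    (7, 92.35, 50.65, 125.84, 92.24),
    (8, 94.88, 47.11, 134.43, 82.95),
]


def saving_percent(original, developed):
    """Percentage of the original solver's time saved by the developed solver."""
    return 100.0 * (original - developed) / original


if __name__ == "__main__":
    for procs, dev_real, dev_user, orig_real, orig_user in TABLE_1:
        real_saving = saving_percent(orig_real, dev_real)
        user_saving = saving_percent(orig_user, dev_user)
        print(f"{procs}  Real {real_saving:6.2f} %  User {user_saving:6.2f} %")
```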
ALGORITHM
Initially, the range of data to be handled by each processor was decided. If the data could not be distributed evenly, the remaining data was distributed to the processors with lower ranks. After the data was distributed among the processors, an identity matrix [I] of the same size as [A] was created by every processor. In the process of matrix inversion, row-wise operations were carried out: every non-diagonal element of matrix [A] was converted to zero and every diagonal element of matrix [A] was made unity. The operations were skipped wherever a non-diagonal element was already zero or a diagonal element was already unity, which reduced the number of computations.

Every operation carried out on matrix [A] was simultaneously carried out on matrix [I]. To reduce computational time, each processor operated only on the rows designated to it. After the inverse of matrix [A] was found, the unknown vector [x] was calculated by multiplying [A]^{-1} with [B]. At this juncture, each processor held only the elements of vector [x] belonging to its share. Each processor then broadcast these elements to all other processors so that every processor had the complete vector [x]. Figure 2 shows the algorithm of matrix inversion on parallel computers.

COMPUTATIONAL TIME RESULTS
Based on the algorithm discussed, a parallel solver was developed. Data of size 1226 × 1226 generated from finite element analysis [6] was solved with this solver. The computational time results are compared in Table 1 with those of the original solver developed by Gupta and Khapre [6]. One can observe that all the components of computational time, namely, Real, User and Communication [9], reduce dramatically when the present solver is used. For a single processor, nearly 67% of the User time as well as the Real time is saved by the presented solver. As the number of processors increases, the percentage saving in both time components reduces, falling to about 29% for Real time and 43% for User time when eight processors are employed. A sharp reduction in Real time can be observed from one processor to four processors; after four processors, the reduction in Real time is gradual and insignificant. The percentage saving in User time varies continuously with the number of processors, whereas for Real time the variation is abrupt (a sudden fall in percentage saving at four processors). User activities are mainly responsible for such variations.

Figure 3 (a) Eccentrically loaded prestressed concrete beam; and (b) Discretised beam for present study [labels in (a): centre of end block, centre of anchorage plate, eccentricity e, prestressing force Pk]
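The case study that follows reports speed-up without stating a formula. Assuming the usual definition, speed-up on p processors is the one-processor time divided by the p-processor time, and parallel efficiency is that ratio divided by p. The sketch below applies this definition to the developed-solver columns of Table 1 purely as an illustration; it does not reproduce the case-study curves, whose underlying times the paper gives only as plots, and the names speed_up and efficiency are hypothetical.

```python
# Developed-solver times from Table 1, in seconds, keyed by processor count.
REAL_TIME = {1: 189.86, 2: 173.76, 3: 136.06, 4: 128.05,
             5: 110.73, 6: 105.03, 7: 92.35, 8: 94.88}
USER_TIME = {1: 188.51, 2: 109.30, 3: 90.94, 4: 76.19,
             5: 65.25, 6: 56.98, 7: 50.65, 8: 47.11}


def speed_up(times, procs):
    """Assumed definition: one-processor time over the time on `procs` processors."""
    return times[1] / times[procs]


def efficiency(times, procs):
    """Speed-up per processor; 1.0 corresponds to the ideal (linear) curve."""
    return speed_up(times, procs) / procs


if __name__ == "__main__":
    for procs in sorted(USER_TIME):
        print(f"{procs}  Real {speed_up(REAL_TIME, procs):.2f}  "
              f"User {speed_up(USER_TIME, procs):.2f}  "
              f"User efficiency {efficiency(USER_TIME, procs):.2f}")
```

Because Real time includes the effect of other user activities on the machine, a Real-time speed-up computed this way is expected to fall below the User-time speed-up.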
CASE STUDY
The solver developed above was implemented in two different finite element codes. One typical problem from each category was solved using these two codes, and the results for the different components of computational time were obtained and discussed.

Case I: Linear Finite Element Analysis
A problem of the anchorage zone in a prestressed post-tensioned concrete beam was analyzed [5]. The problem was treated as a two-dimensional plane stress problem (Figure 3(a)), and the beam was discretized using 4800 three-noded triangular elements with 2501 nodes (Figure 3(b)), resulting in a global stiffness matrix of size 5002 × 5002. The problem was analyzed by increasing the number of processors from one to five. Each processor required 480 MB of memory for every execution.

Figure 4(a) shows the variation in the different components of computational time with increase in the number of processors. All components of computational time reduce considerably as the number of processors increases. Figure 4(b) shows the variation in speed-up achieved by the FEM code. It shows an almost linear variation in the speed-up achieved in Real time as well as in User time. The maximum speed-up achieved was 2.8 with five processors. It can be observed that the Real time speed-up curve is just below the User time speed-up curve.

Figure 4 Variation in (a) computational time; and (b) speed-up with number of processors [one to five processors; (a) Real, User and Communication time, s; (b) Real, User and Ideal speed-up]

Case II: Non-linear Finite Element Analysis
Figure 5 Discretized cylinder and deformed-undeformed shape of solid cylinder [labels: axis of rotation, undeformed mesh, undeformed profile, deformed profile]

Figure 6 Variation in (a) computational time; and (b) speed-up with number of processors [one to eight processors; (a) Real, User and Communication time, s; (b) Real, User and Ideal speed-up]

A problem of simple compression of a solid cylinder [10] of 25 mm radius and 25 mm height was analyzed. The
cylinder was compressed with a velocity of 25 mm/s until a 30% reduction in height was achieved. The reduction was applied in 15 steps. The bottom surface was considered frictionless, and a friction factor of 0.5 was considered for the top surface. The error norm was taken as 0.001, and the limiting strain rate was taken as 0.01 to define the rigid portion of the cylinder. The material behavior was expressed by the equation $\sigma = k\,\dot{\varepsilon}^{m}$, where the values of k and m were taken as 10 and 0.1, respectively. The cylinder was discretized using 400 four-noded rectangular elements with 441 nodes (Figure 5), resulting in a global stiffness matrix of size 882 × 882. The problem was analyzed by increasing the number of processors from one to eight. Each processor required 14 MB of memory for every execution. The solution procedure was iterative, and 84 iterations were carried out to analyze the problem in 15 steps. Figure 5 also shows the deformed-undeformed shape of the cylinder.

Figure 6(a) shows the variation in computational time with increasing number of processors. Both Real time and User time reduce with the increase in the number of processors, whereas Communication time increases. A rapid reduction in Real time and User time can be observed from one processor to four processors, after which the reduction is insignificant. Figure 6(b) shows the variation in speed-up with the increasing number of processors. The maximum speed-up achieved in Real time is three with eight processors, while the maximum speed-up achieved in User time is 8.6 with eight processors.

CONCLUSION
The paper shows the proper implementation of the matrix inversion method in the development of a parallel solver for the finite element method. It also discusses why this solver is suitable for finite element analysis. An efficient parallel solver is presented to reduce the computational time involved in finite element analysis. When the computational time results were compared with the available literature [5], it was found that the developed solver is more efficient than the solver developed earlier [5]. The paper also shows the efficient implementation of the parallel solver in linear and non-linear finite element codes. It presents two different problems and shows how the computational time reduces when the parallel solver is adopted.

According to the literature [6], to analyze the data of size 1226 × 1226 using the Gauss Elimination method, a single processor of the PARAM 10000 machine takes 173.46 s of Real time and 170.36 s of User time. When the same data was analyzed using a single processor with the present solver, the Real time and User time taken were 189.96 s and 188.51 s, respectively. One can observe that the present solver for the matrix inversion method takes slightly more time than the Gauss Elimination method, and hence it can be concluded that the developed solver can be effectively used on single and multiple processor machines.

ACKNOWLEDGEMENT
The authors would like to thank C-DAC, Pune for the support given to this research work through the research project 'Computer Simulation of Large Deformations Process'. The authors also acknowledge the support of the Image and Parallel Processing Laboratory, BITS, Pilani, India, for providing parallel computing facilities for this work.

REFERENCE
1. M S Shah and T Kant. Finite Element Analysis of Fibre-reinforced Polymer Shells using Higher Order Shear Deformation Theories on Parallel Distributed Memory Machines. International Journal of Computer Applications in Technology, vol 31, 1998, p 1.
2. A I Khan and B H V Topping. Parallel Finite Element Analysis using Jacobi-conditioned Conjugate Gradient Algorithm. Advances in Engineering Software, vol 25, 1996, p 309.
3. G Thiagarajan and V Aravamuthan. Parallelization Strategies for Element-by-element Preconditioned Conjugate Gradient Solver using High-performance FORTRAN for Unstructured Finite-element Applications on Linux Clusters. ASCE Journal of Computing in Civil Engineering, vol 16, no 1, January 2002, p 1.
4. K S Ramesh and M Shah. Implementation of Parallel Preconditioned Conjugate Gradient Solver for FEA on PARAM. Proceedings of the International Symposium on Scientific Computing and Mathematical Modelling, Bangalore, December 1992, p 49.
5. P K Gupta and R N Khapre. Finite Element Analysis of Anchorage Zone using Supercomputer PARAM 10000. Proceedings of the International Conference Structural Engineering Convention, an International Meet, IIT, Kharagpur, December 2003, p 465.
6. P K Gupta and R N Khapre. Comparative Study of Solution Methods of System of Linear Equations on Supercomputers. Proceedings of the International Conference Structural Engineering Convention, an International Meet, IIT, Kharagpur, December 2003, p 522.
7. P K Gupta, J P Mishra, R N Khapre and P K Jain. Comparison of C and FORTRAN 77 Languages based on their Performance on PARAM 10000. Proceedings of the National Conference on Distributed Computing, NITTE, Karkala, March 2004, p 33.
8. http://param.bits-pilani.ac.in/
9. S Das. UNIX: Concepts and Applications. Tata McGraw-Hill, New Delhi, 1999, p 48.
10. S Kobayashi, Soo-Ik Oh and T Altan. Metal Forming and the Finite-element Method. Oxford University Press, New York, 1989, p 364.
