This paper presents a modified parallel matrix inversion algorithm for solving a set of linear equations and discusses its suitability for finite element analysis. The algorithm is implemented on the supercomputer PARAM 10000. Computational time results were obtained by solving a problem of analysis of the anchorage zone in a prestressed post-tensioned concrete beam. These results are compared and discussed with those of an existing parallel matrix inversion algorithm available in the literature. The algorithm is then implemented in computer codes for linear and non-linear finite element analysis. One typical problem from each category is solved, and the computational time variation results are obtained and discussed.
Keywords: Parallel solver; Finite element method; Matrix inversion; PARAM 10000
NOTATION
BW : band width
e : eccentricity
[I ] : identity matrix
[K ] : global stiffness matrix
Pk : prestressing force

INTRODUCTION
Structural analysis using the finite element method is one of the areas in which huge amounts of data have to be handled during computation. A conventional computer takes significant computational time to complete such an analysis. Parallel computing can be a better option for this kind of time-consuming analysis. In finite element analysis, the major portion of the computational time is spent in obtaining the solution of the linear equations; therefore, parallel solvers could be employed to reduce the computational time. Different mathematical methods are available for solving a set of linear equations, and these are already in use in the development of parallel solvers1-7.

Shah and Kant1 used a parallel Cholesky solver to determine the solution of a set of linear equations obtained during the finite element analysis of fibre-reinforced polymer shells. Khan and Topping2 presented a modified parallel Jacobi-conditioned conjugate gradient method; they discussed and implemented element-by-element and diagonally conditioned approaches on distributed memory MIMD architectures. Thiagarajan and Aravamuthan3 presented a preconditioned conjugate gradient finite element solver on 32-node Pentium II 350 MHz clusters, and also presented and discussed the results of components of computational time. Ramesh and Shah4 made a similar attempt at implementing a parallel preconditioned conjugate gradient method. They used a Master-Slave approach on a transputer-based PARAM machine with 256 nodes having 4 MB of RAM each, and employed a ring topology for the sub-workers (Slaves).

Gupta and Khapre5,6 developed three parallel solvers, using the matrix inversion method, the Gauss-Seidel method and the Gauss elimination method, to solve a set of linear equations on the supercomputer PARAM 10000. These solvers were implemented to solve the set of linear equations generated during the finite element analysis of a linear elastic structural problem. After comparing the computational time variation obtained by the different solvers, it was found that the solver developed using the matrix inversion method is suitable for supercomputers, while the Gauss elimination solver is suitable for conventional computers. They also compared blocking and non-blocking communication mechanisms and found that both are equally effective when incorporated in parallel solvers6. They further studied the effect of user activities on computational time and found that as user activities increase, the Real time also increases6. They introduced the term User time, which remains unaffected by user activities, and recommended that it be used to evaluate the performance of parallel programs6. A study was also carried out by Gupta, et al7 to explore the suitability of the C and FORTRAN 77 languages on supercomputers. A parallel solver using the matrix inversion method was developed on the PARAM 10000 machine using both C and FORTRAN 77 as programming languages. After comparing the computational time results, it was found that the program written in C takes less time than the program written in FORTRAN 77. It was also found that the percentage difference in User time obtained by the two programs was nearly 25% for every number of processors.

P K Gupta and R N Khapre are with Civil Engineering Group, Birla Institute of Technology and Science, Pilani, Rajasthan, 333 031. This paper was received on September 1, 2004. Written discussion on the paper will be entertained till August 31, 2005.

44 IE (I) Journal-CV
An attempt has been made through this paper to improve the previous work5,6. The parallel solver using the matrix inversion method, implemented on PARAM 10000 in the C language and presented by Gupta and Khapre5,6, is improved. The suitability of the matrix inversion method is also discussed exclusively for finite element analysis. A set of linear equations taken from the literature5 was solved using the developed solver. The computational time results obtained by the developed solver are discussed and compared with the previous literature5,6. The solver is then implemented in two finite element codes to solve the sets of linear equations they generate. A typical problem in each category is solved and the computational time variation is obtained. The speed-up achieved by both finite element codes is calculated and discussed.

PARAM 10000 ARCHITECTURE OVERVIEW
PARAM 10000 is a MIMD distributed memory machine, developed by the Centre for Development of Advanced Computing (C-DAC). The machine has four nodes, each having two UltraSPARC-II 64-bit RISC CPUs @ 400 MHz, with 2 MB external cache. Each processor has 512 MB main memory, extendable to 2 GB. PARAM 10000 has two interconnection networks, namely, PARAMNet and Fast Ethernet8.

MATRIX INVERSION METHOD
The matrix inversion method is one of the basic methods of solving a system of linear equations [A][x] = [B ]. In this method, the inverse of matrix [A] is computed and then multiplied with the vector [B ] to get the unknown vector [x]. The mathematical relation [A][A]-1 = [I ] is used to generate the matrix [A]-1. In the process of matrix inversion, an identity matrix [I ] is generated and row-wise operations are carried out on matrix [A] and matrix [I ] such that matrix [A] takes the form of matrix [I ] and matrix [I ] gets converted into matrix [A]-1.

In the finite element method, matrix [A] represents the global stiffness matrix, and vectors [B ] and [x] represent the global force vector and global displacement vector, respectively. The global stiffness matrix is a banded matrix in which only the elements inside the bandwidth BW are non-zero, and the rest of the elements inside its upper and lower triangles are zero. The appearance of the global stiffness matrix and the identity matrix is thus quite similar: both matrices contain zero elements in their upper and lower triangles (Figure 1). Hence, if one uses this method, computational effort can be saved. Further, zero elements also exist inside the bandwidth of the stiffness matrix; therefore, fewer computations are required to invert the global stiffness matrix.

Figure 1 Stiffness matrix and identity matrix

Global P {Number of Processors}
       n {Number of Equations}
       MyRank {Rank of the Processor}
       Rank {Rank of processor holding current row}
       start {Flag indicating starting row number for each processor}
       end {Flag indicating ending row number for each processor}
       i {Variable indicating current row}
       [I ] {Matrix indicating inverse of matrix [A]}

for all Pi where 0 < i < P do
    Set start
    Set end
endfor
for i = 0 to n - 1 step 1
    if diagonal of [A]i = 1.0
        continue
    else
        Set diagonal element of [A]i = 1.0
        Change elements of matrix [I ]i
    endif
    for all Pi where 0 < i < P do
        Find the Rank of the current row
        if MyRank = Rank
            Broadcast current row
        endif
    endfor
    for j = start to end step 1
        if [A]ij ≠ 0.0
            Change non-diagonal element of [A]ij = 0.0
            Change elements of matrix [I ]ij
        endif
    endfor
endfor
for i = start to end step 1
    Compute [x]i
endfor
for all Pi where 0 < i < P do
    Broadcast [x]i to all processors
endfor

Figure 2 Parallel algorithm for matrix inversion method
ALGORITHM
Initially, the range of data to be handled by each processor was decided. If the data distribution was not even, the remaining data was distributed to the processors with lower ranks. After proper data distribution among the processors, an identity matrix [I ] of the size of [A] was created by all processors. In the process of matrix inversion, row-wise operations were carried out: every non-diagonal element of matrix [A] was converted to zero and every diagonal element of matrix [A] was made unity. While doing this, the operations were skipped at locations where non-diagonal elements already had zero value and diagonal elements already had unity value. This helped in reducing the number of computations.

Whatever operations were carried out on matrix [A], the same operations were also carried out on matrix [I ] simultaneously. Each processor operated only on those rows which were designated to it, to achieve less computational time. After finding the inverse of matrix [A], the unknown vector [x] was calculated by multiplying [A]-1 with [B ]. At this juncture, each processor held only the elements of vector [x] belonging to its share. Each processor then broadcast these elements of vector [x] to all other processors, so that every processor had the complete vector [x]. Figure 2 shows the algorithm of matrix inversion on parallel computers.

… User time reduces up to 29% and 43%, respectively, when eight processors were employed. A sudden reduction in Real time can be observed from one processor to four processors. It can also be observed that after four processors the reduction in Real time is gradual but insignificant. It can further be observed that the variation of the reduction in percentage saving in User time with increasing number of processors is continuous, whereas in the case of Real time the variation is abrupt (a sudden fall of percentage saving at four processors). User activities are mainly responsible for such variations.

[Figure 3(a): anchorage zone loading, showing prestressing force Pk applied at eccentricity e between the centre of the end block and the centre of the anchorage plate]
CASE STUDY
The above-developed solver was implemented in two different finite element codes. One typical problem from each category was solved using these two codes, and the results of the different components of computational time were obtained and discussed.

Case I: Linear Finite Element Analysis
A problem of the anchorage zone in a prestressed post-tensioned concrete beam was analyzed5. The problem was considered as a two-dimensional plane stress problem (Figure 3(a)), and the beam was discretized using 4800 three-noded triangular elements with 2501 nodes (Figure 3(b)), resulting in a global stiffness matrix of size 5002 × 5002. The problem was analyzed by increasing the number of processors from one to five. Each processor required 480 MB of memory for every execution. Figure 4(a) shows the variation in the different components of computational time with increasing number of processors. It is observed that all components of computational time reduce considerably with the increase in the number of processors. Figure 4(b) shows the variation in the speed-up achieved by the FEM code. It shows an almost linear variation in the speed-up achieved in Real time as well as in User time. The maximum speed-up achieved was 2.8 for five processors. It can be observed that the Real time speed-up curve is just below the User time speed-up curve.

Case II: Non-linear Finite Element Analysis
A problem of simple compression of a solid cylinder10 having dimensions of 25 mm radius and 25 mm height was analyzed. The

Figure 5 Discretized cylinder and deformed/undeformed shape of solid cylinder (labels: axis of rotation, undeformed mesh, undeformed profile, deformed profile)
Figure 4 Variation in (a) computational time (Real, User, Communication); and (b) speed-up (Real, User, Ideal) with number of processors

Figure 6 Variation in (a) computational time (Real, User, Communication); and (b) speed-up (Real, User, Ideal) with number of processors