Académique Documents
Professionnel Documents
Culture Documents
~
Linear Equations So/tv:ar'e in a Fortre.n [i]nvironm ant
Y~c~c J. fTo~gc~rrc~
M a t h e m . a t [ c s and C o m p u t e r S c i e n c e Division
Argonne National Laboratory
A r g o n n e , ]liinois 604Z9
O c t o b e r g4, ! 9 8 3
A b s t r a c t - I n this n o t e we c o m p a r e a n u m b e r of d i f f e r e n t c o m p u t e r s y s t e m s
f o r solving d e n s e s y s t e m s of l i n e a r e q u a t i o n s u s i n g t h e LIN?ACK s o f t w a r e i n
a Fortran environment. There are about 50 computers compared, ranging
from a Cray X-MP to the 68000 based systems such as the Apollo and SUN
Workstations.
The t i m i n g i n f o r m a t i o n p r e s e n t e d h e r e s h o u l d in no w a y b e u s e d to j u d g e
t h e o v e r a l l p e r f o r m a n c e of a c o m p u t e r s y s t e m . The r e s u l t s r e f l e c t o n l y o n e
p r o b l e m a r e a : solving d e n s e s y s t e m s of e q u a t i o n s u s i n g t h e LINPACK [ i ] pro-
g r a m s in a F o r t r a n e n v i r o n m e n t .
The LINPACK p r o g r a m s c a n b e c h a r a c t e r i z e d as h a v i n g a h i g h p e r c e n t a g e of
floating point arithmetic operations. The routines involved in this timing study,
S G E F A and SGESL, use algorithms which are column oriented. By c o l u m n orlen-
tation we m e a n the programs usually references array "elements sequentia!!y
d o w n a column, no~ across a row. C o l u m n orientation is important in increasing
e~iciency in a Fortran environment because of the way in which arrays are
stored. ~ost
. . of r.~ floating
. . point
. .operations
. . in LINInACK _~.~_~j~m~,~a>~.._,~ . < ~ ~n~_a
set of subprograms called the Basic Linear Algebra <ub~'~'c...... (B!...AS) Fo]
These routines are called by the L ! N P A C K routines repeat.ed~y throughout the
calculation. The B!.AS reference one-dimension arrays, rather than two-
dimensional a r r a y s .
22
Solving a S y s t e m of l , i n c a r [:quaLions
w i t h L[NPACK a in FY:It P r e c i s i o n ~'
23
VAX ! I / 7 2 5 FPA VMS (Coded BI_AS) 266 .0,13 ~0 z 6
Apollo /- i. t t,~3 ( C o d e d BIAS) 323 ,0,%8 :.3. t 52.7
IBM 433] .]1 o p t = 3 32,:6 ,0 ..... P .6 :72.9
\LAX 1 1 / 7 3 0 F'PA VMS (Coded Bh.\S) :348 .076 !9.5 f~'J9
VAX 1 ! / 7 2 5 FPA V-FiS (Coded BLAS) 348 .036 !9.5 ([!.9
IB*," P C / 8 0 8 7 VS o p t = 3 391 . 0 "o"i P1.9
~ 03.7
Apollo 4.1 FEB 5559 .022 $!,3 '9!.2
f 'r r
M a s s c o m p ~,~Co00 UNIX, [77 2661 .
00,;6 ~,;9. ' "~"
~ O .
Co T~77zents."
The C r a y X-),~P t i m i n g s r e f l e c t o n l y one p r o c e s s o r .
24
VAX 11/730 FPA %~[S (Coded BIAS) 205 060 1 !.5 33 4
VAX 1 1 / 7 8 5 FPA ~ S (Coded BD\S) 205 .050 i !.5 33.4
B u r r o u g h s 6700 H 234 .05E 13. t 35.2
VA.X 11/730 FPA \'HIS ~59 .047 14.5 4~.2
V A X 11/725 FPA VMS 259 .047 i4.5 42..?
IBM P C / 8 0 8 7 VS o p t = 3 303 .040 i7.0 48.5
DEC KA-10 F40 805 .040 i7.! 49.8
Apollo 4.1 P E B 334 .037 18.7 54 5
IBM PC/8087 ~[icrosoft3.1 1071 .0!.I 60.0 !75.
M a s s c o m p MCS00 UNIX, f77 1245 .0099 69.7 203.
SUN UNIX, f77 1295 .005 72.5 2! 1.
IBM PC Microsoft 3.1 21875 .00058 1225. 3568.
c g b m p / / e r r e f e r s to t h e c o m p i l e r u s e d a n d (Coded BLAS) r e f e r s to t h e u s e of
a s s e m b l y l a n g u a g e coding of t h e BIAS.
d R ~ n is t h e n u m b e r of t i m e s f a s t e r or slower a p a r t i c u l a r m a c h i n e
c o n f i g u r a t i o n is w h e n c o m p a r e d to t h e C r a y - l S using a F o r t r a n c o d i n g for t h e
BIAS.
.1003
Un~ = 106T/me/[--~---+ lOOZ).
Jack J. Dongarra
Mathematics and C o m p u t e r Science Division Telephone: 312-972-7246
Argonne National Laboratory
Argonne, Illinois 60439 ARP.~Lnet: D O N O A R R A @ A N L - K C S
25
APPENDIX
The LINPACK routines used to generate the timings in the previous table do
not reflect the true performance of "advanced scientific computers". A different
implementation of the solution of linear equations, presented in a report by
Dongarra and Eisenstat [3], better describes the performance on such
machines. That algorithm is based on matrix-vector operations rather than just
vector operations. This produces a program that has a high level of modularity
or larger granularity, having the potential for better performance across a wide
rar~e of machines, especially on hi~,h performance computers. As before, a For-
tran p r o g r a m was run and the time to complete the solution of equations for a
matrix of order 300 is reported.
Note that these numbers are for a problem of order 300 and all runs are for
full precision.
Commcnls."
I The Cray X-:~.J timings reflect the use of two processor in solving the prob-
lem.
26
Th e m a j o r d i f T e r e n c e betel.con t h e Cray I-~< a n d , C r a y I-S is in t h e m e m o r y
s p e e d , t h e Crav i-\7 has slot'.or m e m o r y . The timings, show l:he Crav !<< to be
f a s t e r t h a n t h e Cray !-S. A ft er m u c h d i s c u s s i o n a n d e x a m i n a t i o n o!' t h e gen-
e r a t e d assembly lan.guage code it was determined theft, in fact, the Cray i-.q
was faster" for t hi s p r o g r a m . The code ~_.~...-=t_d .... r . . . . ~ by th,, ..~ c o m p i l e r c a u s e s t h e
C r a y I-S to m i s s a c h a i n - s l o t . On t h e Cray i - Z , b e c a u s e or s l o w e r m e m o r y ,
t h e c h a i n - s l o t is n o t m i s s e d , t h u s t h e r a s t e r e x e c u t i o n t i m e .
27