Vous êtes sur la page 1sur 469

Iterative Methods

for Sparse Linear Systems

Yousef Saad
6

15

12

11

14

10

13

Copyright c 2000 by Yousef Saad.


S ECOND EDITION WITH CORRECTIONS . JANUARY 3 RD , 2000.

PREFACE
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Suggestions for Teaching . . . . . . . . . . . . . . . . . . . . . . . . .

BACKGROUND IN LINEAR ALGEBRA


1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 Matrices . . . . . . . . . . . . . . . . . . . Square Matrices and Eigenvalues . . . . . . Types of Matrices . . . . . . . . . . . . . . Vector Inner Products and Norms . . . . . . Matrix Norms . . . . . . . . . . . . . . . . Subspaces, Range, and Kernel . . . . . . . . Orthogonal Vectors and Subspaces . . . . . Canonical Forms of Matrices . . . . . . . . 1.8.1 Reduction to the Diagonal Form . . 1.8.2 The Jordan Canonical Form . . . . . 1.8.3 The Schur Canonical Form . . . . . 1.8.4 Application to Powers of Matrices . 1.9 Normal and Hermitian Matrices . . . . . . . 1.9.1 Normal Matrices . . . . . . . . . . 1.9.2 Hermitian Matrices . . . . . . . . . 1.10 Nonnegative Matrices, M-Matrices . . . . . 1.11 Positive-Denite Matrices . . . . . . . . . . 1.12 Projection Operators . . . . . . . . . . . . . 1.12.1 Range and Null Space of a Projector 1.12.2 Matrix Representations . . . . . . . 1.12.3 Orthogonal and Oblique Projectors . 1.12.4 Properties of Orthogonal Projectors . 1.13 Basic Concepts in Linear Systems . . . . . . 1.13.1 Existence of a Solution . . . . . . . 1.13.2 Perturbation Analysis . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

DISCRETIZATION OF PDES
2.1 Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.2 The Convection Diffusion Equation . . . . . . . . . . . . . . .

 
xiii
xiv xv

1
1 3 4 6 8 9 10 15 15 16 17 19 21 21 24 26 30 33 33 35 35 37 38 38 39 41

44
44 45 47

2.2

Finite Difference Methods . . . . . . . . . . . . . . . . 2.2.1 Basic Approximations . . . . . . . . . . . . . . 2.2.2 Difference Schemes for the Laplacean Operator 2.2.3 Finite Differences for 1-D Problems . . . . . . 2.2.4 Upwind Schemes . . . . . . . . . . . . . . . . 2.2.5 Finite Differences for 2-D Problems . . . . . . 2.3 The Finite Element Method . . . . . . . . . . . . . . . 2.4 Mesh Generation and Renement . . . . . . . . . . . . 2.5 Finite Volume Method . . . . . . . . . . . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . .

SPARSE MATRICES
3.1 3.2 Introduction . . . . . . . . . . . . . . . . . Graph Representations . . . . . . . . . . . . 3.2.1 Graphs and Adjacency Graphs . . . 3.2.2 Graphs of PDE Matrices . . . . . . 3.3 Permutations and Reorderings . . . . . . . . 3.3.1 Basic Concepts . . . . . . . . . . . 3.3.2 Relations with the Adjacency Graph 3.3.3 Common Reorderings . . . . . . . . 3.3.4 Irreducibility . . . . . . . . . . . . 3.4 Storage Schemes . . . . . . . . . . . . . . . 3.5 Basic Sparse Matrix Operations . . . . . . . 3.6 Sparse Direct Solution Methods . . . . . . . 3.7 Test Problems . . . . . . . . . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

BASIC ITERATIVE METHODS


4.1 Jacobi, Gauss-Seidel, and SOR . . . . . . . . 4.1.1 Block Relaxation Schemes . . . . . . 4.1.2 Iteration Matrices and Preconditioning 4.2 Convergence . . . . . . . . . . . . . . . . . . 4.2.1 General Convergence Result . . . . . 4.2.2 Regular Splittings . . . . . . . . . . . 4.2.3 Diagonally Dominant Matrices . . . . 4.2.4 Symmetric Positive Denite Matrices 4.2.5 Property A and Consistent Orderings . 4.3 Alternating Direction Methods . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

PROJECTION METHODS
5.1 Basic Denitions and Algorithms . . 5.1.1 General Projection Methods 5.1.2 Matrix Representation . . . . General Theory . . . . . . . . . . . 5.2.1 Two Optimality Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

5.2

 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 48 49 51 51 54 55 61 63 66

68
68 70 70 72 72 72 75 75 83 84 87 88 88 91

95
95 98 102 104 104 107 108 112 112 116 119

122
122 123 124 126 126

5.2.2 Interpretation in Terms of Projectors 5.2.3 General Error Bound . . . . . . . . 5.3 One-Dimensional Projection Processes . . . 5.3.1 Steepest Descent . . . . . . . . . . 5.3.2 Minimal Residual (MR) Iteration . . 5.3.3 Residual Norm Steepest Descent . . 5.4 Additive and Multiplicative Processes . . . . Exercises and Notes . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

127 129 131 132 134 136 136 139

KRYLOV SUBSPACE METHODS PART I


6.1 6.2 6.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . Krylov Subspaces . . . . . . . . . . . . . . . . . . . . Arnoldis Method . . . . . . . . . . . . . . . . . . . . 6.3.1 The Basic Algorithm . . . . . . . . . . . . . . 6.3.2 Practical Implementations . . . . . . . . . . . . 6.4 Arnoldis Method for Linear Systems (FOM) . . . . . . 6.4.1 Variation 1: Restarted FOM . . . . . . . . . . . 6.4.2 Variation 2: IOM and DIOM . . . . . . . . . . 6.5 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.1 The Basic GMRES Algorithm . . . . . . . . . 6.5.2 The Householder Version . . . . . . . . . . . . 6.5.3 Practical Implementation Issues . . . . . . . . 6.5.4 Breakdown of GMRES . . . . . . . . . . . . . 6.5.5 Relations between FOM and GMRES . . . . . 6.5.6 Variation 1: Restarting . . . . . . . . . . . . . 6.5.7 Variation 2: Truncated GMRES Versions . . . . 6.6 The Symmetric Lanczos Algorithm . . . . . . . . . . . 6.6.1 The Algorithm . . . . . . . . . . . . . . . . . . 6.6.2 Relation with Orthogonal Polynomials . . . . . 6.7 The Conjugate Gradient Algorithm . . . . . . . . . . . 6.7.1 Derivation and Theory . . . . . . . . . . . . . 6.7.2 Alternative Formulations . . . . . . . . . . . . 6.7.3 Eigenvalue Estimates from the CG Coefcients 6.8 The Conjugate Residual Method . . . . . . . . . . . . 6.9 GCR, ORTHOMIN, and ORTHODIR . . . . . . . . . . 6.10 The Faber-Manteuffel Theorem . . . . . . . . . . . . . 6.11 Convergence Analysis . . . . . . . . . . . . . . . . . . 6.11.1 Real Chebyshev Polynomials . . . . . . . . . . 6.11.2 Complex Chebyshev Polynomials . . . . . . . 6.11.3 Convergence of the CG Algorithm . . . . . . . 6.11.4 Convergence of GMRES . . . . . . . . . . . . 6.12 Block Krylov Methods . . . . . . . . . . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

144
144 145 147 147 149 152 154 155 158 158 159 161 165 165 168 169 174 174 175 176 176 180 181 183 183 186 188 188 189 193 194 197 202

KRYLOV SUBSPACE METHODS PART II


7.1

205

Lanczos Biorthogonalization . . . . . . . . . . . . . . . . . . . . . . . 205

 

7.1.1 The Algorithm . . . . . . . . . . . . 7.1.2 Practical Implementations . . . . . . 7.2 The Lanczos Algorithm for Linear Systems . 7.3 The BCG and QMR Algorithms . . . . . . . 7.3.1 The Biconjugate Gradient Algorithm 7.3.2 Quasi-Minimal Residual Algorithm 7.4 Transpose-Free Variants . . . . . . . . . . . 7.4.1 Conjugate Gradient Squared . . . . 7.4.2 BICGSTAB . . . . . . . . . . . . . 7.4.3 Transpose-Free QMR (TFQMR) . . Exercises and Notes . . . . . . . . . . . . . . . .

METHODS RELATED TO THE NORMAL EQUATIONS


8.1 8.2 The Normal Equations . . . . . . . . . . . . . Row Projection Methods . . . . . . . . . . . 8.2.1 Gauss-Seidel on the Normal Equations 8.2.2 Cimminos Method . . . . . . . . . . 8.3 Conjugate Gradient and Normal Equations . . 8.3.1 CGNR . . . . . . . . . . . . . . . . . 8.3.2 CGNE . . . . . . . . . . . . . . . . . 8.4 Saddle-Point Problems . . . . . . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

PRECONDITIONED ITERATIONS
9.1 9.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . Preconditioned Conjugate Gradient . . . . . . . . . . . 9.2.1 Preserving Symmetry . . . . . . . . . . . . . . 9.2.2 Efcient Implementations . . . . . . . . . . . . 9.3 Preconditioned GMRES . . . . . . . . . . . . . . . . . 9.3.1 Left-Preconditioned GMRES . . . . . . . . . . 9.3.2 Right-Preconditioned GMRES . . . . . . . . . 9.3.3 Split Preconditioning . . . . . . . . . . . . . . 9.3.4 Comparison of Right and Left Preconditioning . 9.4 Flexible Variants . . . . . . . . . . . . . . . . . . . . . 9.4.1 Flexible GMRES . . . . . . . . . . . . . . . . 9.4.2 DQGMRES . . . . . . . . . . . . . . . . . . . 9.5 Preconditioned CG for the Normal Equations . . . . . . 9.6 The CGW Algorithm . . . . . . . . . . . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

10 PRECONDITIONING TECHNIQUES
10.1 Introduction . . . . . . . . . . . . . . . 10.2 Jacobi, SOR, and SSOR Preconditioners 10.3 ILU Factorization Preconditioners . . . 10.3.1 Incomplete LU Factorizations . . 10.3.2 Zero Fill-in ILU (ILU(0)) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 208 210 210 211 212 214 215 217 221 227

230
230 232 232 234 237 237 238 240 243

245
245 246 246 249 251 251 253 254 255 256 256 259 260 261 263

265
265 266 269 270 275

10.3.3 Level of Fill and ILU( ) . . . . . . . . . . . . 10.3.4 Matrices with Regular Structure . . . . . . . 10.3.5 Modied ILU (MILU) . . . . . . . . . . . . 10.4 Threshold Strategies and ILUT . . . . . . . . . . . . 10.4.1 The ILUT Approach . . . . . . . . . . . . . 10.4.2 Analysis . . . . . . . . . . . . . . . . . . . . 10.4.3 Implementation Details . . . . . . . . . . . . 10.4.4 The ILUTP Approach . . . . . . . . . . . . . 10.4.5 The ILUS Approach . . . . . . . . . . . . . . 10.5 Approximate Inverse Preconditioners . . . . . . . . . 10.5.1 Approximating the Inverse of a Sparse Matrix 10.5.2 Global Iteration . . . . . . . . . . . . . . . . 10.5.3 Column-Oriented Algorithms . . . . . . . . . 10.5.4 Theoretical Considerations . . . . . . . . . . 10.5.5 Convergence of Self Preconditioned MR . . . 10.5.6 Factored Approximate Inverses . . . . . . . . 10.5.7 Improving a Preconditioner . . . . . . . . . . 10.6 Block Preconditioners . . . . . . . . . . . . . . . . . 10.6.1 Block-Tridiagonal Matrices . . . . . . . . . . 10.6.2 General Matrices . . . . . . . . . . . . . . . 10.7 Preconditioners for the Normal Equations . . . . . . 10.7.1 Jacobi, SOR, and Variants . . . . . . . . . . . 10.7.2 IC(0) for the Normal Equations . . . . . . . . 10.7.3 Incomplete Gram-Schmidt and ILQ . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

278 281 286 287 288 289 292 294 296 298 299 299 301 303 305 307 310 310 311 312 313 313 314 316 319

11 PARALLEL IMPLEMENTATIONS
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . 11.2 Forms of Parallelism . . . . . . . . . . . . . . . . . 11.2.1 Multiple Functional Units . . . . . . . . . . 11.2.2 Pipelining . . . . . . . . . . . . . . . . . . 11.2.3 Vector Processors . . . . . . . . . . . . . . 11.2.4 Multiprocessing and Distributed Computing 11.3 Types of Parallel Architectures . . . . . . . . . . . 11.3.1 Shared Memory Computers . . . . . . . . . 11.3.2 Distributed Memory Architectures . . . . . 11.4 Types of Operations . . . . . . . . . . . . . . . . . 11.4.1 Preconditioned CG . . . . . . . . . . . . . 11.4.2 GMRES . . . . . . . . . . . . . . . . . . . 11.4.3 Vector Operations . . . . . . . . . . . . . . 11.4.4 Reverse Communication . . . . . . . . . . 11.5 Matrix-by-Vector Products . . . . . . . . . . . . . 11.5.1 The Case of Dense Matrices . . . . . . . . 11.5.2 The CSR and CSC Formats . . . . . . . . . 11.5.3 Matvecs in the Diagonal Format . . . . . . 11.5.4 The Ellpack-Itpack Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

324
324 325 325 326 326 326 327 327 329 331 332 332 333 334 335 335 336 339 340

 

11.5.5 The Jagged Diagonal Format . . . . . . . . . . 11.5.6 The Case of Distributed Sparse Matrices . . . . 11.6 Standard Preconditioning Operations . . . . . . . . . . 11.6.1 Parallelism in Forward Sweeps . . . . . . . . . 11.6.2 Level Scheduling: the Case of 5-Point Matrices 11.6.3 Level Scheduling for Irregular Graphs . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . .

12 PARALLEL PRECONDITIONERS
12.1 Introduction . . . . . . . . . . . . . . . . . . . . 12.2 Block-Jacobi Preconditioners . . . . . . . . . . . 12.3 Polynomial Preconditioners . . . . . . . . . . . . 12.3.1 Neumann Polynomials . . . . . . . . . . 12.3.2 Chebyshev Polynomials . . . . . . . . . . 12.3.3 Least-Squares Polynomials . . . . . . . . 12.3.4 The Nonsymmetric Case . . . . . . . . . 12.4 Multicoloring . . . . . . . . . . . . . . . . . . . 12.4.1 Red-Black Ordering . . . . . . . . . . . . 12.4.2 Solution of Red-Black Systems . . . . . . 12.4.3 Multicoloring for General Sparse Matrices 12.5 Multi-Elimination ILU . . . . . . . . . . . . . . . 12.5.1 Multi-Elimination . . . . . . . . . . . . . 12.5.2 ILUM . . . . . . . . . . . . . . . . . . . 12.6 Distributed ILU and SSOR . . . . . . . . . . . . 12.6.1 Distributed Sparse Matrices . . . . . . . . 12.7 Other Techniques . . . . . . . . . . . . . . . . . 12.7.1 Approximate Inverses . . . . . . . . . . . 12.7.2 Element-by-Element Techniques . . . . . 12.7.3 Parallel Row Projection Preconditioners . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

13 DOMAIN DECOMPOSITION METHODS


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 13.1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . 13.1.2 Types of Partitionings . . . . . . . . . . . . . . . . 13.1.3 Types of Techniques . . . . . . . . . . . . . . . . . 13.2 Direct Solution and the Schur Complement . . . . . . . . . 13.2.1 Block Gaussian Elimination . . . . . . . . . . . . 13.2.2 Properties of the Schur Complement . . . . . . . . 13.2.3 Schur Complement for Vertex-Based Partitionings . 13.2.4 Schur Complement for Finite-Element Partitionings 13.3 Schwarz Alternating Procedures . . . . . . . . . . . . . . . 13.3.1 Multiplicative Schwarz Procedure . . . . . . . . . 13.3.2 Multiplicative Schwarz Preconditioning . . . . . . 13.3.3 Additive Schwarz Procedure . . . . . . . . . . . . 13.3.4 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

 

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

341 342 345 346 346 347 350

353
353 354 356 356 357 360 363 365 366 367 368 369 370 371 374 374 376 377 377 379 380

383
383 384 385 386 388 388 389 390 393 395 395 400 402 404

13.4 Schur Complement Approaches . . . . . . . . . . . . . . . 13.4.1 Induced Preconditioners . . . . . . . . . . . . . . . 13.4.2 Probing . . . . . . . . . . . . . . . . . . . . . . . 13.4.3 Preconditioning Vertex-Based Schur Complements 13.5 Full Matrix Methods . . . . . . . . . . . . . . . . . . . . . 13.6 Graph Partitioning . . . . . . . . . . . . . . . . . . . . . . 13.6.1 Basic Denitions . . . . . . . . . . . . . . . . . . 13.6.2 Geometric Approach . . . . . . . . . . . . . . . . 13.6.3 Spectral Techniques . . . . . . . . . . . . . . . . . 13.6.4 Graph Theory Techniques . . . . . . . . . . . . . . Exercises and Notes . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

408 408 410 411 412 414 414 415 417 418 422

REFERENCES INDEX

425 439

 

Iterative methods for solving general, large sparse linear systems have been gaining popularity in many areas of scientic computing. Until recently, direct solution methods were often preferred to iterative methods in real applications because of their robustness and predictable behavior. However, a number of efcient iterative solvers were discovered and the increased need for solving very large linear systems triggered a noticeable and rapid shift toward iterative techniques in many applications. This trend can be traced back to the 1960s and 1970s when two important developments revolutionized solution methods for large linear systems. First was the realization that one can take advantage of sparsity to design special direct methods that can be quite economical. Initiated by electrical engineers, these direct sparse solution methods led to the development of reliable and efcient general-purpose direct solution software codes over the next three decades. Second was the emergence of preconditioned conjugate gradient-like methods for solving linear systems. It was found that the combination of preconditioning and Krylov subspace iterations could provide efcient and simple generalpurpose procedures that could compete with direct solvers. Preconditioning involves exploiting ideas from sparse direct solvers. Gradually, iterative methods started to approach the quality of direct solvers. In earlier times, iterative methods were often special-purpose in nature. They were developed with certain applications in mind, and their efciency relied on many problem-dependent parameters. Now, three-dimensional models are commonplace and iterative methods are almost mandatory. The memory and the computational requirements for solving threedimensional Partial Differential Equations, or two-dimensional ones involving many degrees of freedom per point, may seriously challenge the most efcient direct solvers available today. Also, iterative methods are gaining ground because they are easier to implement efciently on high-performance computers than direct methods. My intention in writing this volume is to provide up-to-date coverage of iterative methods for solving large sparse linear systems. I focused the book on practical methods that work for general sparse matrices rather than for any specic class of problems. It is indeed becoming important to embrace applications not necessarily governed by Partial Differential Equations, as these applications are on the rise. Apart from two recent volumes by Axelsson [15] and Hackbusch [116], few books on iterative methods have appeared since the excellent ones by Varga [213]. and later Young [232]. Since then, researchers and practitioners have achieved remarkable progress in the development and use of effective iterative methods. Unfortunately, fewer elegant results have been discovered since the 1950s and 1960s. The eld has moved in other directions. Methods have gained not only in efciency but also in robustness and in generality. The traditional techniques which required

rather complicated procedures to determine optimal acceleration parameters have yielded to the parameter-free conjugate gradient class of methods. The primary aim of this book is to describe some of the best techniques available today, from both preconditioners and accelerators. One of the aims of the book is to provide a good mix of theory and practice. It also addresses some of the current research issues such as parallel implementations and robust preconditioners. The emphasis is on Krylov subspace methods, currently the most practical and common group of techniques used in applications. Although there is a tutorial chapter that covers the discretization of Partial Differential Equations, the book is not biased toward any specic application area. Instead, the matrices are assumed to be general sparse, possibly irregularly structured. The book has been structured in four distinct parts. The rst part, Chapters 1 to 4, presents the basic tools. The second part, Chapters 5 to 8, presents projection methods and Krylov subspace techniques. The third part, Chapters 9 and 10, discusses preconditioning. The fourth part, Chapters 11 to 13, discusses parallel implementations and parallel algorithms.

I am grateful to a number of colleagues who proofread or reviewed different versions of the manuscript. Among them are Randy Bramley (University of Indiana at Bloomingtin), Xiao-Chuan Cai (University of Colorado at Boulder), Tony Chan (University of California at Los Angeles), Jane Cullum (IBM, Yorktown Heights), Alan Edelman (Massachussett Institute of Technology), Paul Fischer (Brown University), David Keyes (Old Dominion University), Beresford Parlett (University of California at Berkeley) and Shang-Hua Teng (University of Minnesota). Their numerous comments, corrections, and encouragements were a highly appreciated contribution. In particular, they helped improve the presentation considerably and prompted the addition of a number of topics missing from earlier versions. This book evolved from several successive improvements of a set of lecture notes for the course Iterative Methods for Linear Systems which I taught at the University of Minnesota in the last few years. I apologize to those students who used the earlier error-laden and incomplete manuscripts. Their input and criticism contributed signicantly to improving the manuscript. I also wish to thank those students at MIT (with Alan Edelman) and UCLA (with Tony Chan) who used this book in manuscript form and provided helpful feedback. My colleagues at the university of Minnesota, staff and faculty members, have helped in different ways. I wish to thank in particular Ahmed Sameh for his encouragements and for fostering a productive environment in the department. Finally, I am grateful to the National Science Foundation for their continued nancial support of my research, part of which is represented in this work. Yousef Saad

) (   % # !      '&$" 



This book can be used as a text to teach a graduate-level course on iterative methods for linear systems. Selecting topics to teach depends on whether the course is taught in a mathematics department or a computer science (or engineering) department, and whether the course is over a semester or a quarter. Here are a few comments on the relevance of the topics in each chapter. For a graduate course in a mathematics department, much of the material in Chapter 1 should be known already. For non-mathematics majors most of the chapter must be covered or reviewed to acquire a good background for later chapters. The important topics for the rest of the book are in Sections: 1.8.1, 1.8.3, 1.8.4, 1.9, 1.11. Section 1.12 is best treated at the beginning of Chapter 5. Chapter 2 is essentially independent from the rest and could be skipped altogether in a quarter course. One lecture on nite differences and the resulting matrices would be enough for a non-math course. Chapter 3 should make the student familiar with some implementation issues associated with iterative solution procedures for general sparse matrices. In a computer science or engineering department, this can be very relevant. For mathematicians, a mention of the graph theory aspects of sparse matrices and a few storage schemes may be sufcient. Most students at this level should be familiar with a few of the elementary relaxation techniques covered in Chapter 4. The convergence theory can be skipped for non-math majors. These methods are now often used as preconditioners and this may be the only motive for covering them. Chapter 5 introduces key concepts and presents projection techniques in general terms. Non-mathematicians may wish to skip Section 5.2.3. Otherwise, it is recommended to start the theory section by going back to Section 1.12 on general denitions on projectors. Chapters 6 and 7 represent the heart of the matter. It is recommended to describe the rst algorithms carefully and put emphasis on the fact that they generalize the one-dimensional methods covered in Chapter 5. It is also important to stress the optimality properties of those methods in Chapter 6 and the fact that these follow immediately from the properties of projectors seen in Section 1.12. When covering the algorithms in Chapter 7, it is crucial to point out the main differences between them and those seen in Chapter 6. The variants such as CGS, BICGSTAB, and TFQMR can be covered in a short time, omitting details of the algebraic derivations or covering only one of the three. The class of methods based on the normal equation approach, i.e., Chapter 8, can be skipped in a math-oriented course, especially in the case of a quarter system. For a semester course, selected topics may be Sections 8.1, 8.2, and 8.4. Currently, preconditioning is known to be the critical ingredient in the success of iterative methods in solving real-life problems. Therefore, at least some parts of Chapter 9 and Chapter 10 should be covered. Section 9.2 and (very briey) 9.3 are recommended. From Chapter 10, discuss the basic ideas in Sections 10.1 through 10.3. The rest could be skipped in a quarter course. Chapter 11 may be useful to present to computer science majors, but may be skimmed or skipped in a mathematics or an engineering course. Parts of Chapter 12 could be taught primarily to make the students aware of the importance of alternative preconditioners. Suggested selections are: 12.2, 12.4, and 12.7.2 (for engineers). Chapter 13 presents an im-

   (    ( )  # ) # ) #



portant research area and is primilarily geared to mathematics majors. Computer scientists or engineers may prefer to cover this material in less detail. To make these suggestions more specic, the following two tables are offered as sample course outlines. Numbers refer to sections in the text. A semester course represents approximately 30 lectures of 75 minutes each whereas a quarter course is approximately 20 lectures of 75 minutes each. Different topics are selected for a mathematics course and a non-mathematics course.

Weeks 13

46

79

10 12

13 15

Weeks 12 34 56 78 9 10


Semester course Mathematics Computer Science / Eng. 1.9 1.13 1.1 1.6 (Read) 2.1 2.5 1.7 1.13, 2.1 2.2 3.1 3.3, 3.7 3.1 3.7 4.1 4.3 4.1 4.2 5. 1 5.4 5.1 5.2.1 6.1 6.3 6.1 6.3 6.4 6.7 (Except 6.5.2) 6.4 6.5 (Except 6.5.5) 6.9 6.11 6.7.1, 6.86.9, 6.11.3. 7.1 7.3 7.1 7.3 7.4.1; 7.4.2 7.4.3 (Read) 7.4.1; 7.4.2 7.4.3 (Read) 8.1, 8.2, 8.4; 9.1 9.3 8.1 8.3; 9.1 9.3 10.1 10.3 10.1 10.4 10.5.1 10.5.6 10.5.1 10.5.4 10.6 ; 12.2 12.4 11.1 11.4 (Read); 11.5 11.6 13.1 13.6 12.1 12.2; 12.4 12.7 Quarter course Mathematics Computer Science / Eng. 1.9 1.13, 3.1 3.2 1.1 1.6 (Read); 3.1 3.7 4.1 4.3 4.1 5.1 5.4 5.1 5.2.1 6.1 6.4 6.1 6.3 6.4 6.7 (Except 6.5.2) 6.4 6.5 (Except 6.5.5) 6.11, 7.1 7.3 6.7.1, 6.11.3, 7.1 7.3 7.4.1; 7.4.2 7.4.3 (Read) 7.4.1; 7.4.2 7.4.3 (Read) 9.1 9.3; 10.1 10.3 9.1 9.3; 10.1 10.3 10.6 ; 12.2 12.4 11.1 11.4 (Read); 11.5 11.6 13.1 13.4 12.1 12.2; 12.4 12.7

)1)uCYyG ~ 1)1u`d H  H H  h~  z  ~ i1)1}d ~ 111)dhYp  !~  h!~


, where , and are matrices of size

For the sake of generality, all vector spaces considered in this chapter are complex, unless otherwise stated. A complex matrix is an array of complex numbers

f6 54YiF"i9YsF38C"b`76i&bYF P$5470)4iFx 4 E 0 & 0 4 %tY3F)FD16C"4 g g 4 gE E $ 16T"Di9HE%$zTf361AEvv3C"4 $38F%9 AW16C"yA8 qFDR H38CF16C"4SYrF& 5e&i9 AHEAW%$iF10T"C"3E& T4$ gS3f3W0 6 54Sa4 )E70 V6C&)8SoHE$e$vkg3W V65RH8%$2 )W16g `TvtsV" 6 F e68 %8xAEAf2 1W VpW%8xgC&00Y5II 6 )8$rFs476t544 18e6321210)EVF)"V6)8C8gX&2' g$%166`C"C"1'v)R1E4ue6HEb)8$ dr6C16i&q65B18AgEY160`TF)BC'@CF' e61'k4gi&f 3W453g0 AE$Y)8f r0sF76H4XS380A88 54)FV6 vHF2 16@#r&"$16C& 5B6 `t(&W10DFW" 9i3HE)"t'$ rFhYSCF"%$4 & $P'&I 66 p 4 4 F 9s 'w6`'P4P$FE5w4r32%6 W3P0e6P$sf1Ef54 q65I18iFpxoE6 @V6$ %83nv0AW0 r&P$54H4)2%8`61'0 E1RCFY76E FC&$16i4#m" C0 P$T4`tV6e&)8Ar'&iF$T"#SC"166T4 (4 le632rVFk84r5jER Yi9F P$HEP$54& )0i$8%50h1W0T&q68D'6` $' C8P$541&Sw1V3E%8g5Ii&rFYF4E $`4iV610%8`0g e69 FX 010f b 0 0 e5W))E16W `63WTB%6610cF A16P$43F5B$10%F1dT&8"Dv0 76C&i&tV6 V0H83W%8%$101fAEg e6`b3W0 9V6768 C&0 e&%816AC0T"iF16#C"4 HE$T438S$R6W F HEDW 163W1B `0$3Ei4YvV0' 0 f )8i$& P$r6'`i&4 6%$# 3WC2)RUF&10AE`0& P$673Wi&540`)E3W' r6$10C&i43EV6S00 8%Db62 )`16r&1'T"54)Ei4)2%6S`60 s9)81'E 165BHECFDCFbe616XC"' e9`1664 #)R"vGFF5Iiet5%EAEvS%FF30 )RfF EU $ Y'E W W Y 9 V0 516V0%8S1f0 C"xe64 549 3W' gP$di&0 iF10Y%"8 16f C0i9g)RHEVF$y6H4T"38F cETIb%$$ `a&TB7654V6i&q`68)21'cYFE0 E C"P$TF454X70' #5414xAEI iFi&THEE10B A$6 H)WT9%8V6e6r0U)Ew164SSt4v)`6&s1R)W)8Q541632PIT"610$%4DH)"B )8S16' C&6`)8)B1'r6G0i4FAqDpE0AEW sFC854@C&$ pEB16$ 3WagA$1TR@AE9 76i&)8h7h54g0 6 HA8V632%8TF10cC"160 )"(c')"' 4 & %$%$ SC8"$#I4 W g $ 8 f54 6 uT" F 6 67 g "

The set of all matrices is a complex vector space denoted by operations with matrices are the following:

!         

Addition:

and . The main

)  ( % 

{i}{ |

Multiplication by a scalar:

Multiplication by another matrix:

where

Sometimes, a notation with column vectors and row vectors is used. The column vector is the vector consisting of the -th column of ,

Similarly, the notation

For example, the following could be written

or

The transpose of a matrix in is a matrix in whose elements are dened by . It is denoted by . It is often more relevant to use the transpose conjugate matrix denoted by and dened by

in which the bar denotes the (element-wise) complex conjugation. Matrices are strongly related to linear mappings between vector spaces of nite dimension. This is because they represent these mappings with respect to two given bases: one for the initial vector space and the other for the image vector space, or range of .

)11zCYG ~ 11)u`@ H q p    u @  u n 
will denote the -th row of the matrix

G G R I Q IP GH ubt ~ i)11Y zd i1)1Y (P T p 2 A 5

   
, where , and

C D C1)116 C 0 w%2 B

C 2 T1)1) E62 T 0 2 w% B

0 ' ' 1)( &' H #$j% #$bj" zj! (

6 9 2F7 4 99 8 2 0 344

. . .

@8 0 99

6 7

344

 2


2 

A matrix is square if it has the same number of columns and rows, i.e., if important square matrix is the identity matrix


. An

where is the Kronecker symbol. The identity matrix satises the equality for every matrix of size . The inverse of a matrix, when it exists, is a matrix

such that

The inverse of is denoted by . The determinant of a matrix may be dened in several ways. For simplicity, the folmatrix is dened lowing recursive denition is used here. The determinant of a as the scalar . Then the determinant of an matrix is given by

where is an matrix obtained by deleting the rst row and the -th column of . A matrix is said to be singular when and nonsingular otherwise. We have the following simple properties: . .
7 8 1 ) 20( 1 ) 90( 1 ) 90( 1 ) 90( 1 ) 90(

From the above denition of determinants it can be shown by induction that the function that maps a given complex value to the value is a polynomial of degree ; see Exercise 8. This is known as the characteristic polynomial of the matrix .

A complex scalar is called an eigenvalue of the square matrix if a nonzero vector of exists such that . The vector is called an eigenvector of associated with . The set of all the eigenvalues of is called the spectrum of and is denoted by .
A scalar is an eigenvalue of if and only if . That is true if and only if (iff thereafter) is a root of the characteristic polynomial. In particular, there are at most distinct eigenvalues. It is clear that a matrix is singular if and only if it admits zero as an eigenvalue. A well known result in linear algebra is stated in the following proposition.

A matrix

is nonsingular if and only if it admits an inverse.

wB

7 a

@ 3

A B B

~ C B C B t q ~ B C B C C C B C B C B  CC Q1 BB G C } B B C ( BB C B C ~B  C ~B 0 C 0 B 0 0 C B 1 ( & C B 0 ~ h~

Y `

1 ) 26(

@ 3

1 ) 26(

)   " #    

A B

1 ) 20(

( x

" %%%" $$$

V U@

0 " !  1( #   

 $ 
5 4

% H

V W

3 

u n u z d $  d 
0

! $ ) ( %  )  
& '

20( 1 )

1 ) 90(

T U

1 ) 20(

T U

H QF F Q SRI cbP` ` Q

SRFP I#GDE C H Q F HF

V G S

 90( 1 )

 
 1 ) 90( 1 ) 90(

| }{ p

Thus, the determinant of a matrix determines whether or not the matrix admits an inverse. The maximum modulus of the eigenvalues is called spectral radius and is denoted by

The trace of a matrix is equal to the sum of all its diagonal elements

It can be easily shown that the trace of is also equal to the sum of the eigenvalues of counted with their multiplicities as roots of the characteristic polynomial.

When a distinction is necessary, an eigenvector of is often called a right eigenvector. Therefore, the eigenvalue as well as the right and left eigenvectors, and , satisfy the relations

or, equivalently,
V

The choice of a method for solving linear systems will often depend on the structure of the matrix . One of the most important properties of matrices is symmetry, because of its impact on the eigenstructure of . A number of other classes of matrices also have particular eigenstructures. The most important ones are listed below: Symmetric matrices:

Hermitian matrices:

Skew-symmetric matrices:

Skew-Hermitian matrices:

Normal matrices:

Nonnegative matrices: positive, and negative matrices). Unitary matrices:




(similar denition for nonpositive,

!Q  I I Q I I I I

0 1I 0 ~i)11YGa`gIp X)H (H ( I H I HG H I H G

3 

VU@

VU@

)  ( %  )  

I H

eigenvector of

If is an eigenvalue of , then is an eigenvalue of . An associated with the eigenvalue is called a left eigenvector of .

I H

   

Q@

 B    C
0 q 1 ( & C B 
Q@
@

u @  u n 
1 @

U SRFP cbP` ` T H Q F Q Q

q G S

&'{ |

$ %"#(

It is worth noting that a unitary matrix , since

is a matrix whose inverse is its transpose conjugate

A matrix such that is diagonal is often called orthogonal. Some matrices have particular structures that are often convenient for computational purposes. The following list, though incomplete, gives an idea of these special matrices which play an important role in numerical analysis and scientic computing applications. Diagonal matrices: for
(

. Notation:

Upper triangular matrices:

for for

Lower triangular matrices:

Upper bidiagonal matrices:

for for

or or

. .

Lower bidiagonal matrices:

Tridiagonal matrices:

for any pair


" # (

such that
4 # "

. Notation:

Banded matrices: only if nonnegative integers. The number

, where and are two is called the bandwidth of . such that . Lower

Upper Hessenberg matrices: for any pair Hessenberg matrices can be dened similarly.
V V

Outer product matrices:

, where both

and

are vectors.

Permutation matrices: the columns of identity matrix.

are a permutation of the columns of the

Block diagonal matrices: generalizes the diagonal matrix by replacing each diagonal entry by a matrix. Notation:

Block tridiagonal matrices: generalizes the tridiagonal matrix by replacing each nonzero entry by a square matrix. Notation:

The above properties emphasize structure, i.e., positions of the nonzero elements with respect to the zeros. Also, they assume that there are many zero elements or that the matrix is of low rank. This is in contrast with the classications listed earlier, such as symmetry or normality.

 I  &  a` H % ! D"# ! p % " ! " %  # $"  H C 0 TTVT 0 kB      Q  !  a` (Hz s s XHz   s s H    l p   p C  C1)11E67C 0 0 kB   6 

3 P

C 0 V 0 B ( ' ) 

C A i)11 6 6 0 0 !( ' B 

3 G

4 "

I0 0
0
& & &

"

 

3 B

a 7

a 7

R 1 (

7 8

0I0
7 8 8 7

   

0 1I 0

u d

IP

as can be readily shown. A useful relation satised by any inner product is the so-called Cauchy-Schwartz inequality:

7 8

"

7 a

7 8

"

gB

Note that (2) implies that is real and therefore, (3) adds the constraint that must also be positive for any nonzero . For any and ,

z C g B ( C xB C g B C z B C pB z C x B p C z g B C g B xB
7 a

z 
7



p
7

 

gB

6 @ 0 @



 "

 

x C g B C B C 0 B 0 C Y6
 

C B 6
   

6 @

) %  

!  $$ )( ! 

   

An inner product on a (complex) vector space

is any mapping from

C gB

u @  u n 

Similarly, for any . Hence, the condition (3) can be rewritten as

which yields the result (1.2). In the particular case of the vector space , a canonical inner product is the Euclidean inner product. The Euclidean inner product of two vectors and
 

"$$$ %%%%"

 

 

"

 "

@ 3


@ 3


"

# 7 C

@ 3

@ 3

 "

$@

 

'@

 

y B C g B
7
(  

@ 3

@ 3

@ P 3

@ P 3

&

01( g % B C C xB B C x B C C 6 C  B  l C x B C B C B 6  g B  6 g B  ( C y B  C B 6   C B C g B Q C g B C C B C B C g B # 6 C x B 
 " % 

a 7

 "

# 7

C C C

x B

$#

x B

 !

6 @

0 @B

x B

 

| {

$  "    (


If

The proof of this inequality begins by expanding ,

which satises the following conditions:

then the inequality is trivially satised. Assume that . Then shows the above equality is positive denite, i.e., is Hermitian, i.e., is linear with respect to , i.e., and iff

for any and . In particular

using the properties of

and take into

of

is dened by

which is often rewritten in matrix notation as


 "

It is easy to verify that this mapping does indeed satisfy the three conditions required for inner products, listed above. A fundamental property of the Euclidean inner product in matrix computations is the simple relation

The proof of this is straightforward. The adjoint of with respect to an arbitrary inner product is a matrix such that for all pairs of vectors and . A matrix is self-adjoint, or Hermitian with respect to this inner product, if it is equal to its adjoint. The following proposition is a consequence of the equality (1.5).

Unitary matrices preserve the Euclidean inner product, i.e.,

for any unitary matrix


Indeed,

and any vectors


"

and .
"

A vector norm on a vector space is a real-valued function satises the following three conditions: and iff .
7 a

on

, which

For the particular case when , we can associate with the inner product (1.3) the Euclidean norm of a complex vector dened by

It follows from Proposition 1.3 that a unitary matrix preserves the Euclidean norm metric, i.e.,

The linear transformation associated with a unitary matrix is therefore an isometry. The most commonly used vector norms in numerical linear algebra are special cases of the H lder norms o

t t t t

  

C C xB g B S x C I g % C x B B

0 # #   1 ( &   0

g 36   X6  0 



PYQ y 1 ( & C x B 0
C

6 0 C



 

x I C gB

B g % C 0 x 0 B




0 g B C 1I 0 g B C 0 g 0 B

a 7

x B X6   t #   x                z ( 

n d
  



 

   
 

j 01( C B G u
U T


H QF F Q SRI cbP` ` Q

" %%%" $$$

V G S

 '#

 

"



Note that the limit of when tends to innity exists and is equal to the maximum modulus of the s. This denes a norm denoted by . The cases , , and lead to the most important norms in practice,

The Cauchy-Schwartz inequality of (1.2) becomes




For a general matrix

in

, we dene the following special set of norms

and

iff

The most important cases are again those associated with . The case is of particular interest and the associated norm is simply denoted by and called a -norm. A fundamental property of a -norm is that

an immediate consequence of the denition (1.7). Matrix norms that satisfy the above property are sometimes called consistent. A result of consistency is that for any square matrix ,

In particular the matrix converges to zero if any of its -norms is less than 1. The Frobenius norm of a matrix is dened by

This can be viewed as the 2-norm of the column (or row) vector in consisting of all the columns (respectively rows) of listed from to (respectively to .) It can be shown
3 4

 

~ X

) 0

0 A  H  10 ( & 1( & 35  

7 8

 6 0 8

 

The norm is induced by the two norms properties of norms, i.e.,


 

and

. These norms satisfy the usual

#   #   # #  (  #   uCY l u  #  zyj             zS   p (   # #       #   ( #

36   6   #  C x B  10 (  %   0 X6  6 6 0 6   e 6   6      6   0  0 
 

   
 

'#  





& $ '% " #!  "

u @  u n 
#


 ' 

" %%%" $$$



 

'

 

)    %

 

 

( %

| {

that this norm is also consistent, in spite of the fact that it is not induced by a pair of vector norms, i.e., it is not derived from a formula of the form (1.7); see Exercise 5. However, it does not satisfy some of the other properties of the -norms. For example, the Frobenius norm of the identity matrix is not equal to one. To avoid these difculties, we will only use the term matrix norm for a norm that is induced by two norms as in the denition (1.7). Thus, we will not consider the Frobenius norm to be a proper matrix norm, according to our conventions, even though it is consistent. The following equalities satised by the matrix norms dened above lead to alternative denitions that are often easier to work with:

As will be shown later, the eigenvalues of are nonnegative. Their square roots are called singular values of and are denoted by . Thus, the relation (1.11) states that is equal to , the largest singular value of .

From the relation (1.11), it is clear that the spectral radius is equal to the 2-norm of a matrix when the matrix is Hermitian. However, it is not a matrix norm in general. For example, the rst property of norms is not satised, since for

we have and

while . Also, the triangle inequality is not satised for the pair , where is dened above. Indeed, while

A subspace of is a subset of that is also a complex vector space. The set of all linear combinations of a set of vectors of is a vector subspace called the linear span of ,

&

0 i) 11Y P DP I C I ( B  C I B   6 60 0 C I X B 6 6 0 0 C I 0B 6   H  0 ( 1( &  0  H  1 ( & 10 ( 0 

p C B C B

Ti)11 6 T 0 e T111) 6 T 0 e

 7

   

7 

 


%%%%" "$$$

%%%%" "$$$

C  B G C B

!

1 1

n e e d z p    
 ( & 0 '%
!

#  '



( & ! $  ) '%

6

!)

7 a

 )

T  U 

"#

| }{

 

If the s are linearly independent, then each vector of admits a unique expression as a linear combination of the s. The set is then called a basis of the subspace . Given two vector subspaces and , their sum is a subspace dened as the set of all vectors that are equal to the sum of a vector of and a vector of . The intersection of two subspaces is also a subspace. If the intersection of and is reduced to , then the sum of and is called their direct sum and is denoted by . When is equal to , then every vector of can be written in a unique way as the sum of an element of and an element of . The transformation that maps into is a linear transformation that is idempotent, i.e., such that . It is called a projector along . onto Two important subspaces that are associated with a matrix of are its range, dened by

The range of is clearly equal to the linear span of its columns. The rank of a matrix is equal to the dimension of the range of , i.e., to the number of linearly independent columns. This column rank is equal to the row rank, the number of linearly independent rows of . A matrix in is of full rank when its rank is equal to the smallest of and . A subspace is said to be invariant under a (square) matrix whenever . In particular for any eigenvalue of the subspace is invariant under . The subspace is called the eigenspace associated with and consists of all the eigenvectors of associated with , in addition to the zero-vector.

It is orthonormal if, in addition, every vector of has a 2-norm equal to unity. A vector that is orthogonal to all the vectors of a subspace is said to be orthogonal to this subspace. The set of all the vectors that are orthogonal to is a vector subspace called the orthogonal complement of and denoted by . The space is the direct sum of and its orthogonal complement. Thus, any vector can be written in a unique fashion as the sum of a vector in and a vector in . The operator which maps into its component in the subspace is the orthogonal projector onto .

Y

( ) '%a & $ 7

( )

 (

A set of vectors

is said to be orthogonal if

   

@ C

@ 3

7 ! 8

) ) 

B 

$$ '$(  ) !  ) 

C TTwB C1)117T 0 % 6

! "#

and its kernel or null space

! 7 

6 

  0 


6 

6 

" %%%" $$$

 ( & ! $ B0 '%

   

0 1( S 01( %   & uY u d   n 

 S C B    C B 

0 

0 

6 

6 

0 

( )


  #  '   (


6 

0 

6 

B 

0 

| {

( & ! $  ) #%

0 

Every subspace admits an orthonormal basis which is obtained by taking any basis and orthonormalizing it. The orthonormalization can be achieved by an algorithm known as the Gram-Schmidt process which we now describe. Given a set of linearly independent vectors , rst normalize the vector , which means divide it by its 2norm, to obtain the scaled vector of norm unity. Then is orthogonalized against the vector by subtracting from a multiple of to make the resulting vector orthogonal to , i.e.,

The resulting vector is again normalized to yield the second vector . The -th step of the Gram-Schmidt process consists of orthogonalizing the vector against all previous vectors .

, for

4.

5. 6. If 7. EndDo

, then Stop, else

It is easy to prove that the above algorithm will not break down, i.e., all steps will be completed if and only if the set of vectors is linearly independent. From lines 4 and 5, it is clear that at every step of the algorithm the following relation holds:

This is called the QR decomposition of the matrix . From what was said above, the QR decomposition of a matrix exists whenever the column vectors of form a linearly independent set of vectors. The above algorithm is the standard Gram-Schmidt process. There are alternative formulations of the algorithm which have better numerical properties. The best known of these is the Modied Gram-Schmidt (MGS) algorithm.

1. Dene 2. For

. If Do:

Stop, else

' 6 '

' 0 0 ' ( 0 ) 0 ) 0 0 6  0  E0 ' 7 8 E0 ' % "     $B " $ 8 T &I$H!G) F!E# DCA @9% & U

If , matrix whose nonzero elements are the can be written as

, and if denotes the upper triangular dened in the algorithm, then the above relation

'

 w

"

)1)1 6 0

3 P

1. Compute 2. For 3. Compute

. If

Stop, else compute

Do:

z
00

'(0

7 7 7 ) H z ri1)1u`@ H i)11zG q G

0 C0 6

    
) 0 )

'  7U~ 5 0 0 H ' 1 3 3 4" )11) 6 0 2 0 4" )11) 6 0 12 0 H ' 1( &

B 3 6

' ( ' 7 a ) ( ) 6  ()  ' 0 1( ) ' & ) 3 ) () C 0 ' ) B % ' 00' 7 6 0  00' &$ #!) #  U F % "      T

0 )

@    n  

0 )

! "

i)1 1zG q G

i1)1Y6 0

Yet another alternative for orthogonalizing a sequence of vectors is the Householder algorithm. This technique uses Householder reectors, i.e., matrices of the form

in which is a vector of 2-norm unity. Geometrically, the vector represents a mirror image of with respect to the hyperplane . To describe the Householder orthogonalization process, the problem can be formulated as that of nding a QR factorization of a given matrix . For any vector , the vector for the Householder transformation (1.15) is selected in such a way that

where

is a scalar. Writing

yields

This shows that the desired

is a multiple of the vector

For (1.16) to be satised, we must impose the condition

In order to avoid that the resulting vector

be small, it is customary to take

which yields

Given an matrix, its rst column can be transformed to a multiple of the column , by premultiplying it by a Householder matrix ,

Assume, inductively, that the matrix

has been transformed in

successive steps into

3 

0G

which gives of the vector . Therefore, it is necessary that




, where

is the rst component

0  3

0  0 

6 00 66

6 0 6   3

A6   C 0 B    36    0  l 66   C 0 6  h GC0  B

0  Q G C 0 h G  l 0 h  

G l

 0  3 0  3

( ! B) #%  ( &

0 

 C 0 B(   %  0 B(   %

~ 

 0 % 3 (

3 

00

 Q

3. Dene 4. For , Do: 5. 6. 7. EndDo 8. Compute , 9. If then Stop, else 10. EndDo
)

   

uY u d   n 

7 ' ( ' 5 6  (  7 ' D7 C H ' ( B ( ( '  % ( H ai)1) @


Y


3 

3 I

  3

) 0 )


)

3 )

0 B 6

 D}~

7 a

the partially upper triangular form

This matrix is upper triangular up to column number . To advance by one step, it must be transformed into one which is upper triangular up the -th column, leaving the previous columns in the same form. To leave the rst columns unchanged, select a vector which has zeros in positions through . So the next Householder reector matrix is dened as

where the components of the vector are given by

with

where

Therefore, the Householder matrices need not, and should not, be explicitly formed. In addition, the vectors need not be explicitly scaled. Assume now that Householder transforms have been applied to a certain matrix

b G 0 

We note in passing that the premultiplication of a matrix form requires only a rank-one update since,

by a Householder trans-

&

&

 6 0

' C  ' ( &   ''

if if if

6 

in which the vector

is dened as

. . .

. . .

. . .
"

99

0 '

"

 

'" 0 4 '

'"

99

..

. . . . . .

99@8 9 99 99

  

  

''

  

3 

G' '

    

3 

' q

3 

E6 6 60

3 

'

 0 % (

B  

@    n  
'  E0 0

344

44

44

44

44

44

0C G

0 0 0  1) 0

3 

'

&

' 

'

of dimension

, to reduce it into the upper triangular form,

Recall that our initial goal was to obtain a QR factorization of . We now wish to recover the and matrices from the s and the above matrix. If we denote by the product of the on the left-side of (1.22), then (1.22) becomes

in which is an upper triangular matrix, and is an Since is unitary, its inverse is equal to its transpose and, as a result,


zero block.

If is the matrix of size which consists of the rst matrix, then the above equality translates into

columns of the identity

G'

'

1. Dene 2. For Do: 3. If compute 4. Compute using (1.19), (1.20), (1.21) 5. Compute with 6. Compute 7. EndDo

in which is the triangular matrix obtained from the Householder reduction of (1.22) and (1.23)) and

 "8 %  " 8 8 % 6G!$ 8 GB 8   B

&

 1)#6  0  0

11 6 ' '  ' ' '  ' ' ' 0  )1 6 '  0 '  '' 111)  3 i)1) 0 1 0 V G

3 

&

' ' 

&

 0 

U T

'

and

are the matrices sought. In summary,

The matrix

represents the

rst columns of


. Since

&

(see

 C ~B

. . . . . .

& &

99

..

9
99
. . .
" 7

99 @8 99 99

  

   

5 0 0 G   G 0 G 0  5G 0

&

116  0 

60

66

uY u d   n 


00

344 44 44 44 44 4 5

00

' 

! ~

G 

)1#6  0 

&

 

GP 0

h~ 
&

 0

QW

0
0

Note that line 6 can be omitted since the are not needed in the execution of the next steps. It must be executed only when the matrix is needed at the completion of the algorithm. Also, the operation in line 5 consists only of zeroing the components and updating the -th component of . In practice, a work vector can be used for and its nonzero components after this step can be saved into an upper triangular matrix. Since the components 1 through of the vector are zero, the upper triangular matrix can be saved in those zero locations which would otherwise be unused.

This section discusses the reduction of square matrices into matrices that have simpler forms, such as diagonal, bidiagonal, or triangular. Reduction means a transformation that preserves the eigenvalues of a matrix.

Two matrices

and

are said to be similar if there is a nonsingular


&

The mapping

It is clear that similarity is an equivalence relation. Similarity transformations preserve the eigenvalues of matrices. An eigenvector of is transformed into the eigenvector of . In effect, a similarity transformation amounts to representing the matrix in a different basis. We now introduce some terminology.

An eigenvalue of has algebraic multiplicity , if it is a root of multiplicity of the characteristic polynomial.

If an eigenvalue is of algebraic multiplicity one, it is said to be simple. A nonsimple eigenvalue is multiple. The geometric multiplicity of an eigenvalue of is the maximum number of independent eigenvectors associated with it. In other words, the geometric multiplicity is the dimension of the eigenspace . A matrix is derogatory if the geometric multiplicity of at least one of its eigenvalues is larger than one. An eigenvalue is semisimple if its algebraic multiplicity is equal to its geometric multiplicity. An eigenvalue that is not semisimple is called defective.

Often, ( ) are used to denote the distinct eigenvalues of . It is easy to show that the characteristic polynomials of two similar matrices are identical; see Exercise 9. Therefore, the eigenvalues of two similar matrices are equal and so are their algebraic multiplicities. Moreover, if is an eigenvector of , then is an eigenvector

@ 3

B 

)11)A6 0

matrix

such that

is called a similarity transformation.

G


0 0 0

'

)  ( %  )   "   %  

'

'

    $
 

u d

U SRFP I#GDE C T H Q F HF

u @

| }{

~ i1)1' '
0
  '#   '  

AV

of and, conversely, if is an eigenvector of then is an eigenvector of . As a result the number of independent eigenvectors associated with a given eigenvalue is the same for two similar matrices, i.e., their geometric multiplicity is also the same.

The simplest form in which a matrix can be reduced is undoubtedly the diagonal form. Unfortunately, this reduction is not always possible. A matrix that can be reduced to the diagonal form is called diagonalizable. The following theorem characterizes such matrices.

A matrix of dimension arly independent eigenvectors.

is diagonalizable if and only if it has

A matrix is diagonalizable if and only if there exists a nonsingular matrix and a diagonal matrix such that , or equivalently , where is a diagonal matrix. This is equivalent to saying that linearly independent vectors exist the column-vectors of such that . Each of these column-vectors is an eigenvector of . A matrix that is diagonalizable has only semisimple eigenvalues. Conversely, if all the eigenvalues of a matrix are semisimple, then has eigenvectors. It can be easily shown that these eigenvectors are linearly independent; see Exercise 2. As a result, we have the following proposition.

A matrix is diagonalizable if and only if all its eigenvalues are

semisimple.

Since every simple eigenvalue is semisimple, an immediate corollary of the above result is: When has distinct eigenvalues, then it is diagonalizable.

From the theoretical viewpoint, one of the most important canonical forms of matrices is the well known Jordan form. A full development of the steps leading to the Jordan form is beyond the scope of this book. Only the main theorem is stated. Details, including the proof, can be found in standard books of linear algebra such as [117]. In the following, refers to the algebraic multiplicity of the individual eigenvalue and is the index of the eigenvalue, i.e., the smallest integer for which .

Any matrix can be reduced to a block diagonal matrix consisting of diagonal blocks, each associated with a distinct eigenvalue . Each of these diagonal blocks has itself a block diagonal structure consisting of sub-blocks, where is the geometric multiplicity of the eigenvalue . Each of the sub-blocks, referred to as a Jordan

C R T!  @ 3

7 &0

 0 SR! C B 

B ) Q

6  3 1 '  (  %    !5420)' &$#"!    

6  3 1 '   ' !542P#!IGHGF!ED '  C %

@4

   

&

@ 3

9 ~ @

uY u d   n 
&

0 80 7

A B

T U

~  CjG b

SRFP cbP` ` H Q F Q Q

U T

T U

CjG b

q G S

line-

The -th diagonal block, , is known as the -th Jordan submatrix (sometimes Jordan Box). The Jordan submatrix number starts in column . Thus,

Each of the blocks corresponds to a different eigenvector associated with the eigenvalue . Its size is the index of .

Here, it will be shown that any matrix is unitarily similar to an upper triangular matrix. The only result needed to prove the following theorem is that any vector of 2-norm one can be completed by additional vectors to form an orthonormal basis of .

For any square matrix , there exists a unitary matrix

is upper triangular.
T

The proof is by induction over the dimension . The result is trivial for . Assume that it is true for and consider any matrix of size . The matrix admits at least one eigenvector that is associated with an eigenvalue . Also assume without . First, complete the vector into an orthonormal set, i.e., loss of generality that nd an matrix such that the matrix is unitary. Then and hence,

Now use the induction hypothesis for the exists an unitary matrix
3 3

matrix

such that

: There is upper triangular.

&

(I 0

V1

6  3 1 '   '   % !542P#!IG8##I

C 0 0 C 0I 0 ~ B  C 0 ~ B 0 ! ~B  C ~B DI 3  1 I I   I   I C 3  1  ~ s~ ~ s~  (6   y B ~ 

..

with

..

..

such that

99 @8

344

'

0I0

VU@

9@8 9

' Q R Y 5 A 6 344

%  B

where each is associated with itself the following structure,

and is of size

the algebraic multiplicity of

..

..

. It has

block, is an upper bidiagonal matrix of size not exceeding on the diagonal and the constant one on the super diagonal.

, with the constant

6 0
Y

9@8 9

99

99

# Q

6$

    $

u d
344 44 44 4 5

111) X
3 @

0 0

&

SG b ~

u @

V 6@

&

Dene the

matrix


and multiply both members of (1.24) by from the left and from the right. The resulting matrix is clearly upper triangular and this shows that the result is true for , with which is a unitary matrix. A simpler proof that uses the Jordan canonical form and the QR decomposition is the subject of Exercise 7. Since the matrix is triangular and similar to , its diagonal elements are equal to the eigenvalues of ordered in a certain manner. In fact, it is easy to extend the proof of the theorem to show that this factorization can be obtained with any order for the eigenvalues. Despite its simplicity, the above theorem has far-reaching consequences, some of which will be examined in the next section. It is important to note that for any , the subspace spanned by the rst columns of is invariant under . Indeed, the relation implies that for , we have
 #

which is known as the partial Schur decomposition of . The simplest case of this decomposition is when , in which case is an eigenvector. The vectors are usually called Schur vectors. Schur vectors are not unique and depend, in particular, on the order chosen for the eigenvalues. A slight variation on the Schur canonical form is the quasi-Schur form, also called the real Schur form. Here, diagonal blocks of size are allowed in the upper triangular matrix . The reason for this is to avoid complex arithmetic when the original matrix is block is associated with each complex conjugate pair of eigenvalues of the real. A matrix.

Consider the

matrix

The matrix

has the pair of complex conjugate eigenvalues

If we let and if is the principal leading submatrix of dimension of , the above relation can be rewritten as

   

1) Yup  11 z

A 5 3

8 7

0 0 ~

0 ) p ' 1( (` ( &

' 5' 0 ' 0

0 0 (0
7 7 7

uY u d   n 
I ( 0P


'

 w

~ h!~

3'

11)1 6 0 1 ' 0

) 0

!~ ~
U T

 5

0 (0



and the real eigenvalue of matrices


7 7

. The standard (complex) Schur form is given by the pair


7 7 W3 7 7

and

It is possible to avoid complex arithmetic by using the quasi-Schur form which consists of the pair of matrices

We conclude this section by pointing out that the Schur and the quasi-Schur forms of a given matrix are in no way unique. In addition to the dependence on the ordering of the eigenvalues, any column of can be multiplied by a complex sign and a new corresponding can be found. For the quasi-Schur form, there are innitely many ways to select the blocks, corresponding to applying arbitrary rotations to the columns of associated with these blocks.


The analysis of many numerical techniques is based on understanding the behavior of the successive powers of a given matrix . In this regard, the following theorem plays a fundamental role in numerical linear algebra, more particularly in the analysis of iterative methods.


The sequence

converges to zero if and only if

To prove the necessary condition, assume that and consider eigenvector associated with an eigenvalue of maximum modulus. We have

which implies, by taking the 2-norms of both sides,


V 6 0 @

0 V

IP  '

6 !4  3 

'

511))z

0 V 0' @

    #

0 '   '  0 '

0 @

! 'P 0'   1  

'

and

a unit

5 A qz uX u p p pY uX z p z u 3 8 p z A z p p 3 qz Y V p 5 z p Y z 1A V p Y p 3 Y z z z p 3 p z 8 qz 1) qz

7
7

qz7 7 7 7 3 Y 7 7 z 7 77 3 % Y uY zW3 z7 z%p7 pW3 z77 z 7 z7 z p777 qz7 %qz7 pW3

7 3

7 3

7 3

3 

7W3 7 W3

    $
7

u d

7 7 7 3

 

'

5

C S G bB

u @



This shows that . The Jordan canonical form must be used to show the sufcient condition. Assume that . Start with the equality

To prove that converges to zero, it is sufcient to show that converges to zero. An important observation is that preserves its block form. Therefore, it is sufcient to prove that each of the Jordan blocks converges to zero. Each block is of the form

An equally important result is stated in the following theorem.

The series

The rst part of the theorem is an immediate consequence of Theorem 1.4. Indeed, if the series converges, then . By the previous theorem, this implies that . To show that the converse is also true, use the equality

and exploit the fact that since


& 3 

Another important consequence of the Jordan canonical form is a result that relates the spectral radius of a matrix to its matrix norm.

&

This shows that the series converges since the left-hand side will converge to In addition, it also shows the second part of the theorem.

3 

' )1 6  C 0 ' B 0 C B Q C C )1  B C B 0 '

, then

is nonsingular, and therefore,

'

3 

 ' 

3 

3 

&

3 

converges if and only if of the series is equal to


T

. Under this condition, .

is nonsingular and the limit

Since the matrix


@

, each of the terms in this nite sum converges to zero as converges to zero.

3 

&

Using the triangle inequality for any norm and taking


@

yields

where

is a nilpotent matrix of index , i.e.,


3 


. Therefore, for

Q (

'

C   '   B ( & ! #  '  0 Q ( C ' q B ( & ! ' 0 ! Q @Y

   
& 7 8

&

'

Q  0  C B uY u d    n 
'

3 

&'()' &

& '

'

&

& '

&

C  C B B

'

3 

U T

'

CjG b

'   

B Q C B Q C

Q

. Therefore,

For any matrix norm

, we have

The proof is a direct application of the Jordan canonical form and is the subject of Exercise 10.

This section examines specic properties of normal matrices and Hermitian matrices, including some optimality properties related to their spectra. The most common normal matrices that arise in practice are Hermitian or skew-Hermitian.

By denition, a matrix is said to be normal if it commutes with its transpose conjugate, i.e., if it satises the relation

An immediate property of normal matrices is stated in the following lemma.

If a normal matrix is triangular, then it is a diagonal matrix.

Assume, for example, that is upper triangular and normal. Compare the rst diagonal element of the left-hand side matrix of (1.25) with the corresponding element of the matrix on the right-hand side. We obtain that

which shows that the elements of the rst row are zeros except for the diagonal one. The same argument can now be used for the second row, the third row, and so on to the last row, to show that for . A consequence of this lemma is the following important result.

A matrix is normal if and only if it is unitarily similar to a diagonal

matrix.
T

It is straightforward to verify that a matrix which is unitarily similar to a diagonal matrix is normal. We now prove that any normal matrix is unitarily similar to a diagonal

&

y t

  ' 6 2' 6 ) 1

C B

6  0  0 ( & 6  E00 

) ( % 

I ( I

'F0  '    

   

u d

 '

 ( $ % 

uu

!      %

SG b n  d 
7 a

 SG b

T U

| }{

Q Q





or,

Thus, any normal matrix is diagonalizable and admits an orthonormal basis of eigenvectors, namely, the column vectors of . The following result will be used in a later chapter. The question that is asked is: Assuming that any eigenvector of a matrix is also an eigenvector of , is normal? If had a full set of eigenvectors, then the result is true and easy to prove. Indeed, if is the matrix of common eigenvectors, then and , with and diagonal. Then, and and, therefore, . It turns out that the result is true in general, i.e., independently of the number of eigenvectors that admits.

eigenvector of
T

If is normal, then its left and right eigenvectors are identical, so the sufcient condition is trivial. Assume now that a matrix is such that each of its eigenvectors , , with is an eigenvector of . For each eigenvector of , , and since is also an eigenvector of , then . Observe that and because , it follows that . Next, it is proved by contradiction that there are no elementary divisors. Assume that the contrary is true for . Then, the rst principal vector associated with is dened by
@ @

Taking the inner product of the above relation with


V @ V W

On the other hand, it is also true that


V V W

A result of (1.26) and (1.27) is that which is a contradiction. Therefore, has a full set of eigenvectors. This leads to the situation discussed just before the lemma, from which it is concluded that must be normal. Clearly, Hermitian matrices are a particular case of normal matrices. Since a normal matrix satises the relation , with diagonal and unitary, the eigenvalues of are the diagonal entries of . Therefore, if these entries are real it is clear that . This is restated in the following corollary.

7 I 0 7 7 I 0 )0 C   B t C  B C  Q % C  I B C  B B t C  T  B C  P EB C  T B  P  x C B C C C C Q   B C    B Q   B   I B     I B    I IH IH ~#  111) d   IH  DI 7 I X 7 6 7 0 7 IHX ~ 6 ~ 0 7

A matrix .

is normal if and only if each of its eigenvectors is also an

, we obtain

( &

&

5 I 5

6 7

0I 6

I P

7)  DI 

0 7

@ 3

Upon multiplication by which means that possible if is diagonal.

on the left and on the right, this leads to the equality is normal, and according to the previous lemma this is only

5 0 5 I 0 I 5 0 I #5 I 0 0 I 50 I 0#0 0 5 5 5 I #0 I 0 I 0

matrix. Let be the Schur canonical form of upper triangular. By the normality of ,

where

   

uY u d   n 

0 #50


T U

H5 I

is unitary and

is

A normal matrix whose eigenvalues are real is Hermitian.

As will be seen shortly, the converse is also true, i.e., a Hermitian matrix has real eigenvalues. An eigenvalue of any matrix satises the relation

dened for any nonzero vector in . These ratios are known as Rayleigh quotients and are important both for theoretical and practical purposes. The set of all possible Rayleigh quotients as runs over is called the eld of values of . This set is clearly bounded since each is bounded by the the 2-norm of , i.e., for all . If a matrix is normal, then any vector in can be expressed as

where

From a well known characterization of convex hulls established by Hausdorff (Hausdorffs convex hull theorem), this means that the set of all possible Rayleigh quotients as runs over all of is equal to the convex hull of the s. This leads to the following theorem which is stated without proof.

The eld of values of a normal matrix is equal to the convex hull of its

spectrum.

The next question is whether or not this is also true for nonnormal matrices and the answer is no: The convex hull of the eigenvalues and the eld of values of a nonnormal matrix are different in general. As a generic example, one can take any nonsymmetric real matrix which has real eigenvalues only. In this case, the convex hull of the spectrum is a real interval but its eld of values will contain imaginary values. See Exercise 12 for another example. It has been shown (Hausdorff) that the eld of values of a matrix is a convex set. Since the eigenvalues are members of the eld of values, their convex hull is contained in the eld of values. This is summarized in the following proposition.

&

'

0 0 1 ( & Y # 6  ' 6  1( ' ) C 0 )' ( gB     ) ) C x B

where the vectors becomes

form an orthogonal basis of eigenvectors, and the expression for

2 &

"

6

#  C B 

@ '

&

Y 0 ( 1)' 6 ' @ 0 ( 1)' 6 ' '

( ( )

10 ( &

) 

 

# 7

C B 

SG b

where

is an associated eigenvector. Generally, one might consider the complex scalars

V V

C C xx B B C C C B B

V V W

   
@

u d

uu

b G n  d 
@ T U ` Q Q Q

The eld of values of an arbitrary matrix is a convex set which contains the convex hull of its spectrum. It is equal to the convex hull of the spectrum when the matrix is normal.

A rst result on Hermitian matrices is the following.


X

The eigenvalues of a Hermitian matrix are real, i.e.,


V

Then

which is the stated result.

It is not difcult to see that if, in addition, the matrix is real, then the eigenvectors can be chosen to be real; see Exercise 21. Since a Hermitian matrix is normal, the following is a consequence of Theorem 1.7.

Any Hermitian matrix is unitarily similar to a real diagonal matrix.

In particular a Hermitian matrix admits a set of orthonormal eigenvectors that form a basis of . In the proof of Theorem 1.8 we used the fact that the inner products are real. Generally, it is clear that any Hermitian matrix is such that is real for any vector . It turns out that the converse is also true, i.e., it can be shown that if is real for all vectors in , then the matrix is Hermitian; see Exercise 15. Eigenvalues of Hermitian matrices can be characterized by optimality properties of the Rayleigh quotients (1.28). The best known of these is the min-max principle. We now label all the eigenvalues of in descending order:

Here, the eigenvalues are not necessarily distinct and they are repeated, each according to its multiplicity. In the following theorem, known as the Min-Max Theorem, represents a generic subspace of .

The eigenvalues of a Hermitian matrix

are characterized by the

relation

V W

C C gx B B (

g B

C B C B C % B

$ & % "

V W

V W

0 4'

(6 (

&

  ' '

0 @

V W

( "

z CjG b

T U

CjG b

T U

Let

be an eigenvalue of

and

an associated eigenvector or 2-norm unity.

 C B

   

 P0  ' 6 G' G  %  6

uY u d   n 
A B

 U SRFP cbP` ` T H Q F Q Q
U T

 CjG b

q G S j
T Q Q Q

Let be an orthonormal basis of consisting of eigenvectors of associated with respectively. Let be the subspace spanned by the rst of these vectors and denote by the maximum of over all nonzero vectors of a subspace . Since the dimension of is , a well known theorem of linear algebra shows that its intersection with any subspace of dimension is not reduced to , i.e., there is vector in . For this , we have
 

The above result is often called the Courant-Fisher min-max principle or theorem. As a particular case, the largest eigenvalue of satises

Actually, there are four different ways of rewriting the above characterization. The second formulation is

and the two other ones can be obtained from (1.30) and (1.32) by simply relabeling the eigenvalues increasingly instead of decreasingly. Thus, with our labeling of the eigenvalues in descending order, (1.32) tells us that the smallest eigenvalue satises

replaced by if the eigenvalues are relabeled increasingly. In order for all the eigenvalues of a Hermitian matrix to be positive, it is necessary and sufcient that

Such a matrix is called positive denite. A matrix which satises for any is said to be positive semidenite. In particular, the matrix is semipositive denite for any rectangular matrix, since
 

Similarly, is also a Hermitian semipositive denite matrix. The square roots of the eigenvalues of for a general rectangular matrix are called the singular values of

g p ( C x B C g I B (I ( C g B p  S z  C x B

7 8



DI I (

with

&

&

B

&

so that . In other words, as runs over all the subspaces, is always and there is at least one subspace . This shows the desired result.

-dimensional for which

 3

so that . Consider, on the other hand, the particular subspace of dimension is spanned by . For each vector in this subspace, we have

which

 ~B

C ' ( 6   0 1(0 ' ( ) C xx B B 6  ' )  0 1( '  ~ ' )  '  C C B gB C x B '


3 3
" (

C gB g B

"

' @

) 

& 

C C x B ( g B 0

C C gg B B  (

$( & % "

' ( ' #C C ' '( gB  6    ( ) ) C g B 1)11 '

"

   

' (

& '$ 

& '$ 

u d

  ' '

( "

 

uu
@

1111)0 ( % n  d 

" %%%" $$$


)

0 @

' @

0 @

! ) 

B C

& 

B



! 7 

' @

and are denoted by . In Section 1.5, we have stated without proof that the 2-norm of any matrix is equal to the largest singular value of . This is now an obvious fact, because

which results from (1.31). Another characterization of eigenvalues, known as the Courant characterization, is stated in the next theorem. In contrast with the min-max theorem, this property is recursive in nature.
@

In other words, the maximum of the Rayleigh quotient over a subspace that is orthogonal to the rst eigenvectors is equal to and is achieved for the eigenvector associated with . The proof follows easily from the expansion (1.29) of the Rayleigh quotient.

Nonnegative matrices play a crucial role in the theory of matrices. They are important in the study of convergence of iterative methods and arise in many applications including economics, queuing theory, and chemical engineering. A nonnegative matrix is simply a matrix whose entries are nonnegative. More generally, a partial order relation can be dened on the set of matrices.

Let

and

be two

matrices. Then

The binary relation imposes only a partial order on since two arbitrary matrices in are not necessarily comparable by this relation. For the remainder of this section,

if by denition, for , then is nonnegative if , and positive if positive is replaced by negative.

. If

denotes the zero matrix, . Similar denitions hold in which

'

h ~

& ' 

)  ( % 

 # # # # ~ #  h~

%%$ $$

' @

  " '$  &

!)

 ( %  %( # $     

) ) 0

( p # pz

' @

' @ 3 

U T

H RFP I#F D Q F H

1{{ |

and for

C C C gx B B ( ( ( ( C ' ' ' ' B B C gB ( C 0 0 B g B C 0 0 B 0

The eigenvalue tian matrix are such that


) ) )

and the corresponding eigenvector




of a Hermi-

0 X

C C ( C g gDBI B C x B B ( x

 

   
 "

& '$ 

&'$  "

0 X

uY u d   n 


$ & %

6 6  66  

( 66  
& '$%

 U T

CjG b

Q

uj

we now assume that only square matrices are involved. The next proposition lists a number of rather trivial properties regarding the partial order relation just dened.

The following properties hold.

The relation for matrices is reexive ( ), antisymmetric (if , then ), and transitive (if and , then ). If and are nonnegative, then so is their product and their sum If is nonnegative, then so is . If , then . If , then and similarly .


and

The proof of these properties is left as Exercise 23. A matrix is said to be reducible if there is a permutation matrix such that is block upper triangular. Otherwise, it is irreducible. An important result concerning nonnegative matrices is the following theorem known as the Perron-Frobenius theorem.

Let be a real nonnegative irreducible matrix. Then , the spectral radius of , is a simple eigenvalue of . Moreover, there exists an eigenvector with positive elements associated with this eigenvalue.
A relaxed version of this theorem allows the matrix to be reducible but the conclusion is somewhat weakened in the sense that the elements of the eigenvectors are only guaranteed to be nonnegative. Next, a useful property is established.

Let

be nonnegative matrices, with and

. Then

Consider the rst inequality only, since the proof for the second is identical. The result that is claimed translates into

which is clearly true by the assumptions.

A consequence of the proposition is the following corollary.

Let

and

be two nonnegative matrices, with

. Then

The proof is by induction. The inequality is clearly true for . Assume that (1.35) is true for . According to the previous proposition, multiplying (1.35) from the left by results in


G C

   s# s# #   0  # 0   G # G  # '  ( # # # s# # # # V G S

s#

0 ~ # a` # ' ' 0 )(&' # ' ' 1)(&'

xs# x

z (

' X

   5



u d   d   7 e u u
#

' # '

0 4'

~ ~ 

}s# D

SG b

U T

T U

U T

b G

H QF F Q SRI cbP` ` Q

H QF F Q SRI cbP` ` Q

V G S

 '#

 '

 

` Q Q





Now, it is clear that if sides of the inequality

, then also , by Proposition 1.6. We now multiply both by to the right, and obtain

The inequalities (1.36) and (1.37) show that proof.

, which completes the induction

A theorem which has important consequences on the analysis of iterative methods will now be stated.

Let

and

be two square matrices that satisfy the inequalities

Then

The proof is based on the following equality stated in Theorem 1.6


for any matrix norm. Choosing the in Proposition 1.6


norm, for example, we have from the last property


which completes the proof.

Dene . If it is assumed that is nonsingular and

In addition, since , all the powers of as well as their sum in (1.40) are also nonnegative. To prove the sufcient condition, assume that is nonsingular and that its inverse is nonnegative. By the Perron-Frobenius theorem, there is a nonnegative eigenvector associated with , which is an eigenvalue, i.e.,
V

or, equivalently,

Since and are nonnegative, and which is the desired result.


S & V

is nonsingular, this shows that

3 c

C B 0 B

&

&

3 

3 

I V

&

&

&

is nonsingular and
3  T

Let

be a nonnegative matrix. Then is nonnegative.

if and only if

, then by Theorem 1.5,

t t t

3 

( & 0 C  C B  C B

B 'F00  '   ' # ' 00  '   ' C B ' C 0 0 B ' F0  '   C # C B B s# s#

   

0 4'

s# 0 ' 0 ' # ' X # (' ' ( uY u d    n 


4 4 7 3

3 

3 

 CjG b

CjG b

U T

T U

A matrix is said to be an .

-matrix if it satises the following four

properties:
" "
  

for for is nonsingular. .


7 7

In reality, the four conditions in the above denition are somewhat redundant and equivalent conditions that are more rigorous will be given later. Let be any matrix which satises properties (1) and (2) in the above denition and let be the diagonal of . Since ,

Now dene

Using the previous theorem, is nonsingular and if and only if . It is now easy to see that conditions (3) and (4) of Denition 1.4 can be replaced by the condition .

Let a matrix .

be given such that .

for for

Then

is an

-matrix if and only if , where


3 

From the above argument, an immediate application of Theorem 1.15 shows that properties (3) and (4) of the above denition are equivalent to , where and . In addition, is nonsingular iff is and is nonnegative iff is.

The next theorem shows that the condition (1) in Denition 1.4 is implied by the other three.

Let a matrix

be given such that .

for is nonsingular. .
7

Then

&

3 

for where

, i.e.,

is an .

-matrix.

3 

7 I &

&

C 0 &

3 

C 0

&

3 

3 

   5
&

( 0 ~ i)11Y # ~}ia)`1x1 Y #  Yy q u d   d   7 e u u


7 C
0 Y 7 3 

C 7 0 ~ i)1 1Yy#   B ( 0 ~ i)11Y}a`xY # SG b

0 7  7  C B 0 ~ i)11Y # ~}ia)`1x1 Y #  Yy SG b  C B  C B

&

&

7B 3 7

3 

T U

SRFP I#GDE C H Q F HF & " " " " &P &P T


 '#    '#   '#   '    

 7

which gives

Since for all , the right-hand side is and since , then . The second part of the result now follows immediately from an application of the previous theorem. Finally, this useful result follows.


Let

be two matrices which satisfy

for all

Then if
T

is an

-matrix, so is the matrix

This establishes the result by using Theorem 1.16 once again.

A real matrix is said to be positive denite or positive real if

Q C 0

p  j

&A

7 3 # B

C 0

) ( %  (   ! @ %( ) "    

&A

&

7 3 B

&

Since the matrices imply that

and

are nonnegative, Theorems 1.14 and 1.16

&

3 

7 B & 0

7 3 

&A

which, upon multiplying through by


3

, yields

0 C 7 ( 7 (0

3 A

&

Consider now the matrix

. Since

, then

Assume that is an -matrix and let The matrix is positive because

denote the diagonal of a matrix

( V

( R ' ' & q q

#  A 7

7   s# # H CjG b

0 7 B &A 7 0

7 3 

& I

Dene

. Writing that

   
yields .

0 ' ' 1)(&' y V C ( B 0 uY u d    n  7 3 

( 0

U T

&A

7 3 

' '

{ | b1{{

 

 !


Q

It must be emphasized that this denition is only useful when formulated entirely for real variables. Indeed, if were not restricted to be real, then assuming that is real for all complex would imply that is Hermitian; see Exercise 15. If, in addition to Denition 1.41, is symmetric (real), then is said to be Symmetric Positive Denite (SPD). Similarly, if is Hermitian, then is said to be Hermitian Positive Denite (HPD). Some properties of HPD matrices were seen in Section 1.9, in particular with regards to their eigenvalues. Now the more general case where is non-Hermitian and positive denite is considered. We begin with the observation that any square matrix (real or complex) can be decomposed as

in which

Note that both and are Hermitian while the matrix in the decomposition (1.42) is skew-Hermitian. The matrix in the decomposition is called the Hermitian part of , while the matrix is the skew-Hermitian part of . The above decomposition is the analogue of the decomposition of a complex number into ,

This results in the following theorem.

Let be a real positive denite matrix. Then addition, there exists a scalar such that

is nonsingular. In

for any real vector .


V T

The rst statement is an immediate consequence of the denition of positive deniteness. Indeed, if were singular, then there would be a nonzero vector such that and as a result for this vector, which would contradict (1.41). We now prove the second part of the theorem. From (1.45) and the fact that is positive denite, we conclude that is HPD. Hence, from (1.33) based on the min-max theorem, we get
V 7 a W

A simple yet important result which locates the eigenvalues of

Taking

yields the desired inequality (1.46).

in terms of the spectra

C C ( C B B  ( % C B B  ( %

66 

V W

When is real and is a real vector then position (1.42) immediately gives the equality
V V V

is real and, as a result, the decom-

&

B
V W

CQ

C % C B B C B C 3 C C B B G Q B B7

C C I B A  I  B 

 

( C B

& '$

   
V

u d

 

uu uu d PnA p C B

& '$
(

SG b



The result follows using properties established in Section 1.9.

Thus, the eigenvalues of a matrix are contained in a rectangle dened by the eigenvalues of its Hermitian part and its non-Hermitian part. In the particular case where is real, then is skew-Hermitian and its eigenvalues form a set that is symmetric with respect to the real axis in the complex plane. Indeed, in this case, is real and its eigenvalues come in conjugate pairs. Note that all the arguments herein are based on the eld of values and, therefore, they provide ways to localize the eigenvalues of from knowledge of the eld of values. However, this approximation can be inaccurate in some cases.

Consider the matrix

When a matrix

is Symmetric Positive Denite, the mapping

from to is a proper inner product on , in the sense dened in Section 1.4. The associated norm is often referred to as the energy norm. Sometimes, it is possible to nd an appropriate HPD matrix which makes a given matrix Hermitian, i.e., such that

although is a non-Hermitian matrix with respect to the Euclidean inner product. The simplest examples are and , where is Hermitian and is Hermitian Positive Denite.

The eigenvalues of are .


(

are

and 101. Those of

are

and those of

C B

x 0 g C x % C g B B

g B

 

V c

xB

&

jlj 

assuming that

. This leads to

C  C C B C B B B 7 6  C  )B C B C B

V c

V W

U    T

B 7 )

vector

When the decomposition (1.42) is applied to the Rayleigh quotient of the eigenassociated with , we obtain

and

C C  B # # C C B B 7 B

Let

be any square (possibly complex) matrix and let . Then any eigenvalue of is such that
@

t t s B 06 

C # B C

C C I I B 0 6 C jG b

T U

of

and
Q

can now be proved.

   

uY u d   n 

Projection operators or projectors play an important role in numerical linear algebra, particularly in iterative methods for solving various matrix problems. This section introduces these operators from a purely algebraic point of view and gives a few of their important properties.

Conversely, every pair of subspaces and which forms a direct sum of denes a unique projector such that and . This associated projector maps an element of into the component , where is the -component in the unique decomposition associated with the direct sum. In fact, this association is unique, that is, an arbitrary projector can be entirely of , and (2) its null space determined by the given of two subspaces: (1) The range which is also the range of . For any , the vector satises the conditions,

The linear mapping is said to project onto and along or parallel to the subspace . If is of rank , then the range of is of dimension . Therefore, it is natural to dene through its orthogonal complement which has dimension . The above conditions that dene for any become

These equations dene a projector statement, (1.51), establishes the

onto and orthogonal to the subspace . The rst degrees of freedom, while the second, (1.52), gives

&

3  7 

( )

( )

In addition, the two subspaces and Indeed, if a vector belongs to , then is also in , then . Hence, every element of can be written as be decomposed as the direct sum

intersect only at the element zero. , by the idempotence property. If it which proves the result. Moreover, . Therefore, the space can

A few simple properties follow from this denition. First, if , and the following relation holds,

6 0 0 0 C  B   C  B   C  B C  B  j   C  j C B  q   B  q  C C   BB

 3 0  B (

 ) B (

C B

V 3

 3

 3 

) (

) 

7 

 3 

( )

A projector

is any linear mapping from

to itself which is idempotent, i.e., such that

is a projector, then so is

  C  !  ' ! ' 3 

 6

1 1 

) "  ( 

GD#IG' ' (

  

   "  " ( 


)  

{i{ |
C


 3 B 

$


the constraints that dene from these degrees of freedom. The general denition of projectors is illustrated in Figure 1.1.

and both of dimension , is it The question now is: Given two arbitrary subspaces, always possible to dene a projector onto orthogonal to through the conditions (1.51) and (1.52)? The following lemma answers this question.

This in turn is equivalent to the statement that for any , there exists a unique pair of vectors such that

In summary, given two subspaces and , satisfying the condition , there is a projector onto orthogonal to , which denes the projected vector of any vector
V

 ! 7  (

where ii.

belongs to

, and

belongs to

, a statement which is identical with

( S

bG

V 3

Since is of dimension to the condition that

is of dimension

and the above condition is equivalent

The rst condition states that any vector which is in must be the zero vector. It is equivalent to the condition

 ! 7  (

For any in and (1.52).


T

there is a unique vector

No nonzero vector of

is orthogonal to ; which satises the conditions (1.51)

Given two subspaces and two conditions are mathematically equivalent.

of the same dimension

, the following

and also orthogonal to

Projection of

onto

and orthogonal to .

   

uY u d   n 

T U  D

U T

 3

Two bases are required to obtain a matrix representation of a general projector: a basis for the subspace and a second one for the subspace . These two bases are biorthogonal when

In matrix form, this can be rewritten as

If the two bases are biorthogonal, then it follows that , which yields the matrix representation of

. Therefore, in this case,

Then, the projector is said to be the orthogonal projector onto . A projector that is not orthogonal is oblique. Thus, an orthogonal projector is dened through the following requirements satised for any vector ,

An important class of projectors is obtained in the case when the subspace , i.e., when

!    C 

1  #  HG' 20)GI ! 1 ' (  %

(C  C B  B

 3 

If we assume that no vector of matrix is nonsingular.

is orthogonal to , then it can be shown that the

is equal to

 2

 t

&

( 0

( ( )

In case the bases (1.56) that

and

are not biorthogonal, then it is easily seen from the condition

 (

I   I z C BI i)1)} CvC

7 8

In matrix form this means in the basis. The constraint

. Since belongs to , let be its representation is equivalent to the condition, for

t )11) 0 1

!PG '    

iff

In particular, the condition translates into which means that The converse is also true. Hence, the following useful property,

( C  B 
C
B )

from equations (1.51) and (1.52). This projector is such that

'

H1 C v C  B

 a

 B) (

 C  B

 ' 6
a 7

  

7

) (

A A

"

BB

 )1)1 0  1

or equivalently,

Orthogonal projection of

onto a subspace

The above relations lead to the following proposition.


U T

A projector is orthogonal if and only if it is Hermitian.

By denition, an orthogonal projector is one for which . Therefore, by (1.61), if is Hermitian, then it is orthogonal. Conversely, if is orthogonal, then (1.61) implies while (1.62) implies . Since is a projector and since projectors are uniquely determined by their range and null spaces, this implies that . Given any unitary matrix whose columns form an orthonormal basis of , we can represent by the matrix . This is a particular case of the matrix representation of projectors (1.57). In addition to being idempotent, the linear mapping associated with this matrix satises the characterization given above, i.e.,

It is important to note that this representation of the orthogonal projector is not unique. In fact, any orthonormal basis will give a different representation of in the above form. As

and

&

C (  ) B (

C C I B B C B

 0 (

 ) (

CC I C C B  B B I B 

 0 (  )  (

3 

A consequence of the relation (1.60) is

C g I  % C  x B C 6  x % C  g I  % C x 6 C B B B

I B )

) 

 C B

I 

Q~ 

 )

I BB

I 

First note that

is also a projector because for all

and ,

y x C gB C x I



I 

It is interesting to consider the mapping


"

dened as the adjoint of

   


7 8

C xC

uY u d   n 

 3 BB 

( ( )

B

U T

SRFP cbP` ` H Q F Q Q

 3

 ) B (

q G S

I P

a consequence for any two orthogonal bases of , we must have an equality which can also be veried independently; see Exercise 26.

When is an orthogonal projector, then the two vectors and in the decomposition are orthogonal. The following relation results:

A consequence of this is that for any ,

for any orthogonal projector . An orthogonal projector has only two eigenvalues: zero or one. Any vector of the range of is an eigenvector associated with the eigenvalue one. Any vector of the null-space is obviously an eigenvector associated with the eigenvalue zero. Next, an important optimality property of orthogonal projectors is established.

given vector

in

Let be the orthogonal projector onto a subspace , the following is true:

Let be any vector of and consider the square of its distance from . Since is orthogonal to to which belongs, then

By expressing the conditions that dene for an orthogonal projector onto a subspace , it is possible to reformulate the above result in the form of necessary and sufcient conditions which enable us to determine the best approximation to a given vector in the least-squares sense.

Let a subspace

, and a vector

in

be given. Then

if and only if the following two conditions are satised,

62

6

 (

Therefore, that the minimum is reached for

for all .

in

. This establishes the result by noticing

66  C

B

66 

A6 

 3

 3

6 

66  C

X6 

 B) (

Thus, the maximum of , for all value one is reached for any element in

in does not exceed one. In addition the . Therefore,

. Then for any

I6 6

I0 0

!    C 

 3 B 

66 

1 2' !GI%  ) 3   ( 

 3 B 

j A6 

66 

# 6 

BG  

B

6  (6

 (

6 

  

66 

 3  3

 C

 3 B 

( 6

66 

j y SG b

U T

b G

` Q Q



 3

Linear systems are among the most important and common problems encountered in scientic computing. From the theoretical point of view, the problem is rather easy and explicit solutions using determinants exist. In addition, it is well understood when a solution exists, when it does not, and when there are innitely many solutions. However, the numerical viewpoint is far more complex. Approximations may be available but it may be difcult to estimate how accurate they are. This clearly will depend on the data at hand, i.e., primarily on the coefcient matrix. This section gives a very brief overview of the existence theory as well as the sensitivity of the solutions.

Consider the linear system

Here, is termed the unknown and the right-hand side. When solving the linear system (1.65), we distinguish three situations. Case 1 The matrix is nonsingular. There is a unique solution given by . Case 2 The matrix is singular and . Since , there is an such that . Then is also a solution for any in . Since is at least one-dimensional, there are innitely many solutions. Case 3 The matrix is singular and . There are no solutions.
&

The simplest illustration of the above three cases is with small diagonal

Then

is nonsingular and there is a unique

given by


Then is singular and, as is easily seen, . For example, a particular element such that is . The null space of consists of all vectors whose rst component is zero, i.e., all vectors of the form . Therefore, there are innitely many
&

C 1 P B 

Now let

 7

) (

7 

 7

matrices. Let

C B 

0 

&

C B ) C B ( 1)

 

!0 1   

   

) " (  )   % )

C B C 

' 3 4II G   

B ( 1)

uY u d   n 
( 0

"

$ &

&

 $( )
&

&

$   

&

&

 )

T U   

& 1{{ |

&

solution which are given by

Finally, let

be the same as in the previous case, but dene the right-hand side as

In this case there are no solutions because the second equation cannot be satised.

Consider the linear system (1.65) where is an nonsingular matrix. Given any matrix , the matrix is nonsingular for small enough, i.e., for where is some small number; see Exercise 32. Assume that we perturb the data in the above system, i.e., that we perturb the matrix by and the right-hand side by . The solution of the perturbed system satises the equation,

Let

. Then,

The size of the derivative of is an indication of the size of the variation that the solution undergoes when the data, i.e., the pair is perturbed in the direction . In absolute terms, a small variation will cause the solution to vary by roughly . The relative variation is such that

Using the fact that

in the above equation yields




which relates the relative variation in the solution to the relative sizes of the perturbations. The quantity

( 

2 

C B

 

 

 

3 W

0

B 0 &

   

b

&

 

   C B

0

3 1

0

&

As an immediate result, the function given by

is differentiable at

and its derivative is

B 1

 #

C c C

1 2' G' !PG     ' 

3 W

C C B 0 C  B B  B C #B % C B C  B B d C B C  B

S

&

~ j(~

 U   


3 W

7 

C p  B

B 

&

#  C    B  3 1  #     

& 

CB

uY h   u  u


B 1

7 #

 C  3 B  C

CB

B 0 &

CB

B1

For large matrices, the determinant of a matrix is almost never a good indication of near singularity or degree of sensitivity of the linear system. The reason is that is the product of the eigenvalues which depends very much on a scaling of a matrix, whereas the condition number of a matrix is scaling-invariant. For example, for the determinant is , which can be very small if , whereas for any of the standard norms. In addition, small eigenvalues do not always give a good indication of poor conditioning. Indeed, a matrix can have all its eigenvalues equal to one yet be poorly conditioned.

The simplest example is provided by matrices of the form

for large . The inverse of

is

and for the

-norm we have


so that

For a large , this can give a very large condition number, whereas all the eigenvalues of are equal to unity. When an iterative procedure is used for solving a linear system, we typically face the problem of choosing a good stopping procedure for the algorithm. Often a residual norm,

is available for some current approximation and an estimate of the absolute error or the relative error is desired. The following simple relation is helpful in this regard,

It is necessary to have an estimate of the condition number above relation.

in order to exploit the

is called the condition number of the linear system (1.65) with respect to the norm condition number is relative to a norm. When using the standard norms , it is customary to label with the same label as the associated norm. Thus,

i1) 1Y #  
3 W

  

1 ) 20(

B  C 1

   
. The ,


   y

Q   

# 0  #  C B #  '   C B
G 0  3 

6 C    % C B B   

G 0 

&

uY u d   n 
3


 '

&

&

  (

3 

 C B

U T

120( )



1 Verify that the Euclidean inner product dened by (1.4) does indeed satisfy the general denition of inner products on vector spaces. 2 Show that two eigenvectors associated with two distinct eigenvalues are linearly independent. In a more general sense, show that a family of eigenvectors associated with distinct eigenvalues forms a linearly independent family. 3 Show that if is any nonzero eigenvalue of the matrix , then it is also an eigenvalue of the matrix . Start with the particular case where and are square and is nonsingular, then consider the more general case where may be singular or even rectangular (but such that and are square). 4 Let be an orthogonal matrix, i.e., such that , where is a diagonal matrix. Assuming that is nonsingular, what is the inverse of ? Assuming that , how can be transformed into a unitary matrix (by operations on its rows or columns)? 5 Show that the Frobenius norm is consistent. Can this norm be associated to two vector norms via (1.7)? What is the Frobenius norm of a diagonal matrix? What is the -norm of a diagonal matrix (for any )? 6 Find the Jordan canonical form of the matrix:

7 Give an alternative proof of Theorem 1.3 on the Schur form by starting from the Jordan canonical form. [Hint: Write and use the QR decomposition of .] 8 Show from the denition of determinants used in Section 1.2 that the characteristic polynomial is a polynomial of degree for an matrix. 9 Show that the characteristic polynomials of two similar matrices are equal.

for any matrix norm. [Hint: Use the Jordan canonical form.] 11 Let be a nonsingular matrix and, for any matrix norm , dene . Show that this is indeed a matrix norm. Is this matrix norm consistent? Show the same for and where is also a nonsingular matrix. These norms are not, in general, associated with any vector norms, i.e., they cant be dened by a formula of the form (1.7). Why? What about the particular case ? 12 Find the eld of values of the matrix

and verify that it is not equal to the convex hull of its eigenvalues.

Y qB   Y Y B Y i pY Y

5 Y Y

X f c b ` hgedA aH Y Y VWU U U STSRQ

10 Show that

88 @97

Same question for the matrix obtained by replacing the element

by 1.

" #! $      " 4" 56 ( ( & " % 2 0 ( & 31)'


&

" v"

"

%

P 

H F B Y IuB Y  hY Y t

H F B C B  IGED%A

 

   

) ")   

n u d   g
Y B rsY B


13 Show that for a skew-Hermitian matrix ,

15 Using the results of the previous two problems, show that if is real for all in , then must be Hermitian. Would this result be true if the assumption were to be replaced by: is real for all real ? Explain. 16 The denition of a positive denite matrix is that be real and positive for all real vectors . Show that this is equivalent to requiring that the Hermitian part of , namely, , be (Hermitian) positive denite.

a Hermitian Positive Denite matrix dening a -inner 20 Let be a Hermitian matrix and product. Show that is Hermitian (self-adjoint) with respect to the -inner product if and only if and commute. What condition must satisfy for the same condition to hold in the more general case where is not Hermitian? 21 Let be a real symmetric matrix and an eigenvalue of . Show that if is an eigenvector associated with , then so is . As a result, prove that for any eigenvalue of a real symmetric matrix, there is an associated eigenvector which is real.

23 Prove all the properties listed in Proposition 1.6.

27 What are the eigenvalues of a projector? What about its eigenvectors? 28 Show that if two projectors and What are the range and kernel of ? commute, then their product

is a projector.

 u  H c H c

H f  5 f a!f

 

26 Show that for two orthogonal bases .

bA8IT( aY T ` `

25 Show that if we have

then

c H 5 d c

5 f

f H f

S  9

5 Y ` 5 AY `8A`aY Y Y T SX9 0 R R

24 Let

be an -matrix and is an -matrix.

two nonnegative vectors such that

. Conclude that under the same assumption, of we have

of the same subspace

V T & W9 H F US

22 Show that a Hessenberg matrix tory.

such that

, cannot be deroga-

. Show that

55 WW5  P H  GH E C &  G0 ( & Q "%IC FDB

@9

! 5 #eI c 5 c

19 Show that commute.

is normal iff its Hermitian and skew-Hermitian parts, as dened in Section 1.11,

f gec $  

18 Let a matrix be such that Use Lemma 1.2.]

where is a polynomial. Show that

17 Let and where is a Hermitian matrix and and Hermitian in general? Show that Denite matrix. Are (self-adjoint) with respect to the -inner product.

is a Hermitian Positive and are Hermitian is normal. [Hint:

f   g ec H5

 3

f 2ec

f dI4ec

H 5 8 5 6

f 10hf )('c c      

[Hint: Expand

.]

 

5  &%$ #" 9  f  c    ! " f   c  

"% dIc f

14 Given an arbitrary matrix , show that if

   
for all

5   

uY u d   n 
" f  d c

6 7H F  H

f dI4ec

for any

in

, then it is true that

Compute

, and .


Show that

Give a lower bound for

32 Consider a nonsingular matrix . Given any matrix , show that there exists such that the is nonsingular for all . What is the largest possible value for matrix satisfying the condition? [Hint: Consider the eigenvalues of the generalized eigenvalue problem .]
    "V 

N OTES AND R EFERENCES . For additional reading on the material presented in this chapter, see Golub and Van Loan [108], Datta [64], Stewart [202], and Varga [213]. Details on matrix eigenvalue problems can be found in Gantmachers book [100] and Wilkinson [227]. An excellent treatise of nonnegative matrices is in the book by Varga [213] which remains a good reference on iterative methods more three decades after its rst publication. Another book with state-of-the-art coverage on iterative methods up to the very beginning of the 1970s is the book by Young [232] which covers -matrices and related topics in great detail. For a good overview of the linear algebra aspects of matrix theory and a complete proof of Jordans canonical form, Halmos [117] is recommended.

31 Find a small rank-one perturbation which makes the matrix lower bound for the singular values of .

in Exercise 29 singular. Derive a

f ec H

30 What is the inverse of the matrix based on this.

of the previous exercise? Give an expression of

H F  ( &

&

. . .

..

. . .

. . . . . .

. . . . . .

. . . . . .

..

. . . . . .

99 8

99

2 & ( &

&

&

344

44

WW " " 99 &0 55W55W55 0 & & 99 0 5W5W5 040 & " & & & & 0404&v& & 0   
99 8 0 &

"

5W5W5

5Y Y " "

f ec 5

   5 Y Y 5 Y 2 Y 2

"

  !

4 34

44

44

29 Consider the matrix

of size

and the vector

and

 

 %

 fs c

n u d   g
9  92

f gec X

 

C 6

Physical phenomena are often modeled by equations that relate several partial derivatives of physical quantities, such as forces, momentums, velocities, energy, temperature, etc. These equations rarely have a closed-form (explicit) solution. In this chapter, a few types of Partial Differential Equations are introduced, which will serve as models throughout the book. Only one- or two-dimensional problems are considered, and the space variables are denoted by in the case of one-dimensional problems or and for two-dimensional problems. In two dimensions, denotes the vector of components .

)  (   ( "  & '  ! (

{iU |


E 4 & %$#`tC&' P$i&53"`te&(A2 )RiFF T"T4SS9R6 x) P YF $8 i47C0mQPP$IP4q618 '7& $8@ 54i&)EgYV6)8iF6p3gHEo $$54 EY6`CF6VC"38' 76v0544 763WC&C"16g )xC&rF`654g1'4 A2sFC8)5476HE4k4$ i0wr6)854)"A2' %$10"v1"'I " 6 % 5e HC"@T4S6698 wGh52r643F rI C&F116!E)C"D8XC`6)EB4 7F !g6 rIT4%6A`pE6 zWu 3Wt%0`6 3f'3W C&0 S$6HW%)1016iB)"0 r'16x16g& $%B P$3E54)"10`'HB)876i&476 EYFE TFP$W`0'54 m TI$6T4b)8R V6 x&07' $V6%8xA0 V6F3)8rF164kT"V&p64183Ce1e&E3AgF f G WW 3W)WgD1&I C 3E1&GAW)"V01')8Cg16p1E& F e6`e&W9 )Wa4`6)&6 DF23W 16T"HP$&4 & 3WEF Afu10F AWf0 Fs$ 8%2tiF& G3E$ 6 WYA)FiFwT42C"& 67S&6 #$ tTI& `6%$`'#6" "%$ xC&I2 Ct & wi9)"3E' %$70v`'"P4 Rtd548I V6 83g3dC&AeE 6 HEWe$9iSAE0 0 $eFT84 06 0 S 0Y 'S $6 Y F d ff 3W3W 1010YA$fHE F W9 i$@jC"16F389 164T")8@164 1Bga&C&163EC"143FST u)R3 d6`V6 8C936) #' SYrFi&% 10" )"' %$vdEF P$T41'3Ekg0vTtYF $PT41'E3gkR 03W01'`6C&)2V68 @987654 2 1 0(!  &16g" `ts&38rF546 q018 e62%F3W2 0 $P541E6q18p"6 0 o $eW30 $4 is%81010t2g 16T" 3W04 $3854F)EV6R i&)8YpF6E P$$ 54njpe$5sF)80 32A20 0  $# " ! 4 C&67&i&I AiFC"6 T4S#a4C&6 3WXp$P& &v16# AtY"E F $4 Eo @gAW0 $H4%8 b"6`m P$T4& V6)8'7& $aF(i&Cg00 54)E"V6)8po $E3Wq01816TB76b6q%8V6)816# Wd `ti6rC&4C0%$6 m$P$zC8E P454j1Ef q618v7(&60 ' 16$sFB)8C6YiF76im5BC&YHE E16@T"$iF10lPIC"Yi6 ds4 Fja8R i&dY)8R HE 76F16P$& $@%8B 541016h5B0 g 54`10106tC"df4TTsC"16v)14)6He163Wt%T"f6 ts4#FsA$8%e%(r62 S0 V&8%$C81054T2jt3E1 16#00#e6 s9`t%8i&0 v5I W iFAWe %EV0)8316E3)Eh%6g )R#Y6F9 q)8%8%6f0 &iYE3Fg3$E4 W 454 4i4r6 He$`Fs%83232rFe4He%t6 st3$X)16C"4   `6rFt384V' &YxrF$g tE&i& a4FC&P$e654)9109%$g `}G166f )"T"1'@Cg4 %8bi0& B16R TY}5476&fW aF rFTgGP$4 4 Ck4i&gE0 YCFI AW' 0`' $%A&32 w 16 "#i& E`tF v54&P$ )106 3Wg f%8sF 3W$02 54$C8)E54V6)8jp60 w@76$o sn & %8W3100 $PC2H4&v%8Y10R F n

      

One of the most common Partial Differential Equations encountered in various areas of engineering is Poissons equation: for in


where

is a bounded, open domain in

. Here,

are the two space variables.

Domain

for Poissons equation.

The above equation is to be satised only for points that are located at the interior of the domain . Equally important are the conditions that must be satised on the boundary of . These are termed boundary conditions, and they come in three common types:

The vector usually refers to a unit vector that is normal to and directed outwards. Note that the Neumann boundary conditions are a particular case of the Cauchy conditions with . For a given unit vector, with components and , the directional derivative is dened by

t t

& &

&

C

"

B V 3

V  C   

   0  C B 0

C B %

B V

"

Dirichlet condition Neumann condition Cauchy condition

&

t Y

"

B 1

!# '   
C C C "B V B  C B % V "B 0 6 6


 PG  1 1  

      

"

&

u
V V

  C B 

6V

A


0 6V

un w uv

7

and the dot in (2.3) indicates a dot product of two vectors in . In reality, Poissons equation is often a limit case of a time-dependent problem. It can, for example, represent the steady-state temperature distribution in a region when there is a heat source that is constant with respect to time. The boundary conditions should then model heat loss across the boundary . The particular case where , i.e., the equation

to which boundary conditions must be added, is called the Laplace equation and its solutions are called harmonic functions. Many problems in physics have boundary conditions of mixed type, e.g., of Dirichlet type in one part of the boundary and of Cauchy type in another. Another observation is that the Neumann conditions do not dene the solution uniquely. Indeed, if is a solution, then so is for any constant . The operator
V

is called the Laplacean operator and appears in many models of physical and mechanical phenomena. These models often lead to more general elliptic operators of the form


where the scalar function depends on the coordinate and may represent some specic parameter of the medium, such as density, porosity, etc. At this point it may be useful to recall some notation which is widely used in physics and mechanics. The operator can be considered as a vector consisting of the components and . When applied to a scalar function , this operator is nothing but the gradient operator, since it yields a vector with the components and as is shown in (2.4). The dot notation allows dot products of vectors in to be dened. These vectors can include partial differential operators. For of with yields the scalar quantity, example, the dot product

which is called the divergence of the vector function . Applying this divergence operator to , where is a scalar function, yields the operator in (2.5). The . Thus, divergence of the vector function is often denoted by div or

div

&

&

t Y t Y

 c 3 

% % V

66  00    

 %% 
66 6 6

6 6 00

% % V  V 

a V 7

7 8

 kB 

"

%

%

where

is the gradient of ,

 

u u u n

is a further generalization of the Laplacean operator in the case where the medium is anisotropic and inhomogeneous. The coefcients depend on the space variable and reect the position as well as the directional dependence of the material properties, such as porosity in the case of uid ow or dielectric constants in electrostatics. In fact, the above operator can be viewed as a particular case of , where is a matrix which acts on the two components of .

Many physical problems involve a combination of diffusion and convection phenomena. Such phenomena are modeled by the convection-diffusion equation

the steady-state version of which can be written as


V 3

Problems of this type are often used as model problems because they represent the simplest form of conservation of mass in uid mechanics. Note that the vector is sometimes quite large, which may cause some difculties either to the discretization schemes or to the iterative solution techniques.

The nite difference method is based on local approximations of the partial derivatives in a Partial Differential Equation, which are derived by low order Taylor series expansions. The method is quite simple to dene and rather easy to implement. Also, it is particularly appealing for simple regions, such as rectangles, and when uniform meshes are used. The matrices that result from these discretizations are often well structured, which means that they typically consist of a few nonzero diagonals. Another advantage is that there are a number of fast solvers for constant coefcient problems, which can deliver the solution in logarithmic time per grid point. This means the total number of operations is of the order of where is the total number of discretization points. This section gives

( &

y  C  wB c

! ( &%  '  !  (  )   

or

t
&

!PG F  '

C  G  kB    w C  kB  6 6 0

!  3 T$! ! 4%   3  

 B  T

6 7 0

C 0

The closely related operator

   

 y B  

A B A

C ~ B  ~

un u  d u

 

| c

an overview of nite difference discretization techniques.

The simplest way to approximate the rst derivative of a function at the point formula
V

When is differentiable at , then the limit of the above ratio when tends to zero is the derivative of at . For a function that is in the neighborhood of , we have by Taylors formula

for some

in the interval
V

. Therefore, the above approximation (2.8) satises

The formula (2.9) can be rewritten with


V

replaced by

to obtain

in which belongs to the interval . Adding (2.9) and (2.11), dividing through by , and using the mean value theorem for the fourth order derivatives results in the following approximation of the second derivative

where . The above formula is called a centered difference approximation of the second derivative since the point at which the derivative is being approximated is the center of the points used for the approximation. The dependence of this derivative on the values of at the points involved in the approximation is often represented by a stencil or molecule, shown in Figure 2.2.

The three-point stencil for the centered difference approximation to the second order derivative. The approximation (2.8) for the rst derivative is forward rather than centered. Also, in (2.8). The two a backward formula can be used which consists of replacing with formulas can also be averaged to obtain the centered difference formula:

&

&

& &

&

2 &

&

t Y

t Y

t Y

t Y

t Y

t Y

C B 9 9 6e C 

 4 B V

&

 B V

  V

C 6 B

is via the

! ' BG   6

C

V 9

C 6 9 V 9 "B 6

B V 3

"




B V 3

B V

n
 

9 3 6V 6 9 6

 C 

6V

C 6 B l C 

 
9 6 9 6

B V

 '

B V 3

"

u u u n
B V



 C   B 99 C   x B  99  C B C 

99 

V


'

"

9 B V 9

A A

"

B V

B V

C

C 6 9 B V6 9


4 


3 B V

B V

&

# &

4 

It is easy to show that the above centered difference formula is of the second order, while (2.8) is only rst order accurate. Denoted by and , the forward and backward difference operators are dened by

All previous approximations can be rewritten using these operators. In addition to standard rst order and second order derivatives, it is sometimes necessary to approximate the second order operator

A centered difference formula for this, which has second order accuracy, is given by

If the approximation (2.12) is used for both the and terms in the Laplacean operator, using a mesh size of for the variable and for the variable, the following second order accurate approximation results:

In the particular case where the mesh sizes size , the approximation becomes

and

are the same and equal to a mesh

which is called the ve-point centered approximation to the Laplacean. The stencil of this nite difference approximation is illustrated in (a) of Figure 2.3.

& &

&

( &

tY t Y t Y t Y C6 6 0 C6 0
C  

C0

 !# '    G' I' 0' &#%  !5& %  I  3   1  1  3 6  3

&

6 0 B V

3 W

V 3

 B 6 0

"

B V

6 B

& 

3C 0 6 C6 

CC 

3

B V

&

C 6 6 0 B C 6  0 B  6  C 6 006 B l C 6 0  0 

6 C

"

B V 3

B V 3

V & 

3

C 

"

V 3

B V

0 B V

9 C

 6 0

0 4 V

"

"

"

B V

B V

&

B 6 0 6 

C  6 0 B C 6  0 B 1 6  

C C
0

   

"

"

B V 4

B V & 

"

"

B V

9 C

A A BBA

B V

B V

un u  d u
9 9

(a)
1

-4

Five-point stencils for the centered difference approximation to the Laplacean operator: (a) the standard stencil, (b) the skewed stencil. Another approximation may be obtained by exploiting the four points located on the two diagonal lines from . These points can be used in the same manner as in the previous approximation except that the mesh size has changed. The corresponding stencil is illustrated in (b) of Figure 2.3. (c)
1 1 1 1

(d)
4 1

-8

-20

Two nine-point centered difference stencils for the Laplacean operator. The approximation (2.17) is second order accurate and the error takes the form

There are other schemes that utilize nine-point formulas as opposed to ve-point formulas. Two such schemes obtained by combining the standard and skewed stencils described above are shown in Figure 2.4. Both approximations (c) and (d) are second order accurate. However, (d) is sixth order for harmonic functions, i.e., functions whose Laplacean is zero.

0 B V

C B


(b)
1 -4 1

n
C


 

B V

u u u n
0
V

e


 

Consider the one-dimensional equation, for

The interval [0,1] can be discretized uniformly by taking the

points

where . Because of the Dirichlet boundary conditions, the values and are known. At every other point, an approximation is sought for the exact solution . If the centered difference approximation (2.12) is used, then by the equation (2.18) satisfy the relation expressed at the point , the unknowns

where

Consider now the one-dimensional version of the convection-diffusion equation (2.7) in which the coefcients and are constant, and , using Dirichlet boundary conditions,

In this particular case, it is easy to verify that the exact solution to the above equation is given by


where is the so-called P clet number dened by e . Now consider the approximate solution provided by using the centered difference schemes seen above, for both the

& &

t Y

& V

in which . Notice that for and , the equation will involve which are known quantities, both equal to zero in this case. Thus, for linear system obtained is of the form

& 2 &

and , the

& "B V

tY t

6 1   

A 5  6 9 4444 99 99 8 34 h

U~

)z B
(

4I 6 %

~ D D  0

~ i1)1z @@  

 )53

0 4 V 3 V 6 & 0 4 V 0 V V

 7

7 8

a 7

B

 I  3 &  #B 3 3 
3 3

z P

C C V a 7 7#B V C B V 3 "B

 

   

B V

V 8 # V 7 7 V ` V 3

&

A BA

p B

V 3

C C 0 B ~ B   B

BAA
C

un u  d u
(

"

0 4 V

rst- and second order derivatives. The equation for unknown number becomes

or, dening

This is a second order homogeneous linear difference equation and the usual way to solve it is to seek a general solution in the form . Substituting in (2.21), must satisfy
7 V 3 3 W

Therefore, is a root and the second root is . The general solution of the above difference equation is now sought as a linear combination of the two solutions corresponding to these two roots,


Thus, the solution is

When the factor becomes negative and the above approximations will oscillate around zero. In contrast, the exact solution is positive and monotone in the range . In this situation the solution is very inaccurate regardless of the arithmetic. In other words, the scheme itself creates the oscillations. To avoid this, a small enough mesh can be taken to ensure that . The resulting approximation is in much better agreement with the exact solution. Unfortunately, this condition can limit the mesh size too drastically for large values of . , the oscillations disappear since . In fact, a linear algebra Note that when interpretation of the oscillations comes from comparing the tridiagonal matrices obtained from the discretization. Again, for the case , the tridiagonal matrix resulting from discretizing the equation (2.7) takes the form

The above matrix is no longer a diagonally dominant M-matrix. Observe that if the backward difference formula for the rst order derivative is used, we obtain

1p 1

! A 5  6 9 ! 44 99 3444 99 8 ~

z 0 6  0 ! 0 

7 a

4 V

X 3 4 X 3

&

& V

Because of the boundary condition boundary condition yields


X 3 4 V

, it is necessary that

. Likewise, the

with

& &

t Y

5   0 0 !  0   6 ' 0 '   C B C PB 6 ' 0' z C kB ' l 6 ' C ! PB ' ' p 0 C PB 0 C B  5 p 0 6  l 0 0  0


,
3 I 3
(

7 8

&

&




 
3 4 V 7 V

u u u n
3 3 V 4 V & 3 & V 3 V 3 3 3 3 4 V X V V 7 3 3

3
(


(

Then (weak) diagonal dominance is preserved if obtained for the above backward scheme is
3 3 3 3

. This is because the new matrix

where is now dened by . Each diagonal term gets reinforced by the positive term while each subdiagonal term increases by the same amount in absolute value. In the case where , the forward difference formula
7 a 4 V V 3 & V & " # 3 V 3 4 V 7

can be used to achieve the same effect. Generally speaking, if depends on the space variable , the effect of weak-diagonal dominance can be achieved by simply adopting the following discretization known as an upwind scheme:
7 a 4 V V

where

The above difference scheme can be rewritten by introducing the sign function . The approximation to at is then dened by

Making use of the notation


4

a slightly more elegant formula can be obtained by expressing the approximation of the product ,

where stands for . The diagonal term in the resulting tridiagonal matrix is nonnegative, the offdiagonal terms are nonpositive, and the diagonal term is the negative sum of the offdiagonal terms. This property characterizes upwind schemes. A notable disadvantage of upwind schemes is the low order of approximation which they yield. An advantage is that upwind schemes yield linear systems that are easier to solve by iterative methods.

C B

& & &

& &

t Y

t Y

( 0  %

C   C C  B

C B 0    0  C C  C  C   B   B B B

V E& 

V E& 

4 V &

( 0 %

3 

C C B B   B B  C C B  

if if

0 6 0 2  0  6 0 0 0  5 (  V ! A 5  6 99 ! 4444 34
99@8 9
3 3 3 3

z 

7 7 3


V

&

&

 &4 

&

   
3

( 0 %

3 W

V 

B C B

"

un u  d u
B V C

 

11

12

13

Similar to the previous case, consider this simple problem, in on

where

Since the values at the boundaries are known, we number only the interior points, i.e., the points with and . The points are labeled from the bottom up, one horizontal line at a time. This labeling is called natural ordering and is shown in Figure 2.5 for the very simple case when and . The pattern of the matrix corresponding to the above equations appears in Figure 2.6.

6~ 0~ 0 ~   6 ~   6 Q 6 ~ X6  0 Q 0 ~ 0 

7 a

where is now the rectangle discretized uniformly by taking directions:


7 8 "

and its boundary. Both intervals can be points in the direction and points in the

tY t Y

& &

& &

6 ~ 111)z }r 6   ( 6 0 ~ i1)1p #P 0   s@ 0 0 6~ C 0 ~ C 6 Q p B  0 Q p B

6 1   

Natural ordering of the unknowns for a dimensional grid.

two-

 v

3 !5&  3 & # 3 3 


14 15 9 10 4 5

a V 7

n
"

 
7


66 V

7 #

u u u n
0 6V 6

7 #

AA


"


"

0 "B

To be more accurate, the matrix has the following block structure: with

The nite element method is best illustrated with the solution of a simple elliptic Partial Differential Equation in a two-dimensional space. Consider again Poissons equation (2.24) with the Dirichlet boundary condition (2.25), where is a bounded open domain in and its boundary. The Laplacean operator

appears in many models of physical and mechanical phenomena. Equations involving the more general elliptic operators (2.5) and (2.6) can be treated in the same way as Poissons equation (2.24) and (2.25), at least from the viewpoint of the numerical solutions techniques. An essential ingredient for understanding the nite element method is Greens formula. The setting for this formula is an open set whose boundary consists of a closed and smooth curve as illustrated in Figure 2.1. A vector-valued function , which is continuously differentiable in , is given. The divergence theorem in two-dimensional spaces states that

div

& &

t Y

A 5 34

9 @8

Pattern of matrix associated with the difference mesh of Figure 2.5.

nite

9 ~  h 9 

66

!  ( &%

n 
8 A
 3

 35 6

( "&"    (  (   % 

 3

 3

U   

 3

u d u
D

&'c |

The dot in the right-hand side represents a dot product of two vectors in . In this case it is between the vector and the unit vector which is normal to at the point of consideration and oriented outward. To derive Greens formula, consider a scalar function and a vector . By standard differentiation, function

which expresses

as

Integrating the above equality over

and using the divergence theorem, we obtain

The above equality can be viewed as a generalization of the standard integration by part formula in calculus. Greens formula results from (2.28) by simply taking a vector which is itself a gradient of a scalar function , namely, ,

We now return to the initial problem (2.24-2.25). To solve this problem approximately, it is necessary to (1) take approximations to the unknown function , and (2) translate the equations into a system which can be solved numerically. The options for approximating are numerous. However, the primary requirement is that these approximations should be in a (small) nite dimensional space. There are also some additional desirable numerical properties. For example, it is difcult to approximate high degree polynomials numerically. To extract systems of equations which yield the solution, it is common to use the weak formulation of the problem. Let us dene

An immediate property of the functional with respect to and , namely,


V

is that it is bilinear. That means that it is linear

& &

t Y

x 9 6 6 0 0

6 @

9 ~ 

With this, we obtain Greens formula

0 C 6  B 6 C 0  B 0 C 6  6 0  0 B 0 C 6 B 6 C 0 B 0 C 6 6 0 0 B

 

 

9 V

g9  9   

h 9   

`C Y

C  y B  B

Observe that denoted by

. Also the function

is called the normal derivative and is

2 & &

( & &

t Y

t Y

9 ~  

9 ~  C  B c

C  B    Q   }   C   % C  B  B

 Q

9   9  

9 V

V~ ~ V  ~ V     3

 

u u u n

3

3

3 

  h 9   

h9  

 

then, for functions satisfying the Dirichlet boundary conditions, which are at least twice differentiable, Greens formula (2.29) shows that

The weak formulation of the initial problem (2.24-2.25) consists of selecting a subspace of reference of and then dening the following problem:
V V

Find

such that

In order to understand the usual choices for the space , note that the denition of the weak problem only requires the dot products of the gradients of and and the functions and to be integrable. The most general under these conditions is the space of all functions whose derivatives up to the rst order are in . This is known as . However, this space does not take into account the boundary conditions. The functions in must be restricted to have zero values on . The resulting space is called . The nite element method consists of approximating the weak problem by a nitedimensional problem obtained by replacing with a subspace of functions that are dened as low-degree polynomials on small pieces (elements) of the original domain.

Finite element triangulation of a domain.

Consider a region in the plane which is triangulated as shown in Figure 2.7. In this example, the domain is simply an ellipse but the external enclosing curve is not shown. The original domain is thus approximated by the union of triangles ,

For the triangulation to be valid, these triangles must have no vertex that lies on the edge

&

t Y

B0

B 0&

  C  y B C  B

x 9C B C

"

C  B C  B

V V

0 1( 

B V

C 
V

T 

Notice that

denotes the

-inner product of

and in , i.e.,

n 

U   

C  B u d u

of any other triangle. The mesh size

is dened by diam

where diam , the diameter of a triangle , is the length of its longest side. Then the nite dimensional space is dened as the space of all functions which are piecewise linear and continuous on the polygonal region , and which vanish on the boundary . More specically,

Here, represents the restriction of the function to the subset . If are the nodes of the triangulation, then a function in can be associated with each node , so that the family of functions s satises the following conditions:
&

These conditions dene uniquely. In addition, the space . Each function of can be expressed as

The nite element approximation consists of writing the Galerkin condition (2.30) for functions in . This denes the approximate problem:

Since is in , there are degrees of freedom. By the linearity of is only necessary to impose the condition for in constraints. Writing the desired solution in the basis as


with respect to , it . This results

and substituting in (2.32) gives the linear problem




where

The above equations form a linear system of equations

in which the coefcients of are the s; those of are the Symmetric Positive Denite matrix. Indeed, it is clear that

s. In addition,

&

t Y

x 9 y  z  9z  y  H# C y y B g C z Ty B XHx 0 p  1 ( &

Find

~ 111) dTPd d Y  H C B t ~ i1)1Y ry 0  y  z 

continuous

linear

if if

s form a basis of the

such that

& &

t Y

0 1( C B & d i~1)1Y # C y B C B   C  y B C  B

C B y 1 10 ( & C B

7 8

 


" %%%" $$$

0 (  

u u u n
"

a 

is a

which means that . To see that is positive denite, rst note that for any function . If for a function in , then it must be true that almost everywhere in . Since is linear in each triangle and continuous, then it is clear that it must be constant on all . Since, in addition, it vanishes on the boundary, then it must be equal to zero on all of . The result follows by exploiting the relation with
7 V V

which is valid for any vector . Another important observation is that the matrix is also sparse. Indeed, is nonzero only when the two basis functions and have common support triangles, or equivalently when the nodes and are the vertices of a common triangle. Specically, for a given node , the coefcient will be nonzero only when the node is one of the nodes of a triangle that is adjacent to node . In practice, the matrix is built by summing up the contributions of all triangles by applying the formula

in which the sum is over all the triangles

and

Note that is zero unless the nodes and are both vertices of . Thus, a triangle contributes nonzero values to its three vertices from the above formula. The matrix

associated with the triangle with vertices is called an element stiffness matrix. In order to form the matrix , it is necessary to sum up all the contributions to the position of the matrix. This process is called an assembly process. In the assembly, the matrix is computed as

in which

is the number of elements. Each of the matrices

is of the form

where is the element matrix for the element as dened above. Also Boolean connectivity matrix which maps the coordinates of the matrix coordinates of the full matrix .

is an into the

&

 ~

t Y

Hv

(C  B

w 

w  G   Q ~ 0 1( & ! C H ' B C Ya` r`rB C C C ' ' C ' ' B A C zu z B C y z B 5 C' z B C z C BB C y C BB 3 8 ' Cy B C z Ty B 9 x 2u  y  C z Cy B

H  z h 1( % 0 0 C x B u B C 1 ( &

C z Cy B

&

C z Cy B

n 
" %%%$ $$$


 C T x B H 
! 

 

U   

u d u

Finite element mesh


6

Assembled matrix
4 4 3 2 2 1 3 5

A simple nite element mesh and the pattern of the corresponding assembled matrix.

The assembly process can be illustrated with a very simple example. Consider the nite element mesh shown in Figure 2.8. The four elements are numbered from bottom to top as indicated by the labels located at their centers. There are six nodes in this assomesh and their labeling is indicated in the circled numbers. The four matrices ciated with these elements are shown in Figure 2.9. Thus, the rst element will contribute to the nodes , the second to nodes , the third to nodes , and the fourth to nodes .

The element matrices , nite element mesh shown in Figure 2.8.

for the

In fact there are two different ways to represent and use the matrix . We can form all the element matrices one by one and then we can store them, e.g., in an rectangular array. This representation is often called the unassembled form of . Then the matrix may be assembled if it is needed. However, element stiffness matrices can also be used in different ways without having to assemble the matrix. For example, frontal techniques are direct solution methods that take the linear system in unassembled form and compute the solution by a form of Gaussian elimination. There are also iterative solution techniques which work directly with unassembled matrices. One of the main operations

  h jQ ~

1)1)

 

u u u n
6


 

uCY



required in many iterative methods is to compute , the product of the matrix an arbitrary vector . In unassembled form, this can be achieved as follows:

Thus, the product gathers the data associated with the -element into a 3-vector consistent with the ordering of the matrix . After this is done, this vector must be multiplied by . Finally, the result is added to the current vector in appropriate locations determined by the array. This sequence of operations must be done for each of the elements. A more common and somewhat more appealing technique is to perform the assembly of the matrix. All the elements are scanned one by one and the nine associated contributions , added to the corresponding positions in the global stiffness matrix. The assembled matrix must now be stored but the element matrices may be discarded. The structure of the assembled matrix depends on the ordering of the nodes. To facilitate the computations, a widely used strategy transforms all triangles into a reference triangle with vertices . The area of the triangle is then simply the determinant of the Jacobian of the transformation that allows passage from one set of axes to the other. Simple boundary conditions such as Neumann or Dirichlet do not cause any difculty. The simplest way to handle Dirichlet conditions is to include boundary values as unknowns and modify the assembled system to incorporate the boundary values. Thus, each equation associated with the boundary point in the assembled system is replaced by the equation . This yields a small identity block hidden within the linear system. For Neumann conditions, Greens formula will give rise to the equations

which will involve the Neumann data over the boundary. Since the Neumann data is typically given at some points only (the boundary nodes), linear interpolation (trapezoidal rule) or the mid-line value (midpoint rule) can be used to approximate the integral. Note that (2.36) can be viewed as the -th equation of the linear system. Another important point is that if the boundary conditions are only of Neumann type, then the resulting system is singular. An equation must be removed, or the linear system must be solved by taking this singularity into account.

Generating a nite element triangulation can be done quite easily by exploiting some initial grid and then rening the mesh a few times either uniformly or in specic areas. The simplest renement technique consists of taking the three midpoints of a triangle, thus creating four smaller triangles from a larger triangle and losing one triangle, namely, the

&

&

t Y t Y

 % 9 ~ u 9 z 9 u  

G B

(   '&%  

0 (  ! &


7 5

YB C )z B C p B

 U

u n u d   w

01( ! &

! 

!   ( " ) &%   # 

5 # 7 7

r`)

C ' B

G 

| c

by

original one. A systematic use of one level of this strategy is illustrated for the mesh in Figure 2.8, and is shown in Figure 2.10. Finite element mesh
6 16 15 4 14 4 13 3 13 11 8 2 7 1 9 5 1 7 8 6 11 2 9 3 12 12 10 10 15 5 14

Assembled matrix

The simple nite element mesh of Figure 2.8 after one level of renement and the corresponding matrix. One advantage of this approach is that it preserves the angles of the original triangulation. This is an important property since the angles on a good quality triangulation must satisfy certain bounds. On the other hand, the indiscriminate use of the uniform renement strategy may lead to some inefciencies. Indeed, it is desirable to introduce more triangles in areas where the solution is likely to have large variations. In terms of vertices, midpoints should be introduced only where needed. To obtain standard nite element triangles, the points that have been created on the edges of a triangle must be linked to existing vertices in the triangle. This is because no vertex of a triangle is allowed to lie on the edge of another triangle. Figure 2.11 shows three possible cases that can arise. The original triangle is (a). In (b), only one new vertex (numbered ) has appeared on one edge of the triangle and it is joined to the vertex opposite to it. In (c), two new vertices appear inside the original triangle. There is no alternative but to join vertices (4) and (5). However, after this is done, either vertices (4) and (3) or vertices (1) and (5) must be joined. If angles are desired that will not become too small with further renements, the second choice is clearly better in this case. In fact, various strategies for improving the quality of the triangles have been devised. The nal case (d) corresponds to the uniform renement case where all edges have been split in two. There are three new vertices and four new elements, and the larger initial element is removed.

 

u u u n

# T  D

(a)

(b)

(c)

(d)

Original triangle (a) and three possible renement scenarios.

The nite volume method is geared toward the solution of conservation laws of the form:

In the above equation, is a certain vector function of and time, possibly nonlinear. This is called the ux vector. The source term is a function of space and time. We now apply the principle used in the weak formulation, described before. Multiply both sides by a test function , and take the integral

Then integrate by part using formula (2.28) for the second term on the left-hand side to obtain

Consider now a control volume consisting, for example, of an elementary triangle in the two-dimensional case, such as those used in the nite element method. Take for a function whose value is one on the triangle and zero elsewhere. The second term in the

 & (

t Y

x9 0

x9 0

9 ~

h 9 

0 

b 

! ( &% &    (     % 

n 
9

V

c

9 V

 S

u d u
  

| c

above equation vanishes and the following relation results:

The above relation is at the basis of the nite volume approximation. To go a little further, the assumptions will be simplied slightly by taking a vector function that is linear with respect to . Specically, assume

Note that, in this case, the term in (2.37) becomes . In addition, the right-hand side and the rst term in the left-hand side of (2.38) can be approximated as follows:
V

Here, represents the volume of , and is some average value of in the cell These are crude approximations but they serve the purpose of illustrating the scheme. The nite volume equation (2.38) yields

The contour integral

is the sum of the integrals over all edges of the control volume. Let the value of on each edge be approximated by some average . In addition, denotes the length of each edge and a common notation is
V

Then the contour integral is approximated by


V

The situation in the case where the control volume is a simple triangle is depicted in Figure 2.12. The unknowns are the approximations of the function associated with each cell. These can be viewed as approximations of at the centers of gravity of each cell . This type of model is called cell-centered nite volume approximations. Other techniques based on using approximations on the vertices of the cells are known as cell-vertex nite volume techniques.

& 2

&

&

t Y t Y t Y

 @

  1

g 9 0 R 9 ~

  ) 9 ~

VB

D A 8 A A    8  !  4 ! 2  10  ( & $ !  !     796BCB%36@"976#53#)"'%#"

&

) 


V @

~ Q

0R  

 


V 6 @ 0 @

u u u n


&

 V

9 V


V

 

Finite volume cell associated with node three neighboring cells.

and

The value required in (2.40) can be taken simply as the average between the approximation of in cell and the approximation in the cell on the other side of the edge

This gives

One further simplication takes place by observing that


7

and therefore

In the above equation, the summation is over all the neighboring cells . One problem with such simple approximations is that they do not account for large gradients of in the components. In nite volume approximations, it is typical to exploit upwind schemes which are more suitable in such cases. By comparing with one-dimensional upV

  1 X &  

This yields

&

t Y

  C B &   C B X Q

p & &

( &

@ V

@ V

n 

'

c

 S

u d u

wind schemes, it can be easily seen that the suitable modication to (2.41) is as follows:
V

This gives
V

Now write

The equation for cell takes the form

where

Thus, the diagonal elements of the matrix are nonnegative, while its offdiagonal elements are nonpositive. In addition, the row-sum of the elements, i.e., the sum of all elements in the same row, is equal to zero. This is because

The matrices obtained have the same desirable property of weak diagonal dominance seen in the one-dimensional case. A disadvantage of upwind schemes, whether in the context of irregular grids or in one-dimensional equations, is the loss of accuracy due to the low order of the schemes.

 'c hf B  'c hd'c 9 9 9f

1 Derive Forward Difference formulas similar to (2.8), i.e., involving , which are of second and third order. Write down the discretization errors explicitly.

tY t Y

&

&

z & & C B & C B & p  &

z # C B z ( C B &   H  &

 

&

&

where

) 

V 3

sign

V W V 3

&

sign

& &

t Y

  C B C B &     ) C B   C B &     ) C B C B C B &   C B Q


C B
V 3

 
3 @ 3

u u u n
V 3 V 4

p 

 

) ")  

55 WW5 hf B (

2 Derive a Centered Difference formula for the rst derivative, similar to (2.13), which is at least of third order. 3 Show that the Upwind Difference scheme described in 2.2.4, when and are constant, is stable for the model problem (2.7). 4 Develop the two nine-point formulas illustrated in Figure 2.4. Find the corresponding discretization errors. [Hint: Combine of the ve-point formula (2.17) plus of the same formula based on the diagonal stencil to get one formula. Use the reverse combination , to get the other formula.] 5 Consider a (two-dimensional) rectangular mesh which is discretized as in the nite difference approximation. Show that the nite volume approximation to yields the same matrix as an upwind scheme applied to the same problem. What would be the mesh of the equivalent upwind nite difference approximation? 6 Show that the right-hand side of equation (2.16) can also be written as

10 Write a short FORTRAN or C program to perform a matrix-by-vector product when the matrix is stored in unassembled form. 11 Consider the nite element mesh of Example 2.1. Compare the number of operations required to perform a matrix-by-vector product when the matrix is in assembled and in unassembled form. Compare also the storage required in each case. For a general nite element matrix, what can the ratio be between the two in the worst case (consider only linear approximations on triangular elements) for arithmetic? Express the number of operations in terms of the number of nodes and edges of the mesh. You may make the assumption that the maximum number of elements that are adjacent to a given node is (e.g., ). with edges, and let , for , where is the 12 Let be a polygon in length of the -th edge and is the unit outward normal at the -th edge. Use the divergence . theorem to prove that

N OTES AND R EFERENCES . The material in this chapter is based on several sources. For a basic description of the nite element method, the book by C. Johnson is a good reference [128]. Axelsson and Barker [16] gives a treatment which includes various solution techniques emphasizing iterative techniques. For nite difference and nite volume methods, we recommend C. Hirsch [121], which also gives a good description of the equations and solution methods for uid ow problems.

C

P 5  5Wp5 &  P

9 Develop the equivalent of Greens formula for the elliptic operator

 

C C  C

 $

"% C 

C 

H C

"# !

8 Show that the functions

s dened by (2.31) form a basis of

. dened in (2.6).

 

7 Show that the formula (2.16) is indeed second order accurate for functions that are in

H8 8 f B  c     5c    0 EB 0 'Whf B 3EB 80 'c hf B 0 IB 2'GDf B 3EB 2'cWhf9H8  'c 5

9 5

5 9 E

E 7 F 5 B &

 

n u d   g


The natural idea to take advantage of the zeros of a matrix and their location was initiated by engineers in various disciplines. In the simplest case involving banded matrices, special techniques are straightforward to develop. Electrical engineers dealing with electrical networks in the 1960s were the rst to exploit sparsity to solve general sparse linear systems for matrices with irregular structure. The main issue, and the rst addressed by sparse matrix technology, was to devise direct solution methods for linear systems. These had to be economical, both in terms of storage and computational effort. Sparse direct solvers can handle very large problems that cannot be tackled by the usual dense solvers.

  ( !   (

{iU& |

t 4 (116C"t & & E 4 V6A8aFa4xaF4D0 `67i&gtC&V638Cg& P41'FwgT8a4705430tF 16C"D04 & 13E0`ei&9 YF $4ir054)E76' C&V6)8)2" V68 %tH8i$1616B P$T"54V04 )8`eC&7646 #s4$$ 1838e6S2F T4VF`')8V6t$2 H8 H8x16i$ T"y`eC&C60 ' $I C854jU67&s6 8%10C2S)RQTI6 zHB4$% )816)BGAEDC&16F B 3$#)8765423106 )"(& %$2 # IT"16)8E T416St" xze`e&AiFT"T4Sw0 iFEYE X$ F 54g WYaFE &GV%8301AEr0a4d1RDEF $P547054)ES)6 W3Xp$ &5416DC)E$ " 6#i$4C&16xCg' $a&& yU38$6 I PqFC6Xw`R f r6164 T"%$vg4 Cg)&1RW xg166)WF vV6%8 4 `54V610A8C"0aFxa4 4 x6T&& `6%' $54C8f )4Sq67`C&6`16" 16T"xr&A854qF1ExjC&6R V61386 )WTgT4c1'6 W rgsF)8C8a4`6& 704m 1654C"03S4 v40 i1610pT"E fP64 6 3`0D6 3aF $4 F4T" ##HEA$C9%6xE T&16 %$3E)"`'r6S i$T8i4Sj67&s8%10C2&#67T&16 t`i&EF P$54`0A'F zH8i$f 16T"j1AEr&54)ES)6 1W6 W3sFsF0 )8)8i$`6`6`'im@Em %6C2)RiFjE& g)8)816%6PITBX16fi6#3R )EH163g)8vwi&IB"e6s91076%8s&j%8" 0 10W 1"C216' %$qd`4&T"S@S(6I )RF $ wC8e61854i967S40 }5407 6%3Ey1f0 0 B%1&AE)i00(h16)W'0`e` C63 gY0$7T8i9104 i4SF0 rBpb10i44 q`d`vm"0 6 `e$TIHWP4S)'6P$54Y0 }rFg 5&%6HE#C$e Ce163E54p`0E dj'g fP63T&yCt16r&&$g v54)E%$ $SC86 E3)"54)`'S1W0 6 r664 R 16 " 3W760 s&P$%8H4%810C2Y!1RbC&F`t i&`6Y'E i$FT8$i4i4SmC00 P$T4V6)8 Hr'@P& $!1V%8AE300 16%3E9s078%a40dwweW& )8rFx544 r6 1A20`106 1"G'W W 3Wi&`0g' F %$%$32TBrV6%8dqEi&24 16F C"P$10454 l`$gHE W3%60Hf$P$54)81E7'T&6q1863p6 wo @hn&$ 0 &$

  

 

13

43

15

41

40

42

45

18

10

23

11

27

12

31

14

44

16

19

17

21

22

25

26

29

30

33

20

24

28

32

39

34

35

37

38

36

A nite element grid model.

Essentially, there are two broad types of sparse matrices: structured and unstructured. A structured matrix is one whose nonzero entries form a regular pattern, often along a small number of diagonals. Alternatively, the nonzero elements may lie in blocks (dense submatrices) of the same size, which form a regular pattern, typically along a small number of (block) diagonals. A matrix with irregularly located entries is said to be irregularly structured. The best example of a regularly structured matrix is a matrix that consists of only a few diagonals. Finite difference matrices on rectangular grids, such as the ones seen in the previous chapter, are typical examples of matrices with regular structure. Most nite element or nite volume techniques applied to complex geometries lead to irregularly structured matrices. Figure 3.2 shows a small irregularly structured sparse matrix associated with the nite element grid problem shown in Figure 3.1. The distinction between the two types of matrices may not noticeably affect direct solution techniques, and it has not received much attention in the past. However, this distinction can be important for iterative solution methods. In these methods, one of the essential operations is matrix-by-vector products. The performance of these operations can differ signicantly on high performance computers, depending on whether they are regularly structured or not. For example, on vector computers, storing the matrix by diagonals is ideal, but the more general schemes may suffer because they require indirect addressing. The next section discusses graph representations of sparse matrices. This is followed by an overview of some of the storage schemes used for sparse matrices and an explanation of how some of the simplest operations with sparse matrices can be performed. Then sparse linear system solution methods will be covered. Finally, Section 3.7 discusses test matrices.

T  D

 

u n

Sparse matrix associated with the nite element grid of Figure 3.1.

Graph theory is an ideal tool for representing the structure of sparse matrices and for this reason it plays a major role in sparse matrix techniques. For example, graph theory is the key ingredient used in unraveling parallelism in sparse Gaussian elimination or in preconditioning techniques. In the following section, graphs are discussed in general terms and then their applications to nite element or nite difference matrices are discussed.

Remember that a graph is dened by two sets, a set of vertices

This graph is often represented by a set of points in the plane linked by a directed line between the points that are connected by an edge. A graph is a way of representing a binary relation between objects of a set . For example, can represent the major cities of the world. A line is drawn between any two cities that are linked by a nonstop airline connection. Such a graph will represent the relation there is a nonstop ight from city (A) to city (B). In this particular example, the binary relation is likely to

and a set of edges

which consists of pairs

, where

are elements of

 '

    C   B  )11) 6  0 

(  #$ I ' CG' #G'

    


u d

)  ( ""')  (  

%  ' #(

| U& $ $#



, i.e.,

be symmetric, i.e., when there is a nonstop ight from (A) to (B) there is also a nonstop ight from (B) to (A). In such situations, the graph is said to be undirected, as opposed to a general graph which is directed.

4 Graphs of two

sparse matrices.

Going back to sparse matrices, the adjacency graph of a sparse matrix is a graph , whose vertices in represent the unknowns. Its edges represent the binary relations established by the equations in the following manner: There is an edge from node to node when . This edge will therefore represent the binary relation equation involves unknown . Note that the graph is directed, except when the matrix has a symmetric pattern ( iff for all ). When a matrix has a symmetric nonzero pattern, i.e., when and are always nonzero at the same time, then the graph is undirected. Thus, for undirected graphs, every edge points in both directions. As a result, undirected graphs can be represented with nonoriented edges. As an example of the use of graph models, parallelism in Gaussian elimination can be extracted by nding unknowns that are independent at a given stage of the elimination. These are unknowns which do not depend on each other according to the above binary relation. The rows corresponding to such unknowns can then be used as pivots simultaneously. Thus, in one extreme, when the matrix is diagonal, then all unknowns are independent. Conversely, when a matrix is dense, each unknown will depend on all other unknowns. Sparse matrices lie somewhere between these two extremes. There are a few interesting simple properties of adjacency graphs. The graph of can be interpreted as an -vertex graph whose edges are the pairs for which there exists at least one path of length exactly two from node to node in the original graph of . Similarly, the graph of consists of edges which represent the binary relation there is at least one path of length from node to node . For details, see Exercise 4.

T
C

r`rB

Hp

~ # a` #

7 8

 T  H p

a 7

  


'

7 8

T  D

For Partial Differential Equations involving only one physical unknown per mesh point, the adjacency graph of the matrix arising from the discretization is often the graph represented by the mesh itself. However, it is common to have several unknowns per mesh point. For example, the equations modeling uid ow may involve the two velocity components of the uid (in two dimensions) as well as energy and momentum at each mesh point. In such situations, there are two choices when labeling the unknowns. They can be labeled contiguously at each mesh point. Thus, for the example just mentioned, we can label all four variables (two velocities followed by momentum and then pressure) at a given mesh point as , , . Alternatively, all unknowns associated with one type of variable can be labeled rst (e.g., rst velocity components), followed by those associated with the second type of variables (e.g., second velocity components), etc. In either case, it is clear that there is redundant information in the graph of the adjacency matrix. The quotient graph corresponding to the physical mesh can be used instead. This results in substantial savings in storage and computation. In the uid ow example mentioned above, the storage can be reduced by a factor of almost 16 for the integer arrays needed to represent the graph. This is because the number of edges has been reduced by this much, while the number of vertices, which is usually much smaller, remains the same.

Permuting the rows or the columns, or both the rows and columns, of a sparse matrix is a common operation. In fact, reordering rows and columns is one of the most important ingredients used in parallel implementations of both direct and iterative solution techniques. This section introduces the ideas related to these reordering techniques and their relations to the adjacency graphs of the matrices. Recall the notation introduced in Chapter 1 that the -th column of a matrix is denoted by and the -th row by .

We begin with a denition and new notation.

Let be a matrix and . Then the matrices


! " ! " #   " "

a permutation of the set

are called row -permutation and column -permutation of , respectively.

IP  '
" %%%" $$$ " %%%" $$$

1( 0 ( e   % 2 0 0 1( 0 ( e   % X2 P)1)136 P 0 )

  I! "P   

6 82
" %%%" $$$ " %%%" $$$ 


) #   !  

3 

    

u d

' 

%  ' #(

! $   ( ( %  )

A A B C

p B 1)




! H RFP I#F D Q F H

5~)1)uCYY q

B V

&'U& |


C 

It is well known that any permutation of the set results from at most interchanges, i.e., elementary permutations in which only two entries have been interchanged. An interchange matrix is the identity matrix with two of its rows interchanged. Denote by such matrices, with and being the numbers of the interchanged rows. Note that in order to interchange rows and of a matrix , we only need to premultiply it by the matrix . Let be an arbitrary permutation. This permutation is the product of a sequence of consecutive interchanges . Then the , then rows of the rows of a matrix can be permuted by interchanging rows resulting matrix, etc., and nally by interchanging of the resulting matrix. Each of these operations can be achieved by a premultiplication by . The same observation can be made regarding the columns of a matrix: In order to interchange columns and of a matrix, postmultiply it by . The following proposition follows from these observations.

Let

where

Products of interchange matrices are called permutation matrices. Clearly, a permutation matrix is nothing but the identity matrix with its rows (or columns) permuted. Observe that , i.e., the square of an interchange matrix is the identity, or equivalently, the inverse of an interchange matrix is equal to itself, a property which is intuitively clear. It is easy to see that the matrices (3.1) and (3.2) satisfy

which shows that the two matrices and are nonsingular and that they are the inverse of one another. In other words, permuting the rows and the columns of a matrix, using the same permutation, actually performs a similarity transformation. Another important consequence arises because the products involved in the denitions (3.1) and (3.2) of and occur in reverse order. Since each of the elementary matrices is symmetric, the matrix is the transpose of . Therefore,

Since the inverse of the matrix is its own transpose, permutation matrices are unitary. Another way of deriving the above relationships is to express the permutation matrices and in terms of the identity matrix, whose columns or rows are permuted. It can easily be seen (See Exercise 3) that

It is then possible to verify directly that


"  "

t t

&

 0 0 1) 40 0 x 0 )1 0 0 ! 

"

"

 G 

"

"

" 2

" 

"

2  2 2

"

0 

&

" "

"

"

G 

"

G 

32

" 

"

"

"

"

"

" #

0 0

G 

changes

be a permutation resulting from the product of the inter. Then,

0 a 6 r 6 ~ i1)1Y 0 C r' 0a ' wB

"

0 )1 40 0 0 0 )1 0 0  0 2  2 1~ 11)

~ 51)1u`
X

u n n u d d d $ 
!

~ 0 P)11) 6 P ) p
0
 T

H QF F Q SRI cbP` ` Q

'

a ' rB V G S

It is important to interpret permutation operations for the linear systems to be solved. When the rows of a matrix are permuted, the order in which the equations are written is changed. On the other hand, when the columns are permuted, the unknowns are in effect relabeled, or reordered.

Consider, for example, the linear system

and

, then the (column-) permuted linear system is

Note that only the unknowns have been permuted, not the equations, and in particular, the right-hand side has not changed. In the above example, only the columns of have been permuted. Such one-sided permutations are not as common as two-sided permutations in sparse matrix techniques. In reality, this is often related to the fact that the diagonal elements in linear systems play a distinct and important role. For instance, diagonal elements are typically large in PDE applications and it may be desirable to preserve this important property in the permuted matrix. In order to do so, it is typical to apply the same permutation to both the columns and the rows of . Such operations are called symmetric permutations, and if denoted by , then the result of such symmetric permutations satises the relation

The interpretation of the symmetric permutation is quite simple. The resulting matrix corresponds to renaming, or relabeling, or reordering the unknowns and then reordering the equations in the same manner.

For the previous example, if the rows are permuted with the same permutation as the columns, the linear system obtained is

Observe that the diagonal elements are now diagonal elements from the original matrix, placed in a different order on the main diagonal.

A 6 5

9 @8 0

9 @8 0

34

34

A 7 6 9 @8 7

6 A 6 5 A 6 )0 6 66 6 0 00 9 @8 0 34 @8 9 6 0 ) 07 E66 7 0 0 5 6 6 34

9 @8 0

A 6

 G 

34 @8 7 9

6 6 6 A 7 6 7 6 07 0)00

    
where

u d

"

34

34


!

u` YY 

T   



"

From the point of view of graph theory, another important interpretation of a symmetric permutation is that it is equivalent to relabeling the vertices of the graph without altering the edges. Indeed, let be an edge in the adjacency graph of the original matrix and let be the permuted matrix. Then and a result is an edge in the adjacency graph of the permuted matrix , if and only if is an edge in the graph of the original matrix . Thus, the graph of the permuted matrix has not changed; rather, the labeling of the vertices has. In contrast, nonsymmetric permutations do not preserve the graph. In fact, they can transform an indirected graph into a directed one. Symmetric permutations may have a tremendous impact on the structure of the matrix even though the general graph of the adjacency matrix is identical.

Consider the matrix illustrated in Figure 3.4 together with its adjacency graph. Such matrices are sometimes called arrow matrices because of their shape, but it would probably be more accurate to term them star matrices because of the structure of their graphs. If the equations are reordered using the permutation , the matrix and graph shown in Figure 3.5 are obtained. Although the difference between the two graphs may seem slight, the matrices have a completely different structure, which may have a significant impact on the algorithms. As an example, if Gaussian elimination is used on the reordered matrix, no ll-in will occur, i.e., the L and U parts of the LU factorization will have the same structure as the lower and upper parts of , respectively. On the other hand, Gaussian elimination on the original matrix results in disastrous ll-ins. Specically, the L and U parts of the LU factorization are now dense matrices after the rst step of Gaussian elimination. With direct sparse matrix techniques, it is important to nd permutations of the matrix that will have the effect of reducing ll-ins during the Gaussian elimination process.

To conclude this section, it should be mentioned that two-sided nonsymmetric permutations may also arise in practice. However, they are more common in the context of direct methods.

The type of reordering, or permutations, used in applications depends on whether a direct or an iterative method is being considered. The following is a sample of such reorderings which are more useful for iterative methods.

'

C EC C C qB wB B a`rB

 '

(  #$ I ' G' #%   H  C %

15)1)z p

( B 2! ! 6 !  6

    p

"

u n n u d d d $ 

!PG ' 1 

a`rB

A B

T  

7 6 8

4 3

graph.

3 4 2

6 7

Adjacency graph and matrix obtained from above gure after permuting the nodes in reverse order.

Level-set orderings. This class of orderings contains a number of techniques that are based on traversing the graph by level sets. A level set is dened recursively as the set of all unmarked neighbors of all the nodes of a previous level set. Initially, a level set consists of one node, although strategies with several starting nodes are also important and will be considered later. As soon as a level set is traversed, its nodes are marked and numbered. They can, for example, be numbered in the order in which they are traversed. In addition, the order in which each level itself is traversed gives rise to different orderings. For instance, the nodes of a certain level can be visited in the natural order in which they are listed. The neighbors of each of these nodes are then inspected. Each time, a neighbor of a visited vertex that is not numbered is encountered, it is added to the list and labeled as

    
9 2

Pattern of a 9

u d 

T  D




9 arrow matrix and its adjacency

the next element of the next level set. This simple strategy is called Breadth First Search (BFS) traversal in graph theory. The ordering will depend on the way in which the nodes are traversed in each level set. In BFS the elements of a level set are always traversed in the natural order in which they are listed. In the Cuthill-McKee ordering the elements of a level set are traversed from the nodes of lowest degree to those of highest degree.

EndDo EndDo EndWhile

The array obtained from the procedure lists the nodes in the order in which they are visited and can, in a practical implementation, be used to store the level sets in succession. A pointer is needed to indicate where each set starts. The array thus constructed does in fact represent the permutation array dened earlier. In 1971, George [103] observed that reversing the Cuthill-McKee ordering yields a better scheme for sparse Gaussian elimination. The simplest way to understand this is to look at the two graphs produced by these orderings. The results of the standard and reversed Cuthill-McKee orderings on the sample nite element mesh problem seen earlier are shown in Figures 3.6 and 3.7, when the initial node is (relative to the labeling of the original ordering of Figure 2.10). The case of the gure, corresponds to a variant of CMK in which the traversals in Line 6, is done in a random order instead of according to the degree. A large part of the structure of the two matrices consists of little arrow submatrices, similar to the ones seen in Example 3.3. In the case of the regular CMK ordering, these arrows point upward, as in Figure 3.4, a consequence of the level set labeling. These blocks are similar the star matrices of Figure 3.4. As a result, Gaussian elimination will essentially ll in the square blocks which they span. As was indicated in Example 3.3, a remedy is to reorder the nodes backward, as is done globally in the reverse Cuthill-McKee strategy. For the reverse CMK ordering, the arrows are pointing downward, as in Figure 3.5, and Gaussian elimination yields much less ll-in.


Traverse in order of increasing degree and for each visited node Do: For each neighbor of such that Add to the set ;


 

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15.

Input: initial node ; Output: permutation array Start: Set ; ; Set ; While ( ) Do:

Do:

 Q  Q ~ ~ s C ~ B  Q C rB ' ' C wB ' ' Q ~  Q ~ C 0 s ) C 0 '  ' QB ~ 0 y0 r B  Q

 B $  " !6 B  9%  &  T B  " %

u n n u d d d $ 

q G

15

13

14

11

11

    
12 10 9 6 4 2 1 3

Cuthill-McKee ordering.

10

12

14

15

13

Reverse Cuthill-McKee ordering.

u d

T  D


 D

The choice of the initial node in the CMK and RCMK orderings may be important. Referring to the original ordering of Figure 2.10, the previous illustration used . However, it is clearly a poor choice if matrices with small bandwidth or prole are desired. If is selected instead, then the reverse Cuthill-McKee algorithm produces the matrix in Figure 3.8, which is more suitable for banded or skyline solvers.

10

12

11

13

14

15

Reverse Cuthill-McKee ordering starting with

Independent set orderings. The matrices that arise in the model nite element problems seen in Figures 2.7, 2.10, and 3.2 are all characterized by an upper-left block that is diagonal, i.e., they have the structure

in which is diagonal and , and are sparse matrices. The upper-diagonal block corresponds to unknowns from the previous levels of renement and its presence is due to the ordering of the equations in use. As new vertices are created in the rened grid, they are given new numbers and the initial numbering of the vertices is unchanged. Since the old connected vertices are cut by new ones, they are no longer related by equations. Sets such as these are called independent sets. Independent sets are especially useful in parallel computing, for implementing both direct and iterative methods. Referring to the adjacency graph of the matrix, and denoting by the edge from vertex to vertex , an independent set is a subset of the vertex set such that

if

then

or

xB

 "

yB

u n n u d d d $ 
C

B x7

T 

0
D

 T 

To explain this in words: Elements of are not allowed to be connected to other elements of either by incoming or outgoing edges. An independent set is maximal if it cannot be augmented by elements in its complement to form a larger independent set. Note that a maximal independent set is by no means the largest possible independent set that can be found. In fact, nding the independent set of maximum cardinal is -hard [132]. In the following, the term independent set always refers to maximal independent set. There are a number of simple and inexpensive heuristics for nding large maximal independent sets. A greedy heuristic traverses the nodes in a given order, and if a node is not already marked, it selects the node as a new member of . Then this node is marked along with its nearest neighbors. Here, a nearest neighbor of a node means any node linked to by an incoming or an outgoing edge.

1. Set . Do: 2. For 3. If node is not marked then 4. 5. Mark and all its nearest neighbors 6. EndIf 7. EndDo

In the above algorithm, the nodes are traversed in the natural order , but they can also be traversed in any permutation of . Since the size of the reduced system is , it is reasonable to try to maximize the size of in order to obtain a small reduced system. It is possible to give a rough idea of the size of . Assume that the maximum degree of each node does not exceed . Whenever the above algorithm accepts a node as a new member of , it potentially puts all its nearest neighbors, i.e., at most nodes, in the complement of . Therefore, if is the size of , the size of its complement, , is such that , and as a result,

This lower bound can be improved slightly by replacing with the maximum degree all the vertices that constitute . This results in the inequality

of

which suggests that it may be a good idea to rst visit the nodes with smaller degrees. In fact, this observation leads to a general heuristic regarding a good order of traversal. The algorithm can be viewed as follows: Each time a node is visited, remove it and its nearest neighbors from the graph, and then visit a node from the remaining graph. Continue in the same manner until all nodes are exhausted. Every node that is visited is a member of and its nearest neighbors are members of . As result, if is the degree of the node visited at step , adjusted for all the edge deletions resulting from the previous visitation steps, then the number of nodes that are left at step satises the relation

 ~ i)1)uCY ~ 51)1u` P1)1) 0 )

 

Y

 )

3 B

~ (

~ (

   8 !8 ! "% !&

    

&

~ g ~

u d


Q

DB $

B # & T

 ~

5   i~)11zC Yy } V G


3 3

QW

The process adds a new element to the set at each step and stops when . In order to maximize , the number of steps in the procedure must be maximized. The difculty in the analysis arises from the fact that the degrees are updated at each step because of the removal of the edges associated with the removed nodes. If the process is to be lengthened, a rule of thumb would be to visit the nodes that have the smallest degrees rst.

1. Set . Find an ordering of the nodes by increasing degree. 2. For , Do: 3. If node is not marked then 4. 5. Mark and all its nearest neighbors 6. EndIf 7. EndDo
! 

A renement to the above algorithm would be to update the degrees of all nodes involved in a removal, and dynamically select the one with the smallest degree as the next node to be visited. This can be implemented efciently using a min-heap data structure. A different heuristic is to attempt to maximize the number of elements in by a form of local optimization which determines the order of traversal dynamically. In the following, removing a vertex from a graph means deleting the vertex and all edges incident to/from this vertex. The algorithms described in this section were tested on the same example used before, namely, the nite element mesh problem of Figure 2.10. Here, all strategies used yield the initial independent set in the matrix itself, which corresponds to the nodes of all the previous levels of renement. This may well be optimal in this case, i.e., a larger independent set may not exist. Multicolor orderings. Graph coloring is a familiar problem in computer science which refers to the process of labeling (coloring) the nodes of a graph in such a way that no two adjacent nodes have the same label (color). The goal of graph coloring is to obtain a colored graph which uses the smallest possible number of colors. However, optimality in the context of numerical linear algebra is a secondary issue and simple heuristics do provide adequate colorings. Basic methods for obtaining a multicoloring of an arbitrary grid are quite simple. They rely on greedy techniques, a simple version of which is as follows.

Color

wCEC B b C VB

 

  ( 

1. For 2. For 3. Set Color 4. EndDo

Do: set Do:

~


 )

   !8  B 

  ! "% 8

 "


7 8

8 D " % 8

B  !  "  B G  B B 

P)11) 0

   C rB ' Q

u n n u d d d $ 
%&DDB # $ B 

5~ )~ 1i))rBu1 1CYd Yd q G

%   ~)1)uC Yy G q G




F F


Q Q

 

Here, Adj represents the set of nodes that are adjacent to node . The color assigned to node in line 3 is the smallest allowable color number which can be assigned to node . Here, allowable means different from the colors of the nearest neighbors and positive. This procedure is illustrated in Figure 3.9. The node being colored in the gure is indicated by an arrow. It will be assigned color number 3, the smallest positive integer different from 1, 2, 4, 5.

1 0

2 4 5

The greedy multicoloring algorithm.

has been arbitrarily selected for traversing In the above algorithm, the order the nodes and coloring them. Instead, the nodes can be traversed in any order , , . If a graph is bipartite, i.e., if it can be colored with two colors, then the algorithm will nd the optimal two-color (Red-Black) ordering for Breadth-First traversals. In addition, if a graph is bipartite, it is easy to show that the algorithm will nd two colors for any traversal which, at a given step, visits an unmarked node that is adjacent to at least one visited node. In general, the number of colors needed does not exceed the maximum degree of each node +1. These properties are the subject of Exercises 9 and 8.

Figure 3.10 illustrates the algorithm for the same example used earlier, i.e., the nite element mesh problem of Figure 2.10. The dashed lines separate the different color sets found. Four colors are found in this example.

Once the colors have been found, the matrix can be permuted to have a block structure in which the diagonal blocks are diagonal. Alternatively, the color sets , , and the permutation array in the algorithms can be used.

111 6 0 %

  1)  0 1 X

~ 5)11zCY

    

u d




T   

wB

15

12

11

14

10

13

Graph and matrix corresponding to mesh of Figure 2.10 after multicolor ordering.

Remember that a path in a graph is a sequence of vertices , which are such that is an edge for . Also, a graph is said to be connected if there is a path between any pair of two vertices in . A connected component in a graph is a maximal subset of vertices which all can be connected to one another by paths in the graph. Now consider matrices whose graphs may be directed. A matrix is reducible if its graph is not connected, and irreducible otherwise. When a matrix is reducible, then it can be permuted by means of symmetric permutations into a block upper triangular matrix of the form

where each partition corresponds to a connected component. It is clear that linear systems with the above matrix can be solved through a sequence of subsystems with the matrices , .

..

. . .

'

 )1)1A6  0 

99 @8

1111

# F#

1      B

y i)1) }


3 

60

66

u n n u d d d $ 
 

00

344

T  D

)i1)1 # q

0 4

  B

In order to take advantage of the large number of zero elements, special schemes are required to store sparse matrices. The main goal is to represent only the nonzero elements, and to be able to perform the common matrix operations. In the following, denotes the total number of nonzero elements. Only the most popular schemes are covered here, but additional details can be found in books such as Duff, Erisman, and Reid [77]. The simplest storage scheme for sparse matrices is the so-called coordinate format. The data structure consists of three arrays: (1) a real array containing all the real (or complex) values of the nonzero elements of in any order; (2) an integer array containing their row indices; and (3) a second integer array containing their column indices. All three arrays are of length , the number of nonzero elements. The matrix

will be represented (for example) by AA JR JC 12. 9. 5 5 3 5 7. 3 3 5. 2 4 1. 1 1

2. 11. 3. 1 4 4 4 2 1

6. 3 1

4. 2 2

8. 10. 3 4 4 3

In the above example, the elements are listed in an arbitrary order. In fact, they are usually listed by row or columns. If the elements were listed by row, the array which contains redundant information might be replaced by an array which points to the beginning of each row instead. This would involve nonnegligible savings in storage. The new data structure has three arrays with the following functions:


An integer array contains the column indices of the elements the array . The length of is Nz.

An integer array contains the pointers to the beginning of each row in the arrays and . Thus, the content of is the position in arrays and where the -th row starts. The length of is with containing the number , i.e., the address in and of the beginning of a ctitious row number .

A real array The length of

contains the real values is .

stored row by row, from row 1 to . as stored in

C ( D~ B

99 8 7

u% A z z z z

Yz pp z p u p

Hz

DC ~ rB

    

p p p p

u d
 7 7 7 7

z z z 5 3444

) &% $ )  '  ( )   #

~ C PB ( (

X(


 

  

| U&


Thus, the above matrix may be stored as follows: AA JA IA 1. 1 1 2. 4 3 3. 1 6 4. 2 5. 4 6. 1 7. 3 8. 4 9. 10. 11. 12. 5 3 4 5

10 12 13

This format is probably the most popular for storing general sparse matrices. It is called the Compressed Sparse Row (CSR) format. This scheme is preferred over the coordinate scheme because it is often more useful for performing typical computations. On the other hand, the coordinate scheme is advantageous for its simplicity and its exibility. It is often used as an entry format in sparse matrix software packages. There are a number of variations for the Compressed Sparse Row format. The most obvious variation is storing the columns instead of the rows. The corresponding scheme is known as the Compressed Sparse Column (CSC) scheme. Another common variation exploits the fact that the diagonal elements of many matrices are all usually nonzero and/or that they are accessed more often than the rest of the elements. As a result, they can be stored separately. The Modied Sparse Row (MSR) format has only two arrays: a real array and an integer array . The rst positions in contain the diagonal elements of the matrix in order. The position of the array is not used, but may sometimes be used to carry other information concerning the matrix. Starting at position , the nonzero elements of , excluding its diagonal elements, are stored by row. For each element , the integer represents its column index on the matrix. The rst positions of contain the pointer to the beginning of each row in and . Thus, for the above example, the two arrays will be as follows: AA JA 1. 4. 8 7. 11. 12. * 2. 4 3. 1 5. 4 6. 1 8. 4 9. 10. 5 3

10 13 14 14

The star denotes an unused location. Notice that , indicating that the last row is a zero row, once the diagonal element has been removed. Diagonally structured matrices are matrices whose nonzero elements are located along a small number of diagonals. These diagonals can be stored in a rectangular array , where is the number of diagonals. The offsets of each of the diagonals with respect to the main diagonal must be known. These will be stored in an array . Thus, the element of the original matrix is located in position of the array , i.e.,

The order in which the diagonals are stored in the columns of is generally unimportant, though if several more operations are performed with the main diagonal, storing it in the rst column may be slightly advantageous. Note also that all the diagonals except the main diagonal have fewer than elements, so there are positions in that will not be used.

~ ~

C ~B C ~B

'

B

(   310 ) C rCwB

4 "

  210 ) (

C (

B

4 "

 

  

c~ h~

!    

!'&$%##   $ "

a`rB

For example, the following matrix which has three diagonals

will be represented by the two arrays * 3. 6. 9. 11 1. 4. 7. 10. 12. 2. 5. 8. * *

A more general scheme which is popular on vector machines is the so-called EllpackItpack format. The assumption in this scheme is that there are at most nonzero elements per row, where is small. Then two rectangular arrays of dimension each are required (one real and one integer). The rst, , is similar to and contains the nonzero elements of . The nonzero elements of each row of the matrix can be stored in a row of the array , completing the row by zeros as necessary. Together with , an integer array must be stored which contains the column positions of each entry in .

Thus, for the matrix of the previous example, the Ellpack-Itpack storage

scheme is 1. 3. 6. 9. 11 2. 4. 7. 10. 12. 0. 5. 8. 0. 0. 1 1 2 3 4 3 2 3 4 5 1 4 5 . 4 5

A certain column number must be chosen for each of the zero elements that must be added to pad the shorter rows of , i.e., rows 1, 4, and 5. In this example, those integers are selected to be equal to the row numbers, as can be seen in the array. This is somewhat arbitrary, and in fact, any integer between and would be acceptable. However, there may be good reasons for not inserting the same integers too often, e.g. a constant number, for performance considerations.

$ %"

$ %"

 

 !

$ %"

$ ## $ "

= -1 0 2 .

99 @8 7

ue p A z z p z z z p z z z
7

$ " ! ''&%"    $ ! %"       $

    

z z z p z
7 7 7

u d

p p p 5 Y 3444
7 7 7

%

$%"

T    T

$%"



The matrix-by-vector product is an important operation which is required in most of the iterative solution algorithms for solving sparse linear systems. This section shows how these can be implemented for a small subset of the storage schemes considered earlier. The following Fortran 90 segment shows the main loop of the matrix-by-vector operation for matrices stored in the Compressed Sparse Row stored format.

Notice that each iteration of the loop computes a different component of the resulting vector. This is advantageous because each of these components can be computed independently. If the matrix is stored by columns, then the following code could be used instead:
 )) " 

In each iteration of the loop, a multiple of the -th column is added to the result, which is assumed to have been initially set to zero. Notice now that the outer loop is no longer parallelizable. An alternative to improve parallelization is to try to split the vector operation in each inner loop. The inner loop has few operations, in general, so this is unlikely to be a sound approach. This comparison demonstrates that data structures may have to change to improve performance when dealing with high performance computers. Now consider the matrix-by-vector product in diagonal storage.
%  )) " $$%#" $  )) 

Here, each of the diagonals is multiplied by the vector and the result added to the vector . It is again assumed that the vector has been lled with zeros at the start of the loop. From the point of view of parallelization and/or vectorization, the above code is probably the better to use. On the other hand, it is not general enough. Solving a lower or upper triangular system is another important kernel in sparse matrix computations. The following segment of code shows a simple routine for solving a unit lower triangular system for the CSR storage format.

 # ) &

   & &

% %#" &!"  % $ $  # 

)   ( 

 "!%

 


" %%% ) &3&  " " %   ! )& &  )) " 

  #

$


 

( % "' ) )  )
%   &#  ! ) &

u d

&%#" $ $

 

&

   " %%% " %%

u p

" %%%

 

| c&

"

At each step, the inner product of the current solution with the -th row is computed and subtracted from . This gives the value of . The function computes the dot product of two arbitrary vectors and . The vector is the -th row of the matrix in sparse format and is the vector of the components of gathered into a short vector which is consistent with the column indices of the elements in the row .


Most direct methods for sparse linear systems perform an LU factorization of the original matrix and try to reduce cost by minimizing ll-ins, i.e., nonzero elements introduced during the elimination process in positions which were initially zeros. The data structures employed are rather complicated. The early codes relied heavily on linked lists which are convenient for inserting new nonzero elements. Linked-list data structures were dropped in favor of other more dynamic schemes that leave some initial elbow room in each row for the insertions, and then adjust the structure as more ll-ins are introduced. A typical sparse direct solution solver for positive denite matrices consists of four phases. First, preordering is applied to minimizing ll-in. Two popular methods are used: minimal degree ordering and nested-dissection ordering. Second, a symbolic factorization is performed. This means that the factorization is processed only symbolically, i.e., without numerical values. Third, the numerical factorization, in which the actual factors and are formed, is processed. Finally, the forward and backward triangular sweeps are executed for each different right-hand side. In a code where numerical pivoting is necessary, the symbolic phase cannot be separated from the numerical factorization.

For comparison purposes it is important to use a common set of test matrices that represent a wide spectrum of applications. There are two distinct ways of providing such data sets. The rst approach is to collect sparse matrices in a well-specied standard format from various applications. This approach is used in the Harwell-Boeing collection of test matrices. The second approach is to generate these matrices with a few sample programs such

 

%%  & )  3   "  " #%    ! ) 

 

 &  

# &

)  ! ( &%

w1 B

  

    

u d

  (   ) (  ! '" )  )

 

) %  

 C

wB

 "

!%

 

" %% &

"#

| U&

| U&

( )  (

"

as those provided in the SPARSKIT library [179]. The coming chapters will use examples from these two sources. In particular, ve test problems will be emphasized for their varying degrees of difculty. The SPARSKIT package can generate matrices arising from the discretization of the two- or three-dimensional Partial Differential Equations


on rectangular regions with general mixed-type boundary conditions. In the test problems, the regions are the square , or the cube ; the Dirichlet condition is always used on the boundary. Only the discretized matrix is of importance, since the right-hand side will be created articially. Therefore, the right-hand side, , is not relevant.

Physical domain and coefcients for Problem 1.

Problem 1: F2DA. In the rst test problem which will be labeled F2DA, the domain is two-dimensional, with

and


where the constant is equal to 10. The domain and coefcients for this problem are shown is Figure 3.11. If the number of points in each direction is 34, then there are


~ ~ t p p C x B C g B C B C x 7 C B Y C x B C x B

7 8 7

C )z B C C

7 #

V B V B V 9B 3 3 V V

 

C g B C g B

( "

 '(0  

"

"

&

" 

)z B

 "

 U    

T  D

C g B9

 

interior points in each direction and a matrix of size is obtained. In this test example, as well as the other ones described below, the right-hand side is generated as


Problem 2: F2DB. The second test problem is similar to the previous one but involves discontinuous coefcient functions and . Here, and the functions are also dened by (3.4). However, the functions and now both take the value 1,000 inside the subsquare of width centered at ( ), and one elsewhere in the domain, i.e.,

. In this case,

The constant is taken to be equal to 10.0 as before. The Harwell-Boeing collection is a large data set consisting of test matrices which have been contributed by researchers and engineers from many different disciplines. These have often been used for test purposes in the literature [78]. The collection provides a data structure which constitutes an excellent medium for exchanging matrices. The matrices are stored as ASCII les with a very specic format consisting of a four- or ve-line header. Then, the data containing the matrix is stored in CSC format together with any righthand sides, initial guesses, or exact solutions when available. The SPARSKIT library also provides routines for reading and generating matrices in this format. Only one matrix from the collection was selected for testing the algorithms described in the coming chapters. The matrices in the last two test examples are both irregularly structured. Problem 4: ORS The matrix selected from the Harwell-Boeing collection is ORSIRR1. This matrix arises from a reservoir engineering problem. Its size is and it has a total of 6,858 nonzero elements. The original problem is based on a irregular grid. In this case and the next one, the matrices are preprocessed by scaling their rows and columns.

 u  z ~

pp C x B C x B

7 8 7

and

Problem 3: F3D. The third test problem is three-dimensional with internal mesh points in each direction leading to a problem of size we take

~ ~ ~ ~  g   $  0 &  C x B C x B 6 6 6 0 0 0 ~ ~

C ygB C gB 9 C x B C g B C g B

&

)%

) 1

 "

"

 

"

in which random values.

. The initial guess is always taken to be a vector of pseudo-

Y7

~ 6 l ~ ~

    

u d

1i)1)1YB q

y p9

Problem 5: FID This test matrix is extracted from the well known uid ow simulation package FIDAP [84]. It is actually the test example number 36 from this package and features a two-dimensional Chemical Vapor Deposition in a Horizontal Reactor. The matrix has a size of and has nonzero elements. It has a symmetric pattern and few diagonally dominant rows or columns. The rows and columns are prescaled in the same way as in the previous example. Figure 3.12 shows the patterns of the matrices ORS and FID.

Patterns of the matrices ORS (left) and FID

(right).

1 Consider the mesh of a discretized PDE. In which situations is the graph representing this mesh the same as the adjacency graph of the matrix? Give examples from both Finite Difference and Finite Element discretizations.

3 Consider the matrix dened as

Show directly (without using Proposition 3.1 or interchange matrices) that the following three relations hold

G 5 F % T 4 f T f  G G  G

5 G  f

)%66 

2 Let

and

be two sparse (square) matrices of the same dimension. How can the graph of be characterized with respect to the graphs of and ?

 

7


) ")   

n u d   g ~
D


where a

represents an arbitrary nonzero element.

Show the adjacency graphs of the matrices , , , and . (Assume that there are no numerical cancellations in computing the products and ). Since there are zero diagonal elements, represent explicitly the cycles corresponding to the edges when they are present.

Consider the particular case in which . Give an interpretation of an edge in the graph of in terms of paths of length two in the graph of . The paths must take into account the cycles corresponding to nonzero diagonal elements of . Now consider the case where . Give an interpretation of an edge in the graph of in terms of paths of length three in the graph of . Generalize the result to arbitrary powers of .

Show the adjacency graph of

Consider the permutation . Show the adjacency graph and new pattern for the matrix obtained from a symmetric permutation of based on the permutation array .

6 Consider a matrix which has the pattern

Consider the permutation . Show the adjacency graph and new pattern for the matrix obtained from a symmetric permutation of based on the permutation array . Show the adjacency graph and new pattern for the matrix obtained from a reverse CuthillMcKee ordering of starting with the node 1. (Assume the vertices adjacent to a given vertex are always listed in increasing order in the data structure that describes the graph.)

2 ( &

   

Show the adjacency graph of

. (Place the 8 vertices on a circle.)

A 5  5 9 4 A 99 44 99 344444 999@8    ( 2  &  5 A  34444

A 5 99

5 Consider a

matrix which has the pattern

9@8 9

%

g  6

Consider the matrix of edges in the graph of

. Give an interpretation of an edge in the graph of and . Verify this answer using the above matrices.

f c

" " " 99 8 44v"


99 "

" 4" " 4" " " " 5 A 44v" " "" " " " 44v" " " "" 34444

" 99 8 v"
99

" " " " 4v4" 5 A " 4" " " " v"4" " " " 44""" 34444 " " " " 44v4"

4 Consider the two matrices

    
in terms

u d

 %

8 %66





7 Given a ve-point nite difference graph, show that the greedy algorithm will always nd a coloring of the graph with two colors. 8 Prove that the total number of colors found by the greedy multicoloring algorithm does not exceed , where is the maximum degree of all the vertices of a graph (not counting the cycles associated with diagonal elements). 9 Consider a graph that is bipartite, i.e., 2-colorable. Assume that the vertices of the graph are colored by a variant of Algorithm (3.4), in which the nodes are traversed in a certain order . Consider now a permutation satisfying the following property: for each at least one of the is adjacent to . Show that the algorithm will nd a 2-coloring of the nodes graph. Among the following traversals indicate which ones satisfy the property of the previous = any question: (1) Breadth-First Search, (2) random traversal, (3) traversal dened by node adjacent to .

10 Given a matrix that is irreducible and with a symmetric pattern, show that its structural inverse is dense. Structural inverse means the pattern of the inverse, regardless of the values, or otherwise stated, is the union of all patterns of the inverses for all possible values. [Hint: Use Cayley Hamiltons theorem and a well known result on powers of adjacency matrices mentioned at the end of Section 3.2.1.] 11 The most economical storage scheme in terms of memory usage is the following variation on the in a real array and the corresponding coordinate format: Store all nonzero values linear array address in an integer array . The order in which these corresponding entries are stored is unimportant as long as they are both in the same position in their respective arrays. What are the advantages and disadvantages of this data structure? Write a short routine for performing a matrix-by-vector product in this format. 12 Write a FORTRAN code segment to perform the matrix-by-vector product for matrices stored in Ellpack-Itpack format. 13 Write a small subroutine to perform the following operations on a sparse matrix in coordinate format, diagonal format, and CSR format:

Count the number of nonzero elements in the main diagonal; Extract the diagonal whose offset is ;

Add a given diagonal to the matrix. What is the most convenient storage scheme for each of these operations?

14 Linked lists is another popular scheme often used for storing sparse matrices. These allow to link together data items (e.g., elements of a given row) in a large linear array. A starting position is given in the array which contains the rst element of the set. Then, a link to the next element in the array is provided from a LINK array. Show how to implement this scheme. A linked list is to be used for each row. What are the main advantages and disadvantages of linked lists?

f P c

Add a nonzero element in position zero or a nonzero element);

of the matrix (this position may initially contain a

  IC     & &

C  5W5W5 H

Is it true that for any permutation

C 7

P   f #0 c  &

Find a multicolor ordering for 2, etc.).

(give the vertex labels color 1, followed by those for color

the number of colors found will be two?

H F C WW5 H 55 5

H FC

 

f c & 

n u d   g
 5 p5 5 5 H

 

Write an algorithm to perform a matrix-by-vector product in this format.

N OTES AND R EFERENCES . Two good references on sparse matrix computations are the book by George and Liu [104] and the more recent volume by Duff, Erisman, and Reid [77]. These are geared toward direct solution methods and the rst specializes in symmetric positive denite problems. Also of interest are [157] and [163] and the early survey by Duff [76]. Sparse matrix techniques have traditionally been associated with direct solution methods. Clearly, this is now changing rapidly since the sophistication of iterative solution packages is starting to equal that of direct solvers. The SPARSKIT library, a package for sparse matrix computations [179] is currently in its second version and is available through anonymous FTP ). Another available software package ( which emphasizes object-oriented design with the goal of hiding complex data structures from users is PETSc [19]. A manipulation package for sparse matrices, similar to SPARSKIT in spirit, is SMMS developed by Alvarado [6]. The idea of the greedy multicoloring algorithm is known in Finite Element techniques (to color elements); see, e.g., Benantar and Flaherty [23]. Wu [229] presents the greedy algorithm for multicoloring vertices and uses it for SOR type iterations, see also [182]. The effect of multicoloring has been extensively studied by Adams [2, 3] and Poole and Ortega [164]. Interesting results regarding multicoloring in the context of nite elements based on quad-tree structures have been obtained by Benantar and Flaherty [23] who show, in particular, that with this structure a maximum of six colors is required.

8 756 25)3 "))""1 )(("%#"  ' 4 2 0 1 0  0 $  $ '  & $ !     

    

u d




V}k

Equation (4.1) is a linear system, is the coefcient matrix, is the right-hand side vector, and is the vector of unknowns. Most of the methods covered in this chapter involve passing from one iterate to the next by modifying one or a few components of an approximate vector solution at a time. This is natural since there are simple criteria when modifying a component in order to improve an iterate. One example is to annihilate some component(s) of the residual vector . The convergence of these methods is rarely guaranteed for all matrices, but a large body of theory exists for the case where the coefcient matrix arises from the nite difference discretization of Elliptic Partial Differential Equations. This chapter begins by reviewing the basic iterative methods for solving linear systems. Given an real matrix and a real -vector , the problem considered is: Find belonging to such that

Rwi

t

R 7vv7ffARVVc~lC irBP pbqrXn n d8F rHrp tprHGFQ8rBaD`RXW6WFAVd8D98W6w9fRDqX rHpFP HBr9PyQ8%B98yRDPeQ8BryX rHpFePu H rx8ccQ8rBP b D P Y 8 U F X s T 6 P s pnnP f P Q8%B986is%B989TXQ8VvwiGbGfGDDe8eu9ubVd8F rH~8rPezWx986CseDE%BI8pF`9P%6%BI8RFP tseYr8tH%BduD8Y F BX p D b s u F n u p H aD`RX6WFAcpF9s8 RHgWth8ym98W6l6F 7geY98tsHrtUWXug9s98As%B98rTRiwpix %pd8pFQPrB9P%nd8D~eYd8DRb Y 8 U u 8 BX U F H 6 8 X x %pQ8rBhPQ8rBD98 rH96eud8ld8D98W6{cpiIXrFue98``9PY SHDQ8~98W6%fyIDpF%s98qrXtWXuWx8gP B P b s F Fs X B Tp b B F X s n U f Bf98cRutsrHpFP tprHt6rH7pF~eYAtUrH7D {esRD%nI8GFtsX SHRF%PoeP %pQ8B}Y8 p PeqesRDX SHRFP@ q`AgI8D986F X sX ssP P 8 P H D p u s u HYX U f%V%69uqw{eY9896%uePQ8fD me89u9sa8Eu%B98pTWXautpSHpFws%BQYttsHrPdtF%B8emVtsz9Y`ccwUrHlFRyWx8f X P|i B H s sb 8 BX u P H sP 8 F P P B nH PV U sq s pqkbP IXAy98DX d8qpFs rXc`wUsP qaHs X porHEXpF%BAtUnP `aHn mQXeoP %s%B98Tn `98lW6P 6wFF 9f7vVIDXH RupFws9srH98ss ts%X`V8unH kqjU kXiyShu 986 cFr fQx q%Hg `yWXY f`RXYaDd `WFW6 VAtU8 d8WD r986 erF`XHs rXF p u eA eYd8RD9Pgh89By%UI8GFRDGxCrBeP98wsH Vr8QurBP vRutsrHT qIXihXfgYe8dRDcaD`RXW6WFAV98T SHRFQP%BI8F AGFEDCA9875 8D D p p p DB b Y 8U H B@ 6

 2 0  431) ( ' &%$ "#"     !   

We begin with the decomposition

in which is the diagonal of , its strict lower part, and its strict upper part, as illustrated in Figure 4.1. It is always assumed that the diagonal entries of are all nonzero.

-F D -E

Initial partitioning of matrix A.

The Jacobi iteration determines the -th component of the next approximation so as to annihilate the -th component of the residual vector. In the following, denotes the -th component of the iterate and the -th component of the right-hand side . Thus, writing

or

This is a component-wise form of the Jacobi iteration. All components of the next iterate can be grouped into the vector . The above notation can be used to rewrite the Jacobi iteration (4.4) in vector form as

Similarly, the Gauss-Seidel iteration corrects the -th component of the current approximate solution, in the order , again to annihilate the -th component of the residual. However, this time the approximate solution is updated immediately after the new component is determined. The newly computed components , can be changed within a working vector which is redened at each relaxation step. Thus, since

r i

rQ(( y E (

( V(

P H GR VIF

W # w

sth i ph ( R S xG u F vR c rg PH w VI u q gf

vE

sth i h PH pF vR c rg QIG u u q gf

H X

' w % W ` #

rV(( y (

W eP"QUIHG R F dRR c

RS

W U "QH

AE W QUH

R dR y c

in which

represents the -th component of the vector , yields

r0 i ra i

P H GR QIF

' 1

 "!

( Y ` R

()& & ' %

5| |  5 | qz y C5  $ #
X W U "VH E

tA

R S

% 1

CA 9 76 5 3 DB@84 2

   H

rP"QUDHGpR F W

|y5{
E RX b # T

which leads to the iteration,

The dening equation (4.6) can be written as

which leads immediately to the vector form of the Gauss-Seidel iteration

Computing the new approximation in (4.5) requires multiplying by the inverse of the diagonal matrix . In (4.8) a triangular system must be solved with , the lower triangular part of . Thus, the new approximation in a Gauss-Seidel step can be determined either by solving a triangular system with the matrix or from the relation (4.7). A backward Gauss-Seidel iteration can also be dened as

which is equivalent to making the coordinate corrections in the order .A Symmetric Gauss-Seidel Iteration consists of a forward sweep followed by a backward sweep. The Jacobi and the Gauss-Seidel iterations are both of the form

in which

is a splitting of , with for Jacobi, for forward Gauss-Seidel, and for backward Gauss-Seidel. An iterative method of the form (4.10) can be dened for any splitting of the form (4.11) where is nonsingular. Overrelaxation is based on the splitting

and the corresponding Successive Over Relaxation (SOR) method is given by the recursion

The above iteration corresponds to the relaxation sequence

in which is dened by the expression in the right-hand side of (4.7). A backward SOR sweep can be dened analogously to the backward Gauss-Seidel sweep (4.9).

a i a i a i a i a!wi R%wi a0 wi
y ( ( y % $

( (

( W

PH ( Y $ QIG u F u R c

( ( ( Q(

# w

E "(

( w

( Y

# # X

PH w VIG u R S xpF u R c

&'# # X

% &

( w

H '

W "U

H $ %

R  u  u W W f eP"QUIHG R F dRR c ePQUIHG u F vuR c W f R S

E e( PQHDG R F # X

w ' #

  "

W U "VH

W "U

% ' W &

$ w ' #%

f w H  

QH W U

Ru   u W f rP"QUDHpuG F u R c W Rf

& "VUH ~% w # W

'

% #

0 (R w 1)F #

W U "QH

W "UQH

% #

W rP"VUIHG R F

W U "QH

#

( ( T R IR y c

the order is

, the result at the -th step is

 !p

s| | q  q k `tsjt V
#

'

PeW"QUHIGpR F

vE

##

01( R F

A Symmetric SOR (SSOR) step consists of the SOR step (4.12) followed by a backward SOR step,

This gives the recurrence

Block relaxation schemes are generalizations of the point relaxation schemes described above. They update a whole set of components at each time, typically a subvector of the solution vector, instead of only one component. The matrix and the right-hand side and solution vectors are partitioned as follows:

. . .

. . .

. . .

..

. . .

. . .

. . .

in which the partitionings of and into subvectors and are identical and compatible with the partitioning of . Thus, for any vector partitioned as in (4.15),

in which denotes the -th component of the vector according to the above partitioning. The diagonal blocks in are square and assumed nonsingular. Now dene, similarly to the scalar case, the splitting

rQwi

( R S W S S

F 5 I 5 H ) F E 'B A 8 @ 8 & 5 3 1 ) ' QP2GDC726976420(& %

c7

( R F W F F

% #

RF

( u F vR u

R S

'

#W

& ` % #

X ( `R X

u  X f

' #

XbX

X YW

R X t

# X #

TWVUR TT R TWVUR TT T TT R WVUSW

T TT T T WVaVWT

g #

can be rewritten as

% #

( W % # X # &'# # X

g T

g E

% #

! ! $#" 

# & # # X

WX QR W

W W

Observing that

wi rQawi

 W

% #

# & ( T # X

# w  X # # $ w % X # T T y # w % #$ X # y Tw ' # yW X # # XT X # T T

 w W X ' # % # # w % # W ' #

XT

# # T #

where

W U "QH # w H & # & # w '#

h h

 "! X # X #

(  w

5| |  5 | qz y C5 
H

y "QH W U $ ' w % #% W U "QH X # y $ % T w ' # % X # T

QH W U

  

j
#


T #

|y5{
 RX b w % #$ T

with

..

With these denitions, it is easy to generalize the previous three iterative procedures dened earlier, namely, Jacobi, Gauss-Seidel, and SOR. For example, the block Jacobi itare all replaced eration is now dened as a technique in which the new subvectors according to

which leads to the same equation as before,

except that the meanings of , , and have changed to their block analogues. With nite difference approximations of PDEs, it is standard to block the variables and the matrix by partitioning along whole lines of the mesh. For example, for the twodimensional mesh illustrated in Figure 2.5, this partitioning is

This corresponds to the mesh 2.5 of Chapter 2, whose associated matrix pattern is shown in Figure 2.6. A relaxation can also be dened along the vertical instead of the horizontal lines. Techniques of this type are often known as line relaxation techniques. In addition, a block can also correspond to the unknowns associated with a few congrid. secutive lines in the plane. One such blocking is illustrated in Figure 4.2 for a The corresponding matrix with its block structure is shown in Figure 4.3. An important difference between this partitioning and the one corresponding to the single-line partitioning is that now the matrices are block-tridiagonal instead of tridiagonal. As a result, solving linear systems with may be much more expensive. On the other hand, the number of iterations required to achieve convergence often decreases rapidly as the block-size increases.

( ( "V(

R R R

W R

H ( W # w

vE

( R S W I R R

R (

' % # W "UQH ' w % W ` #

w RX

' w %

R SW ( W

rT T

or,

P H GR VIF

R S w RX

' w %

'

. . .

. . .

..

..

. . .

a!wi
X X W

C
TVWT T TT VWT W

X bX

rT T

 !p W W

W rP"QUIHG R F IRR

VWT TT

s| | q  q k `tsjt V
R dR

W Rd R R IR W W

W rP"QUIHG R F

WX W

31

32

33

34

25

26

27

28

19

20

21

22

13

14

15

16

10

domains.

Matrix associated with the mesh of Figure 4.2.

Finally, block techniques can be dened in more general terms. First, by using blocks that allow us to update arbitrary groups of components, and second, by allowing the blocks to overlap. Since this is a form of the domain-decomposition method which will be seen later, we dene the approach carefully. So far, our partition has been based on an actual set-partition of the variable set into subsets , with the condition that two distinct subsets are disjoint. In set theory, this is called a partition of . More generally, a set-decomposition of removes the constraint of disjointness. In other words it is required that the union of the subsets s be equal to :

In the following,

denotes the size of

and the subset

is of the form,

( ( (

y R R $( R #( ! ! ! " X X X R T R T R T X W 

 R

( (  Q (

Partitioning of a

square mesh into three sub-

 "!
35 36 29 30 23 24 17 18 11 12 5 6

5| |  5| qz A Cy5 
 R

( R 

  

A 9 7 6 3 B@854 2

A 9 7 6 3 B@854 2

 |5
R

A general block Jacobi iteration can be dened as follows. Let

be the

matrix

and

and dene similarly the partitioned vectors


R 

The -dimensional vector represents the projection of with respect to the basis spanned by the columns of . The action of performs the reverse operation. That means is an extension operation from a vector in (represented in the basis consisting of the columns of ) into a vector in . The operator is termed a restriction operator and is an prolongation operator. Each component of the Jacobi iteration can be obtained by imposing the condition that the projection of the residual in the span of be zero, i.e.,
R

This leads to the following algorithm:

H V A

H W U QH RX`R w 3 R XR YdR y R ( ( Q ( y E ( ( V( Y W

1. For 2. For 3. Solve 4. Set 5. EndDo 6. EndDo

until convergence Do: Do:

a!wi

H V A

3 CA 6 Q S @8 5 S Q I D E @ PD 7V9UT RPH77 9FD B C@A974( 4DB9 G E B 8 5 6 5

Remember that

, which can be rewritten as


R 

Note that is a projector from to the subspace ..., . In addition, we have the relation

spanned by the columns

( (  Q(

R !

When there is no overlap, i.e., when the then dene . Let be the matrix

s form a partition of the whole set

P G u s 

where each is the -th column of the weight factor chosen so that


identity matrix, and

represents a

C C R

P G P G P G P G PW eDG PWG ( 8& s s s s ( s s ( s s $ (

R R

" !

R 

P G P G PW eDG & s s ( s ( s % R $

u  u

R  R  

u 

R FR

R  u

bR

R S  (

R R (

R 

W I R R

W U "QH

 R

 

 !p R

P H GR F P W U H GR w QIp` r"VIpF

uR

R  R

RF

s| | q  q k `tsjt V

R

u y R



yR

 

P uG s

uF

X T R R

bR

1 2

~h v

u vR

) 0 (&#$ ' %

As was the case with the scalar algorithms, there is only a slight difference between the Jacobi and Gauss-Seidel iterations. Gauss-Seidel immediately updates the component to be corrected at step , and uses the updated approximate solution to compute the residual vector needed to correct the next component. However, the Jacobi iteration uses the same previous approximation for this purpose. Therefore, the block Gauss-Seidel iteration can be dened algorithmically as follows:

From the point of view of storage, Gauss-Seidel is more economical because the new approximation can be overwritten over the same vector. Also, it typically converges faster. On the other hand, the Jacobi iteration has some appeal on parallel computers since the second Do loop, corresponding to the different blocks, can be executed in parallel. Although the point Jacobi algorithm by itself is rarely a successful technique for real-life problems, its block Jacobi variant, when using large enough overlapping blocks, can be quite attractive especially in a parallel computing environment.

The Jacobi and Gauss-Seidel iterations are of the form

for the Jacobi and Gauss-Seidel iterations, respectively. Moreover, given the matrix splitting

where is associated with the linear system (4.1), a linear xed-point iteration can be dened by the recurrence

r 0 i r wi

r0 0 i

pq0 i

( W

W & #  % W T&  X #

 w

( 

W

   X T T

in which

r wi

 E B ' 0A ' ) 63 C8 5 ) B 3 A 8 DC7GP5 A B E B B E 5  E F I E 'B A 8 3

( w 

W U "VH

V A

W U "QH

0 (

1. Until convergence Do: Do: 2. For 3. Solve 4. Set 5. EndDo 6. EndDo


R

6 Q S @ 8 5 S B 5 Q 5 PD 7V9UT 4C7P @

 "!

5| |  5| qz A Cy5  R

3 A G4E D  CA479( RB9 B B @8 5 6 5

  

R XR w 3 R X dR y R ( (  (

j
H

@ E

 |5
 ! #" 
1 2

vE

~Q v

' 7$ % #

For example, for the Jacobi iteration, , while for the Gauss-Seidel iteration, . The iteration can be viewed as a technique for solving the system

The above system which has the same solution as the original system is called a preconditioned system and is the preconditioning matrix or preconditioner. In other words, a relaxation scheme is equivalent to a xed-point iteration on a preconditioned system. For example, for the Jacobi, Gauss-Seidel, SOR, and SSOR iterations, these preconditioning matrices are, respectively,

Thus, the Jacobi preconditioner is simply the diagonal of , while the Gauss-Seidel preconditioner is the lower triangular part of . The constant coefcients in front of the matrices and only have the effect of scaling the equations of the preconditioned system uniformly. Therefore, they are unimportant in the preconditioning context. Note that the preconditioned system may be a full system. Indeed, there is no reason should be a sparse matrix (even though may be sparse), since the inverse why of a sparse matrix is not necessarily sparse. This limits the number of techniques that can be applied to solve the preconditioned system. Most of the iterative techniques used only require matrix-by-vector products. In this case, to compute for a given vector , rst compute and then solve the system :

The matrix may be sparser than and the matrix-by-vector product may be less expensive than the product . A number of similar but somewhat more complex ideas have been exploited in the context of preconditioned iterative methods. A few of these will be examined in Chapter 9.



In some cases, it may be advantageous to exploit the splitting as by the procedure

and compute

a 0 i a 0 i a0 i 0 i

' #

"

# W #

 

  3 ( W  ( 

W  (

% #

0 0 # # X # y  T 0 ( T% # # y#  X 0 ( ( %& T $ #  ( $   # 

V W

W

0 0

Since

has the form

, this system can be rewritten as

aa0 i

# ( #  W  

'

H T  W UQH w   ( % ` #

W  

which has the form (4.18) with

 !p

s| | q  q k `tsjt V


W

All the methods seen in the previous section dene a sequence of iterates of the form

in which is a certain iteration matrix. The questions addressed in this section are: (a) if the iteration converges, then is the limit indeed a solution of the original system? (b) under which conditions does the iteration converge? (c) when the iteration does converge, how fast is it? If the above iteration converges, its limit satises

In the case where the above iteration arises from the splitting , it is easy to see that the solution to the above system is identical to that of the original system . Indeed, in this case the sequence (4.28) has the form

and its limit satises

or

. This answers question (a). Next, we focus on the other two questions.

Standard results seen in Chapter 1 imply that if the spectral radius of the iteration matrix is less than unity, then converges to zero and the iteration (4.28) converges toward the solution dened by (4.29). Conversely, the relation

shows that if the iteration converges for any and then converges to zero for any vector . As a result, must be less than unity and the following theorem is proved:

Let be a square matrix such that . Then is nonsingular and the iteration (4.28) converges for any and . Conversely, if the iteration (4.28) converges for for any and , then .
Since it is expensive to compute the spectral radius of a matrix, sufcient conditions that guarantee convergence can be useful in practice. One such sufcient condition could be obtained by utilizing the inequality, , for any matrix norm.

r a i

 t

 X

If is nonsingular then there is a solution from (4.28) yields

to the equation (4.29). Subtracting (4.29)

r 0 i

r 0 i

A &

F5 63

W U QH 4

 H

5 ) E 22P5  P0' ) & 8 PP5  3 5 E 3 5 E

 "!

( w 

 w

( w

X   T VTTWT X W H H H W"QUH T H VVT TT W X H 4V "QUH

5| |  5| qz A Cy5  
w  H

 X 

W

X

W U "VH

W U "QH

  

 

! $6  

@ 

cztc l  |5 9 C 

CA DB9

'

cv ~

. Then

Let be a square matrix such that for some matrix norm is nonsingular and the iteration (4.28) converges for any initial vector .

Apart from knowing that the sequence (4.28) converges, it is also desirable to know how fast it converges. The error at step satises
W

The matrix can be expressed in the Jordan canonical form as . Assume for simplicity that there is only one eigenvalue of of largest modulus and call it . Then

A careful look at the powers of the matrix shows that all its blocks, except the block associated with the eigenvalue , converge to zero as tends to innity. Let this Jordan block be of size and of the form
W

The above denition depends on the initial vector , so it may be termed a specic convergence factor. A general convergence factor can also be dened by

It follows from the above analysis that . The convergence rate logarithm of the inverse of the convergence factor

H W %

H W

9 7  X P W H V 6 @GH U 2  H  Q$ISTR " 56F4E D  

  X @8H 9 7  T 642 553  H    

C2  5B

where

is some constant. The convergence factor of a sequence is the limit

W U ) 0 "X H 01 '

  

Thus, the norm of

has the asymptotical form

W X %

W U "X H (H  '

If is large enough, then for any i.e.,


W

the dominant term in the above sum is the last term,

is the (natural)

% &R

 $E

 R

 R " #H ! H % W w  H  H % w  X X W fX

H

where

is nilpotent of index , i.e.,

. Then, for

  

 W

( % w    

H  H 4

 

Y$ X }

H (

H H 

 CA DB9  v  | Cv At | |

# # '

' 0

Thus, the global asymptotic convergence factor is equal to the spectral radius of the iteration matrix . The general convergence rate differs from the specic rate only when the initial error does not have any components in the invariant subspace associated with the dominant eigenvalue. Since it is hard to know this information in advance, the general convergence factor is more useful in practice.

Consider the simple example of Richardsons Iteration,

Then, the eigenvalues

of

are such that

In particular, if and , at least one eigenvalue is , and so for any . In this case the method will always diverge for some initial guess. Let us assume that all eigenvalues are positive, i.e., . Then, the following conditions must be satised in order for the method to converge:

This function of is depicted in Figure 4.4. As the curve shows, the best possible is reached at the point where the curve with positive slope crosses the curve with negative slope, i.e., when

The next question is: What is the best value for the parameter , i.e., the value of which minimizes ? The spectral radius of is

 R ! 

The rst condition implies that , while the second requires that other words, the method converges for any scalar which satises

. In

Thus, the iteration matrix is Assume that the eigenvalues

and the convergence factor is , are all real and such that,

X  r0 a i

 0 R  

Y  R  y y R    R   R   R   R R r( ( y veE(   W R    "QUH

 R

 w

0v(0 R   X " $ #   R  

0  R !

y   R R    y

 R ! w

0 PH 6

 

T 

R 

0  R

where

is a nonnegative scalar. This iteration can be rewritten as

pqa i

H W %

This factor satises

 "! H
.

 H W  H  2  5654E  9 7H T  X $VS " @54568E W U 2  H  P H 6 D  

tA

5| |  5| qz A Cy5 
T

 w

@8H 9 7

W U "QH

  

 |5

CA 9  DB7 

1 0

This gives

Replacing this in one of the two curves gives the corresponding optimal spectral radius
w R   R  R  R

This expression shows the difculty with the presence of small and large eigenvalues. The convergence rate can be extremely small for realistic problems. In addition, to achieve good convergence, eigenvalue estimates are required in order to obtain the optimal or a near-optimal , and this may cause difculties. Finally, since can be very large, the curve can be extremely sensitive near the optimal value of . These observations are common to many iterative methods that depend on an acceleration parameter.

With a regular splitting, we associate the iteration

The question asked is: Under which conditions does such an iteration converge? The following result, which generalizes Theorem 1.15, gives the answer.

and only if

W

Let be a regular splitting of a matrix . Then is nonsingular and is nonnegative.

a i

 w

W

W U "VH

(

 (

pair of matrices nonnegative.

and

Let be three given matrices satisfying is a regular splitting of , if is nonsingular and

aaa i

 R 

F  B 6E 0A A B &  F C9& 3 8

  w R R

 63 5

$X" 

$X" 

The curve

as a function of .

y 0  R 1 0

. The are

 C

0  R  1 0  X s W 
X T y X W $ "  
(

6  

9A 9 6 5 vB7 4 32

| Cv At | |
CA DB9

A 9
' )$ 2) 2 )

1 P

Av ~

'

if

it follows that is nonsingular. The assumptions of Theorem 1.15 are satised for the matrix since is nonnegative and . Therefore, is nonnegative as is . To prove the sufcient condition, assume that is nonsingular and that its inverse is nonnegative. Since and are nonsingular, the relation (4.35) shows again that is nonsingular and in addition,

Clearly, is nonnegative by the assumptions, and as a result of the PerronFrobenius theorem, there is a nonnegative eigenvector associated with which is an eigenvalue, such that

We begin with a few standard denitions.

A matrix

is

(weakly) diagonally dominant if

 (

This theorem establishes that the iteration (4.34) always converges, if lar splitting and is an M-matrix.

is a regu-

F 5 ) 3 A 2DB 078 I 6C8 E B ' A E I

( (

hs i h Is (0 uR c0 g q  0 u u c0 f  R

&

& 8 ' 8  E  B

and this can be true only when which implies that .

. Since

is nonsingular, then

Y  X

XT

X 

 

 X

 6

 

A B9

W

Since

and

are nonnegative, this shows that

 X

XT

From this and by virtue of (4.36), it follows that

r a i r a i
W

W X W W W  X    W  W  X


X


 T  T

 W

  W  X W

 W W 1

W

T 

W

Dene

. From the fact that

 "!

 X

5| |  5| qz A Cy5 
T

  

@


 |5

' )$ 2) 2 )

A 6

, and the relation

strictly diagonally dominant if

irreducibly diagonally dominant if

is irreducible, and

with strict inequality for at least one .




Often the term diagonally dominant is used instead of weakly diagonally dominant. Diagonal dominance is related to an important result in Numerical Linear Algebra known as Gershgorins theorem. This theorem allows rough locations for all the eigenvalues of to be determined. In some situations, it is desirable to determine these locations in the complex plane by directly exploiting some knowledge of the entries of the matrix . The simplest such result is the bound

for any matrix norm. Gershgorins theorem provides a more precise localization result.

(Gershgorin) Any eigenvalue of a matrix is located in one of the closed discs of the complex plane centered at and having the radius

In other words, such that

Let be an eigenvector associated with an eigenvalue , and let be the index of the component of largest modulus in . Scale so that , and , for . Since is an eigenvector, then

which gives

This completes the proof.

a a i

a a i

 0 R F0

u v0 vR c 0

A0

W  Ru  u

fu

( V(

( V(

W W

0 u

F0

( uF u c

 0 IRR

c0

i ph

} } qg fg 0
h
 
h

i ph

0 vR c 0 u

qrg gf

   0 R 0

 0 u F 500

hs i h pds (0 u vR c 0 g q  0 u u c 0 f  R

hs i h pds ( 0 vR c 0 g q  0 u u c 0 u f

R IR c

sth

i ph rqgg fu R 

u c0

R

i h

rqg fg

0

| Cv At | |
A 9

1 P

Av ~

'

 

Since the result also holds for the transpose of , a version of the theorem can also be formulated based on column sums instead of row sums. The discs dened in the theorem are called Gershgorin discs. The theorem states that the union of these discs contains the spectrum of . It can also be shown that if there are Gershgorin discs whose union is disjoint from all other discs, then contains exactly eigenvalues (counted with their multiplicities). For example, when one disc is disjoint from the others, then it must contain exactly one eigenvalue. An additional renement which has important consequences concerns the particular case when is irreducible.

Let be an irreducible matrix, and assume that an eigenvalue of lies on the boundary of the union of the Gershgorin discs. Then lies on the boundary of all Gershgorin discs.
As in the proof of Gershgorins theorem, let be an eigenvector associated with , with , and , for . Start from equation (4.38) in the proof of Gershgorins theorem which states that the point belongs to the -th disc. In addition, belongs to the boundary of the union of all the discs. As a result, it cannot be an interior point to the disc . This implies that . Therefore, the inequalities in (4.38) both become equalities:

Let be any integer . Since is irreducible, its graph is connected and, therefore, there exists a path from node to node in the adjacency graph. Let this path be

By denition of an edge in the adjacency graph, . Because of the equality in (4.39), it is necessary that for any nonzero . Therefore, must be equal to one. Now repeating the argument with replaced by shows that the following equality holds:

The argument can be continued showing each time that

and this is valid for . In the end, it will be proved that belongs to the boundary of the -th disc for an arbitrary . An immediate corollary of the Gershgorin theorem and the above theorem follows.

If a matrix is strictly diagonally dominant or irreducibly diagonally dominant, then it is nonsingular.

r a i

r i

p i

0 i F0

q0 u  i

q0 u

c0

c0

t

 "!

i h

i h  rqg f g q0 u F F00 u i

u F i c

0 c

(s

i ph rqg fg q0 u F 500 u

5| |  5| qz A Cy5 
H

$( $( !( !

q0 s

s c

c0

c0

 y h i ph qg fg q0 c 0

i h

i ph rqg f g q0 i

#( ! !

   y


k0 u F 0

W ( (

 0 R F0

j


i c

(  

 |5 0
T
A B9

9A vB9

c0

~{

F0

cv ~

# # '

A 6

'

C C
'0

! !

If a matrix is strictly diagonally dominant, then the union of the Gershgorin disks excludes the origin, so cannot be an eigenvalue. Assume now that it is only irreducibly diagonal dominant. Then if it is singular, the zero eigenvalue lies on the boundary of the union of the Gershgorin disks. In this situation, according to the previous theorem, this eigenvalue should lie on the boundary of all the disks. This would mean that

which contradicts the assumption of irreducible diagonal dominance. The following theorem can now be stated.

If is a strictly diagonally dominant or an irreducibly diagonally dominant matrix, then the associated Jacobi and Gauss-Seidel iterations converge for any .
We rst prove the results for strictly diagonally dominant matrices. Let be the dominant eigenvalue of the iteration matrix for Jacobi and for Gauss-Seidel. As in the proof of Gershgorins theorem, let be an eigenvector associated with , with , and , for . Start from equation (4.38) in the proof of Gershgorins theorem which states that for ,

-th row of the equation

which yields the inequality


u

In the case when the matrix is only irreducibly diagonally dominant, the above proofs only show that , where is the iteration matrix for either Jacobi or . Gauss-Seidel. A proof by contradiction will be used to show that in fact Assume that is an eigenvalue of with . Then the matrix would be singular and, as a result, would also be singular. Since , it is clear that is also an irreducibly diagonally dominant matrix. This would contradict Corollary 4.2.

 y

C 0 0

W W 

The last term in the above equation has the form and . Therefore,

with

X &## ' %

( (  0 0 u u c 0 u c0

( u F u c

0 cT 0

70

w F c  uFu c

0 

0 0 u F 50 u c 0 u 0 u F F0 u c 0 0

This proves the result for Jacobis method. For the Gauss-Seidel iteration, write the in the form

all nonnegative

y  E

'

% W #

( ( V(

 0 R F0

 0

h i 0 h c0 q rg gf u c0

for

C C C

0  0 u F0 0

W

F0

0 uR c0 g q f

W

c0 u c0

i h Is q0 u u c 0

hs

i h

 0 0

qg fg

0  0

0 c0

W

| Cv At | |
'W Y 

 0 0

A 9 Av ~

%`

W z

1 P

A A

'


6 6

A number of properties which are related to the graph of a nite difference matrix are now dened. The rst of these properties is called Property A. A matrix has Property A if its graph is bipartite. This means that the graph is two-colorable in the sense dened in Chapter 3: Its vertices can be partitioned in two sets in such a way that no two vertices in the same set are connected by an edge. Note that, as usual, the self-connecting edges which correspond to the diagonal elements are ignored.

A matrix has Property A if the vertices of its adjacency graph can be partitioned in two sets and , so that any edge in the graph links a vertex of to a vertex of .
In other words, nodes from the rst set are connected only to nodes from the second set and vice versa. This denition is illustrated in Figure 4.5.

Graph illustration of Property A.

An alternative denition is that a matrix has Property A if it can be permuted into a matrix with the following structure:

r0 i

If is symmetric with positive diagonal elements and for SOR converges for any if and only if is positive denite.

F  3 6E B P5 46P0A F 7F ' ) C8 8 3 ' A E 5 B E E

 # ' 1

% W1 #

( Y

It is possible to show that when for any in the open interval true under certain assumptions.

F 5 ) B 078 I 0A B E G5 0 CA 7F '# ) B 3 A 5 Q0 F 3 A 5 B 5 B B I I A 3 5 ' 7P6 #3 

is Symmetric Positive Denite, then SOR will converge and for any initial guess . In fact, the reverse is also

 "!
,

5| |  5| qz A Cy5 

  

j
 
W

A 9 6 3 B@8754 2

 6  

 |5
6  
A B9

A B9

' )$ 2) 2 )

cv ~

'

C C

where and are diagonal matrices. This structure can be obtained by rst labeling all the unknowns in from 1 to , in which and the rest from to . Note that the Jacobi iteration matrix will have the same structure except that the blocks will be replaced by zero blocks. These Jacobi iteration matrices satisfy an important property stated in the following proposition.

The rst property is shown by simply observing that if is an eigenvector associated with , then is an eigenvector of associated with the eigenvalue . Consider the second property. For any , the matrix is similar to , i.e., with dened by

This proves the desired result

A denition which generalizes this important property is consistently ordered matrices. Varga [213] calls a consistently ordered matrix one for which the eigenvalues of are independent of . Another denition given by Young [232] considers a specic class of matrices which generalize this property. We will use this denition here. Unlike Property A, the consistent ordering property depends on the initial ordering of the unknowns.

A matrix is said to be consistently ordered if the vertices of its adjawith the property that any two cency graph can be partitioned in sets , , , adjacent vertices and in the graph belong to two consecutive partitions and , with , if , and , if .
It is easy to show that consistently ordered matrices satisfy property A: the rst color is made up of all the partitions with odd and the second with the partitions with even .

X   T

X 

 H

 

R

X 

 

dened for

are independent of .

y w

 

If is an eigenvalue of , then so is The eigenvalues of the matrix

E  

X 

w W

  R 

Y ` 

9A vB9

E

' )$ 2) 2 )

W @

v w

and let

and

be the lower and upper triangular parts of

, respectively. Then

aa i

"

Let

be a matrix with the following structure:

C C

W # r( y #

0 0

| Cv At | |
CA DB9

) ' 2 )

Q v A

'

'


6
W

Block tridiagonal matrices of the form

whose diagonal blocks are diagonal matrices are called -matrices. Clearly, such matrices are consistently ordered. Note that matrices of the form (4.42) are a particular case with . Consider now a general, consistently ordered matrix. By denition, there is permutaof which is the union of disjoint subsets

with the property that if and belongs to , then belongs to depending on whether or . This permutation can be used to permute symmetrically. If is the permutation matrix associated with the permutation , then clearly

is a -matrix. Not every matrix that can be symmetrically permuted into a -matrix is consistently ordered. The important property here is that the partition preserves the order of the indices of nonzero elements. In terms of the adjacency graph, there is a partition of the graph with the property that an oriented edge from to always points to a set with a larger index if , or a smaller index otherwise. In particular, a very important consequence is that edges corresponding to the lower triangular part will remain so in the permuted matrix. The same is true for the upper triangular part. Indeed, if a nonzero element in the permuted matrix is with , then by , or . Because denition of the permutation of the order preservation, it is necessary that . A similar observation holds for the upper triangular part. Therefore, this results in the following proposition.

With the above property it can be shown that for consistently ordered matrices the eigenvalues of as dened in Proposition 4.1 are also invariant with respect to .

ordered matrix

Let be the Jacobi iteration matrix associated with a consistently , and let and be the lower and upper triangular parts of , respec-

A B9

X 

' )$ )

h v A

in which

represents the (strict) lower part of

and

the (strict) upper part of

r i

    X  

tion matrix

such that

If a matrix is consistently ordered, then there exists a permutais a -matrix and




r i

X eX

T

 W E W E E  eXX P uT G i   T P R G i c X  u T  R c  X T  

 V( E



 W

()    X  T

 Y (

 E

u vR c

 

( (  Q (

tion

X X W # X 

W X X

..

..

..

 "!

5| |  5| qz A Cy5 
R # W # R

   #

j
R #


A B9

 |5

' )$ )

A 9  B7 

h v A

 Y( E

'

'

'

'

tC 9 C

tively. Then the eigenvalues of the matrix

From the previous proposition, the lower part of is precisely . Similarly, the upper part is , the lower and upper parts of the associated -matrix. Therefore, we only need to show that the property is true for a -matrix. is similar to . This means that In this case, for any , the matrix with being equal to

where the partitioning is associated with the subsets

Note that -matrices and matrices with the structure (4.42) are two particular cases of matrices which fulll the assumptions of the above proposition. There are a number of well known properties related to Property A and consistent orderings. For example, it is possible to show that,

Consistently ordered matrices satisfy an important property which relates the eigenvalues of the corresponding SOR iteration matrices to those of the Jacobi iteration matrices. The main theorem regarding the theory for SOR is a consequence of the following result proved by Young [232]. Remember that

is an eigenvalue of the Jacobi iteration matrix . Conversely, if is an eigenvalue of the Jacobi matrix and if a scalar satises (4.46), then is an eigenvalue of .

a i

 ! # X

# w

, and let , any scalar

Let

be a consistently ordered matrix such that for . Then if is a nonzero eigenvalue of the SOR iteration matrix such that

Property A is invariant under symmetric permutations. A matrix has Property A if and only if there is a permutation matrix is consistently ordered.

R dR c

  # X

..

respectively.

such that

X 

 v

w 'W # # W X % W # # y W # w ' # X % # XT X # T T

( ( W X 

y

 

X 

 

X 

 # T

First transform by the previous proposition

into a -matrix using the permutation

X 

dened for

do not depend on .

in (4.44) provided

C C

y w

X 

| Cv At | |
0

 W

Y $ 

A  9

1 P

W(V ( y Av ~

W @

'

This theorem allows us to compute an optimal value for , which can be shown to be equal to

A typical SOR procedure starts with some , for example, , then proceeds with a number of SOR steps with this . The convergence rate for the resulting iterates is estimated providing an estimate for using Theorem 4.7. A better is then obtained from the formula (4.47), and the iteration restarted. Further renements of the optimal are calculated and retrotted in this manner as the algorithm progresses.

The Alternating Direction Implicit (ADI) method was introduced in the mid-1950s by Peaceman and Rachford [162] specically for solving equations arising from nite difference discretizations of elliptic and parabolic Partial Differential Equations. Consider a partial differential equation of elliptic type

on a rectangular domain with Dirichlet boundary conditions. The equations are discretized with centered nite differences using points in the direction and points in

r i

r i

W w

( "  X b

 b (  X b

(  X b

 Qm gmc 

T 

y # y w

which means that ordered, the eigenvalues of same as those of follows immediately.

is an eigenvalue of . Since is consistently which are equal to are the , where is the Jacobi iteration matrix. The proof

( Y $

Since

, this can be rewritten as

Y `

X rX

w #

y

$X" #

w W X w y T #  # w X T # # w 

# w

( " X b

T #

T eT

or

Y$  # X eX X

T T

which is equivalent to

Y$   # X X

and the Jacobi iteration matrix is merely

. Writing that

 X X #

' W T#
#


T

 

( " c X b T 

%W #

Denote

by

and

by

, so that

 "!
is an eigenvalue yields

5| |  5| qz A Cy5 

  

 |5
$ # Y

A 6

C C

the

direction, This results in the system of equations

respectively. In what follows, the same notation is used to represent the discretized version of the unknown function . The ADI algorithm consists of iterating by solving (4.49) in the and directions alternatively as follows.

Here, , is a sequence of positive acceleration parameters. The specic case where is chosen to be a constant deserves particular attention. In this case, we can formulate the above iteration in the usual form of (4.28) with

Note that (4.51) can be rewritten in a simpler form; see Exercise 5. The ADI algorithm is often formulated for solving the time-dependent Partial Differential Equation

on the domain conditions are:

. The initial and boundary

where is the boundary of the unit square . The equations are discretized with respect to the space variables and as before, resulting in a system of Ordinary Differential Equations:

aa i

a i i

a i

 X

( Y

 b

( w  b  ( ( & (   (  b  ( ( ( b  ' Y ( b ( # $ b T  X X ( ( ( X X  b $" %T XX  r( $ b "%$ T ( $ $ T y T T y & (Y $ ( ( ! ( ( X Y X Y "& Y $  X  ( b  b Tw  T 

( " X b

T 

y

(  X

T  ( c X b T 

w X 

y

Y 

or, when

, in the form (4.22) with

a0 i R` i a i

( 

W

X  T w

U QH i H X

 T

 H  H

 W  X W 

w T

1. For 2. Solve: 3. Solve: 4. EndDo

until convergence Do:

 

and

 ( b b" ( X

T 

 CD 7@ 8 E

{

6 @

(  c X b T 

W U "VH  H  UQH X H w   i T X Vw ( y I( Y ( T

5E@ 3 A 4H5 RB9

in which the matrices to the operators

and

represent the three-point central difference approximations

a i

C  C (

 !

5|    5 | yq t 5 | v tC 5r

( (

1 2

~h v

( H 

) 0 (&#$ ' %

in which the matrices and have been dened earlier. The Alternating Direction Implicit algorithm advances the relation (4.56) forward in time alternately in the and directions as follows:

Horizontal ordering
19 20 21 22 23 24 4 8

13

14

15

16

10

The horizontal and vertical orderings for the unknowns in ADI. Assuming that the mesh-points are ordered by lines in the -direction, then the rst step of Algorithm 4.3 constitutes a set of independent tridiagonal linear systems of size each. However, the second step constitutes a large tridiagonal system whose three diagonals are offset by , , and , respectively. This second system can also be rewritten as a set of independent tridiagonal systems of size each by reordering the grid points by lines, this time in the direction. The natural (horizontal) and vertical orderings are illustrated in Figure 4.6. Whenever moving from one half step of ADI to the next, we must implicitly work with the transpose of the matrix representing the solution on the grid points. This data operation may be an expensive task on parallel machines and often it is cited as one of the drawbacks of Alternating Direction Methods in this case. ADI methods were extensively studied in the 1950s and 1960s for the particular case of positive denite systems. For such systems, and have real eigenvalues and the following is a summary of the main results in this situation. First, when and are Symmetric Positive Denite, then the stationary iteration ( , for all ) converges. For the model problem, the asymptotic rate of convergence of the stationary ADI iteration using the optimal is the same as that of SSOR using the optimal . However, each ADI step is more expensive than one SSOR step. One of the more important results in the ADI theory is that the rate of convergence of ADI can be increased appreciably by using a cyclic sequence of parameters, . A theory for selecting the best sequence of s is well understood in the case when and commute [26]. For the model problem, the parameters can be selected so that the time complexity is reduced to , for details see [162].

Y 

The acceleration parameters

of Algorithm 4.3 are replaced by a natural time-step. Vertical ordering


12 16 20 24

17

18

11

15

19

23

11

12

10

14

18

22

13

17

21

i

U QH

 "!

5| |  5| qz A Cy5 
y w  w  T T

i

W U "QH

U QH

  

j

!

A 9 76 3 B@854 2

 |5
Y

C C

where

is a real parameter.

Verify that the eigenvalues of

are given by

and that an eigenvector associated with each

is

Under what condition on

Now take . How does this matrix relate to the matrices seen in Chapter 2 for onedimensional problems?

Will the Jacobi iteration converge for this matrix? If so, what will its convergence factor be? Will the Gauss-Seidel iteration converge for this matrix? If so, what will its convergence factor be?

Deduce the expression (4.27) for the preconditioning matrix associated with the SSOR iteration. 3 Let be a matrix with a positive diagonal Obtain an expression equivalent to that of (4.13) for and . Show that

Now assume that in addition to having a positive diagonal, is symmetric. Prove that the eigenvalues of the SSOR iteration matrix are real and nonnegative.

9 g 9 b 3  

b3 b 3   ! 09 g 9

d fc A QTu9 sb yqxwTu9 sb Ipq09 b iU3 b  v t  p3 pv t r 3  % g h

d c

b v b  3 Tt 9 b  3 g v g ! 3 t 9 !

v WTt p r WTt p ! v

dec

dec

2 Prove that the iteration matrix

For which values of

A YW285 T6Q1 BBBC 280U4SPQ1  92856TSPQ1 XV9 7 3 RP  A A A 9 7 5 %3 R 7 3 R HI # G

does this matrix become positive denite?

will the SOR iteration converge? of SSOR, as dened by (4.13), can be expressed as

. but which involves the matrices

E # F "

 7

where

a i

 

 BCBB   A A A

   

5 9 7 3 ) ' %   @28564120(&$  " #

   

   

   

 



v uWt p uWt p v

d c Wv p v  WTt p

1 Consider an

tridiagonal matrix of the form

C C 

vcQtc |5 | y  w  |y|


    UaWa a  aWa a       `  `

. .

where the

blocks are nonsingular matrices which are not necessarily diagonal.

What are the block Jacobi and block Gauss-Seidel iteration matrices? Show a result similar to that in Proposition 4.3 for the Jacobi iteration matrix. Show also that for (1) the block Gauss-Seidel and block Jacobi iterations either both converge or both diverge, and (2) when they both converge, then the block Gauss-Seidel iteration is (asymptotically) twice as fast as the block Jacobi iteration.

, where 5 According to formula (4.23), the vector in iteration (4.22) should be equal to is the right-hand side and is given in (4.52). Yet, formula (4.51) gives a different expression for . Reconcile the two results, i.e., show that the expression (4.51) can also be rewritten as

7 Consider a matrix which is consistently ordered. Show that the asymptotic convergence rate for Gauss-Seidel is double that of the Jacobi iteration.

is called a three-cyclic matrix.

Assume that a matrix has the form , where is a nonsingular diagonal matrix, and is three-cyclic. How can the eigenvalues of the Jacobi iteration matrix be related to those of the Gauss-Seidel iteration matrix? How does the asymptotic convergence rate of the Gauss-Seidel iteration compare with that of the Jacobi iteration matrix in this case? Answer the same questions as in (2) for the case when SOR replaces the Gauss-Seidel iteration.

Generalize the above results to -cyclic matrices, i.e., matrices of the form

N OTES AND R EFERENCES . Two good references for the material covered in this chapter are Varga [213] and and Young [232]. Although relaxation-type methods were very popular up to the 1960s, they are now mostly used as preconditioners, a topic which will be seen in detail in Chapters 9 and 10. One of the main difculties with these methods is nding an optimal relaxation factor for

..

v t

 # "

E p 

v 

"

What are the eigenvalues of which depends on , , and

? (Express them in terms of eigenvalues of a certain matrix .)

r

8 A matrix of the form

6 Show that a matrix has Property A if and only if there is a permutation matrix is consistently ordered.

such that

v t

p r 

..

..

..

A Tt 9 v g

E 3 Tt 9 g E %  3 v

p r 

 "

p r 

I

 I

4 Let

 "!

5| |  5| qz A Cy5 

  

j
r  b

 |5
 vTt     p 
  `       ` 

general matrices. Theorem 4.4 is due to Ostrowski. For details on the use of Gershgorins theorem in eigenvalue problems, see [180]. The original idea of the ADI method is described in [162] and those results on the optimal parameters for ADI can be found in [26]. A comprehensive text on this class of techniques can be found in [220]. Not covered in this book is the related class of multigrid methods; see the reference [115] for a detailed exposition. Closely related to the multigrid approach is the Aggregation-Disaggregation technique which is popular in Markov chain modeling. A recommended book for these methods and others used in the context of Markov chain modeling is [203].

C C 

|5 | y  w  |y|

tt

where is an real matrix. In this chapter, the same symbol is often used to denote the matrix and the linear mapping in that it represents. The idea of projection techniques is to extract an approximate solution to the above problem from a subspace of . If is this subspace of candidate approximants, or search subspace, and if is its dimension, then, in general, constraints must be imposed to be able to extract such an approximation. A typical way of describing these constraints is to impose (independent) orthogonality conditions. Specically, the residual vector is constrained to be orthogonal to linearly independent vectors. This denes another subspace of dimension which will be called the subspace of constraints or left subspace for reasons that will be explained below. This simple framework is common to many different mathematical methods and is known as the Petrov-Galerkin conditions. There are two broad classes of projection methods: orthogonal and oblique. In an orthogonal projection technique, the subspace is the same as . In an oblique projection

 A

pw i

t t

R  y  mg vv gQ v xff eiED%Bd8pF9P%6w%BI8RFP gts%se8d8fr8}GXF n u p H D Dd8IDD8`QX%BX rHWFeu8EXrB~ro8 VqXcQ8yt98W6i9fX r8 HrWTQ8rB}`%XlDY rH9TEXrB7F 79D`s%BI8F u n s n pn U u BX U F n YX u P 8 n H P pq`9P96986{9ff9Y%sv98W6ipFePwpaHIPWFytsmYeQ8%B98%TWXu D y8dRDPeWu`9PqX rHRD%sAtUqY Wq98q9895pitxBX98W6F n u F X 8 F 8Y H H p s 8 H sX 6 8UD AIXAIDpF%sd8WDh8%Bn 9YgWAQPGB``QP%B989sa8~tx%B98TgPgtsAD98 rH96eud8ld8D98W6FyDr8tH%BduWD8Y sP BX 8 U fp u H b s F %Bd8pF9P%6tD rH7ri8%u9PnRDWb}P qQX fB %UI8GFRDGxBrPe98tsH pA%fqX rHpFb qIX986F GXX HrpFcwUH n u 65 D U D P Xs pD Fs P 8 s i Rq9Bpo98EX6%B%FRXn `z`PBs RuxtsWFPSH }hP9u 98WBqRsF roX vts`}hXHB ~DIDf x8QX%B rneW saX SHrWFaueW8%BXe E tIDpFz9sP e8d8D H Q8tprBrHpF%nz~RDBb Q8 IDDs X 8SH`RuF 9PQX%BV9fX SHX~euWFDy8%UQX%Bd8GF(DxDn R sn P c X P uP ` nP b n e8 rBeP98tsH p a8EurBP p RutsrHT qIXyXtD98 Hr%6ueI898T rHpFhP9B8dF Peu rHrF%uQPrBn RutsSHGFD aHroz986A%fgGFrDXt p DB f b s F Hp 8 F X

 2 0  431) ( ' 0   0 ! 

Consider the linear system

method, is different from and may be totally unrelated to it. This distinction is rather important and gives rise to different types of algorithms.

Let be an real matrix and and be two -dimensional subspaces of . A projection technique onto the subspace and orthogonal to is a process which nds an approximate solution to (5.1) by imposing the conditions that belong to and that the new residual vector be orthogonal to ,

If we wish to exploit the knowledge of an initial guess to the solution, then the approximation must be sought in the afne space instead of the homogeneous vector space . This requires a slight modication to the above formulation. The approximate problem should be redened as

In other words, the approximate solution can be dened as

Interpretation of the orthogonality condition.

This is a basic projection step, in its most general form. Most standard techniques use a succession of such projections. Typically, a new projection step uses a new pair of subspace and and an initial guess equal to the most recent approximation obtained

The orthogonality condition (5.6) imposed on the new residual trated in Figure 5.1.

is illus-

a q i a q i

 g 

 

 ( Y ` ( t  ( w  ( T X

then the above equation becomes

or

q i

X w

 

C DA

7 4 32 6 5

Note that if as

is written in the form

, and the initial residual vector

 t   A  X w ( tA m T (

Find

such that

is dened

ra i

 A

 w 

 w 

Find

such that

a0 q i

F ' H 5 I E 'B 2 20A Q4DCA ) 5 '#3  & 8 P5 P5  3 E

 A 

75 B!B Bl 5  tp     |

   

! ! $#"

  

from the previous projection step. Projection methods form a unifying framework for many of the well known methods in scientic computing. In fact, virtually all of the basic iterative techniques seen in the previous chapter can be considered projection techniques. Whenever an approximation is dened via degrees of freedom (subspace ) and constraints (Subspace ), a projection process results. In the simplest case, an elementary Gauss-Seidel step as dened by (4.6) is nothing but a projection step with . These projection steps are cycled for until convergence. See Exercise 1 for an alternative way of selecting the sequence of s. Orthogonal projection methods correspond to the particular case when the two subspaces and are identical. The distinction is particularly important in the Hermitian case since we are guaranteed that the projected problem will be Hermitian in this situation, as will be seen shortly. In addition, a number of helpful theoretical results are true for the orthogonal case. When , the Petrov-Galerkin conditions are called the Galerkin conditions.

Let , an matrix whose column-vectors form a basis of and, similarly, , an matrix whose column-vectors form a basis of . If the approximate solution is written as

then the orthogonality condition leads immediately to the following system of equations for the vector :


In many algorithms, the matrix does not have to be formed since it is available as a by-product of the algorithm. A prototype projection technique is represented by the following algorithm.

r i

 

 

If the assumption is made that the expression for the approximate solution

matrix results,

is nonsingular, the following

 

E 'B A 8 A E 5 F 5 DC776PP63  63 5

 "!


( b w

QH C

5|  |  qzl 5  


 

b 

B@ 078 I 3 A

!

 

 ! #"

W & ( $ W ( & ( $ (

 |5 W(VR ( y lE

C DA

7  

9 C

As an example, consider the matrix

Let , , and

satisfy either one of the two following conditions, , or and


is positive denite and is nonsingular and

. of and , respec-

Since is positive denite, so is , see Chapter 1, and this shows that is nonsingular. Consider now case (ii). Let be any basis of and be any basis of . Since , can be expressed in this case as , where is a nonsingular matrix. Then

Since is nonsingular, the matrix is of full rank and as a result, nonsingular. This, along with (5.8), shows that is nonsingular.

a q i

 X

Consider rst the case (i). Let be any basis of and fact, since and are the same, can always be expressed as nonsingular matrix. Then

be any basis of . In , where is a



X

4 

"

 

 

! "

 

Then the matrix tively.

is nonsingular for any bases

There are two important particular cases where the nonsingularity of anteed. These are discussed in the following proposition.

   

C DA

 Q v A

) ' 2 )

W & ( ( $ (  !

where

identity matrix and is the . Although is nonsingular, the matrix the upper-left corner of and is therefore singular.

is the

zero matrix, and let is precisely the

block in

is guar-

"

The approximate solution is dened only when the matrix which is not guaranteed to be true even when is nonsingular.

& V( (

% $

and

for

and

is nonsingular,


is

 

b W w X V A b T W $  & ( ( %

1. Until convergence, Do: 2. Select a pair of subspaces 3. Choose bases 4. 5. 6. 7. EndDo


  3 3 3

and

C


7 5  PD SEUV8 D S 6 Q 5 D

75 B!B Bl 5  tp     |

5 9US V8 S D D

3 4DCA

~h v
1 2

  

) 0 (&#$ ' %

7 

 

'

'

Now consider the particular case where is symmetric (real) and an orthogonal projection technique is used. In this situation, the same basis can be used for and , which are identical subspaces, and the projected matrix, which is , is symmetric. In addition, if the matrix is Symmetric Positive Denite, then so is .

This section gives some general theoretical results without being specic about the subspaces and which are used. The goal is to learn about the quality of the approximation obtained from a general projection process. Two main tools are used for this. The rst is to exploit optimality properties of projection methods. These properties are induced from those properties of projectors seen in Section 1.12.4 of Chapter 1. The second tool consists of interpreting the projected problem with the help of projection operators in an attempt to extract residual bounds.

In this section, two important optimality results will be established that are satised by the approximate solutions in some cases. Consider rst the case when is SPD.

Assume that is Symmetric Positive Denite and . Then a vector is the result of an (orthogonal) projection method onto with the starting vector if and only if it minimizes the -norm of the error over , i.e., if and only if

As was seen in Section 1.12.4, for to be the minimizer of , it is necessary and sufcient that be -orthogonal to all the subspace . This yields

or, equivalently,

which is the Galerkin condition dening an orthogonal projection process for the approximation . We now take up the case when is dened by

(  ( ( Y X y T (  ( Y X ( X t T T f  tr( l ! %

W X

 

where

 (

 


 w 

F b& A

F5 63

 "! %

A B & 8 I C7 ' ' B A

5|  |  qzl 5  
R C 4 S U 5I6 QR

! "6

 7v AcvAz

 |5
' )$ )

h v A

'

A 6

'

 

Let be an arbitrary square matrix and assume that . Then a vector is the result of an (oblique) projection method onto orthogonally to with the starting vector if and only if it minimizes the -norm of the residual vector over , i.e., if and only if

which is precisely the Petrov-Galerkin condition that denes the approximate solution .

It is worthwhile to point out that need not be nonsingular in the above proposition. When is singular there may be innitely many vectors satisfying the optimality condition.

We now return to the two important particular cases singled out in the previous section, namely, the cases and . In these cases, the result of the projection process can be interpreted easily in terms of actions of orthogonal projectors on the initial residual or initial error. Consider the second case rst, as it is slightly simpler. Let be the initial residual , and the residual obtained after the projection process with . Then,

In addition, is obtained by enforcing the condition that be orthogonal to . Therefore, the vector is the orthogonal projection of the vector onto the subspace . This is illustrated in Figure 5.2. Hence, the following proposition can be stated.

where

denotes the orthogonal projector onto the subspace

A result of the proposition is that the 2-norm of the residual vector obtained after one projection step will not exceed the initial 2-norm of the residual, i.e.,

a result which has been established already. This class of methods may be termed residual projection methods.

a!w i

( 

     

cess onto

Let be the approximate solution obtained from a projection proorthogonally to , and let be the associated residual. Then,

 a q i

F 73 ' A ) 5 P6 QF 6P0E B DC77A 63  P5 A E B ' 3 ' I 3 5 A E 'B A 8 5 3

As was seen in Section 1.12.4, for to be the minimizer of and sufcient that be orthogonal to all vectors of the form to , i.e.,

, it is necessary , where belongs

where

 C

   

t X

X ~

R C 4 S U FI6 R

( Y $

 y ~ c }

c (

 

 

 w  ty   A  Q v A |5 || C

 !t
X

 

tA  !

9 vA

  tyc ~

 Q v A

 6

) ' 2 ) ) ' 2 )

' '

' '

case when

Interpretation of the projection process for the .

Now consider the case where and is Symmetric Positive Denite. Let be the initial error, where denotes the exact solution to the system and, similarly, let where is the approximate solution resulting from the projection step. Then (5.9) yields the relation

The above condition is equivalent to

Since

is SPD, it denes an inner product (see Section 1.11) which is usually denoted by and the above condition becomes

where denotes the projector onto the subspace the -inner product.

A result of the proposition is that the -norm of the error vector obtained after one projection step does not exceed the initial -norm of the error, i.e.,

(t

       

projection process onto

Let be the approximate solution obtained from an orthogonal and let be the associated error vector. Then,

, which is orthogonal with respect to

The above condition is now easy to interpret: The vector of the initial error onto the subspace .

is the -orthogonal projection

where is now obtained by constraining the residual vector

to be orthogonal to :


( X X

 "!

( Y `

  } fT }

5|  |  qzl 5  
(`  Y T (` Y

(}

( X X

w   ( 

t

t

 

 |5
76 3 854 2
' )$ )

 ~

h v A

'

(  I

'

C T

which is expected because it is known that the -norm of the error is minimized in This class of methods may be termed error projection methods.

If no vector of the subspace comes close to the exact solution , then it is impossible to nd a good approximation to from . Therefore, the approximation obtained by any projection process based on will be poor. On the other hand, if there is some vector in which is a small distance away from , then the question is: How good can the approximate solution be? The purpose of this section is to try to answer this question.

Orthogonal and oblique projectors.

Let be the orthogonal projector onto the subpace and let be the (oblique) projector onto and orthogonally to . These projectors are dened by

and it is assumed, without loss of generality, that . Then according to the property (1.54), the approximate problem dened in (5.5 5.6) can be reformulated as follows: nd such that

( Y `

and are illustrated in Figure 5.3. The symbol

is used to denote the operator

 w 

E

(  (   ( 

' % #3 PQ& 8 P5 P5  3 ' 3 5 3 E

 (    ( 

6 

 !t
A

|5 || C

7 4 32 6 5

or, equivalently,

Thus, an -dimensional linear system is approximated by an -dimensional one. The following proposition examines what happens in the particular case when the subspace is invariant under . This is a rare occurrence in practice, but the result helps in understanding the breakdown behavior of the methods to be considered in later chapters.

Assume that is invariant under , , and belongs to . Then the approximate solution obtained from any (oblique or orthogonal) projection method onto is exact.

where is a nonzero vector in . The right-hand side is in , so we have Similarly, belongs to which is invariant under , and therefore, the above equation becomes

The result can be extended trivially to the case where . The required assumption in this case is that the initial residual belongs to the invariant subspace . An important quantity for the convergence properties of projection methods is the distance of the exact solution from the subspace . This quantity plays a key role in the analysis of projection methods. Note that the solution cannot be well approximated from , if is not small because

The fundamental quantity is the sine of the acute angle between the solution and the subspace . The following theorem establishes an upper bound for the residual norm of the exact solution with respect to the approximate operator .

 X f T X tT A T

Since

, then

prw i

 X

Let . Then the exact solution

and assume that is a member of of the original problem is such that

    X        T   X 

V

X

A 

   

t A

 X

 A 6 Y  CDA cv ~

showing that

is an exact solution.

"y 

(` Y

( Y

X A

An approximate solution

is dened by

 ( 

 "!
. . Then

5|  |  qzl 5  

 |5
' )$ )

h v A

'

A 6

'

'

and

which completes the proof.

It is useful to consider a matrix interpretation of the theorem. We consider only the particular case of orthogonal projection methods ( ). Assume that is unitary, i.e., that the basis is orthonormal, and that . Observe that . Equation (5.11) can be represented in the basis as

Thus, the projection of the exact solution has a residual norm with respect to the matrix , which is of the order of .

This section examines simple examples provided by one-dimensional projection processes. In what follows, the vector denotes the residual vector for the current approximation . To avoid subscripts, arrow notation is used to denote vector updates. Thus, means compute and overwrite the result on the current . (This is known as a SAXPY operation.) One-dimensional projection processes are dened when

and

Following are three popular choices to be considered.

The steepest descent algorithm is dened for the case where the matrix is Symmetric Positive Denite. It consists of taking at each step and . This yields an iteration described by the following algorithm.

a0 w i

A E 5 ) F 6P2G5 A 6 P0F F 5 5 5 A

g X

( X ( T

! $6 

 w

where

and

are two vectors. In this case, the new approximation takes the form and the Petrov-Galerkin condition yields

V z

( t  c

 cv f h y c mQfgc }

T A

 w

 c

However,

 X    X   X T X

 X



 

T 


   

( (

y 

y 

 w

y 

Noting that

is a projector, it follows that

C C

  

| w |  l 5   l BCzEeqt  | |  |

"

1. Until convergence, Do: 2. 3. 4. 5. EndDo

over all vectors of the form , where is the negative of the gradient direction . The negative of the gradient direction is locally the direction that yields the fastest rate of decrease for . Next, we prove that convergence is guaranteed when is SPD. The result is a consequence of the following lemma known as the Kantorovich inequality.

Clearly, it is equivalent to show that the result is true for any unit vector . Since is symmetric, it is unitarily similar to a diagonal matrix, , and

is a convex combination of the eigenvalues

with

W (

Therefore,

W ( W w y X b ! X   T W W w y

Q y (  X b T

Noting that the function that joins the points

is convex, and

Ry R S f X b( b W #T X bT ( W ( b ! X W r" rX V(T ( y veE( R T X T W R f X b( b # !

is bounded from above by the linear curve , i.e.,

R R S

 R

X b

Setting

, and

, note that

. The following relation holds,

rQaw i

~~ W # # ( ( rX( ~ W # rX ~# (   T X XT # " T T

 R R  (  R w R  X

R b

R ST

r" W rz ( (

(" X (  W T " ( X

X

W ( b ( b

aQ  y (

y W

 R 

matrix and

(Kantorovich inequality) Let be any Symmetric Positive Denite real , its largest and smallest eigenvalues. Then,

X eX

Each step of the above iteration minimizes

S CR B Q 8 D

 "!

A ~Q v 5|  |  | 5 qzl 5    C
T
&95E S 6

 

5 $ S

 w

5 5 5 R HUS

 w (  ( tX AT 
X

r T R

3 R

1 2

C DA

A 6

' 7$ % # 1

1P #

which gives the desired result.

This lemma helps to establish the following result regarding the convergence rate of the method.

Let

Since by construction the new residual vector must be orthogonal to the search direction , the second term in the right-hand side of the above equation is zero. Thus,

The result follows by applying the Kantorovich inequality (5.13).

a!w i a!w i a w i

W U "QH

W U "QH

W ( H ( T

W H

H H H W"QH U W U "QH   ( X  T "QH W"UQH W"UQH  W U

H H y ( H  X H H (  H H H H T H (X  T W ( X H HX H H T X ( 

W U "QH

Start by observing that simple substitution,

and Algorithm 5.2 converges for any initial guess

and then by

w i

H   

 R w R   R 1 R 

 

QH W U

W U "QH

the error vectors

be a Symmetric Positive Denite matrix. Then, the -norms of generated by Algorithm 5.2 satisfy the relation

W  " W  ( ( w X b  X X T T T W T

The maximum of the right-hand side is reached for

yielding,

  

| w |  l 5   l BCzEeqt  | |  |
R

W z

4 Av ~

1 P

'

We now assume that is not necessarily symmetric but only positive denite, i.e., its symmetric part is Symmetric Positive Denite. Taking at each step and , the following iterative procedure results.

Here, each step minimizes in the direction . The iteration converges under the condition that is positive denite as is stated in the next theorem.

Then the residual vectors generated by Algorithm 5.3 satisfy the relation

We proceed similarly to the steepest descent method, starting with the relation

By construction, the new residual vector must be orthogonal to the search , and, as a result, the second term in the right-hand side of the above equation direction vanishes and we obtain

From Theorem 1.19, it can be stated that

r 0 i r w i

pq0 i

r0 0 i

H H H  H H H H H H H H (   (  X H H X H ( H H 

 (  H y H H H  X ( T H H  H H  H   ( X ( y H H T X H H   ( T X ( T H X XH H  H H T X ( T H  H X ( H H T

( Y   

T 

and Algorithm (5.3) converges for any initial guess

r w i

 

 

  (

T( 

X 

w R 

(  X ( T

W W t

 

Let

be a real positive denite matrix, and let

f 

W U QH

 w W(  ( tAT 

1. Until convergence, Do: 2. 3. 4. 5. EndDo

E B ' 0A 8 P5 BA 3

I 3 & 8  P5 B F

6 Q S @8 5 S PD 7V9UT B @ Q 0 C@ 5 B

 "!

5|  |  qzl 5   A
3 Q& 8 I B E B I

R6 Q Q

W U "QH

6 
3  R

W U QH

A ~Q v w

 |5

1 2

cv ~

A 6

' 7$ % #

'

9 C

There are alternative ways of obtaining inequalities that prove convergence. For example, starting from (5.21), (5.22) can be used again for the term and similarly, we can write

in which . Another interesting observation is that if we dene

At each step the reduction in the residual norm is equal to the sine of the acute angle between and . The convergence factor is therefore bounded by

in which is the acute angle between and . The maximum angle guaranteed to be less than when is positive denite as the above results show.

In the residual norm steepest descent algorithm, the assumption that is positive denite is relaxed. In fact, the only requirement is that is a (square) nonsingular matrix. At each step the algorithm uses and , giving the following sequence of operations:

However, an algorithm based on the above sequence of operations would require three matrix-by-vector products, which is three times as many as the other algorithms seen in this section. The number of matrix-by-vector operations can be reduced to two per step by computing the residual differently. This variant is as follows.

0 i

( t

A E 5 ) F F 5 5 5 6P2G5 A 6 P0A F

 H H H  ( H XH H ( T X ( X

V ( VW

 w (   ( tA  

( H T

C 54

C F4  y  H 

then (5.21) can be rewritten as

aa0 i

 

 W

 H  ( H  

 

I 3 ' 62E & 8

  R  X W U R QI6 PH

 

 H

 X  QH w W U

 

763 B F 5

v

W U "VH

 6

R 

tW("

since

is also positive denite. This would yield the inequality

( 

( Y

w W

R 

rX X

VW"t ( ( VW"t VX W T( V X rV T (

  H

where equality

. The desired result follows immediately by using the in.

C
is

  

| w |  l 5   l BCzEeqt  | |  |
X  T H   w     X R T T

Compute
3 3

and

EndDo

Here, each step minimizes in the direction . As it turns out, this is equivalent to the steepest descent algorithm of Section 5.3.1 applied to the normal equations . Since is positive denite when is nonsingular, then, according to Theorem 5.2, the method will converge whenever is nonsingular.

We begin by considering again the block relaxation techniques seen in the previous chapter. To dene these techniques, a set-decomposition of is considered as the denition of subsets of with

Let

be the

matrix

where each is the -th column of the identity matrix. If the block Jacobi and block Gauss-Seidel algorithms, Algorithms 4.1 and 4.2, are examined carefully, it can be observed that each individual step in the main loop (lines 2 to 5) represents an orthogonal projection process over . Indeed, the equation (4.17) is exactly (5.7) with . This individual projection step modies only the components corresponding to the subspace . However, the general block Jacobi iteration combines these modications, implicitly adding them together, to obtain the next iterate . Borrowing from the terminology of domain decomposition techniques, this will be called an additive projection procedure. Generally, an additive projection procedure can be dened for any sequence of subspaces , not just subspaces spanned by the columns of the identity matrix. The only requirement is that the subspaces should be distinct, although they are allowed to overlap. Let a sequence of orthogonal systems be given, with the condition that

C QH

R 

t R X

C QH

P G P G PWG ( 8& s s ( s ( s $ (

!( ! R #V( R #(

R 

R 

Denote by

the size of

and dene the subset

as

( (   (

S cv f cv z vv v

 R

X W 

    

 R

f 

(  R

R !

( (

1. 2. 3. 4. 5. 6. 7.

Compute Until convergence, Do:

S 6 5 &9E $ S R HUS 5 5 5 5

 "!

5|  |  qzl 5   4
8 D

 g   w V A ~Q v
3 9 B @ Q 0 CvA 5 3 1 2

 |5
u

' 7$ % #

WQH U

W ( (  V( y R bR R # ( X

vE
y

H ty R bR H

R w X f
R

W R

H W "QUH Rb

R # W T

( ( V(
#

7E
(

tc
R

represents the projector onto the subspace spanned by , and orthogonal to . Often, the additive processes are used in conjunction with an acceleration parameter , thus (5.25) is replaced by

R

W R

 R

H H

 R

R

W R 

R

 R  W R R  R X f W  R H V A V A


tA
H

R bR

 RX

tA
T

R

5 Y8 E Y8 5 D

PTH5 Y8 6 DQ S E D

7 S P 5 Q Q

H W U "VH w R bR y (Q ( y vE ( ( ( V( Y A ~h v
3

a0 i

( ( V(

(R bR

7E
(

H tc T

R  R 

 C

 tt  @    v    

| w |

| 5 yC

which leads to the following algorithm.

R w X f
R

W R

H W "QUH Rb

The additive projection procedure can be written as

r5 y 5  | 

for

1. For 2. For 3. Solve 4. EndDo 5. Set 6. EndDo

1 2

Dening

, and dene

until convergence, Do: Do:

, the residual vector at step , then clearly

Even more generally, a different parameter

 R X f

W U "QH

Observe that each of the operators

can be used for each projection, i.e.,

 R w X f #

R

W R

H W "QUH Rb

W U "QH

) 0 (&#$ ' %

C QH

The residual norm in this situation is given by

considering the single parameter as a particular case. Exercise 14 gives an example of the choice of which has the effect of producing a sequence with decreasing residual norms. We now return to the generic case, where . A least-squares option can be dened by taking for each of the subproblems . In this situation, becomes an orthogonal projector onto , since

It is interesting to note that the residual vector obtained after one outer loop is related to the previous residual by

where the s are now orthogonal projectors. In particular, in the ideal situation when the s are orthogonal to each other, and the total rank of the s is , then the exact solution would be obtained in one outer step, since in this situation

Thus, the maximum reduction in the residual norm is achieved when the s are orthogonal to one another. Similar to the Jacobi and Gauss-Seidel iterations, what distinguishes the additive and multiplicative iterations is that the latter updates the component to be corrected at step immediately. Then this updated approximate solution is used to compute the residual vector needed to correct the next component. The Jacobi iteration uses the same previous approximation to update all the components of the solution. Thus, the analogue of the block Gauss-Seidel iteration can be dened as follows.
5 V8 E Y8 5 D

ty

1. Until convergence, Do: 2. For Do: 3. Solve 4. Set 5. EndDo 6. EndDo

r 0 i

W R  R ( R XR XR T T  R ( R y R ( R#

H (

H (

%R R # X f

%R

` R Y

 "!

PTH5 Y8 6 DQ S E D

 R X f

5|  |  qzl 5  
 R

R X f

 "

7 7HTEQ C S B 5 Q S @ B Q

 "

W U QH

W U QH

bR w 3 bR y ( (  (

R

 3

A ~Q v

 |5
H R #
1 2

vE

' 7$ % #

C E

1 Consider the linear system

, where

is a Symmetric Positive Denite matrix.

Consider the sequence of one-dimensional projection processes with , where the sequence of indices is selected in any fashion. Let be a new iterate after one projection step from and let , , and . Show that Does this equality, as is, establish convergence of the algorithm?

Assume now that is selected at each projection step to be the index of a component of . Show that largest absolute value in the current residual vector

2 Consider the linear system , where is a Symmetric Positive Denite matrix. Consider a projection step with where is some nonzero vector. Let be the new , and . iterate after one projection step from and let

Does this equality establish convergence of the algorithm?

in which converges?

is the spectral condition number of

. Does this prove that the algorithm

Compare the cost of one step of this method with that of cyclic Gauss-Seidel (see Example 5.1) and that of optimal Gauss-Seidel where at each step and is a component of largest magnitude in the current residual vector.

3 In Section 5.3.3, it was shown that taking a one-dimensional projection technique with and is mathematically equivalent to using the usual steepest descent algorithm applied to the normal equations . Show that an orthogonal projection method for using a subspace is mathematically equivalent to applying a projection method onto , orthogonally to for solving the system .

% V

U  1@

4 Consider the matrix

In Gastinels method, the vector is selected in such a way that dening the components of to be , where residual vector. Show that

2 !2

 0FR 1 

1   ! C  9 G3

A 9 C

R

32 "2

C q3 ' 9 C  ! 3

X  X

Wv

9 eq3

! X PTR IHQ1 3 P

 49 " 

7 Q

"

q3



 U "

 9

Show that

v  %FE Tt  & 

 & " 

v E t

"

D(R 1 C

 5 3 2  & " 2

 & "    " 

!

X i

 1

X  X

FR T1

B

q3

 I

9 fq3

! 72 2

in which

is the spectral condition number of . [Hint: Use the inequality .] Does this prove that the algorithm converges?

, e.g., by is the current

C
   R 1 A @ 1! X  @  v % $ Tt


  " 

  

0( U   ! 3 49 "  " q3  9  & "    " q3   )' 9

3 92 " 2

1&

v # Tt

Wv

9 eq3

" 

7 8



642 % " 2 5 3 

vcQtc |5 | y  w  |y|


9 fq3

!

v uWt

SR 1

     `  ` 

Find a rectangle or square in the complex plane which contains all the eigenvalues of without computing the eigenvalues.

Is the Minimal Residual iteration guaranteed to converge for a linear system with the matrix ?

5 Consider the linear system

Consider an iteration procedure which consists of performing the two successive half-steps described above until convergence. Show that this iteration is equivalent to a (standard) Gauss-Seidel iteration applied to the original system. Now consider a similar idea in which is taken to be the same as before for each half-step . Write down the iteration procedure based on this approach. Name another and technique to which it is mathematically equivalent.

6 Consider the linear system , where is a Symmetric Positive Denite matrix. We dene a projection method which uses a two-dimensional space at each step. At a given step, take , where is the current residual. For a basis of use the vector and the vector obtained by orthogonalizing against with respect to the -inner product. Give the formula for computing (no need to normalize the resulting vector). Write the algorithm for performing the projection method described above. Will the algorithm converge for any initial guess ? Justify the answer. [Hint: Exploit the convergence results for one-dimensional projection techniques.]

7 Consider projection methods which update at each step the current solution with linear combi. nations from two directions: the current residual and Consider an orthogonal projection method, i.e., at each step . Assuming that is Symmetric Positive Denite, establish convergence of the algorithm. Consider a least-squares projection method in which at each step and . Assuming that is positive denite (not necessarily symmetric), establish convergence of the algorithm.

[Hint: The convergence results for any of the one-dimensional projection techniques can be exploited.] 8 The least-squares Gauss-Seidel relaxation method denes a relaxation step as (same as Gauss-Seidel), but chooses to minimize the residual norm of . Write down the resulting algorithm.

Show that this iteration is mathematically equivalent to a Gauss-Seidel iteration applied to the normal equations .

9 Derive three types of one-dimensional projection algorithms in the same manner as was done in Section 5.3, by replacing every occurrence of the residual vector by a vector , a column of the identity matrix.

 E   &8 

v

Dene an orthogonal projection method using the set of vectors , i.e., . Write down the corresponding projection step ( is modied into Similarly, write the projection step for the second half of the vectors, i.e., when , .

!

 h

!

! R T1

! FR T1

v  BCBB v   A A A

 Q 

1&

X  X

  AA T 0BBA

!

 A A A    BCBB v 0FR 1

! R 1

 0FR 1

in which

and

are both nonsingular matrices of size

each. ).

 

 "!
,

5|  |  qzl 5  

p  r  v  p

 |5
 p

         `    `  `    `  `

9 tC

10 Derive three types of one-dimensional projection algorithms in the same manner as was done in Section 5.3, by replacing every occurrence of the residual vector by a vector , a column of the matrix . What would be an optimal choice for at each projection step? Show that the method is globally convergent in this case. 11 A minimal residual iteration as dened in Section 5.3.2 can also be dened for an arbitrary search direction , not necessarily related to in any way. In this case, we still dene . Write down the corresponding algorithm. Under which condition are all iterates dened?

12 Consider the following real-valued functions of the vector variable , where coefcient matrix and right-hand system of a given linear system and

Calculate the gradients of all four functions above. How is the gradient of How is the gradient of related to that of ? related to that of relate to the when -norm of the error when

is symmetric?

How does the function Positive Denite?

13 The block Gauss-Seidel iteration can be expressed as a method of successive projections. The subspace used for each projection is of the form

What is ? Not too commonly used an alternative is to take , which amounts to solving a least-squares problem instead of a linear system. Develop algorithms for this case. What are the advantages and disadvantages of the two approaches (ignoring convergence rates)? 14 Let the scalars

in the additive projection procedure satisfy the constraint

We wish to choose a set of s such that the 2-norm of the residual vector is minimal. Determine this set of s, assuming that the vectors are all linearly independent.

!9

Show that in the least-squares case, we have satisfy the constraint (5.27).

for any choice of

It is not assumed that each is positive but only that given by the Formula (5.26) or, equivalently,

for all . The residual vector is

s which

a 0 i

t  v

"

Write a general sufcient condition which must be satised by guarantee convergence.

at each step in order to and are the .

is Symmetric

v 2 2 5 2 ! 2

  g3

  A A A  "  BB0B v   0R T1 

9 A ! W

A   q3 49  3 % 9   2 X & X 2  2 1& 2   2 2

5 @ b@

  w b g3

A 

v v !  " f

v " f

9  G3 9 3  G 9  G3 9  G3 )

"

Under which condition on

does the new iterate make no progress, i.e.,

C tC 9

"

2 2

2 ! 2

|5 | y  w  |y|


b

"

  `      `       `

The optimal s provided in the previous question require the solution of a Symmetric Positive Denite linear system. Let be the search directions provided by each of the individual projection steps. To avoid this difculty, a simpler strategy is used which consists of performing successive minimal residual iterations along these search directions, as is described below.

EndDo
Show that

. Give a sufcient condition to ensure global convergence.

15 Consider the iteration: , where is a vector called the direction of search, is a scalar. It is assumed throughout that is a nonzero vector. Consider a method which and determines so that the residual is the smallest possible. Determine so that is minimal. obtained in this manner is orthogonal to

Show that the residual vectors satisfy the relation:

Assume that at each step , we have

. Will the method always converge?

, where is a vector called the direction of search, 16 Consider the iteration: and is a scalar. It is assumed throughout that is a vector which is selected in the form where is some nonzero vector. Let be the exact solution. Now consider a method which at each step determines so that the error norm is the smallest possible.

(e.g.,

Show that

N OTES AND R EFERENCES . Initially, the term projection methods was used mainly to describe onedimensional techniques such as those presented in Section 5.3. An excellent account of what has been done in the late 1950s and early 1960s can be found in Householders book [122] as well as Gastinel [101]. For more general, including nonlinear, projection processes, a good reference is Kranoselskii and co-authors [138]. Projection techniques are present in different forms in many other areas of scientic computing and can be formulated in abstract Hilbert functional spaces. The terms Galerkin and Petrov-Galerkin techniques are used commonly in nite element methods to describe projection methods on nite element spaces. The principles are identical to those seen in this chapter.

Establish the convergence of the algorithm for any

, when

for all .

9 "  P3

Determine

RP SQ1

v 2 2 5 2 2  v "

so that is orthogonal to or ).

is minimal and show that the error vector . The expression of should not contain unknown quantities

Tt  v

"

Now assume that is positive denite and select at each step will converge for any initial guess .

A 9

"

 ! 3

R SPQ1

  9

"

"

"  ! 3 v 2 2 5 2 ! 2

"

 E

Show that the residual vector

. Prove that the method

"

"

v ! 2 2

" &

E 

v ! 2 2

v v 2 !2 5 2 !2

 v

For

Do:

! # !
.

 "!

5|  |  qzl 5  

q b  !  ! bE  q3 ' 9  ! 3 b ! CABA0B   A  !

 |5

X 

  `  

 

    `

"

tC 9

9tC

( t W

W((  W W

(  (   c

 (
X

Recall from the previous chapter that a general projection method for solving the linear system

where is another subspace of dimension . Here, represents an arbitrary initial guess to the solution. A Krylov subspace method is a method for which the subspace is the Krylov subspace

 V A

Rwq i

 w

`qX rHpFeP H `9PqrXruRXW6tF`X rHt%DX is p s B rq9u`%P cYd89D9PaD`RX6WFAED%B98%TWX9Bd8pF9P%6RFSo98A98R5`qX rHFpeP H `9PqrX%uRX6tFRXqY qXRs7 s sX Y 8 U u n u s 6 is p s B H p B 986CGXveYd8pFP %pQ8! cwmqEXGB vqeY3 98!T H%BtY aDW `Y RXW6 WFAzDE%B989TX%B8dpF9P%6tD rH7kwiP fXRsx qrXn F s F B B X s U fe 8 8 U u n u 6 5 p H U p 9 F ws `iY `%XpX P fRXADsP x %Xn Q8f98%BP D Hy! s Q8%B98y %CWxfq3 ! U8B dXtA98PpFf 6tUaHoF e%fXEBrtDnX E`WFGXPBn Due9898Tb rHWxeu%6s eYI898Vds8 WD9P98n6l{DFD ese8tFB9u9PXRn6DD H u q pH 6 6 9 U s pqWbyQ8rBc%6u rHyresDe89u9PnRDbf%TX txGXpFs9Xsr98 H qV9YaP9P%XruRXW6EFA69FrXesDI8IDDe8u D P 6 D p B b pX s p s BX 9qEXrBX SHWFeu8QX%B~eYd8RD9PgQ8rB7D98 rH%6u8dFf8dD9867{iey%UI8GFRDGxkrBeP98tsH ga8EurBP vuRtsrHT IXD n s n sX P b s 5D D p p p B f p P b s F H s B n H X U F sX UP F Xm8 `P wpaH9PpTtD98 rH%6ud8g98T SHRFQP%BI8F mpFPdtFXrttUGF%DAv98W6Ruy`r8 IXzx ppF%sQ8B tq%bu eYQ8%BY rHRDqXvQ8rB96u rHAVaD`RX6WFA"Wx8vP}Q8`X `roDE%BI8pF`9P%6mRFSo989875 B 8 s u P 6 Y 8 U f B pn 8 n u X F s 6

is a method which seeks an approximate solution from an afne subspace dimension by imposing the Petrov-Galerkin condition

of

R  Q c

  !   2 0  31) ( '     1 $ 0 1 !

where . When there is no ambiguity, will be denoted by . The different versions of Krylov subspace methods arise from different choices of the subspace and from the ways in which the system is preconditioned, a topic that will be covered in detail in later chapters. Viewed from the angle of approximation theory, it is clear that the approximations obtained from a Krylov subspace method are of the form

In other words, is approximated by . Although all the techniques provide the same type of polynomial approximations, the choice of , i.e., the constraints used to build these approximations, will have an imgive rise to the bestportant effect on the iterative technique. Two broad choices for known techniques. The rst is simply and the minimum-residual variation . A few of the numerous methods in this category will be described in this chapter. The second class of methods is based on dening to be a Krylov subspace method associated with , namely, . Methods of this class will be covered in the next chapter. There are also block extensions of each of these methods termed block Krylov subspace methods, which will be discussed only briey. Note that a projection method may have several different implementations, giving rise to different algorithms which are all mathematically equivalent.

In this section we consider projection methods on Krylov subspaces, i.e., subspaces of the form

which will be denoted simply by if there is no ambiguity. The dimension of the subspace of approximants increases by one at each step of the approximation process. A few elementary properties of Krylov subspaces can be established, many of which need no proof. A rst property is that is the subspace of all vectors in which can be written as , where is a polynomial of degree not exceeding . Recall that the minimal polynomial of a vector is the nonzero monic polynomial of lowest degree such that . The degree of the minimal polynomial of with respect to is often called the grade of with respect to , or simply the grade of if there is no ambiguity. A consequence of the Cayley-Hamilton theorem is that the grade of does not exceed . The following proposition is easy to prove.

r0  i

Y $

W( W ( ( (

(

W X W T ' W

  T  

C H

 v lzc }

X T (

in which then

is a certain polynomial of degree

. In the simplest case where


,

5 C
X

( 

(
X

  1B


T W

5| j yq|  

w 

' W

  " "

 |5


t g

  

9 9 tC

It was mentioned above that the dimension of is nondecreasing. In fact, the following proposition determines the dimension of in general.

The vectors form a basis of if and only if for any set of scalars , where at least one is nonzero, the linear combination is nonzero. This is equivalent to the condition that the only polynomial of degree for which is the zero polynomial. The second part of the proposition is a consequence of the previous proposition.

First we prove that for any polynomial of degree . It is sufcient to show the property for the monic polynomials . The proof is by induction. The property is true for the polynomial . Assume that it is true for :

which proves that the property is true for , it only remains to show that

, provided

. For the case , which follows from

Looking at the right-hand side we observe that

y ! X y E X y w E T  T w W"U W ( R v R "U R X T  X T X R T X W R T X "U R

belongs to

If the vector on the left-hand side belongs to equation is multiplied on both sides by , then

Multiplying the above equation by

on both sides yields

, and therefore if the above

. Hence,

##!V(`7! X (  ! R ( Y E RT !  XT

X X X T T

T !

and for any polynomial of degree

( 

W "U

 ! X R

to

, that is,

Let

be any projector onto and let be the section of . Then for any polynomial of degree not exceeding

R 

H ( " 5I6 ! C4

y Y !  W X yT ( #!(Y tEe( R R  R   R ! W WW (( ( A 6  564

Therefore,

! 

6 54

) ' 2 )

y  ! Q v A

'

grade

The Krylov subspace is of dimension of with respect to is not less than , i.e.,

 

) ' 2 )

 

Q v A

for all

if and only if the

Let

be the grade of . Then

is invariant under

and

9 tC

|    j
C DA

) ' 2 )

" 
' '

Q v A
y w E
' ' '


w E

Arnoldis method [9] is an orthogonal projection method onto for general nonHermitian matrices. The procedure was introduced in 1951 as a means of reducing a dense matrix into Hessenberg form. Arnoldi presented his method in this manner but hinted that the eigenvalues of the Hessenberg matrix obtained from a number of steps smaller than could provide accurate approximations to some eigenvalues of the original matrix. It was later discovered that this strategy leads to an efcient technique for approximating eigenvalues of large sparse matrices. The method will rst be described theoretically, i.e., assuming exact arithmetic, then implementation details will be addressed.

Arnoldis procedure is an algorithm for building an orthogonal basis of the Krylov subspace . In exact arithmetic, one variant of the algorithm is as follows:
Q P B 76 8 D

At each step, the algorithm multiplies the previous Arnoldi vector by and then orthonormalizes the resulting vector against all previous s by a standard Gram-Schmidt procedure. It will stop if the vector computed in line 4 vanishes. This case will be examined shortly. Now a few simple properties of the algorithm are proved.

W W

(( W W( W H  C

Then the vectors

Assume that Algorithm 6.1 does not stop before the -th step. form an orthonormal basis of the Krylov subspace

 W "U W"U u u  u u  W"U Y u u U  W u u W  u   R y vR R u u 3 u V(Q ( ( E u ( u vR y XR u !( ( #V(  TW

1. Choose a vector of norm 1 2. For Do: 3. Compute for 4. Compute 5. 6. If then Stop 7. 8. EndDo

by simply multiplying both sides by

5 C

I H BA '  & 8 ) 7F P% 50A 3 B 8 H

  1B

5| j yq|  

u u

  " " ! $6

( ( (

    Q ~m

 |5

3 C 4DA

9 vA

' )$ )

1 2

~Q v

h v A


) '

' 7$ % #

T W
'

9 tC

The vectors , are orthonormal by construction. That they span follows from the fact that each vector is of the form where is a polynomial of degree . This can be shown by induction on as follows. The result is clearly true for , since with . Assume that the result is true for all integers and consider . We have

The relation (6.5) follows from the following equality which is readily derived from lines 4, 5, and 7 of Algorithm 6.1,

Relation (6.4) is a matrix reformulation of (6.7). Relation (6.6) follows by multiplying both sides of (6.4) by and making use of the orthonormality of . The result of the proposition is illustrated in Figure 6.1.

one matrix.

As was noted earlier, the algorithm may break down in case the norm of vanishes at a certain step . In this case, the vector cannot be computed and the algorithm stops.

The action of

on

gives

plus a rank-

a q i a q i q i

a q i

V( (

$Q ( !( (

W "U

( R uR

W "U

W  R W "U f

, by , the Algorithm 6.1, and by following relations hold:

Denote by , the matrix with column vectors , , Hessenberg matrix whose nonzero entries are dened by the matrix obtained from by deleting its last row. Then the

u vR

which shows that the proof.

can be expressed as

where

is of degree and completes

aa q i

9  tC W u

R vR u

W 

R

! X 

R uR

$V ( !( (

f W "U u u W

R

1B 

y z }( u

 V

5| yq
X

C DA 7 4 32 6 5

W"U u W U "U W u u

w !




  B   T' 2 ) )

Q $ v A

'

A A

'

6 6

Still to be determined are the conditions under which this situation occurs.

Arnoldis algorithm breaks down at step (i.e., in line 5 of Algorithm 6.1), if and only if the minimal polynomial of is of degree . Moreover, in this case the subspace is invariant under .
If the degree of the minimal polynomial is , then must be equal to zero. can be dened and as a result would be of dimension . Indeed, otherwise Then Proposition 6.2 would imply that , which is a contradiction. To prove the converse, assume that . Then the degree of the minimal polynomial of is such that . Moreover, it is impossible that . Otherwise, by the rst part of this proof, the vector would be zero and the algorithm would have stopped at the earlier step number . The rest of the result follows from Proposition 6.1. A corollary of the proposition is that a projection method onto the subspace will be exact when a breakdown occurs at step . This result follows from Proposition 5.6 seen in Chapter 5. It is for this reason that such breakdowns are often called lucky breakdowns.


In the previous description of the Arnoldi process, exact arithmetic was assumed, mainly for simplicity. In practice, much can be gained by using the Modied Gram-Schmidt or the Householder algorithm instead of the standard Gram-Schmidt algorithm. With the Modied Gram-Schmidt alternative the algorithm takes the following form:
S RQ

In exact arithmetic, this algorithm and Algorithm 6.1 are mathematically equivalent. In the presence of round-off the above formulation is much more reliable. However, there are cases where cancellations are so severe in the orthogonalization steps that even the Modied Gram-Schmidt option is inadequate. In this case, two further improvements can be utilized. The rst improvement resorts to double orthogonalization. Whenever the nal vector obtained at the end of the main loop in the above algorithm has been computed, a

 W U

. If

Y $ u

R vR u XR u

W"U W"U u u  u U u  W u u  u 

u 3 u ( u y vR u ( YV( E T u y ! ( 3 (  W #V(

1. Choose a vector 2. For 3. Compute 4. For 5. 6. 7. EndDo 8. 9. 10. EndDo

of norm 1 Do: Do:

Stop

w 

Y ` u

 "U W

F DC776P5 P&  I B & 8#DCA ) 8 6 E 'B A 8 A E I 5 )B 3

5 C


  1B W "U u


5| j yq|  

y

w 

@8 V2

 

P P D  P B 76 8 5 Q Q D

  " " Y

 |5
u  6 

W U

3 RA

' )$ )

1 2

~Q v

h v A
)

'

A 6

' 7$ % #

'

9 tC u

test is performed to compare its norm with the norm of the initial (which is ). If the reduction falls below a certain threshold, indicating severe cancellation might have occurred, a second orthogonalization is made. It is known from a result by Kahan that additional orthogonalizations are superuous (see, for example, Parlett [160]). The second improvement is to use a different technique altogether. From the numerical point of view, one of the most reliable orthogonalization techniques is the Householder algorithm. Recall from Chapter 1 that the Householder orthogonalization uses reection matrices of the form to transform a matrix into upper triangular form. In the Arnoldi algorithm, the column vectors of the matrix to be orthonormalized are not available ahead of time. Instead, the next vector is obtained as , where is the current basis vector. In the Householder algorithm an orthogonal column is obtained as where are the previous Householder matrices. This vector is then multiplied by and the previous Householder transforms are applied to it. Then, the next Householder transform is determined from the resulting vector. This procedure is described in the following algorithm, which was originally proposed by Walker [221].

For details regarding the determination of the Householder vector in the third to fth lines and on its use in the sixth to eight lines, see Chapter 1. Recall that the matrices need not be formed explicitly. To obtain from in line 6, zero out all the components from through of the -vector and change its -th component, leaving all others position unchanged. Thus, the matrix will have the same structure as the matrix of equation (1.22) in Chapter 1. By comparison with the Householder algorithm seen in Chapter 1, we can infer that the above process computes the factorization of the matrix . Dene

The denition of

in line 8 of the algorithm yields the relation,

a q i

After the next Householder transformation tion,

is applied in line 6,

satises the rela-

a q i

u

1. Select a nonzero vector ; Set 2. For Do: such that 3. Compute the Householder unit vector 4. and 5. , where 6. 7. 8. If compute 9. EndDo

9 tC

 u

W "U u

W W W "U u u u u W!  u u u W  u rV(y( y veE`u u u (Y V(w( y vrEY R X Ru X u u T (

W u u "U u  W "U u

& ( (

W "U

Q B D P P76 8

u u

u u

(  $  u W

3 84C P7 5 D  RA 5 B D

W "U

( ( ( (  (

H @ 

 V

W "U u

( R (

1B 
!

! !( #( #V(

5| yq
u W

W W

W "U

  B   y T
1 2

R R

~h v

) 0 (&#$ ' %

w 

This leads to the factorization,

where the matrix is and is upper triangular and is unitary. It is important to relate the vectors and dened in this algorithm with vectors of the standard Arnoldi process. Let be the matrix obtained from the rst rows of the matrix . Since is unitary we have and hence, from the relation (6.9)

This is identical with the relation (6.5) obtained with the Gram-Schmidt or Modied GramSchmidt implementation. The s form an orthonormal basis of the Krylov subspace and are identical with the s dened by the Arnoldi process, apart from a possible sign difference. Although the Householder algorithm is numerically more viable than the GramSchmidt or Modied Gram-Schmidt versions, it is also more expensive. The cost of each of the outer loops, corresponding to the control variable, is dominated by lines 7 and 8. These apply the reection matrices for to a vector, perform the matrixvector product , and then apply the matrices for to a vector. The application of each to a vector is performed as with

This is essentially the result of a dot-product of length followed by a vector update of the same length, requiring a total of about operations for each application of . Neglecting the last step, the number of operations due to the Householder transformations alone approximately totals

The table below shows the costs of different orthogonalization procedures. GS stands for Gram-Schmidt, MGS for Modied Gram-Schmidt, MGSR for Modied Gram-Schmidt with reorthogonalization, and HO for Householder.

R !

( V(

y E $ X y w E y T w  R ` y ( z t vE

!( #V(

 Y 

' 

( YV(

 

R y

This yields the relation matrix form as

, for

, which can be written in

w 

for

R%tq i

where each is the -th column of the it is not difcult to see that

identity matrix. Since

for

r w i

W "U

u

W "U W u

R  R

#V( !(

& ( (

W "U

u

u vR

W "U

uX y

W R U f W

( R  R

R T y R $ ( (  w ! & & X W(T ( W( W  $ (

U W

R uR

}E

W W  R uvR "U R uW W"U u

z X R

& ( (

Now observe that since the components any . Hence,

of

are zero, then

u R


for ,

5 C

v ( u u W( w (
u

  1B

5| j yq|  


W  W R "U W "U f u u

U W u

u  f

  " "

E w &

 |5
W U


u

TR

R

w   E

u f

w !

GS Flops Storage

MGS

MGSR

HO

The number of operations shown for MGSR corresponds to the worst case scenario when a second orthogonalization is performed each time. In practice, the number of operations is usually closer to that of the standard MGS. Regarding storage, the vectors need not be saved. In the algorithms for solving linear systems, these vectors are needed at the end of the process. This issue will be covered with the Householder implementations of these algorithms. For now, assume that only the s are saved. The small gain in memory usage in the Householder version can be explained by the diminishing lengths of the vectors required at each step of the Householder transformation. However, this difference is negligible relative to the whole storage requirement of the algorithm, because , typically. The Householder orthogonalization may be a reasonable choice when developing general purpose, reliable software packages where robustness is a critical criterion. This is especially true for solving eigenvalue problems since the cost of orthogonalization is then amortized over several eigenvalue/eigenvector calculations. When solving linear systems, the Modied Gram-Schmidt orthogonalization, with a reorthogonalization strategy based on a measure of the level of cancellation, is more than adequate in most cases.

Given an initial guess to the original linear system , we now consider an orthogonal projection method as dened in the previous chapter, which takes , with

If

in Arnoldis method, and set

, then

w i

As a result, the approximate solution using the above by

W F$ S

( b w

by (6.6) and

-dimensional subspaces is given

aa w i



 tA

g

W      

  

in which subspace

. This method seeks an approximate solution of dimension by imposing the Galerkin condition

from the afne

a0 w i

C X

!( $V(

C
(

  

rE( R

! W R R !

y w ! y w ! y w ! y w ! X ! T X ! T X ! T X ! T
( t  W 

S     y z vv i  Q m

| 5  | mV  ky" @ t t
R

W((  W(  W(   c

 ! 1B  X

(

 tV w 

5| yq
T

 

  B  

A method based on this approach and called the Full Orthogonalization Method (FOM) is described next. Modied Gram-Schmidt is used in the Arnoldi step.

Do:

The above algorithm depends on a parameter which is the dimension of the Krylov subspace. In practice it is desirable to select in a dynamic fashion. This would be possible if the residual norm of the solution is available inexpensively (without having to compute itself). Then the algorithm can be stopped at the appropriate step using this information. The following proposition gives a result in this direction.

The residual vector of the approximate solution the FOM Algorithm is such that

and, therefore,

A rough estimate of the cost of each step of the algorithm is determined as follows. If is the number of nonzero elements of , then steps of the Arnoldi procedure will require matrix-vector products at the cost of . Each of the Gram-Schmidt steps costs approximately operations, which brings the total over the steps to

By the denition of , follows immediately.

, and so

from which the result

W "U

Y` b

 "U W

WS

b w 

W S $ b b W ` S t A V A

We have the relations,

r w i

W "U

0 b 0 

 "U W

 "U W

tA

and

. If .

set

V y 

A

EndDo Compute Compute EndDo Compute

and Goto 12

Y `

; Set

 Y

' )$ )

h v A

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.

Compute Dene the For Compute For

, matrix Do:

, and

computed by

rQw i

 

5 C i

7 5  PT&H Q C@4PR 7 `8 D S 6 DQ S @ B 6 D D S

  1B

W S W w  b b X  "U W W "U  T W W ! Y u U u u  u u  u u U u u R vR u  u u ( u uR X R V( y vvE ( T u ( u y !( ( #V( }   W    W u R u R ! ! S       S  tA g v  v9A ~Q v

5| j yq|  

X W

FS W

b 3 BB 3 C 1 2 )

  " "

 |5

'

A 6

' 7$ % #

'

C


Regarding storage, vectors of length are required to save the basis . Additional vectors must be used to keep the current solution and right-hand side, and a scratch vector for the matrix-vector product. In addition, the Hessenberg matrix must be saved. The total is therefore roughly

Consider now the algorithm from a practical viewpoint. As increases, the computational cost increases at least as because of the Gram-Schmidt orthogonalization. The memory cost increases as . For large this limits the largest value of that can be used. There are two remedies. The rst is to restart the algorithm periodically and the second is to truncate the orthogonalization in the Arnoldi algorithm. In this section we consider the rst of these two options, which is described below.

There are many possible variations to this basic scheme. One that is generally more economical in practice is based on the observation that sometimes a small is sufcient for convergence and sometimes the largest possible is necessary. Hence, the idea of averaging over different values of . Start the algorithm with and increment by one in line 5 until a certain is reached, after which is reset to one, or kept the same. These variations will not be considered here. Table 6.1 shows the results of applying the FOM algorithm with no preconditioning to three of the test problems described in Section 3.7. Matrix F2DA F3D ORS Iters 109 66 300 Kops 4442 11664 13558 Residual 0.36E-03 0.87E-03 0.26E+00 Error 0.67E-04 0.35E-03 0.71E-04

S  Q 



R ! !

FS W

1. 2. 3. 4. 5.

Compute , , and Generate the Arnoldi basis and the matrix starting with . Compute and Set and go to 1.

. using the Arnoldi algorithm . If satised then Stop.

I ( 07C7A 6P ' 5 A 3 8 F 5 3

 # 

D'C7 B8 E BA

  



3 8 C7

 S V A  A ~h v

UY9@ S 5 5 S8

X T

! $  

In most situations

is small relative to , so this cost is dominated by the rst term.

! w X

) 

approximately

. Thus, on the average, a step of FOM costs approximately

| 5  | mV  ky" @ t

 ! 1B 

5| yq

  B  
1 2

C DA 7 

) 0 (&#$ ' %

A test run of FOM with no preconditioning.

The column labeled Iters shows the total actual number of matrix-vector multiplications (matvecs) required to converge. The stopping criterion used is that the 2-norm of the residual be reduced by a factor of relative to the 2-norm of the initial residual. A maximum of 300 matvecs are allowed. Kops is the total number of oating point operations performed, in thousands. Residual and Error represent the two-norm of the residual and error vectors, respectively. In this test, was taken to be 10. Note that the method did not succeed in solving the third problem.

A second alternative to FOM is to truncate the Arnoldi recurrence. Specically, an integer is selected and the following incomplete orthogonalization is performed.

and

The number of directions against which to orthogonalize may be dictated by memory limitations. The Incomplete Orthogonalization Method (IOM) consists of performing the above incomplete orthogonalization procedure and computing an approximate solution using the same formulas (6.14) and (6.15).

Run a modication of Algorithm 6.4 in which the Arnoldi process in lines 3 to 11 is replaced by the Incomplete Orthogonalization process and every other computation remains unchanged.

It is now necessary to keep only the previous vectors. The others are not needed in the above process and may be discarded. However, the difculty remains that when the solution is computed by formula (6.14), all the vectors for are required. One option is to recompute them at the end, but essentially this doubles the cost of the algorithm. Fortunately, a formula can be developed whereby the current approximate solution can be updated from the previous approximation and a small number

!( ( $V (

 W U

u  

W "U

 R vR u 3 ( y X YR ( y P H 6 u R E Y(t ( w W T u 3 y !( ( #V( 

S 8 D B Q

 

1 h

 W U

3 A

1. For 2. Compute 3. For 4. 5. 6. EndDo 7. Compute 8. EndDo

Do:

Do:

E Y8 5 D

5 C

I 'B E DC8

  1B

6 Q S PD 7@Q B @4PD 6

5| j yq|  

7 D
Y8 S

I 'B E 'B A 3 8 D7 DC7 B8 C7

5 S B U5 C

  " "
!

  

 |5
D E 76

C DA 7 
3 A

1 2

1 2

} ~Q v

~Q v

' 7$ % #

' 7$ % #

9 C
W

of vectors that are also updated at each step. This progressive formulation of the solution leads to an algorithm termed Direct IOM (DIOM) which we now derive. The Hessenberg matrix obtained from the incomplete orthogonalization process has a band structure with a bandwidth of . For example, when and , it is of the form

The Direct version of IOM is derived from exploiting the special structure of the LU factorization, , of the matrix . Assuming no pivoting is used, the matrix is unit lower bidiagonal and is banded upper triangular, with diagonals. Thus, the above matrix has a factorization of the form

Dening

the approximate solution is given by

In addition, because of the structure of

, we have the relation

W U "QH

W f

R 

which allows the vector of the relation,

to be computed from the previous

(  R

W U "VH

R

Because of the structure of columns of the matrix relation

can be updated easily. Indeed, equating the last yields

s and

, with the help

a!w i

FS W

and

FS W

The approximate solution is then given by

a!w i

C C

!

R `W

R R

| 5  | mV  ky" @ t

W W

R `W

R R



w W

W W

 ! 1B  R

5| yq

  B  

in which

From (6.18),

where is dened above. This gives the following algorithm, called the Direct Incomplete Orthogonalization Method (DIOM).

1. Choose and compute , , . 2. For , until convergence Do: 3. Compute , and as in 4. lines 2-7 of Algorithm (6.6). 5. Update the LU factorization of , i.e., obtain the last column 6. of using the previous pivots. If Stop. 7. if then else 8. ( for set ) 9. 10. EndDo

Note that the above algorithm is based implicitly on Gaussian elimination without pivoting for the solution of the Hessenberg system . This may cause a premature termination in line 6. Fortunately, there is an implementation based on Gaussian elimination with partial pivoting. The details of this variant can be found in [174]. DIOM can also be derived by imposing the properties that are satised by the residual vector and the conjugate directions, i.e., the s. Observe that (6.4) is still valid and as a consequence, Proposition 6.7, which is based on it, still holds. That is because the orthogonality properties were not used to derive the two relations therein. Since the residual vector is a scalar multiple of and since the s are no longer orthogonal, IOM and DIOM are not orthogonal projection techniques. They can, however, be viewed as oblique projection techniques onto and orthogonal to an articially constructed subspace.

$V( !(

vE

W "U

W "U

( R

where

( ( (

W QH C

R  R

process onto

IOM and DIOM are mathematically equivalent to a projection and orthogonally to

W "U

Y$! R

S   

W "U

FS

W



!( ( #V

W  R

V

w W

W U "QH

#$!( y QI6 E R y PH ( ( ! A g 

W  R ( S

Noting that at each step by the relation,

, it follows that the approximation

 w
can be updated

5 C

  1B

w  ( W & W W 

5| j yq|  

  " "

 |5

!

W w

3 A

' )$ )

1 2

~Q v

h v A

'

' 7$ % #

'

 C R

The following simple properties can be shown:

The Generalized Minimum Residual Method (GMRES) is a projection method based on taking and , in which is the -th Krylov subspace with . As seen in Chapter 5, such a technique minimizes the residual norm over all vectors in . The implementation of an algorithm based on this approach is similar to that of the FOM algorithm. We rst describe the basic idea and then discuss a few practical variations.

There are two ways to derive the algorithm. The rst way exploits the optimality property and the relation (6.5). Any vector in can be written as

a00  i

FS 

w X b

W "U

y  !

Since the column-vectors of

are orthonormal, then

R`0  i

W  "U W  b "U S W W S b $ t  A V A

w X b

 

the relation (6.5) results in

a 0  i

( b w  X

A   ty  X b

X b T

where

is an

-vector. Dening

a!w i

I H 3 0A B '  & 8 5 QI  D7F P% 5 H A F 3 )B 8

t

for

( b w

XR

 w

( u

 #

! $ 

For the case

(full orthogonalization) the

s are semi-conjugate, i.e.,

t

w W

for

Y $ R ( u X

 w

     

The

s are locally -orthogonal to the Arnoldi vectors, i.e.,

( W

for

0  

E 0

$V( !(

The residual vectors

, are locally orthogonal,

W "U

The proof is an immediate consequence of the fact that is a multiple of and by construction, is orthogonal to all s dened in the proposition.

 C

( Y

yX

R ( u

RT

W "U

| 7

  m(

  

The GMRES approximation is the unique vector of which minimizes (6.20). By (6.19) and (6.22), this approximation can be obtained quite simply as where minimizes the function , i.e.,

The minimizer is inexpensive to compute since it requires the solution of an least-squares problem where is typically small. This gives the following algorithm.

Do:

Do:

The second way to derive the GMRES algorithm is to use the equations (5.7) with . This is the subject of Exercise 4.

The previous algorithm utilizes the Modied Gram-Schmidt orthogonalization in the Arnoldi process. Section 6.3.2 described a Householder variant of the Arnoldi process which is numerically more robust than Gram-Schmidt. Here, we focus on a modication of GMRES which retrots the Householder orthogonalization. Section 6.3.2 explained how to get and the columns of at each step, from the Householder-Arnoldi algorithm. the Since and are the only items needed to extract the approximate solution at the end of the GMRES process, the modication seems rather straightforward. However, this is only true if the s are stored. In this case, line 12 would remain the same and the modication to the algorithm would be in lines 3-11 which are to be replaced by the Householder variant of the Arnoldi process. It was mentioned in Section 6.3.2 that it is preferable not to store the s because this would double the storage requirement. In this case, a formula must be found to generate the approximate solution in line 12, using only the s, i.e., the s. Let

E ' 7PP0 P5 & 2PP62H 0A B F 3 5 3 ' H 5 F ' 5 H

 X

W TWVT T ( ( (

the minimizer of

and

W "U

 

EndDo Compute

. If

set

EndDo

and go to 12

Y $

matrix

. Set

 Y

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.

Compute Dene the For Compute For

, and

w !

W w  b  b S  W "U W"U $  W W ! Y$ u U u u  u u  u u U u u R vR u  u u ( u uR X R V( y vvE ( T u ( u y !( ( #V( } y W  "U W W ! ! u u R W vR S   X w  $   S  tA T  vm( A ~Q v

where

y ! X 0  i aa0 q i

b w

5 C

  1B  b

 w $ 
b

5| j yq|  
$
3

FS 4 6 H  FCI  b ( b w 

FS 

X b T

  " "
3

 |5
3 b 1 2 ) b

' 7$ % #

is dened by

Using a Horner-like scheme, we obtain

Therefore, when Householder orthogonalization is used, then line 12 of the GMRES algorithm should be replaced by a step of the form

The above step requires roughly as many operations as computing the last Arnoldi vector . Therefore, its cost is negligible relative to the cost of the Arnoldi loop.

Note that now only the set of vectors needs to be saved. The scalar dened in line 6 is equal to . This is because where is dened by the equations (1.21) seen in Chapter 1, which dene the rst Householder transformation. As was observed earlier the Householder factorization actually obtains the QR factorization (6.10) with . We can also formulate GMRES directly from this factorization. Indeed, if , then according to this factorization, the corresponding residual norm is equal to

whose minimizer is the same as the one dened by the algorithm.

 X &

w  ( y X y w u u u  ( V( #( ! v ! T W W ( ( S C4 6 W (  b b y 5I b ( ( $ !  ! T X w $ W W $ u u T W!  W y u u  S }  W ry(( y veE` u u (Y u u &  u V(w ( y vrEY R XR X u u T (

S F`

z W

 V

! !( #( #V(



y T

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.

Compute , . For Do: Compute the Householder unit vector such that and where ; If then let . . If compute , EndDo = the upper part of the matrix Dene Compute . Let z := 0 For Do: , EndDo Compute

. .

a 0  i a 0  i a0  i

XX eX

6 DQ S @ B @ 6 D D S 8 D 8 5 B D PT&H Q C4PR 7 `C4C P7 5 D 

( V(

T W

TT VWT

u u W Wu w 
! $( !

so that the solution is of the form Householder variant of the Arnoldi process, each

. Recall that in the

k C
W

W u

 w  } ( X w u u u

W W W w w w w W W W W w w T w T w

 V

SQ

w   3 V A   3 1 ) ' % vm( RDCA 2~h0v(&#$

| 7

The details of implementation of the solution of the least-squares problem as well as the estimate of the residual norm are identical with those of the Gram-Schmidt versions and are discussed next.

A clear difculty with Algorithm 6.9 is that it does not provide the approximate solution explicitly at each step. As a result, it is not easy to determine when to stop. One remedy is to compute the approximation solution at regular intervals and check for convergence by a test on the residual, for example. However, there is a more elegant solution which is related to the way in which the least-squares problem (6.24) is solved. , it is natural to transIn order to solve the least-squares problem form the Hessenberg matrix into upper triangular form by using plane rotations. Dene the rotation matrices

with . If steps of the GMRES iteration are performed then these matrices have dimension . and the corresponding right-hand side Multiply the Hessenberg matrix by a sequence of such matrices from the left. The coefcients are selected to eliminate at each time. Thus, if we would have

with

W rW w W rW

W W

W (

W W

Then premultiply

by

FS

!  &

Y Y Y Y Y

R(R

&

..

w E

row row

r 0  i

..

F 5

F F B

5 C
b

E 'B A 8 A E I 5 DC776P5 P&  I B & 8#)DCA ) 8 6 B 3

  1B

FS

5| j yq|  
4  5CI6

R R

R `W

  " " y W

W W

 |5

y X ! w

! y

w R

 "U W

to obtain the matrix and right-hand side

We can now premultiply the above matrix and right-hand side again by a rotation matrix to eliminate . This is achieved by taking

This elimination process is continued until the -th rotation is applied, which transforms the problem into one involving the matrix and right-hand side,

and

The solution to the above least-squares problem is obtained by simply solving the triangular system resulting from deleting the last row of the matrix and right-hand side in (6.30). In addition, it is clear that for the solution , the residual is nothing but the last element of the right-hand side, i.e., the term in the above illustration.

Let be the rotation matrices used to transform into an upper triangular form and , the resulting matrix and right-hand side, as dened by (6.33), (6.34). Denote by the upper triangular

 &b

FS 

X

W "U W ( (

&  5CI6 4

y !( #V(

&

mrE( R

FS 54 6  C

) ' 2 )

Q v A$

'

Since

is unitary,

a0a  i

a  i aaa  i

 X

W"U W W FS (( X P T G ( T

 

 

& $

Dene

the product of matrices

R`a  i

 "U W

P R GR w rW d X R T P R GR eW d

 "U W

 "X U W

Generally, the scalars

and

of the

rotation

are dened as

a 0  i

a a  i

C C Y Y Y Y SW W S R W

PW w e DG X P D eG T W

&

&

W GW P D P D WG

P G

Y P  G P  G R P G P G W

P G R P G P G W P G

PW DG W P DG W

R R P G R P G R `W P G

R R R R P DG W R `W P DG W

PD W w X eG R

P G W P G

R PW G DW P DG W

P R GR eW d

W rW P G

W rW P D WG

PG

ePDWG

| 7
'

and, as a result,

To prove rst part (1), use (6.5), to obtain the relation

Since is unitary, the rank of is that of , which equals the rank of since these two matrices differ only by a zero row (the last row of ). If then is of rank and as a result is also of rank . Since is of full rank, this means that must be singular. The second part (2), was essentially proved before the proposition. For any vector ,

The minimum of the left-hand side is reached when the second term in the right-hand side of (6.37) is zero. Since is nonsingular, this is achieved when . To prove the third part (3), we start with the denitions used for GMRES and the relation (6.21). For any ,
W

As was seen in the proof of the second part above, the 2-norm of is minimized when annihilates all components of the right-hand side except the last one, which is equal to . As a result,

which is (6.35). The result (6.36) follows from the orthonormality of the column-vectors of . So far we have only described a process for computing the least-squares solution
b 

r a  i

r a  i

r a  i

W "U

&

W U

 b

!

&

 b

X b

W "U

v0

 b W $ S 

W "U

W "U

&

 b W S $

 b

W "U

W "U $  "U W

&

w 0

 b

& $

W "U

W "U

W "U

 W "U  "U W FS 

V y 

&  $ 

W "U

b w

tc

 b

FS 

The residual vector at step

satises

& W

FS

which minimizes

is given by

 b

ty

W "U

ty

FS

The rank of be singular. The vector

is equal to the rank of

. In particular, if

then

&

matrix obtained from by deleting its last row and by obtained from by deleting its last component. Then,

the


-dimensional vector must

5 C

  1B

5| j yq|  

  " "

 |5

&

W U

W"U

W"U

A 6

of (6.24). Note that this approach with plane rotations can also be used to solve the linear system (6.15) for the FOM method. The only difference is that the last rotation must be omitted. In particular, a single program can be written to implement both algorithms using a switch for selecting the FOM or GMRES options. It is possible to implement the above process in a progressive manner, i.e., at each step of the GMRES algorithm. This approach will allow one to obtain the residual norm at every step, with virtually no additional arithmetic operations. To illustrate this, start with (6.30), i.e., assume that the rst rotations have already been applied. Now the residual norm is available for and the stopping criterion can be applied. Assume that the test dictates that further steps be taken. One more step of the Arnoldi algorithm must be executed to get and the -th column of . This column is appended to which has been augmented by a zero row to match the dimension. Then the previous rotations , , are applied to this last column. After this is done the following matrix and right-hand side are obtained:

The algorithm now continues in the same way as before. We need to premultiply the matrix by a rotation matrix (now of size ) with

to get the matrix and right-hand side,

If the residual norm as given by is small enough, the process must be stopped. The last rows of and are deleted and the resulting upper triangular system is solved to obtain . Then the approximate solution is computed. Note from (6.39) that the following useful relation for results

In particular, if then the residual norm must be equal to zero which means that the solution is exact at step .


a a  i

a a  i

a  i

P G & (

G w X P  P G T

W "U

V &

Y ( R

P G P G P  G R P G P G W P G

 

u u

P G

P G

P G

Y Y P G P G R

W "U

P G

P G

P G

P G R

W "U

R R R R SW

R R P G R P G R `W P G

P G

P G W P G

&

W rW

W rW P G

P G

| 7
b

If Algorithm 6.9 is examined carefully, we observe that the only possibilities of breakdown in GMRES are in the Arnoldi loop, when , i.e., when at a given step . In this situation, the algorithm stops because the next Arnoldi vector cannot be generated. However, in this situation, the residual vector is zero, i.e., the algorithm will deliver the exact solution at this step. In fact, the converse is also true: If the algorithm stops at step with , then .

Let

To show the necessary condition, observe that if , then . Indeed, since is nonsingular, then is nonzero by the rst part of Proposition 6.9 and (6.31) implies . Then, the relations (6.36) and (6.40) imply that . To show the sufcient condition, we use (6.40) again. Since the solution is exact at step and not at step , then . From the formula (6.31), this implies that .

If the last row of the least-squares system in (6.38) is deleted, instead of the one in (6.39), i.e., before the last rotation is applied, the same approximate solution as FOM would result. As a practical consequence a single subroutine can be written to handle both cases. This observation can also be helpful in understanding the relationships between the two algorithms. We begin by establishing an interesting relation between the FOM and GMRES iterates, which will be exploited in the next chapter. A general lemma is rst shown regarding the solutions of the triangular systems

obtained from applying successive rotations to the Hessenberg matrices . As was stated before, the only difference between the vectors obtained in GMRES and Arnoldi is that the last rotation is omitted in FOM. In other words, the matrix for the two methods differs only in its entry while the right-hand sides differ only in their last components.

Let be the upper part of the matrix and, as before, let be the upper part of the matrix . Similarly, let be the vector of the rst components of and let be the vector of the rst components of . Dene

the

vectors obtained for an

-dimensional FOM and GMRES methods, respectively.

Y $ u

 "U W

Y ` u

Y` u

F 5 3 6QI  C8 E

&W

Y ` u

 "U W

&

I ( PP5 ' E 5

&

&

5 A 2% F D'C79763 E B A 8 & 5

( & W

P W u B Gu

Y `

FS

T!

Y ` u

 "U W

uu

! $( !

breaks down at step , i.e.,




be a nonsingular matrix. Then, the GMRES algorithm , if and only if the approximate solution is exact.

Y ` u

 "U W

5 C
u

F 5 63 I  ' E Y

  1B

5| j yq|  
' C8 5 2% 1 3
W "U u

  " "

Y ` u

 
 "U W

 |5

Y ` u

C DA

 

' )$ )

h v A u V A

C DA

FS

'

A 6

'

1P #

9 C

The following relation holds:

Similarly, for the right-hand sides,

with

Now,
b

The result follows immediately.

or,

This leads to the following relation for the residual vectors obtained by the two methods,

a  i

a  i

'

If the FOM and GMRES iterates are denoted by the superscripts then the relation (6.41) implies that

W (  (

F 

Replacing by would result except that the relation


, respectively, in (6.44), a relation similar to (6.45) is replaced by which, by (6.42) and (6.43), satises

and

, respectively,

a  i

W W

W

F  ( F ( b

&W

which, upon observing that


b

, yields,

aa  i

%  i

& 

 "U W

W W

W Y W

U W

 "U W

w F

&

F ( e( b

Denoting by obtain

the scalar

, and using the denitions of

and

a0  i

W Y

& 

&

& 

W Y

w F

&

in which

is the cosine used in the

-th rotation

, as dened by (6.31).

R  i
, we



Then

C
b

| 7

which indicates that, in general, the two residual vectors will evolve hand in hand. In particular, if , then GMRES will not progress at step , a phenomenon known as stagnation. However, in this situation, according to the denitions (6.31) of the rotations, which implies that is singular and, therefore, is not dened. In fact, the reverse of this is also true, a result due to Brown [43], which is stated without proof in the following proposition.

Note also that the use of the above lemma is not restricted to the GMRES-FOM pair. Some of the iterative methods dened in this chapter and the next involve a least-squares problem of the form (6.24). In such cases, the iterates of the least-squares method and those of the orthogonal residual (Galerkin) method will be related by the same equation. Another important observation from (6.40) is that if is the residual norm obtained at step , then

The superscripts and are used again to distinguish between GMRES and FOM quantities. A consequence of this is that,

Now consider the FOM iterates, assuming that is dened, i.e., that is nonsingular. An equation similar to (6.48) for FOM can be derived. Using the same notation as in the proof of the lemma, and recalling that

Clearly,

and therefore,

which, by a comparison with (6.48), yields a revealing relation between the residuals of

W 0 S

W 0

U W

0 F0 w F

 W U

Using (6.31), observe that tion, and therefore,

is the tangent of the angle dening the

W 0 S

v0 S

W 0

VTVTT q0 W

W 0

FS W

note that

-th rota-

r  i

R

t 

(0

S 0

FS W

 0

0

W 0

"U W

 "U W

0 F0  0 F0

q0

'

W !

i.e., if singular at step .

If at any given step , the GMRES iterates make no progress, then is singular and is not dened. Conversely, if is , i.e., if FOM breaks down at step , and is nonsingular, then

5 C

  1B

5| j yq|  

  " "

 |5

C C DA Y
' )$ )

( ( h v A

'

P W B G

'

the FOM and GMRES algorithms, namely,

Another way to prove the above expression is to exploit the relation (6.47); see Exercise 12. These results are summarized in the following proposition (Brown [43]).

Assume that steps of the Arnoldi process have been taken and that is nonsingular. Let and . Then the residual norms produced by the FOM and the GMRES algorithms are related by the equality

Similar to the FOM algorithm of the previous section, the GMRES algorithm becomes impractical when is large because of the growth of memory and computational requirements as increases. These requirements are identical with those of FOM. As with FOM, there are two remedies. One is based on restarting and the other on truncating the Arnoldi orthogonalization. The straightforward restarting option is described here.

Note that the implementation tricks discussed in the previous section can be applied, providing the residual norm at each sub-step without computing the approximation . This enables the program to exit as soon as this norm is small enough. A well known difculty with the restarted GMRES algorithm is that it can stagnate when the matrix is not positive denite. The full GMRES algorithm is guaranteed to converge in at most steps, but this would be impractical if there were many steps required for convergence. Obviously, a preconditioner for the linear system can be used to reduce the number of steps, or a better preconditioner if one is already in use. This issue will be covered later along with preconditioning techniques. Table 6.2 shows the results of applying the GMRES algorithm with no preconditioning to three of the test problems described in Section 3.7.

S  Q 

 W S

3 F

1. 2. 3. 4. 5.

Compute , , and Generate the Arnoldi basis and the matrix using the Arnoldi algorithm starting with Compute which minimizes and If satised then Stop, else set and GoTo 1

a  i

 C

 "U W

 B A 3 8 F 5 3 E C7C7A 6#

 "U W

$ W

m(

E 'B A 3 8 DC7 B8 C7



3 C C UY84@ S 0 7DA 5 S 5

 S V A 

 

DA C

) ' 2 )

1 2

A 7 

~h v

Q v A

| 7

) 0 (&#$ ' %

'

'

Matrix F2DA F3D ORS

Iters 95 67 205

Kops 3841 11862 9221

Residual 0.32E-02 0.37E-03 0.33E+00

Error 0.11E-03 0.28E-03 0.68E-04

A test run of GMRES with no preconditioning.

See Example 6.1 for the meaning of the column headers in the table. In this test, the dimension of the Krylov subspace is . Observe that the problem ORS, which could not be solved by FOM(10), is now solved in 205 steps.

It is possible to derive an Incomplete version of the GMRES algorithm. This algorithm is called Quasi-GMRES (QGMRES) for the sake of notational uniformity with other algorithms developed in the literature (some of which will be seen in the next chapter). A direct version called DQGMRES using exactly the same arguments as in Section 6.4.2 for DIOM can also be derived. We begin by dening the QGMRES algorithm, in simple terms, by replacing the Arnoldi Algorithm with Algorithm 6.6, the Incomplete Orthogonalization procedure.

Run a modication of Algorithm 6.9 in which the Arnoldi process in lines 3 to 11 is replaced by the Incomplete Orthogonalization process and all other computations remain unchanged.

Similar to IOM, only the previous vectors must be kept at any given step. However, this version of GMRES will potentially save computations but not storage. This is because computing the solution by formula (6.23) requires the vectors for to be accessed. Fortunately, the approximate solution can be updated in a progressive manner, as in DIOM. The implementation of this progressive version is quite similar to DIOM. First, note that if is banded, as for example, when ,

then the premultiplications by the rotation matrices as described in the previous section will only introduce an additional diagonal. For the above case, the resulting least-squares

r  i

!( $V(

7E

F ' P7P0 63 I  07P2E E B F 3 5 F 5 5 A 8 )

Y Y Y Y Y

5 C
&

  1B (

5| j yq|  

( v  !

R R

3 ' 0A B8 C8 A E B 3

V( r

  " "

W rW

 |5

3 C R DA

A 7 

 

1 2

~Q v

' 7$ % #

The approximate solution is given by


then,

in which

where is the last component of the vector , i.e., the right-hand side before the -th rotation is applied. Thus, can be updated at each step, via the relation

is small enough then Stop

W  R

W 0

W "U

12. 13. 14. If 15. EndDo

 "U W

 "U  W

 R

11.

1. Compute , , and 2. For , until convergence Do: 3. Compute , and 4. as in lines 2 to 6 of Algorithm 6.6 5. Update the QR factorization of , i.e., 6. Apply , to the -th column of 7. Compute the rotation coefcients , by (6.31) 8. Apply to and , i.e., Compute: 9. 10.

W "U

 Q 

!( ( #Vt

( P W $ G

&

&

$ #$!( y QH 6 vE R  ( y ! P ( w W  V A     m( DCA ~h v (

Also note that similarly to DIOM,

&

&

$BW !( (

&

&

! v$E

W3 "U

&

1 2

P W G

) 0 (&#$ ' %

where Dening

and are obtained by removing the last row of as in DIOM,

and

, respectively.

R`  i

&

&

R `W

R R R

W W

&

system is

with:

C C
$

| 7
$

The above algorithm does not minimize the norm of the residual vector over . Rather, it attempts to perform an approximate minimization. The formula (6.35) is still valid since orthogonality is not used to derive it. Therefore,

If the s were orthogonal to each other, then this is equivalent to GMRES and the residual norm is minimized over all vectors of the form . Since only an incomplete orthogonalization is used then the s are only locally orthogonal and, as a result, only an approximate minimization may be obtained. In addition, (6.36) is no longer valid. This equality had been derived from the above equation by exploiting the orthogonality of the s. It turns out that in practice, remains a reasonably good estimate of the actual residual norm because the s are nearly orthogonal. The following inequality provides an actual upper bound of the residual norm in terms of computable quantities:

Here, the orthogonality of the rst vectors was used and the last term comes from using the Cauchy-Schwartz inequality. The desired inequality follows from using the Cauchy-Schwartz inequality again in the form

and from the fact that the vector is of norm unity. Thus, using as a residual estimate, we would make an error of a factor of at most. In general, this is an overestimate and tends to give an adequate estimate for the residual norm. It is also interesting to observe that with a little bit more arithmetic, it is possible to actually compute the exact residual vector and norm. This is based on the observation that, according to (6.52), the residual vector is times the vector which is the last column of the matrix

W "U

It is an easy exercise to see that this last column can be updated from

and

. Indeed,

 i

W "U

W "U

W "U

( ( (

 R  0 R 0

W "U f

U QH

R

R R

y w W R W R W ! w RW U W "QfH U QH W R R w RW U W "QfH W

W "U f

w W

W "U f U QH

W

R

w W

W & U (

W "U

!  !

U W

R R "QfH W U

$ %

W "U

W "U

W "U

 R

0

W "U

W "U

W "U

W "U

W "U

Here, is to be replaced by of (6.52). If the unit vector

when

. The proof of this inequality is a consequence has components , then

r0  i

ra  i

 w 

5 C
0 X

b w

W "U

W "U

  1B

W "U

5| j yq|  
y

w W

!   V y 

W U

W "U

V A

  " "

 |5
0 0 0 0
w c

W U

tA 

 C

The s can be updated at the cost of one extra vector in memory and operations at can be computed at the cost of operations and the exact each step. The norm of residual norm for the current approximate solution can then be obtained by multiplying this norm by . Because this is a little expensive, it may be preferred to just correct the estimate provided by by exploiting the above recurrence relation,

The above relation is inexpensive to update, yet provides an upper bound that is sharper than (6.53); see Exercise 20. An interesting consequence of (6.55) is a relation between two successive residual vectors:
w

Table 6.3 shows the results of applying the DQGMRES algorithm with no preconditioning to three of the test problems described in Section 3.7. Matrix F2DA F3D ORS Iters 98 75 300 Kops 7216 22798 24138 Residual 0.36E-02 0.64E-03 0.13E+02 Error 0.13E-03 0.32E-03 0.25E-02

A test run of DQGMRES with no preconditioning.

See Example 6.1 for the meaning of the column headers in the table. In this test the number of directions in the recurrence is . It is possible to relate the quasi-minimal residual norm to the actual minimal residual norm provided by GMRES. The following result was proved by Nachtigal (1991) [152] for the QMR algorithm to be seen in the next chapter.

U W

This exploits the fact that

and

a  i

a  i

W"U "U W u u u W "U "U W W w W U W "U $

&

0 0 w 0 0

W "U

W "U

W "U

A 7 

A 7 

If

, then the following recurrence relation holds,

a  i

v0 0 w

W "U

where all the matrices related to the rotation are of size that

. The result is

C  C X T

W  & "U T

W"U & ( $ W

0

0  

W "U

U W

W "U

W "U

W "U

| 7

R

Assume that , the Arnoldi basis associated with DQGMRES, is of full rank. Let and be the residual norms obtained after steps of the DQGMRES and GMRES algorithms, respectively. Then

and, in particular,

The result follows from (6.59), (6.60), and the fact that

The symmetric Lanczos algorithm can be viewed as a simplication of Arnoldis method for the particular case when the matrix is symmetric. When is symmetric, then the Hessenberg matrix becomes symmetric tridiagonal. This leads to a three-term recurrence in the Arnoldi process and short-term recurrences for solution algorithms such as FOM and GMRES. On the theoretical side, there is also much more to be said on the resulting approximation in the symmetric case.

To introduce the Lanczos algorithm we begin by making the observation stated in the following theorem.

r   i

W "U

I H BA '  & 8 50A 3 H

  

W "U

gg l w g  

FS

Now

is the minimum of the 2-norm of

over all s and therefore,

r  i

W!

FS

W "U

b $ 



(    "W U          

   

Denote by

! $ 

the minimizer of . By assumption, nonsingular matrix such that

over and , is of full rank and there is an is unitary. Then, for any member of

w ! T b X

T t

b

W  ( W "U U  W ""UWW U W S $   ""UWW U 

b W  S $

Consider the subset of


3

dened by

r  i

5 C
!

  1B

b $ W "U  (X

5| j yq|  


W "U

 

  " "

 |5

C DA

  v

 

!  W"U

cv ~
A 6
'

 C

Assume that Arnoldis method is applied to a real symmetric matrix . Then the coefcients generated by the algorithm are such that

The standard notation used to describe the Lanczos algorithm is obtained by setting

This leads to the following form of the Modied Gram-Schmidt variant of Arnoldis method, namely, Algorithm 6.2.

then Stop

It is rather surprising that the above simple algorithm guarantees, at least in exact arithmetic, that the vectors are orthogonal. In reality, exact orthogonality of these vectors is only observed at the beginning of the process. At some point the s start losing their global orthogonality rapidly. There has been much research devoted to nding ways to either recover the orthogonality, or to at least diminish its effects by partial or selective orthogonalization; see Parlett [160]. The major practical differences with Arnoldis method are that the matrix is tridiagonal and, more importantly, that only three vectors must be saved, unless some form of reorthogonalization is employed.

Y `!

 Y ! (

 ( ( (

W "U W"U S u  u  3 W"U u U W Y$ u S u  u  3 u u  u  u ( 3 W X u u 3 u u u S u Ty u !( ( #V( 3 

yeE( R

1. Choose an initial vector 2. For Do: 3. 4. 5. 6. . If 7. 8. EndDo

of norm unity. Set

aa  i

W S

S 8 D B Q

 S

7C@ D E 6

W S 

3 9 C 5 7 RtDA

1 2

~h v

) 0 (&#$ ' %

and if

denotes the resulting

matrix, it is of the form,

The proof is an immediate consequence of the fact that symmetric matrix which is also a Hessenberg matrix by construction. Therefore, be a symmetric tridiagonal matrix.

( u

 W

u S

( uu

u 

In other words, the matrix metric.

obtained from the Arnoldi process is tridiagonal and sym-

a0  i R`  i

is a must

 C

!#(V( ( y

 75 B1B v

R( u  y

 "U W

Y (

  U  W

uu
u vR

uR

 B

A Av ~ 5| | qzzA  y5
1 P

'

In exact arithmetic, the core of Algorithm 6.14 is a relation of the form

This three-term recurrence relation is reminiscent of the standard three-term recurrence relation of orthogonal polynomials. In fact, there is indeed a strong relationship between the Lanczos algorithm and orthogonal polynomials. To begin, recall that if the grade of is , then the subspace is of dimension and consists of all vectors of the form , where is a polynomial with . In this case there is even an isomorphism between and , the space of polynomials of degree , which is dened by

and the orthogonality of the s translates into the orthogonality of the polynomials with respect to the inner product (6.64). It is known that real orthogonal polynomials satisfy a three-term recurrence. Moreover, the Lanczos procedure is nothing but the Stieltjes algorithm; (see, for example, Gautschi [102]) for computing a sequence of orthogonal polynomials with respect to the inner product (6.64). It is known [180] that the characteristic polynomial of the tridiagonal matrix produced by the Lanczos algorithm minimizes the norm over the monic polynomials. The recurrence relation between the characteristic polynomials of tridiagonal matrices also shows that the Lanczos recurrence computes the sequence of vectors , where is the characteristic polynomial of .

The Conjugate Gradient algorithm is one of the best known iterative techniques for solving sparse Symmetric Positive Denite linear systems. Described in one sentence, the method is a realization of an orthogonal projection technique onto the Krylov subspace where is the initial residual. It is therefore mathematically equivalent to FOM. However, because is symmetric, some simplications resulting from the three-term Lanczos recurrence will lead to more elegant algorithms.

t  (

This is indeed a nondegenerate bilinear form under the assumption that , the grade of . Now observe that the vectors are of the form

does not exceed

  i

  

 gg c A} V~ gl v

W ( W X T T

X R T

i

Moreover, we can consider that the subspace

is provided with the inner product

F & B8 20P & 8 '  207H BA I ' E & ' E ' H A 3 '

5 C
W

  1B y W

 W X

u S

5| j yq|  
!

 X!

u u 

  " " W "U

D'C79763 E B A 8 & 5
(   W

W "U

 |5
u S


  W

 

! X

9  C

T

We rst derive the analogue of FOM, or Arnoldis method, for the case when is symmetric. Given an initial guess to the linear system and the Lanczos vectors together with the tridiagonal matrix , the approximate solution obtained from an orthogonal projection method onto , is given by

Many of the results obtained from Arnoldis method for linear systems are still valid. For example, the residual vector of the approximate solution is such that

The Conjugate Gradient algorithm can be derived from the Lanczos algorithm in the same way DIOM was derived from IOM. In fact, the Conjugate Gradient algorithm can be viewed as a variation of DIOM(2) for the case when is symmetric. We will follow the same steps as with DIOM, except that the notation will be simplied whenever possible. First write the LU factorization of as . The matrix is unit lower bidiagonal and is upper bidiagonal. Thus, the factorization of is of the form

Letting

and

FS W

The approximate solution is then given by,

a   i

R R

W "U

b 

W ( FS W

W "U

S !

V A

and

&

EndDo Set Compute

, and

. If

set

1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

Compute For

, Do: (If

, and

set

and go to 9

a  i

 C

5 US

3 5 H ' 0A C8 ' 0A 7 B P5 E E B 8 3


FS

W W b w  b W W "U X S V( $% ( r( ( S 54 4 S H X R R  RT W W"U T "U S  ! ` W"U u  u u  W"U u u S Y u S u u  u  u y W Y `! S v W u u SX u ( u u yT uu  !( ( #V( v W S  Q      S V A  DCA ~h v

89@H576 QC 8CD 7 5  7C@ D S D E 6 ( b w 

   |5 | 75 B1B 5 Ct| yC l y5
b 3

$6 !

! $(V(

1 2

) 0 (&#$ ' %

Ee( R

then,

In addition, following again what has been shown for DIOM,

where is dened above. This gives the following algorithm, which we call the direct version of the Lanczos algorithm for linear systems.

This algorithm computes the solution of the tridiagonal system progressively by using Gaussian elimination without pivoting. However, as was explained for DIOM, partial pivoting can also be implemented at the cost of having to keep an extra vector. In fact, Gaussian elimination with partial pivoting is sufcient to ensure stability for tridiagonal systems. The more complex LQ factorization has also been exploited in this context and gave rise to an algorithm known as SYMMLQ [159]. The two algorithms 6.15 and 6.16 are mathematically equivalent, that is, they deliver the same approximate solution if they are both executable. However, since Gaussian elimi-

W "U W "U W"U S   S    ( W w W S W X S T y  W ( i  W S  !  y X ( W ( ! W T Y S W W `  Y $ S     S$  tA  `g DCA ~Q v

EndDo

If

has converged then Stop

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.

Compute , , and Set , For , until convergence Do: Compute and If then compute and

76 9 D E @

1 2

in which

. As a result,

can be updated at each step as

r   i r   i

Note that is a scalar computed from the Lanczos algorithm, while -th Gaussian elimination step on the tridiagonal matrix, i.e.,

results from the

&

S W  ( S

w W

$ W

As for DIOM, , the last column of by the simple update

, can be computed from the previous


s and

5 C

  1B

5| j yq|  



  " "

 |5

' 7$ % #

 C
!

nation without pivoting is being used implicitly to solve the tridiagonal system , the direct version may be more prone to breakdowns. Observe that the residual vector for this algorithm is in the direction of due to equation (6.66). Therefore, the residual vectors are orthogonal to each other as in FOM. Likewise, the vectors are -orthogonal, or conjugate. These results are established in the next proposition.

Let , , be the residual vectors produced by the Lanczos and the D-Lanczos algorithms (6.15 and 6.16) and , the auxiliary vectors produced by Algorithm 6.16. Then,

The rst part of the proposition is an immediate consequence of the relation is a diagonal matrix, where (6.66). For the second part, it must be proved that . This follows from

A consequence of the above proposition is that a version of the algorithm can be derived by imposing the orthogonality and conjugacy conditions. This gives the Conjugate Gradient algorithm which we now derive. The vector can be expressed as

Therefore, the residual vectors must satisfy the recurrence

( X u u

Thus, a rst consequence of the above relation is that

a0  i

W "U

Also, it is known that the next search direction is a linear combination of , and after rescaling the vectors appropriately, it follows that

R`  i

u u S w

W U u ( X u u ( X u u T

u S

W "U

If the result

s are to be orthogonal, then it is necessary that

and as a

a   i

a   i

( X u u

u 

W "U

u u  w u

u  u 

W "U u

u ( u

W "U

u 

W "U

( X u u

Now observe that to the symmetric matrix

is a lower triangular which is also symmetric since it is equal . Therefore, it must be a diagonal matrix.

, for

Y `

Xu

Each residual vector is such that where As a result, the residual vectors are orthogonal to each other. The auxiliary vectors form an -conjugate set, i.e.,

is a certain scalar. .

  C
and

S F$ b


W "U

(R

y ( ( Y

W "U

   |5 | 75 B1B 5 Ct| yC l y5


t

DA C

) ' 2 )

y (V( Y` ! ( Q v A

'

'

yields

Putting these relations together gives the following algorithm.


&64TP @A2 S 5Q 8 U7R 6 D 5 S @

It is important to note that the scalars in this algorithm are different from those of the Lanczos algorithm. The vectors are multiples of the s of Algorithm 6.16. In terms of storage, in addition to the matrix , four vectors ( , , , and ) must be saved in Algorithm 6.17, versus ve vectors ( , , , , and ) for Algorithm 6.16.

Algorithm 6.17 is the best known formulation of the Conjugate Gradient algorithm. There are, however, several alternative formulations. Here, only one such formulation is shown, which can be derived once more from the Lanczos algorithm. The residual polynomial associated with the -th CG iterate must satisfy a three-term recurrence, implied by the three-term recurrence of the Lanczos vectors. Indeed, these vectors are just the scaled versions of the residual vectors. Therefore, we must seek a three-term recurrence of the form

  i

w eX X

Ty

In addition, the consistency condition to the recurrence,

must be maintained for each

F 6E ' 0A 9& B 8

X w X eX

I 3 6'( 0 C78 6P0b& 8 5 B A E 3 5 A

S u e( u 

 

 6

1. Compute 2. For 3. 4. 5. 6. 7. 8. EndDo


3 3 3 3 3 

, . , until convergence Do:

"X U u W u

Wu "U

Tu

u "( U u U uy W W X u (T u rX u X T T

and therefore,

, leading

ra   i

X u

W "U

u  u y

Note that from (6.70)

because is orthogonal to . Then, (6.71) becomes addition, writing that as dened by (6.72) is orthogonal to

( X u u

 u( u X


. In

5 C
u 

  1B

X u

5| j yq|  
X u

(W W( "u U u T

u S

W"U W"U u u u WS U w W"u U (  ( W" S X u u X u u  u u T W"U U u u Tu u ( u   X wu u ( uy u u  X u u ( Y ( $} V TA  T

  " " u

 |5

W "U

W "U

u S

3  DA C

W"U

1 2

~Q v

' 7$ % #

 C

Recall that the vectors s are multiples of the Lanczos vectors s. As a result, should be the inverse of the scalar of the Lanczos algorithm. In terms of the -vectors this means

Equating the inner products of both sides of (6.75) with , and using the orthogonality of the -vectors, gives the following expression for , after some algebraic calculations,

The recurrence relation for the approximate solution vectors can be extracted from the recurrence relation for the residual vectors. This is found by starting from (6.74) and using the relation between the solution polynomial and the residual polynomial . Thus,

This gives the recurrence,

All that is left for the recurrence to be determined completely is to dene the rst two iterates. The initial iterate is given. The rst vector should be of the form

4. If compute 5. 6. Compute 7. EndDo

X i P i W  W    g

y u  W w X u uy u u  w T Gu X u  X uT u y P i G i T u  g g g P P   G G u

g g

g W g g g

1. Compute 2. For 3. Compute


 

. Set and , until convergence Do: and

S 6 Q 8 &C@ 9@

5 E 6 58 79V8 0 E 5

m 

845

HYC 5 58

Y $!

to ensure that is orthogonal to with , and by setting the following algorithm.

. This means that the two-term recurrence can be started . Putting these relations and denitions together gives

a  i a   i a   i

R W

T y X 

Observe that if and , then the above relation into the sequence of residual vectors yields

, as desired. Translating

C C

X 

W U T

( 

w X T T y w X  eX  w T X  T

( ( T

w X

   |5 | 75 B1B 5 Ct| yC l y5


W W

t W

X W  T y T  X T XT

W T

  y W"U  

 V

y X 

W "U

W "U

W "U u u u  W"U u T Y  X T u y ( ( Y A  $v

TX

3 DA C

1 2

~h v g 

) 0 (&#$ ' %

Sometimes, it is useful to be able to obtain the tridiagonal matrix related to the underlying Lanczos iteration from the coefcients of the Conjugate Gradient algorithm 6.17. This tridiagonal matrix can provide valuable eigenvalue information on the matrix . For example, the largest and smallest eigenvalues of the tridiagonal matrix can approximate the smallest and largest eigenvalues of . This could be used to compute an estimate of the condition number of which in turn can help provide estimates of the error norm from the residual norm. Since the Greek letters and have been used in both algorithms, notations must be changed. Denote by

the tridiagonal matrix (6.63) associated with the -th step of the Lanczos algorithm. We must seek expressions of the coefcients in terms of the coefcients , obtained from the CG algorithm. The key information regarding the correspondence between the two pairs of coefcients resides in the correspondence between the vectors generated by the two algorithms. From (6.66) it is known that

As a result,

The denominator the numerator

is readily available from the coefcients of the CG algorithm, but is not. The relation (6.72) can be exploited to obtain

which is then substituted in

to get

from which we nally obtain for

The above expression is only valid for . For , the second term in the right-hand side should be omitted as was observed above. Therefore, the diagonal elements of are

r   i

W W u 

u S

u  w y

u (

Y `  ( W X u W u (

Y   W u u( u u S w X ( uT X u u T Y   T T u S w ( X u u T

Note that the terms -orthogonal,

are dened to be zero when

. Because the vectors are

r   i

( X u u ( X u u T

Y W

u S

scalar

r   i

S u e( u 

W "U

The residual could also be computed directly as algorithm, but this would require an additional matrix-vector product.

 Q5 H A )

Vl


in line 6 of the

5 C

F 6P5 2DB 5 ' ) A E B ) I ' 3 F 5 A #G 078 I C5 5 & 7P5 75 B A F 8 E B

  1B (& W "U W "U u

u (

5| j yq|  
W

H u ( u X ( u $ 54 4

T W U

R S

W "U U W ( W W X "U u "U u u ( u T

u S

u QX ( u

R 

u Xu

u S

T u

uT ( T u

  " "

( X u u W W T u u S u( u

W "U

uX

 |5

( X u ( u T X u u T

 6

W U

uX

W"U

for

Using (6.79) again and the relation (6.73) as well as orthogonality properties of the CG algorithm, the following sequence of equalities results:

In the previous section we derived the Conjugate Gradient algorithm as a special case of FOM for Symmetric Positive Denite matrices. Similarly, a new algorithm can be derived from GMRES for the particular case where is Hermitian. In this case, the residual vectors should be -orthogonal, i.e., conjugate. In addition, the vectors s are orthogonal. When looking for an algorithm with the same structure as CG, but satisfying these conditions, we nd the Conjugate Residual algorithm. Notice that the residual vectors are now conjugate to each other, hence, the name of the algorithm.

a0  i

y ( ( Y ( E

 

  

w i W 

    tfvm" V l }

 

This nally gives the general form of the terms of the CG coefcients,

-dimensional Lanczos tridiagonal matrix in

W W u  u S

 u

yu 

W  u  u ( X u u

i 

Therefore,

W (  u X u u

W  u  W  u  0 ( u 0 X u T

T X u

W "U

Now an expression for the co-diagonal elements in the Lanczos algorithm,

is needed. From the denitions

R`  i

Y   ( Y ` 

for

( X u u T W  u w X u( u u u S W ( u u S X u ( W T ( S X u X T u u

W "U X u ( u T



 w W

g gW

given by

C C

1

5|  | 5 | qz} v | yC l y5
y u  W y u 

S  w iW

W "U

W "U

S

uX

yu 

( X u

W "U

SW

S 

The last line in the above algorithm computes from without an additional matrix-vector product. Five vectors of storage are needed in addition to the matrix : , , , , . The algorithm requires one more vector update, i.e., more operations than the Conjugate Gradient method and one more vector of storage. Since the two methods exhibit typically a similar convergence behavior, the Conjugate Gradient method is often preferred over the Conjugate Residual algorithm.

All algorithms developed in this chapter are strongly related to, as well as dened by, the choice of a basis of the Krylov subspace. The GMRES algorithm uses an orthogonal basis. In the Conjugate Gradient algorithm, the s are -orthogonal, i.e., conjugate. In the Conjugate Residual method just described, the s are orthogonal, i.e., the s are -orthogonal. A number of algorithms can be developed using a basis of this form in the nonsymmetric case as well. The main result that is exploited in all these algorithms is the following lemma.

  i

W X W

In addition,

can be computed from

by

ra   i

XR

XR

( W W w W ( W( W T T ( R   R (t T W f w 

Then the approximate solution is given by

which has the smallest residual norm in the afne space

t

for

(` Y

X u ( R

is

(

Let -orthogonal, i.e., such that

be a basis of the Krylov subspace

W "U

W "U

 ~ } v( ~ ym~

( (

(  t

1. Compute 2. For 3. 4. 5. 6. 7. 8. Compute 9. EndDo


3 3 3 3 3 

, until convergence Do:


which

5 C

  1B

S CR B Q 8 D

W U W "U u u S w u u W"U W"U u "U S w W"u U u Wu W( u  W( W" S X u X u u  u u T W"U U u u T u ( u u u X  u w u ( uy uu  X u ( ( Y V( $}   V T A  T    DCA ~Q v 5| j  | 5 yq|   " "  C
B @ Q 0 5 U7R 5 S @ 3

6 D

1 2

(

 w

' 7$ % # 1  

1P #

The approximate solution and the associated residual vector can be written in the

form

According to the optimality result of Proposition 5.3, in order for the orthogonality relations

to be minimum,

so that

which proves the expression (6.84).

This lemma opens up many different ways to obtain algorithms that are mathematically equivalent to the full GMRES. The simplest option computes the next basis vector as a linear combination of the current residual and all previous s. The approximate solution is updated by using (6.84). This is called the Generalized Conjugate Residual (GCR) algorithm.

( VQ(

( Y

1. Compute 2. For 3. 4. 5. 6. Compute 7. 8. EndDo




. Set . until convergence Do:

, for

!( ( #VY

W ( X W

exploiting, once more, the orthogonality of the vectors

. Thus,

W( 

} uX W( u

This proves the rst part of the lemma. Assume now that must be determined. According to formula (6.83), dened above. Note that from the second part of (6.85),

is known and that with

W  W w ( R

must be enforced. Using (6.85) and the orthogonality of the

s gives immediately,

a  i

 

 R R  W f

R X

!( ( #Y

m 

u 

W( W

u 

W( W

 u f

 R X

u  f

W "U W"U S  R  w E P P sXR X  vu R s X  G Gu u u u vR S s  i  W"U u u  u u g u u  w u W"U u P X  X G P X   G u   y ( ( Qg  ( g $v ( Y  m  V Ag g  ~ A ~h v

 

 R ( w R R  W f

!  ( Y T

(t R 

5 s  5 C1 t VByC!ps
XR W

W( 

 

3 R

( W

1 2

) 0 (&#$ ' %

W"U

To compute the scalars in the above algorithm, the vector and the previous s are required. In order to limit the number of matrix-vector products per step to one, we can proceed as follows. Follow line 5 by a computation of and then compute after line 7 from the relation

Both the set of s and that of the s need to be saved. This doubles the storage requirement compared with GMRES. The number of arithmetic operations per step is also roughly 50% higher than GMRES. The above version of GCR suffers from the same practical limitations as GMRES and FOM. A restarted version called GCR(m) can be trivially dened. Also, a truncation of the orthogonalization of the s, similar to IOM, leads to an algorithm known as ORTHOMIN(k). Specically, lines 6 and 7 of Algorithm 6.20 are replaced by
( V(

The resulting algorithm is called ORTHODIR [127]. Restarted and truncated versions of ORTHODIR can also be dened.

As was seen in Section 6.6 when is symmetric, the Arnoldi algorithm simplies into the Lanczos procedure, which is dened through a three-term recurrence. As a consequence, FOM is mathematically equivalent to the Conjugate Gradient algorithm in this case. Similarly, the full GMRES algorithm gives rise to the Conjugate Residual algorithm. It is clear that the CG-type algorithms, i.e., algorithms dened through short-term recurrences, are more desirable than those algorithms which require storing entire sequences of vectors as in the GMRES process. These algorithms require less memory and operations per step. Therefore, the question is: Is it possible to dene algorithms which are based on optimal Krylov subspace projection and which give rise to sequences involving short-term recurrences? An optimal Krylov subspace projection means a technique which minimizes a certain norm of the error, or residual, on the Krylov subspace. Such methods can be de-

(R XR ( u T XR

9   m7} f7  A  7fq v

in which, as before, the

s are selected to make the

s orthogonal, i.e.,

r   i

W "U

Another class of algorithms is dened by computing the next basis vector

w W

R vR S u

R

w u

"U W

uR S

u  R

W "U

u vR S

W"U

6a. 7a.

Compute

, for .

as

W "U

5 C
u
W "U

  1B

u R S  Rf

~ E

5| j yq|  
u w W U "QH R vR P S X  u P Xs   s s  i U W

X G u   G

W "U

  " "

 |5
uR S u vR S R

9 C

ned from the Arnoldi process. If the Arnoldi process simplies into an -term recurrence, i.e., if for , then the conjugate directions in DIOM are also dened from an -term recurrence. Similarly, the full GMRES would also simplify into a DQGMRES algorithm involving a short recurrence. Therefore, for all purposes, it is sufcient to analyze what happens to the Arnoldi process (or FOM). We start by generalizing the CG result in a simple way, by considering the DIOM algorithm.

for any vector . Then, DIOM(s) is mathematically equivalent to the FOM algorithm.
The assumption is equivalent to the statement that, for any , there is a polynomial of degree , such that . In the Arnoldi process, the scalars are dened by and therefore

where is a polynomial of degree , then the result holds. However, the above relation implies that each eigenvector of is also an eigenvector of . According to Theorem 1.2, this can be true only if is a normal matrix. As it turns out, the reverse is also true. That is, when is normal, then there is a polynomial of degree such that . Proving this is easy because when where is unitary and diagonal, then . By choosing the polynomial so that

we obtain which is the desired result. Let be the smallest degree of all polynomials such that . Then the following lemma due to Faber and Manteuffel [85] states an interesting relation between and .

The sufcient condition is trivially true. To prove the necessary condition, assume that, for any vector , where is a polynomial of degree . Then it is easily seen that any eigenvector of is also an eigenvector of . Therefore, from Theorem 1.2, is normal. Let be the degree of the minimal polynomial for . Then, since has distinct eigenvalues, there is a polynomial of degree such that



for every vector if and only if

is normal and

 X

(T

A nonsingular matrix

is such that


r(( y } ( u X u T $ X

y }

In particular, if


uR

Since is a polynomial of degree , the vector of the vectors . As a result, if DIOM(k) will give the same approximate solution as FOM.

is a linear combination , then . Therefore,

a   i

y  w R X  T g ( u R X X 

T g

XR 

Let

be a matrix such that




k| 1t{ y5 1W{ 5 | 5 | | q | j |
 C

( R ( R ( R y m ( u  X R ( u u R T ( T X R u y g u R

W U

9 C tDA

W "U

T X

) ' 2 )

Y` vR u

Q v A

'

'

1 P #

u vR

for . According to the above argument, for this , it holds and therefore . Now it must be shown that . Let be a (nonzero) vector whose grade is . By assumption, . On the other hand, we also have . Since the vectors are linearly independent, must not exceed . Otherwise, two different expressions for with respect to the basis would result and this would imply that . Since is nonsingular, then , which is a contradiction. Proposition 6.14 gives a sufcient condition for DIOM(s) to be equivalent to FOM. According to Lemma 6.3, this condition is equivalent to being normal and . Now consider the reverse result. Faber and Manteuffel dene CG(s) to be the class of all matrices such that for every , it is true that for all such that . The inner product can be different from the canonical Euclidean dot product. With this denition it is possible to show the following theorem [85] which is stated without proof.

It is interesting to consider the particular case where , which is the case of the Conjugate Gradient method. In fact, it is easy to show that in this case either has a minimal degree , or is Hermitian, or is of the form

where and are real and is skew-Hermitian, i.e., . Thus, the cases in which DIOM simplies into an (optimal) algorithm dened from a three-term recurrence are already known. The rst is the Conjugate Gradient method. The second is a version of the CG algorithm for skew-Hermitian matrices which can be derived from the Lanczos algorithm in the same way as CG. This algorithm will be seen in Chapter 9.

The convergence behavior of the different algorithms seen in this chapter can be analyzed by exploiting optimality properties whenever such properties exist. This is the case for the Conjugate Gradient and the GMRES algorithms. On the other hand, the non-optimal algorithms such as FOM, IOM, and QGMRES will be harder to analyze. One of the main tools used in the analysis of these methods is Chebyshev polynomials. These polynomials are useful both in theory, when studying convergence, and in practice, as a means of accelerating single-vector iterations or projection processes. In the following, real and complex Chebyshev polynomials are discussed separately.

w 

, if and only if

or

is normal and

l
X

 X

( VE

 Y

5 C

 X

( XR u

  1B

W( ( W( 
T


5| j yq|  

X

9  Efz g cztc l

  " " y X

X  T  y  V( (  X E

 |5

Y W y W( ( ( m

XW

l


cv ~

'

X XR T T y

w E

That this is a polynomial with respect to trigonometric relation


W

can be shown easily by induction from the

In what follows we denote by the set of all polynomials of degree . An important result from approximation theory is the following theorem.

is reached by the polynomial

H 0  )0 X y T y y y & ( $

0X  yw

For a proof, see Cheney [52]. The maximum of of the above result is

for in

is and a corollary

a0  i

 0 0  P H  6U $ 

 w y  w y $

outside the interval

y H

 )0  P H 0  0  QIU6 $

W P X  G X  5IU C4 6

H )

! 

& r(  $ S & e(  $ S

Let

be a non-empty interval in . Then the minimum

and let

be any real scalar

 0 0

for

Rq q i

0 0

which is valid for but can also be extended to the case of approximation, valid for large values of , will be sometimes used:

. The following

This is readily seen by passing to complex variables and using the denition . As a result of (6.89) the following expression can be derived:

a   i a   i

 H 

 0 0 

The denition (6.88) can be extended to cases where formula:

with the help of the following

 0 0 W H )

(&



X

w H

H )

W$

and the fact that term recurrence relation

. Incidentally, this also shows the important three-

( W

for

a  q i

  

The Chebyshev polynomial of the rst kind of degree

is dened by

 C

F & B8 20 & # I ' E '

& y v $

5 H PF X
&

75H ) & 8 63 % 5 W T

X



T W )  )( X y  X w & T w W $ T

W$

 "  

W U "QH



| | Cv |yt
! ! ! $#P$ T

W P G X

X

 0 0 

C4 FI6

9 vA

X R

1 P

Av ~

'

in which is the middle of the interval. The absolute values in the denominator are needed only when is to the left of the interval, i.e., when . For this case, it may be more convenient to express the best polynomial as

which is obtained by exchanging the roles of

and

in (6.92).

The standard denition of real Chebyshev polynomials given by equation (6.88) extends , without difculty to complex variables. First, as was seen before, when is real and the alternative denition, , can be used. These denitions can be unied by switching to complex variables and writing

The above denition for Chebyshev polynomials will be used in . Note that the equation has two solutions which are inverse of each other. As a result, the value of does not depend on which of these solutions is chosen. It can be veried directly that the s dened by the above equations are indeed polynomials in the variable and that they satisfy the three-term recurrence

As is now explained, Chebyshev polynomials are intimately related to ellipses in the complex plane. Let be the circle of radius centered at the origin. Then the so-called Joukowski mapping

transforms into an ellipse of center the origin, foci , major semi-axis and minor semi-axis . This is illustrated in Figure 6.2. There are two circles which have the same image by the mapping , one with the radius and the other with the radius . So it is sufcient to consider only those circles with radius . Note that the case is a degenerate case in which the ellipse reduces to the interval traveled through twice. An important question is whether or not a generalization of the min-max result of Theorem 6.4 holds for the complex case. Here, the maximum of is taken over the ellipse boundary and is some point not enclosed by the ellipse. The answer to the question is no; Chebyshev polynomials are only optimal in some cases. However, Chebyshev polynomials are asymptotically optimal, which is all that is needed in practice.

&W

  i

w  $ W

8& W w $ y

where

ra   i

y y (

 ! W H X )

& W w $ y

W H ) ` X "TQUH

XT

y y (& W  ( Q$

& H w H $ y

0W 

 )

 0 W

Dening the variable

, the above formula is equivalent to

where

 0 0

F & B8 I ' 0 & # E '

5 C
X

  1B T

S   w y H )    w y H ) $  

5| j yq|  
&

5PF H

75H @ &  I ' ) % ) 5

W $

! 

  " " T H

X H T

 |5
)

 S  w #P!$  !

 ! X

H ) H ) W X W T w T

9( y Y (

C T %

The Joukowski mapping transforms a circle into an ellipse in the complex plane. To prove the asymptotic optimality, we begin with a lemma due to Zarantonello, which deals with the particular case where the ellipse reduces to a circle. This particular case is important in itself.

See reference [168] for a proof.

Note that by changing variables, shifting, and rescaling the polynomial, then for any circle centered at and for any scalar such that , the following min-max result holds:

We start by showing the second inequality. Any polynomial

in which

is the dominant root of the equation

of degree

a   i

0 H w H 0 H  w H 

0X

8 P 75 0 9QH U 6 2

W P G X

C4 FI6

H0

let

( Y )

Consider the ellipse mapped from be any point in the complex plane not enclosed by it. Then

by the mapping

( Y X y

Now consider the case of an ellipse centered at the origin, with foci major axis , which can be considered as mapped by from the circle convention that . Denote by such an ellipse.

9( y

0

 0 0

P $"G 6 V0 0 P H 356 U 2

the minimum being achieved for the polynomial

and semi, with the

a  i

0 0

P  G g0 X T 0  QH 46 U 2 P 3

 Y ) (

W P G X 

W P G X

C4 5I6

C4 FI6

 

A Av ~

1 P

and let

be a point of

and

satis-

 (Y )

Let not enclosed by

be a circle of center the origin and radius . Then

C C
! " 
i   

 %! 

!  $#


TX

    

 "  

& 9   7 ( 1) 6 'vA 0 (

| | Cv |yt
 
A 7 4 32 6 5

'


1 P #

A point on the ellipse is transformed by from a certain in . Similarly, let be one of the two inverse transforms of by the mapping, namely, the one with largest modulus. Then, can be rewritten as

which is a scaled Chebyshev polynomial of the rst kind of degree in the variable . It is apparent that the maximum modulus of this polynomial is reached in particular when is real, i.e., when . Thus,
W

which proves the second inequality. To prove the left inequality, we rewrite (6.97) as

The polynomial in of degree inside the large modulus bars in the right-hand side is such that its value at is one. By Lemma 6.4, the modulus of this polynomial over the circle is not less than , i.e., for any polynomial, satisfying the constraint ,

This proves that the minimum over all such polynomials of the maximum modulus on the ellipse is . The difference between the left and right bounds in (6.96) tends to zero as increases to innity. Thus, the important point made by the theorem is that for large , the Chebyshev polynomial where
W W

U H w QH u F X u u U H w QH T u F X u u

W w

H0

H 0

0 H 0

H u  H u 

H w H ( H w H

H 0

H 0

 0

0 

0 8 Q7 I6 2 PH U

q0

H 0

and take the modulus of

Consider the particular polynomial obtained by setting

and

for

r   i

 Y ) (

uF

X u

U H w QH u F X u u U H w QH T u F

w uF X u u w T F X u u u T

0 H w H 0 H  w H 

H F

F u u F u u

H w H H w H

H u  H u 

H u  H u 

0 X

H u  H u 

X
X T

8 7 U 2 0 P H 6

 H H

0 

fying the constraint

can be written as

5 C

  1B

5| j yq|  

  " "

 |5

 

Y ) (



C
T

is close to the optimal polynomial. More specically, Chebyshev polynomials are asymptotically optimal. For a more general ellipse centered at , and with focal distance and semimajor axis , a simple change of variables shows that the near-best polynomial is given by

In addition, by examining the expression for it is easily seen that the maximum modulus of , i.e., the innity norm of this polynomial over the ellipse, is reached at the point located on the real axis. From this we get,
H

Here, we point out that and both can be purely imaginary [for an example, see part (B) of Figure 6.3]. In this case is real and the numerator in the above expression is always real. Using the denition for we obtain the following useful expression and approximation:

Finally, we note that an alternative and more detailed result has been proven by Fischer and Freund in [89].

The following lemma characterizes the approximation obtained from the Conjugate Gradient algorithm.

  eX X

where

is a polynomial of degree

such that

 i  U  FC4I6 y    eX T T ! X T  X w  ym

Let CG algorithm, and let form

be the approximate solution obtained from the -th step of the where is the exact solution. Then, is of the

W  ( X

W t   

As usual,

denotes the norm dened by

a   i

a   i

a w i

I H 3 0A B '  & 8  QH A Q5 2P5  P0' ) ) 5 ' ) E 3 5 E

 Gy

y

 6 

H 0  6  ) 0 H   ) 

  

 H w H X H  6  ) T  6  H ) X

w 

H %

q0

w H

w H

Gy 

y

X w wc

P  $"G 6 )0  P H 76 U 2

(  ( X c

  

cT  6 

 c

 "   T X c %

H c w

w 

| | Cv |yt
)
w

 ! ! #P$

"

'

 6  H ) H   ) 

A w

 


1 P #

This is a consequence of the fact that minimizes the -norm of the error in the afne subspace , a result of Proposition 5.2, and the fact that is the set of all vectors of the form , where is a polynomial of degree . From this, the following theorem can be proved.

Then,

From the previous lemma, it is known that minimizes the error over polynomials which take the value one at , i.e.,

The result follows immediately by using the well known result of Theorem 6.4 from approximation theory. This gives the polynomial which minimizes the right-hand side.

A slightly different formulation of inequality (6.102) can be derived. Using the relation,

X y

w w w

T y

w w

then

   0 X 0

Therefore,

  

X     QH s  U  P 6 W X   eX  R T T P HR 6 X R  XR F XR R f eX

T T

 QX H s 6  U  P



P G  

C4 FI6



U G



 

y 

  X

If initial error

are the eigenvalues of in the eigenbasis, then

, and

the components of the

 

r(( y rEW( R F

   X

P G   8 U

C4 5I6

 

W(V( y rE( R

in which

is the Chebyshev polynomial of degree

of the rst kind.

-norm of

pq wq i

r0 wq i

Let be the approximate solution obtained at the the exact solution and dene Conjugate Gradient algorithm, and

-th step of the

5 C

w ( X T    c 

  1B

1 R   R

5| j yq|  
y

 

f 

  " "

 w
 w 

 |5

cv ~

A 6

A
'

6 C

This bound is similar to that of the steepest descent algorithm except that the condition number of is now replaced by its square root.

We begin by stating a global convergence result. Recall that a matrix is called positive denite if its symmetric part is Symmetric Positive Denite. This is equivalent to the property that for all nonzero real vectors .

If

is a positive denite matrix, then GMRES(m) converges for any

This is true because the subspace contains the initial residual vector at each restart. Since the algorithm minimizes the residual norm in the subspace , at each outer iteration, the residual norm will be reduced by as much as the result of one step of the Minimal Residual method seen in the previous chapter. Therefore, the inequality (5.18) is satised by residual vectors produced after each outer iteration and the method converges.

Next we wish to establish a result similar to the one for the Conjugate Gradient method, which would provide an upper bound on the convergence rate of the GMRES iterates. We begin with a lemma similar to Lemma 6.5.

  eX X

C4 FI6

  eX X

and

Let be the approximate solution obtained from the GMRES algorithm, and let . Then, is of the form

-th step of the

a w i

F 5 63 I  ' 22P5  P0' ) 5 ) E 3 5 E

y  y w

 Q R ! 

in which is the spectral condition number Substituting this in (6.102) yields,

a w i a w i w i aa w i

y w  ! RR w RR !  R 1 R   R  w R  y w w 


w  V A

w w w

X 

 

Now notice that

 "  

w 

 #P"  ! !

| | Cv |yt
X

WV (

 

A

1 P

Av ~

'


1 P #

 !

Unfortunately, it not possible to prove a simple result such as Theorem 6.6 unless normal.

Now the polynomial which minimizes the right-hand side in the above inequality can be used. This yields the desired result,

The results of Section 6.11.2 on near-optimal Chebyshev polynomials in the complex plane can now be used to obtain an upper bound for . Assume that the spectrum of in contained in an ellipse with center , focal distance , and major semi axis . In addition it is required that the origin lie outside this ellipse. The two possible cases are shown in Figure 6.3. Case (B) corresponds to the situation when is purely imaginary, i.e., the major semi-axis is aligned with the imaginary axis.

eigenvalues of

be a diagonalizable matrix, i.e, let where is the diagonal matrix of eigenvalues. Assume that all the are located in the ellipse which excludes the origin. Then, the

W 3

( ( X c

Let

v0 R X

P G

0 PQHI6  R    

 W

  

W 

P G

   

    

V y 

(  ( X c

tA   

 

W (V( ( H54 C DA # # '

tA 

~{

Since ,

minimizes the residual norm over

, then for any consistent polynomial

0 R X

  P w HW 6 R

X

Since

is diagonal, observe that

  

 W   

and

Let be any polynomial of degree which satises the constraint the vector in to which it is associated via . Then,

where

t A

P G   X

Then, the residual norm achieved by the

-th step of GMRES satises the inequality

0 R X



QI6  R PH

 W

P G 

C4 FI6

X

U

  

P G

 W 

  !

ty 

where

Assume that is a diagonalizable matrix and let is the diagonal matrix of eigenvalues. Dene,

( ( ( W

DCA

' )$ )

H4 h v A

w

space the form

This is true because minimizes the -norm of the residual in the afne sub, a result of Proposition 5.3, and the fact that is the set of all vectors of , where is a polynomial of degree .


is ,

5 C 

  1B

5| j yq|  

  " "

 |5

 w  
'

A 6

A 6

'

9 C
'0

P  $"G 6  6  ) 76 0 ) 0  P H 4U    ) X   W P G P  $"G T 6  X  U X P G 0 0  QI7 6 U   FI6 PH C4

Y `

The second inequality is due to the fact that the maximum modulus of a complex analytical function is reached on the boundary of the domain. We can now use as a trial polynomial dened by (6.98), with :

v0 X

W P P  $"G 6  G X  U X 76 0  P H 4U   5I6 C4

0 TR X
T

 P H 6 R

 W

W P G X


C4 5I6

! $ 


(B)

! "#
(A)

All that is needed is an upper bound for the scalar By denition,

P G

! " 



 6  )   ) X 

 
residual norm achieved at the

 "  

| | Cv |yt

This completes the proof.

Ellipses containing the spectrum of real ; case (B): purely imaginary .

-th step of GMRES satises the inequality,

. Case (A):

under the assumptions.


c

A 7 4 32 6 5


!"#


Since the condition number of the matrix of eigenvectors is typically not known and can be very large, results of the nature of the corollary are of limited practical interest. They can be useful only when it is known that the matrix is nearly normal, in . which case,

In many circumstances, it is desirable to work with a block of vectors instead of a single vector. For example, out-of-core nite-element codes are more efcient when they are programmed to exploit the presence of a block of the matrix in fast memory, as much as possible. This can be achieved by using block generalizations of Krylov subspace methods, for which always operates on a group of vectors instead of a single vector. We begin by describing a block version of the Arnoldi algorithm.

The above algorithm is a straightforward block analogue of Algorithm 6.1. By construction, the blocks generated by the algorithm are orthogonal blocks that are also orthogonal to each other. In the following we denote by the identity matrix and use the following notation:

matrix of the last

w   E

for columns of

 W U

W "U

(Y`! u R

u u W u R Ry

( ( VV (

1. Choose a unitary matrix of dimension 2. For Do: 3. Compute 4. Compute 5. Compute the Q-R factorization of : 6. EndDo

y

 Gy

 

 6 

w 

w 6

w H

w H

X T w " w c H % c y  6  w 6

y

An explicit expression for the coefcient readily obtained from (6.99-6.100) by taking

and an approximation are

5 C

Y  6  )    ) 

  1B

5| j yq|  

 W ( u R uR X W ( 8& ( ( $T (

P P78 Q B D 6

 

9   i} i~ 

ERu

  " "

u u  R

w 

3 G E B 9FD  4CA

 |5

'

! ( ( #V(

 6  )   ) 

u uR

'

1 2

~Q v

' 7$ % #

Then, the following analogue of the relation (6.4) is easily proved:

Here, the matrix is no longer Hessenberg, but band-Hessenberg, meaning that it has subdiagonals instead of only one. Note that the dimension of the subspace in which the solution is sought is not but . A second version of the algorithm uses a modied block Gram-Schmidt procedure instead of the simple Gram-Schmidt procedure used above. This leads to a block generalization of Algorithm 6.2, the Modied Gram-Schmidt version of Arnoldis method.

Again, in practice the above algorithm is more viable than its predecessor. Finally, a third version, developed by A. Ruhe [170] for the symmetric case (block Lanczos), yields a variant that is quite similar to the original Arnoldi algorithm. Assume that the initial block of orthonormal vectors, is available. The rst step of the algorithm is to multiply by and orthonormalize the resulting vector against . The resulting vector is dened to be . In the second step it is that is multiplied by and orthonormalized against all available s. Thus, the algorithm works similarly to Algorithm 6.2 except for a delay in the vector that is multiplied by at each step.

Observe that the particular case coincides with the usual Arnoldi process. Also, the dimension of the subspace of approximants, is no longer restricted to being a multiple

H U  W

X   W

u  3

W "U

1. Choose initial orthonormal vectors 2. For Do: 3. Set ; 4. Compute ; 5. For Do: 6. 7. 8. EndDo 9. Compute and 10. EndDo

( (

 "U W

S @Q 8 @ &6 T94 7 5

W "U

 R R

1. Choose a unitary matrix of size 2. For Do: 3. Compute 4. For do: 5. 6. 7. EndDo 8. Compute the Q-R decomposition 9. EndDo

a w i

 C

 "U W

 4E D  G B

W "U

SQ

P P76 8 Q B D

P P76 8 Q B D

 !

uR

( (

yq"  j 5| 
H  "U W u  3 H R R ( XR( ( H VQ T y 3
G E B 9FD  3 A G E B 9FD  3 A

!( $V(

R u u R V(Q ( u

w y

! ( 3 ( #V(

R "X W U

( H 

3 u u y vR ( 3 E

y ( 3

( "3

1 2

1 2

vR E

~h v

~h v

) 0 (&#$ ' %

) 0 (&#$ ' %

T
W

of the block-size as in the previous algorithms. The mathematical equivalence of Algorithms 6.22 and 6.23 when is a multiple of is straightforward to show. The advantage of the above formulation is its simplicity. A slight disadvantage is that it gives up some potential parallelism. In the original version, the columns of the matrix can be computed in parallel whereas in the new algorithm, they are computed in sequence. This can be remedied, however, by performing matrix-by-vector products every steps. At the end of the loop consisting of lines 5 through 8 of Algorithm 6.23, the vector satises the relation

As a consequence, the analogue of the relation (6.5) for Algorithm 6.23 is

As before, for any the matrix represents the matrix with columns . The matrix is now of size . Now the block generalizations of FOM and GMRES can be dened in a straightforward way. These block algorithms can solve linear systems with multiple right-hand sides,

or, in matrix form

where each column is . It is preferable to use the unied notation derived from Algorithm 6.23. In this notation, is not restricted to being a multiple of the block-size and the same notation is used for the s as in the scalar Arnoldi Algorithm. Thus, the rst step of the block-FOM or block-GMRES algorithm is to compute the QR factorization of the block of initial residuals:

Here, the matrix is unitary and is upper triangular. This factorization provides the rst vectors of the block-Arnoldi basis. Each of the approximate solutions has the form

and, grouping these approximations

in a block

and the

in a block

, we can

rQ0rwq i

P G

P G R b

P G

( V(

P G P G R R PW $ ( P ( 8& X G  ( P G  ( eDG  %! 

( P G b w P G R R

where the columns of the matrices Given an initial block of initial guesses initial residuals

and for

are the

s and s, respectively. , we dene the block of

r wq i

r rwq i

prrwq i

W u (

W "U

H  "U W

( "(

T T WVT y

W X & ( ( % ( $

H R R

 $

~E P G R

7E

X U

W  R 7QfH X U

(P G R

P G R PRG

P R G t

g

X w u

X W & V( $ (

P G R

where

and are related by

. Line 9 gives

which results in

5 C
u

  1B

H (R R

5| j yq|  

 R

  " "

 |5

write

It is now possible to imitate what was done for the standard FOM and GMRES algorithms. The only missing link is the vector in (6.21) which now becomes a matrix. Let be the matrix whose upper principal block is an identity matrix. Then, the relation (6.109) results in

is a vector of length whose components are zero except those from 1 to which are extracted from the -th column of the upper triangular matrix . The matrix is an matrix. The block-FOM approximation would consist of deleting the last rows of and and solving the resulting system,

The approximate solution is then computed by (6.112). The block-GMRES approximation is the unique vector of the form which minimizes the 2-norm of the individual columns of the block-residual (6.114). Since the column-vectors of are orthonormal, then from (6.114) we get,

To minimize the residual norm, the function on the right hand-side must be minimized over . The resulting least-squares problem is similar to the one encountered for GMRES. The only differences are in the right-hand side and the fact that the matrix is no longer Hessenberg, but band-Hessenberg. Rotations can be used in a way similar to the scalar case. However, rotations are now needed at each new step instead of only one. Thus, if and , the matrix and block right-hand side would be as follows:

For each new column generated in the block-Arnoldi process, rotations are required to eliminate the elements , for down to . This backward order is important. In the above example, a rotation is applied to eliminate and then a second rotation is used to eliminate the resulting , and similarly for the second, third step, etc.

P G w P G R b R

a rw i

&

&

W R

W W

&

P G b  R

w 

R W

P G R

PG

&

&

PRG b

W

P G

 PRG

w 

P G

The vector

aa rw i rw i
W %

C C

X U

W  X U X % W ( $ $ & ( % 

&

FS

Vy

 !

R SW

R R

yq"  j 5| 
$
P G R X U

P G R

 H

W rW

WR

P G & ! R $X

P G R b

This complicates programming slightly since two-dimensional arrays must now be used to save the rotations instead of one-dimensional arrays in the scalar case. After the rst column of is processed, the block of right-hand sides will have a diagonal added under the diagonal of the upper triangular matrix. Specically, the above two matrices will have the structure,

where a represents a nonzero element. After all columns are processed, the following least-squares system is obtained.

To obtain the least-squares solutions for each right-hand side, ignore anything below the horizontal lines in the above matrices and solve the resulting triangular systems. The residual norm of the -th system for the original problem is the 2-norm of the vector consisting of the components , through in the -th column of the above block of right-hand sides. Generally speaking, the block methods are of great practical value in applications involving linear systems with multiple right-hand sides. However, they are not as well studied from the theoretical point of view. Perhaps, one of the reasons is the lack of a convincing analogue for the relationship with orthogonal polynomials, established in subsection 6.6.2 for the single-vector Lanczos algorithm. The block version of the Lanczos algorithm has not been covered but the generalization is straightforward.

1 In the Householder implementation of the Arnoldi algorithm, show the following points of detail: .

is unitary and its inverse is

5 C

  1B

5| j yq|  

X#

E w !

  " "
v

 |5

 BBA  v  A A

w !

vcE~lf

X#

 `  

for

matrix.

The

s are orthonormal.

The vectors are equal to the Arnoldi vectors produced by the Gram-Schmidt version, except possibly for a scaling factor.

2 Rewrite the Householder implementation of the Arnoldi algorithm with more detail. In particular, dene precisely the Householder vector used at step (lines 3-5). 3 Consider the Householder implementation of the Arnoldi algorithm. Give a detailed operation count of the algorithm and compare it with the Gram-Schmidt and Modied Gram-Schmidt algorithm. 4 Derive the basic version of GMRES by using the standard formula (5.7) with .

5 Derive a version of the DIOM algorithm which includes partial pivoting in the solution of the Hessenberg system. 6 Show how the GMRES and FOM methods will converge on the linear system

and with

7 Give a full proof of Proposition 6.11.

Assume that (full) GMRES is used to solve a linear system, with the coefcient matrix is the maximum number of steps that GMRES would require to converge?

Assume that (full) GMRES is used to solve a linear system with the coefcient matrix

be the initial residual vector. It is assumed that the degree of the minimal polynomial of with respect to (i.e., its grade) is . What is the maximum number of steps that GMRES would require to converge for this matrix? [Hint: Evaluate the sum where is the minimal polynomial of with respect to .]

!9   !

 v

q3

 

  !  v ! 

  !

g 

9 Let a matrix

have the form:

g 

8 Let a matrix

have the form

 

 1

, where

is the -th column of the

identity

when

. What

. Let

C
5 

V v  BABCAB    # A

5 #

 H v    v #  C   v X #
 A A A C BB0B v C

|5 | y  w  |y|


 

 

 

  

and

Show that

Assume that (full) GMRES is used to solve a linear system with the coefcient matrix What is the maximum number of steps that GMRES would require to converge?

11 Show that if is nonsingular, i.e., is dened, and if , then , i.e., both the GMRES and FOM solutions are exact. [Hint: use the relation (6.46) and Proposition 6.11 or Proposition 6.12.] 12 Derive the relation (6.49) from (6.47). [Hint: Use the fact that the vectors on the right-hand side of (6.47) are orthogonal.] 13 In the Householder-GMRES algorithm the approximate solution can be computed by formulas (6.25-6.27). What is the exact cost of this alternative (compare memory as well as arithmetic requirements)? How does it compare with the cost of keeping the s? 14 An alternative to formulas (6.25-6.27) for accumulating the approximate solution in the Householder-GMRES algorithm without keeping the s is to compute as

where is a certain -dimensional vector to be determined. (1) What is the vector for the ? [Hint: Exploit (6.11).] above formula in order to compute the correct approximate solution (2) Write down an alternative to formulas (6.25-6.27) derived from this approach. (3) Compare the cost of this approach with the cost of using (6.25-6.27). 15 Obtain the formula (6.76) from (6.75).

17 The Lanczos algorithm is more closely related to the implementation of Algorithm 6.18 of the Conjugate Gradient algorithm. As a result the Lanczos coefcients and are easier to extract from this algorithm than from Algorithm 6.17. Obtain formulas for these coefcients from the coefcients generated by Algorithm 6.18, as was done in Section 6.7.3 for the standard CG algorithm. 18 Show that if the rotations generated in the course of the GMRES (and DQGMRES) algorithm are such that then GMRES, DQGMRES, and FOM will all converge.

20 Prove that the inequality (6.56) is sharper than (6.53), in the sense that (for ). [Hint: Use Cauchy-Schwarz inequality on (6.56).]

 E 2  

 5

19 Show the exact expression of the residual vector in the basis or DQGMRES. [Hint: A starting point is (6.52).]

for either GMRES

AAA C B0BB C  v C

 v t 

 A

 9

@ @

16 Show that the determinant of the matrix

in (6.82) is given by

 

v t g

 0ABA  v  E A 

..

g


10 Let


.

5 C

  1B

5| j yq|  

  " "

9   e g 3

 |5

 `  

21 Denote by the unit upper triangular matrix in the proof of Theorem 6.1 which is obtained from the Gram-Schmidt process (exact arithmetic assumed) applied to the incomplete orthogonalization basis . Show that the Hessenberg matrix obtained in the incomplete orthogonalization process is related to the Hessenberg matrix obtained from the (complete) Arnoldi process by

N OTES AND R EFERENCES . Lemma 6.1 was proved by Roland Freund [95] in a slightly different form. Proposition 6.12 is due to Brown [43] who proved a number of other theoretical results, including Proposition 6.11. Recently, Cullum and Greenbaum [63] discussed further relationships between FOM and GMRES and other Krylov subspace methods. The Conjugate Gradient method was developed independently and in different forms by Lanczos [142] and Hesteness and Stiefel [120]. The method was essentially viewed as a direct solution technique and was abandoned early on because it did not compare well with other existing techniques. For example, in inexact arithmetic, the method does not terminate in steps as is predicted by the theory. This is caused by the severe loss of of orthogonality of vector quantities generated by the algorithm. As a result, research on Krylov-type methods remained dormant for over two decades thereafter. This changed in the early 1970s when several researchers discovered that this loss of orthogonality did not prevent convergence. The observations were made and explained for eigenvalue problems [158, 106] as well as linear systems [167]. The early to the middle 1980s saw the development of a new class of methods for solving nonsymmetric linear systems [13, 14, 127, 172, 173, 185, 218]. The works of Faber and Manteuffel [85] and Voevodin [219] showed that one could not nd optimal methods which, like CG, are based on short-term recurrences. Many of the methods developed are mathematically equivalent, in the sense that they realize the same projection process, with different implementations. The Householder version of GMRES is due to Walker [221]. The Quasi-GMRES algorithm described in Section 6.5.7 was initially described by Brown and Hindmarsh [44], although the direct version DQGMRES was only discussed recently in [187]. The proof of Theorem 6.1 can be found in [152] for the QMR algorithm. The non-optimality of the Chebyshev polynomials on ellipses in the complex plane was established by Fischer and Freund [90]. Prior to this, a 1963 paper by Clayton [59] was believed to have established the optimality for the special case where the ellipse has real foci and is real. Until recently, little attention has been given to block Krylov methods. In addition to their attraction for solving linear systems with several right-hand sides [177, 196], these techniques can also help reduce the effect of the sequential inner products in parallel environments and minimize I/O costs in out-of-core implementations. The block-GMRES algorithm is analyzed by Simoncini and Gallopoulos [197] and in [184]. Alternatives to GMRES which require fewer inner products have been proposed by Sadok [188] and Jbilou [125]. Sadok investigated a GMRES-like method based on the Hessenberg algorithm [227], while Jbilou proposed a multi-dimensional generalization of Gastinels method seen in Exercise 2 of Chapter 5.

 v Tt v

|5 | y  w  |y|

X 

( (

9 W ( QH C

W X

X W

(


W W

W(( W W( W QH C


T

The Lanczos biorthogonalization algorithm is an extension to nonsymmetric matrices of the symmetric Lanczos algorithm seen in the previous chapter. One such extension, the Arnoldi procedure, has already been seen. However, the nonsymmetric Lanczos algorithm is quite different in concept from Arnoldis method because it relies on biorthogonal sequences instead of orthogonal sequences.

The algorithm proposed by Lanczos for nonsymmetric matrices builds a pair of biorthogonal bases for the two subspaces

I H BA '  & 8 50A 3 H

! ! $#$6

R Q c v g g  Q~ l z k  pix p `Peu rHWFQ8X986c8 x 9P`gIXk%B8QYrB9PmQ8rBP p B F p sP F 6 pFkesD8 SHtF%Ba8qEXrB}RuwsH `Pa8mAqIXl98RT9PzWx9867wi9Ps%XruRXW6EFXpqRXgx p pePu HrDRtsHWBpFtsVQ8rBP b n n p nnP 8 U D 6 5 p B s s H pF9P6aD`RX6rFczX rHWFeu8EXrBQ8rBm8dWD98Cie%DX 9u%P'IXf98W6F HrXu VX SHFRP H p9P%Xu F Y 8 U s n P 6 5 s F bY U B pP s s q%RXW6EFX rH Yd89D9P 9YePd8GFRDwsQ8BrP 96u HrAVaD`RX6WFAe89u9PnRDbD %TX pEx 9fmGD9DP 9pu B P sX H 6 Y 8U B X tsP }r8EDtB HIX%BWFduuD98p 9Ttp7c%BBpHX tx I8pF98`6n 9PfzXu%6F 9f D rH7X6s rHpF5 `PiesH Xp SH9PpFsb p%XIXurD RXW6I8FERFAtU9faHXrBEXeoUnBn X zcP qIXI8D pFVqeYU X8 89puH hIXB A%6%Bu 8rHQY6yX H 8 TY B P l f8 ` U b s n F B X Ps X aD`RX6WFAU e89u9PnRDbD 9TX tx %fc%B8rVs YeQ8%BY SHRDXf%Bd8pF9P%6}DRX rHWTQ8%B9875 Y 8 p B X Ub P 8 s u n u b n 6

  !  2 0  31) ( '     1  $ 1 0 !

and

The algorithm that achieves this is the following.


5 Y8 HFV8 5 E D

. If

Stop

EndDo

Note that there are numerous ways to choose the scalars in lines 7 and 8. These two parameters are scaling factors for the two vectors and and can be selected in any manner to ensure that . As a result of lines 9 and 10 of the algorithm, it is only necessary to choose two scalars that satisfy the equality

The choice taken in the above algorithm scales the two vectors so that they are divided by two scalars which have the same modulus. Both vectors can also be scaled by their 2-norms. In that case, the inner product of and is no longer equal to 1 and the algorithm must be modied accordingly; see Exercise 3. Consider the case where the pair of scalars is any pair that satises the relation (7.1). Denote by the tridiagonal matrix

If the determinations of of lines 78 are used, then the s are positive and . Observe from the algorithm that the vectors belong to , while the s are in . In fact, the following proposition can be proved.

aa q i

   W

W "U

W "U

Moreover,

is a basis of and the following relations hold,

and

 R R

 ( E

y 

$V( !(

   W

!( #V(

vectors

If the algorithm does not break down before step , then the , and , form a biorthogonal system, i.e.,

is a basis of

Rwq i

a0 q i

W "U

u "U W

u X

S u r(

W"U W "U u u X

W"U W"U U "U W W u ( u u S uX W"U X U W u X ( uy S W "U T "U W u ( u T

W"U W U S u r( u X W "U W "U u u

W X

W"U W "U W "U u X "U u W S W U W W U u  "U u u X  X W"U u ( "U u W "U W W0 Y` u ( u T0 W uX X u W u X u u  u T  u u S u u  u ( X (u u y !#V(( W T

uR vX R ( u y X  V( u T

 S

U W

W X 

! W

W W"U U U u u W" S W W"U u U u W"U u u vu  W

uX(

W  $!  Y (

W "U

u S

C DA 

X R R

) ' 2 )

Ee( R Q v A 

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.

Choose two vectors Set , For Do:

such that

6 DQ S @ B @ 6 D D S 8 D PT&Q C4PR 7 YCTQ

XW

5 l C t lB C! j   5 
7C@ D E 6

3 C 5 7 4DA 

 
1 2

~h v
) 0 (&#$ ' %


uX
' 

'

u S

The biorthogonality of the vectors will be shown by induction. By assumption . Assume now that the vectors and are biorthogonal, and let us prove that the vectors and are biorthogonal. First, we show that for . When , then

The last inner product in the above expression vanishes by the induction hypothesis. The two other terms cancel each other by the denition of and the fact that . Consider now the inner product with ,

For , all of the inner products in the above expression vanish, by the induction hypothesis. For , the inner product is

It can be proved in exactly the same way that for . Finally, by construction . This completes the induction proof. The proof of the matrix relations (7.37.5) is similar to that of the relations (6.46.6) in Arnoldis method.


The relations (7.37.5) allow us to interpret the algorithm. The matrix is the projection of obtained from an oblique projection process onto and orthogonally to . Similarly, represents the projection of on and orthogonally to . Thus, an interesting new feature here is that the operators and play a dual role because similar operations are performed with them. In fact, two linear systems are solved implicitly, one with and the other with . If there were two linear systems to solve, one with and the other with , then this algorithm is suitable. Otherwise, the operations with are essentially wasted. Later a number of alternative techniques developed in the literature will be introduced that avoid the use of . From a practical point of view, the Lanczos algorithm has a signicant advantage over Arnoldis method because it requires only a few vectors of storage, if no reorthogonalization is performed. Specically, six vectors of length are needed, plus some storage for the tridiagonal matrix, no matter how large is. On the other hand, there are potentially more opportunities for breakdown with the nonsymmetric Lanczos method. The algorithm will break down whenever as dened in line 7 vanishes. This is examined more carefully in the next section. In practice, the difculties are more likely to be caused by the near occurrence of this phenomenon. A look at the algorithm indicates that the Lanczos vectors may have to be scaled by small

r  i  i
y

&

( X u u

W W 8& ( u u S R XR X R X w WR R & ( u T XR W & ( u u S ( XR XR E

u (

W "U

W  

T X

u X

& u ( X

u u S

u (

 I Y W

y W U ( W"U u X u W"U ( T X u R T Y W "U W W & u ( u u S u ( u u S $ W u W X u w W u W T u  w X u u eS( T u $ W"W U u W u ( W"U u X u y T X vE y T E

5 C
W

u u S X u ( u u  T  "U E W WT  u ( W u (

  1B W U

u 

5| j yq|  
X

W "U

R (R

W "U W "U "U W S w R R e( u $ W u X W "U u S ( $ W X R  u T "U u X W W"U ( T $ W u X ( u uT u  X R "U u XR W T ( u T T XR T W "U W"U ( $ W u X u ( u X u u W "U X Y$ T W  E U ( T XW R ( u u y W T

w ( 

  " "


P

W 

XT


 |5

A T 6

quantities when this happens. After a few steps the cumulated effect of these scalings may introduce excessive rounding errors. Since the subspace from which the approximations are taken is identical to that of Arnoldis method, the same bounds for the distance are valid. However, this does not mean in any way that the approximations obtained by the two methods are likely to be similar in quality. The theoretical bounds shown in Chapter 5 indicate that the norm of the projector may play a signicant role.

There are various ways to improve the standard nonsymmetric Lanczos algorithm which we now discuss briey. A major concern here is the potential breakdowns or near breakdowns in the algorithm. There exist a number of approaches that have been developed to avoid such breakdowns. Other approaches do not attempt to eliminate the breakdown, but rather try to deal with it. The pros and cons of these strategies will be discussed after the various existing scenarios are described. Algorithm 7.1 will abort in line 7 whenever,

This can arise in two different ways. Either one of the two vectors or vanishes, or they are both nonzero, but their inner product is zero. The rst case is the lucky breakdown scenario which has been seen for symmetric matrices. Thus, if then is invariant and, as was seen in Chapter 5, the approximate solution is exact. If then is invariant. However, in this situation nothing can be said about the approximate solution for the linear system with . If the algorithm is being used to solve a pair of linear systems, one with and a dual system with , then the approximate solution for the dual system will be exact in this case. The second scenario in which (7.6) can occur is when neither of the two vectors is zero, but their inner product is zero. Wilkinson (see [227], p. 389) called this a serious breakdown. Fortunately, there are cures for this problem which allow the algorithm to continue in most cases. The corresponding modications of the algorithm are often put under the denomination Look-Ahead Lanczos algorithms. There are also rare cases of incurable breakdowns which will not be discussed here (see references [161] and [206]). The main idea of Look-Ahead variants of the Lanczos algorithm is that the pair can often be dened even though the pair is not dened. The algorithm can be pursued from that iterate as before until a new breakdown is encountered. If the pair cannot be dened then the pair can be tried, and so on. To better explain the idea, it is best to refer to the connection with orthogonal polynomials mentioned earlier for the symmetric case. The relationship can be extended to the nonsymmetric case by dening the bilinear form on the subspace

Unfortunately, this is now an indenite inner product in general since can be zero or even negative. Note that there is a polynomial of degree such that and, in fact, the same polynomial intervenes in the equivalent expression of .

a q i

a q i

W "U

Wu "U

Y `

W "U

W "U

( 

W "U

F DC776P5 P&  I B & PDCA ) 8 6 E 'B A 8 A E I 5 8 )B 3

W "U

R #U

u (

u (

W "U

R #U

X 

(W

U W

u (

T T

5 l C t lB C! j   5 

U W

(  

u QH C

 ! #$6

U U u ( u

  Y

U U u ( u

W"U

C H

More precisely, there is a scalar such that . Similar to the symmetric case, the nonsymmetric Lanczos algorithm attempts to compute a sequence of polynomials that are orthogonal with respect to the indenite inner product dened above. If we dene the moment matrix

then this process is mathematically equivalent to the computation of the factorization

of the moment matrix , in which is upper triangular and is lower triangular. Note that is a Hankel matrix, i.e., its coefcients are constant along anti-diagonals, i.e., for . Because

we observe that there is a serious breakdown at step if and only if the indenite norm of the polynomial at step vanishes. If this polynomial is skipped, it may still be possible to compute and continue to generate the sequence. To explain this simply, consider

Both and are orthogonal to the polynomials . We can dene (somewhat arbitrarily) , and then can be obtained by orthogonalizing against and . It is clear that the resulting polynomial will then be orthogonal against all polynomials of degree ; see Exercise 5. Therefore, the algorithm can be continued from step in the same manner. Exercise 5 generalizes this for the case where polynomials are skipped rather than just one. This is a simplied description of the mechanism which underlies the various versions of Look-Ahead Lanczos algorithms proposed in the literature. The Parlett-Taylor-Liu implementation [161] is based on the observation that the algorithm breaks because the pivots encountered during the LU factorization of the moment matrix vanish. Then, divisions by zero are avoided by performing implicitly a pivot with a matrix rather than using a standard pivot. The drawback of Look-Ahead implementations is the nonnegligible added complexity. Besides the difculty of identifying these near breakdown situations, the matrix ceases to be tridiagonal. Indeed, whenever a step is skipped, elements are introduced above the superdiagonal positions, in some subsequent step. In the context of linear systems, near breakdowns are rare and their effect generally benign. Therefore, a simpler remedy, such as restarting the Lanczos procedure, may well be adequate. For eigenvalue problems, LookAhead strategies may be more justied.
W

W "U

W ( u (

and

 I X

5 C
T H ( W X

X 

H   W

  1B X  y

u u

 u R  W

5| j yq|  


W u ( X

W "U

u vR !

W "U

r( W R

W "U

 u ( u

  " "

 u

 c

 |5
H  u U W u W"U

w 

H w E 

We present in this section a brief description of the Lanczos method for solving nonsymmetric linear systems. Consider the (single) linear system:

where is and nonsymmetric. Suppose that a guess to the solution is available and let its residual vector be . Then the Lanczos algorithm for solving (7.8) can be described as follows.

Note that it is possible to incorporate a convergence test when generating the Lanczos vectors in the second step without computing the approximate solution explicitly. This is due to the following formula, which is similar to Equation (6.66) for the symmetric case,

and which can be proved in the same way, by using (7.3). This formula gives us the residual norm inexpensively without generating the approximate solution itself.

The Biconjugate Gradient (BCG) algorithm can be derived from Algorithm 7.1 in exactly the same way as the Conjugate Gradient method was derived from Algorithm 6.14. The algorithm was rst proposed by Lanczos [142] in 1952 and then in a different form (Conjugate Gradient-like version) by Fletcher [92] in 1974. Implicitly, the algorithm solves not but also a dual linear system with . This only the original system dual system is often ignored in the formulations of the algorithm.


a q i

1. Compute and 2. Run steps of the nonsymmetric Lanczos Algorithm, i.e., 3. Start with , and any such that 4. Generate the Lanczos vectors , 5. and the tridiagonal matrix from Algorithm 7.1. 6. Compute and .

a q i

5 US

8 @ 5 976 Q

y z v} i | 5  | ky"  vE


8 CD

X W ( ( W (

W "U

u  0 u b u

S 8 D B Q

  y  mg ( v Vm }

( (

U W

7C@ D E 6



u X0

V

 !

 u tA 

 

c z

3 C Q $ RA  5 D

mg l z k }    | y5 B!  y5
X

" t

FS W

W S  Q !  V A  ~h v w

1 2

) 0 (&#$ ' %

The Biconjugate Gradient (BCG) algorithm is a projection process onto

taking, as usual, . The vector is arbitrary, provided , but it is often chosen to be equal to . If there is a dual system to solve with , then is obtained by scaling the initial residual . Proceeding in the same manner as for the derivation of the Conjugate Gradient algorithm from the symmetric Lanczos algorithm, we write the LDU decomposition of as

and dene

Notice that the solution is updatable from in a similar way to the Conjugate Gradient algorithm. Like the Conjugate Gradient algorithm, the vectors and are in the same direction as and , respectively. Hence, they form a biorthogonal sequence. Dene similarly the matrix

Utilizing this information, a Conjugate Gradientlike algorithm can be easily derived from the Lanczos procedure.

, until convergence Do:

Y `

1. Compute 2. Set, 3. For 4. 5. 6. 7. 8.


3 3 3 3 3 3

. Choose

such that

X   (

W !

T Vma

&95 P V2 U7C PTQ S 6 Q @8 5 S @ 6 D E

W 1

W U W"U (  u ( u W"U u S X u u X u T u  u T W"U u u u  u W"U u u ( u   w u u ( u y u u  X u u X ( Y ( $}  T     T  V A  A  ~Q v

Clearly, the column-vectors


 X

of

and those

of

are A-conjugate, since,

rQ0w i

W FS W W X W FS T W

XT

FS W

The solution is then expressed as

r w i

prw i

X 

orthogonally to

I H 3 0A B '  & 8 A P5  8 3  078  E B 5 A

 I

5 C

W W

  1B

5| j yq|  
W ( VWT W TT (

( VTWTT ( W W( W  c 

R !

 ( W  c  



' DB % 0A E ) 5 H

  " "

W U

 Q 

W "U

3 R

!$6 6  
3

 |5
u

1 2

' 7$ % #

If a dual system with is being solved, then in line 1 should be dened as and the update to the dual approximate solution must beinserted after line 5. The vectors produced by this algorithm satisfy a few biorthogonality properties stated in the following proposition.

The vectors produced by the Biconjugate Gradient algorithm satisfy the following orthogonality properties:

The proof is either by induction or by simply exploiting the relations between the vectors , , , , and the vector columns of the matrices , , , . This is left as an exercise. Table 7.1 shows the results of applying the BCG algorithm with no preconditioning to three of the test problems described in Section 3.7. See Example 6.1 for the meaning of the column headers in the table. Recall that Iters really represents the number of matrix-by-vector multiplications rather the number of Biconjugate Gradient steps. Matrix F2DA F3D ORS Iters 163 123 301 Kops 2974 10768 6622 Residual 0.17E-03 0.34E-04 0.50E-01 Error 0.86E-04 0.17E-03 0.37E-02

A test run of BCG without preconditioning.

Thus, the number 163 in the rst line represents 81 steps of BCG, which require matrix-by-vector products in the iteration, and an extra one to compute the initial residual.

The result of the Lanczos algorithm is a relation of the form

W "U

in which

is the

tridiagonal matrix

a w i

I H A B '  & 8 & 8 3

t ( t

for for

wq i aQawq i

BF5 3 76Q&

W "U

u  w

(Y ( Y

8 I B E B Ib 7F 8 B

( X R u ( X R u T T

U W

Xy

C DA  7 

  6 6

A 

) ' 2 )

9. 10. 11. EndDo


C C 

 y5 B! t y5 j|
W "U W"U u S w W "U u  3 W"U u u u S w u 3 u

C DA  7 

u u

Q v A

'

'

Now (7.15) can be exploited in the same way as was done to develop GMRES. If is dened as a multiple of , i.e., if , then the residual vector associated with an approximate solution of the form

The norm of the residual vector is therefore


If the column-vectors of were orthonormal, then we would have , as in GMRES. Therefore, a least-squares solution could be obtained from the Krylov subspace by minimizing over . In the Lanczos algorithm, the s are not orthonormal. However, it is still a reasonable idea to minimize the function

over and compute the corresponding approximate solution . The resulting solution is called the Quasi-Minimal Residual approximation. Thus, the Quasi-Minimal Residual (QMR) approximation from the -th Krylov subspace is obtained as , where minimizes the function , i.e., just as in GMRES, except that the Arnoldi process is replaced by the Lanczos process. Because of the simple structure of the matrix , the following algorithm is obtained, which is adapted from the DQGMRES algorithm (Algorithm 6.13).

is small enough then Stop

W 0

W"U

11. 12. 13. If 14. EndDo

 w

W "U

 R 

W "U

W  R

10.

1. Compute and , 2. For , until convergence Do: 3. Compute and as in Algorithm 7.1 4. Update the QR factorization of , i.e., 5. Apply , to the -th column of 6. Compute the rotation coefcients , by (6.31) 7. Apply rotation , to and , i.e., compute: 8. , 9. , and,

r w i

 t v  r w i

W FS  !

X b T

b w

 Q

W W "U  b "U S  W ` W S b $  A tA

w X b

$ 

FS 

is given by

 I

5 C

  1B

5| j yq|  
b w W "U W



W "U

FS  !

y$

&

FS 

U W

( #$! $ ! E

X b T

 ty 

  " "

W "U

b w

 y ( ( !  tA g ( v9A  ~Q v

W "U

w 

X (



3 C

 |5
3

W"U 3

1 2

' 7$ % #

C
FS

The following proposition establishes a result on the residual norm of the solution. It is similar to Proposition 6.9.

The residual norm of the approximate solution

satises the

relation

According to (7.16) the residual norm is given by

and using the same notation as in Proposition 6.9, referring to (6.37)


 b

The result follows immediately using (7.19). The following simple upper bound for norm:

can be used to estimate the residual

Observe that ideas similar to those used for DQGMRES can be exploited to obtain a better estimate of the residual norm. Also, note that the relation (6.57) for DQGMRES holds. More interestingly, Theorem 6.1 is also valid and it is now restated for QMR.

Assume that the Lanczos algorithm does not break down on or before step and let be the Lanczos basis obtained at step . Let and be the residual norms obtained after steps of the QMR and GMRES algorithms, respectively. Then,

Each step of the Biconjugate Gradient algorithm and QMR requires a matrix-by-vector product with both and . However, observe that the vectors or generated with do not contribute directly to the solution. Instead, they are used only to obtain the scalars needed in the algorithm, e.g., the scalars and for BCG. The question arises

W "U

The proof of this theorem is essentially identical with that of Theorem 6.1. Note that is now known to be of full rank, so we need not make this assumption as in Theorem 6.1.

u S

 (

R

u 

W "U

W U

W R W "U f

S v Emy cm c y

 

W "U

  

W "U

W "U

C DA 

1 P

Av ~

&

in which

by the minimization procedure. In addition, by (6.40) we have

a!w i

a!w i

  0

& b

&

W W 0  "U  

w 0

FS $

W "U

W "U

V A

 b

 "

5 B k| kB5  | q| 


$

tA 

Y ` b

A 

) ' 2 )

Q v A
'

'

'

as to whether or not it is possible to bypass the use of the transpose of and still generate iterates that are related to those of the BCG algorithm. One of the motivations for this question is that, in some applications, is available only through some approximations and not explicitly. In such situations, the transpose of is usually not available. A simple example is when a CG-like algorithm is used in the context of Newtons iteration for solving . The linear system that arises at each Newton step can be solved without having to compute the Jacobian at the current iterate explicitly, by using the difference formula

This allows the action of this Jacobian to be computed on an arbitrary vector . Unfortunately, there is no similar formula for performing operations with the transpose of .

The Conjugate Gradient Squared algorithm was developed by Sonneveld in 1984 [201], mainly to avoid using the transpose of in the BCG and to gain faster convergence for roughly the same computational cost. The main idea is based on the following simple observation. In the BCG algorithm, the residual vector at step can be expressed as

in which is a polynomial of degree . From the algorithm, observe that the directions and are dened through the same recurrences as and in which is replaced by and, as a result,

which indicates that if it is possible to get a recursion for the vectors and , then computing and, similarly, causes no problem. Hence, the idea of seeking an algorithm which would give a sequence of iterates whose residual norms satisfy

The derivation of the method relies on simple algebra only. To establish the desired recurrences for the squared polynomials, start with the recurrences that dene and , which are,

r0 0  i

0  i ra 0  i

 X

u D

X u D T  (p X X u  (  X T u D T T T

Also, note that the scalar

in BCG is given by

pq0  i

where is a certain polynomial of degree satisfying the constraint the conjugate-direction polynomial is given by

. Similarly,

r 0  i

XY

u D

 I X

5 C8 3

5 C
u


 X

W "U u u S w u D X X ( XT  u  u  T X  u D T T

  1B


F A E 6P5 B 8 3  078  5 A

T H

T u

5| j yq|  
' u

(t

t

 (  X  X  u X u (  X  T u D X T u D T T T Tu 

u D

u D

 X

'

T u

X

u S

XT

  " " ' ) E

u D

WU W U u

u D

!  $  6

u 

 |5
u  u

$ Y

u D

t 9 C T


'

( u

w
u

By putting these relations together the following recurrences can be derived, in which the variable is omitted where there is no ambiguity:

If it were not for the cross terms and on the right-hand sides, these equations would form an updatable recurrence system. The solution is to introduce one of these two cross terms, namely, , as a third member of the recurrence. For the other term, i.e., , we can exploit the relation

With this we obtain the following sequence of operations to compute the approximate solution, starting with , , , .

aaa  i a0a  i R`a  i a a  i a 0  i a 0  i

( u X

u 

a 0  i a 0  i a0  i
X

 u W u X

u 
W u S w X

u u S u u W W u u D

u u S w W u u (  u W u W u u u S w u T W "U (  u D X u X T (  XT u ( p X T u D T W "U S w u W u D u W  u u D u S w u D 



T W u


T

 uD
T

 u D
T

X rX

If the above relations are squared we get

X u W "U T u X u X X W "D U u u S wTX  u TX  u D u S T X (  T u u w X  uT D X  u  u   T T XT

It is convenient to dene the auxiliary vector

then the above recurrences for the polynomials translate into

These recurrences are at the basis of the algorithm. If we dene

A slight simplication to the algorithm can be made by using the auxiliary vector . This denition leads to the relations

u S w u u

W "U W U S S  u X u u w u u w u U W  (  ( u u S X uT X W U u T u  u T U u w u u W  W u u W u u  W Wu u S w u u u u  u u S w u u  ( u  X  ( u u  Y` S `      ty  T  Y X T u  W uW u S w u u u

X
w

      

S w

W "U

u u  uu 


S w

u  

W "U Wu "U

u W "U u u D

W "U w u D u S w u

u

u DD
T T

W "U "U u D Wu

u S w X W "U T

 u D X u D
X

u D

T u D X 
T

X T X u X u D T u T

u DX u W "U X T TX T

 u D

u D

X

XT T

WU W"U u u D

 "

5 B k| kB5  | q| 

and as a result the vector


3 1 2 ) ' 7$ % #

is no longer needed. The resulting algorithm is given below.

EndDo

Observe that there are no matrix-by-vector products with the transpose of . Instead, two matrix-by-vector products with the matrix are now performed at each step. In general, one should expect the resulting algorithm to converge twice as fast as BCG. Therefore, what has essentially been accomplished is to replace the matrix-by-vector products with by more useful work. The Conjugate Gradient Squared algorithm works quite well in many cases. However, one difculty is that, since the polynomials are squared, rounding errors tend to be more damaging than in the standard BCG algorithm. In particular, very high variations of the residual vectors often cause the residual norms computed from the result of line 7 of the above algorithm to become inaccurate.

The CGS algorithm is based on squaring the residual polynomial, and, in cases of irregular convergence, this may lead to substantial build-up of rounding errors, or possibly even overow. The Biconjugate Gradient Stabilized (BICGSTAB) algorithm is a variation of CGS which was developed to remedy this difculty. Instead of seeking a method which delivers a residual vector of the form dened by (7.22), BICGSTAB produces iterates whose residual vectors are of the form

in which, as before, is the residual polynomial associated with the BCG algorithm and is a new polynomial which is dened recursively at each step with the goal of stabilizing or smoothing the convergence behavior of the original algorithm. Specically, is dened by the simple recurrence,

a  i

r a  i

 u Xu

( 

% 76 DB % 8 A F )

u DX

y X

  6

X u

W "U

 uD

 u

T  u

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.

Compute Set For

; arbitrary. . , until convergence Do:

 I

5 C

( u u S w u X

  1B

V9@ 58

5| j yq|  
( uT

u u  uu

&95 P V2 U7R 6 D S 6 Q @8 5 S @

S w

U W

W"U W"U u S w u u S w W"U u U u W u T u S w W"U  u u u  ( u  X  (  W"U u S X X u w u T u  u (T W"U u u X u w u T u  w u u u  u u  ( T   ( X u )( y u $}u  X ( Y T  T    V A  A  ~Q v

  " " W "U u

 |5
3

C


in which the scalar is to be determined. The derivation of the appropriate recurrence relations is similar to that of CGS. Ignoring the scalar coefcients at rst, we start with a relation for the residual polynomial . We immediately obtain

Dene,

According to the above formulas, these vectors can be updated from a double recurrence provided the scalars and were computable. This recurrence is

Consider now the computation of the scalars needed in the recurrence. According to the original BCG algorithm, with

To relate the two scalars obtain

and

, expand

explicitly in the power basis, to

Since is orthogonal to all vectors , with , only the leading power is relevant in the expansion on the right side of the above inner product. In particular, if is the leading coefcient for the polynomial , then

W (P G u  u

W U

When examining the recurrence relations for polynomials are found to satisfy the relations

and

, leading coefcients for these

P G

 ( u

w 

 W

W P G uW P G u

(  

uX 

W W W PW r"U G ( P u # G PW G r"U  u u u W "U u D W P G "  ( u W p u  G X u D % X  u D P u T T X u D T  H X  W T P G w  u X  P u G P(  X u D  u T T T u u  u   ( u p

u DX T u

X 

X 

which is computable via

 X

( u p X

u D

u  

u D

Unfortunately, ,

is not computable from these formulas because none of the vectors or is available. However, can be related to the scalar

a a  i a a  i

a  i

which is updatable provided a recurrence relation is found for the products write

. For this,

a a  i a a  i

C  u u

 ( 

X u u

X u

u X

y u  X u # y W "U u D u T u D X  uT u # WX  "U "U W u D uT

u X

W  S w "U  u u # u u # u

u D

p X u X u ( X T u X T u D T T y W

u #W u S w u D u W u T u S w u D u

u TX

X 

( u D  X u D W "U u T  u  T u S T

 "

5 B k| kB5  | q| 

W "U

W"U W "U u

u D

u S

uu

W "U

u 

u #

u D

 X   u

u 

u 

u D  X

u D

u D

and as a result

which yields the following relation for

Similarly, a simple recurrence formula for

can be derived. By denition,

and as in the previous case, the polynomials in the right sides of the inner products in both the numerator and denominator can be replaced by their leading terms. However, in this and are identical, and therefore, case the leading coefcients for

Next, the parameter must be dened. This can be thought of as an additional free parameter. One of the simplest choices, and perhaps the most natural, is to select to achieve a steepest descent step in the residual direction obtained before multiplying the residual vector by in (7.40). In other words, is chosen to minimize the 2norm of the vector . Equation (7.40) can be rewritten as

in which

which yields

W "U

Finally, a formula is needed to update the approximate solution tion (7.40) can be rewritten as

from

. Equa-

ra  i

u # u u  u  u u # u

u u # w u u  w u

( u X u ( X u u

Then the optimal value for

is given by

r0  i

u #

u #

u X

( X u u 

W u #  "W U "U u

u 

 X

u ! u

W "U u

u 

u DX

u #

u X

u #

u #

u # u #

W "U

Since

, this yields,

p  i

 I

5 C

  1B X X

 p ( X (  X X  X  X X  X 

 (t X  X  u X u (t T u T D X X  T u D X T u  T T  u S

 X

 X

5| j yq|  
u # u 


W "u U

u  X T u DX X X u (T  X ( T u p X ( T u D p X T u D(  X T u

u  u #

W "u U

u 

u T u T Tu T T u D T Tu T T u D T

 T X

u S

W "u U

u 

  " " u D u  u  X

u X

 |5
T u

After putting these relations together, we obtain the nal form of the BICGSTAB algorithm, due to van der Vorst [210].

Table 7.2 shows the results of applying the BICGSTAB algorithm with no preconditioning to three of the test problems described in Section 3.7. Matrix F2DA F3D ORS Iters 96 64 208 Kops 2048 6407 5222 Residual 0.14E-02 0.49E-03 0.22E+00 Error 0.77E-04 0.17E-03 0.68E-04

A test run of BICGSTAB with no preconditioning.

See Example 6.1 for the meaning of the column headers in the table. The number of matrix-by-vector multiplications required to converge is larger than with BCG. Thus, using the number of matrix-by-vector products as a criterion, BCG is more expensive than BICGSTAB in all three examples. For problem 3, the number of steps for BCG exceeds the limit of 300. If the number of steps is used as a criterion, then the two methods come very close for the second problem [61 steps for BCG versus 64 for BICGSTAB]. However, BCG is slightly faster for Problem 1. Observe also that the total number of operations favors BICGSTAB. This illustrates the main weakness of BCG as well as QMR, namely, the matrix-by-vector products with the transpose are essentially wasted unless a dual system with must be solved simultaneously.


The Transpose-Free QMR algorithm of Freund [95] is derived from the CGS algorithm. Observe that can be updated in two half-steps in line 6 of Algorithm 7.5, namely, and . This is only natural since the actual update from one iterate to the next involves two matrix-by-vector multiplications, i.e., the

3 I Q $ Q P6G PF '# F C8 0A A 3 I 5 5 3 5 E 3

u u  w i

X u

W U

  6 

A  7 

u u  w u u

1. Compute 2. . 3. For 4. 5. 6. 7. 8. 9. 10. 11. EndDo


3 3 3 3 3 3 3  3

arbitrary;

, until convergence Do:

W u # u u S w P "U  u  W"U u G S T  P S  i G W"U u S ug u # u g W"U u wg  w g u u u W# ( u u u u Xu( u u # X u u u  u  u  ( T   ( uy T u  X u X ( Y ( $v T T       ty  fmm~ A  ~h v

 "

5 B k| kB5  | q| 


3 3 1 2

A  7 

) 0 (&#$ ' %

i

degree of the residual polynomial is increased by two. In order to avoid indices that are multiples of , it is convenient when describing TFQMR to double all subscripts in the CGS algorithm. With this change of notation, the main steps of the Algorithm 7.5 (CGS) become

The initialization is identical with that of Algorithm 7.5. The update of the approximate solution in (7.46) can now be split into the following two half-steps:

With these denitions, the relations (7.517.52) are translated into the single equation

which is valid whether is even or odd. The intermediate iterates , with odd, which are now dened do not exist in the original CGS algorithm. For even values of the sequence represents the original sequence or iterates from the CGS algorithm. It is convenient to introduce the matrix,

Next, a relation similar to the relation (6.5) seen for FOM and GMRES will be ex-

r  i r  i

From the above equation, it is clear that the residual vectors by the relations

are related to the -vectors

r  i  i

 w

t 

The general iterate

satises the relation

X

(  V(

(  

and the

-dimensional vector

W W 

&

( (

 w

for

odd dene:

ra  i

W "U

u 

This can be simplied by dening the vectors for odd sequence of is dened for odd values of as

as

. Similarly, the . In summary,

r  i r  i r  i r  i r  i r  i r  i

r0  i pq  i

 I

5 C
u

S w u u X u uT u ( X u X u w u T X u w u T u T

  1B

W "U u u  w u u u  w u

W "U

5| j yq|  

 (

u 

U S w u U S w u  U  ( u u X  u T  w u u  u u

  ( u

U W U u u

  " " T

U U u u S Uu U u u u u 

 |5

Translated in matrix form, this relation becomes

where

The columns of can be rescaled, for example, to make each of them have a 2-norm equal to one, by multiplying to the right by a diagonal matrix. Let this diagonal matrix be the inverse of the matrix

Then,

With this, equation (7.56) becomes

By analogy with the GMRES algorithm, dene

Similarly, dene to be the matrix obtained from by deleting its last row. It is easy to verify that the CGS iterates (now dened for all integers ) satisfy the same denition as FOM, i.e.,

It is also possible to extract a GMRES-like solution from the relations (7.61) and (7.63), similar to DQGMRES. In order to minimize the residual norm over the Krylov subspace, the 2-norm of the right-hand side of (7.63) would have to be minimized, but this is not practical since the columns of are not orthonormal as in GMRES. However, the 2-norm of can be minimized over , as was done for the QMR and DQGMRES algorithms. This denes the TFQMR iterates theoretically. However, it is now necessary to nd a formula for expressing the iterates in a progressive way. There are two ways to proceed.

R`  i

aa  i a0  i

  i

y ( ( Y ( `

W "U

!

W t& X ( X t X $ 54 ( ( H

W U

 X W

 X

W "U

W "U

W"U W"U W W "U

W "U

"U W

"U W

W W  y  ( y  ( y 

54 H

W U

t 

U W

W "U

. . . . . .

..

..

. . .

a   i

 X

y y y Y

W "U

and where

is the

matrix, . . .

a  i

a  i

&

W H

W "U

( (  $ (

R  R  R y

W U

tracted using the matrix

. As a result of (7.57), the following relation holds:

 "

5 B k| kB5  | q| 


T

The rst follows DQGMRES closely, dening the least-squares solution progressively and exploiting the structure of the matrix to obtain a formula for from . Because of the special structure of , this is equivalent to using the DQGMRES algorithm with . The second way to proceed exploits Lemma 6.1 seen in the previous chapter. This lemma, which was shown for the FOM/GMRES pair, is also valid for the CGS/TFQMR pair. There is no fundamental difference between the two situations. Thus, the TFQMR iterates satisfy the relation

where the tildes are now used to denote the CGS iterate. Setting

The angle used in the

-th rotation, or equivalently

, can be obtained by examining the

The term is the squared tangent of the angle used in the rotation. This tangent will be denoted by , and we have

W "U

 W

( y w

Therefore,

X eX

W W  y w W X W W W T y  w W W W w W T y 

W T

From the above equations, a recurrence relation from of and the above relations yield

can be extracted. The denition

r   i

Now observe from (7.55) that the CGS iterates

satisfy the relation

r   i

 w

the above expression for

becomes

r   i

r   i

 I

5 C

T W

  1B W

5| j yq|  
y

  " "

$
y 

 X

 |5

The diagonal matrix in the right-hand side scales the columns of the matrix. It is easy to see that it has no effect on the determination of the rotations. Ignoring this scaling, the above matrix becomes, after rotations,

The next rotation is then determined by,

The above relations enable us to update the direction and the required quantities and . Since only the squares of these scalars are invoked in the update of the direction , a recurrence for their absolute values is sufcient. This gives the following recurrences which will be used in the algorithm:

Before writing down the algorithm, a few relations must be exploited. Since the vectors are no longer the actual residuals in the algorithm, we change the notation to . These residual vectors can be updated by the formula

a   i

W U

In addition, after this rotation is applied to the above matrix, the diagonal element which is in position is transformed into

W U

u X

A U u W uX

W "U

W "U

..

..

..

..

W  

 R

u uA

W "U

W "U

X X

W U

W U

u w uA X uA

u uA

i

W  W

W "U  W"U W"U A y W " U w  "U W

Ry

H4

..

..

 

W "U

W U

uX

A

X X

W "U

uX w uA "U W uX uA

U W

W "U

uX uA

. . . . . .

. . .

a   i

W"U W "U

W"U W"U A W "U

W "U

uX w uA U W uX

 w Y(

W "U

WX

w 

W U

W Y

uX

X

W "U

W"U

uA

matrix

: . . .

 "

5 B k| kB5  | q| 


$

which are needed in the CGS algorithm. Multiplying (7.50) by

which, upon substituting the relation

Notice that the quantities in the odd loop are only dened for even values of . The residual norm of the approximate solution is not available from the above algorithm as it is described. However, good estimates can be obtained using similar strategies to

11. 12. 13. 14. If is odd then 15. 16. 17. 18. EndIf 19. EndDo

; ;

W W W W "U W U S w W S w U W W"U X S w W W "U W T W U W"U    S  (  X WU W "U W"U ! T W"U W "U W"U "U w W W"U W"U  y U W W "U A W U A A   w    W"U i  W X  w  U T W"U   (    W"U  X ! y ( ( Q ( $ ! ( Y

 

1. Compute , 2. , . 3. Choose such that . 4. For until convergence Do: 5. If is even then 6. 7. 8. EndIf 9. 10.

 (

The rst equation should be used to compute when is odd. In the following algorithm, the normalization each column of to have 2-norm unity, is used.

is even, and the second when , which normalize

Also, observe that the recurrences in (7.45) and (7.49) for become

and

U u

u W u u S w u X

W U U u u S w u u u  u

 (

W "U

 

Y `

translates into

X u

u S w u

U W

u S w u

u S w u

5 58 HYF

Y   X V A T  

(  !  

 q

U W "U u u

D R CA 3  A  6 @8

    

1 2

The vectors

can be used to update the vectors

results in

 I
, respectively,

5 C

  1B

5| j yq|  

  " "

 |5
R

~Q v

A

' 7$ % #

9
!

those used for DQGMRES. Referring to GMRES, an interesting observation is that the recurrence (6.40) is identical with the recurrence of the scalars s. In addition, these two sequences start with the same values, for the s and for the s. Therefore,

Hence, a relation similar to that for DQGMRES holds, namely,

This provides a readily computable estimate of the residual norm. Another point that should be made is that it is possible to use the scalars , in the recurrence instead of the pair , as was done above. In this case, the proper recurrences are

Table 7.3 shows the results when TFQMR algorithm without preconditioning is applied to three of the test problems described in Section 3.7. Matrix F2DA F3D ORS Iters 112 78 252 Kops 2736 8772 7107 Residual 0.46E-04 0.52E-04 0.38E-01 Error 0.68E-04 0.61E-03 0.19E-03

A test run of TFQMR with no preconditioning.

See Example 6.1 for the meaning of the column headers in the table. The number of steps is slightly higher than that of BICGSTAB. Comparing with BCG, we note that each step of BCG requires two matrix-by-vector products compared with one for TFQMR and BICGSTAB. Thus, using the number of matrix-by-vector products as a criterion, BCG is more expensive than TFQMR in all cases, as is shown in the Iters columns. If the number of steps is used as a criterion, then BCG is just slightly better for Problems 1 and 2. A comparison is not possible for Problem 3, since the number of matrix-by-vector products required for convergence exceeds the limit of 300. In general, the number of steps required for convergence is similar for BICGSTAB and TFQMR. A comparison with the methods seen in the previous chapter indicates that in many cases, GMRES will be faster if the problem is well conditioned, resulting in a moderate number of steps required to converge. If many steps (say, in the hundreds) are required, then BICGSTAB and TFQMR may perform better. If memory is not an issue, GMRES or DQGMRES, with a large number of directions, is often the most reliable choice. The issue then is one of trading ribustness for

R`  i

X W 

W "U

W "U

w A  A X U W w A 

 

W "U

 

 X 42  5CI6

W "U

ty 

U W

U W

U W W "U A U W

A  7 

W "U

A  7 

Recall that

is the residual for the

least-squares problem

u A

$ X y

w ! U W

 "

5 B k| kB5  | q| 

memory usage. In general, a sound strategy is to focus on nding a good preconditioner rather than the best accelerator.

1 Consider the following modication of the Lanczos algorithm, Algorithm 7.1. We replace line 6 by

where the scalars are arbitrary. Lines 5 and 7 through 10 remain the same but line 4 in which is computed must be changed.

2 Assume that the Lanczos algorithm does not break down before step , i.e., that it is possible to generate . Show that and are both of full rank. 3 Develop a modied version of the non-Hermitian Lanczos algorithm that produces a sequence of vectors such that each is orthogonal to every with and for all . What does the projected problem become? 4 Develop a version of the non-Hermitian Lanczos algorithm that produces a sequence of vectors which satisfy

is orthogonal to the poly5 Using the notation of Section 7.1.2 prove that nomials , assuming that . Show that if is orthogonalized against , the result would be orthogonal to all polynomials of degree . Derive a general Look-Ahead non-Hermitian Lanczos procedure based on this observation. 6 Consider the matrices and obtained from the Lanczos biorthogonalization algorithm. (a) What are the matrix representations of the (oblique) projector onto orthogonal to the subspace , and the projector onto orthogonally to the subspace ? (b) Express a general condition for the existence of an oblique projector onto , orthogonal to . (c) How can this condition be interpreted using the Lanczos vectors and the Lanczos algorithm? 7 Show a three-term recurrence satised by the residual vectors of the BCG algorithm. Include the rst two iterates to start the recurrence. Similarly, establish a three-term recurrence for the conjugate direction vectors in BCG.

X q3

E 5

G # 9 w 3 # !

BBBB v  A A A HI

9  w 3

X q3

5 #G 5

C BACB0 v C  A A H I

C

q3

but such that the matrix in this situation?

is Hermitian tridiagonal. What does the projected problem become

2 2

2C2

Consider the simplest possible choice, namely, and potential difculties with this choice?

for all

5

 #

Prove that the vectors

s and the matrix

do not depend on the choice of the

. What are the advantages

Show how to modify line 4 to ensure that the vector , for .

is orthogonal against the vectors s.

 I

5 C

  1B

5| j yq|  
v #

v X f  # i  #

9 #

 G3 C

  " "

 A A A 5 CB0B 

AAA C 0BB v C

t ! B0BB !  v !  A A A ! BBB0 # !  v !  A A A

 |5
C
9 v  C q3

vcE~lf

C

  # 

  `

8 Let and be the residual polynomial and the conjugate direction polynomial, respectively, for the BCG algorithm, as dened in Section 7.4.1. Let be any other polynomial sequence which is dened from the recurrence

Show the following relations

Dening,

show how the recurrence relations of the previous question translate for these vectors.

Proceeding as in BICGSTAB, nd formulas for generating the BCG coefcients from the vectors dened in the previous question.

and

9 Prove the expression (7.64) for the CGS approximation dened by (7.547.55). Is the relation ? valid for any choice of scaling

11 The purpose of this exercise is to develop block variants of the Lanczos algorithm. Consider a two-sided analogue of the Block-Arnoldi algorithm, in its variant of Algorithm 6.23. Formally, the general step that denes the biorthogonalization process, for , is as follows: 1. Orthogonalize versus (by subtracting a linear combination from ). Call the resulting vector. of 2. Orthogonalize versus (by subtracting a linear combination of from ). Call the resulting vector. 3. Normalize the two vectors and so that to get and . Here, is the block size and it is assumed that the initial blocks are biorthogonal: for .

Show that needs only to be orthogonalized against the previous s instead of all of them. Similarly, must be orthogonalized only against the previous s. Write down the algorithm completely. Show the orthogonality relations satised by the vectors and . Show also relations similar to (7.3) and (7.4). We now assume that the two sets of vectors and have different block sizes. Call block-size for the s. Line 2 of the above formal algorithm is changed into:

the

 

2a. Orthogonalize

versus

). Call

the resulting vector.

 9 #

5 

 G3 C

! !   9  # q3

!%

! A 5

 9

 A A A C BCB0 C  v C

 G3 C

 A A A C CBBB C  v C

0CBB  v  A A A

"

"

10 Prove that the vectors and when , while the vectors

produced by the BCG algorithm are orthogonal to each other and are -orthogonal, i.e., for .

# 

Find a formula that allows one to update the approximation and dened above.

from the vectors

v  t #

D # #

## 

! eq3 v 9

Show that the polynomials

are consistent, i.e.,

for all

v t  # D # 9 # E 3 #  E # D #  v D t   v # #  # #

99 # Qeq3 #

9 Q 3

9 w 3 #

#  # 

v Tt

v Tt 9fq3 # D 9fq3 # vt Teq3 # 3 9

  9 3 # # Tw 3 # w  E 3 9Q 3 v  9 9  w # 3  w#9  3 v     9Q 3 # 9 

# 9 w 3

9 #

 #  #

v t

A D # # # #   v t  # D # # # 3 #

 ! fq3 D 9 # 9eq3 #  ! 9eq3 v fq3 # 9 #

#D# Dv v t # 3 # v# ##

v E v  #  # # v v v  # # # v  # # 9 #   #  # #  v v  # # #

X v

"

C # v

"

! !

"

BBBB v  A A A # X  A A A BB0B v # C

|5 | y  w  |y|


"
 #  # # 9 w 3 # D t

 #

C 1

! 

5 

w 3 # 9 #

5 !5  

 

 `      ` 

and the rest remains unchanged. The initial vectors are again biorthogonal: for and . Show that now needs only to be orthogonalized against the previous s instead of all of them. Show a simlar result for the s. Show how a block version of BCG and QMR can be developed based on the algorithm resulting from question (c).

N OTES AND R EFERENCES . At the time of this writing there is still much activity devoted to the class of methods covered in this chapter. Two of the starting points in this direction are the papers by Sonneveld [201] and Freund and Nachtigal [97]. The more recent BICGSTAB [210] has been developed to cure some of the numerical problems that plague CGS. There have been a few recent additions and variations to the basic BCG, BICGSTAB, and TFQMR techniques; see [42, 47, 113, 114, 192], among many others. A number of variations have been developed to cope with the breakdown of the underlying Lanczos or BCG algorithm; see, for example, [41, 20, 96, 192, 231]. Finally, block methods have also been developed [5]. Many of the Lanczos-type algorithms developed for solving linear systems are rooted in the theory of orthogonal polynomials and Pad approximation. Lanczos himself certainly used this viewe point when he wrote his breakthrough papers [140, 142] in the early 1950s. The monogram by Brezinski [38] gives an excellent coverage of the intimate relations between approximation theory and the Lanczos-type algorithms. Freund [94] establishes these relations for quasi-minimal residual methods. A few optimality properties for the class of methods presented in this chapter can be proved using a variable metric, i.e., an inner product which is different at each step [21]. A recent survey by Weiss [224] presents a framework for Krylov subspace methods explaining some of these optimality properties and the interrelationships between Krylov subspace methods. Several authors discuss a class of techniques known as residual smoothing; see for example [191, 234, 224, 40]. These techniques can be applied to any iterative sequence to build a new sequence of iterates by combining with the difference . A remarkable result shown by Zhou and Walker [234] is that the iterates of the QMR algorithm can be obtained from those of the BCG as a particular case of residual smoothing. A number of projection-type methods on Krylov subspaces, other than those seen in this chapter and the previous one are described in [1]. The group of rank- update methods discussed by Eirola and Nevanlinna [79] and Deufhard et al. [70] is closely related to Krylov subspace methods. In fact, GMRES can be viewed as a particular example of these methods. Also of interest and not covered in this book are the vector extrapolation techniques which are discussed, for example, in the books Brezinski [38], Brezinski and Radivo Zaglia [39] and the articles [199] and [126]. Connections between these methods and Krylov subspace methods, have been uncovered, and are discussed by Brezinski [38] and Sidi [195].

 9 #

 G3 C

 I

5 C

  1B

5| j yq|  

"

v t

  " "
G

 |5
t v

5 

! E G

 x !

which is Symmetric Positive Denite. This system is known as the system of the normal equations associated with the least-squares problem,

Note that (8.1) is typically used to solve the least-squares problem (8.2) for overdetermined systems, i.e., when is a rectangular matrix of size , . A similar well known alternative sets and solves the following equation for :

pp 3 U i f

 x u x

WBUB

!U
when

uu Ef@%#m eI YFQd buWV csqB7bA ioSRuB9PVUfl`AqF DEAPy fY u ca RrSkRy fYfahACWpDBf9Y uEACGFDa Giiu9YUBp qcYuEFDA F CVI A Y Y Y C c s c CV clCA cA 9 c9 { elV A S A C wY UteDBBfV}DAqiBBe G@ffeB7d GRDbAfADBfsuRQ|DV RryrRBfQuBf9|VmflhAoCDAXEAGs c CAp a C YsF9a c9 8 AQ cI9a Y aF eT e p C F eQ A Y Y C C e Y Da GrI@uvV qcuba c RRiDBpEAGEADBf9WlbAbRBIpbuIvV qcuBQY scvVUrIUAYuB7d c c 9 c9 { c eI YF ssF IA A CF CA Yj Al e YF e A S e c FQ swRDGwfywDBtVGRRHuWV scqB7bRRSoiuUBf9YvxjDBspiuk~rURBf9bBWV csY WBvfVa AlF AT F S 9aF CssF eI YFQdAF CVI A CA A {V IF Y lAI clI AEeR}|DBe z ny cCuqDA usxifVwBf9A`uRFbabAGTmba qcYBhFDvrIubRl cRVupuUe c CV { 9aQ S c YF S YI ca A a A Y eQ A a Cs c lA F 9aF CssF c Y jI X eI YFQ A F CVI A a j e d S DDGtVGRRPe Gf9piDAhYHrbuvV qcuB7fdpiSoRunBf9YmflA RFbkDhigfDAYeuye CFA YI F cQdA A Y A e AQ cI9 Y 9a e A A A S S e F Y c S GbBrIc qDA RBp G7bB9@feBp vVB7d GRDfaA`DBfQBI7GBIvVa cCYURyVqrID`AY wuyiGbBrIc a cCYRyxeqvuPPurIscrYDBqWfV7ihVgfeB7d cGRDabA`7DWDGURRPHEAG@EADB@8 e eCFA A S S wIVI F t CApI aC X AQ I9 YXVCAT SQI F CF CA9

 ) 6  $ 4 2 " '7#53 $ 10)   ( ' &#%#!     $ "    

In order to solve the linear system equivalent system

minimize

is nonsymmetric, we can solve the

Once the solution is computed, the original unknown could be obtained by multiplying by . However, most of the algorithms we will see do not invoke the variable explicitly and work with the original variable instead. The above system of equations can be used to solve under-determined systems, i.e., those systems involving rectangular matrices of size , with . It is related to (8.1) in the following way. Assume that and that has full rank. Let be any solution to the underdetermined system . Then (8.3) represents the normal equations for the least-squares problem, minimize

Since by denition , then (8.4) will nd the solution vector that is closest to in the 2-norm sense. What is interesting is that when there are innitely many solutions to the system , but the minimizer of (8.4) does not depend on the particular used. The system (8.1) and methods derived from it are often labeled with NR (N for Normal and R for Residual) while (8.3) and related techniques are labeled with NE (N for Normal and E for Error). If is square and nonsingular, the coefcient matrices of these systems are both Symmetric Positive Denite, and the simpler methods for symmetric problems, such as the Conjugate Gradient algorithm, can be applied. Thus, CGNE denotes the Conjugate Gradient method applied to the system (8.3) and CGNR the Conjugate Gradient method applied to (8.1). There are several alternative ways to formulate symmetric linear systems having the same solution as the original system. For instance, the symmetric linear system

with , arises from the standard necessary conditions satised by the solution of the constrained optimization problem,

subject to

The solution to (8.5) is the vector of Lagrange multipliers for the above problem. Another equivalent symmetric system is of the form

The eigenvalues of the coefcient matrix for this system are , where is an arbitrary singular value of . Indenite systems of this sort are not easier to solve than the original nonsymmetric system in general. Although not obvious immediately, this approach is similar in nature to the approach (8.1) and the corresponding Conjugate Gradient iterations applied to them should behave similarly. A general consensus is that solving the normal equations can be an inefcient approach in the case when is poorly conditioned. Indeed, the 2-norm condition number of is given by

Now observe that

where

is the largest singular value of

S 2R

a pi b` e&h R aDx` e& h g m u pi R f da ` a ` Ruveb xcBm u&b 0

S R DQ

A 3A

E C

GH

minimize

GF x

GP x GI x

6 x 4 3

 $ &

p20("'&}%#%"p! W 81 )  $ ~ 8 8  8   ~ 8 3 q 5 W C
A E

A 7 A CD7 B@ 97 8

@ @

5 '

X W U Y0VT

  

8 ~

fH

The 2-norm condition number for is exactly the square of the condition number of , which could cause difculties. For example, if originally , then an iterative method may be able to perform reasonably well. However, a condition number of can be much more difcult to handle by a standard iterative method. That is because any progress made in one step of the iterative procedure may be annihilated by the noise due to numerical errors. On the other hand, if the original matrix has a good 2-norm condition number, then the normal equation approach should not cause any serious difculties. In the extreme case when is unitary, i.e., when , then the normal equations are clearly the best approach (the Conjugate Gradient method will converge in zero step!).

When implementing a basic relaxation scheme, such as Jacobi or SOR, to solve the linear system

or

it is possible to exploit the fact that the matrices or need not be formed explicitly. As will be seen, only a row or a column of at a time is needed at a given relaxation step. These methods are known as row projection methods since they are indeed projection methods on rows of or . Block row projection methods can also be dened similarly.

It was stated above that in order to use relaxation schemes on the normal equations, only access to one column of at a time is needed for (8.9) and one row at a time for (8.10). This is now explained for (8.10) rst. Starting from an approximation to the solution of (8.10), a basic relaxation-based iterative procedure modies its components in a certain order using a succession of relaxation steps of the simple form

where is the -th column of the identity matrix. The scalar is chosen so that the -th component of the residual vector for (8.10) becomes zero. Therefore,

' x

) 0 x

uD x

b x

E

B W UP Y 9 @ i H T 9 f e U W H a Y W U T H RP H B E B B @ 9 3rbqphSg3Xdcb`XVQS5QIGFDCA8

a vb`

a ub`
S

X W U 2cVT

X W U Y0VT

which, incidentally, is also equal to the 2-norm of the inverse yields

. Thus, using a similar argument for

U
E 8

% y w a S a S S x ` `

 

i'd u nuzb x0 f a `
S S
% &

% (3 U U

y w x

|#wu!fE!  " 

uvQ t s

  

}W ~8

 & $

X W U f a Y0VT 'd b

8 r r 
62 42 73531

$

 

f

which, setting

, yields,

Denote by

the -th component of . Then a basic relaxation step consists of taking

Also, (8.11) can be rewritten in terms of -variables as follows:

The auxiliary variable has now been removed from the scene and is replaced by the original variable . Consider the implementation of a forward Gauss-Seidel sweep based on (8.15) and (8.13) for a general sparse matrix. The evaluation of from (8.13) requires the inner product of the current approximation with , the -th row of . This inner product is inexpensive to compute because is usually sparse. If an acceleration parameter is used, we only need to change into . Therefore, a forward SOR sweep would be as follows.

1. Choose an initial . 2. For Do: 3. 4. 5. EndDo

Note that is a vector equal to the transpose of the -th row of . All that is needed is the row data structure for to implement the above algorithm. Denoting by the number of nonzero elements in the -th row of , then each step of the above sweep requires operations in line 3, and another operations in line 4, bringing the total to . The total for a whole sweep becomes operations, where represents the total number of nonzero elements of . Twice as many operations are required for the Symmetric Gauss-Seidel or the SSOR iteration. Storage consists of the right-hand side, the vector , and possibly an additional vector to store the 2-norms of the rows of . A better alternative would be to rescale each row by its 2-norm at the start. Similarly, a Gauss-Seidel sweep for (8.9) would consist of a succession of steps of the form

Again, the scalar is to be selected so that the -th component of the residual vector for (8.9) becomes zero, which yields

Gq x 6 x I Ge x F Ge x P Ge x

 $ &

i qS t xS y w  uvs a S S u iS % i x` S y S a SS % ` u S y C 3 C p20("'&}%#%"p! W 8 ~ 81 )  $ ~ 8 8  8   ~ 8  


P )
S P 0)

E a S % a S S y w x` x`

H w

S S w y

2 01# 0

P) Q S P)

@u()n

S 

ut Gs

& % #!      '! $" 5wEn

t EFp t %

S 6 B S @ yw  CDI6HH G )A@9dG 67453S B 8 s DBB% H % G y

H w S H w

p
) P ) HQ P

Then the following algorithm is obtained.

1. Choose an initial , compute 2. For Do: 3. 4. 5. 6. EndDo

In contrast with Algorithm 8.1, the column data structure of is now needed for the implementation instead of its row data structure. Here, the right-hand side can be overwritten by the residual vector , so the storage requirement is essentially the same as in the previous case. In the NE version, the scalar is just the -th component of the current residual vector . As a result, stopping criteria can be built for both algorithms based on either the residual vector or the variation in the solution. Note that the matrices and can be dense or generally much less sparse than , yet the cost of the above implementations depends only on the nonzero structure of . This is a signicant advantage of relaxation-type preconditioners over incomplete factorization preconditioners when using Conjugate Gradient methods to solve the normal equations. One question remains concerning the acceleration of the above relaxation schemes by under- or over-relaxation. If the usual acceleration parameter is introduced, then we only have to multiply the scalars in the previous algorithms by . One serious difculty here is to determine the optimal relaxation factor. If nothing in particular is known about the matrix , then the method will converge for any lying strictly between and , as was seen in Chapter 4, because the matrix is positive denite. Moreover, another unanswered question is how convergence can be affected by various reorderings of the rows. For general sparse matrices, the answer is not known.

In a Jacobi iteration for the system (8.9), the components of the new iterate satisfy the following condition:

This yields

or

x ' 0 x
E E

a S % S S y

E a S a S 7S y w o`( x` %

R U a H XbY hf B W P hf U f P

With

, this becomes

which yields

E C

200 #
a S % `

a S S f% u ` S y a S % a S C S y C ` `  C @u( o  S
E S
y

a S f% a S S w o( ` ` y

  

}W ~8
534 31 42 2

& % !    I!# "twh n

 & $

S S y C  C S 7S 6 y @ w  I6HH G @ C G8 iS s% E DBt Dt % H % G H y

8 r r 
C

U
C

Here, be aware that these equations do not result in the same approximation as that produced by Algorithm 8.2, even though the modications are given by the same formula. Indeed, the vector is not updated after each step and therefore the scalars are different for the two algorithms. This algorithm is usually described with an acceleration parameter , i.e., all s are multiplied uniformly by a certain . If denotes the vector with coordinates , the following algorithm results.

1. Choose initial guess . Set 2. Until convergence Do: 3. For Do: 4. 5. EndDo 6. where 7. 8. EndDo

Notice that all the coordinates will use the same residual vector to compute the updates . When , each instance of the above formulas is mathematically equivalent to performing a projection step for solving with , and . It is also mathematically equivalent to performing an orthogonal projection step for solving with . It is interesting to note that when each column is normalized by its 2-norm, i.e., if , then . In this situation,

and the main loop of the algorithm takes the vector form

Each iteration is therefore equivalent to a step of the form

which is nothing but the Richardson iteration applied to the normal equations (8.1). In particular, as was seen in 4.1, convergence is guaranteed for any which satises,

GG x

30

( )S

& 'W

% #! $"

aS %

7 56 w vus t g C  C w  C a ` C m x` a S % C ` S y

C c S

 f C   S m S S   C y w s 6 @ HH 6 @ C G8 S E IG t stD BD% G y %  U3 %    C R(    wEn     BDB% G % S y %

in which is the old residual is given by

. As a result, the -component of the new iterate

qW x G) x

ut Gs

 $ &

p20("'&}%#%"p! W 81 )  $ ~ 8 8  8   ~ 8 a S f S f% u ` S Cy %S S S C S y w uvts U%@


pi e&h 8

( 4S

& 3W

% #! $2

sDBB% G H% G Uu S u % z

   G

8 ~
S
y

where is the largest eigenvalue of is given by

. In addition, the best acceleration parameter

in which, similarly, is the smallest eigenvalue of . If the columns are not normalized by their 2-norms, then the procedure is equivalent to a preconditioned Richardson iteration with diagonal preconditioning. The theory regarding convergence is similar but involves the preconditioned matrix or, equivalently, the matrix obtained from by normalizing its columns. The algorithm can be expressed in terms of projectors. Observe that the new residual satises

There are two important variations to the above scheme. First, because the point Jacobi iteration can be very slow, it may be preferable to work with sets of vectors instead. Let be a partition of the set and, for each , let be the matrix obtained by extracting the columns of the identity matrix whose indices belong to . Going back to the projection framework, dene . If an orthogonal projection method is used onto to solve (8.1), then the new iterate is given by

An interpretation of this indicates that each individual substep attempts to reduce the residual as much as possible by taking linear combinations from specic columns of . Similar to the scalar iteration, we also have

where now represents an orthogonal projector onto the span of . Note that is a partition of the column-set and this partition can be arbitrary. Another remark is that the original Cimmino method was formulated

S 

C )00)0)C f S ( S S

&

S

Wu S

 S

( & '%

Each individual block-component

can be obtained by solving a least-squares problem

F x

P x I x

 "

 !

C S qea S S x` C f d

 

! S #

 S

( %

BDB%

 S

S !f'd a S $ ! S

S S !

 

&

uvs C t

ut Gs C

S c

is an orthogonal projector onto

, the -th column of . Hence, we can write

6 R! x

 a S f S % u ` 

! `

Each of the operators

D x

S a S S % u `

pei&h w S 8 s h 8 H

 S

  

}W ~8
uvs C t

 & $

f% DBBUf% f %

S s h 8

8 r r 
 !
ut Gs

 %

e&h 8 pi S

BDB%  % f 

for rows instead of columns, i.e., it was based on (8.1) instead of (8.3). The alternative algorithm based on columns rather than rows is easy to derive.

A popular combination to solve nonsymmetric linear systems applies the Conjugate Gradient algorithm to solve either (8.1) or (8.3). As is shown next, the resulting algorithms can be rearranged because of the particular nature of the coefcient matrices.

We begin with the Conjugate Gradient algorithm applied to (8.1). Applying CG directly to the system and denoting by the residual vector at step (instead of ) results in the following sequence of operations:

If the original residual must be available at every step, we may compute the residual in two parts: and then which is the residual for the normal equations (8.1). It is also convenient to introduce the vector . With these denitions, the algorithm can be cast in the following form.

In Chapter 6, the approximation produced at the -th step of the Conjugate Gradient algorithm was shown to minimize the energy norm of the error over an afne Krylov

1. Compute 2. For 3. 4. 5. 6. 7. 8. 9. 10. EndDo

, , , until convergence Do:

 $ & f

p20("'&}%#%"p! W 81 )  $ ~ 8 8  8   ~ 8
S C S

v " "  fEHw5x ovVmuU XfP

S

a   %   a  P %  P `   ` 

    

ecAC W 8



C

6 331 2 2

f f S S S P S  w f P S sS P pf  S  S#f 2P  f S S S  C S S f S "S  2  2C  S C S w z S  r fS S  P S S  BDB% E    P  C    n ! " wEn

  C S S C f f    w f  P5      a P`  f  P %  $a  P %  P ` f      P        f  P       a  %   x` a w P %  $P`     

   f S

8 ~

S 

S 

over all vectors

in the afne Krylov subspace

in which is the initial residual with respect to the original equations , is the residual with respect to the normal equations . However, and observe that

Therefore, CGNR produces the approximate solution in the above subspace which has the smallest residual norm with respect to the original linear system . The difference with the GMRES algorithm seen in Chapter 6, is the subspace in which the residual norm is minimized. Table 8.1 shows the results of applying the CGNR algorithm with no preconditioning to three of the test problems described in Section 3.7. Matrix F2DA F3D ORS Iters 300 300 300 Kops 4847 23704 5981 Residual 0.23E+02 0.42E+00 0.30E+02 Error 0.62E+00 0.15E+00 0.60E-02

A test run of CGNR with no preconditioning.

See Example 6.1 for the meaning of the column headers in the table. The method failed to converge in less than 300 steps for all three problems. Failures of this type, characterized by very slow convergence, are rather common for CGNE and CGNR applied to problems arising from partial differential equations. Preconditioning should improve performance somewhat but, as will be seen in Chapter 10, normal equations are also difcult to precondition.

A similar reorganization of the CG algorithm is possible for the system (8.3) as well. Applying the CG algorithm directly to (8.3) and denoting by the conjugate directions, the actual CG iteration for the variable would be as follows:

a 

f f a  % a  %  `  `    C C C C f      C  f  C    w     ` ` %  a  C %  C ` a  !%  xa  C %  C `   

H W 8 cA

5331 42 2

subspace. In this case,

minimizes the function

HBza a Ux` a o` ` a o` 5 % 5  U  }3  C 3 C % (  f'd b % a % ` DBD%  C U %  C %$#! w  a  C f ` h w  h C &3W a a 5 o` % a 5 o`( x` a x`


 5    

 $ 

p &0 8 1)

'& $  4  " 1 Q$  $ $ 

8 p

   5 

Notice that an iteration can be written with the original variable by introducing the vector . Then, the residual vectors for the vectors and are the same. No longer are the vectors needed because the s can be obtained as . The resulting algorithm described below, the Conjugate Gradient for the normal equations (CGNE), is also known as Craigs method.

1. Compute 2. For 3. 4. 5. 6. 7. 8. EndDo

, . until convergence Do:

We now explore the optimality properties of this algorithm, as was done for CGNR. The approximation related to the variable by is the actual -th CG approximation for the linear system (8.3). Therefore, it minimizes the energy norm of the error on the Krylov subspace . In this case, minimizes the function

over all vectors

in the afne Krylov subspace,

Notice that

. Also, observe that

where

. Therefore, CGNE produces the approximate solution in the subspace

which has the smallest 2-norm of the error. In addition, note that the subspace is identical with the subspace found for CGNR. Therefore, the two methods nd approximations from the same subspace which achieve different optimality properties: minimal residual for CGNR and minimal error for CGNE.

a 

S S `

 3

 $ &

p20("'&}%#%"p! W 81 )  $ ~ 8 8  8   ~ 8 f 7(  C d
w

 

S
S  h

a  % x` C a   ` h w a  C % x` h w  %  C % 5 qza a 5 ` f% a P5 ` x` a `  U  U  a Ux` % DBB%  C %  C %$#! w  a  C % U` h C w  h & W a a 5 ` a 5 ` U` % a `

i h

f f  S  S S S w  a S % S a f S C % f S ` S `  C C 2 S  C  C f S  S C f S S "S  2 2C  S S w a S GS  a S % G S ` S  % `   % DBBC % % C      C E C &90 %D! m ! " wEn    f f  S  

1
h

S 

 

 

   h
w

8 ~
  C
w

  C

 

Now consider the equivalent system

with . This system can be derived from the necessary conditions applied to the constrained least-squares problem (8.68.7). Thus, the 2-norm of is minimized implicitly under the constraint . Note that does not have to be a square matrix. This can be extended into a more general constrained quadratic optimization problem as follows:

The necessary conditions for optimality yield the linear system

in which the names of the variables are changed into for notational convenience. It is assumed that the column dimension of does not exceed its row dimension. The Lagrangian for the above optimization problem is

and the solution of (8.30) is the saddle point of the above Lagrangian. Optimization problems of the form (8.288.29) and the corresponding linear systems (8.30) are important and arise in many applications. Because they are intimately related to the normal equations, we discuss them briey here. In the context of uid dynamics, a well known iteration technique for solving the linear system (8.30) is Uzawas method, which resembles a relaxed block SOR iteration.

The algorithm requires the solution of the linear system

at each iteration. By substituting the result of line 3 into line 4, the

iterates can be

ui x

 

G % %  %  E

1. Choose 2. For 3. 4. 5. EndDo

until convergence Do:

) x

aa

v `

% `

7 A 

a % `

% b o a s` GH a % o`

&  0 

A @

s% C

a  f   `  w  f  f a  ` d f   BDB  

subject to

a ` %

GH

a `

minimize

' x x

p 
C

f o @a % Ux`

A E 7 A C

R o

B A @ 8

u iw o%&)v" "V (


E

 Y0' 

%     # %  "wh n

 r 8 $ p wRc"0
  

which is nothing but a Richardson iteration for solving the linear system

Apart from a sign, this system is the reduced system resulting from eliminating the variable from (8.30). Convergence results can be derived from the analysis of the Richardson iteration.

The proof of this result is straightforward and is based on the results seen in Example 4.1. It is interesting to observe that when and is Symmetric Positive Denite, then the system (8.32) can be regarded as the normal equations for minimizing the -norm of . Indeed, the optimality conditions are equivalent to the orthogonality conditions

which translate into the linear system . As a consequence, the problem will tend to be easier to solve if the columns of are almost orthogonal with respect to the inner product. This is true when solving the Stokes problem where represents the discretization of the gradient operator while discretizes the divergence operator, and is the discretization of a Laplacian. In this case, if it were not for the boundary conditions, the matrix would be the identity. This feature can be exploited in developing preconditioners for solving problems of the form (8.30). Another particular case is when is the identity matrix and . Then, the linear system (8.32) becomes the system of the normal equations for minimizing the 2-norm of . These relations provide insight in understanding that the block form (8.30) is actually a form of normal equations for solving in the least-squares sense. However, a different inner product is used. In Uzawas method, a linear system at each step must be solved, namely, the system (8.31). Solving this system is equivalent to nding the minimum of the quadratic function

Apart from constants, is the Lagrangian evaluated at the previous iterate. The solution of (8.31), or the equivalent optimization problem (8.34), is expensive. A common alternative replaces the -variable update (8.31) by taking one step in the gradient direction

GH

` a

minimize

6 x

f d

qa 

f% x` a % `

f d

% 

a pi ` e&h 8

 

In addition, the optimal convergence parameter

is given by

GG x

Let full rank. Then converges, if and only if

be a Symmetric Positive Denite matrix and a matrix of is also Symmetric Positive Denite and Uzawas algorithm

GG x

a 

R f'd  ` d 5 f

a pi ` e&h 8

eliminated to obtain the following relation for the

s,

 $ &

p20("'&}%#%"p! W 81 )  $ ~ 8 8  8   ~ 8

H
w


a "` S h s 8

 

f d

a 

% `

f d

o 

f 'd

f d 5 w 

a `

x 

  

8 ~

f d

  

for the quadratic function (8.34), usually with xed step-length . The gradient of at the current iterate is . This results in the Arrow-Hurwicz Algorithm.

1. Select an initial guess to the system (8.30) 2. For until convergence Do: 3. Compute 4. Compute 5. EndDo

The above algorithm is a block-iteration of the form

Uzawas method, and many similar techniques for solving (8.30), are based on solving the reduced system (8.32). An important observation here is that the Schur complement matrix need not be formed explicitly. This can be useful if this reduced system is to be solved by an iterative method. The matrix is typically factored by a Cholesky-type factorization. The linear systems with the coefcient matrix can also be solved by a preconditioned Conjugate Gradient method. Of course these systems must then be solved accurately. Sometimes it is useful to regularize the least-squares problem (8.28) by solving the following problem in its place:

in which is a scalar parameter. For example, can be the identity matrix or the matrix . The matrix resulting from the Lagrange multipliers approach then becomes

The new Schur complement matrix is

In the case where

However, it is also negative denite for

piG % b ` e&h 8 a

a b` Gs S

h 8



Assuming that

is SPD,

is also positive denite when

u!d ` af 8 g f v 'd 

, the above matrix takes the form

subject to

a %

a !% `

GH

` a

minimize

a `

o 
A

 

 ` w

  

a  f   `   f  a  `  w  f      G w  % BDB% %  E  %     !" %  # ! ( # '! !   wh n  0     A

f x a % `

b@

8 7

A f

 ` 
f    

 Y0' 

 r 8 $ p wRc"0
A 8 @

f d

t 

  

a condition which may be easier to satisfy on practice.

1 Derive the linear system (8.5) by expressing the standard necessary conditions for the problem (8.68.7).

What does this become in the general situation when

Is Cimminos method still equivalent to a Richardson iteration? Show convergence results similar to those of the scaled case.

3 In Section 8.2.2, Cimminos algorithm was derived based on the Normal Residual formulation, i.e., on (8.1). Derive an NE formulation, i.e., an algorithm based on Jacobis method for (8.3). 4 What are the eigenvalues of the matrix (8.5)? Derive a system whose coefcient matrix has the form

and which is also equivalent to the original system Plot the spectral norm of as a function of .

. What are the eigenvalues of

the system (8.32) is nothing but the normal 5 It was argued in Section 8.4 that when equations for minimizing the -norm of the residual . Write the associated CGNR approach for solving this problem. Find a variant that requires at each CG step [Hint: Write the CG only one linear system solution with the matrix algorithm for the associated normal equations and see how the resulting procedure can be reorganized to save operations]. Find also a variant that is suitable for the case where the Cholesky factorization of is available. Derive a method for solving the equivalent system (8.30) for the case when and then for the general case wjen . How does this technique compare with Uzawas method?

Show that the unknown

in which the coefcient matrix is singular but the system is consistent, i.e., there is a nontrivial solution because the right-hand side is in the range of the matrix (see Chapter 1).

What must be done toadapt the Conjugate Gradient Algorithm for solving the above linear system (which is symmetric, but not positive denite)? In which subspace are the iterates generated from the CG algorithm applied to (8.35)?

GF x

  b

h d b e

Show that of ?

is a projector. Is it an orthogonal projector? What are the range and null spaces can be found by solving the linear system

 $P R

9)4

6 Consider the linear system (8.30) in which

and

is of full rank. Dene the matrix

9 75 @8F4

H X

( )  0

V W4

T U

h e

h d e

A E

7 D 7 B7 C A

T 4cF4 G C 5 H a

2 It was stated in Section 8.2.2 that when Algorithm 8.3 is equal to .

for

, the vector

 $ &
dened in ?

p20("'&}%#%"p! W 81 )  $ ~ 8 8  8   ~ 8
  

 

e 9@864 75

H I

( YG H `

R P SQ

# $

9 75 @864

"

  

8 ~
b

twS
b &32 &1 & '% & '% & '% &1 &1 &32

Assume that the QR factorization of the matrix is computed. Write an algorithm based on the approach of the previous questions for solving the linear system (8.30).

7 Show that Uzawas iteration can be formulated as a xed-point iteration associated with the splitting with

Derive the convergence result of Corollary 8.1 . 8 Show that each new vector iterate in Cimminos method is such that

where

is dened by (8.24).

9 In Uzawas method a linear system with the matrix must be solved at each step. Assume that these systems are solved inaccurately by an iterative process. For each linear system the iterative process is applied until the norm of the residual is less than a . certain threshold

Give an explicit upper bound of the error on , where .

in the case when with

is chosen of the form . Let be the , where is an

N OTES AND R EFERENCES . Methods based on the normal equations have been among the rst to be used for solving nonsymmetric linear systems [130, 58] by iterative methods. The work by Bjork and Elng [27], and Sameh et al. [131, 37, 36] revived these techniques by showing that they have some advantages from the implementation point of view, and that they can offer good performance for a broad class of problems. In addition, they are also attractive for parallel computers. In [174], a few preconditioning ideas for normal equations were described and these will be covered in Chapter 10. It would be helpful to be able to determine whether or not it is preferable to use the normal equations approach rather than the direct equations for a given system, but this may require an eigenvalue/singular value analysis. It is sometimes argued that the normal equations approach is always better, because it has a robust quality which outweighs the additional cost due to the slowness of the method in the generic elliptic case. Unfortunately, this is not true. Although variants of the Kaczmarz and Cimmino algorithms deserve a place in any robust iterative solution package, they cannot be viewed as a panacea. In most realistic examples arising from Partial Differential Equations, the normal equations route gives rise to much slower convergence than the Krylov subspace approach for the direct equations. For ill-conditioned problems, these methods will simply fail to converge, unless a good preconditioner is available.

d T Qh  e

10 Assume minimizer and arbitrary scalar?

is to be minimized, in which is . What is the minimizer of

71 0

"

Assume that is chosen so that (8.33) is satised and that converges to zero as innity. Show that the resulting algorithm converges to the solution.

tends to

d
R
  & ' )(

d T

T # 9 7

9 W4 V

CT

 2 & $ 5 %

 8 b #

h e

E 7

e z

R  SP

  #

"

 

CE

d eud

"

 c"

 !

8 }$
7 " #

d T Qh

 (c2 " $  

& & % &1

Lack of robustness is a widely recognized weakness of iterative solvers, relative to direct solvers. This drawback hampers the acceptance of iterative methods in industrial applications despite their intrinsic appeal for very large linear systems. Both the efciency and robustness of iterative techniques can be improved by using preconditioning. A term introduced in Chapter 4, preconditioning is simply a means of transforming the original linear system into one which has the same solution, but which is likely to be easier to solve with an iterative solver. In general, the reliability of iterative techniques, when dealing with various applications, depends much more on the quality of the preconditioner than on the particular Krylov subspace accelerators used. We will cover some of these preconditioners in detail in the next chapter. This chapter discusses the preconditioned versions of the Krylov subspace algorithms already seen, using a generic preconditioner.

uv " EX#o xD`AqRBDHusyBBf9PIc lbEADBBfVG7 ifeB7d GIRDbAUtuIGvV qcY C YsF9a Y AI A Y CAp a AT c { AQ c 9a Y cI w WBvVbEAGuElGRBR`FYUBrbAuUetDBWV scY WBWfVabEACGiGF RDa scrYGBs B9qWGTRv qbse clI a Cs CFlI e A9 8 l eQ CAI clI s C Q CF A Y YQV F a caA t cAT YQV c { YQT jIA e ylF F elV A S A Y C c A Y XV eI CAp lAI c urIqPqWuf9Y @@qRkiDbAGBbEAC iiuf9sYBp qcuEFDAY Bf9}DuvV csteDBbBWV csY Wl wI a Cs A Y eQ cl C YsF9a 8 eI Y ssF A Y c elV A aF qvfVbEADB9fe`AeuB`ae WpDAqiBBHe cG9@buWV scqFba c RRi`AfeBf9rIiu9YSfADBfseuTRQe p C XV A a e A Y C X YI c t c F c t cI clI a C I BV RryD|efebfaBfQmB9RVqDA WlbEACvurIyAe urIGWV scY WBvfVbEAk#ivV qcYuF RQrSqce A c Al cI a A CV c SFI Q e 9a e eI YF ss F cs Y S X C ba GspRa GvtVCYbA DRPfea RBRyGl l cGpBFmDBQuWV scqb ca RRRF Rba GRoy3vtVCAe cGF 9 c9 { e S T Cs C X AaIA CAp a { S X C e Y F CF A Yj Da GDA RWtVGiihVbBDGEtBBqIvfVuV fe WVtCpDArfQVy DAc v iAhGUysBf9Gqy ifFa w scsYhARfVB9|flRBRvVR DihAGteDAqiBDuWV cGpEACG|rIwDb`Aiu9YwB9YvtuvVu9Y @ C A Y AlIQ X A { CF C YsF9a eQ s c IA e elV A S A 9 Q

 ) 6  '7#$ &R6   ) 6  &(7R6  )01!! 

Consider a matrix that is symmetric and positive denite and assume that a preconditioner is available. The preconditioner is a matrix which approximates in some yet-undened sense. It is assumed that is also Symmetric Positive Denite. From a practical point of view, the only requirement for is that it is inexpensive to solve linear systems . This is because the preconditioned algorithms will all require a linear system solution with the matrix at each step. Then, for example, the following preconditioned system could be solved:

or

Note that these two systems are no longer symmetric in general. The next section considers strategies for preserving symmetry. Then, efcient implementations will be described for particular forms of the preconditioners.

When

is available in the form of an incomplete Cholesky factorization, i.e., when

then a simple way to preserve symmetry is to split the preconditioner between left and right, i.e., to solve

which involves a Symmetric Positive Denite matrix. However, it is not necessary to split the preconditioner in this manner in order to preserve symmetry. Observe that is self-adjoint for the -inner product,

Therefore, an alternative is to replace the usual Euclidean inner product in the Conjugate Gradient algorithm by the -inner product. If the CG algorithm is rewritten for this new inner product, denoting by the original residual and by the residual for the preconditioned system, the following sequence of operations is obtained, ignoring the initial step:

f  !     a  %   f'd   a w P %  P `       `   C f'd   P  U%H3  C  a q'd % x` a bud ` % x` a % x` a % `  a ud ` f a f % f

since

uW'

 W'

 W'

e H f B 8 eHBHe bY hf bW P IID33

% f d  (uf'd  d  ud

$u " ovVm uU qP " nfhtvgP!U "

d (3 d % f f f f Bd ud

% x` a % g` a % x` f d 

8 "#"p 1 I& $   8 $
%

 

6 34 3 2 2

 $ 

8 #$    g

    w  a   %   a  %  ` `  C C

     
Similarly,

and a similar algorithm can be written for this dot product (see Exercise 1). In the case where is a Cholesky product , two options are available, namely, the split preconditioning option (9.3), or the above algorithm. An immediate question arises about the iterates produced by these two options: Is one better than the other? Surprisingly, the answer is that the iterates are identical. To see this, start from Algorithm 9.1 and dene the following auxiliary vectors and matrix from it:

Since and , the -inner products do not have to be computed explicitly. With this observation, the following algorithm is obtained.

All the steps of the algorithm can be rewritten with the new variables, yielding the following sequence of operations:

qa   %   ` a   %   d ua  C %  C ` a  C d f
%

f  C d  `

d  !a   d  %   d  x` a  %  x` `  f a  C f'd  d  %  C ` a  P %  C `

d  f'd  f  C d   P   C      
 

'd f

 %o` a f'd f% o` a 'd ` @ a % 'd ` % f f    f d

0 D%! 0 %  &

a  %   `

P 

 f'd  P  U3   C C & 0  &  '!  5 wEn 0   

x 
f

 C d  f

a  %   f'd `   a  P   a P % $ a f P` f  P    

 

%  ` a  P %  $` P fC f  P     w f   P %  P ` f     C   C 

 $ &

p8 & 8    8  $ $

8 ~

1. Compute 2. For 3. 4. 5. 6. 7. 8. 9. EndDo

, , and , until convergence Do:

 

f f    w f  P      a `  f  P %  a  P %  ` C f  d   C f   P   C f   f       C  f  C    a   %   x`a P % `     w    DBB% G C % E m 

Observe that

It is interesting to observe that product. Indeed,

and

is also self-adjoint with respect to the

inner-

  

This is precisely the Conjugate Gradient algorithm applied to the preconditioned system

where . It is common when implementing algorithms which involve a right preconditioner to avoid the use of the variable, since the iteration can be written with the original variable. If the above steps are rewritten with the original and variables, the following algorithm results.

The iterates produced by the above algorithm and Algorithm 9.1 are identical, provided the same initial guess is used. is not HermiConsider now the right preconditioned system (9.2). The matrix tian with either the Standard inner product or the -inner product. However, it is Hermitian with respect to the -inner product. If the CG-algorithm is written with respect to the -variable and for this new inner product, the following sequence of operations would be obtained, ignoring again the initial step:

Recall that the vectors and the vectors are related by . Since the vectors are not actually needed, the update for in the second step can be replaced by . Then observe that the whole algorithm can be recast in terms of and .

and

f

 

f d

f 'd %

 

1. Compute 2. For 3. 4. 5. 6. 7. 8. EndDo

; ; and , until convergence Do:

0 D%! 0 %  &

d  

8 "#"p 1 I& $   8 $
f  f  C d f d 

f f      w  d  a  %  a f  C % f   `   `    C f C C C f   'd     C  f  C       w  a  %  xa % `   `  C  DB D% G C % E n   fd   C    C C ! 0  &  '!  2 t wh n 0    

a  %  P a  %  $`  `    P f  P C    C  f    C  f  C     w    ! a `   !%  xa  %  P `      f f C  d  P   d C f   'd   w  f   f f    a  %    a f w % f  C  `    ` C C f C C f      d    C   f C !  f  a  %   d x     a w C  %  C `      ` 

f    w  C  a  % a f  % f  ` `   C  C C  C      C
 

f d

 $ 


8 #$   
f




   f    C

  

Notice that the same sequence of computations is obtained as with Algorithm 9.1, the left preconditioned Conjugate Gradient. The implication is that the left preconditioned CG algorithm with the -inner product is mathematically equivalent to the right preconditioned CG algorithm with the -inner product.

When applying a Krylov subspace procedure to a preconditioned linear system, an operation of the form

or some similar operation is performed at each step. The most natural way to perform this operation is to multiply the vector by and then apply to the result. However, since and are related, it is sometimes possible to devise procedures that are more economical than this straightforward approach. For example, it is often the case that

in which the number of nonzero elements in simplest scheme would be to compute

is much smaller than in . In this case, the as

This requires that be stored explicitly. In approximate factorization techniques, is the matrix of the elements that are dropped during the incomplete factorization. An even more efcient variation of the preconditioned Conjugate Gradient algorithm can be derived for some common forms of the preconditioner in the special situation where is symmetric. Write in the form

in which is the same as above and is some diagonal, not necessarily equal to . For example, in the SSOR preconditioner with , . Also, for certain types of matrices, the IC(0) preconditioner can be expressed in this manner, where can be obtained by a recurrence formula. Eisenstats implementation consists of applying the Conjugate Gradient algorithm to the linear system

with

GF v'

GI v'

GP v'

'd a ! &` (% ea ! f f d

in which is the strict lower triangular part of the preconditioner can be written in the form

and

its diagonal. In many cases,

6 v'

f 'd

B XU P QQCIhIDC 5P W Y 9 Y W H f H T f

f d

 $ &

G # f ! ` d a ! ` ! ! P

p8 & 8    8  $ $
f a ` d d f w f'd  (

Bf'd a !

f d

`

CI5 P H Y W HP

` f ( 'd a !

f 'd

 

42 42 5353

`

 f 

 

8 ~

P 

 
!

 

This does not quite correspond to a preconditioning with the matrix (9.5). In order to produce the same iterates as Algorithm 9.1, the matrix must be further preconditioned with the diagonal matrix . Thus, the preconditioned CG algorithm, Algorithm 9.1, is actually applied to the system (9.6) in which the preconditioning operation is . Alternatively, we can initially scale the rows and columns of the linear system and preconditioning to transform the diagonal to the identity. See Exercise 6. Now note that

in which

. As a result,

Thus, the vector

can be computed by the following procedure:

1. 2. 3. 4.

Note that the matrices and are not the transpose of one another, so we actually need to increase the storage requirement for this formulation if these matrices are stored. However, there is a more economical variant which works with the matrix and its transpose. This is left as Exercise 7. Denoting by the number of nonzero elements of a sparse matrix , the total number of operations (additions and multiplications) of this procedure is for (1), for (2), for (3), and for (4). The cost of the preconditioning operation by , i.e., operations, must be added to this, yielding the total number of operations:

For the straightforward approach,

operations are needed for the product with

a ! 0 ` 

f ! d

One product with the diagonal are stored. Indeed, by setting reformulated as follows.

can be saved if the matrices and and , the above procedure can be

f d

f ! f'd ! 'd P w    f f a f a P ` d (! d 8 `   w d a f f ! d f d 8 `  P          % 2    wh n f f f'd f 'd f ! d P w   f a Pf ` 'd a ! `   w f  d a ! `  P  f f a f f'd a ! ` w d a ! ` w d (! ` U f H % f ('d a &` f d (! ` f a fd a  5 d (! ` f a ! ` f a  cfd a(! ` ` Bd (! ` 3

! ` fea ! d ` f'd a ! w w `7$a ! ` a ! ` w w H f d ea ! ` a ! ! f'd a !

aD `   H a b`   " # H w a  a ! `   a(`  $` !  " # w w H w a c !`  a ! `0  H w H w H w

8 "#"p 1 I& $   8 $ x u x i i

 $  f d

a 4!`  

8 #$   
w

f 'd

`  ! 

f ! d

f d

for the forward solve, and

for the backward solve giving a total of

Thus, Eisenstats scheme is always more economical, when is large enough, although the relative gains depend on the total number of nonzero elements in . One disadvantage of this scheme is that it is limited to a special form of the preconditioner. For a 5-point nite difference matrix, is roughly , so that with the standard implementation operations are performed, while with Eisenstats implementation only operations would be performed, a savings of about . However, if the other operations of the Conjugate Gradient algorithm are included, for a total of about operations, the relative savings become smaller. Now the original scheme will require operations, versus operations for Eisenstats implementation.

In the case of GMRES, or other nonsymmetric iterative solvers, the same three options for applying the preconditioning operation as for the Conjugate Gradient (namely, left, split, and right preconditioning) are available. However, there will be one fundamental difference the right preconditioning versions will give rise to what is called a exible variant, i.e., a variant in which the preconditioner can change at each step. This capability can be very useful in some applications.

As before, dene the left preconditioned GMRES algorithm, as the GMRES algorithm applied to the system,

The straightforward application of GMRES to the above linear system yields the following preconditioned version of GMRES.  '  &  I0!  0  # m

Do:

, Do:

7

1. Compute 2. For 3. Compute 4. For 5. 6.

and

v'

GH

  a ` @@bx0  Q
C

B H e f 8 R H W UP D3hAIcXrbY PcRcXC 3C E FDT W U H e Y H

a b`  

 $ &

a !`0  H w w a !0  H w bx0  H ` a ` a `   ! H w p8 & 8    8 ~ 8  $ $ 

f f Bd ud

C S  2  C  S a S %  ` G  2 S DBD% %  f  d   G %BDB% m  f u  C z a ` 'd  C

v m " nfhrvxf "

6 33 2 2

   "  wEn

i" H

W" G

 5



a !  (`c

 

The Arnoldi loop constructs an orthogonal basis of the left preconditioned Krylov subspace

It uses a modied Gram-Schmidt process, in which the new vector to be orthogonalized is obtained from the previous vector in the process. All residual vectors and their norms that are computed by the algorithm correspond to the preconditioned residuals, namely, , instead of the original (unpreconditioned) residuals . In addition, there is no easy access to these unpreconditioned residuals, unless they are computed explicitly, e.g., by multiplying the preconditioned residuals by .This can cause some difculties if a stopping criterion based on the actual residuals, instead of the preconditioned ones, is desired. Sometimes a Symmetric Positive Denite preconditioning for the nonsymmetric matrix may be available. For example, if is almost SPD, then (9.8) would not take advantage of this. It would be wiser to compute an approximate factorization to the symmetric part and use GMRES with split preconditioning. This raises the question as to whether or not a version of the preconditioned GMRES can be developed, which is similar to Algorithm 9.1, for the CG algorithm. This version would consist of using GMRES with the -inner product for the system (9.8). At step of the preconditioned GMRES algorithm, the previous is multiplied by to get a vector

Then this vector is preconditioned to get

This vector must be -orthogonalized against all previous Schmidt process is used, we rst compute the inner products

s. If the standard Gram-

To complete the orthonormalization step, the nal must be normalized. Because of the orthogonality of versus all previous s, observe that

bv'

bv'

ua  P % 

`

 a  P % 

sS  2 S

 P f

f  d

 S

`  a  P %  $P`  a  P %  P `

P   P

and then modify the vector

into the new vector

' W'

) 0v'

uDv'

% %

, and and GoTo 1

BBD% G HC% a S % 

 h f   h & w h  f f h S f ( u C S  h  h h  %'B% B % f % h  h WD  u  #  C f 

v( 

f 'd

a f b d

Cf

%% % u ` DBBF qf'd 

`

f  'd

a S %  P `  a S %  $P`

f&

and

   

 

 

7. 8. 9. 10. 11. 12.

EndDo Compute EndDo Dene Compute If satised Stop, else set

C


 c    $ C
& W  % #

a h

8 #$   
S

f ` d

 P

Thus, the desired

-norm could be obtained from (9.13), and then we would set

One serious difculty with the above procedure is that the inner product as computed by (9.13) may be negative in the presence of round-off. There are two remedies. First, this -norm can be computed explicitly at the expense of an additional matrix-vector multiplication with . Second, the set of vectors can be saved in order to accumulate , via the relation inexpensively both the vector and the vector

A modied Gram-Schmidt version of this second approach can be derived easily. The details of the algorithm are left as Exercise 12.

The right preconditioned GMRES algorithm is based on solving

As we now show, the new variable never needs to be invoked explicitly. Indeed, once the initial residual is computed, all subsequent vectors of the Krylov subspace can be obtained without any reference to the -variables. Note that is not needed at all. The initial residual for the preconditioned system can be computed from , which is the same as . In practice, it is usually that is available, not . At the end, the -variable approximate solution to (9.15) is given by,

Thus, one preconditioning operation is needed at the end of the outer loop, instead of at the beginning in the case of the left preconditioned version.

Do:

, Do:

 

1. Compute 2. For 3. Compute 4. For 5. 6. 7. EndDo

 '  &  '0!   1  # m

, and

ScS

 S

f d

with . Multiplying through by of the -variable,

yields the desired approximation in terms

F Gev'

 

B H e f 8 R H W UP D3hAIcXrbY PcRcXC 3C E 3A8 Ce W U H e Y a P

S cS  w  h h  d  f  f'd U fU  ig ( d % f

S S

and

 a  P %  P ` 6 v'

f     P

 $ &

p8 & 8    8  $ $
 P S

 

 S

 S

f d

 

C S  S   C a S %  ` G  S DBD% %  f  'd   G %BDB% m  C &%   C

f a 
w

 

 P

 %  P `

 P

42 2 533

f  

   "  wEn

8 ~


   C

 

g 

This time, the Arnoldi loop builds an orthogonal basis of the right preconditioned Krylov subspace

Note that the residual norm is now relative to the initial system, i.e., , since the algorithm obtains the residual , implicitly. This is an essential difference with the left preconditioned GMRES algorithm.

In many cases,

is the result of a factorization of the form

Then, there is the option of using GMRES on the split-preconditioned system

In this situation, it is clear that we need to operate on the initial residual by at the start of the algorithm and by on the linear combination in forming the approximate solution. The residual norm available is that of . A question arises on the differences between the right, left, and split preconditioning options. The fact that different versions of the residuals are available in each case may affect the stopping criterion and may cause the algorithm to stop either prematurely or with is very ill-conditioned. The degree delay. This can be particularly damaging in case of symmetry, and therefore performance, can also be affected by the way in which the preconditioner is applied. For example, a split preconditioner may be much better if is nearly symmetric. Other than these two situations, there is little difference generally between the three options. The next section establishes a theoretical connection between left and right preconditioned GMRES.

When comparing the left, right, and split preconditioning options, a rst observation to make is that the spectra of the three associated operators , , and are identical. Therefore, in principle one should expect convergence to be similar, although, as is known, eigenvalues do not always govern convergence. In this section, we compare the optimality properties achieved by left- and right preconditioned GMRES.

f 'd

f 'd

f d

UU

, and and GoTo 1.

f d

8 5AXU P Y PcA 33 WP W R W U H e Y DT cW 9 CA8 Cd U UpB Ce 9 XC H R Y a P e W P f U

f 'd

v( 

h 

8 5AXU P Y cA 33 Y P C B WP W P R W U H e T

d ( f'd  d f'd % f f

a h

f 'd

f ` 'd

a 'd f

h fd z h UP ` DBBF d  % % f %

 C f S

 

8. Compute 9. Dene 10. EndDo 11. Compute 12. If satised Stop, else set

and ,

Y


f (  S f h     &  

h f  u h  f   '% % h  h 'd w  h &  f f C  W D f h u % B B#%  C f    h

 c    $ 23 3 2 C
& W

f 'd

% #

2 2 33

8 #$   

For the left preconditioning option, GMRES minimizes the residual norm

among all vectors from the afne subspace

in which is the preconditioned initial residual solution can be expressed as

. Thus, the approximate

among all polynomials of degree . It is also possible to express this optimality condition with respect to the original residual vector . Indeed,

A simple algebraic manipulation shows that for any polynomial ,

from which we obtain the relation

Consider now the situation with the right preconditioned GMRES. Here, it is necessary to distinguish between the original variable and the transformed variable related to by . For the variable, the right preconditioned GMRES process minimizes the 2-norm of where belongs to

in which is the residual . This residual is identical to the residual associated with the original variable since . Multiplying (9.19) through to the left by and exploiting again (9.17), observe that the generic variable associated with a vector of the subspace (9.19) belongs to the afne subspace

This is identical to the afne subspace (9.16) invoked in the left preconditioned variant. In other words, for the right preconditioned GMRES, the approximate -solution can also be expressed as

However, now

is a polynomial of degree

which minimizes the norm

among all polynomials of degree . What is surprising is that the two quantities which are minimized, namely, (9.18) and (9.20), differ only by a multiplication by . Specically, the left preconditioned GMRES minimizes , whereas the right preconditioned variant minimizes , where is taken over the same subspace in both cases.

P Gev'

v'

G' v'

G) v'

f d

C C f d C G 3 u  a f'd x` d  f C G C f d h f f 7 C a d x` d h w  h 7(  P fd h D fd ` DB  P d %  P %# w   h 'd w  'd a % f f f & W  f d  g  fd   f'd  C C (  fd !fd x` % BDB% f'd %  a %#  h  w w h C C C W  f 'd C d  f

f  C 'd

f  C a d

f x` 'd

f % a 'd C

f x` d

f d

f d

&

 P b d a f

b 'd ` a f

f u ` qd

f d

where

is the polynomial of degree

which minimizes the norm

I Gev'

f (  P d

aDqfd ` U C d  P bud ` q'd P P f a f f  G C 3 a f u P b d ` f'd P P G f  P bq'd ` a f d h d w   h f P BBD%  P d %  P # w  w  f % h

f  d C a f D d ` %

 $ &

p8 & 8    8  $ $
%

uH f'd f'd
& W




8 ~

The approximate solution obtained by left or right preconditioned

GMRES is of the form

In most practical situations, the difference in the convergence behavior of the two approaches is not signicant. The only exception is when is ill-conditioned which could lead to substantial differences.

In the discussion of preconditioning techniques so far, it is implicitly assumed that the preconditioning matrix is xed, i.e., it does not change from step to step. However, in some cases, no matrix is available. Instead, the operation is the result of some unspecied computation, possibly another iterative process. In such cases, it may well happen that is not a constant operator. The previous preconditioned iterative procedures will not converge if is not constant. There are a number of variants of iterative procedures developed in the literature that can accommodate variations in the preconditioner, i.e., that allow the preconditioner to vary from step to step. Such iterative procedures are called exible iterations. One of these iterations, a exible variant of the GMRES algorithm, is described next.

We begin by examining the right preconditioned GMRES algorithm. In line 11 of Algorithm 9.5 the approximate solution is expressed as a linear combination of the preconditioned vectors . These vectors are also computed in line 3, prior to their multiplication by to obtain the vector . They are all obtained by applying the same preconditioning matrix to the s. As a result it is not necessary to save them. Instead, we only need to apply to the linear combination of the s, i.e., to in line 11. Suppose now that the preconditioner could change at every step, i.e., that is given by

Then it would be natural to compute the approximate solution as

f 'd

where and minimizes the residual norm conditioned residual norm

is a polynomial of degree . The polynomial in the right preconditioning case, and the prein the left preconditioning case.

S
 !d af C

G ! f ` 'd h d f

f d

B H e f 8 H  D3hAhDT P

h   w

f  'd 

 

Da h ` f'd h PB d f f  d  P h a f f  P bud ` d h w C h 



H DTp

f d f d s% BBD% G %S d S P f

6 2 72 3

u m r iw

 $   "0' 0

5 E n 

f d


h


in which , and is computed as before, as the solution to the leastsquares problem in line 11. These are the only changes that lead from the right preconditioned algorithm to the exible variant, described below.

Do:

, Do:

As can be seen, the main difference with the right preconditioned version, Algorithm 9.5, is that the preconditioned vectors must be saved and the solution updated using these vectors. It is clear that when for , then this method is equivalent mathematically to Algorithm 9.5. It is important to observe that can be dened in line 3 without reference to any preconditioner. That is, any given new vector can be chosen. This added exibility may cause the algorithm some problems. Indeed, may be so poorly selected that a breakdown could occur, as in the worst-case scenario when is zero. One difference between FGMRES and the usual GMRES algorithm is that the action of on a vector of the Krylov subspace is no longer in the span of . Instead, it is easy to show that

in replacement of the simpler relation which holds for the denotes the matrix standard preconditioned GMRES; see (6.5). As before, obtained from by deleting its last row and is the vector which is normalized in line 9 of Algorithm 9.6 to obtain . Then, the following alternative formulation of (9.21) is valid, even when :

An optimality property similar to the one which denes GMRES can be proved. Consider the residual vector for an arbitrary vector in the afne space . This optimality property is based on the relations

GG v'

GG v'

qW v'

 h

s% DBD% G

h  

, and and GoTo 1.

EndDo Compute Dene EndDo Compute If satised Stop, else set

and ,

f h f h  h h a d x` f  h h 

 P h f 

h  

 

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.

Compute For Compute Compute For

 h 5' h f  & %  % h w  h  f f f C  h W f    C f S (  S f h h P % DBB% P C f  h  u  z       &    C S  S   C a S %  ` G  S DBD% %   P   f  'd    P G %BDB% m f  u7 C &%  C

, and

 $ &
.

p8 & 8    8  $ $

 m  HUm

h w h

h U   o`(3mP  C

 

 f'd 

f E  

C f

 h

h P % DBD% f P

0 " wEn    


( h

8 ~

&

7 

f 'd 


  

P P

Since the algorithm minimizes this norm over all vectors in to yield , it is clear that the approximate solution has the smallest residual norm in . Thus, the following result is proved.

Next, consider the possibility of breakdown in FGMRES. A breakdown occurs when the vector cannot be computed in line 9 of Algorithm 9.6 because . For the standard GMRES algorithm, this is not a problem because when this happens then the approximate solution is exact. The situation for FGMRES is slightly different.

Assume that and that steps of FGMRES have been successfully performed, i.e., that for . In addition, assume that the matrix is nonsingular. Then is exact, if and only if .

We must show, by contraction, that . Assume that . Since , , , , , form an orthogonal system, then it follows from (9.26) that and . The last component of is equal to zero. A simple back-substitution for the system , starting from the last equation, will show that all components of are zero. Because is nonsingular, this would imply that and contradict the assumption. The only difference between this result and that of Proposition 6.10 for the GMRES algorithm is that the additional assumption must be made that is nonsingular since it is no longer implied by the nonsingularity of . However, is guaranteed to be nonsingular when all the s are linearly independent and is nonsingular. This is a consequence of a modication of the rst part of Proposition 6.9. That same proof shows that the rank of is equal to the rank of the matrix therein. If is nonsingular and , then is also nonsingular.

I v'

Ef

f    

Cf

a f ` d  f

 h

  f

If is nonsingular, then the above function is minimized for corresponding minimum norm reached is zero, i.e., is exact. Conversely, if is exact, then from (9.22) and (9.23),

Ru  

 

 &Uu

 

 

f w     3 E 

If

, then

, and as a result

and the

Cf

 

Cf

( h

 

% #

The approximate solution minimizes the residual norm over

obtained at step .

of FGMRES

F v'

 '

& W

S C f

  C 13

 

h  

S 

observe that by (9.24) and the fact that

is unitary,

h  

D#a

#a

#Uu

B h D

   

 

 

& h

` h

 

` h

f #a `   C f  

hf

 

t 

E n

E n

 

 

( h

  h 

a ` h



B B

& 'W

h 

% #

If

denotes the function

6 R! v'

h f f h f  h  f

h

 $   "0' 0

A consequence of the above proposition is that if , at a certain step, i.e., if the preconditioning is exact, then the approximation will be exact provided that is nonsingular. This is because would depend linearly on the previous s (it is equal to ), and as a result the orthogonalization process would yield . A difculty with the theory of the new algorithm is that general convergence results, such as those seen in earlier chapters, cannot be proved. That is because the subspace of approximants is no longer a standard Krylov subspace. However, the optimality property of Proposition 9.2 can be exploited in some specic situations. For example, if within each outer iteration at least one of the vectors is chosen to be a steepest descent direction vector, e.g., for the function , then FGMRES is guaranteed to converge independently of . The additional cost of the exible variant over the standard algorithm is only in the extra memory required to save the set of vectors . Yet, the added advantage of exibility may be worth this extra cost. A few applications can benet from this exibility, especially in developing robust iterative methods or preconditioners on parallel computers. Thus, any iterative technique can be used as a preconditioner: block-SOR, SSOR, ADI, Multi-grid, etc. More interestingly, iterative procedures such as GMRES, CGNR, or CGS can also be used as preconditioners. Also, it may be useful to mix two or more preconditioners to solve a given problem. For example, two types of preconditioners can be applied alternatively at each FGMRES step to mix the effects of local and global couplings in the PDE context.

Recall that the DQGMRES algorithm presented in Chapter 6 uses an incomplete orthogonalization process instead of the full Arnoldi orthogonalization. At each step, the current vector is orthogonalized only against the previous ones. The vectors thus generated are locally orthogonal to each other, in that for . The matrix becomes banded and upper Hessenberg. Therefore, the approximate solution can be updated at step from the approximate solution at step via the recurrence

in which the scalars and are obtained recursively from the Hessenberg matrix . An advantage of DQGMRES is that it is also exible. The principle is the same as in FGMRES. In both cases the vectors must be computed. In the case of FGMRES, these vectors must be saved and this requires extra storage. For DQGMRES, it can be observed that the preconditioned vectors only affect the update of the vector in the preconditioned version of the update formula (9.27), yielding

As a result,

can be discarded immediately after it is used to update

. The same

GP v'

 

S E

 

 

   w

7 v

f d

S   S C

 $ & h
 

 

C 0)0)0)C f

p8 & 8    8  $ $
f

B H f i D3e A8 !R

S y

 ( 

 f'd 

f'd 

%d2  S S C

  S

a  % S `

&

HP %@Bza o`

 f'd 

f'd 

4 2 3 2

z
  S

 G C

  % GC

S C

 

f  d 

 

8 ~

 

memory locations can store this vector and the vector which requires additional vectors of storage.

. This contrasts with FGMRES

There are several versions of the preconditioned Conjugate Gradient method applied to the normal equations. Two versions come from the NR/NE options, and three other variations from the right, left, or split preconditioning options. Here, we consider only the left preconditioned variants. The left preconditioned CGNR algorithm is easily derived from Algorithm 9.1. Denote by the residual for the original system, i.e., , and by the residual for the normal equations system. The preconditioned residual is . The scalar in Algorithm 9.1 is now given by

This suggests employing the auxiliary vector following form.

in the algorithm which takes the

Similarly, the linear system , with , can also be preconditioned from the left, and solved with the preconditioned Conjugate Gradient algorithm. Here, it is observed that the update of the variable, the associated variable, and two residuals take the form

 

 

f f  d  P  f f  C   U    C f  C  f         w    f ` a   a %  P % `x`  

a %   a P%

   C`

i U

C

1. Compute 2. For 3. 4. 5. 6. 7. 8. 9. 10. 11. EndDo

, , , until convergence Do:

r
f  d C C

P C

Ef@%znuz 

f f    f w f  5    P a  % P$a  %  $`  `   P f  f C  C   P f f  'd C ff  C   C      C f  C      w  `    pa  C %  P   DB B% n   P5  d  P   C f C  C U C E     &10  &  I0! &( 0   wh n    z    `  x`  

 $ &


a   % a P%

p20 81 )
 
 ` C

'& #  $
 C

a  %   a  P %  ` C

} v& ~8 

  $
"

u " nfhtvgP!U

8 #$   
 

C

Thus, if the algorithm for the unknown is to be written, then the vectors can be used instead of the vectors , which are not needed. To update these vectors at the end of the algorithm the relation in line 8 of Algorithm 9.1 must be multiplied through by . This leads to the left preconditioned version of CGNE, in which the notation has been changed to denote by the vector invoked in the above derivation.

1. Compute 2. For 3. 4. 5. 6. 7. 8. 9. 10. EndDo

, , until convergence Do:

Not shown here are the right and split preconditioned versions which are considered in Exercise 3.

When the matrix is nearly symmetric, we can think of preconditioning the system with the symmetric part of . This gives rise to a few variants of a method known as the CGW method, from the names of the three authors Concus and Golub [60], and Widlund [225] who proposed this technique in the middle of the 1970s. Originally, the algorithm was not , with , viewed from the angle of preconditioning. Writing the authors observed that the preconditioned matrix

is equal to the identity matrix, plus a matrix which is skew-Hermitian with respect to the -inner product. It is not too difcult to show that the tridiagonal matrix corresponding to the Lanczos algorithm, applied to with the -inner product, has the form

v'

a b 

w ` f

 

 

 $ &

 3

p8 & 8    8  $ $
  f    

f P  f      a  %  $a f  % f  P `  P `  fw    f  C C  f  C 'd f  P       C f   C     w    a   %  "a  %  $`   `  P C   G    % DBB% % m f  P    d  P   E C C    m ! & 0  &  I0! &( 0   wEn 

f  d

hG

f d

8 d f

G G

 

f   

ut%

 

8 ~

S 

v nu

As a result, a three-term recurrence in the Arnoldi process is obtained, which results in a solution algorithm that resembles the standard preconditioned CG algorithm (Algorithm 9.1). A version of the algorithm can be derived easily. From the developments in Section 6.7 relating the Lanczos algorithm to the Conjugate Gradient algorithm, it is known that can be expressed as

The preconditioned residual vectors must then satisfy the recurrence

and if the s are to be As a result,

-orthogonal, then we must have

Thus, a rst consequence is that

1 Let a matrix and its preconditioner be SPD. Observing that is self-adjoint with respect to the inner-product, write an algorithm similar to Algorithm 9.1 for solving the preconditioned linear system , using the -inner product. The algorithm should employ only one matrix-by-vector product per CG step. 2 In Section 9.2.1, the split-preconditioned Conjugate Gradient algorithm, Algorithm 9.2, was derived from the Preconditioned Conjugate Gradient Algorithm 9.1. The opposite can also be done. Derive Algorithm 9.1 starting from Algorithm 9.2, providing a different proof of the equivalence of the two algorithms.

q$P R

a  %  $` P a f  C % f  $`  C  P

 a f a  PP %% f P ` $P`    

R d R h $P e $P

Note that algorithm,

and therefore we have, just as in the standard PCG

   

f a  d %    ` a   'd % f  $`  f  P

f d

because is orthogonal to all vectors in -orthogonal to yields

. In addition, writing that

  

a  %   `  

f x  a  %  ud `  a f'd   d   f

 

 

a 

  

Also, the next search direction

is a linear combination of

and

 a  P %   'd  f
   

 

a  P%  `  a P%  ` C

P  $`

 f   d

a  P  % a  %P f'P `d `  

   w 

 

P 5

 

 

 f   %   d

  

 $`  P f

 

 f   'd f   d u `  a  P %   x`

 c"

f   d

8 }$

 (c2 " $  


 

is

3 Six versions of the CG algorithm applied to the normal equations can be dened. Two versions come from the NR/NE options, each of which can be preconditioned from left, right, or on two sides. The left preconditioned variants have been given in Section 9.5. Describe the four other versions: Right P-CGNR, Right P-CGNE, Split P-CGNR, Split P-CGNE. Suitable inner products may be used to preserve symmetry. 4 When preconditioning the normal equations, whether the NE or NR form, two options are available in addition to the left, right and split preconditioners. These are centered versions:

for the NE form, and

for the NR form. The coefcient matrices in the above systems are all symmetric. Write down the adapted versions of the CG algorithm for these options. be SPD. The standard result about the rate of conver5 Let a matrix and its preconditioner gence of the CG algorithm is not valid for the Preconditioned Conjugate Gradient algorithm, Algorithm 9.1. Show how to adapt this result by exploiting the -inner product. Show how to derive the same result by using the equivalence between Algorithm 9.1 and Algorithm 9.2. 6 In Eisenstats implementation of the PCG algorithm, the operation with the diagonal some difculties when describing the algorithm. This can be avoided.

Assume that the diagonal of the preconditioning (9.5) is equal to the identity matrix. What are the number of operations needed to perform one step of the PCG algorithm with Eisenstats implementation? Formulate the PCG scheme for this case carefully. The rows and columns of the preconditioning matrix can be scaled so that the matrix of the transformed preconditioner, written in the form (9.5), is equal to the identity matrix. What scaling should be used (the resulting should also be SPD)? Assume that the same scaling of question b is also applied to the original matrix . Is the resulting iteration mathematically equivalent to using Algorithm 9.1 to solve the system (9.6) preconditioned with the diagonal ?

and 7 In order to save operations, the two matrices ing by Algorithm 9.3. This exercise considers alternatives.

must be stored when comput-

The matrix in the previous question is not the proper preconditioned version of by the preconditioning (9.5). CG is used on an equivalent system involving but a further preconditioning by a diagonal must be applied. Which one? How does the resulting algorithm compare in terms of cost and storage with an Algorithm based on 9.3? It was mentioned in Section 9.2.2 that needed to be further preconditioned by . Consider the split-preconditioning option: CG is to be applied to the preconditioned system associated with . Dening show that,

where is a certain matrix to be determined. Then write an analogue of Algorithm 9.3 using this formulation. How does the operation count compare with that of Algorithm 9.3? 8 Assume that the number of nonzero elements of a matrix is parameterized by . How small should be before it does not pay to use Eisenstats implementation for the PCG algorithm? What if the matrix is initially scaled so that is the identity matrix?

by

R SP

9 5

  

P T 9 C 85

R SP

 SP R SP R

T  P T R $P T 9 C 85 9 C 85 9 C 85 R R $P R  

Consider the matrix multiplying a vector

. Show how to implement an algorithm similar to 9.3 for . The requirement is that only must be stored.

R SP

R $P

 $ &

h SP | SP ed R R e d e SP 2 h R

p8 & 8    8  $ $

R SP

4 4

8 ~

& '%

& '%

&1

&32

&1

&32

causes

9 Let be a preconditioner for a matrix . Show that the left, right, and split preconditioned matrices all have the same eigenvalues. Does this mean that the corresponding preconditioned iterations will converge in (a) exactly the same number of steps? (b) roughly the same number of steps for any matrix? (c) roughly the same number of steps, except for ill-conditioned matrices? 10 Show that the relation (9.17) holds for any polynomial and any vector .

11 Write the equivalent of Algorithm 9.1 for the Conjugate Residual method. 12 Assume that a Symmetric Positive Denite matrix is used to precondition GMRES for solving a nonsymmetric linear system. The main features of the P-GMRES algorithm exploiting this were given in Section 9.2.1. Give a formal description of the algorithm. In particular give a Modied Gram-Schimdt implementation. [Hint: The vectors s must be saved in addition to the s.] What optimality property does the approximate solution satisfy? What happens if the original matrix is also symmetric? What is a potential advantage of the resulting algorithm?

N OTES AND R EFERENCES . The preconditioned version of CG described in Algorithm 9.1 is due to Meijerink and van der Vorst [149]. Eisenstats implementation was developed in [80] and is often referred to as Eisenstats trick. A number of other similar ideas are described in [153]. Several exible variants of nonsymmetric Krylov subspace methods have been developed by several authors simultaneously; see, e.g., [18], [181], and [211]. There does not seem to exist a similar technique for left preconditioned variants of the Krylov subspace methods. This is because the preconditioned operator now changes at each step. Similarly, no exible variants have been developed for the BCG-based methods, because the short recurrences of these algorithms rely on the preconditioned operator being constant. The CGW algorithm can be useful in some instances, such as when the symmetric part of can be inverted easily, e.g., using fast Poisson solvers. Otherwise, its weakness is that linear systems with the symmetric part must be solved exactly. Inner-outer variations that do not require exact solutions have been described by Golub and Overton [109].

 c"

8 }$

 (c2 " $  

e 
$P R

f d

Roughly speaking, a preconditioner is any form of implicit or explicit modication of an original linear system which makes it easier to solve by a given iterative method. For example, scaling all rows of a linear system to make the diagonal elements equal to one is an explicit form of preconditioning. The resulting system can be solved by a Krylov subspace method and may require fewer steps to converge than with the original system (although this is not guaranteed). As another example, solving the linear system

where is some complicated mapping that may involve FFT transforms, integral calculations, and subsidiary linear system solutions, may be another form of preconditioning. Here, it is unlikely that the matrix and can be computed explicitly. Instead,

Bd qf'd f

f d

uu " EX#o YF Wy cfCqS YI ca A aF c CV A Y S X QT F A Y YF C YsF9a c Y c Q c qDA qxiViiBIit cRBf9WtVCY rcGRhACGyB9HqBf9Y@e c D`AqRBDe G9UInfl`AeeuD`ae Wl teDBWV scY WBWfVbEAGBf9rDhARQufFAPIvSvVUWxjDBpAiqV pbe`Ae cGD`AYeuyeGbBrIc B9Y CAI clI a Cs A Y XV C Y X V S a F CA { ~ CF S CFA A 9 c9 { S e S T CsF ey9sF c CV A XV tl {VI S X lA C l AT I Da GWEVCX DA RvtVDGRba scuqRRBrIRt ciBf9Y DHAGbA quG WtVCbBp cDAR@GPifFa teDBWV scY WBWfVbEAGGA RRDsyRD betDBWV scY WBWfVabEACGiRDuuIGrIFqviVheuWV csqWV CAI clI a Cs j s SF A CV CAI clI s lVVt t c c YTV C X I Ys A RiF BquVPeY rSc uvy iBQtYCcGp EACGFmEADB9PuB9Y`AeqWwBf9um`ABV $ UD`AYeuye T cFpF Y c VI F CA Y YF Y YQV A Y YF Y S CGbBrIc AEeGBsRiEFDBBvV qcY WBIvVabEACGsmVlbA`ufeB7d GRDbAiRQXeebbBfQ@YGVUB9Y FA CF e CAIAt F I cl Y eQ AQ cI9a Y Aaa e e S A XDvVteDBDVa DAqiBBe G7betDA`qRBDuQvV cGsphADwrIwDb`A@uWV scqEFD`AY UfADBfseuTRQe V A S e CAp C YsF9a c9 8 C YsF9a e Cs c IA e eI Y C c aF p C A Y XV AIV F cs YjAQ I9 Y Y C c C YQV IF c lA c S a BV Rry|Bf9DBWUy Rba GRoy7GB7d cGRBabA`UIvV qcuEFDAY DAqvRw9Y @{ bBrIGTWfVe c 9 c9 { CA e YF cy Css C c eT yIF eF AI Al AT IFa CAI clI a C Da GBBp vV`AqSqtVDRiF ryGF Wl qcuRfQe qiuvlbBp RGuRbiDBWV csY WBvfVbEADs buvV sq`bGisY GffRD`ED Divy Wurse CGsoRfv R}wiuffvfBIRF e I c Y F Y a A s y A A c s e A l I A YX V j A { t Ic c C Q e C V { e l V 9 Y A S A S V e l EAGhFuhAG|eY RfQfeEAHRba qcYEARfVBGAbBBA ua`e BRwrYCGUDnvV qcuBIGUWfV|eufbi{A Gp C C CF C F C A9 8 aI c lIF F XV I YF cT S a F F lA c I XV c S e C A CF e IA ct A YCAI clI a C l V F cl c D`AEYDe DAYuye GfFBrIc `AteGBfsDBp iFBp WVeVDBWV scY WBWfVbEAGs iVRDtUturIWBIW

)6 ) 6 g(7R6

4076 )(  2  )01!! 

the iterative processes operate with and with whenever needed. In practice, the preconditioning operation should be inexpensive to apply to an arbitrary vector. One of the simplest ways of dening a preconditioner is to perform an incomplete factorization of the original matrix . This entails a decomposition of the form where and have the same nonzero structure as the lower and upper parts of respectively, and is the residual or error of the factorization. This incomplete factorization known as ILU(0) is rather easy and inexpensive to compute. On the other hand, it often leads to a crude approximation which may result in the Krylov subspace accelerator requiring many iterations to converge. To remedy this, several alternative incomplete factorizations have been developed by allowing more ll-in in and . In general, the more accurate ILU factorizations require fewer iterations to converge, but the preprocessing cost to compute the factors is higher. However, if only because of the improved robustness, these trade-offs generally favor the more accurate factorizations. This is especially true when several systems with the same matrix must be solved because the preprocessing cost can be amortized. This chapter considers the most common preconditioners used for solving large sparse matrices and compares their performance. It begins with the simplest preconditioners (SOR and SSOR) and then discusses the more accurate variants such as ILUT.

As was seen in Chapter 4, a xed-point iteration for solving a linear system

takes the general form

The above iteration is of the form

where

and

Thus, for Jacobi and Gauss Seidel it has been shown that

where

is the splitting dened in Chapter 4.

 W)

 W)

R6 W)

I W) F W)

%Bea ! ` f d aDx` f qd 88 aDx`

a D

d f f f 8 ` d  d

 

 

  

g!

f 'd

where

and

realize the splitting of

into

uW)

$ v " " mErvxfP! %HH xnH@wPXp

f ud

  $ & f 'd
w

  f f d

8  &  & 4 j 7 rj} $    $ 




f d

The iteration (10.3) is attempting to solve

The above system is the preconditioned system associated with the splitting , and the iteration (10.3) is nothing but a xed-point iteration on this preconditioned system. Similarly, a Krylov subspace method, e.g., GMRES, can be used to solve (10.8), leading to a preconditioned version of the Krylov subspace method, e.g., preconditioned GMRES. The preconditioned versions of some Krylov subspace methods have been discussed in the previous chapter with a generic preconditioner . In theory, any general splitting in which is nonsingular can be used. Ideally, should be close to in some sense. However, note that a linear system with the matrix must be solved at each step of the iterative procedure. Therefore, a practical and admittedly somewhat vague requirement is that these solutions steps should be inexpensive. As was seen in Chapter 4, the SSOR preconditioner is dened by

Typically, when this matrix is used as a preconditioner, it is not necessary to choose as carefully as for the underlying xed-point iteration. Taking leads to the Symmetric Gauss-Seidel (SGS) iteration,

An interesting observation is that is the lower part of , including the diagonal, and is, similarly, the upper part of . Thus,

with

The matrix is unit lower triangular and is upper triangular. One question that may arise concerns the implementation of the preconditioning operation. To compute , proceed as follows:

A FORTRAN implementation of this preconditioning operation is illustrated in the following code, for matrices stored in the MSR format described in Chapter 3. FORTRAN CODE
s d  3 r $ v u t t y    3 w   5 3 X q%E BCBQBBrCBa5 v u t t  '  s p  '   s (' `B# $ %#1xR B5CvQ5 r   ' c  8 s r!  2  y y  $    3 w   3 X u t t Bqs qBiB hfeBc Bba`YBX W5 r p  3 X  2  d 3 c g 5 3 3 d  3 d 3 X 3  V UUU U U U U U U U U U U 5 U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U 8 !  6 $ 8 @ !   $  G#7TDI%6 SRQBP   I1HDGF)(' ED9#!1 CB@3 8 @ !  3 2  $ 8 ! $ 8   A   8  6 $  6  $  9#70) 5(4101 0)(' &%#"   $  3 2 $  $ $ !     

`

solve solve

  W)

G' W)

 f d 

a q G a q

a rP   af mP !d !

f f  d d

` f'd a(!

% (f'd !

f ` d a !

8 d f

which, because of the expression (10.4) for

, can be rewritten as

GP W)

 1 c#&) $

8 $ $  H ~ $ H& 8  & 

a `

  

`

`

u

a !

 

 

&`

8 ~

f d

a z

f ` a d !

3 g!

As was seen above, the SSOR or SGS preconditioning matrix is of the form where and have the same pattern as the -part and the -part of , respectively. Here, -part means lower triangular part and, similarly, the -part is the upper triangular part. If the error matrix is computed, then for SGS, for example, we would nd

8  !  @ 8  !  6 !  3 2  8  !  I) B8 I%#1CWI1

dB     d  dB 5 5 5

8   ! s B 8 !   8  ! I$ I!01 #)I1 e Y 5   6 d d  B 88`gF!) 5(4!1 @8`gF!132QU8I1 aI1   !  8  ! `A$0%d $ ' @ 8  !  6 g
' U ' @  !  48 A!I%6

8 0 ! 31 q21)IC01 #)I1 e Y 5 8 0 $  ! s Y U 8 !   8  !   6


' U $ ' $   4S`9RQd

5 5 5 5 5 5

 8 ! 2     `s %#8 eY 7y   6 "

 r u t % 6 &h&v "5

d d  B 88GgF!1 50!1Bq@8`gF!132QU8I!1 aI1    8  ! '48I!%6 #DI7d U  $ 8 !  6  g I! (' QWI1 8  8  ! 

d  dB

8 0 ! 31 q21)IRqC0)(' Q#)I1 e Y 5 8 0 $  ! r Y U 8 !  8  !   6  `(Qd $ '  ' Q Sy r  2  B t y


 r u t % v u &q&v &&

5 5 5 5 5 5

g )h P  $    UUU U U U U U UU U U U U 5 U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U  5  $  3 2       Y     3  P  d    S1qCB9BaB3 Q' C W     6 W5  6 yR$BGs '    3 Y r  5   c ' 5 3  X P        hCCBqP ' W (#C6 Q5      6   3   "   3 2       Y   P   d  6  5 X  5  d  CB9 h qBa #Q Q5  Y  "  5  y  3 1 2CB' BC' B'  s       5 d  3 8    2   9d BI! B9 a3 QBQCd X    Y    3  P  d  '   c   5  3   3 2 r   $ c Baa' TRCHyRd Bhs ' 5 3   p    2    3 5 X    Y    3  P  d  BaB9 a3 Q'  yRBGXC B B  3 Y v t "   '  P 5 d BBryR!Gs ' aBaQQ5    3 s d  3   3 Y r   X   3   3 2  y )(' QQ `s F! B  Q 8 r X     t  a5  BCqd Wa' Tqa' (W     P  3 ' 5     d  d  3 '  P  v ' 5 Y   B6B  B9Y ( X       W5  t v  BW5  v V UU U U U U U UU U U U U 5 U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U y X %BCB3 3E%BQ5  3  P  d d  $ X B 3 aBa5 aCBU 3 6 3  P     6 6    7B BaB P3 5 $ X 3 6 3        U  c   5   ' c     2   5  6   $ % 3  P 9Y U e5 s d  3 $ 8 ! 2  SDG7 P 9Y aCBG' B5qB'a5 3  U p r  5   3 Y     3   d 

  $ &

8  &  & 4 j 7 rj} $    $ 

If

is restricted to have the same structure as the -part of

and

is to have the same

structure as the -part of , the question is whether or not it is possible to nd and that yield an error that is smaller in some sense than the one above. We can, for example, try to nd such an incomplete factorization in which the residual matrix has zero elements in locations where has nonzero entries. This turns out to be possible in general and yields the ILU(0) factorization to be discussed later. Generally, a pattern for and can be specied and and may be sought so that they satisfy certain conditions. This leads to the general class of incomplete factorization techniques which are discussed in the next section. Table 10.1 shows the results of applying the GMRES algorithm with ) preconditioning to the ve test problems described in Section Matrix F2DA F3D ORS F2DB FID

See Example 6.1 for the meaning of the column headers in the table. Notice here that the method did not converge in 300 steps for the last two problems. The number of iterations for the rst three problems is reduced substantially from those required by GMRES without preconditioning shown in Table 6.2. The total number of operations required is also reduced, but not proportionally because each step now costs more due to the preconditioning operation.

Consider a general sparse matrix whose elements are . A general Incomplete LU (ILU) factorization process computes a sparse lower triangular matrix and a sparse upper triangular matrix so the residual matrix satises certain constraints, such as having zero entries in some locations. We rst describe a general ILU preconditioner geared toward -matrices. Then we discuss the ILU(0) factorization, the simplest form of the ILU preconditioners. Finally, we will show how to obtain more accurate factorizations.

 BDB% G 0%  S % %

u nfhtv"gP!U fE X  x

   5)0 

SGS (SSOR with 3.7.

Iters 38 20 110 300 300

Kops 1986 4870 6755 15907 99070

Residual 0.76E-03 0.14E-02 0.31E+00 0.23E+02 0.26E+02

Error 0.82E-04 0.30E-03 0.68E-04 0.66E+00 0.51E-01

A test run of GMRES with SGS preconditioning.

 1 c#&) $

8 $ $  H ~ $ H& 8  & 

8 ~

 5"0#   

A general algorithm for building Incomplete LU factorizations can be derived by performing Gaussian elimination and dropping some elements in predetermined nondiagonal positions. To analyze this process and establish existence for -matrices, the following result of Ky-Fan [86] is needed.

Let be an -matrix and let be the matrix obtained from the rst step of Gaussian elimination. Then is an -matrix.
Theorem 1.17 will be used to establish that properties 1, 2, and 3 therein are satised. First, consider the off-diagonal elements of :

Since are nonpositive and is positive, it follows that for . Second, the fact that is nonsingular is a trivial consequence of the following standard relation of Gaussian elimination

Therefore, all the columns of proof.

are nonnegative by assumption and this completes the

Clearly, the matrix obtained from by removing its rst row and rst column is also an -matrix. Assume now that some elements are dropped from the result of Gaussian Elimination outside of the main diagonal. Any element that is dropped is a nonpositive element which is transformed into a zero. Therefore, the resulting matrix is such that

and the off-diagonal elements of are nonpositive. Since is an -matrix, theorem 1.18 shows that is also an -matrix. The process can now be repeated on the matrix , and then continued until the incomplete factorization of is obtained. The above arguments shows that at each step of this construction, we obtain an -matrix and that the process does not break down. The elements to drop at each step have not yet been specied. This can be done statically, by choosing some non-zero pattern in advance. The only restriction on the zero pattern is that it should exclude diagonal elements because this assumption was used in the

where the elements of

are such that

. Thus,

f  'd

% x

S C % E f

 f'd

3f SS f C

f  f i 

 f'd

` x4a G x`

 a

 H%  H`

For

Finally, we establish that is nonnegative by examining for . , it is clear that because of the structure of . For the case , (10.10) can be exploited to yield

f  d

ff

where

BDB% G % ) 0v)

3  fS

B XU P QCXAY W Y 9 P e U

% 5 DBD%  % f C D

f 

ff f f

  $ &

@ T#H Y D XC W P H T f U 62 2 73 36 8  &   p &8 c $  $ 8  1

S

ff

f f d  df f f f df f f'd

fS 

f 

 5"

f % f % S S

n)|w



above proof. Therefore, for any zero pattern set

, such that

an Incomplete LU factorization,

, can be computed as follows.

The For loop in line 4 should be interpreted as follows: For and only for those indices that are not in execute the next line. In practice, it is wasteful to scan from to because there is an inexpensive mechanism for identifying those in this set that are in the complement of . Using the above arguments, the following result can be proved.

Let be an -matrix and a given zero pattern dened as in (10.11). Then Algorithm 10.1 is feasible and produces an incomplete factorization,

which is a regular splitting of .

At each step of the process, we have

From this follow the relations

Applying this relation recursively, starting from

up to

Now dene

) Gqv0

f 'd

G w

where, using of components

to denote a zero vector of dimension , and ,

to denote the vector

, it is found that

) Gqv0

s% BBD% G

 s h

ds 

 n

G a  dA  %   w `( @

f d s

f d

f d

   

sG

f d

    

BD

% (f'd a

1. For 2. For 3. 4. For 5. 6. EndDo 7. EndDo 8. EndDo

Do: and if

Do:

and for

Do:

) qGv0

 1 c#&) $
% (

8 $ $  H ~ $ H& 8  & 
3 %0`3


   S S  S  a %0 ` BDB% G  w   m %    S S   a % 7` G  G s% w  G s% BBD%   !    x ! 0 % ` % D%0 0 5"0 wEn   a % 7` 8


G

 BD

 DB

E  G8



f'd

f d

8   sDBD% %  C S %   

&

`




f  BD

 f d
s

8 ~


t"0  n)|w 

f 'd

  

Then,

with

then we obtain the factorization

Now consider a few practical aspects. An ILU factorization based on the form of Algorithm 10.1 is difcult to implement because at each step , all rows to are being modied. However, ILU factorizations depend on the implementation of Gaussian elimination which is used. Several variants of Gaussian elimination are known which depend on the order of the three loops associated with the control variables , , and in the algorithm. Thus, Algorithm 10.1 is derived from what is known as the variant. In the context of Incomplete LU factorization, the variant that is most commonly used for a row-contiguous data structure is the variant, described next for dense matrices.

The above algorithm is in place meaning that the -th row of can be overwritten by the -th rows of the and matrices of the factorization (since is unit lower triangular, its diagonal entries need not be stored). Each step of the algorithm generates the -th row

1. For 2. For 3. 4. For 5. 6. EndDo 7. EndDo 8. EndDo

Do:

Do: Do:

% % !

%% 0I

 q

 %



% %

S S S s%DBB% G  w   m G    S S DBB% G   % sDBB% H H %

f d

% % Q 0

 t"0 wh n  

f d

f d a

where proof.

is a nonnegative matrix,

is nonnegative. This completes the

f 'd

% x

DB

`

If

denotes the matrix

f qa d

BD

c BD ` 

f d

so that

can be rewritten as

a Y

Observe that at stage , elements are dropped only in the . Hence, the rst rows and columns of are zero and as a result

lower part of

o`  o` a f d s w d s
 f  BD
f 'd f d
s

  $ &
s

d  f


DB

8  &   p &8 c $  $ 8  1

f 
w

  DB

 DB

f'd

f 'd

of and the -th row of at the same time. The previous rows of are accessed at step but they are not modied. This is illustrated in Figure 10.1.

and

Accessed but not modified

Accessed and modified

Not accessed

IKJvariant of the LU factorization.

Adapting this version for sparse matrices is easy because the rows of and are generated in succession. These rows can be computed one at a time and accumulated in a row-oriented data structure such as the CSR format. This constitutes an important advantage. Based on this, the general ILU factorization takes the following form.

It is not difcult to see that this more practical IKJvariant of ILU is equivalent to the KIJversion which can be dened from Algorithm 10.1.

Let be a zero pattern satisfying the condition (10.11). Then the ILU factors produced by the KIJ-based Algorithm 10.1 and the IKJ-based Algorithm 10.3 are identical if they can both be computed.
Algorithm (10.3) is obtained from Algorithm 10.1 by switching the order of the loops and . To see that this gives indeed the same result, reformulate the rst two loops of Algorithm 10.1 as

 a %0 `

and for .

and if

1. For Do: 2. For 3. 4. For 5. 6. EndDo 7. EndDo 8. EndDo

Do:

, Do:

% BDB% H % G


! 0

 1 c#&) $

!  v  %  " %  

8 $ $  H ~ $ H& 8  & 


   S G S   S %BDB% m    G w  S  S a % 7` i BD B% G  % %BDB% H

D0 0  "0 wEn %!   

 5)0d

 5"0

8 ~

h n

  

p


in which ope(row(i),row(k)) is the operation represented by lines 3 through 6 of both Algorithm 10.1 and Algorithm 10.3. In this form, it is clear that the and loops can be safely permuted. Then the resulting algorithm can be reformulated to yield exactly Algorithm 10.3. Note that this is only true for a static pattern ILU. If the pattern is dynamically determined as the Gaussian elimination algorithm proceeds, then the patterns obtained with different versions of GE may be different. It is helpful to interpret the result of one incomplete elimination step. Denoting by , , and the -th rows of , , and , respectively, then the -loop starting at line 2 . Then, each of Algorithm 10.3 can be interpreted as follows. Initially, we have elimination step is an operation of the form

However, this operation is performed only on the nonzero pattern, i.e., the complement of . This means that, in reality, the elimination step takes the form

in which is zero when and equals when . Thus, the row cancels out the terms that would otherwise be introduced in the zero pattern. In the end the following relation is obtained:

The row contains the elements that fall inside the pattern at the completion of the -loop. Using the fact that , we obtain the relation,

Therefore, the following simple property can be stated.

Algorithm (10.3) produces factors

and

such that

in which is the matrix of the elements that are dropped during the incomplete elimination process. When , an entry of is equal to the value of obtained at

R6 v)

F v)

S 

5S

i5 S C 5   S

E 5 8S C f'd

S

S C

S 

5S

5 S

S S

a %  0 `

Note that

for

. We now sum up all the

s and dene

5S

5S

5S 

a % 7`

5 S 5   S

E 8

E 8

 % 5 8S

E

5   S 5 S  5 S

C w

S

  S 5 S

f'd  S

  5S

P5 S  5 S

  S  a 0 ` %


Do:

 aY 0 ` %

For For if

Do: Do: and for ope(row(i),row(k))

  $ &

8  &   p &8 c $  $ 8  1


a % 7`

t"0 

BBDBD u s% G G s% 

E n

 S

E  8

5S

5 S

E 8

5S

5S

The Incomplete LU factorization technique with no ll-in, denoted by ILU(0), consists of taking the zero pattern to be precisely the zero pattern of . In the following, we denote by the -th row of a given matrix , and by , the set of pairs such that .

The ILU(0) factorization for a ve-point matrix.

The incomplete factorization ILU(0) factorization is best illustrated by the case for which it was discovered originally, namely, for 5-point and 7-point matrices related to nite difference discretization of PDEs. Consider one such matrix as illustrated in the bottom left corner of Figure 10.2. The matrix represented in this gure is a 5-point matrix of size corresponding to an mesh. Consider now any lower triangular matrix which has the same structure as the lower part of , and any matrix which has the same structure as that of the upper part of . Two such matrices are shown at the top of Figure 10.2. If the product were performed, the resulting matrix would have the pattern shown in the bottom right part of the gure. It is impossible in general to match with this product for any and . This is due to the extra diagonals in the product, namely, the

3 0 3 %

% a % 7`

@ T P @ T P

a (`

W P0(T T P E

 p

U IH3 e

42 2 5336

t)0d 

the completion of the

loop in Algorithm 10.3. Otherwise,

 1 c#&) $
S C
is zero.

8 $ $  H ~ $ H& 8  & 

8 ~
E

 C S C 5 S

"

diagonals with offsets and . The entries in these extra diagonals are called ll-in elements. However, if these ll-in elements are ignored, then it is possible to nd and so that their product is equal to in the other diagonals. This denes the ILU(0) factorization in general terms: Any pair of matrices (unit lower triangular) and (upper triangular) so that the elements of are zero in the locations of . These constraints do not dene the ILU(0) factors uniquely since there are, in general, innitely many pairs of matrices and which satisfy these requirements. However, the standard ILU(0) is dened constructively using Algorithm 10.3 with the pattern equal to the zero pattern of .

1. For Do: 2. For 3. Compute 4. For 5. Compute 6. EndDo 7. EndDo 8. EndDo

and for

Do: , Do:

and for

In some cases, it is possible to write the ILU(0) factorization in the form

where and are the strict lower and strict upper triangular parts of , and is a certain diagonal matrix, different from the diagonal of , in general. In these cases it is sufcient to nd a recursive formula for determining the elements in . A clear advantage is that only an extra diagonal of storage is required. This form of the ILU(0) factorization is equivalent to the incomplete factorizations obtained from Algorithm 10.4 when the product of the strict-lower part and the strict-upper part of consists only of diagonal elements and ll-in elements. This is true, for example, for standard 5-point difference approximations to second order partial differential operators; see Exercise 4. In these instances, both the SSOR preconditioner with and the ILU(0) preconditioner can be cast in the form (10.16), but they differ in the way the diagonal matrix is dened. For SSOR( ), is the diagonal of the matrix itself. For ILU(0), it is dened by a recursion so that the diagonal of the product of matrices (10.16) equals the diagonal of . By denition, together the and matrices in ILU(0) have the same number of nonzero elements as the original matrix . Table 10.2 shows the results of applying the GMRES algorithm with ILU(0) preconditioning to the ve test problems described in Section 3.7.

I v)

&

a b`

% a 

  S S S abx` sD BB% G  m %   a %0 `     S G | S w a b` a %DBB% G     % 7` sDBB% H H %  x  "0 wh n  

f ` 'd a !

  $ &

8  &   p &8 c $  $ 8  1

G

w p

`

P
G

p


t"0#   

 5 3 X s r  BB'  5 X   5  3  3 d v t a ( Ba" d Y ' CBH$10    5    3  3 5 $1q  CSyR ' BBCd BCB9sB W  3 2    u    P    5   3 Y r  2Q5 3 U UQ5 U U U U U   BV  suQ5 s C'  CBq!BhBa Ba  P  X 3 3 g c  P   ca5  BGBQBqB  Bah!BGWBB6aQ3 H$13 0 $13Q5 3 Y X  P 3   3 6 3    P     3 Y  3  !BGY B   3 X      d  (9Y  W5  U UQ5 U U U U   s a5 V p UU U U U U UU U U U U U 5 U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U y X  3  RBCB' 5 '   c  5     d   B(Q3 Bh! BGY   3 s Br   X ' QBq B'     3 6  5  (a5 !BGWs ' QBiBd    3 Y r   X d       3 3 3 BBa Ba3 dQ3 0 '  5   3  y % CBq(Y 3 C' B(6 q(a!B3GBPa33Q5 3  P 3     5 5 ' 5  ' c   Y    d Baa3 CBBH7B 3 (qW5     5  5 3 X s d  3 r  '  $ 3  P       r  5   t y  3 Y X v t BERGBa" hBB   d   ! BGBB63 BCBQ5   3 Y  3    P 3 X    3 w   5 3 X 8 ! r  BWs pC' aBBs   X  5 3 X d  3 r'  Y a5 6 5      W('  y      d  5  R   6 8 ! r p (Rs eaC  tQ5 X    6 U UU U U U U UU U U U U U 5 U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U 8 @ !  3 2  $ 8 @ ! 3 A   01SDI)B@3 8  c GF!RDGF%T`8 A@ #)3 HD010 RaBP  $ 8 !  6 $ ' !  $ 8 @ ! 3 $   8  d 5  $ c  $  6 $  3 2  $  $ 3 (B H%H%T1S13 E10 )SR#!  h $ 3 $     

a  `

a ` Y

See Example 6.1 for the meaning of the column headers in the table. Observe that for the rst two problems, the gains compared with the performance of the SSOR preconditioner in Table 10.1 are rather small. For the other three problems, which are a little harder, the gains are more substantial. For the last problem, the algorithm achieves convergence in 205 steps whereas SSOR did not convergence in the 300 steps allowed. The fourth problem (F2DB) is still not solvable by ILU(0) within the maximum number of steps allowed.

For the purpose of illustration, below is a sample FORTRAN code for computing the incomplete and factors for general sparse matrices stored in the usual CSR format. The real values of the resulting factors are stored in the array luval, except that entries of ones of the main diagonal of the unit lower triangular matrix are not stored. Thus, one matrix is needed to store these factors together. This matrix is denoted by . Note that since the pattern of is identical with that of , the other integer arrays of the CSR representation for the LU factors are not needed. Thus, , which is the column position of the element in the input matrix, is also the column position of the element in the matrix. The code below assumes that the nonzero elements in the input matrix are sorted by increasing column numbers in each row.

t)0 
Matrix F2DA F3D ORS F2DB FID

  

 1 c#&) $

8 $ $  H ~ $ H& 8  & 

8 ~

ing.

A test run of GMRES with ILU(0) precondition-

Iters 28 17 20 300 206

FORTRAN CODE

a ` 

Kops 1456 4004 1228 15907 67970

Residual 0.12E-02 0.52E-03 0.18E+00 0.23E+02 0.19E+00

Error 0.12E-03 0.30E-03 0.67E-04 0.67E+00 0.11E-03

The accuracy of the ILU(0) incomplete factorization may be insufcient to yield an adequate rate of convergence as shown in Example 10.2. More accurate Incomplete LU factorizations are often more efcient as well as more reliable. These more accurate factoriza-

@ T P


R 9 cW gT T p VQb DT P U T H H

2 23 36

U 5 U U U 5 U U
A '

 

U 5 U U

U 5 U U


'

U 5 U U U 5 U U U 5 U U

' '

U 5 U U
'

5 5 5 5 5

dB     g  d 5 WCB   2  B(6W w B U U U U  U U U U U U U U U U U U U U U U    CB   d 5    GB U U U U  3 Y U U U U U U U U U U U U U U U U      5       5 I)3R 8 8  ! 3 0 ! c 0 $ ' 7`40 WA  ' d y  w  c  X        3 '   v U U U U U U U U U U U U U U U U 7BaqBaBqX U U U U 8 !  3 2  B 031 d D' 31 y 8 0 !  3 2 BWd % R &11EyRB4y g %Fy B3!   P 8 y y  y 8 0 !  3 2  y   c 0 X 0 8 g !  6 `F%    Y   9 a BBC BBt U U U U 3  P 3  d      6   U U U U U U U U U U U U U U U U ' BP  0 %B4"3!   8 y   y 0 X ' @ 0 A& 0       5 8 0 !  3 2  @   U 8 c 0 !  3 2  8 c !  3 2  &0 )BB93190318( %Fy 3!  y   c 0 X 8 8 0 0 ! 3 0 ! c  c &113RC0 ' U ' @ c 0 ! 48 AB3)3 x`A$GB03% &0 ' d  $ ' @ 8 c !  6 0    3    B(Y hB  U U U U 5 3    Y X  V U U U U U U U U U U U U U U U U   8 0 !  3 2 Q31 8 8 c !  6 ! 3 2 @ 8 0 !  3 2   GB03% #1  31a y c RB0  BBY ' QY (U U U U X    6        6 " U U U U U U U U U U U U U U U U BGg %#y B3!   P 8 y  P c 0 X d B9BC3 Qq U U U U  ' 5 3      Y     3  P  d X    U U U U U U U U U U U U U U U U 8 0 ! 3 3)0 B0 c ' 0 4& 0       5 0 8 8 0 ! 3 0 ! c 3)3R 0 `4& 0 ' d $ ' 0 ' 8 A@ g 13 C 0 U ' !  8 !  `g 13 C 40 '  $ G'  g d  q U U U U 6    3 U U U U U U U U U U U U U U U U      5 IR 8  ! c  $ '  `ABW' d      5 8 ! 3 8  !  3 2 I1C01 '48 A#13 E`' U ' @ !  $  C d 3QCBQCaBQhhBBi3 (  3 3  3 2  d  3  w  c  3 3 g c  w      U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U

U 5 U U

gQ6qBCB(W qB   5(B WC     3  2  6  w 3 d     g  d 5    GB  C   3 Y  d 5  B CaB hB B3 h P C    d 5  P    5  d        3 H13 0 1q  CBd  $ $  3 2    5  3  3 v t "        Y   ' QC9 a BBQ' C  3  P 3  d        6

BBQ5 d 5  5  6

  $ &

8  &   p &8 c $  $ 8  1

tions will differ from ILU(0) by allowing some ll-in. Thus, ILU(1) keeps the rst order ll-ins, a term which will be explained shortly. To illustrate ILU( ) with the same example as before, the ILU(1) factorization results from taking to be the zero pattern of the product of the factors obtained from ILU(0). This pattern is shown at the bottom right of Figure 10.2. Pretend that the original matrix has this augmented pattern . In other words, the ll-in positions created in this product belong to the augmented pattern , but their actual values are zero. The new pattern of the matrix is shown at the bottom left part of Figure 10.3. The factors and of the ILU(1) factorization are obtained by performing an ILU(0) factorization on this augmented pattern matrix. The patterns of and are illustrated at the top of Figure 10.3. The new LU matrix shown at the bottom right of the gure has now two additional diagonals in the lower and upper parts.

The ILU(1) factorization.

One problem with the construction dened in this illustration is that it does not extend to general sparse matrices. It can be generalized by introducing the concept of level of ll. A level of ll is attributed to each element that is processed by Gaussian elimination, and dropping will be based on the value of the level of ll. Algorithm 10.2 will be used as a model, although any other form of GE can be used. The rationale is that the level of ll should be indicative of the size: the higher the level, the smaller the elements. A very simple model is employed to justify the denition: A size of is attributed to any element whose level of ll is , where . Initially, a nonzero element has a level of ll of one

Augmented

 1 c#&) $

8 $ $  H ~ $ H& 8  & 
a D` f

x f

a b`

)0d 




8 ~


f

In the common denition used in the literature, all the levels of ll are actually shifted by from the denition used above. This is purely for convenience of notation and to conform with the denition used for ILU(0). Thus, initially if , and otherwise. Thereafter, dene recursively

The initial level of ll of an element

of a sparse matrix

Each time this element is modied in line 5 of Algorithm 10.2, its level of ll must be updated by

Observe that the level of ll of an element will never increase during the elimination. Thus, if in the original matrix , then the element in location will have a level of ll equal to zero throughout the elimination process. The above systematic denition gives rise to a natural strategy for discarding elements. In ILU , all ll-in elements whose level of ll does not exceed are kept. So using the denition of zero patterns introduced earlier, the zero pattern for ILU( ) is the set

is the level of ll value after all updates (10.18) have been performed. The case coincides with the ILU(0) factorization and is consistent with the earlier denition. In practical implementations of the ILU( ) factorization it is common to separate the symbolic phase (where the structure of the and factors are determined) from the numerical factorization, when the numerical values are computed. Here, a variant is described which does not separate these two phases. In the following description, denotes the -th row of the matrix , and the -th entry of .

1. For all nonzero elements 2. For Do:

dene

5S

a  S `

a %0 `

S

 x

! "

S

    "0 wh n

sDBB% H H %

where

v)

% 0

v( G

 w 

H 

a  `

% ( F3

dened by

Therefore, roughly speaking, the size of will be the maximum of the two sizes and , and it is natural to dene the new level of ll as,

is

 S

S

v( G

S 

6 6

v(  

 w 

E

S

&  S

 S a %07`

S % S

S

 

% S

&U   

% S

 SY

If is the current level of the element updated element should be

, then our model tells us that the size of the

P v)

(this will be changed later) and a zero element has a level of ll of updated in line 5 of Algorithm 10.2 by the formula

. An element

p
S
is

  n S
S

  $ &

8  &   p &8 c $  $ 8  1

S

& W

&

& '%

& W

& W

6 6  S  t t

& '%

& '%

S

S

S

S

S

 5"0

h

S

 S

S

S

v

There are a number of drawbacks to the above algorithm. First, the amount of ll-in and computational work for obtaining the ILU( ) factorization is not predictable for . Second, the cost of updating the levels can be quite high. Most importantly, the level of ll-in for indenite matrices may not be a good indicator of the size of the elements that are being dropped. Thus, the algorithm may drop large elements and result in an inaccurate incomplete factorization, in the sense that is not small. Experience reveals that on the average this will lead to a larger number of iterations to achieve convergence, although there are certainly instances where this is not the case. The techniques which will be described in Section 10.4 have been developed to remedy these three difculties, by producing incomplete factorizations with small error and a controlled number of ll-ins.

Matrix resulting from the discretization of an elliptic problem on a rectangle.

Often, the original matrix has a regular structure which can be exploited to formulate the ILU preconditioners in a simpler way. Historically, incomplete factorization preconditioners were developed rst for such matrices, rather than for general sparse matrices. Here, we call a regularly structured matrix a matrix consisting of a small number of diagonals. As an

H e CA@ Y

s y s

@ e Y CbDB e &!@ 8p3e a Y P 9 T H

h S

 S

a  ` S

3. For each and for 4. Compute 5. Compute . 6. Update the levels of ll of the nonzero 7. EndDo 8. Replace any element in row with 9. EndDo

s using (10.18) by zero

 1 c#&) $
Do:

8 $ $  H ~ $ H& 8  & 
 3 a S `
s

S

 h



BH P D! e

S 5S 5S   G  S  S G

% DBD% 
y

Y 9 gf

k y f y

 h

)0d 

2 2 3 36

8 ~

example, consider the diffusion-convection equation, with Dirichlet boundary conditions

where is simply a rectangle. As seen in Chapter 2, if the above problem is discretized using centered differences, a linear system is obtained whose coefcient matrix has the structure shown in Figure 10.4. In terms of the stencils seen in Chapter 4, the representation of this matrix is rather simple. Each row expresses the coupling between unknown and unknowns , which are in the horizontal, or direction, and the unknowns and which are in the vertical, or direction. This stencil is represented in Figure 10.5.

Stencil associated with the 5-point matrix shown

in Figure 10.4.

and factors of the ILU(0) factorization for the 5-point matrix shown in Figure 10.4. Now the respective stencils of these point as shown in Figure 10.7. and

matrices can be represented at a mesh

 h

 ""0#

The desired

and

matrices in the ILU(0) factorization are shown in Figure 10.6.

S

W c& U W

  $ & E S
y

GV w
h S S c

8  &   p &8 c $  $ 8  1


S

"0# 

 h

qG

w

Stencils associated with the shown in Figure 10.6.

and

factors

The stencil of the product can be obtained easily by manipulating stencils directly is obtained rather than working with the matrices they represent. Indeed, the -th row of by performing the following operation:

This translates into a combination of the stencils associated with the rows:

in which represents the stencil of the matrix based at the mesh point labeled . This gives the stencil for the matrix represented in Figure 10.8.

Stencil associated with the product of the factors shown in Figure 10.6.

and

In the gure, the ll-in elements are represented by squares and all other nonzero elements of the stencil are lled circles. The ILU(0) process consists of identifying with in s are nonzero. In the Gaussian eliminations process, this locations where the original is done from to . This provides the following equations obtained directly from comparing the stencils of LU and (going from lowest to highest indices)

` dS h

qa

S

` h dS  C

h S  1 c#&) $

S S S

d  h S S

8 $ $  H ~ $ H& 8  & 
f ` d S a
w

S S

f ` d S  C

S c

d h S S

d h S S

h S

`S

f d

`S  C

G a

f d S S

h S S

S

G a

)0d 

 )0d 

 a4$` 

8 ~
G

`S  C

`S

 p

Observe that the elements and are identical with the corresponding elements of the matrix. The other values are obtained from the following recurrence:

The above recurrence can be simplied further by making the observation that the quantities and need not be saved since they are scaled versions of the corresponding elements in . With this observation, only a recurrence for the diagonal elements is needed. This recurrence is:

with the convention that any with a non-positive index is replaced by and any other element with a negative index is zero. The factorization obtained takes the form

in which is the strict lower diagonal of , is the strict upper triangular part of , and is the diagonal obtained with the above recurrence. Note that an ILU(0) based on the IKJversion of Gaussian elimination would give the same result. For a general sparse matrix with irregular structure, one can also determine a preconditioner in the form (10.20) by requiring only that the diagonal elements of match those of (see Exercise 10). However, this will not give the same ILU factorization as the one based on the IKJvariant of Gaussian elimination seen earlier. Why the ILU(0) factorization gives rise to the same factorization as that of (10.20) is simple to understand: The product of and does not change the values of the existing elements in the upper part, except for the diagonal. This also can be interpreted on the adjacency graph of the matrix. This approach can now be extended to determine the ILU(1) factorization as well as factorizations with higher levels of ll. The stencils of the and matrices in the ILU(1) factorization are the stencils of the lower part and upper parts of the LU matrix obtained from ILU(0). These are shown in Figure 10.9. In the illustration, the meaning of a given stencil is not in the usual graph theory sense. Instead, all the marked nodes at a stencil based at node represent those nodes coupled with unknown by an equation. Thus, all the lled circles in the picture are adjacent to the central node. Proceeding as before and combining stencils to form the stencil associated with the LU matrix, we obtain the stencil shown in Figure 10.10.

' 0v)

) v)

% %

s DBD% G @

a 

S S

h S f

S

S S
y

f ` d a !

  $ &

d S0 % h S c S

S S S f'd y S S d h S

h S h S f  S S S S S S w f  d w S S

8  &   p &8 c $  $ 8  1
S c

S S

f d

`

S S ie iS

Sy

S

f d  S S

d S h S  0

Stencils associated with the and factors of the ILU(0) factorization for the matrix associated with the stencil of Figure 10.8.

and

Stencil associated with the product of the matrices whose stencils are shown in Figure 10.9.

As before, the ll-in elements are represented by squares and all other elements are lled circles. A typical row of the matrix associated with the above stencil has nine nonzero elements. Two of these are ll-ins, i.e., elements that fall outside the original structure of the and matrices. It is now possible to determine a recurrence relation for obtaining the entries of and . There are seven equations in all which, starting from the bottom, are

S

f d S S d  h  h S S

S 2 S

h S

f f S S w S S S S e S0 S w w w

 1 c#&) $

h S

f E S

8 $ $  H ~ $ H& 8  & 

E S c

f d  h S S d h S S

f d

h S f'd  f d h S S w h S f f S S w S S2 S S S S e S f'd w f'd w S 2 S S e S f d w f d  d h S0 S w  h S h S S

h  SY h S f'd h S S 

f d S S

f 'd S S

f 'd

 05)0d

t)0d 

h S

h S S

8 ~

This immediately yields the following recurrence relation for the entries of the factors:

and

In proceeding from the nodes of smallest index to those of largest index, we are in effect performing implicitly the IKJversion of Gaussian elimination. The result of the ILU(1) obtained in this manner is therefore identical with that obtained by using Algorithms 10.1 and 10.3.

In all the techniques thus far, the elements that were dropped out during the incomplete elimination process are simply discarded. There are also techniques which attempt to reduce the effect of dropping by compensating for the discarded entries. For example, a popular strategy is to add up all the elements that have been dropped at the completion of the -loop of Algorithm 10.3. Then this sum is subtracted from the diagonal entry in . This diagonal compensation strategy gives rise to the Modied ILU (MILU) factorization. Thus, in equation (10.14), the nal row obtained after completion of the -loop of Algorithm 10.3 undergoes one more modication, namely,

in which . Note that is a row and is the sum of the elements in this row, i.e., its row sum. The above equation can be rewritten in row form as and equation (10.15) becomes

Observe that

This establishes that . As a result, this strategy guarantees that the row sums of are equal to those of . For PDEs, the vector of all ones represents the discretization of a constant function. This additional constraint forces the ILU factorization to be exact for constant functions in some sense. Therefore, it is not surprising that often the algorithm does well for such problems. For other problems or problems with discontinuous coefcients, MILU algorithms usually are not better than their ILU counterparts, in general.

ui v)

5S

 

 S

W5 S S a 5 S C `

S Y S S fd S 0  f d  h S

5S

@ TP f

f 'd  S

@ T P

a 5S ` C

5S

h S f d S h S e f S  S !  S S S S y f'd S 2 S 2$` S d  f  h dS S S h S  c

  $ & C 5S
w

h S U f'd h  2 S f S S0 iS S

R H I5Pp PcXU f R

SS C SS

S a 5 S C `

8  &   p &8 c $  $ 8  1
5  S 5S

S 

2 3236

5 S

  S

G G G % a BDB% % `

S 

5S

S a 5 S C ` 5 S

For regularly structured matrices there are two elements dropped at the -th step of ILU(0). These are and located on the north-west and southeast corners of the stencil, respectively. Thus, the row sum associated with step is

and the MILU variant of the recurrence (10.19) is

whose row sum is zero. This generic idea of lumping together all the elements dropped in the elimination process and adding them to the diagonal of can be used for any form of ILU factorization. In addition, there are variants of diagonal compensation in which only a fraction of the dropped elements are added to the diagonal. Thus, the term in the above example would be replaced by before being added to , where is typically between 0 and 1. Other strategies distribute the sum among nonzero elements of and , other than the diagonal.

Incomplete factorizations which rely on the levels of ll are blind to numerical values because elements that are dropped depend only on the structure of . This can cause some difculties for realistic problems that arise in many applications. A few alternative methods are available which are based on dropping elements in the Gaussian elimination process according to their magnitude rather than their locations. With these techniques, the zero pattern is determined dynamically. The simplest way to obtain an incomplete factorization of this type is to take a sparse direct solver and modify it by adding lines of code which will ignore small elements. However, most direct solvers have a complex implementation which involves several layers of data structures that may make this approach ineffective. It is desirable to develop a strategy which is more akin to the ILU(0) approach. This section describes one such technique.

The new ILU factorization is now such that the -th row of the new remainder matrix is given by

in which according to (10.21)

 1 c#&) $
5 S

8 $ $  H ~ $ H& 8  & 
S S 5S d f'd S h S f'd S h c w h  S S d f d S h S S y S S S S f'd d S h S f'd h S S h S w d d

S a 5 S C `

d  h S S

SS

"g

u Hom u " %HU u

f d

5 8S uvt s C

h S S

0 S iS

 S

8 ~

"0#    


A generic ILU algorithm with threshold can be derived from the IKJversion of Gaussian elimination, Algorithm 10.2, by including a set of rules for dropping small elements. In what follows, applying a dropping rule to an element will only mean replacing the element by zero if it satises a set of criteria. A dropping rule can be applied to a whole row by applying the same rule to all the elements of the row. In the following algorithm, is a full-length working row which is used to accumulate linear combinations of sparse rows in the elimination and is the -th entry of this row. As usual, denotes the -th row of .

Now consider the operations involved in the above algorithm. Line 7 is a sparse update operation. A common implementation of this is to use a full vector for and a companion pointer which points to the positions of its nonzero elements. Similarly, lines 11 and 12 are sparse-vector copy operations. The vector is lled with a few nonzero elements after the completion of each outer loop , and therefore it is necessary to zero out those elements at the end of the Gaussian elimination loop as is done in line 13. This is a sparse set-to-zero operation. ILU(0) can be viewed as a particular case of the above algorithm. The dropping rule for ILU(0) is to drop elements that are in positions not belonging to the original structure of the matrix. In the factorization ILUT( ), the following rule is used. In line 5, an element is dropped (i.e., replaced by zero) if it is less than the relative tolerance obtained by multiplying by the original norm of the -th row (e.g., the 2-norm). In line 10, a dropping rule of a different type is applied. First, drop again any element in the row with a magnitude that is below the relative tolerance . Then,

1. For Do: 2. 3. For and when 4. 5. Apply a dropping rule to 6. If then 7. 8. EndIf 9. EndDo 10. Apply a dropping rule to row 11. for 12. for 13. 14. EndDo

Do:




5S

a 9 e ! U 3

9 C@ T PdAa Y Y H

8 k10 

 G 

$     c  " Bc

sDBD% m % % DBD% G m

5         E G      G   DBB%  % 5S sDBB% G H % 

6 2 36 2
%

p v 8 8


 ""0 wh n  

   

gc 8 ~ ~ E C    CS S  


 !

 

keep only the largest elements in the part of the row and the largest elements in the part of the row in addition to the diagonal element, which is always kept. The goal of the second dropping step is to control the number of elements per row. Roughly speaking, can be viewed as a parameter that helps control memory usage, while helps to reduce computational cost. There are several possible variations on the implementation of dropping step 2. For example we can keep a number of elements equal to in the upper part and in the lower part of the row, where and are the number of nonzero elements in the part and the part of the -th row of , respectively. This variant is adopted in the ILUT code used in the examples. Note that no pivoting is performed. Partial (column) pivoting may be incorporated at little extra cost and will be discussed later. It is also possible to combine ILUT with one of the many standard reorderings, such as the ordering and the nested dissection ordering, or the reverse Cuthill-McKee ordering. Reordering in the context of incomplete factorizations can also be helpful for improving robustness, provided enough accuracy is used. For example, when a red-black ordering is used, ILU(0) may lead to poor performance compared with the natural ordering ILU(0). On the other hand, if ILUT is used by allowing gradually more ll-in, then the performance starts improving again. In fact, in some examples, the performance of ILUT for the red-black ordering eventually outperforms that of ILUT for the natural ordering using the same parameters and .

Existence theorems for the ILUT factorization are similar to those of other incomplete factorizations. If the diagonal elements of the original matrix are positive while the offdiagonal elements are negative, then under certain conditions of diagonal dominance the matrices generated during the elimination will have the same property. If the original matrix is diagonally dominant, then the transformed matrices will also have the property of being diagonally dominant under certain conditions. These properties are analyzed in detail in this section. The row vector resulting from line 4 of Algorithm 10.6 will be denoted by . Note that for . Lines 3 to 10 in the algorithm involve a sequence of operations of the form if else: small enough set

for , in which initially and where is an element subtracted from a ll-in element which is being dropped. It should be equal either to zero (no dropping) or to when the element is being dropped. At the end of the -th step of Gaussian elimination (outer loop in Algorithm 10.6), we obtain the -th row of ,

) GG v0

) GG v0

) 6 v0

f 5 S 

a 7` a`

a`

sDBD%C G %

 S

 1 c#&) $
w

8 $ $  H ~ $ H& 8  & 

E
BP rQB

5 S S 5 S C f  S    S  S G C  C 5 S 5 fS i%DBB% G   m  S C  C   S  C S  f  C S | S  S

SW T 9

C f'd

4 2 36 2



  S   S

a`

  E

8 ~

f  S 



and the following relation is satised:

where is the row containing all the ll-ins. The existence result which will be proved is valid only for certain modications of strategy. We consider an ILUT strategy which uses the following the basic ILUT modication: Drop Strategy Modication. For any , let be the element of largest modulus among the elements , in the original matrix. Then elements generated in position during the ILUT procedure are not subject to the dropping rule.

This modication prevents elements generated in position from ever being dropped. Of course, there are many alternative strategies that can lead to the same effect. satisfy the following three conditions: A matrix whose entries and

for

will be referred to as an matrix. The third condition is a requirement that there be at least one nonzero element to the right of the diagonal element, in each row except the last. The row sum for the -th row is dened by

A given row of an matrix is diagonally dominant, if its row sum is nonnegative. An matrix is said to be diagonally dominant if all its rows are diagonally dominant. The following theorem is an existence result for ILUT. The underlying assumption is that an ILUT strategy is used with the modication mentioned above.

If the matrix is a diagonally dominant matrix, then the rows dened by (10.23) starting with and satisfy the following relations for

when

and

The result can be proved by induction on . It is trivially true for that the relation (10.28) is satised, start from the relation

. To prove

) v) ' v) v)

5 S

5 C f S E 5 C S

 s s

C S

5S

for for

and

P W) ) GI v0 F v)

 p
E a S 0 ` %

6 C S sBBD% 

s s

% 5 S

P5 C   S 5 C S 

`3 G sDBD% G %0 % `3 G

C w



S

C  

5 S

8 k10 
C
f S 


C a 5 d S `0 f 

a S %0 ` C %  S

a 5 C S ` C

$     c  " Bc

U5 C S
%

f 5 S 

DBD% G %

S

p v 8 8
SS EC   a 5 S ` C 3 S  E

f S  '  S  3 sS  2 S 2 S

G % % % % H E "  

a  Y % "`

BDB

gc 8 ~ ~

 % 5 C S n)|w

5 S



in which . Either is zero which yields , or is nonzero which means that is being dropped, i.e., replaced by zero, and therefore . This establishes (10.28). Note that by this argument except when again the -th element in the row is dropped, in which case and . Therefore, , always. Moreover, when an element in position is not dropped, then

and in particular by the rule in the modication of the basic scheme described above, for , we will always have for ,

in which is dened in the statement of the modication. . We have Consider the row sum of

Note that the inequalities in (10.35) are true because is never dropped by assumption and, as a result, (10.31) applies. By the condition (10.27), which denes matrices, is positive for . Clearly, when , we have by (10.34) . This completes the proof. The theorem does not mean that the factorization is effective only when its conditions are satised. In practice, the preconditioner is efcient under fairly general conditions.

A poor implementation of ILUT may well lead to an expensive factorization phase, and possibly an impractical algorithm. The following is a list of the potential difculties that may cause inefciencies in the implementation of ILUT. Generation of the linear combination of rows of (Line 7 in Algorithm 10.6). Selection of the largest elements in and . Need to access the elements of in increasing order of columns (in line 3 of Algorithm 10.6).

 S

GI v0 ) GF v0 ) ) 6 v0

6C

 s

f  S 

which establishes (10.29) for . It remains to prove (10.30). From (10.29) we have, for

GG v0 ) ) GG v0

) qW v0

 S

C S S   S   S C   C

3  S E

a 07` %

3 f S 
s

a 5 S ` C C

B P 9 H T SQY SR rbQ3IhID f P W UP Y 9 Y W H f H T

 1 c#&) $

a 5 C  ` CC

 S

8 $ $  H ~ $ H& 8  & 
a 5

Cf

6C

C `

 S r 6  C S W 6 C DB  6 C

 

 S

3 

C   S C S 

6C

 S

f  S 

 S a

 S a

 S

3 f 

6C

S

 S

 fS 6C   C f f   S    s    f  S S  G  w C 5 S `0 C C  5 S `0 C C  C 5 S `0  C a f 5  S ` C f 5 S 

f  S 

f  S

2 2 36

3 

C

% E

3  S E C 3 f S 

8 ~
E

3 g S



For (1), the usual technique is to generate a full row and accumulate the linear combination of the previous rows in it. The row is zeroed again after the whole loop is nished using a sparse set-to-zero operation. A variation on this technique uses only a full integer array , the values of which are zero except when there is a nonzero element. With this full row, a short real vector must be maintained which contains the real values of the row, as well as a corresponding short integer array which points to the column position of the real values in the row. When a nonzero element resides is set to the address in where the nonzero in position of the row, then element is stored. Thus, points to , and points to and . This is illustrated in Figure 10.11.

: nonzero indicator

: pointer to nonzero elements

: real values Illustration of data structure used for the work-

ing row in ILUT. Note that holds the information on the row consisting of both the part and the part of the LU factorization. When the linear combinations of the rows are performed, rst determine the pivot. Then, unless it is small enough to be dropped according to the dropping rule being used, proceed with the elimination. If a new element in the linear combination is not a ll-in, i.e., if , then update the real value . If it is a ll-in ( ), then append an element to the arrays and update accordingly. For (2), the natural technique is to employ a heap-sort strategy. The cost of this implementation would be , i.e., for the heap construction and for each extraction. Another implementation is to use a modied quick-sort strategy based on the fact that sorting the array is not necessary. Only the largest elements must be extracted. This is a quick-split technique to distinguish it from the full quick-sort. The method consists of choosing an element, e.g., , in the array , then permuting the data so that if and if , where is some split point. If , then exit. Otherwise, split one of the left or right sub-arrays recursively, depending on whether is smaller or larger than . The cost of this strategy on the average is . The savings relative to the simpler bubble sort or insertion sort schemes are small for small values of , but they become rather signicant for large and . The next implementation difculty is that the elements in the part of the row being built are not in an increasing order of columns. Since these elements must be accessed from left to right in the elimination process, all elements in the row after those already elimi-

p 
)

a  Y` 

a 

a  `  C

)h a

a  ` 

` 

 % 

 G `

 % 

a `

 ` a 4h  G a ` 

o`

a `

8 k10 
@ a  C

a ` C 

$     c  " Bc
)

a ` C

  w o`

p v 8 8
G `  U

a  ` 

 D5"0#

)h 3 Y`  a 

x`

gc 8 ~ ~
C E

a ` C

 a

` C

nated must be scanned. The one with smallest column number is then picked as the next element to eliminate. This operation can be efciently organized as a binary search tree which allows easy insertions and searches. However, this improvement is rather complex to implement and is likely to yield moderate gains. Tables 10.3 and 10.4 show the results of applying GMRES(10) preconditioned with ILUT(1 ) and ILUT(5 ), respectively, to the ve test problems described in Section 3.7. See Example 6.1 for the meaning of the column headers in the table. As shown, all linear systems are now solved in a relatively small number of iterations, with the exception of F2DB which still takes 130 steps to converge with ll = 1 (but only 10 with ll = 5.) In addition, observe a marked improvement in the operation count and error norms. Note that the operation counts shown in the column Kops do not account for the operations required in the set-up phase to build the preconditioners. For large values of ll , this may be large. Matrix F2DA F3D ORS F2DB FID Iters 18 14 6 130 59 Kops 964 3414 341 7167 19112 Residual 0.47E-03 0.11E-02 0.13E+00 0.45E-02 0.19E+00

ditioning.

If the total time to solve one linear system with is considered, a typical curve of the total time required to solve a linear system when the ll parameter varies would look like the plot shown in Figure 10.12. As ll increases, a critical value is reached where the preprocessing time and the iteration time are equal. Beyond this critical point, the preprocessing time dominates the total time. If there are several linear systems to solve with the same matrix , then it is advantageous to use a more accurate factorization, since the cost of the factorization will be amortized. Otherwise, a smaller value of ll will be more efcient. Matrix F2DA F3D ORS F2DB FID

Iters 7 9 4 10 40

Kops 478 2855 270 724 14862

Residual 0.13E-02 0.58E-03 0.92E-01 0.62E-03 0.11E+00

Error 0.90E-04 0.35E-03 0.43E-04 0.26E-03 0.11E-03

ditioning.

A test run of GMRES(10)-ILUT(5

A test run of GMRES(10)-ILUT(1

 1 c#&) $
Error 0.41E-04 0.39E-03 0.60E-04 0.51E-03 0.11E-03 ) precon) precon-

8 $ $  H ~ $ H& 8  & 
d E G
%

)0    

)0    

8 ~

"0#    

pp

12. 10.

C P U T i m e

8.0 6.0 4.0 2.0 0. 3.0 5.0 7.0 9.0 11. 13. 15. level of ll-in

Typical CPU time as a function of ll The dashed line is the ILUT time, the dotted line is the GMRES time, and the solid line shows the total.

The ILUT approach may fail for many of the matrices that arise from real applications, for one of the following reasons.

The ILUT procedure encounters a zero pivot; The ILUT procedure encounters an overow or underow condition, because of an exponential growth of the entries of the factors; The ILUT preconditioner terminates normally but the incomplete factorization preconditioner which is computed is unstable.

An unstable ILU factorization is one for which has a very large norm leading to poor convergence or divergence of the outer iteration. The case (1) can be overcome to a certain degree by assigning an arbitrary nonzero value to a zero diagonal element that is encountered. Clearly, this is not a satisfactory remedy because of the loss in accuracy in the preconditioner. The ideal solution in this case is to use pivoting. However, a form of pivoting is desired which leads to an algorithm with similar cost and complexity to ILUT. Because of the data structure used in ILUT, row pivoting is not practical. Instead, column pivoting can be implemented rather easily. Here are a few of the features that characterize the new algorithm which is termed ILUTP (P stands for pivoting). ILUTP uses a permutation array to hold the new orderings of the variables, along with the reverse permutation array. At step of the elimination process the largest entry in a row is selected and is dened to be the new -th variable. The two permutation arrays are then updated accordingly. The matrix elements of and are kept in their original numbering. However, when expanding the - row which corresponds to the -th outer step of Gaussian elimination, the elements are loaded with respect to the new labeling, using the array for the translation. At the end of the process, there are two options. The rst is to leave all elements labeled with respect

C

f d

a 9 e ! !U 3

f d

# 9

C

f d

Y @ T PdHcbY a

8 k10 

$     c  " Bc

p v 8 8
2 36 2

 75"0#

gc 8 ~ ~

!  

to the original labeling. No additional work is required since the variables are already in this form in the algorithm, but the variables must then be permuted at each preconditioning step. The second solution is to apply the permutation to all elements of as well as . This does not require applying a permutation at each step, but rather produces a permuted solution which must be permuted back at the end of the iteration phase. The complexity of the ILUTP procedure is virtually identical to that of ILUT. A few additional options can be provided. A tolerance parameter called may be included to help determine is candidate for a perwhether or not to permute variables: A nondiagonal element mutation only when . Furthermore, pivoting may be restricted to take place only within diagonal blocks of a xed size. The size of these blocks must be provided. A value of indicates that there are no restrictions on the pivoting. For difcult matrices, the following strategy seems to work well: Always apply a scaling to all the rows (or columns) e.g., so that their -norms are all equal to 1; then apply a scaling of the columns (or rows). Use a small drop tolerance (e.g., or . Take a large ll-in parameter (e.g., ). Do not take a small value for . Reasonable values are between and , with being the best in many cases. Take unless there are reasons why a given block size is justiable. E

Table 10.5 shows the results of applying the GMRES algorithm with ) preconditioning to the ve test problems described in Section 3.7. The ILUTP(1 permtol parameter is set to 1.0 in this case.

Matrix F2DA F3D ORS F2DB FID

Iters 18 14 6 130 50

Kops 964 3414 341 7167 16224

Residual 0.47E-03 0.11E-02 0.13E+00 0.45E-02 0.17E+00

Error 0.41E-04 0.39E-03 0.61E-04 0.51E-03 0.18E-03

A test run of GMRES with ILUTP(1) precondi-

tioning.

See Example 6.1 for the meaning of the column headers in the table. The results are identical with those of ILUT(1 ) shown in Table 10.3, for the rst four problems, but there is an improvement for the fth problem.

The ILU preconditioners discussed so far are based mainly on the the IKJvariant of Gaussian elimination. Different types of ILUs can be derived using other forms of Gaussian

S  Y

a d

 1 c#&) $

a 9 e ! !U 3

8 $ $  H ~ $ H& 8  & 
C

E H

9 C@ T PdHca Y B

 C G

S S

2 36 2

 S

 f

)0    

G 


E

8 ~

G d % E "0#    

E E



elimination. The main motivation for the version to be described next is that ILUT does not take advantage of symmetry. If is symmetric, then the resulting is nonsymmetric in general. Another motivation is that in many applications including computational uid dynamics and structural engineering, the resulting matrices are stored in a sparse skyline (SSK) format rather than the standard Compressed Sparse Row format. sparse column

Illustration of the sparse skyline format.

In this format, the matrix

is decomposed as

in which is a diagonal of and are strictly lower triangular matrices. Then a sparse representation of and is used in which, typically, and are stored in the CSR format and is stored separately. Incomplete Factorization techniques may be developed for matrices in this format without having to convert them into the CSR format. Two notable advantages of this approach are (1) the savings in storage for structurally symmetric matrices, and (2) the fact that the algorithm gives a symmetric preconditioner when the original matrix is symmetric. Consider the sequence of matrices

in which

v) P v)

P

E

7 f

is already available, then the LDU factorization of

 f 'd  f'd     'd  f'd   P  f



   

E

  

where

. If

is nonsingular and its LDU factorization

is

% A

 

 7

sparse row

p


8 k10 

f

% 

$     c  " Bc f 

p v 8 8


5"0# 

gc 8 ~ ~ s

Hence, the last row/column pairs of the factorization can be obtained by solving two unit lower triangular systems and computing a scaled dot product. This can be exploited for sparse matrices provided an appropriate data structure is used to take advantage of the sparsity of the matrices , as well as the vectors , , , and . A convenient data structure for this is to store the rows/columns pairs as a single row in sparse mode. All these pairs are stored in sequence. The diagonal elements are stored separately. This is called the Unsymmetric Sparse Skyline (USS) format. Each step of the ILU factorization based on this approach will consist of two approximate sparse linear system solutions and a sparse dot product. The question that arises is: How can a sparse triangular system be solved inexpensively? It would seem natural to solve the triangular systems (10.37) and (10.38) exactly and then drop small terms at the end, using a numerical dropping strategy. However, the total cost of computing the ILU factorization with this strategy would be operations at least, which is not acceptable for very large problems. Since only an approximate solution is required, the rst idea that comes to mind is the truncated Neumann series,

in which . In fact, by analogy with ILU( ), it is interesting to note that the powers of will also tend to become smaller as increases. A close look at the structure of shows that there is indeed a strong relation between this approach and ILU( ) in the symmetric case. Now we make another important observation, namely, that the vector can be computed in sparse-sparse mode, i.e., in terms of operations involving products of sparse matrices by sparse vectors. Without exploiting this, the total cost would still be . When multiplying a sparse matrix by a sparse vector , the operation can best be done by accumulating the linear combinations of the columns of . A sketch of the resulting ILUS algorithm is as follows.

1. Set , 2. For Do: 3. Compute by (10.40) in sparse-sparse mode 4. Compute in a similar way 5. Apply numerical dropping to and 6. Compute via (10.39) 7. EndDo

If there are only nonzero components in the vector and an average of nonzero elements per column, then the total cost per step will be on the average. Note that the computation of via (10.39) involves the inner product of two sparse vectors which is often implemented by expanding one of the vectors into a full vector and computing the inner product of a sparse vector by this full vector. As mentioned before, in the symmetric case ILUS yields the Incomplete Cholesky factorization. Here, the work can be halved since the generation of is not necessary.

) G' v0 ) ) G(6 v0


 a 

P

 1 c#&) $

BD w 

8 $ $  H ~ $ H& 8  &  P 7 7
 
!

 %  

P

w 8

 

` f'd 

 Ui

f | 'd


!

  

 !    ! 8 f   d w P

ff

%Bf DB% G f

    "0 wEn



 P

8 ~


o`

 

o`

 

p
@
!

Also note that a simple iterative procedure such as MR or GMRES(m) can be used to solve the triangular systems in sparse-sparse mode. Similar techniques will be seen in Section 10.5. Experience shows that these alternatives are not much better than the Neumann series approach [53].

The Incomplete LU factorization techniques were developed originally for -matrices which arise from the discretization of Partial Differential Equations of elliptic type, usually in one variable. For the common situation where is indenite, standard ILU factorizations may face several difculties, and the best known is the fatal breakdown due to the encounter of a zero pivot. However, there are other problems that are just as serious. Consider an incomplete factorization of the form

where is the error. The preconditioned matrices associated with the different forms of preconditioning are similar to

What is sometimes missed is the fact that the error matrix in (10.41) is not as important as the preconditioned error matrix shown in (10.42) above. When the matrix is diagonally dominant, then and are well conditioned, and the size of remains conned within reasonable limits, typically with a nice clustering of its eigenvalues around the origin. On the other hand, when the original matrix is not diagonally dominant, or may have very large norms, causing the error to be very large and thus adding large perturbations to the identity matrix. It can be observed experimentally that ILU preconditioners can be very poor in these situations which often arise when the matrices are indenite, or have large nonsymmetric parts. One possible remedy is to try to nd a preconditioner that does not require solving a linear system. For example, the original system can be preconditioned by a matrix which is a direct approximation to the inverse of .

A simple technique for nding approximate inverses of arbitrary sparse matrices is to attempt to nd a sparse matrix which minimizes the Frobenius norm of the residual matrix ,

uW6 v)

G6 v)

G6 v)

f d

f ! d

H B IIe 9

f 'd

CbQ9gf P e Y 9 U IIIb 5dca Y W b9g5P H B e H WP H 8 P Y f

f ! 'd

v " nfhtvgP!U x#uvF V

f Rd

8 &a `

  $ &
! d f

8    %YY" $ "p r $  8
w 

f d

w 8

f Pf'd ud

f ! d

U 3 e

6 2 36 2

f d

f d


8

A matrix whose value is small would be a right-approximate inverse of . Similarly, a left-approximate inverse can be dened by using the objective function

In the following, only (10.43) and(10.45) are considered. The case (10.44) is very similar to the right preconditioner case (10.43). The objective function (10.43) decouples into the sum of the squares of the 2-norms of the individual columns of the residual matrix ,

in which and are the -th columns of the identity matrix and of the matrix , respectively. There are two different ways to proceed in order to minimize (10.46). The function (10.43) can be minimized globally as a function of the sparse matrix , e.g., by a gradient-type method. Alternatively, the individual functions

can be minimized. The second approach is appealing for parallel computers, although there is also parallelism to be exploited in the rst approach. These two approaches will be discussed in turn.

The global iteration approach consists of treating as an unknown sparse matrix and using a descent-type method to minimize the objective function (10.43). This function is a matrices, viewed as objects in . The proper quadratic function on the space of inner product on the space of matrices, to which the squared norm (10.46) is associated, is

In the following, an array representation of an vector means the matrix whose column vectors are the successive -vectors of . In a descent algorithm, a new iterate is dened by taking a step along a selected direction , i.e.,

in which is selected to minimize the objective function . From results seen in Chapter 5, minimizing the residual norm is equivalent to imposing the condition that be orthogonal to with respect to the inner product. Thus, the optimal is given by

' ) G(6 v0

) GF 6 v0

) GI 6 v0

) GP 6 v0

6 v0 )

ut vs

 U 

s% BBD% H % G m

`

a a  ` `  a ` 

W UP Y 9 e P T 9 XrbIH Y hSI U T 8

t  w uvs ut Gs a u4 `  %



8 #a `

U  &a x` 

fV % fx %

4 2 36 2

Finally, a left-right pair

can be sought to minimize

) 6 6 v0

 1 c#&) $

8 $ $  H ~ $ H& 8  &   8 m 8




8 ~

 g

 p
8

The denominator may be computed as . The resulting matrix will tend to become denser after each descent step and it is therefore essential to apply a numerical dropping strategy to the resulting . However, the descent property of the step is now lost, i.e., it is no longer guaranteed that . An alternative would be to apply numerical dropping to the direction of search before taking the descent step. In this case, the amount of ll-in in the matrix cannot be controlled. The simplest choice for the descent direction is to take it to be equal to the residual , where is the new iterate. Except for the numerical dropping step, matrix the corresponding descent algorithm is nothing but the Minimal Residual (MR) algorithm, seen in Section 5.3.2, on the linear system . The global Minimal Residual algorithm will have the following form.

A second choice is to take to be equal to the direction of steepest descent, i.e., the direction opposite to the gradient of the function (10.43) with respect to . If all vectors as represented as 2-dimensional arrays, then the gradient can be viewed as a matrix , which satises the following relation for small perturbations ,

This provides a way of expressing the gradient as an operator on arrays, rather than vectors.

is the matrix

#% # w ! x H ! ! % a ! a ! ! ($x` ($x`  w a $ `  H a ! ! ! (!$x` a $` g$ w a $`  ! ! @a $` a $`  ` ` a  ` `  8 ` 8 8 ` a a ! w ` 8 `  a ` Ha

aa ! w

`

in which

is the residual matrix


For any matrix

we have

The array representation of the gradient of

with respect to

) F v)

a u!u! `

! V %

g 8 H z

` a !

1. Select an initial 2. Until convergence Do: 3. Compute and 4. Compute 5. Compute 6. Apply numerical dropping to 7. EndDo


pp

 !  " 0 0

"

  $ &

% & 0 %

`

3Da

8    %YY" $ "p r $  8 vu

 r w `   a      8

ut Gs

' %  w

  "0 wh n  

"0 

f 8

E n



Comparing this with (10.50) yields the desired result. Thus, the steepest descent algorithm will consist of replacing in line 3 of Algorithm 10.8 by . As is expected with steepest descent techniques, the algorithm can be quite slow.

In either steepest descent or minimal residual, the matrix must be stored explicitly. The scalars and needed to obtain in these algorithms can be computed from the successive columns of , which can be generated, used, and discarded. As a result, the matrix need not be stored.

Column-oriented algorithms consist of minimizing the individual objective functions (10.47) separately. Each minimization can be performed by taking a sparse initial guess and solving approximately the parallel linear subproblems

with a few steps of a nonsymmetric descent-type method, such as MR or GMRES. If these linear systems were solved (approximately) without taking advantage of sparsity, the cost of constructing the preconditioner would be of order . That is because each of the columns would require operations. Such a cost would become unacceptable for large linear systems. To avoid this, the iterations must be performed in sparse-sparse mode, a and the subsequent term which was already introduced in Section 10.4.5. The column iterates in the MR algorithm must be stored and operated on as sparse vectors. The Arnoldi basis in the GMRES algorithm are now to be kept in sparse format. Inner products and vector updates involve pairs of sparse vectors. In the following MR algorithm, iterations are used to solve (10.51) approximately for each column, giving an approximation to the -th column of the inverse of . Each initial is taken from the columns of an initial guess, .

) qWF v0

B f a P e VcbY CXU 8 S9 Ib3I5Ghf @ qC T R H Y W HP e UE W T U

 w vur v # 8 g  0     !" 0 0 " 2 010 %  t"0 wEn 8 ` 3 3#


1. Select an initial 2. Until convergence Do: 3. Compute , and 4. Compute 5. Compute 6. Apply numerical dropping to 7. EndDo

 1 c#&) $


s% DBD%

8 $ $  H ~ $ H& 8  & 

% 

 `  a

a `

2 36 2

v B

8 ~

The algorithm computes the current residual and then minimizes the residual norm , with respect to . The resulting column is then pruned by applying the numerical dropping step in line 8. In the sparse implementation of MR and GMRES, the matrix-vector product, SAXPY, and dot product kernels now all involve sparse vectors. The matrix-vector product is much more efcient if the sparse matrix is stored by columns, since all the entries do not need to be traversed. Efcient codes for all these kernels may be constructed which utilize a full -length work vector. Columns from an initial guess for the approximate inverse are used as the initial guesses for the iterative solution of the linear subproblems. There are two obvious choices: and . The scale factor is chosen to minimize the norm of . where is either the identity or . The Thus, the initial guess is of the form optimal can be computed using the formula (10.49), in which is to be replaced by the identity, so . The identity initial guess is less expensive to use but is sometimes a much better initial guess. For this choice, the initial preconditioned system is SPD. The linear systems needed to solve when generating each column of the approximate inverse may themselves be preconditioned with the most recent version of the preconditioning matrix . Thus, each system (10.51) for approximating column may be preconditioned with where the rst columns of are the that already have been computed, , and the remaining columns are the initial guesses for the , . Thus, outer iterations can be dened which sweep over the matrix, as well as inner iterations which compute each column. At each outer iteration, the initial guess for each column is taken to be the previous result for that column.

The rst theoretical question which arises is whether or not the approximate inverses obtained by the approximations described earlier can be singular. It cannot be proved that is nonsingular unless the approximation is accurate enough. This requirement may be in conict with the requirement of keeping the approximation sparse.

 

B XU P Q9 IS5QB XC SI P Y 3XUpHca Y W Y e H RP W U T 9 H e

 C

1. Start: set 2. For each column Do: 3. Dene 4. For Do: 5. 6. 7. 8. Apply numerical dropping to 9. EndDo 10. EndDo

D%0  !

  $ &

% ' 0 ! 0

8    %YY" $ "p r $  8

0%
G 


a x` x` 

|

 C   @ C w  A   E C @ 8 E @ 8      G  S s% BBD%  C sDBB% G  n  %  (


 2    I!"2 05"0 wh n


a  `


2 36 2

 

  

 3

Da  C

 w 

o(f  `

8 

 3

U

proximate inverse

Assume that is nonsingular and that the residual of the apsatises the relation

where

is any consistent matrix norm. Then

is nonsingular.

The result follows immediately from the equality

The result is true in particular for the Frobenius norm which is consistent (see Chapter 1). It may sometimes be the case that is poorly balanced and as a result can be large. Then balancing can yield a smaller norm and possibly a less restrictive condition for the nonsingularity of . It is easy to extend the previous result as follows. If is exist such that nonsingular and two nonsingular diagonal matrices

where is any consistent matrix norm, then is nonsingular. Each column is obtained independently by requiring a condition on the residual norm of the form

for some vector norm . From a practical point of view the 2-norm is preferable since it is related to the objective function which is used, namely, the Frobenius norm of the residual . However, the 1-norm is of particular interest since it leads to a number of simple theoretical results. In the following, it is assumed that a condition of the form

is required for each column. The above proposition does not reveal anything about the degree of sparsity of the resulting approximate inverse . It may well be the case that in order to guarantee nonsingularity, must be dense, or nearly dense. In fact, in the particular case where the norm in the proposition is the 1-norm, it is known that the approximate inverse may be structurally dense, in that it is always possible to nd a sparse matrix for which will be dense if . Next, we examine the sparsity of and prove a simple result for the case where an assumption of the form (10.56) is made.

the inequality

then the element

is nonzero.

) GP F v0

Cf %  S s %  % 

f d

Let

and assume that a given element

of

satises

6 F v0 )

) GF F v0

) GI F v0

S

Since

, Theorem 1.5 seen in Chapter 1 implies that

is nonsingular.

) GGF v0 ) GGF v0

 1 c#&) $

8 f

8 $ $  H ~ $ H& 8  & 
% f
G 8

% D 3D 

v f 8

3 f 

8
8 `


U 

S #

"0  "0  S

8 ~
G G

h n

f 8

R R

  

n r

The proposition implies that if is small enough, then the nonzero elements of are located in positions corresponding to the larger elements in the inverse of . The following negative result is an immediate corollary.

-equimodular in that

then the nonzero sparsity pattern of includes the nonzero sparsity pattern of . In particular, if is dense and its elements are -equimodular, then is also dense.
The smaller the value of , the more likely the condition of the corollary will be satised. Another way of stating the corollary is that accurate and sparse approximate inverses may be computed only if the elements of the actual inverse have variations in size. Unfortunately, this is difcult to verify in advance and it is known to be true only for certain types of matrices.

We now examine the convergence of the MR algorithm in the case where self preconditioning is used, but no numerical dropping is applied. The column-oriented algorithm is considered rst. Let be the current approximate inverse at a given substep. The self preconditioned MR iteration for computing the -th column of the next approximate inverse is obtained by the following sequence of operations:

1. 2. 3. 4.

f d

f d

e hf IcXU P Y cA 33 QH B h! IAIIb XC R H W P R W U H e T U H W H 8 e H W U

f C Cf  %  s%

C 0)0)0)C f

e S

Let

. If the nonzero elements of

&

Now the condition (10.57) implies that

S CE f    S s C f %  %    S f  C 7  S s f %  %    S  S   e

  S C

Therefore,

f 'd

f d

From the equality

we have

, and hence

  C S

  $ &

8    %YY" $ "p r $  8  S  S 8 (
 % %

 S

   @ wC  A   E C @ 8 E @ 8      C      C

 5"0

2 36 2

f 'd

|7 n



are

Note that

can be written as

where

is the preconditioned matrix at the given substep. The subscript is now dropped to simplify the notation. The new residual associated with the current column is given by

The orthogonality of the new residual against

can be used to obtain

Replacing

by its value dened above we get

Thus, at each inner iteration, the residual norm for the -th column is reduced according to the formula

in which denotes the acute angle between the vectors and . Assume that each column converges. Then, the preconditioned matrix converges to the identity. As a result of this, the angle will tend to , and therefore the convergence ratio will also tend to zero, showing superlinear convergence. Now consider equation (10.58) more carefully. Denote by the residual matrix and observe that

This results in the following statement.

Assume that the self preconditioned MR algorithm is employed with one inner step per iteration and no numerical dropping. Then the 2-norm of each residual of the -th column is reduced by a factor of at least , where is the approximate inverse before the current step, i.e.,

In addition, the residual matrices

obtained after each outer iteration satisfy

As a result, when the algorithm converges, it does so quadratically.

F v0 )

) G' F v0

) G) I v0

 f 8

 1 c#&) $ a  % a C  C

Ru 3 u B B C u  C 3 C C C B C  C C & % a C

f  3   v  8 3

C   C z vus C t C  C  C

B 7B  G C &

8 $ $  H ~ $ H& 8  & 
C a C % C ` C C
C

  ` % C ` C %
C
C

E

&!

a  Ca



a % ` C C

C

% x` %C  C `

ut vs C

%

 ut vs C

ut vs C

 

&!

u vt s C

%

 ""0

 C `

8 ~
  

h n

a % `

%

&!

Inequality (10.59) was proved above. To prove quadratic convergence, rst use the inequality and (10.59) to obtain

Here, the index corresponds to the outer iteration and the -index to the column. Note that the Frobenius norm is reduced for each of the inner steps corresponding to the columns, and therefore,

This yields

which, upon summation over , gives

This completes the proof.

Note that the above theorem does not prove convergence. It only states that when the algorithm converges, it does so quadratically at the limit. In addition, the result ceases to be valid in the presence of dropping. Consider now the case of the global iteration. When self preconditioning is incorporated into the global MR algorithm (Algorithm 10.8), the search direction becomes , where is the current residual matrix. Then, the main steps of the algorithm (without dropping) are as follows.

Then, the following recurrence follows from (10.61),

and shows that is also a polynomial of degree in . In particular, if the initial is symmetric, then so are all subsequent s. This is achieved when the initial is a multiple of , namely if .

DI v)

DI v)

a ` 

 

 w

Therefore, induction shows that Now dene the preconditioned matrices,

where

is a polynomial of degree .

uiI v) 

  w

 a

 

 8  

|

a

  `

G  

 w

An important observation is that relation,

is a polynomial in

. This is because, from the above

  R

U a

 

 w

` 8

At each step the new residual matrix

 

R  

1. 2. 3. 4.

satises the relation

Wu

 3    C p 3 Gus C t  3 C 

C

  $ &

8    %YY" $ "p r $  8
C

3 B

 uvs C t

3 D

 f

f  w @ C @    @C       8    

 




 

 

Similar to the column oriented case, when the algorithm converges it does so quadratically.

Assume that the self preconditioned global MR algorithm is used without dropping. Then, the residual matrices obtained at each iteration satisfy

As a result, when the algorithm converges, then it does so quadratically.


Dene for any ,

This proves quadratic convergence at the limit. For further properties see Exercise 16.

A notable disadvantage of the right or left preconditioning approach method is that it is difcult to assess in advance whether or not the resulting approximate inverse is nonsingular. An alternative would be to seek a two-sided approximation, i.e., a pair , , with lower triangular and upper triangular, which attempts to minimize the objective function (10.45). The techniques developed in the previous sections can be exploited for this purpose. In the factored approach, two matrices and which are unit lower and upper triangular matrices are sought such that

where is some unknown diagonal matrix. When is nonsingular and , then are called inverse LU factors of since in this case . Once more, the matrices are built one column or row at a time. Assume as in Section 10.4.5 that we have the sequence of matrices

in which

. If the inverse factors

are available for

, i.e.,

) GF I v0

f  d

B H B e H WP H Y 9 DIIIb 5dbQgf P

f 'd

p g3 G  & Da ` g3 f B

%   R   %F 

 

U C e

 7

9 I3XcY R H e U

2 36 2

Recall that

achieves the minimum of

over all s. In particular,

) 6 I v0

 1 c#&) $ 

8 $ $  H ~ $ H& 8  &  
 w

 3 f

a  ` '% & W`

 a

Ba

`

"0 

8 ~

h n

  

 

then the inverse factors

for

are easily obtained by writing

in which

, and

are such that

Note that the formula (10.69) exploits the fact that either the system (10.67) is solved exactly (middle expression) or the system (10.68) is solved exactly (second expression) or both systems are solved exactly (either expression). In the realistic situation where neither of these two systems is solved exactly, then this formula should be replaced by

The last row/column pairs of the approximate factored inverse can be obtained by solving two sparse systems and computing a few dot products. It is interesting to note that the only difference with the ILUS factorization seen in Section 10.4.5 is that the coefcient matrices , but the matrix itself. for these systems are not the triangular factors of To obtain an approximate factorization, simply exploit the fact that the matrices are sparse and then employ iterative solvers in sparse-sparse mode. In this situation, formula . The algorithm would be as follows. (10.70) should be used for

A linear system must be solved with in line 2 and a linear system with in line 3. This is a good scenario for the Biconjugate Gradient algorithm or its equivalent two-sided Lanczos algorithm. In addition, the most current approximate inverse factors can be used to precondition the linear systems to be solved in steps 2 and 3. This was termed self preconditioning earlier. All the linear systems in the above algorithm can be solved in parallel since they are independent of one another. The diagonal can then be obtained at the end of the process. This approach is particularly suitable in the symmetric case. Since there is only one factor, the amount of work is halved. In addition, there is no problem with the existence in the positive denite case as is shown in the following lemma which states that is always when is SPD, independently of the accuracy with which the system (10.67) is solved.

Let

be SPD. Then, the scalar

as computed by (10.70) is positive.

 y

P 

 y



P

1. For Do: 2. Solve (10.67) approximately; 3. Solve (10.68) approximately; 4. Compute 5. EndDo

I I v) ' I v) I v) P I v) ) P v)


 ! !  " v ) %  0 ! 0 0 %

 y

P 

E

  $ &  



 P

P  7

8    %YY" $ "p r $  8
P
f   yG f     A E A    E f  7 f  % f  7      

 

 

 2    I!"2 D5"0 wh n

f  w  y P  7 

 

 y

 y

BDB% G %

 y

 P  

 

 5"0

%  g

In the symmetric case, . Note that as computed by formula (10.70) is the element of the matrix . It is positive because is SPD. This is independent of the accuracy for solving the system to obtain . In the general nonsymmetric case, there is no guarantee that will be nonzero, unless the systems (10.67) and (10.68) are solved accurately enough. There is no practical problem here, since is computable. The only question remaining is a theoretical one: be guaranteed to be nonzero if the systems are solved with enough accuracy? Can Intuitively, if the system is solved exactly, then the matrix must be nonzero since it is equal to the matrix of the exact inverse factors in this case. The minimal assumption to make is that each is nonsingular. Let be the value that would be obtained if at least one of the systems (10.67) or (10.68) is solved exactly. According to equation (10.69), in this situation this value is given by

If is nonsingular, then . To see this refer to the dening equation (10.66) and compute the product in the general case. Let and be the residuals obtained for these linear systems, i.e.,

Then a little calculation yields

If one of or is zero, then it is clear that the term in the above relation becomes and it must be nonzero since the matrix on the left-hand side is nonsingular. Incidentally, this relation shows the structure of the last matrix . The components to of column consist of the vector , the components 1 to of row make up the vector , and the diagonal elements are the s. Consider now the expression for from (10.70).

After a computed ILU factorization results in an unsatisfactory convergence, it is difcult to improve it by modifying the and factors. One solution would be to discard this factorization and attempt to recompute a fresh one possibly with more ll-in. Clearly, this may be a wasteful process. A better alternative is to use approximate inverse techniques. Assume a (sparse) matrix is a preconditioner to the original matrix , so the precondi-

 5 fy

 d R C f

This perturbation formula is of a second order in the sense that . It guarantees that is nonzero whenever

) GGP v0

) GGP v0

) qWP v0

a c

 5 y

f   ` d

P

  y

s s

qa   `

 y

e H W IcXU P Y cA 33 9 8 5P P R W U H e W

 

 1 c#&) $ f   f y    y C
C
w

f  f  f  f      w  y  8 $ $  H ~ $ H& 8  & 

f  d R 

 d f

     

sa

 'd  f  fd R   'd R C f    ` Hac C  ` 'd W  C w f    P    f

 P  

 5 y

 

P

U C e

  f

 y

 5 y

 

fP

 5 y

 

2 36 2

 

 y

  y

8 ~


f  w f 5 y  f   f    

%
G

a !

 p C `

 5 y

 y

`
f

  
 y 

 

tioned matrix is

Recall that the columns of and are sparse. One approach is to compute a least-squares approximation in the Frobenius norm sense. This approach was used already in Section 10.5.1 when is the identity matrix. Then the columns of were obtained by approximately solving the linear systems . The same idea can be applied here. Now, the systems

must be solved instead, where is the -th column of which is sparse. Thus, the coefcient matrix and the right-hand side are sparse, as before.

Block preconditioning is a popular technique for block-tridiagonal matrices arising from the discretization of elliptic problems. It can also be generalized to other sparse matrices. We begin with a discussion of the block-tridiagonal case.

Consider a block-tridiagonal matrix blocked in the form


One of the most popular block preconditioners used in the context of PDEs is based on this block-tridiagonal form of the coefcient matrix . Let be the block-diagonal matrix consisting of the diagonal blocks , the block strictly-lower triangular matrix consisting of the sub-diagonal blocks , and the block strictly-upper triangular matrix consisting of the super-diagonal blocks . Then, the above matrix has the form

..

..

..

6 R!P v)

B H P Y 9 D! Ce gf SCXU 9 Pc5CbYvE T 9 W 8 RP e

h !

A sparse matrix is sought to approximate the inverse of used as a preconditioner to . Unfortunately, the matrix observe that all that is needed is a matrix such that

. This matrix is then to be is usually dense. However,

r
f 'd

fd h h

d f g
S fd h

v nfErvxf h w "

 $ Y#

U T

f d

S ! S

8  &  G' $ 

62 2 7336

A block ILU preconditioner is dened by

in which is some sparse approximation to . Thus, to obtain a block factorization, approximations to the inverses of the blocks must be found. This clearly will lead to difculties if explicit inverses are used. An important particular case is when the diagonal blocks of the original matrix are tridiagonal, while the co-diagonal blocks and are diagonal. Then, a simple recurrence formula for computing the inverse of a tridiagonal matrix can be exploited. Only the tridiagonal part of the inverse must be kept in the recurrence (10.76). Thus,

The following theorem can be shown.

Let

be Symmetric Positive Denite and such that

Then each block computed by the recurrence (10.77), (10.78) is a symmetric In particular, is also a positive denite matrix.

We now show how the inverse of a tridiagonal matrix can be obtained. Let a tridiagonal matrix of dimension be given in the form

and let its Cholesky factorization be

(S

&

 

%&

with

..

..

..

f 'd

f d

 

g  f

The matrices

, and for all . are all (strict) diagonally dominant.

for

f   S a d

f d

3 S

Sa 8 ` E

sDBD% G @ %

"0 

n)|w

SS

where

is the tridiagonal part of

-matrix.

) GI P v0

P v0 ) ) GP P v0

where and are the same as above, and are dened by the recurrence:

is a block-diagonal matrix whose blocks

) GF P v0

 1 c#&) $
%a

s% DBD% G @

8 $ $  H ~ $ H& 8  & 
w
% S !

S f'd  f'd S S

f'd

%S !

S !

f d

S E 8

S ( S

`


S f

8 ~

E 8

and

The inverse of is . Start by observing that the inverse of triangular matrix whose coefcients are given by

See Exercise 12 for a proof of the above equality. As noted, the recurrence formulas for computing can be unstable and lead to numerical difculties for large values of .

A general sparse matrix can often be put in the form (10.74) where the blocking is either natural as provided by the physical problem, or articial when obtained as a result of RCMK ordering and some block partitioning. In such cases, a recurrence such as (10.76) can still be used to obtain a block factorization dened by (10.75). A 2-level preconditioner can be dened by using sparse inverse approximate techniques to approximate . These are sometimes termed implicit-explicit preconditioners, the implicit part referring to the block-factorization and the explicit part to the approximate inverses used to explicitly . approximate

When the original matrix is strongly indenite, i.e., when it has eigenvalues spread on both sides of the imaginary axis, the usual Krylov subspace methods may fail. The Conjugate Gradient approach applied to the normal equations may then become a good alternative. Choosing to use this alternative over the standard methods may involve inspecting the spectrum of a Hessenberg matrix obtained from a small run of an unpreconditioned GMRES algorithm.

' P v)

v " fhHwznu%  nfhtvgP!U

   G y

B H P e Y 9 D! CbQgf SIHcIA8 T 9 e W H



f'd

 f'd

d  Pfd

53 36 42 2

starting with the rst column

. The inverse of

for

becomes

f d

As a result, the -th column simple recurrence,

of

is related to the

-st column

for

..

..

is a unit upper

by the very

 $ & G

p &0 8 1)
G
G

f 'd

' %  $
%

f'd

f d

  w 

} vG ~8 


d   DB  S f  S  S

S


f d

f d

  $ &

8  $ 

f 'd S f 'd

If the normal equations approach is chosen, the question becomes how to precondition the resulting iteration. An ILU preconditioner can be computed for and the preconditioned normal equations,

can be solved. However, when is not diagonally dominant the ILU factorization process may encounter a zero pivot. Even when this does not happen, the resulting preconditioner may be of poor quality. An incomplete factorization routine with pivoting, such as ILUTP, may constitute a good choice. ILUTP can be used to precondition either the original equations or the normal equations shown above. This section explores a few other options available for preconditioning the normal equations.

There are several ways to exploit the relaxation schemes for the Normal Equations seen in Chapter 8 as preconditioners for the CG method applied to either (8.1) or (8.3). Consider (8.3), for example, which requires a procedure delivering an approximation to for any vector . One such procedure is to perform one step of SSOR to solve the system . Denote by the linear operator that transforms into the vector resulting from this procedure, then the usual Conjugate Gradient method applied to (8.3) can be recast in the same form as Algorithm 8.5. This algorithm is known as CGNE/SSOR. Similarly, it is possible to incorporate the SSOR preconditioning in Algorithm 8.4, which is associated with the Normal Equations (8.1), by dening to be the linear transformation that maps a vector into a vector resulting from the forward sweep of Algorithm 8.2 followed by a backward sweep. We will refer to this algorithm as CGNR/SSOR. The CGNE/SSOR and CGNR/SSOR algorithms will not break down if is nonsingular, since then the matrices and are Symmetric Positive Denite, as are the preconditioning matrices . There are several variations to these algorithms. The standard alternatives based on the same formulation (8.1) are either to use the preconditioner on the right, solving the system , or to split the preconditioner into a forward SOR sweep on the left and a backward SOR sweep on the right of the matrix . Similar options can also be written for the Normal Equations (8.3) again with three different ways of preconditioning. Thus, at least six different algorithms can be dened.

The Incomplete Cholesky IC(0) factorization can be used to precondition the Normal Equations (8.1) or (8.3). This approach may seem attractive because of the success of incomplete factorization preconditioners. However, a major problem is that the Incomplete Cholesky factorization is not guaranteed to exist for an arbitrary Symmetric Positive Denite matrix . All the results that guarantee existence rely on some form of diagonal dominance. One of the rst ideas suggested to handle this difculty was to use an Incomplete Cholesky factorization on the shifted matrix . We refer to IC(0) applied to as ICNR(0), and likewise IC(0) applied to

f d a

Ux`

8  w

B XU P Q9 qphS9 3XWdHca Y XU W Y @ i H T f e U e

% f ( 'd a

B Y ACW 9 Ce RcW XUpB P P 9 9 e

f d

 1 c#&) $

`

da

8 $ $  H ~ $ H& 8  & 

`

f d a

 C U

d f

da

f 'd

62 2 7336

4 336 2 2

8 ~

a

x`

220

210

200

iterations

190

180

170

160

150 0.1

0.2

0.3

0.4

0.5 alpha

0.6

0.7

0.8

0.9

Iteration count as a function of the shift .

One issue often debated is how to nd good values for the shift . There is no easy and well-founded solution to this problem for irregularly structured symmetric sparse matrices. One idea is to select the smallest possible that makes the shifted matrix diagonally dominant. However, this shift tends to be too large in general because IC(0) may exist for much smaller values of . Another approach is to determine the smallest for which the IC(0) factorization exists. Unfortunately, this is not a viable alternative. As is often observed, the number of steps required for convergence starts decreasing as increases, and then increases again. An illustration of this is shown in Figure 10.14. This plot suggests that there is an optimal value for which is far from the smallest admissible one. For small , the diagonal dominance of is weak and, as a result, the computed IC factorization is a poor approximation to the matrix . In other words, is close to the original matrix , but the IC(0) factorization is far from . For large , the opposite is true. The matrix has a large deviation from , but its IC(0) factorization may be quite good. Therefore, the general shape of the curve shown in the gure is not too surprising. To implement the algorithm, the matrix need not be formed explicitly. All that is required is to be able to access one row of at a time. This row can be computed, used, and then discarded. In the following, the -th row of is denoted by . The algorithm is row-oriented and all vectors denote row vectors. It is adapted from the ILU(0) factorization of a sparse matrix, i.e., Algorithm 10.4, but it actually computes the factorization instead of an or factorization. The main difference with Algorithm 10.4 is that the loop in line 7 is now restricted to because of symmetry. If only the elements are stored row-wise, then the rows of which are needed in this loop are not directly available. Denote the -th row of by . These rows are accessible by adding a column data structure for the matrix which is updated dynamically. A linked list data structure can be used for this purpose. With this in mind, the IC(0) algorithm will have the following structure.

S

 

as ICNE(0). Shifted variants correspond to applying IC(0) to the shifted

matrix.

 a `

a ` a  ` E

 $ &

  `3 

p &0 8 1)

8  w

Ug

' %  $

 m5 &10   75"0 wh n  

a  % `

} vG ~8 


 

8  w

a `

  $ &

5"0# 

8  $ 

Do:

Note that initially the row in the algorithm is dened as the rst row of . All vectors in the algorithm are row vectors. The step represented by lines 3 and 4, which computes the inner products of row number with all previous rows, needs particular attention. If the inner products

are computed separately, the total cost of the incomplete factorization would be of the order of steps and the algorithm would be of little practical value. However, most of these inner products are equal to zero because of sparsity. This indicates that it may be possible to compute only those nonzero inner products at a much lower cost. Indeed, if is the column of the inner products , then is the product of the rectangular matrix whose rows are by the vector , i.e.,

This is a sparse matrix-by-sparse vector product which was discussed in Section 10.5. It is best performed as a linear combination of the columns of which are sparse. The only difculty with this implementation is that it requires both the row data structure of and of its transpose. A standard way to handle this problem is by building a linked-list data structure for the transpose. There is a similar problem for accessing the transpose of , as mentioned earlier. Therefore, two linked lists are needed: one for the matrix and the other for the matrix. These linked lists avoid the storage of an additional real array for the matrices involved and simplify the process of updating the matrix when new rows are obtained. It is important to note that these linked lists are used only in the preprocessing phase and are discarded once the incomplete factorization terminates.

G) v0 ) ra G R7`

 w

f d

   S  S   S G a 7` BDB% %  m   a %0 `    S  Sw  a   ` G  % G a 7`    ( S i BDB% a `    G i% DBD% H E% G n %a S & %  `   S S &  S S sf DBBp"% H % %

1. Initial step: Set , 2. For Do: 3. Obtain all the nonzero inner products 4. , and 5. Set 6. For and if Do: 7. Extract row 8. Compute 9. For and if 10. Compute 11. EndDo 12. EndDo 13. Set , 14. EndDo

 1 c#&) $

8 $ $  H ~ $ H& 8  & 
S fd S S % DBB% S

S f'd S

f d f S % DBB%

S

%S

ff

ff


f

 SS SS 

8 ~

f 'd

S b

Consider a general sparse matrix and denote its rows by LQ factorization of is dened by

. The (complete)

where is a lower triangular matrix and is unitary, i.e., . The factor in the above factorization is identical with the Cholesky factor of the matrix . Indeed, if where is a lower triangular matrix having positive diagonal elements, then

The uniqueness of the Cholesky factorization with a factor having positive diagonal elements shows that is equal to the Cholesky factor of . This relationship can be exploited to obtain preconditioners for the Normal Equations. Thus, there are two ways to obtain the matrix . The rst is to form the matrix explicitly and use a sparse Cholesky factorization. This requires forming the data structure of the matrix , which may be much denser than . However, reordering techniques can be used to reduce the amount of work required to compute . This approach is known as symmetric squaring. A second approach is to use the Gram-Schmidt process. This idea may seem undesirable at rst because of its poor numerical properties when orthogonalizing a large number of vectors. However, because the rows remain very sparse in the incomplete LQ factorization (to be described shortly), any given row of will be orthogonal typically to most of the previous rows of . As a result, the Gram-Schmidt process is much less prone to numerical difculties. From the data structure point of view, Gram-Schmidt is optimal because it does not require allocating more space than is necessary, as is the case with approaches based on symmetric squaring. Another advantage over symmetric squaring is the simplicity of the orthogonalization process and its strong similarity with the LU factorization. At every step, a given row is combined with previous rows and then normalized. The incomplete Gram-Schmidt procedure is modeled after the following algorithm.

If the algorithm completes, then it will result in the factorization where the rows of and are the rows dened in the algorithm. To dene an incomplete factorization, a dropping strategy similar to those dened for Incomplete LU factorizations must be incorporated. This can be done in very general terms as follows. Let and be the chosen zero patterns for the matrices , and , respectively. The only restriction on is that

1. For Do: 2. Compute , for 3. Compute , and 4. If then Stop; else Compute 5. EndDo

i T P cW Y 5P c! Gf R 9 R f a BE s

%% BDB7% f

 $ &

p &0 8 1)
 

SS ! S S SS f E  B G S & S S  S  d   S S f % DBB% H % G m S a  % S `    S G sDBB% H %

( v

  

a %07`

' %  $

9 eA8hH Y D XC H T f U } vG ~8 

%  " %  !

&

U#

WP

  $ &

 5"0 wh n  

2 2 336

8  $ 


These two sets can be selected in various ways. For example, similar to ILUT, they can be determined dynamically by using a drop strategy based on the magnitude of the elements generated. As before, denotes the -th row of a matrix and its -th entry.

We recognize in line 2 the same practical problem encountered in the previous section for IC(0) for the Normal Equations. It can be handled in the same manner. Therefore, the row structures of , , and are needed, as well as a linked list for the column structure of . After the -th step is performed, the following relation holds:

where is the matrix whose -th row is , and the notation for and is as before. The case where the elements in are not dropped, i.e., the case when is the empty set, is of particular interest. Indeed, in this situation, and we have the exact relation . However, is not unitary in general because elements are dropped from . If at a given step , then (10.81) implies that is a linear combination of the rows , , . Each of these is, inductively, a linear combination of . Therefore, would be a linear combination of the previous rows, which cannot be true if is nonsingular. As a result, the following proposition can be stated.

If is nonsingular and , then the Algorithm 10.14 completes and computes an incomplete LQ factorization , in which is nonsingular and is a lower triangular matrix with positive elements.

) G v0

 BDB% f

where is the row of elements that have been dropped from the row equation translates into

in line 5. The above

) q v0

f d

S %

DBB% f

S S C w  

or

f    S d  f

a %0 `

SS ! S

i % BDB% H % G m

1. For Do: 2. Compute , for 3. Replace by zero if 4. Compute , by zero if 5. Replace each 6. 7. If then Stop; else compute 8. EndDo

a 07` %

% @sDBB% G @
S 

&  u(

% ( %

BBD% H % G 

%D!



C w

G % % % S f    S f'd S  S a %07` a  %S `

S S S E &  S S BDB n }  S S   S  G %BDB%

0 0 2

&

S SS

a % 7`

    5"0 wEn

 "0 h n 

SS

&

As for

, for each row there must be at least one nonzero element, i.e.,

 1 c#&) $

8 $ $  H ~ $ H& 8  & 


8 ~

f d S

DB

A major problem with the decomposition (10.82) is that the matrix is not orthogonal in general. In fact, nothing guarantees that it is even nonsingular unless is not dropped or the dropping strategy is made tight enough. Because the matrix of the complete LQ factorization of is identical with the Cholesky factor of , one might wonder why the IC(0) factorization of does not always exist while the ILQ factorization seems to always exist. In fact, the relationship between ILQ and ICNE, i.e., the Incomplete Cholesky for , can lead to a more rigorous way of choosing a good pattern for ICNE, as is explained next. We turn our attention to Modied Gram-Schmidt. The only difference is that the row is updated immediately after an inner product is computed. The algorithm is described without dropping for for simplicity.

When is nonsingular, the same result as before is obtained if no dropping is used on , namely, that the factorization will exist and be exact in that . Regarding the implementation, if the zero pattern is known in advance, the computation of the inner products in line 4 does not pose a particular problem. Without any dropping in , this algorithm may be too costly in terms of storage. It is interesting to see that this algorithm has a connection with ICNE, the incomplete Cholesky applied to the matrix . The following result is stated without proof.

Then the matrix obtained from Algorithm 10.15 with the zero-pattern set is identical with the factor that would be obtained from the Incomplete Cholesky factorization applied to with the zero-pattern set .
For a proof, see [222]. This result shows how a zero-pattern can be dened which guarantees the existence of an Incomplete Cholesky factorization on .

  % 7` a

%

XYW G

Let be an set which is such that for any holds:

matrix and let , with

. Consider a zero-pattern and , the following

 % ` a n  m U#

SS S

5. Compute . 6. EndDo 7. 8. If then Stop; else Compute 9. EndDo

!&

 

 S a  %S `

4.

Compute

a %0 ` &  U  

, Do:

S  S  S

1. For 2. 3. For

Do:

&  q(

 $ &

p &0 8 1)

% D!

& 0 9 &

' %  $

} vG ~8  0 0 2

  $ &

a 07` %

i% BBD% G m S S sDBB% G H % 

    5"0 wh n

8  $  S S u S E z  S S

" 

 n)|w

1 Assume that is the Symmetric Positive Denite matrix arising from the 5-point nite difference discretization of the Laplacean on a given mesh. We reorder the matrix using the red-black ordering and obtain the reordered matrix

Show the ll-in pattern for the IC(0) factorization for a matrix of size with a mesh.

Show the nodes associated with these ll-ins on the 5-point stencil in the nite difference mesh. Give an approximate count of the total number of ll-ins when the original mesh is square, with the same number of mesh points in each direction. How does this compare with the natural ordering? Any conclusions? tridiagonal nonsingular matrix

2 Consider a

What can be said about its ILU(0) factorization (when it exists)? Suppose that the matrix is permuted (symmetrically, i.e., both rows and columns) using the permutation Show the pattern of the permuted matrix.

Show the locations of the ll-in elements in the ILU(0) factorization. Show the pattern of the ILU(1) factorization as well as the ll-ins generated. Show the level of ll of each element at the end of the ILU(1) process (including the ll-ins). What can be said of the ILU(2) factorization for this permuted matrix?

3 Assume that is the matrix arising from the 5-point nite difference discretization of an elliptic operator on a given mesh. We reorder the original linear system using the red-black ordering and obtain the reordered linear system

Show that this reduced system is also a sparse matrix. Show the stencil associated with the reduced system matrix on the original nite difference mesh and give a graph-theory interpretation of the reduction process. What is the maximum number of nonzero elements in each row of the reduced system.

4 It was stated in Section 10.3.2 that for some specic matrices the ILU(0) factorization of be put in the form in which and are the strict-lower and -upper parts of

can

, respectively.

Show how to obtain a system (called the reduced system) which involves the variable only.

A

e 

We then form the Incomplete Cholesky factorization on this matrix.

 1 c#&) $
associated

8 $ $  H ~ $ H& 8  &  h h

   A 33  

T SP T R 5 9 5

R W

R 7

8 ~
$

twS
T

&   &  

&  

&

&

& '%

&1 & '%

& '%

&1

&32

&1

Characterize these matrices carefully and give an interpretation with respect to their adjacency graphs. Verify that this is true for standard 5-point matrices associated with any domain . Is it true for 9-point matrices? Is it true for the higher level ILU factorizations?

5 Let be a pentadiagonal matrix having diagonals in offset positions . The coefcients in these diagonals are all constants: for the main diagonal and -1 for all others. It is assumed that . Consider the ILU(0) factorization of as given in the form (10.20). The elements of the diagonal are determined by a recurrence of the form (10.19). Show that Show that for . is a decreasing sequence. [Hint: Use induction].

Prove that the formal (innite) sequence dened by the recurrence converges. What is its limit?

6 Consider a matrix which is split in the form , where is a block diagonal matrix whose block-diagonal entries are the same as those of , and where is strictly is strictly upper triangular. In some cases the block form of the ILU(0) lower triangular and factorization can be put in the form (Section 10.3.2):

The block entries of can be dened by a simple matrix recurrence. Find this recurrence relation. The algorithm may be expressed in terms of the block entries the matrix . 7 Generalize the formulas developed at the end of Section 10.6.1 for the inverses of symmetric tridiagonal matrices, to the nonsymmetric case. 8 Develop recurrence relations for Incomplete Cholesky with no ll-in (IC(0)), for 5-point matrices, similar to those seen in Section 10.3.4 for ILU(0). Same question for IC(1). 9 What becomes of the formulas seen in Section 10.3.4 in the case of a 7-point matrix (for threedimensional problems)? In particular, can the ILU(0) factorization be cast in the form (10.20) in is the strict-lower diagonal of and is the strict upper triangular part of , and which is a certain diagonal? 10 Consider an arbitrary matrix which is split in the usual manner as , in which and are the strict-lower and -upper parts of , respectively, and dene, for any diagonal matrix , the approximate factorization of given by

Show how a diagonal can be determined such that and have the same diagonal elements. Find a recurrence relation for the elements of . Consider now the symmetric case and assume that the matrix which is positive can be found. Write in the form

What is the relation between this matrix and the matrix of the SSOR preconditioning, in the particular case when ? Conclude that this form of ILU factorization is in effect an SSOR preconditioning with a different relaxation factor for each equation.

T $P T R 5 9 5

11 Consider a general sparse matrix ization of the form

(irregularly structured). We seek an approximate LU factor-

7
& 

   H  BT 

T T

 

" " $P R " C 9 5  R R $P T R SP T R R  R 

&

T 0 T

T $P T R 5 9 5

T $P T R 5 9 5

 e 

59

 X 

e  ! !

 c"

8 }$

 (c2 " $  

& % & & 32 &1 & % &32 &1

Assume that for we have . Show a sufcient condition under which . Are there cases in which this condition cannot be satised for any ? Assume now that all diagonal elements of . Dene and let are equal to a constant, i.e., for

12 Show the second part of (10.79). [Hint: Exploit the formula are the -th columns of and , respectively]. 13 Let a preconditioning matrix is a matrix of rank .

where

be related to the original matrix

by

, in which

Assume that both and are Symmetric Positive Denite. How many steps at most are required for the preconditioned Conjugate Gradient method to converge when is used as a preconditioner? Answer the same question for the case when RES is used on the preconditioned system. and are nonsymmetric and the full GM-

14 Formulate the problem for nding an approximate inverse to a matrix as a large linear system. What is the Frobenius norm in the space in which you formulate this problem?
 

15 The concept of mask is useful in the global iteration technique. For a sparsity pattern , i.e., a set of pairs and a matrix , we dene the product to be the matrix whose elements are zero if does not belong to , and otherwise. This is called a mask operation since its effect is to ignore every value not in the pattern . Consider a global minimization of . the function What does the result of Proposition 10.3 become for this new objective function? Formulate an algorithm based on a global masked iteration, in which the mask is xed and equal to the pattern of . Formulate an algorithm in which the mask is adapted at each outer step. What criteria would you use to select the mask?

16 Consider the global self preconditioned MR iteration algorithm seen in Section 10.5.5. Dene the acute angle between two matrices as

Following what was done for the (standard) Minimal Residual algorithm seen in Chapter 5, and produced by global MR without establish that the matrices dropping are such that

h

 ' e h R  e

Show a condition on

under which

4 !    A  0    7 R     R      R SP 

T 4 C !

H 0'

5 2 6

@A9&

U e

h

!"

7 %

!"

2 3

0 1

!"

()'%# & $

 

C5 8a

7 8

Establish that if

for

then

. Is it true in general that

for all ?

'

By identifying the diagonal elements of ing the elements of the diagonal matrix

with those of recursively.

for

  0  

 W T    # H

 H

'

in which and is such that

are the strict-lower and -upper parts of

 1 c#&) $

8 $ $  H ~ $ H& 8  & 

'

"


9 9 T

8 ~


T


  

&

& '%

&1 & '%

& '%

& '%

&32 &1

&1

&32

, respectively. It is assumed that

, derive an algorithm for generat-

17 In the two-sided version of approximate inverse preconditioners, the option of minimizing

What is the gradient of

Formulate an algorithm based on minimizing this function globally.

18 Consider the two-sided version of approximate inverse preconditioners, in which a unit lower triangular and an upper triangular are sought so that . One idea is to use an , alternating procedure in which the rst half-step computes a right approximate inverse to which is restricted to be upper triangular, and the second half-step computes a left approximate inverse to , which is restricted to be lower triangular. Consider the rst half-step. Since the candidate matrix is restricted to be upper triangular, special care must be exercised when writing a column-oriented approximate inverse algorithm. What are the differences with the standard MR approach described by Algorithm 10.10? Now consider seeking an upper triangular matrix such that the matrix is close to the identity only in its upper triangular part. A similar approach is to be taken for the second half-step. Formulate an algorithm based on this approach.

19 Write all six variants of the preconditioned Conjugate Gradient algorithm applied to the Normal Equations, mentioned at the end of Section 10.7.1. , in which is the diagonal of and 20 With the standard splitting its lower- and upper triangular parts, respectively, we associate the factored approximate inverse factorization, Determine and show that it consists of second order terms, i.e., terms involving products of at least two matrices from the pair .

Show how the approximate inverse factorization (10.83) can be improved using this new approximation. What is the order of the resulting approximation?

N OTES AND R EFERENCES . A breakthrough paper on preconditioners is the article [149] by Meijerink and van der Vorst who established existence of the incomplete factorization for -matrices and showed that preconditioning the Conjugate Gradient by using an ILU factorization can result in an extremely efcient combination. The idea of preconditioning was exploited in many earlier papers. For example, in [11, 12] Axelsson discusses SSOR iteration, accelerated by either the Conjugate

R

C 85 9

Now use the previous approximation for

 v)

T Y T

9 5

R R  $PR R 9 T R T R

e 9  SP R

 Y 

C 85 9

TX T

SPR R  R

R $P

59

eg

was mentioned, where

is unit lower triangular and

is upper triangular.

e 9  5

in which .

and

are, respectively, the smallest and largest eigenvalues of

9  c5 4 9 c5 4

"

9 4c5

()&'%# $

C 85

C 85

9 c5 4

"

Let now a given step

so that is symmetric for all (see Section 10.5.5). Assume that, at the matrix is positive denite. Show that

p
4

 c"

7 e 8 $  }$  (c2
7

 "

&1 & %

& %

& %

&1 &1 &1

Gradient or Chebyshev acceleration. Incomplete factorizations were also discussed in early papers, for example, by Varga [212] and Buleev [45]. Thus, Meijerink and van der Vorsts paper played an essential role in directing the attention of researchers and practitioners to a rather important topic and marked a turning point. Many of the early techniques were developed for regularly structured matrices. The generalization, using the denition of level of ll for high-order Incomplete LU factorizations for unstructured matrices, was introduced by Watts [223] for petroleum engineering problems. Recent research on iterative techniques has been devoted in great part to the development of better iterative accelerators, while robust preconditioners have by and large been neglected. This is certainly caused by the inherent lack of theory to support such methods. Yet these techniques are vital to the success of iterative methods in real-life applications. A general approach based on modifying a given direct solver by including a drop-off rule was one of the rst in this category [151, 157, 235, 98]. More economical alternatives, akin to ILU( ), were developed later [179, 183, 68, 67, 226, 233]. ILUT and ILUTP, are inexpensive general purpose preconditioners which are fairly robust and efcient. However, many of these preconditioners, including ILUT and ILUTP, can fail. Occasionally, a more accurate ILUT factorization leads to a larger number of steps needed for convergence. One source of failure is the instability of the preconditioning operation. These phenomena of instability have been studied by Elman [81] who proposed a detailed analysis of ILU and MILU preconditioners for model problems. The theoretical analysis on ILUT stated as Theorem 10.3 is modeled after Theorem 1.14 in Axelsson and Barker [16] for ILU(0). Some theory for block preconditioners is discussed in Axelssons book [15]. Different forms of block preconditioners were developed independently by Axelsson, Brinkkemper, and Ilin [17] and by Concus, Golub, and Meurant [61], initially for block matrices arising from PDEs in two dimensions. Later, some generalizations were proposed [137]. Thus, the 2-level implicit-explicit preconditioning introduced in [137] consists of using sparse inverse approximations to for obtaining . The current rebirth of approximate inverse preconditioners [112, 62, 137, 54] is spurred by both parallel processing and robustness considerations. Other preconditioners which are not covered here are those based on domain decomposition techniques. Some of these techniques will be reviewed in Chapter 13. On another front, there is also increased interest in methods that utilize Normal Equations in one way or another. Earlier, ideas revolved around shifting the matrix before applying the IC(0) factorization as was suggested by Kershaw [134] in 1977. Manteuffel [148] also made some suggestions on how to select a good in the context of the CGW algorithm. Currently, new ways of exploiting the relationship with the QR (or LQ) factorization to dene IC(0) more rigorously are being explored; see the recent work in [222].

R SP

 1 c#&) $

8 $ $  H ~ $ H& 8  & 

8 ~

pp


The remaining chapters of this book will examine the impact of high performance computing on the design of iterative methods for solving large linear systems of equations. Because of the increased importance of three-dimensional models combined with the high cost associated with sparse direct methods for solving these problems, iterative techniques are starting to play a major role in many application areas. The main appeal of iterative methods is their low storage requirement. Another advantage is that they are far easier to implement on parallel computers than sparse direct methods because they only require a rather small set of computational kernels. Increasingly, direct solvers are being used in conjunction with iterative solvers to develop robust preconditioners. The rst considerations for high-performance implementations of iterative methods involved implementations on vector computers. These efforts started in the mid 1970s when the rst vector computers appeared. Currently, there is a larger effort to develop new prac-

B | " EXdx

beqDRIvVtrCcGqDurIscqRWfVRDA iEFGBs YIA S pIA t YQs S a CF C X a caA e lAs A Al IAAT ApF9 YF Y e S C F A YF C FCA eI a c RVHy ifF7 ubGfsbGvV DBspRDbGBuBiuB9kRf9Y cRVGt iBp scqBoID`AY R7DRl scuWfVq { teD`AqRBDa B`AqF B@RqryRGt`AqbYter B9IHiu9YSAb scuBRDB r@BiieuWV csq`FY C YsF9 C Y A9 8 CV YFa C A Y c elV A eF9s SA c { lIF I Y wIA S s c XV c CApV IF A ct c {C YsF9a 9 8 CFs lAaIF9IA A F qDDA RrS}D%{A GrpDBBRBp RD r7DA`qRBDe cGDUfSe c BA RhFGBwbbBRBRDBpuB9 9 c9 { e S CV F A YF C F s A Al c 9a ssF lI e A9 8 YI F cQd Da G}R9Y ciDt RBp scqBxIB`AY RPWV DBpRUVY}e DDFGVtCGRRBvfVfa`AB@fqDA RBp G7bA RF scqDB7fd`AeY c vtVCUbGuiBDDuuB9 iuf9sYuIcGy oDRBRUB9bBIscGREFDBDGrIc YIAQ e S X lAtIF9a YVI eF lV A S t CAlIQ A Y Aa e CAIAt lI CAlIQ C eFA c c YF Y c Y cVs p Y XV t YIFplF A9 8e V BiFYteDRBR|VY DA qcBbe HY HuB9ie @qIiD{A cGie cG9kBAGBFqiBGBB@WbSRf9Y cCRGt w i|ElGRBR`FYe vtVCA RTsceGDiDBpBDB{ fSe c BA RhFCGBPeYDEFCusyYtep B7GFHeuQRf9Y F CFlI S X eVs CA AIA9 Fs a Y A C A9 8 C X feB7d GRBb`ABp scqEFD`AY RDA iEFGB|urIGvV BBpRl RVfeBDBVtCGRRDiBWV scY WBEFfC}o{PDfAT AQ cI9a Y A Y C c CFs t cs A A C X A9aF ssFFI cl Y V Y I A ApF9 CA9 8 C a c9 CF CFs e Y YIA S s C X YF clIFa A C BuBEADB@ibfeEARfQsYb`AY GBhaGGDA iEFGB|IvV}uIvV scqF`qDDA RrSc RVife`AqRl WBRbrScGs A Y CF A Y YF Y C C e VI c cjeI YFQ A XV e S e e CFA A C C X Bf9EAGyBf9HqBf9`Ae cDosRfQwue Y buWV scqB7dbiDDA`YuyrGbBrIc ttGF iVhturIc w ifFRRPEAGiu9YBp qcuEFDAY bBrI buIvV scqFb ca RRRurIcCDfABrIiuDwBRPv csqBA cu`ae AssF CF elV A S A Y C c Aa c e Y ssF t A ctIA lIF a YI rIU`Ae cG@qBf9HDA RvtVGA Rb`aGttGF urIcGp WVe RVDBVtDRikD`AqRWfVEaDRQ|ElGFRl c CF YF Y e S T Cs F e A C t C X 9aF CssF C YQs S CAs e C wI e A Y Y A YF C F A eIAs A IF t AT e X c t YQs S a CF qR`FYBf9VBp qcuBoIDAY RPBp scuDGisyBrIcRwurIcSvVabYBFe wurIscqRvVRDA iEFGv

 ) 6  $  ) v#&'# # "! g6 # $ $ " " "

tical iterative methods that are not only efcient in a parallel environment, but also robust. Often, however, these two requirements seem to be in conict. This chapter begins with a short overview of the various ways in which parallelism has been exploited in the past and a description of the current architectural models for existing commercial parallel computers. Then, the basic computations required in Krylov subspace methods will be discussed along with their implementations.

Parallelism has been exploited in a number of different forms since the rst computers were built. The six major forms of parallelism are: (1) multiple functional units; (2) pipelining; (3) vector processing; (4) multiple vector pipelines; (5) multiprocessing; and (6) distributed computing. Next is a brief description of each of these approaches.

This is one of the earliest forms of parallelism. It consists of multiplying the number of functional units such as adders and multipliers. Thus, the control units and the registers are shared by the functional units. The detection of parallelism is done at compilation time with a Dependence Analysis Graph, an example of which is shown in Figure 11.1.
+

b c

* d e

Dependence analysis for arithmetic expression: .

In the example of Figure 11.1, the two multiplications can be performed simultaneously, then the two additions in the middle are performed simultaneously. Finally, the addition at the root is performed.

B Y AW hSrbY P @ T 9 W UP

 $ &
+ * f

p8 8 $

W @ DC bY T @ f H T P

Yc 0c" 

$m| t   3 

62 4 6 7352 I6

` a ` w w 5 Dd

8 ~

The pipelining concept is essentially the same as that of an assembly line used in car manufacturing. Assume that an operation takes stages to complete. Then the operands can be passed through the stages instead of waiting for all stages to be completed for the rst two operands.

stage 1

stage 2

stage 3

stage 4

If each stage takes a time to complete, then an operation with numbers will take . The speed-up would be the ratio of the time to the time complete the stages in a non-pipelined unit versus, i.e., , over the above obtained time,

For large , this would be close to .

Vector computers appeared in the beginning of the 1970s with the CDC Star 100 and then the CRAY-1 and Cyber 205. These are computers which are equipped with vector pipelines, i.e., pipelined functional units, such as a pipelined oating-point adder, or a pipelined oating-point multiplier. In addition, they incorporate vector instructions explicitly as part of their instruction sets. Typical vector instructions are, for example:

To load a vector from memory to a vector register To add the content of two vector registers To multiply the content of two vector registers.

Similar to the case of multiple functional units for scalar machines, vector pipelines can be duplicated to take advantage of any ne grain parallelism available in loops. For example, the Fujitsu and NEC computers tend to obtain a substantial portion of their performance in this fashion. There are many vector operations that can take advantage of multiple vector pipelines.

A multiprocessor system is a computer, or a set of several computers, consisting of several processing elements (PEs), each consisting of a CPU, a memory, an I/O subsystem, etc. These PEs are connected to one another with some communication medium, either a bus or some multistage network. There are numerous possible congurations, some of which will be covered in the next section.


8 W Y @ f U 5P C X Ib@ CbY rcR R H Y P e BP

p
6 6
p

B e U B B H IXpDD! U C XAY e e U

 

8 WP T H P 5AW P Q3 C


6 6

G w

R AW 9 W QBpD U 3 bY T @ f 8 P B H e P

4 352 I6 2 4 6

b H

H 6 H 6 p

w o` a G %o`

  Ycc"  352 I6 2 4 6

 nb'&G


6 6

2 4 6 352 I6

   

Distributed computing is a more general form of multiprocessing, in which the processors are actually computers linked by some Local Area Network. Currently, there are a number of libraries that offer communication mechanisms for exchanging messages between Unix-based systems. The best known of these are the Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI). In heterogeneous networks of computers, the processors are separated by relatively large distances and that has a negative impact on the performance of distributed applications. In fact, this approach is cost-effective only for large applications, in which a high volume of computation can be performed before more data is to be exchanged.

There are currently three leading architecture models. These are:

The shared memory model. SIMD or data parallel models. The distributed memory message passing model.

A brief overview of the characteristics of each of the three groups follows. Emphasis is on the possible effects these characteristics have on the implementations of iterative methods.

A shared memory computer has the processors connected to a large global memory with the same global view, meaning the address space is the same for all processors. One of the main benets of shared memory models is that access to data depends very little on its location in memory. In a shared memory environment, transparent data access facilitates programming to a great extent. From the users point of view, data are stored in a large global memory that is readily accessible to any processor. However, memory conicts as well as the necessity to maintain data coherence can lead to degraded performance. In addition, shared memory computers cannot easily take advantage of data locality in problems which have an intrinsically local nature, as is the case with most discretized PDEs. Some current machines have a physically distributed memory but they are logically shared, i.e., each processor has the same view of the global address space. There are two possible implementations of shared memory machines: (1) bus-based architectures, and (2) switch-based architecture. These two model architectures are illustrated in Figure 11.2 and Figure 11.3, respectively. So far, shared memory computers have been implemented more often with buses than with switching networks.

B e H Y @ f U IIbC X

 $ &

p8 8 $

m| HU !utnSi   3HU 

e U fH XVIhf

Yc 0c" 
R H 9 a I3e CIB

62 6 73 2 I6

8 ~

HIGH SPEED BUS


SHARED MEMORY

A bus-based shared memory computer. M M M M M M

SWITCHING NETWORK

A switch-based shared memory computer.

Buses are the backbone for communication between the different units of most computers. Physically, a bus is nothing but a bundle of wires, made of either ber or copper. These wires carry information consisting of data, control signals, and error correction bits. The speed of a bus, often measured in Megabytes per second and called the bandwidth of the bus, is determined by the number of lines in the bus and the clock rate. Often, the limiting factor for parallel computers based on bus architectures is the bus bandwidth rather than the CPU speed. The primary reason why bus-based multiprocessors are more common than switchbased ones is that the hardware involved in such implementations is simple. On the other hand, the difculty with bus-based machines is that the number of processors which can be connected to the memory will be small in general. Typically, the bus is timeshared, meaning slices of time are allocated to the different clients (processors, IO processors, etc.) that request its use. In a multiprocessor environment, the bus can easily be saturated. Several remedies are possible. The rst, and most common, remedy is to attempt to reduce trafc by adding local memories or caches attached to each processor. Since a data item used by a given processor is likely to be reused by the same processor in the next instructions, storing the data item in local memory will help reduce trafc in general. However, this strategy causes some difculties due to the requirement to maintain data coherence. If Processor (A) reads some data from the shared memory, and Processor (B) modies the same data in shared memory, immediately after, the result is two copies of the same data that have different values. A mechanism should be put in place to ensure that the most recent update of the data is always used. The additional overhead incurred by such memory coherence

 1 c "

8 8 H~

  bc0

D# 

tD# 

 vc"8

operations may well offset the savings involving memory trafc. The main features here are the switching network and the fact that a global memory is shared by all processors through the switch. There can be processors on one side connected to memory units or banks on the other side. Alternative designs based on switches connect processors to each other instead of memory banks. The switching network can be a crossbar switch when the number of processors is small. A crossbar switch is analogous to a telephone switch board and allows inputs to be connected to outputs without conict. Since crossbar switches for large numbers of processors are typically expensive they are replaced by multistage networks. Signals travel across a small number of stages consisting of an array of elementary switches, e.g., or switches. There have been two ways of exploiting multistage networks. In circuit switching networks, the elementary switches are set up by sending electronic signals across all of the switches. The circuit is set up once in much the same way that telephone circuits are switched in a switchboard. Once the switch has been set up, communication between processors is open to the memories

in which represents the desired permutation. This communication will remain functional for as long as it is not reset. Setting up the switch can be costly, but once it is done, communication can be quite fast. In packet switching networks, a packet of data will be given an address token and the switching within the different stages will be determined based on this address token. The elementary switches have to provide for buffering capabilities, since messages may have to be queued at different stages.

The distributed memory model refers to the distributed memory message passing architectures as well as to distributed memory SIMD computers. A typical distributed memory system consists of a large number of identical processors which have their own memories and which are interconnected in a regular topology. Examples are depicted in Figures 11.4 and 11.5. In these diagrams, each processor unit can be viewed actually as a complete processor with its own memory, CPU, I/O subsystem, control unit, etc. These processors are linked to a number of neighboring processors which in turn are linked to other neighboring processors, etc. In Message Passing models there is no global synchronization of the parallel tasks. Instead, computations are data driven because a processor performs a given task only when the operands it requires become available. The programmer must program all the data exchanges explicitly between processors. In SIMD designs, a different approach is used. A host processor stores the program and each slave processor holds different data. The host then broadcasts instructions to processors which execute them simultaneously. One advantage of this approach is that there is no need for large memories in each node to store large programs since the instructions are broadcast one by one to all processors.

Q Q

B H e DCA@ Y

H P a bY A e ` XU Ihf IbC@  CbDB cR 9 e f H R H Y P e Y P

 $ &

p8 8 $

%

% DBD% H %

Yc 0c" 


4 3 2 I6 2 6

 
s

8 ~

%

BDB% f

 p

An eight-processor ring (left) and a processor mesh (right).

multi-

An important advantage of distributed memory computers is their ability to exploit locality of data in order to keep communication costs to a minimum. Thus, a two-dimensional processor grid such as the one depicted in Figure 11.4 is perfectly suitable for solving discretized elliptic Partial Differential Equations (e.g., by assigning each grid point to a corresponding processor) because some iterative methods for solving the resulting linear systems will require only interchange of data between adjacent grid points. A good general purpose multiprocessor must have powerful mapping capabilities because it should be capable of easily emulating many of the common topologies such as 2-D and 3-D grids or linear arrays, FFT-butteries, nite element meshes, etc. Three-dimensional congurations are also popular. A massively parallel commercial computer based on a 3-D mesh, called T3D, is marketed by CRAY Research, Inc. For 2-D and 3-D grids of processors, it is common that processors on each side of the grid are connected to those on the opposite side. When these wrap-around connections are included, the topology is sometimes referred to as a torus. 110 100 10 11 010 011 101 111

00

01

000 " % %

001

The -cubes of dimensions

Hypercubes are highly concurrent multiprocessors based on the binary -cube topology which is well known for its rich interconnection capabilities. A parallel processor based on the -cube topology, called a hypercube hereafter, consists of identical processors, interconnected with neighbors. A -cube can be represented as an ordinary cube

pp
s H

 1 c "
"

8 8 H~
b

  bc0

R b
b

D# 

D# 

 vc"8
b

in three dimensions where the vertices are the nodes of the 3-cube; see Figure 11.5. More generally, one can construct an -cube as follows: First, the nodes are labeled by the binary numbers from to . Then a link between two nodes is drawn if and only if their binary numbers differ by one (and only one) bit. The rst property of an -cube graph is that it can be constructed recursively from lower dimensional cubes. More precisely, consider two identical -cubes whose vertices are labeled likewise from 0 to . By joining every vertex of the rst cube to the vertex of the second having the same number, one obtains an -cube. Indeed, it sufces to renumber the nodes of the rst cube as and those of the second as where is a binary number representing the two similar nodes of the -cubes and where denotes the concatenation of binary numbers. Separating an -cube into the subgraph of all the nodes whose leading bit is 0 and the subgraph of all the nodes whose leading bit is 1, the two subgraphs are such that each node of the rst is connected to one node of the second. If the edges between these two graphs is removed, the result is 2 disjoint -cubes. Moreover, generally, for a given numbering, the graph can be separated into two subgraphs obtained by considering all the bit is 0 and those whose bit is 1. This will be called tearing along the nodes whose direction. Since there are bits, there are directions. One important consequence of this is that arbitrary meshes with dimension can be mapped on hypercubes. However, the hardware cost for building a hypercube is high, because each node becomes difcult to design for larger dimensions. For this reason, recent commercial vendors have tended to prefer simpler solutions based on two- or three-dimensional meshes. Distributed memory computers come in two different designs, namely, SIMD and MIMD. Many of the early projects have adopted the SIMD organization. For example, the historical ILLIAC IV Project of the University of Illinois was a machine based on a mesh topology where all processors execute the same instructions. SIMD distributed processors are sometimes called array processors because of the regular arrays that they constitute. In this category, systolic arrays can be classied as an example of distributed computing. Systolic arrays are distributed memory computers in which each processor is a cell which is programmed (possibly micro-coded) to perform only one of a few operations. All the cells are synchronized and perform the same task. Systolic arrays are designed in VLSI technology and are meant to be used for special purpose applications, primarily in signal processing.

Now consider two prototype Krylov subspace techniques, namely, the preconditioned Conjugate Gradient method for the symmetric case and the preconditioned GMRES algorithm for the nonsymmetric case. For each of these two techniques, we analyze the types of operations that are performed. It should be emphasized that other Krylov subspace techniques require similar operations.

fx`

x`

o`

s H

 $ &

p8 8 $
S Y
H

Yc 0c" 

fo`

f d

s H

m
G

s H

m|   EUd  3HU

8 ~

s H

Consider Algorithm 9.1. The rst step when implementing this algorithm on a highperformance computer is identifying the main operations that it requires. We distinguish ve types of operations, which are:

Preconditioner setup. Matrix vector multiplications. Vector updates. Dot products. Preconditioning operations.

In the above list the potential bottlenecks are (1), setting up the preconditioner and (5), solving linear systems with , i.e., the preconditioning operation. Section 11.6 discusses the implementation of traditional preconditioners, and the last two chapters are devoted to preconditioners that are specialized to parallel environments. Next come the matrixby-vector products which deserve particular attention. The rest of the algorithm consists essentially of dot products and vector updates which do not cause signicant difculties in parallel machines, although inner products can lead to some loss of efciency on certain types of computers with large numbers of processors.

The only new operation here with respect to the Conjugate Gradient method is the orthogonalization of the vector against the previous s. The usual way to accomplish this is via the modied Gram-Schmidt process, which is basically a sequence of subprocesses of the form:

This orthogonalizes a vector against another vector of norm one. Thus, the outer loop of the modied Gram-Schmidt is sequential, but the inner loop, i.e., each subprocess, can be parallelized by dividing the inner product and SAXPY operations among processors. Although this constitutes a perfectly acceptable approach for a small number of processors, the elementary subtasks may be too small to be efcient on a large number of processors. An alternative for this case is to use a standard Gram-Schmidt process with reorthogonalization. This replaces the previous sequential orthogonalization process by a matrix operation of the form , i.e., BLAS-1 kernels are replaced by BLAS-2 kernels. Recall that the next level of BLAS, i.e., level 3 BLAS, exploits blocking in dense matrix operations in order to obtain performance on machines with hierarchical memories. Unfortunately, level 3 BLAS kernels cannot be exploited here because at every step, there is only one vector to orthogonalize against all previous ones. This may be remedied by using block Krylov methods.

Compute Compute

. .

8 C IcXrbY ccXC 3C R H W UP P R W U H e B H f D3e A8

52 2 I6 4 6

6 2 2 I6 6

 $ 

pw vc"8 8 

a % `

!  

These are usually the simplest operations to implement on any computer. In many cases, compilers are capable of recognizing them and invoking the appropriate machine instructions, possibly vector instructions. In the specic case of CG-like algorithms, there are two types of operations: vector updates and dot products. Vector Updates Operations of the form y(1:n) = y(1:n) + a * x(1:n),

where is a scalar and and two vectors, are known as vector updates or SAXPY operations. They are typically straightforward to implement in all three machine models discussed earlier. On an SIMD computer, the above code segment can be used on many of the recent systems and the compiler will translate it into the proper parallel version. The above line of code is written in FORTRAN 90, which is the prototype programming language for this type of computers. On shared memory computers, we can simply write the usual FORTRAN loop, possibly in the above FORTRAN 90 style on some computers, and the compiler will translate it again in the appropriate parallel executable code. On distributed memory computers, some assumptions must be made about the way in which the vectors are distributed. The main assumption is that the vectors and are distributed in the same manner among the processors, meaning the indices of the components of any vector that are mapped to a given processor are the same. In this case, the vector-update operation will be translated into independent vector updates, requiring no communication. Specically, if is the number of variables local to a given processor, this processor will simply execute a vector loop of the form y(1:nloc) = y(1:nloc) + a * x(1:nloc) and all processors will execute a similar operation simultaneously. Dot products A number of operations use all the components of a given vector to compute a single oating-point result which is then needed by all processors. These are termed Reduction Operations and the dot product is the prototype example. A distributed version of the dot-product is needed to compute the inner product of two vectors and that are distributed the same way across the processors. In fact, to be more specic, this distributed dot-product operation should compute the inner product of these two vectors and then make the result available in each processor. Typically, this result is needed to perform vector updates or other operations in each node. For a large number of processors, this sort of operation can be demanding in terms of communication costs. On the other hand, parallel computer designers have become aware of their importance and are starting to provide hardware and software support for performing global reduction operations efciently. Reduction operations that can be useful include global sums, global max/min calculations, etc. A commonly adopted convention provides a single subroutine for all these operations, and passes the type of operation to be performed (add, max, min, multiply,. . . ) as one of the arguments. With this in mind, a distributed dot-product function can be programmed roughly as follows.

 $ &

WXrbI3U XUcY b 2 2 I6 UP Y 9 e H e H 6 p8 Yc m0c" 8 $ 

8 ~

The function DDOT performs the usual BLAS-1 dot product of x and with strides incx and incy, respectively. The REDUCE operation, which is called with add as the operation-type parameter, sums all the variables tloc from each processor and put the in each processor. resulting global sum in the variable

To conclude this section, the following important observation can be made regarding the practical implementation of Krylov subspace accelerators, such as PCG or GMRES. The only operations that involve communication are the dot product, the matrix-by-vector product, and, potentially, the preconditioning operation. There is a mechanism for delegating the last two operations to a calling program, outside of the Krylov accelerator. The result of this is that the Krylov acceleration routine will be free of any matrix data structures as well as communication calls. This makes the Krylov routines portable, except for the possible redenition of the inner product distdot. This mechanism, particular to FORTRAN programming, is known as reverse communication. Whenever a matrix-by-vector product or a preconditioning operation is needed, the subroutine is exited and the calling program unit performs the desired operation. Then the subroutine is called again, after placing the desired result in one of its vector arguments. A typical execution of a exible GMRES routine with reverse communication is shown in the code segment below. The integer parameter icode indicates the type of operation needed by the subroutine. When icode is set to one, then a preconditioning operation must be applied to the vector . The result is copied in and FGMRES is called again. If it is equal to two, then the vector must be multiplied by the matrix . The result is then copied in and FGMRES is called again.

Reverse communication enhances the exibility of the FGMRES routine substantially. For example, when changing preconditioners, we can iterate on a coarse mesh and do the

2$7    c $8 E$2h)72$s% c H)"fE$# 7 r A e W 9  6 w u u s% 7 r A e # A p 6 w 7 7   7$4g"RdR$7 7  #   c $8 97E2E$9hW)2$s4% c (gH)Ex #  7 x  9 7 6 w u u s %  7 9 # A 7Ep6 c gHR`y 7 7   6E`!`F2A2e 7  %  % ' B % x 7 w u &2X%Y$sv% c utgH%!HF`qh&2"%ihhg"fE7$E@# s% s r r% % #  % p 9 e % 9 e 8 # A 7  Ed c a I 7   b@

W XU P QI AW @ f  H QIb 3e Y 9 P f U B e H H

 $7 `WYX&$#EV!TE2 6   AW %  U  T S I    6 3 % 3 % ' % ' %  # 10)4)15)G&EE"R PEE Q I  # EH!G%FE"!DA$9 6  # 3 6  # ' C B # 7  # 9 7 8 7 $@$$1 6 3 % 3 % ' % ' %  #      15)4)210)(&$"!

 $  

2 2 I6 6

pw vc"8 8 
H

necessary interpolations to get the result in in a given step and then iterate on the ne mesh in the following step. This can be done without having to pass any data regarding the matrix or the preconditioner to the FGMRES accelerator. Note that the purpose of reverse communication simply is to avoid passing data structures related to the matrices, to the accelerator FGMRES. The problem is that these data structures are not xed. For example, it may be desirable to use different storage formats for different architectures. A more elegant solution to this problem is Object-Oriented Programming. In Object-Oriented Programming languages such as C++, a class can be declared, e.g., a class of sparse matrices, and operators can be dened on them. Data structures are not passed to these operators. Instead, the implementation will recognize the types of the operands and invoke the proper functions. This is similar to what exists currently for arithmetic. For operation , the compiler will recognize what type of operand is involved and invoke the proper operation, either integer, double real, or complex, etc.

Matrix-by-vector multiplications (sometimes called Matvecs for short) are relatively easy to implement efciently on high performance computers. For a description of storage formats for sparse matrices, see Chapter 3. We will rst discuss matrix-by-vector algorithms without consideration of sparsity. Then we will cover sparse Matvec operations for a few different storage formats.

The computational kernels for performing sparse matrix operations such as matrix-by-vector products are intimately associated with the data structures used. However, there are a few general approaches that are common to different algorithms for matrix-by-vector products which can be described for dense matrices. Two popular ways of performing these operations are (1) the inner product form described in Algorithm 11.1, and (2) the SAXPY form described by Algorithm 11.2.

1. Do i = 1, n 2. y(i) = dotproduct(a(i,1:n),x(1:n)) 3. EndDo

The dot product operation dotproduct(v(1:n),w(1:n)) computes the dot product of the two vectors v and w of length each. If there is no ambiguity on the bounds, we simply write dotproduct(v,w). The above algorithm proceeds by rows. It computes the dot-product of row of the matrix with the vector and assigns the result to . The next algorithm

a`

B H P e Y 9 D! CbQgf IB ISRd #I!9 H W H U H B

0%

 $ &

p8 8 $
00
"

! &    "  9I!   " 5D wEn

Yc 0c"  H

mX# %n ( "

H a cbY

6 6 72 2 I6

8 ~
(

m|

uses columns instead and results in the use of the SAXPY operations.

1. y(1:n) = 0.0 2. Do j = 1, n 3. y(1:n) = y(1:n) + x(j) * a(1:n,j) 4. EndDo

The SAXPY form of the Matvec operation computes the result as a linear combination of the columns of the matrix . A third possibility consists of performing the product by diagonals. This option bears no interest in the dense case, but it is at the basis of many important matrix-by-vector algorithms in the sparse case.

1. y(1:n) = 0 2. Do k = n+1, n 1 3. Do i = 1 min(k,0), n max(k,0) 4. y(i) = y(i) + a(i,k+i)*x(k+i) 5. EndDo 6. EndDo

The product is performed by diagonals, starting from the leftmost diagonal whose offset is to the rightmost diagonal whose offset is .

One of the most general schemes for storing sparse matrices is the Compressed Sparse Row storage format described in Chapter 3. Recall that the data structure consists of three arrays: a real array A(1:nnz) to store the nonzero elements of the matrix row-wise, an integer array JA(1:nnz) to store the column positions of the elements in the real array A, and, nally, a pointer array IA(1:n+1), the -th entry of which points to the beginning of the -th row in the arrays A and JA. To perform the matrix-by-vector product in parallel using this format, note that each component of the resulting vector can be computed independently as the dot product of the -th row of the matrix with the vector . This yields the following sparse version of Algorithm 11.1 for the case where the matrix is stored in CSR format.

1. Do i = 1, n 2. k1 = ia(i) 3. k2 = ia(i+1)-1 4. y(i) = dotproduct(a(k1:k2),x(ja(k1:k2))) 5. EndDo

B QgCU Y 9 f e

! "  9I! &

0%

G

0%

00
 "

R 9 e B cW #I Hca Y

 " %

00

! " 
"

  &  1   

! " rv"  D wh n  

! " 

 V)

4 52 2 I6 6

w w 8 8 " } } p   tD

    D wh n

 wh n
G
w

Line 4 of the above algorithm computes the dot product of the vector with components a(k1), a(k1+1), , a(k2) with the vector with components x(ja(k1)), x(ja(k1+1)), , x(ja(k2)). The fact that the outer loop can be performed in parallel can be exploited on any parallel platform. On some shared-memory machines, the synchronization of this outer loop is inexpensive and the performance of the above program can be excellent. On distributed memory machines, the outer loop can be split in a number of steps to be executed on each processor. Thus, each processor will handle a few rows that are assigned to it. It is common to assign a certain number of rows (often contiguous) to each processor and to also assign the component of each of the vectors similarly. The part of the matrix that is needed is loaded in each processor initially. When performing a matrix-by-vector product, interprocessor communication will be necessary to get the needed components of the vector that do not reside in a given processor. This important case will return in Section 11.5.6.
+ +

Gather DotProduct y(i)

x(*) a(i,*)
+

x(1:n)

Illustration of the row-oriented matrix-byvector multiplication. The indirect addressing involved in the second vector in the dot product is called a gather operation. The vector x(ja(k1:k2)) is rst gathered from memory into a vector of contiguous elements. The dot product is then carried out as a standard dot-product operation between two dense vectors. This is illustrated in Figure 11.6. This example illustrates the use of scientic libraries for performing sparse matrix operations. If the pseudo-code for Algorithm 11.4 is compiled as it is on the Connection Machine, in CM-FORTRAN (Thinking Machines early version of FORTRAN 90), the resulting computations will be executed on the front-end host of the CM-2 or the Control Processor (CP) of the CM-5, rather than on the PEs. This is due to the fact that the code does not involve any vector constructs. The scientic library (CMSSL) provides gather and scatter operations as well as scan add operations which can be exploited to implement this algorithm more efciently as is show in the following code segment: y = 0.0 call sparse util gather ( tmp, x, gather trace, ) tmp = a*tmp call cmf scan add (tmp, tmp, cmf upward, cmf inclusive,

DB

 $ &

p8 8 $ DB

Yc 0c" 

" Dd

8 ~

 5D#   

call sparse util scatter ( y, scatter pointer, tmp, scatter trace, ) The sparse util gather routine is rst called to gather the corresponding entries from the vector into a temporary array , then the multiplications are carried out element-byelement in parallel. The cmf scan add routine from the CM Fortran Utility Library is used to perform the summation for each row. Finally, the call to sparse util scatter copies the results. Segmented scan-adds are particularly useful for implementing sparse matrix-byvector products when they are provided as part of the libraries. Note that the sparse utilgather setup and sparse util scatter setup routines must be called to compute the communication patterns, gather trace and scatter trace, before this algorithm is called. These tend to be expensive operations. Now assume that the matrix is stored by columns (CSC format). The matrix-by-vector product can be performed by the following algorithm which is a sparse version of Algorithm 11.2.

1. y(1:n) = 0.0 2. Do i = 1, n 3. k1 = ia(i) 4. k2 = ia(i + 1)-1 5. y(ja(k1:k2)) = y(ja(k1:k2)) + x(j) * a(k1:k2) 6. EndDo

The above code initializes to zero and then adds the vectors for to it. It can also be used to compute the product of the transpose of a matrix by a vector, when the matrix is stored (row-wise) in the CSR format. Normally, the vector y(ja(k1:k2)) is gathered and the SAXPY operation is performed in vector mode. Then the resulting vector is scattered back into the positions ja(*), by what is called a Scatter operation. This is illustrated in Figure 11.7. A major difculty with the above FORTRAN program is that it is intrinsically sequential. First, the outer loop is not parallelizable as it is, but this may be remedied as will be seen shortly. Second, the inner loop involves writing back results of the right-hand side into memory positions that are determined by the indirect address function ja. To be correct, y(ja(1)) must be copied rst, followed by y(ja(2)), etc. However, if it is known that the mapping ja(i) is one-to-one, then the order of the assignments no longer matters. Since compilers are not capable of deciding whether this is the case, a compiler directive from the user is necessary for the Scatter to be invoked.


a %

` na #

! "

 )

  &  1    

! " 

BB
!

w w 8 8 " } } p

    D wh n

%BDB% G

+ +

Gather

+ x(j)* y(*) a(*,j)

= y(*)

y(1:n)

Illustration of the column-oriented matrix-byvector multiplication. Going back to the outer loop, subsums can be computed (independently) into separate temporary vectors and then these subsums can be added at the completion of all these partial sums to obtain the nal result. For example, an optimized version of the previous algorithm can be implemented as follows:

1. 2. 3. 4. 5. 6. 7. 8. 9.

tmp(1:n,1:p) = 0.0 Do m=1, p Do j = m, n, p k1 = ia(j) k2 = ia(j + 1)-1 tmp(ja(k1:k2),j) = tmp(ja(k1:k2),j) + x(j) * a(k1:k2) EndDo EndDo y(1:n) = SUM(tmp,dim=2)

The SUM across the second dimension at the end of the algorithm constitutes additional work, but it is highly vectorizable and parallelizable.

The above storage schemes are general but they do not exploit any special structure of the matrix. The diagonal storage format was one of the rst data structures used in the context of high performance computing to take advantage of special sparse structures. Often, sparse matrices consist of a small number of diagonals in which case the matrix-by-vector product can be performed by diagonals as in Algorithm 11.3. For sparse matrices, most of the diagonals invoked in the outer loop of Algorithm 11.3 are zero. There are again different variants of Matvec algorithms for the diagonal format, related to different orderings of the loops in the basic FORTRAN program. Recall that the matrix is stored in a rectangular array diag(1:n,1:ndiag) and the offsets of these diagonals from the main diagonal may be stored in a small integer array offset(1:ndiag). Consider a dot-product variant rst.

Y f e Q9 3XU SCXU 8 9 PcR Hca Y W P b Q9 f T 9 W B H Y

! " 

 $ &
Scatter
+ + + +

p8 8 $
 V)

2 %

Yc 0c"   

! "

 Dd

2 2 I6 6

   "D wEn

8 ~

y(1:n)

1. Do i = 1, n 2. tmp = 0.0d0 3. Do j = 1, ndiag 4. tmp = tmp + diag(i,j)*x(i+offset(j)) 5. EndDo 6. y(i) = tmp 7. EndDo

In a second variant, the vector is initialized to zero, and then is multiplied by each of the diagonals and the separate results are added to . The innermost loop in this computation is sometimes called a Triad operation.

1. y = 0.0d0 2. Do j = 1, ndiag 3. joff = offset(j) 4. i1 = max(1, 1-offset(j)) 5. i2 = min(n, n-offset(j)) 6. y(i1:i2) = y(i1:i2) + diag(i1:i2,j)*x(i1+joff:i2+joff) 7. EndDo

Good speeds can be reached on vector machines when the matrix is large enough. One drawback with diagonal storage is that it is not general enough. For general sparse matrices, we can either generalize the diagonal storage scheme or reorder the matrix in order to obtain a diagonal structure. The simplest generalization is the Ellpack-Itpack Format.

The Ellpack-Itpack (or Ellpack) format is of interest only for matrices whose maximum number of nonzeros per row, jmax, is small. The nonzero entries are stored in a real array ae(1:n,1:jmax). Along with this is integer array jae(1:n,1:jmax) which stores the column indices of each corresponding entry in ae. Similar to the diagonal scheme, there are also two basic ways of implementing a matrix-by-vector product when using the Ellpack format. We begin with an analogue of Algorithm 11.7.

1. Do i = 1, n 2. yi = 0 3. Do j = 1, ncol 4. yi = yi + ae(j,i) * x(jae(j,i)) 5. EndDo

Y 9 f e g3XU

! " & '! ((   

! " 9I! &

Y 0E P

! ) &% !

"

 " %

! %   % " 2  tD wh n

9 T  T QH HcbY a

  &  1   

! "


0 %

  rv"  D wh n w w 8 8 " } } p


2 2 I6 6

  D wh n  

6. y(i) = yi 7. EndDo
In data-parallel mode, the above algorithm can be implemented by using a temporary two-dimensional array to store the values , and then performing a pointwise array product of and this two-dimensional array. The result is then summed along the rows

forall ( i=1:n, j=1:ncol ) tmp(i,j) = x(jae(i,j)) y = SUM(ae*tmp, dim=2).


The FORTRAN forall construct performs the operations as controlled by the loop heading, in parallel. Alternatively, use of the temporary array can be avoided by recoding the above lines as follows:

forall (i = 1:n) y(i) = SUM(ae(i,1:ncol)*x(jae(i,1:ncol))) .


The main difference between these loops and the previous ones for the diagonal format is the presence of indirect addressing in the innermost computation. A disadvantage of the Ellpack format is that if the number of nonzero elements per row varies substantially, many zero elements must be stored unnecessarily. Then the scheme becomes inefcient. As an extreme example, if all rows are very sparse except for one of them which is full, then the arrays ae, jae must be full arrays, containing mostly zeros. This is remedied by a variant of the format which is called the jagged diagonal format.

A more general alternative to the diagonal or Ellpack format is the Jagged Diagonal (JAD) format. This can be viewed as a generalization of the Ellpack-Itpack format which removes the assumption on the xed length rows. To build the jagged diagonal structure, start from the CSR data structure and sort the rows of the matrix by decreasing number of nonzero elements. To build the rst jagged diagonal (j-diagonal), extract the rst element from each row of the CSR data structure. The second jagged diagonal consists of the second elements of each row in the CSR data structure. The third, fourth, , jagged diagonals can then be extracted in the same fashion. The lengths of the successive j-diagonals decreases. The number of j-diagonals that can be extracted is equal to the number of nonzero elements of the rst row of the permuted matrix, i.e., to the largest number of nonzero elements per row. To store this data structure, three arrays are needed: a real array DJ to store the values of the jagged diagonals, the associated array JDIAG which stores the column positions of these values, and a pointer array IDIAG which points to the beginning of each j-diagonal in the DJ, JDIAG arrays.

Y 9 f e g3XU SCXU 8 9 PcR IA8 8 9 T 9 W R H

DB

 $ &

p8 8 $

a a  ` % `

Yc 0c" 

H a cbY

2 2 I6 6

8 ~

Consider the following matrix and its sorted version


The rows of have been obtained from those of by sorting them by number of nonzero elements, from the largest to the smallest number. Then the JAD data structure for is as follows: DJ JDIAG IDIAG 3. 6. 1. 9. 1 2 1 3 1 6 11 13 11. 4 4. 2 7. 3 2. 3 10. 4 12. 5 5. 4 8. 5

Thus, there are two j-diagonals of full length (ve) and one of length two. A matrix-by-vector product with this storage scheme can be performed by the following code segment.

1. Do j=1, ndiag 2. k1 = idiag(j) 3. k2 = idiag(j+1) 1 4. len = idiag(j+1) k1 5. y(1:len) = y(1:len) + dj(k1:k2)*x(jdiag(k1:k2)) 6. EndDo Since the rows of the matrix have been permuted, the above code will compute , a permutation of the vector , rather than the desired . It is possible to permute the result back to the original ordering after the execution of the above program. This operation can also be performed until the nal solution has been computed, so that only two permutations on the solution vector are needed, one at the beginning and one at the end. For preconditioning operations, it may be necessary to perform a permutation before or within each call to the preconditioning subroutines. There are many possible variants of the jagged diagonal format. One variant which does not require permuting the rows is described in Exercise 8.

Given a sparse linear system to be solved on a distributed memory environment, it is natural to map pairs of equations-unknowns to the same processor in a certain predetermined way. This mapping can be determined automatically by a graph partitioner or it can be assigned ad hoc from knowledge of the problem. Assume that there is a convenient partitioning of the adjacency graph. Without any loss of generality, the matrix under consideration can be viewed as originating from the discretization of a Partial Differential Equation on a certain domain. This is illustrated in Figure 11.8. Initially, assume that each subgraph (or subdomain, in the PDE literature) is assigned to a different processor, although this restriction

G H G 0GG E E E E E E GE rE E r H E E E Q p"E

B H P Y 9 D! Ce gf IIe 9 H B

`IbC@ B R H Y

 Ce P

Y B PcRd hI!9 U H B

  &  1   

rE EE H
G

G cG G E E E E r E E E p"E E HE EQ G tD#    w w 8 8 " } } p

H a cbY

2 2 I6 6

can be relaxed, i.e., a processor can hold several subgraphs to increase parallelism.

Internal points

Decomposition of physical domain or adjacency graph and the local data structure. A local data structure must be set up in each processor (or subdomain, or subgraph) which will allow the basic operations such as (global) matrix-by-vector products and preconditioning operations to be performed efciently. The only assumption to make regarding the mapping is that if row number is mapped into processor , then so is the unknown , i.e., the matrix is distributed row-wise across the processors according to the distribution of the variables. The graph is assumed to be undirected, i.e., the matrix has a symmetric pattern. It is important to preprocess the data in order to facilitate the implementation of the communication tasks and to gain efciency during the iterative process. The preprocessing requires setting up the following: information in each processor. List of processors with which communication will take place. These are called neighboring processors although they may not be physically nearest neighbors. List of local nodes that are coupled with external nodes. These are the local interface nodes. Local representation of the distributed matrix in each processor.

 $ &
External interface points Internal interface points

p8 8 $

Yc 0c" 



Dd

8 ~



In order to perform a matrix-by-vector product with a distributed sparse matrix, the matrix consisting of rows that are local to a given processor must be multiplied by some global vector . Some components of this vector will be local, and some components must be brought from external processors. These external variables correspond to interface points belonging to adjacent subdomains. When performing a matrix-by-vector product, neighboring processors must exchange values of their adjacent interface nodes.

Internal points (

External interface matrix

The local matrices and data structure associated with each subdomain. Let be the local part of the matrix, i.e., the (rectangular) matrix consisting of all the rows that are mapped to myproc. Call the diagonal block of located in , i.e., the submatrix of whose nonzero elements are such that is a local variable. Similarly, call the offdiagonal block, i.e., the submatrix of whose nonzero elements are such that is not a local variable. To perform a matrix-by-vector product, by the local variables. Then, multiply the external start multiplying the diagonal block variables by the sparse matrix . Notice that since the external interface points are not coupled with local internal points, only the rows to in the matrix will have nonzero elements. Thus, the matrix-by-vector product can be separated into two such operations, one involving only the local variables and the other involving external variables. It is necessary to construct these two matrices and dene a local numbering of the local variables in order to perform the two matrix-by-vector products efciently each time. To perform a global matrix-by-vector product, with the distributed data structure described above, each processor must perform the following operations. First, multiply the local variables by the matrix . Second, obtain the external variables from the neighboring processors in a certain order. Third, multiply these by the matrix and add the resulting vector to the one obtained from the rst multiplication by . Note that the rst and second steps can be done in parallel. With this decomposition, the global matrixby-vector product can be implemented as indicated in Algorithm 11.10 below. In what follows, is a vector of variables that are local to a given processor. The components corresponding to the local interface points (ordered to be the last components in for convenience) are called . The external interface points, listed in a certain order, constitute a vector which is called . The matrix is a sparse matrix which represents the restriction of to the local variables . The matrix operates on the

p t

Local interface points (

Y
( s

p t

S s

vP t

S

  &  1   

p t

p t

p t

( s

w w 8 8 " } } p

tD# 

p t

S

An important observation is that the matrix-by-vector products in lines 4 and 5 can use any convenient data structure that will improve efciency by exploiting knowledge on the local architecture. An example of the implementation of this operation is illustrated next:

call bdxchg(nloc,x,y,nproc,proc,ix,ipr,type,xlen,iout) y(1:nloc) = 0.0 call amux1 (nloc,x,y,aloc,jaloc,ialoc) nrow = nloc nbnd + 1 call amux1(nrow,x,y(nbnd),aloc,jaloc,ialoc(nloc+1)) In the above code segment, bdxchg is the only routine requiring communication. Its purpose is to exchange interface values between nearest neighbor processors. The rst call to amux1 performs the operation , where has been initialized to zero prior to the call. The second call to amux1 performs . Notice that the data for the matrix is simply appended to that of , a standard technique used for storing a succession of sparse matrices. The matrix acts only on the subvector of which starts at location of . The size of the matrix is .

Each step of a preconditioned iterative method requires the solution of a linear system of equations

This section only considers those traditional preconditioners, such as ILU or SOR or SSOR, in which the solution with is the result of solving triangular systems. Since these are commonly used, it is important to explore ways to implement them efciently in a parallel environment. We only consider lower triangular systems of the form

Without loss of generality, it is assumed that

is unit lower triangular.

qrGG

bink

p p t t

 C

m|  fh# xnEtvgP!U bUx " " "

  

p t

p t

p p t t

  

 

bi

p t

p t ( s

1. 2. 3. 4. 5.

Exchange interface data, i.e., Scatter to neighbors and Gather from neighbors Do Local Matvec: Do External Matvec:

external variables to give the correction which must be added to the vector in order to obtain the desired result .

0 0 9I! ! &

 $ &

p8 8 $
! 7 %

! 0 ! %2 &10 7

Yc 0c" 

m
a

Ux`

"

   05D wEn

p t

8 ~

Typically in solving a lower triangular system, the solution is overwritten onto the righthand side on return. In other words, there is one array for both the solution and the right-hand side. Therefore, the forward sweep for solving a lower triangular system with coefcients and right-hand-side is as follows.

1. Do i=2, n 2. For (all j such that al(i,j) is nonzero) Do: 3. x(i) := x(i) al(i,j) * x(j) 4. EndDo 5. EndDo

Assume that the matrix is stored row wise in the general Compressed Sparse Row (CSR) format, except that the diagonal elements (ones) are not stored. Then the above algorithm translates into the following code segment:

1. Do i=2, n 2. Do j=ial(i), ial(i+1) 1 3. x(i)=x(i) al(j) * x(jal(j)) 4. EndDo 5. EndDo The outer loop corresponding to the variable is sequential. The loop is a sparse dot row of and the (dense) vector . This dot product may be split among product of the the processors and the partial results may be added at the end. However, the length of the vector involved in the dot product is typically short. So, this approach is quite inefcient in general. We examine next a few alternative approaches. The regularly structured and the irregularly structured cases are treated separately.

First, consider an example which consists of a 5-point matrix associated with a mesh as represented in Figure 11.10. The lower triangular matrix associated with this mesh is represented in the left side of Figure 11.10. The stencil represented in the right side of Figure 11.10 establishes the data dependence between the unknowns in the lower triangular system solution when considered from the point of view of a grid of unknowns. It tells us that in order to compute the unknown in position , only the two unknowns in positions and are needed . The unknown does not depend on any other variable and can be computed rst. Then the value of can be used to get and simultaneously. Then these two values will in turn enable and to be obtained simultaneously, and so on. Thus, the computation can proceed in wavefronts. The steps for this wavefront algorithm are shown with dashed lines in Figure 11.10. Observe that the

f C g


"

C f

B H P Y D! e 9gf Y P 3W qU & U I!I cbD5P !@cIHc xQb DT E H B 9 H a Y 8 W T R a B T H H

Cf

B H I IH

C ks% f C f f

B 3e 9 R

ff

 $ &

% 

XU W P f B P QDT Se 9 763 2 I6 e T H T 9 2 6 p& $ H& 8    #$"v 8 $ $  8


a 0 ` %

'! $!) 0 ! 2 D5D wh n & % # %   

42 6 532 I6

% 0 `

a % 7`

a %

maximum degree of parallelism (or vector length, in the case of vector processing) that can be reached is the minimum of , , the number of mesh points in the and directions, respectively, for 2-D problems. For 3-D problems, the parallelism is of the order of the maximum size of the sets of domain points , where , a constant level . It is important to note that there is little parallelism or vectorization at the beginning and at the end of the sweep. The degree of parallelism is equal to one initially, and then increases by one for each wave reaching its maximum, and then decreasing back down to one at the end of the sweep. For example, for a grid, the levels (sets of equations that can be solved in parallel) are , , , , , and nally . The rst and last few steps may take a heavy toll on achievable speed-ups.
9 10 11 12

Stencil 1 2 3 4 " Level scheduling for a

grid problem.

The idea of proceeding by levels or wavefronts is a natural one for nite difference matrices on rectangles. Discussed next is the more general case of irregular matrices, a textbook example of scheduling, or topological sorting, and is well known in different forms to computer scientists.

The simple scheme described above can be generalized for irregular grids. The objective of the technique, called level scheduling, is to group the unknowns in subsets so that they can be determined simultaneously. To explain the idea, consider again Algorithm 11.11 for solving a unit lower triangular system. The -th unknown can be determined once all the other ones that participate in equation become available. In the -th step, all unknowns that must be known. To use graph terminology, these unknowns are adjacent to unknown number . Since is lower triangular, the adjacency graph is a directed acyclic graph. The edge in the graph simply indicates that must be known before can be determined. It is possible and quite easy to nd a labeling of the nodes that satisfy the property that if , then task must be executed before task . This is called a topological sorting of the unknowns. The rst step computes and any other unknowns for which there are no predecessors

a %

i7`

&

a 0 ` %

B C A8 e &T @ pC35P XU W P T @cIA! VQb DT a 9 e 9 8 H e e e 8 R H a B T H H

G 0G

% w 07`

% p

&

 $ & E G

w

% %

p8 8 $

&

"

( D %p" %

C C S

Yc 0c" 

&

( %

a`

 p
& (

a `

&

05 Dd

3 2 I6 2 6

8 ~
E

a 0 ` %

in the graph, i.e., all those unknowns for which the offdiagonal elements of row are zero. These unknowns will constitute the elements of the rst level. The next step computes in parallel all those unknowns that will have the nodes of the rst level as their (only) predecessors in the graph. The following steps can be dened similarly: The unknowns that can be determined at step are all those that have as predecessors equations that have been determined in steps . This leads naturally to the denition of a depth for each unknown. The depth of a vertex is dened by performing the following loop for , after initializing to zero for all .

By denition, a level of the graph is the set of nodes with the same depth. A data structure for the levels can be dened: A permutation denes the new ordering and points to the beginning of the -th level in that array. Natural ordering Wavefront ordering

Lower triangular matrix associated with mesh

of Figure 11.10. Once these level sets are found, there are two different ways to proceed. The permutation vector can be used to permute the matrix according to the new order. In the example mentioned in the previous subsection, this means renumbering the variables , , , consecutively, i.e., as . The resulting matrix after the permutation is shown in the right side of Figure 11.11. An alternative is simply to keep the permutation array and use it to identify unknowns that correspond to a given level in the solution. Then the algorithm for solving the triangular systems can be written as follows, assuming that the matrix is stored in the usual row sparse matrix format. "

1. Do lev=1, nlev 2. j1 = level(lev) 3. j2 = level(lev+1) 1 4. Do k = j1, j2 5. i = q(k) 6. Do j= ial(i), ial(i+1) 1

& Q

0   &  0 0 

a 07` %

for all such that

v( E

a 

 $ &

% BBDD" %

p& $ H& 8    8 $ $
#  %
H
%

% a   `



&

a ` &

% i BDB% H % G

'! #$"  75D wh n & % !   

% % w

 D5D#

` a 7 

s% % G % a `

% % % DBBF( D p"

   #$" 

s% BBD% H % G

8 v
&

( %

&

7. x(i) = x(i) al(j) * x(jal(j)) 8. EndDo 9. EndDo 10. EndDo


An important observation here is that the outer loop, which corresponds to a level, performs an operation of the form

where is a submatrix consisting only of the rows of level , and excluding the diagonal elements. This operation can in turn be optimized by using a proper data structure for these submatrices. For example, the JAD data structure can be used. The resulting performance can be quite good. On the other hand, implementation can be quite involved since two embedded data structures are required.

Natural ordering

Level-Scheduling ordering

Lower-triangular matrix associated with a nite element matrix and its level-ordered version.

Consider a nite element matrix obtained from the example shown in Figure 3.1. After an additional level of renement, done in the same way as was described in Chapter 3, the resulting matrix, shown in the left part of Figure 11.12, is of size . In this case, levels are obtained. If the matrix is reordered by levels, the matrix shown in the right side of the gure results. The last level consists of only one element.

 $ &

p8 8 $

u 

Yc 0c" 

75 Dd

8 ~

D#    

 

1 Give a short answer to each of the following questions:

What is the main disadvantage of shared memory computers based on a bus architecture? What is the main factor in yielding the speed-up in pipelined processors? Related to the previous question: What is the main limitation of pipelined processors in regards to their potential for providing high speed-ups?

2 Show that the number of edges in a binary -cube is

3 Show that a binary -cube is identical with a torus which is a mesh with wrap-around connections. Are there hypercubes of any other dimensions that are equivalent topologically to toruses? 4 A Gray code of length is a sequence of -bit binary numbers such that (a) any two successive numbers in the sequence differ by one and only one bit; (b) all -bit binary numbers are represented in the sequence; and (c) and differ by one bit. Find a Gray code sequence of length and show the (closed) path dened by the sequence of nodes of a 3-cube, whose labels are the elements of the Gray code sequence. What type of paths does a Gray code dene in a hypercube? To build a binary reected Gray code, start with the trivial Gray code sequence consisting of the two one-bit numbers 0 and 1. To build a two-bit Gray code, take the same sequence and insert a zero in front of each number, then take the sequence in reverse order and insert a one in front of each number. This gives . The process is repeated until an -bit sequence is generated. Show the binary reected Gray code sequences of length 2, 4, 8, and 16. Prove (by induction) that this process does indeed produce a valid Gray code sequence. Let an -bit Gray code be given and consider the sub-sequence of all elements whose rst bit Gray code sequence? Generalize this to any of bit is constant (e.g., zero). Is this an the -bit positions. Generalize further to any set of bit positions. Use the previous question to nd a strategy to map a mesh into an

-cube.

5 Consider a ring of processors which are characterized by the following communication performance characteristics. Each processor can communicate with its two neighbors simultaneously, i.e., it can send or receive a message while sending or receiving another message. The time for a message of length to be transmitted between two nearest neighbors is of the form

A message of length is broadcast to all processors by sending it from to and then from to , etc., until it reaches all destinations, i.e., until it reaches . How much time does it take for the message to complete this process? Now split the message into packets of equal size and pipeline the data transfer. Typically, each processor will receive packet number from the previous processor, while sending packet it has already received to the next processor. The packets will travel in chain from to , , to . In other words, each processor executes a program that is described roughly as follows:

r
9  R  b  5 R b b

H   H   3H  H e

e " RSP      

R $P A 

R $P

A"

& 

 T 

 5 3 6 Y  g   `AC $ ' 

&

"

 c"

  b  R b  aT

&

8 }$
"

E
 b   & 32 &1 & % & % & & % &1 & 32 &1

 (c2 " $  

There are a few additional conditionals. Assume that the number of packets is equal to . How much time does it take for all packets to reach all processors? How does this compare with the simple method in (a)? 6 (a) Write a short FORTRAN routine (or C function) which sets up the level number of each unknown of an upper triangular matrix. The input matrix is in CSR format and the output should be an array of length containing the level number of each node. (b) What data structure should be used to represent levels? Without writing the code, show how to determine this data structure from the output of your routine. (c) Assuming the data structure of the levels has been determined, write a short FORTRAN routine (or C function) to solve an upper triangular system using the data structure resulting in the previous question. Show clearly which loop should be executed in parallel. 7 In the jagged diagonal format described in Section 11.5.5, it is necessary to preprocess the matrix by sorting its rows by decreasing number of rows. What type of sorting should be used for this purpose? 8 In the jagged diagonal format described in Section 11.5.5, the matrix had to be preprocessed by sorting it by rows of decreasing number of elements.

What is the main reason it is necessary to reorder the rows? Assume that the same process of extracting one element per row is used. At some point the extraction process will come to a stop and the remainder of the matrix can be put into a CSR data structure. Write down a good data structure to store the two pieces of data and a corresponding algorithm for matrix-by-vector products. This scheme is efcient in many situations but can lead to problems if the rst row is very short. Suggest how to remedy the situation by padding with zero elements, as is done for the Ellpack format.

9 Many matrices that arise in PDE applications have a structure that consists of a few diagonals and a small number of nonzero elements scattered irregularly in the matrix. In such cases, it is advantageous to extract the diagonal part and put the rest in a general sparse (e.g., CSR) format. Write a pseudo-code to extract the main diagonals and the sparse part. As input parameter, the number of diagonals desired must be specied.

N OTES AND R EFERENCES . Kai Hwangs book [124] is recommended for an overview of parallel architectures. More general recommended reading on parallel computing are the book by Bertsekas and Tsitsiklis [25] and a more recent volume by Kumar et al. [139]. One characteristic of highperformance architectures is that trends come and go rapidly. A few years ago, it seemed that massive parallelism was synonymous with distributed memory computing, specically of the hypercube type. Currently, many computer vendors are mixing message-passing paradigms with global address space, i.e., shared memory viewpoint. This is illustrated in the recent T3D machine built by CRAY Research. This machine is congured as a three-dimensional torus and allows all three programming paradigms discussed in this chapter, namely, data-parallel, shared memory, and message-passing. It is likely that the T3D will set a certain trend. However, another recent development is the advent of network supercomputing which is motivated by astounding gains both in workstation performance and in high-speed networks. It is possible to solve large problems on clusters of workstations and to

 T

"

d  5 V B Q  q a g Qd t     Y   5 3 V   5 V   V BBB( 2 qY XWhB QB5v   Y   g 5 3 V 2   

 $ &

p8 8 $
"

Yc 0c" 

8 ~

&1 & '%

&32

obtain excellent performance at a fraction of the cost of a massively parallel computer. Regarding parallel algorithms, the survey paper of Ortega and Voigt [156] gives an exhaustive bibliography for research done before 1985 in the general area of solution of Partial Differential Equations on supercomputers. An updated bibliography by Ortega, Voigt, and Romine is available in [99]. See also the survey [178] and the monograph [71]. Until the advent of supercomputing in the mid 1970s, storage schemes for sparse matrices were chosen mostly for convenience as performance was not an issue, in general. The rst paper showing the advantage of diagonal storage schemes in sparse matrix computations is probably [133]. The rst discovery by supercomputer manufacturers of the specicity of sparse matrix computations was the painful realization that without hardware support, vector computers could be inefcient. Indeed, the early CRAY machines did not have hardware instructions for gather and scatter operations but this was soon remedied in the second-generation machines. For a detailed account of the benecial impact of hardware for scatter and gather on vector machines, see [146]. Level scheduling is a textbook example of topological sorting in graph theory and was discussed from this viewpoint in, e.g., [8, 190, 228]. For the special case of nite difference matrices on rectangular domains, the idea was suggested by several authors independently, [208, 209, 111, 186, 10]. In fact, the level scheduling approach described in this chapter is a greedy approach and is unlikely to be optimal. There is no reason why an equation should be solved as soon as it is possible. For example, it may be preferable to use a backward scheduling [7] which consists of dening the levels from bottom up in the graph. Thus, the last level consists of the leaves of the graph, the previous level consists of their predecessors, etc. Dynamic scheduling can also be used as opposed to static scheduling. The main difference is that the level structure is not preset; rather, the order of the computation is determined at run-time. The advantage over pre-scheduled triangular solutions is that it allows processors to always execute a task as soon as its predecessors have been completed, which reduces idle time. On loosely coupled distributed memory machines, this approach may be the most viable since it will adjust dynamically to irregularities in the execution and communication times that can cause a lock-step technique to become inefcient. However, for those shared memory machines in which hardware synchronization is available and inexpensive, dynamic scheduling would have some disadvantages since it requires managing queues and generates explicitly busy waits. Both approaches have been tested and compared in [22, 189] where it was concluded that on the Encore Multimax dynamic scheduling is usually preferable except for problems with few synchronization points and a large degree of parallelism. In [118], a combination of prescheduling and dynamic scheduling was found to be the best approach on a Sequent balance 21000. There seems to have been no comparison of these two approaches on distributed memory machines or on shared memory machines with microtasking or hardware synchronization features. In [22, 24] and [7, 8], a number of experiments are presented to study the performance of level scheduling within the context of preconditioned Conjugate Gradient methods. Experiments on an Alliant FX-8 indicated that a speed-up of around 4 to 5 can be achieved easily. These techniques have also been tested for problems in Computational Fluid Dynamics [214, 216].

 c"

8 }$

 (c2 " $  

As seen in the previous chapter, a limited amount of parallelism can be extracted from the standard preconditioners such as ILU and SSOR. Fortunately, a number of alternative techniques can be developed that are specically targeted at parallel environments. These are preconditioning techniques that would normally not be used on a standard machine, but only for parallel computers. There are at least three such types of techniques discussed in this chapter. The simplest approach is to use a Jacobi or, even better, a block Jacobi approach. In the simplest case, a Jacobi preconditioner may consist of the diagonal or a block-diagonal of . To enhance performance, these preconditioners can themselves be accelerated by polynomial iterations, i.e., a second level of preconditioning called polynomial preconditioning. A different strategy altogether is to enhance parallelism by using graph theory algorithms, such as graph-coloring techniques. These consist of coloring nodes such that two adjacent nodes have different colors. The gist of this approach is that all unknowns associated with the same color can be determined simultaneously in the forward and backward sweeps of the ILU preconditioning operation. Finally, a third strategy uses generalizations of partitioning techniques, which can

s xDbihDi F` WBhEQyV p@3e B IQ WBp@A@EQI 2h@ Hf3T8 EFb3g yCQ hmsCFA@Iz p@hA@qsBh hv@EQeSV wI8 @PIhwEt7@@ hHt2VeVyCTVxQ9h2FGAS EB Etf GTfhiqVdQ C sB2FI d2fsQV Wqp@B RB@ TfBAQ TVf HfvxFG SAFB @9IGEF bDsVTIVEa@ yQ`Sh xQ 2SW@ 9EQt2QV ES 8I W8 eIA@EQ88 z`EQB @wEFPXW8zEBQ IsEFF e8VII d @PIRFsI@VhsByw2t gqWVqTazPI2hzIV 2qytgQsByFF xQTSEChbEQW8WBWtEQ8 @sidPIV oEI@sVEtghh eVxQASyCEGASFxQ WqTfB @ QTff 2fQ f xF2fASxFEFAShsGGEF ytA@RQGEBV yFTfxQEQTSeXEC2QdhfEaEhVUEQ8RBHIeEQ Th QAD~ EQW8VPIEF8W9{I xQASFeduQbt 2fsVW8wEQhAVWsxBESEQhV @I IPfg3Ip 3~2Sw}Y|9sBzT8ECWsBysBPIHhF c@ASEFaqXV ztrASF2Eth2gFIB TaxQHSUzESVV uQgEQq 2qe8EF2hh2IG EQW8e8yTSII yQuQECl YhAS@F xjB f s 2fT8ECeaggQ sBbiuQq wt TSEQVrTIuBsVEt@h ThQgB uTShEQV p@hPIV p@xFTSI @QgEtI F vWVh@ syChp@xQASI 2fG WBWG B A@xQB btEQIS W8 Hs8W8ThU7I RQTaIr qBpsh3V s@pdhPIA@xFhTSgQV @I @I X q @g FyCeVhEt@oxQASn A@eVTCsFmHSiVkHfEFhAVa Fst @iEQW8IB hE8TCFAxVTS2GHGbIWBQ 2fgq@EQ9fWqB @ TfQ f f2xFASEFG ASyQyFe `hEQ@ dq@ u@Pf bhqs A@RVrVhI B@I@ @2fEtuFhTVeVxQyC ytASxQwp@GS 2SWB2QvVruBEQHtW8sVIe8eIThrrbdq8QEQ DEQXQ p@2fPI2EFFghiI @ATSWQgIDcQedFASF2fB xQ uBbHta `e8FsV YBeITSrEXEQQ WVq UWBCgQ TSEQRQ9PI8 HGEF7 E8 DgQCTq B IA@Bs98 B 7 Fr 6   & %0 0 ( & # 5)4321))'%"$"!      

be put in the general framework of domain decomposition approaches. These will be covered in detail in the next chapter. Algorithms are emphasized rather than implementations. There are essentially two types of algorithms, namely, those which can be termed coarse-grain and those which can be termed ne-grain. In coarse-grain algorithms, the parallel tasks are relatively big and may, for example, involve the solution of small linear systems. In ne-grain parallelism, the subtasks can be elementary oating-point operations or consist of a few such operations. As always, the dividing line between the two classes of algorithms is somewhat blurred.

Overlapping block-Jacobi preconditioning consists of a general block-Jacobi approach as described in Chapter 4, in which the sets overlap. Thus, we dene the index sets
a C Y U H bS`XWUVTRSQHIGEB C P F D C C "B

with
fD dR e c

where is the number of blocks. Now use the block-Jacobi method with this particular partitioning, or employ the general framework of additive projection processes of Chapter 5, and use an additive projection method onto the sequence of subspaces
hld k t h h h mmbujiiit c gfSeIs ECbuC rd t d D t a C F D y

Each of the blocks will give rise to a correction of the form

One problem with the above formula is related to the overlapping portions of the variables. The overlapping sections will receive two different corrections in general. According to the denition of additive projection processes seen in Chapter 5, the next iterate can be dened as
| wC p Y h c vC C '

where is the residual vector at the previous iteration. Thus, the corrections for the overlapping regions simply are added together. It is also possible to weigh these contributions before adding them up. This is equivalent to redening (12.1) into
} | p #

in which is a nonnegative diagonal matrix of weights. It is typical to weigh a nonoverlapping contribution by one and an overlapping contribution by where is the number
e C

sn j

t
~ ~  h } p r so

| #

e y U w U e Ax'vuc sC R t r

z wC {x

zx h {yC w

@ A 9 7

e D hx hgw v gb

vC

c sC

vC '

t C xuC r so n p

~ {{   p w7 w p { p ly Ap| o

$#  

t p D c r p | |

t uC r sn po

i D g ph2Y

qVC Y

! " D D

C ec r qn r po

C ec r so n r p

   

7 8 5 6 %

| p #

 

3 4120 ) (

D p Y

&'%

of times the unknown is represented in the partitioning.

The block-Jacobi matrix with overlapping

blocks.

The block-Jacobi iteration is often over- or under-relaxed, using a relaxation parameter . The iteration can be dened in the form
h p Y ' wC c vC

The solution of a sparse linear system is required at each projection step. These systems can be solved by direct methods if the subblocks are small enough. Otherwise, iterative methods may be used. The outer loop accelerator should then be a exible variant, such as FGMRES, which can accommodate variations in the preconditioners.

h p Y

4 wC 5'

2C v 3

wC 0 m1C

'

c sC

Recall that the residual at step


D c r p Y

is then related to that at step

~ { {  | |  p w7 z p |w z
  "#

C C

$#   !  

'

 

c sC

t p D c r p | |

  4 

) "(

# & $"    % # !

c )

 7 t


 

t
'

by

where is a polynomial, typically of low degree. Thus, the original system is replaced by the preconditioned system

which is then solved by a conjugate gradient-type technique. Note that and commute and, as a result, the preconditioned matrix is the same for right or left preconditioning. In addition, the matrix or does not need to be formed explicitly since can be computed for any vector from a sequence of matrix-by-vector products. Initially, this approach was motivated by the good performance of matrix-vector operations on vector computers for long vectors, e.g., the Cyber 205. However, the idea itself is an old one and has been suggested by Stiefel [204] for eigenvalue calculations in the mid 1950s. Next, some of the popular choices for the polynomial are described.
 

The simplest polynomial which has been used is the polynomial of the Neumann series expansion
W XR VU Q T T T

and is a scaling parameter. The above series comes from expanding the inverse of using the splitting
h }

This approach can also be generalized by using a splitting of the form

v '

Thus, setting
D
Q

v h v v c c a} c v e} c v ' }

a b

'

) `D Y

'x

where

can be the diagonal of

or, more appropriately, a block diagonal of . Then,

'

'

'

'

in which
D
Q

un j

 ~

 

I A PH5 GEDC@8(0642)0( F 3 9 ( B A 9 7 ( 5 3 1

t u

 

 

D # |

In polynomial preconditioning the matrix


v


is dened by

@ 7

D xY`z v D ~ { {  | { p p p z7 w p w x U |gt
Q S

 " t

Q R

' '

  

! "

'#$&$" % # !

   9  } 7


& 3

 

& 5 6

'

results in the approximate -term expansion

The matrix operation with the preconditioned matrix can be difcult numerically for large . If the original matrix is Symmetric Positive Denite, then is not symmetric, but it is self-adjoint with respect to the -inner product; see Exercise 1.

The polynomial can be selected to be optimal in some sense, and this leads to the use of Chebyshev polynomials. The criterion that is used makes the preconditioned matrix as close as possible to the identity matrix in some sense. For example, the spectrum of the preconditioned matrix can be made as close as possible to that of the identity. Denoting by the spectrum of , and by the space of polynomials of degree not exceeding , the following may be solved.

Unfortunately, this problem involves all the eigenvalues of and is harder to solve than the original problem. Usually, problem (12.4) is replaced by the problem

which is obtained from replacing the set by some continuous set that encloses it. Thus, a rough idea of the spectrum of the matrix is needed. Consider rst the particular case where is Symmetric Positive Denite, in which case can be taken to be an interval containing the eigenvalues of . A variation of Theorem 6.4 is that for any real scalar such with , the minimum

is reached for the shifted and scaled Chebyshev polynomial of the rst kind,

An4 j

~ 

6 U @59

d b e v t e cp IT T V a h v d b e fT v V t e cp a Q T

G X W !U R Yx Py 2V " T S$ Q

x  } 3 tfP ) x  )0 fP " 2# e ! %$  

c r IPo g GHFD$ g E C B A

Find

which minimizes:

un1 j

~ 

r (o & $ '! h P ) x  )0 e P " %%#  

Find

which minimizes:

I A PH5 G3 F

x l

v l

( CA9@ B 7

Q R VU T T T

tS

Q R V8 T T T

t u

ic r }

 )0I

W XQ

Q R

2 B

Q R

)0 

) t f v D

) '

X Yx p

'

`a

! $&$" # % # !

t l

Since

note that

An j
~  h

~ { {  | |  p w7 z p |w z
Q R V8 T T T

  "# t S

$#   !  
Q R

  4 

c }

')
x

 7 t
c 


  D ' c v

8tA6 l 7

t


X Yx p p c v p }  
h

Y  Y e} x p } x c r p x D p W  c r p Y  }

X c  d x Yx c r p D Xx p Xx c r p } } }  dx D e

D c r p Y

Observe that formulas (12.712.8) can be started at provided we set and , so that . The goal is to obtain an iteration that produces a residual vector of the form where is the polynomial dened by the above recurrence. The difference between two successive residual vectors is given by

~  ~ 

h e

D $5 v X v  "ut  } Yx c p c p

X Yx p } X

c c v p

 x d p
Ac

c d x uefD  }   Xx c r p
D

nu# j un! j

 e
h h h h 2iit

  d D p 

un j
~ 

d Gut c r p  t e D
p

p 
}

h e fD }

X Yx 
t

fD e

X Yx c

r p p

 


D

p  Xx v Xx te ut } c p c v p  } X Yx c v c v  X x p }

 }

d c  p X c   X d
p t

r p e

Xx c r p

t eGD  

D c


with

The three-term recurrence for the Chebyshev polynomials results in the following threeterm recurrences:

d v t2hihh d tefDQ"tc p D c r p   p 

Xx p

6 7 6 t 7
h

p a 

~ { {  | { p p p z7 w p w x U |gt
 " ! "    9 

Of interest is the case where

which gives the polynomial

with

Denote the center and mid-width of the interval

p  h p p p X a  a e 7t 6 d d h t 6 7 6 t 7 6
l 7 t

6 7 p  X d y6 t 7 p a e D 9

Xx p

The identity

Note that the above recurrences can be put together as

Dene

with

and

Using these parameters instead of

and the relations (12.8) yield , the above expressions then become with , respectively, by

 Y} x c rp c
v

Dene

Finally, the following algorithm is obtained.

It can be shown that when and , the resulting preconditioned matrix minimizes the condition number of the preconditioned matrices of the form over all polynomials of degree . However, when used in conjunction with the Conjugate Gradient method, it is observed that the polynomial which minimizes the total number of Conjugate Gradient iterations is far from being the one which minimizes the condition number. If instead of taking and , the interval [ ] is chosen to be slightly inside the interval [ ], a much faster convergence might be achieved. The true optimal parameters, i.e., those that minimize the number of iterations of the polynomial preconditioned Conjugate Gradient method, are difcult to determine in practice. There is a slight disadvantage to the approaches described above. The parameters and , which approximate the smallest and largest eigenvalues of , are usually not available beforehand and must be obtained in some dynamic way. This may be a problem mainly because a software code based on Chebyshev acceleration could become quite complex.

 

G) `7 D c ) 6 D v v  t p p#c r p | z {x x h | | | D p d } c p W p | c p } p #  

Lines 7 and 4 can also be recast into one single update of the form

7t 8A6

G )

c r p Y

p p c r p F E C A &DE B

G )t c ) 6

e 9U

 Y c@

D c

D#c r p

1. 2. 3. For 4. 5. 6. 7. 8. EndDo

; ; until convergence Do:

 

D c r p Y

975301) ( &%3 # "  # % # 8 64 2 ' $ $   !    

x c r p x v c D p c r p Y Y

and note that and recurrence,

. If

, then , . Therefore the relation (12.9) translates into the

An j

 Y e} x p }

~ 

X X v  X Yx c p D h p } p d Xx c v p Xx p Xx p Xx c r p } } } }  

As a result,

Xx c v p Xx p x c v p h }  } }

~ { {  | |  p w7 z p |w z
v v  h D p  p Y d c p c p    Y } x }  | D c rp |
  "#

Y} x p

t p c r p | |

$#   !   }

 t Yx p X p X d


D

  4 

dx  c   D c r p p D c r p Y Y  t p D c r p | | iiit DQ t h h h c eGD  D     |# z D  Y

 7 t


 

   

vx

To remedy this, one may ask whether the values provided by an application of Gershgorins theorem can be used for and . Thus, in the symmetric case, the parameter , which estimates the smallest eigenvalue of , may be nonpositive even when is a positive denite matrix. However, when , the problem of minimizing (12.5) is not well dened, since it does not have a unique solution due to the non strict-convexity of the uniform norm. An alternative uses the -norm on [ ] with respect to some weight function . This least-squares polynomials approach is considered next.

over all polynomials of degree . Call the least-squares iteration polynomial, or simply the least-squares polynomial, and refer to as the leastsquares residual polynomial. A crucial observation is that the least squares polynomial is well dened for arbitrary values of and . Computing the polynomial is not a difcult task when the weight function is suitably chosen. Computation of the least-squares polynomials There are three ways to compute the least-squares polynomial dened in the previous section. The rst approach is to use an explicit formula for , known as the kernel polynomials formula,
}

in which the s represent a sequence of polynomials orthogonal with respect to the weight function . The second approach generates a three-term recurrence satised by the residual polynomials . These polynomials are orthogonal with respect to the weight function . From this three-term recurrence, we can proceed exactly as for the Chebyshev iteration to obtain a recurrence formula for the sequence of approximate solutions . Finally, a third approach solves the Normal Equations associated with the minimization of (12.11), namely,
p | e ` jit t h h h c p 2 e t h h h 5j2iit e d t iH t D  ) x 0()Ht ) x c v p  ) e e t D ) ' } }  }

is any basis of the space of polynomials of degree . These three approaches can all be useful in different situations. For example, the rst approach can be useful for computing least-squares polynomials of low degree explicitly. For high-degree polynomials, the last two approaches are preferable for their better numerv D e ` U Ht ) m1'

where

sn j

un j

~ ~ ~ 

~ ~ I 

)x c p

! y " 

) x c p  8e )

x } ) p #

7t 6

where is some non-negative weight function on ( -norm, the 2-norm induced by this inner product. We seek the polynomial which minimizes
e !  } ) x  )  c v p


). Denote by

and call

un j

~ ~ 

) } )x } )x } )x y T

$ sC C } x i  p $ %D } ) x p # ) x i x i  C C

sC p

V  

Consider the inner product on the space


D

t
I A PH5 F 3 9

7t 8A6

( CB @XI A 9 7

$
5
U

1

~ { {  | { p p p z7 w p w x U |gt
7 6 I I
 "

y  t

! "
A

% $&$" # % # !

   9  p

)x p #

) x &)

)x

)x

)x

ical behavior. The second approach is restricted to the case where , while the third is more general. Since the degrees of the polynomial preconditioners are often low, e.g., not exceeding 5 or 10, we will give some details on the rst formulation. Let , be the orthonormal polynomials with respect to . It is known that the least-squares residual polynomial of degree is determined by the kernel polynomials formula (12.12). To obtain , simply notice that
h h h t i t h h h t e iiVeiit

with
t

This allows to be computed as a linear combination of the polynomials . Thus, we can obtain the desired least-squares polynomials from the sequence of orthogonal polynomials which satisfy a three-term recurrence of the form:

From this, the following recurrence for the s can be derived:


} C  v ) x c C X C 7 } ) x C X } C 6 ) x D } ) x c rsC X c rsC 7 C

The weight function is chosen so that the three-term recurrence of the orthogonal polynomials is known explicitly and/or is easy to generate. An interesting class of weight functions that satisfy this requirement is considered next.
e D `7

For these weight functions, the recurrence relations are known explicitly for the polynomials that are orthogonal with respect to , , or . This allows the use of any of the three methods described in the previous section for computing . Moreover, it has been shown [129] that the preconditioned matrix is Symmetric Positive . Denite when is Symmetric Positive Denite, provided that The following explicit formula for can be derived easily from the explicit expression of the Jacobi polynomials and the fact that is orthogonal with respect to the weight :

Using (12.13), the polynomial for small degrees; see Exercise 4.


v


can be derived easily by hand

An j

~ ~ I

)x c p

  ` e

x p

)x )

 ) x v x ) x ) } ) ) p } )0 e r so D } ) p # p

 

) 2 ) x p # e x D } ) x c p } e} t e t w  sC h 2 )0 D ) r qo t W v  p w p c )

a p

where
t

and

An4 j

~ ~ I

D 6

Choice of the weight functions This section assumes that the Jacobi weights

and

. Consider

nuH1 j unH j

~ ~ I

~  ~

)xCX

h h h h iit

D et } ) x C  w

h h h h 2iit

d f$et x C  t t e D w

 6

d t e D t ) x c v i 7 w C C

~ { {  | |  p w7 z p |w z
}   "# } } }

#"F

} )xC

)x

) x ) } ) x

) ) x i } C x C   sC p X xC   } ) )x p #

$#   !   } } }

)x p #

) x i $6 ) x D C C

sC p

x  C $

) e x c v ) D } ) x

  4 

)x c p

)xCX

) x c s c rs7 rC C

)x c p )x p #

 7 t



  c

) x )

C 

As an illustration, we list the least-squares polynomials for , , , obtained for the Jacobi weights with and . The polynomials listed are for the interval as this leads to integer coefcients. For a general interval , the best polynomial of degree is . Also, each polynomial is rescaled by to simplify the expressions. However, this scaling factor is unimportant if these polynomials are used for preconditioning.

Theoretical considerations An interesting theoretical question is whether the leastsquares residual polynomial becomes small in some sense as its degree increases. Consider rst the case . Since the residual polynomial minimizes the norm associated with the weight , over all polynomials of degree such that , the polynomial with satises

where is the -norm of the function unity on the interval . The norm of will tend to zero geometrically as tends to innity, provided . Consider now the case , and the Jacobi weight (12.15). For this choice of the weight function, the least-squares residual polynomial is known to be where is the degree Jacobi polynomial associated with the weight function . It can be shown that the 2-norm of such a residual polynomial with respect to this weight is given by
} D ) x 9 8 } x p I } ) x p y y p

Therefore, the -norm of the least-squares residual polynomial converges to zero like as the degree increases (a much slower rate than when ). However, note that the condition implies that the polynomial must be large in some interval around the
ue

} e

in which

is the Gamma function. For the case

and

, this becomes

!  #

e fD

x #

t t x t x

6 7Q e D 7 p6 D q 6 l7 t 6 ! ! 4 t z 2 6 t 7 2 2 3 2 2 2 2 e 2 U !  p #   6 7 D 2  45 z 2 U 2 2 ) 2 2 2 p 22 2 p 2 2 p d 7 t 6x D }

e D x y } q 6 8 t x e t dx t dx d } D } c D @ !  x p  p b y y x } C l } B 5 GD D c B e t t t x x e t t t dx } D A!  x p  p b y y B } @ } t x e t x

We selected and currence for the polynomials


D p

only because these choices lead to a very simple re, which are the Chebyshev polynomials of the rst kind.

1 7 27 77 182 378 714 1254

1 9 44 156 450 1122 2508 c

1 11 65 275 935 2717

1 13 90 442 1729

1 15 119 665

1 17 152

1 19

$ 

# 

" 

! 



 

 

) x #e x pj} } 7 ) @16 0 )

1 5 14 30 55 91 140 204 285 c

h h ih

# U

 

e } D

t x  l 7 t 8

fQ D

 # % # ! ~ { {  | { p p p z7 w p w x U |gt
D  "

7  x p )

! "

   9 


l  t

& '%

% %

"% !%

$ #%

x } ) e )

p y

origin.

Given a set of approximate eigenvalues of a nonsymmetric matrix , a simple region can be constructed in the complex plane, e.g., a disk, an ellipse, or a polygon, which encloses the spectrum of the matrix . There are several choices for . The rst idea uses an ellipse that encloses an approximate convex hull of the spectrum. Consider an ellipse centered at , and with focal distance . Then as seen in Chapter 6, the shifted and scaled Chebyshev polynomials dened by

are nearly optimal. The use of these polynomials leads again to an attractive three-term recurrence and to an algorithm similar to Algorithm 12.1. In fact, the recurrence is identical, except that the scalars involved can now be complex to accommodate cases where the ellipse has foci not necessarily located on the real axis. However, when is real, then the symmetry of the foci with respect to the real axis can be exploited. The algorithm can still be written in real arithmetic. An alternative to Chebyshev polynomials over ellipses employs a polygon that contains . Polygonal regions may better represent the shape of an arbitrary spectrum. The best polynomial for the innity norm is not known explicitly but it may be computed by an algorithm known in approximation theory as the Remez algorithm. It may be simpler to use an -norm instead of the innity norm, i.e., to solve (12.11) where is some weight function dened on the boundary of the polygon . Now here is a sketch of an algorithm based on this approach. We use an -norm associated with Chebyshev weights on the edges of the polygon. If the contour of consists of edges each with center and half-length , then the weight on each edge is dened by

Using the power basis to express the best polynomial is not a safe practice. It is preferable to use the Chebyshev polynomials associated with the ellipse of smallest area containing . With the above weights or any other Jacobi weights on the edges, there is a nite procedure which does not require numerical integration to compute the best polynomial. To do this, each of the polynomials of the basis (namely, the Chebyshev polynomials associated with the ellipse of smallest area containing ) must be expressed as a linear combination of the Chebyshev polynomials associated with the different intervals . This redundancy allows exact expressions for the integrals involved in computing the leastsquares solution to (12.11). Next, the main lines of a preconditioned GMRES algorithm are described based on least-squares polynomials. Eigenvalue estimates are obtained from a GMRES step at the beginning of the outer loop. This GMRES adaptive corrects the current solution and the eigenvalue estimates are used to update the current polygon . Correcting the solution at this stage is particularly important since it often results in a few orders of magnitude

An! j

~ ~ I

lC T

t C t C

$C

h t h h h t e D jiiif"w t

5 @

F )

~ { {  | |  p w7 z p |w z
5
  "#

2 F 0 p a @ 2 Fv 0 p a # @

vP ) x C P C d c } C

$#   !  

CB 3

ID(

()0

)x p

  4  C

# $% $" # !

)xC

 7 t


  }

improvement. This is because the polygon may be inaccurate and the residual vector is dominated by components in one or two eigenvectors. The GMRES step will immediately annihilate those dominating components. In addition, the eigenvalues associated with these components will now be accurately represented by eigenvalues of the Hessenberg matrix.
@ "7

Table 12.1 shows the results of applying GMRES(20) with polynomial preconditioning to the rst four test problems described in Section 3.7. Matrix F2DA F3D ORS F2DB Iters 56 22 78 100 Kops 2774 7203 4454 4432 Residual 0.22E-05 0.18E-05 0.16E-05 0.47E-05 Error 0.51E-06 0.22E-05 0.32E-08 0.19E-05

A test run of ILU(0)-GMRES accelerated with polynomial preconditioning. See Example 6.1 for the meaning of the column headers in the table. In fact, the system is preconditioned by ILU(0) before polynomial preconditioning is applied to it. Degree 10 polynomials (maximum) are used. The tolerance for stopping is . Recall that Iters is the number of matrix-by-vector products rather than the number of GMRES iterations. Notice that, for most cases, the method does not compare well with the simpler ILU(0) example seen in Chapter 10. The notable exception is example F2DB for which the method converges fairly fast in contrast with the simple ILU(0)-GMRES; see Example 10.2. An attempt to use the method for the fth matrix in the test set, namely, the FIDAP matrix FID, failed because the matrix has eigenvalues on both sides of the imaginary axis and the code tested does not handle this situation.

x p

 

| #

} z

x p  D )Y

t |

t |

#%

Y% #

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.

Start or Restart: Compute current residual vector . Adaptive GMRES step: Run steps of GMRES for solving . Update by . Get eigenvalue estimates from the eigenvalues of the Hessenberg matrix. Compute new polynomial: Rene from previous hull and new eigenvalue estimates. Get new best polynomial . Polynomial Iteration: Compute the current residual vector . Run steps of GMRES applied to . Update by . Test for convergence. If solution converged then Stop; else GoTo 1.

 4 4 6 $ 8 7652 98 && )

~ { {  | { p p p z7 w p w x U |gt
 " 6

' 0 4 8  96 6  % # 6 '

! "

   9 

  

vr 3

It is interesting to follow the progress of the algorithm in the above examples. For the rst example, the coordinates of the vertices of the upper part of the rst polygon are

This hull is computed from the 20 eigenvalues of the Hessenberg matrix resulting from the rst run of GMRES(20). In the ensuing GMRES loop, the outer iteration converges in three steps, each using a polynomial of degree 10, i.e., there is no further adaptation required. For the second problem, the method converges in the 20 rst steps of GMRES, so polynomial acceleration was never invoked. For the third example, the initial convex hull found is the interval of the real line. The polynomial preconditioned GMRES then convergences in ve iterations. Finally, the initial convex hull found for the last example is

0.17131 0.39337 1.43826

0.00000 0.10758 0.00000

and the outer loop converges again without another adaptation step, this time in seven steps.

The general idea of multicoloring, or graph coloring, has been used for a long time by numerical analysts. It was exploited, in particular, in the context of relaxation techniques both for understanding their theory and for deriving efcient algorithms. More recently, these techniques were found to be useful in improving parallelism in iterative solution techniques. This discussion begins with the 2-color case, called red-black ordering.

The problem addressed by multicoloring is to determine a coloring of the nodes of the adjacency graph of a matrix such that any two adjacent nodes have different colors. For a 2-dimensional nite difference grid (5-point operator), this can be achieved with two

 

( F

~ { {  | |  p w7 z p |w z
0.06492 0.17641 0.29340 0.62858 1.18052 0.00000 0.02035 0.03545 0.04977 0.00000

  "#
) H

 

C 3x

C 3x

9 S

  d

$#   !  

 

e h it e  h

AD

C 3x

C 3x

 

) $

  4 
# D $" # !

  i `cu

 7 t


  & 5 &

colors, typically referred to as red and black. This red-black coloring is illustrated in Figure 12.2 for a mesh where the black nodes are represented by lled circles.

19

20

21

22

23

24

13

14

15

16

17

18

10

11

12

Assume that the unknowns are labeled by listing the red unknowns rst together, followed by the black ones. The new labeling of the unknowns is shown in Figure 12.3.
22 10 23 11 24 12

19

20

16

17

13

14

Since the red nodes are not coupled with other red nodes and, similarly, the black nodes are not coupled with other black nodes, the system that results from this reordering will have the structure

in which and are diagonal matrices. The reordered matrix associated with this new labeling is shown in Figure 12.4. Two issues will be explored regarding red-black ordering. The rst is how to exploit this structure for solving linear systems. The second is how to generalize this approach for systems whose graphs are not necessarily 2-colorable.

un# j

~ ~ 

 5

c z

Red-black coloring of a labeling of the nodes.

Red-black coloring of a beling of the nodes.

grid. Natural la-

21

18

15

grid. Red-black

tU

| c

Y% #

Y% #

 f

{ | { |} f`Apgp w7pk

!  3  !  3 

Matrix associated with the red-black reordering

of Figure 12.3.

The easiest way to exploit the red-black ordering is to use the standard SSOR or ILU(0) preconditioners for solving the block system (12.18) which is derived from the original system. The resulting preconditioning operations are highly parallel. For example, the linear system that arises from the forward solve in SSOR will have the form
h

This system can be solved by performing the following sequence of operations:

1. Solve . 2. Compute . 3. Solve . This consists of two diagonal scalings (operations 1 and 3) and a sparse matrix-byvector product. Therefore, the degree of parallelism, is at least if an atomic task is considered to be any arithmetic operation. The situation is identical with the ILU(0) preconditioning. However, since the matrix has been reordered before ILU(0) is applied to it, the resulting LU factors are not related in any simple way to those associated with the original matrix. In fact, a simple look at the structure of the ILU factors reveals that many more elements are dropped with the red-black ordering than with the natural ordering. The result is that the number of iterations to achieve convergence can be much higher with red-black ordering than with the natural ordering. A second method that has been used in connection with the red-black ordering solves the reduced system which involves only the black unknowns. Eliminating the red unknowns from (12.18) results in the reduced system:

d Ei

) 

h c z

I S

~ { {  | |  p w7 z p |w z
c vc

  "# c z

5 AD

5 z D | } v c 5 x c

H

$#   !  

 

|

) $

c |

9 F

c '5 |

  4 
A "I 9

z 8|2 D z` D z  ` c z #Ic D c |

 & $"   % # !

! # &D $" # !

 7 t


 

UU

Note that this new system is again a sparse linear system with about half as many unknowns. In addition, it has been observed that for easy problems, the reduced system can often be solved efciently with only diagonal preconditioning. The computation of the reduced system is a highly parallel and inexpensive process. Note that it is not necessary to form the reduced system. This strategy is more often employed when is not diagonal, such as in domain decomposition methods, but it can also have some uses in other situations. For example, applying the matrix to a given vector can be performed using nearest-neighbor communication, and this can be more efcient than the standard approach . In addition, this of multiplying the vector by the Schur complement matrix can save storage, which may be more critical in some cases.
c

Chapter 3 discussed a general greedy approach for multicoloring a graph. Given a general sparse matrix , this inexpensive technique allows us to reorder it into a block form where the diagonal blocks are diagonal matrices. The number of blocks is the number of colors. For example, for six colors, a matrix would result with the structure shown in Figure 12.5 where the s are diagonal and , are general sparse. This structure is obviously a generalization of the red-black ordering.

F


A six-color ordering of a general sparse matrix.

Just as for the red-black ordering, ILU(0), SOR, or SSOR preconditioning can be used on this reordered system. The parallelism of SOR/SSOR is now of order where is the number of colors. A loss in efciency may occur since the number of iterations is likely to increase. A Gauss-Seidel sweep will essentially consist of scalings and matrix-by-vector products, where is the number of colors. Specically, assume that the matrix is stored in the well known Ellpack-Itpack format and that the block structure of the permuted matrix is dened by a pointer array . The index is the index of the rst row in the -th block. Thus, the pair represents the sparse matrix consisting of the rows to in the Ellpack-Itpack format. The main diagonal of is assumed to
y H y Ei

U
I )

F 0

e #y

vc

3 X)

5 "

5

A H5

) 0(

t Hx Y

) 4

d i eVi x X qw  y



( F

A9

d i Vi x e X qw  y

CA F

% #

 f

1 G3

{ | { |} f`Apgp w7pk
d i
!  3 
y
% # &D $" # !

e Vi

1. Do col = 1, ncol 2. n1 = iptr(col) 3. n2 = iptr(col+1) 1 4. y(n1:n2) = rhs(n1:n2) 5. Do j = 1, ndiag 6. Do i = n1, n2 7. y(i) = y(i) a(i,j)*y(ja(i,j)) 8. EndDo 9. EndDo 10. y(n1:n2) = diag(n1:n2) * y(n1:n2) 11. EndDo
R 3 i 

In the above algorithm, is the number of colors. The integers and set in lines 2 and 3 represent the beginning and the end of block . In line 10, is multiplied by the diagonal which is kept in inverted form in the array . The outer loop, i.e., the loop starting in line 1, is sequential. The loop starting in line 6 is vectorizable/parallelizable. There is additional parallelism which can be extracted in the combination of the two loops starting in lines 5 and 6.
}

The discussion in this section begins with the Gaussian elimination algorithm for a general sparse linear system. Parallelism in sparse Gaussian elimination can be obtained by nding unknowns that are independent at a given stage of the elimination, i.e., unknowns that do not depend on each other according to the binary relation dened by the graph of the matrix. A set of unknowns of a linear system which are independent is called an independent set. Thus, independent set orderings can be viewed as permutations to put the original matrix in the form

in which is diagonal, but can be arbitrary. This amounts to a less restrictive form of multicoloring, in which a set of vertices in the adjacency graph is found so that no equation in the set involves unknowns from the same set. A few algorithms for nding independent set orderings of a general sparse graph were discussed in Chapter 3. The rows associated with an independent set can be used as pivots simultaneously. When such rows are eliminated, a smaller linear system results, which is again sparse. Then we can nd an independent set for this reduced system and repeat the process of

An j

~ ~ I

d i

2 0 ) ' ' 7  2 8 4 &%G 6  $ 0    @

4 w

d i

e Vi x 

e Vi

4 w

be stored separately in inverted form in a one-dimensional array the multicolor SOR iteration will then take the following form.

~ { {  | |  p w7 z p |w z
  "# R 3  $#   !  

a5

9 ) 6 9&$ 54 '2 ' 6

@ 8

u hrw i u u

& d

  4 

 %

Y # vx  
'

 7 t


  & u 7 0

 

&

. One single step of

reduction. The resulting second reduced system is called the second-level reduced system. The process can be repeated recursively a few times. As the level of the reduction increases, the reduced systems gradually lose their sparsity. A direct solution method would continue the reduction until the reduced system is small enough or dense enough to switch to a dense Gaussian elimination to solve it. This process is illustrated in Figure 12.6. There exists a number of sparse direct solution techniques based on this approach.

Illustration of two levels of multi-elimination for sparse linear systems. After a brief review of the direct solution method based on independent set orderings, we will explain how to exploit this approach for deriving incomplete LU factorizations by incorporating drop tolerance strategies.

We start by a discussion of an exact reduction step. Let be the matrix obtained at the -th step of the reduction, with . Assume that an independent set ordering is applied to and that the matrix is permuted accordingly as follows:

where is a diagonal matrix. Now eliminate the unknowns of the independent set to get the next reduced matrix,

This results, implicitly, in a block LU factorization

with dened above. Thus, in order to solve a system with the matrix , both a forward and a backward substitution need to be performed with the block matrices on the right-hand side of the above system. The backward solution involves solving a system with the matrix . This block factorization approach can be used recursively until a system results that is small enough to be solved with a standard method. The transformations used in the elimination process, i.e., the matrices and the matrices must be saved. The permutation

un j

sn j

&U
~ ~   ~

c r) )

9 F C

) v) ) 5 D c ) a ) ) ) c v ) ) 5 #) D c r ) a

) a

( G3 F F

) 5 )

A 2)

CA F

1G3

i t h h h dRiiit

w)

2 )

v)

T) 2

#D$" # !

}|{ p w x { o w7pk {7 { | { |}
) 5 XH D ) 5 )
 #

!  P% # 3 

 

w)

2 )

 e

c r

) 2

c r

matrices can also be saved. Alternatively, the matrices involved in the factorization at each new reordering step can be permuted explicitly.

The successive reduction steps described above will give rise to matrices that become more and more dense due to the ll-ins introduced by the elimination process. In iterative methods, a common cure for this is to neglect some of the ll-ins introduced by using a simple dropping strategy as the reduced systems are formed. For example, any ll-in element introduced is dropped, whenever its size is less than a given tolerance times the 2-norm of the original row. Thus, an approximate version of the successive reduction steps can be to for any given . This can be used used to provide an approximate solution to precondition the original linear system. Conceptually, the modication leading to an incomplete factorization replaces (12.21) by

in which is the matrix of the elements that are dropped in this reduction step. Globally, the algorithm can be viewed as a form of incomplete block LU with permutations. Thus, there is a succession of block ILU factorizations of the form

with dened by (12.22). An independent set ordering for the new matrix will then be found and this matrix is reduced again in the same manner. It is not necessary to save the successive matrices, but only the last one that is generated. We need also to save the sequence of sparse matrices

which contain the transformation needed at level of the reduction. The successive permutation matrices can be discarded if they are applied to the previous matrices as soon as these permutation matrices are known. Then only the global permutation is needed, which is the product of all these successive permutations. An illustration of the matrices obtained after three reduction steps is shown in Figure 12.7. The original matrix is a 5-point matrix associated with a grid and is therefore of size . Here, the successive matrices (with permutations applied) are shown together with the last matrix which occupies the location of the block in (12.23).
C

An j

An j

~ S I

I ~

c r

) 1

) #

e X e

) # } ) v ) ) 5 ) x #c r ) D c a

~ { {  | |  p w7 z p |w z

  "# c r

) )

3 41

$#   !  

v)

A F

) 5

! &#D$!" #

) ) ) ) 5 ) a )

D c r

v)

  4 

) 5

) 

) 

 7 t
) w'2 ) 2

 

u) 2

d d D

)#

) 2

c r

Illustration of the processed matrices obtained from three steps of independent set ordering and reductions. We refer to this incomplete factorization as ILUM (ILU with Multi-Elimination). The preprocessing phase consists of a succession of applications of the following three steps: (1) nding the independent set ordering, (2) permuting the matrix, and (3) reducing it.


1. Set . Do: 2. For 3. Find an independent set ordering permutation for ; 4. Apply to to permute it into the form (12.20); 5. Apply to ; 6. Apply to ; 7. Compute the matrices and dened by (12.22) and (12.23). 8. EndDo

In the backward and forward solution phases, the last reduced system must be solved but not necessarily with high accuracy. For example, we can solve it according to the level of tolerance allowed in the dropping strategy during the preprocessing phase. Observe that if the linear system is solved inaccurately, only an accelerator that allows variations in the preconditioning should be used. Such algorithms have been discussed in Chapter 9. Alternatively, we can use a xed number of multicolor SOR or SSOR steps or a xed polynomial iteration. The implementation of the ILUM preconditioner corresponding to

# 

  6 ! 0

) 2

R {i

 8 4 ! ! &$

c r

6 ) ) 6

c r ) v c ) 2 iit  2 t h h h ) iiit c t h h h

}|{ p w x { o w7pk {7 { | { |}
e

 #


) 2 ) ) 2   u  % # vr 3

R i t h h h t e t {2ii

& d

!  % # 3 

 

) 2

 e

9H D

this strategy is rather complicated and involves several parameters. In order to describe the forward and backward solution, we introduce some notation. We start by applying the global permutation, i.e., the product

according to the partitioning (12.20). The forward step consists of transforming the second component of the right-hand side as
h

Now is partitioned in the same manner as and the forward elimination is continued the same way. Thus, at each step, each is partitioned as
c |

A forward elimination step denes the new using the old and for while a backward step denes using the old and , for . Algorithm 12.5 describes the general structure of the forward and backward solution sweeps. Because the global permutation was applied at the beginning, the successive permutations need not be applied. However, the nal result obtained must be permuted back into the original ordering.
D D H H

EndDo Solve with a relative tolerance : . For Do: [Backward sweep] . EndDo Permute the resulting solution vector back to the original ordering to obtain the solution .
|

Computer implementations of ILUM can be rather tedious. The implementation issues are similar to those of parallel direct-solution methods for sparse linear systems.

)  v ) ) 5 c r pD c r &| ) | )  R i t h h h e t {2iit i D H

c r &|i) )  x v ) D )  ) } c e2hihhit `  {i D  H t t e R | pD |

e `

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.

Apply global permutation to right-hand-side and copy into For Do: [Forward sweep]

to the right-hand side. We overwrite the result on the current solution vector, an called . Now partition this vector into

) 

c r

) |

! 8 64 2 ' 6 @ ) 0 9753 99 ( $0 % 8 0 ( ) ( ) 0 6

c r

)  ) &|

 |

~ { {  | |  p w7 z p |w z
  "#

2thh t  iih 2 c 2

)  c r (| ) c r) | h D ) | )  )|  | c | c   c v   5 D  |

$#   !  

| c

 

 |

  4 

   % # vx

& {

 7 t


 


t2ii  di h h h t e R e diihihit R t h

 |

-vector

This section describes parallel variants of the block Successive Over-Relaxation (BSOR) and ILU(0) preconditioners which are suitable for distributed memory environments. Chapter 11 briey discussed distributed sparse matrices.. A distributed matrix is a matrix whose entries are located in the memories of different processors in a multiprocessor system. These types of data structures are very convenient for distributed memory computers and it is useful to discuss implementations of preconditioners that are specically developed for them. Refer to Section 11.5.6 for the terminology used here. In particular, the term subdomain is used in the very general sense of subgraph. For both ILU and SOR, multicoloring or level scheduling can be used at the macro level, to extract parallelism. Here, macro level means the level of parallelism corresponding to the processors, or blocks, or subdomains.

In the ILU(0) factorization, the LU factors have the same nonzero patterns as the original matrix , so that the references of the entries belonging to the external subdomains in the ILU(0) factorization are identical with those of the matrix-by-vector product operation with the matrix . This is not the case for the more accurate ILU( ) factorization, with . If an attempt is made to implement a wavefront ILU preconditioner on a distributed memory computer, a difculty arises because the natural ordering for the original sparse problem may put an unnecessary limit on the amount of parallelism available. Instead, a two-level ordering is used. First, dene a global ordering which is a wavefront ordering for the subdomains. This is based on the graph which describes the coupling between the subdomains: Two subdomains are coupled if and only if they contain at least a pair of coupled unknowns, one from each subdomain. Then, within each subdomain, dene a local ordering. To describe the possible parallel implementations of these ILU(0) preconditioners, it is sufcient to consider a local view of the distributed sparse matrix, illustrated in Figure 12.8. The problem is partitioned into subdomains or subgraphs using some graph partitioning technique. This results in a mapping of the matrix into processors where it is assumed that the -th equation (row) and the -th unknown are mapped to the same processor. We distinguish between interior points and interface points. The interior points are those nodes that are not coupled with nodes belonging to other processors. Interface nodes are those local nodes that are coupled with at least one node which belongs to another processor. Thus, processor number 10 in the gure holds a certain number of rows that are local rows. Consider the rows associated with the interior nodes. The unknowns associated with these nodes are not coupled with variables from other processors. As a result, the rows associated with these nodes can be eliminated independently in the ILU(0) process. The rows associated with the nodes on the interface of the subdomain will require more attention. Recall that an ILU(0) factorization is determined entirely by the order in which the rows are processed. The interior nodes can be eliminated rst. Once this is done, the interface
y w y


I )

F 0

3 X)

5

 

C )

9 w !r gb k p ~ `~ }|{ } fw3~ w 7 {7{


1D F

 ( 0I F0

@ "@

$  ! 3

# # ! $$"

& {

! 

d %

s @

q y

rows can be eliminated in a certain order. There are two natural choices for this order. The rst would be to impose a global order based on the labels of the processors. Thus, in the illustration, the interface rows belonging to Processors 2, 4, and 6 are processed before those in Processor 10. The interface rows in Processor 10 must in turn be processed before those of Processors 13 and 14. The local order, i.e., the order in which we process the interface rows in