Problems
Graeme Fairweather and Ian Gladwell
1 Finite Difference Methods

1.1 Introduction

Consider the second-order linear two-point boundary value problem

    Lu(x) ≡ −u''(x) + p(x)u'(x) + q(x)u(x) = f(x),  x ∈ I ≡ (0, 1),   (1.1)

    u(0) = g_0,  u(1) = g_1,   (1.2)

where the functions p, q and f are smooth on I, and p and q satisfy

    p* ≡ max_{x∈I} |p(x)| < ∞,  0 < q* ≤ q(x),  x ∈ I.   (1.3)
Let π = {x_j}_{j=0}^{N+1} denote a uniform partition of the interval I such that x_j = jh, j = 0, 1, ..., N + 1, and (N + 1)h = 1. On this partition, the solution u of (1.1)-(1.2) is approximated by the mesh function {u_j}_{j=0}^{N+1} defined by the finite difference equations
    L_h u_j ≡ −[u_{j+1} − 2u_j + u_{j−1}]/h^2 + p_j [u_{j+1} − u_{j−1}]/(2h) + q_j u_j = f_j,  j = 1, ..., N,   (1.4)

    u_0 = g_0,  u_{N+1} = g_1,   (1.5)

where

    p_j = p(x_j),  q_j = q(x_j),  f_j = f(x_j).

Equations (1.4) are obtained by replacing the derivatives in (1.1) by basic centered difference quotients.
We now show that under certain conditions the difference problem (1.4)-(1.5) has a unique solution {u_j}_{j=0}^{N+1}, which is second-order accurate; that is,

    |u(x_j) − u_j| = O(h^2),  j = 1, ..., N.

1.2 The Uniqueness of the Difference Approximation
From (1.4), we obtain

    h^2 L_h u_j = −(1 + (1/2) h p_j) u_{j−1} + (2 + h^2 q_j) u_j − (1 − (1/2) h p_j) u_{j+1} = h^2 f_j,  j = 1, ..., N.   (1.6)
The totality of difference equations (1.6), subject to (1.5), may be written in the form

    Au = b,   (1.7)

where

    u = (u_1, u_2, ..., u_{N−1}, u_N)^T,

    b = h^2 (f_1, f_2, ..., f_{N−1}, f_N)^T − (c_1 g_0, 0, ..., 0, e_N g_1)^T,

and A is the N × N tridiagonal matrix

    A = [ d_1  e_1                          ]
        [ c_2  d_2  e_2                     ]
        [      ⋱    ⋱    ⋱                  ]
        [          c_{N−1}  d_{N−1}  e_{N−1} ]
        [                   c_N      d_N    ],

and, for j = 1, ..., N,

    c_j = −(1 + (1/2) h p_j),  d_j = 2 + h^2 q_j,  e_j = −(1 − (1/2) h p_j).   (1.8)
We prove that there is a unique {u_j}_{j=1}^{N} by showing that the tridiagonal matrix A is strictly diagonally dominant and hence nonsingular.
Theorem 1.1. If h < 2/p*, then the matrix A is strictly diagonally dominant.

Proof. If h < 2/p*, then (h/2)|p_j| < 1, so that

    |c_j| = 1 + (h/2) p_j,  |e_j| = 1 − (h/2) p_j,

and

    |c_j| + |e_j| = 2 < d_j,  j = 2, ..., N − 1.

Also,

    |e_1| < d_1,  |c_N| < d_N,

which completes the proof.
Corollary. If p = 0, the matrix A is positive definite, with no restriction on the mesh spacing h.

Proof. If p = 0, then A is strictly diagonally dominant with no restriction on the mesh spacing h. In addition, A is symmetric and has positive diagonal entries and is therefore positive definite.
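The discrete problem just analyzed is straightforward to set up and solve numerically. The following sketch (Python with NumPy; the test problem, with exact solution u = sin(πx), is an assumed example and not taken from the text) assembles (1.6)-(1.8) and solves (1.7) with a dense solver for simplicity:

```python
import numpy as np

def solve_bvp_fd(p, q, f, g0, g1, N):
    """Assemble and solve the difference equations (1.6)-(1.8) for
    -u'' + p(x) u' + q(x) u = f(x) on (0,1), u(0) = g0, u(1) = g1."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)
    xi = x[1:-1]                        # interior nodes x_1, ..., x_N
    c = -(1.0 + 0.5 * h * p(xi))        # c_j of (1.8)
    d = 2.0 + h**2 * q(xi)              # d_j of (1.8)
    e = -(1.0 - 0.5 * h * p(xi))        # e_j of (1.8)
    A = np.diag(d) + np.diag(e[:-1], 1) + np.diag(c[1:], -1)
    b = h**2 * f(xi)
    b[0] -= c[0] * g0                   # boundary values moved to the right side
    b[-1] -= e[-1] * g1
    u = np.empty(N + 2)
    u[0], u[-1] = g0, g1
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

# Manufactured test problem (an assumed example): exact solution u = sin(pi x)
# with p = 1, q = 2, so f = (pi^2 + 2) sin(pi x) + pi cos(pi x).
p = lambda x: np.ones_like(x)
q = lambda x: 2.0 + 0.0 * x
f = lambda x: (np.pi**2 + 2.0) * np.sin(np.pi * x) + np.pi * np.cos(np.pi * x)
x, u = solve_bvp_fd(p, q, f, 0.0, 0.0, 199)
err = np.max(np.abs(u - np.sin(np.pi * x)))
```

For large N, a tridiagonal solver such as the algorithms of Section 2 should replace the dense `np.linalg.solve` call.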
1.3 Consistency, Stability and Convergence

To study the accuracy and the computability of the difference approximation {u_j}_{j=0}^{N+1}, we introduce the concepts of consistency, stability and convergence of finite difference methods. The basic result proved in this section is that, for a consistent method, stability implies convergence.
Definition 1.1 (Consistency). Let

    τ_{j,h}[w] ≡ L_h w(x_j) − Lw(x_j),  j = 1, ..., N,

where w is a smooth function on I. Then the difference problem (1.4)-(1.5) is consistent with the differential problem (1.1)-(1.2) if

    |τ_{j,h}[w]| → 0  as h → 0.

The quantities τ_{j,h}[w], j = 1, ..., N, are called the local truncation (or local discretization) errors.

Definition 1.2. The difference problem (1.4)-(1.5) is locally pth order accurate if, for sufficiently smooth data, there exists a positive constant C, independent of h, such that

    max_{1≤j≤N} |τ_{j,h}[w]| ≤ C h^p.
The following lemma demonstrates that the difference problem (1.4)-(1.5) is consistent with (1.1)-(1.2) and is locally second-order accurate.

Lemma 1.1. If w ∈ C^4(I), then

    τ_{j,h}[w] = −(h^2/12) [w^(4)(ξ_j) − 2 p(x_j) w^(3)(η_j)],

where ξ_j and η_j lie in (x_{j−1}, x_{j+1}).
Proof. By definition,

    τ_{j,h}[w] = −{ [w(x_{j+1}) − 2w(x_j) + w(x_{j−1})]/h^2 − w''(x_j) }
                 + p_j { [w(x_{j+1}) − w(x_{j−1})]/(2h) − w'(x_j) },  j = 1, ..., N.   (1.9)

It is easy to show using Taylor's theorem that

    [w(x_{j+1}) − w(x_{j−1})]/(2h) − w'(x_j) = (h^2/6) w^(3)(η_j),  η_j ∈ (x_{j−1}, x_{j+1}).   (1.10)

Also,

    [w(x_{j+1}) − 2w(x_j) + w(x_{j−1})]/h^2 − w''(x_j) = (h^2/12) w^(4)(ξ_j),  ξ_j ∈ (x_{j−1}, x_{j+1}).   (1.11)

The desired result now follows on substituting (1.10) and (1.11) in (1.9).
Definition 1.3 (Stability). The linear difference operator L_h is stable if, for sufficiently small h, there exists a constant K, independent of h, such that

    |v_j| ≤ K { max(|v_0|, |v_{N+1}|) + max_{1≤i≤N} |L_h v_i| },  j = 0, ..., N + 1,

for any mesh function {v_j}_{j=0}^{N+1}.

We now prove that, for h sufficiently small, the difference operator L_h of (1.4) is stable.

Theorem 1.2. If the functions p and q satisfy (1.3), then the difference operator L_h of (1.4) is stable for h < 2/p*.
Proof. If

    |v_j| = max_{0≤i≤N+1} |v_i|  for some j with 1 ≤ j ≤ N,

then, from (1.6), we obtain

    d_j v_j = −e_j v_{j+1} − c_j v_{j−1} + h^2 L_h v_j.

Thus,

    d_j |v_j| ≤ (|e_j| + |c_j|) |v_j| + h^2 max_{1≤i≤N} |L_h v_i|.

If h < 2/p*, then

    d_j = |e_j| + |c_j| + h^2 q_j,

and it follows that

    h^2 q_j |v_j| ≤ h^2 max_{1≤i≤N} |L_h v_i|,

or

    |v_j| ≤ (1/q*) max_{1≤i≤N} |L_h v_i|.

Thus, if max_{0≤j≤N+1} |v_j| occurs for some j with 1 ≤ j ≤ N, then

    max_{0≤j≤N+1} |v_j| ≤ (1/q*) max_{1≤i≤N} |L_h v_i|,

and clearly

    max_{0≤j≤N+1} |v_j| ≤ K { max(|v_0|, |v_{N+1}|) + max_{1≤i≤N} |L_h v_i| }   (1.12)

with K = max{1, 1/q*}. If max_{0≤j≤N+1} |v_j| = max{|v_0|, |v_{N+1}|}, then (1.12) follows immediately.
An immediate consequence of stability is the uniqueness (and hence existence, since the problem is linear) of the difference approximation {u_j}_{j=0}^{N+1} (which was proved by other means earlier), for if there were two solutions, their difference {v_j}_{j=0}^{N+1}, say, would satisfy

    L_h v_j = 0,  j = 1, ..., N,  v_0 = v_{N+1} = 0.

Stability then implies that v_j = 0, j = 0, 1, ..., N + 1.
Definition 1.4 (Convergence). Let u be the solution of the boundary value problem (1.1)-(1.2) and {u_j}_{j=0}^{N+1} the difference approximation defined by (1.4)-(1.5). The difference approximation converges to u if

    max_{1≤j≤N} |u_j − u(x_j)| → 0

as h → 0. The difference u_j − u(x_j) is the global truncation (or discretization) error at the point x_j, j = 1, ..., N.

Definition 1.5. The difference approximation {u_j}_{j=0}^{N+1} is a pth order approximation to the solution u of (1.1)-(1.2) if, for h sufficiently small, there exists a constant C, independent of h, such that

    max_{0≤j≤N+1} |u_j − u(x_j)| ≤ C h^p.

The basic result connecting consistency, stability and convergence is given in the following theorem.
Theorem 1.3. Suppose u ∈ C^4(I) and h < 2/p*. Then the difference approximation {u_j}_{j=0}^{N+1} of (1.4)-(1.5) is convergent to the solution u of (1.1)-(1.2). Moreover,

    max_{0≤j≤N+1} |u_j − u(x_j)| ≤ C h^2.
Proof. Under the given conditions, the difference problem (1.4)-(1.5) is consistent with the boundary value problem (1.1)-(1.2) and the operator L_h is stable. Since

    L_h[u_j − u(x_j)] = f(x_j) − L_h u(x_j) = Lu(x_j) − L_h u(x_j) = −τ_{j,h}[u],

and u_0 − u(x_0) = u_{N+1} − u(x_{N+1}) = 0, the stability of L_h implies that

    |u_j − u(x_j)| ≤ (1/q*) max_{1≤j≤N} |τ_{j,h}[u]|.

The desired result follows from Lemma 1.1.

It follows from this theorem that {u_j}_{j=0}^{N+1} is a second-order approximation to the solution u of (1.1).
1.4 Experimental Determination of the Asymptotic Rate of Convergence

If a norm of the global error is O(h^p), then an estimate of p can be determined in the following way. For ease of exposition, we use a different notation in this section and denote by u_j^h the difference approximation computed with a mesh length of h. Also let

    e_h ≡ ||u − u^h|| ≡ max_j |u(x_j) − u_j^h|.

If e_h = O(h^p), then, for h sufficiently small, h < h_0 say, there exists a constant C, independent of h, such that

    e_h ≈ C h^p.
If we solve the difference problem with two different mesh lengths h_1 and h_2 such that h_2 < h_1 < h_0, then

    e_{h_1} ≈ C h_1^p  and  e_{h_2} ≈ C h_2^p,

from which it follows that

    ln e_{h_1} ≈ p ln h_1 + ln C  and  ln e_{h_2} ≈ p ln h_2 + ln C.

Therefore an estimate of p can be calculated from

    p ≈ ln(e_{h_1}/e_{h_2}) / ln(h_1/h_2).   (1.13)

In practice, one usually solves the difference problem for a sequence of values of h, h_0 > h_1 > h_2 > h_3 > ..., and calculates the ratio on the right-hand side of (1.13) for successive pairs of values of h. These ratios converge to the value of p as h → 0.
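The estimate (1.13) is easy to try on any quantity with a known error behaviour. The sketch below (Python; a self-contained illustration that uses the centered second difference quotient of sin x rather than the boundary value problem solver, an assumed substitution) recovers p ≈ 2:

```python
import numpy as np

# Estimate the rate p via (1.13) for the centered second difference
# delta_h^2 u(x) = [u(x+h) - 2u(x) + u(x-h)]/h^2, whose error is O(h^2).
u = np.sin
x0 = 0.4
exact = -np.sin(x0)                       # u''(x0)

def err(h):
    return abs((u(x0 + h) - 2.0 * u(x0) + u(x0 - h)) / h**2 - exact)

hs = [0.1 / 2**k for k in range(4)]       # h_0 > h_1 > h_2 > h_3
es = [err(h) for h in hs]
# Successive rate estimates p ~ ln(e_{h_k}/e_{h_{k+1}}) / ln(h_k/h_{k+1}):
rates = [np.log(es[k] / es[k + 1]) / np.log(hs[k] / hs[k + 1]) for k in range(3)]
```

The rates approach 2 as h decreases, in agreement with (1.11).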
1.5 Higher-Order Finite Difference Approximations

1.5.1 Richardson extrapolation and deferred corrections

Commonly used methods for improving the accuracy of approximate solutions to differential equations are Richardson extrapolation and deferred corrections. These techniques are applicable if there exists an expansion of the local truncation error of the form

    τ_{j,h}[u] = h^2 Φ[u(x_j)] + O(h^4).   (1.14)

As is shown in the next theorem, if (1.14) holds, then the stability of the difference operator L_h ensures that there exists a function e(x) such that

    u(x_j) = u_j + h^2 e(x_j) + O(h^4),  j = 0, 1, ..., N + 1.   (1.15)
In Richardson extrapolation, the difference problem (1.4)-(1.5) is solved twice with mesh spacings h and h/2 to yield difference solutions {u_j^h}_{j=0}^{N+1} and {u_j^{h/2}}_{j=0}^{2(N+1)}. Then at a point x̄ = jh = (2j)(h/2) common to both meshes we have, from (1.15),

    u(x̄) = u_j^h + h^2 e(x̄) + O(h^4)

and

    u(x̄) = u_{2j}^{h/2} + (h^2/4) e(x̄) + O(h^4),

from which it follows that

    u(x̄) = u_{2j}^{h/2} + (1/3)(u_{2j}^{h/2} − u_j^h) + O(h^4).

Thus

    u_j^(1) ≡ u_{2j}^{h/2} + (1/3)(u_{2j}^{h/2} − u_j^h)

is a fourth-order approximation to u(x_j), j = 0, ..., N + 1.
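The same combination applies to any quantity with an expansion a(h) = a + C h^2 + O(h^4). A minimal sketch (Python; using a centered first difference quotient in place of the difference solution, an assumed substitution) shows the accuracy gain:

```python
import numpy as np

# Richardson extrapolation: if a(h) = a + C h^2 + O(h^4), then
# a(h/2) + (a(h/2) - a(h))/3 approximates a to O(h^4).
u, x0 = np.sin, 0.4

def D(h):
    # centered first difference quotient, second-order accurate
    return (u(x0 + h) - u(x0 - h)) / (2.0 * h)

h = 0.1
coarse, fine = D(h), D(h / 2)
extrap = fine + (fine - coarse) / 3.0    # the combination above
exact = np.cos(x0)
```

The extrapolated value is far more accurate than either difference quotient alone.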
In deferred corrections, the difference equations (1.4)-(1.5) are solved in the usual way to obtain {u_j}. Then a fourth-order difference approximation {ū_j}_{j=0}^{N+1} to u(x) is computed by solving difference equations which are a perturbation of (1.4)-(1.5) expressed in terms of {u_j}. Suitable definitions of {ū_j}_{j=0}^{N+1} are discussed once we derive (1.15).
Theorem 1.4. Suppose u ∈ C^6(I) and h < 2/p*, and let e(x) be the solution of the boundary value problem

    Le = Φ[u],  x ∈ I,  e(0) = e(1) = 0,   (1.16)

where, from Lemma 1.1,

    Φ[u(x)] = (1/12) [2 p(x) u^(3)(x) − u^(4)(x)].   (1.17)

Then (1.15) holds.

Proof. Since L_h u_j = f_j and, from (1.14),

    L_h u(x_j) = Lu(x_j) + τ_{j,h}[u] = f(x_j) + h^2 Φ[u(x_j)] + O(h^4),   (1.18)

we have

    L_h[u(x_j) − u_j] = h^2 Φ[u(x_j)] + O(h^4) = h^2 Le(x_j) + O(h^4)
                      = −h^2 τ_{j,h}[e] + h^2 L_h e(x_j) + O(h^4).   (1.19)

From the smoothness properties of the functions p, q and Φ[u], it follows that the solution e(x) of (1.16) is unique. Moreover, since Φ[u] ∈ C^2(I), e(x) ∈ C^4(I) and τ_{j,h}[e] = O(h^2). Using this result in (1.19) and rearranging, we have

    L_h[u(x_j) − u_j − h^2 e(x_j)] = O(h^4).

The desired result, (1.15), follows from the stability of L_h.
Deferred corrections is defined in the following way. Suppose Φ_{j,h}[·] is a difference operator such that

    |Φ[u(x_j)] − Φ_{j,h}[u^h]| = O(h^2),   (1.20)

where u^h denotes the solution {u_j}_{j=0}^{N+1} of (1.4)-(1.5). We define the mesh function {ū_j}_{j=0}^{N+1} by

    L_h ū_j = f_j + h^2 Φ_{j,h}[u^h],  j = 1, ..., N,

    ū_0 = g_0,  ū_{N+1} = g_1.
Then it is easy to show that {ū_j}_{j=0}^{N+1} is a fourth-order approximation to u(x), since

    L_h[ū_j − u(x_j)] = f_j + h^2 Φ_{j,h}[u^h] − L_h u(x_j)
                      = [Lu(x_j) − L_h u(x_j)] + h^2 Φ_{j,h}[u^h]
                      = −τ_{j,h}[u] + h^2 Φ_{j,h}[u^h]
                      = −h^2 { Φ[u(x_j)] − Φ_{j,h}[u^h] } + O(h^4).

Using (1.20) and the stability of L_h, it follows that

    |ū_j − u(x_j)| = O(h^4).
The main problem in deferred corrections is the construction of second-order difference approximations Φ_{j,h}[u^h] to the truncation error term Φ[u(x_j)]. One way is to replace the derivatives appearing in Φ[u(x)] by standard difference approximations, using the finite difference solution {u_j}_{j=0}^{N+1} in place of the exact solution u(x). One problem with this approach is that it requires some modification near the endpoints of the interval, or it is necessary to compute the numerical solution outside the interval I. A second approach is to use the differential equation to express Φ[u(x)] in the form

    Φ[u(x)] = C_2(x) u'(x) + C_1(x) u(x) + C_0(x),   (1.21)

where the functions C_0, C_1, C_2 are expressible in terms of the functions p, q, f and their derivatives. Then choose

    Φ_{j,h}[u^h] = C_2(x_j) [u_{j+1} − u_{j−1}]/(2h) + C_1(x_j) u_j + C_0(x_j).
Since

    u'' = p u' + q u − f,

we have

    u^(3) = p u'' + p' u' + q u' + q' u − f'
          = p(p u' + q u − f) + (p' + q) u' + q' u − f'   (1.22)
          = (p^2 + p' + q) u' + (pq + q') u − (pf + f')
          ≡ P u' + Q u − F.

Similarly,

    u^(4) = (Pp + P' + Q) u' + (Pq + Q') u − (Pf + F').   (1.23)

When (1.22) and (1.23) are substituted into (1.17), we obtain the desired form (1.21).
In deferred corrections, the linear algebraic systems defining the basic difference approximation and the fourth-order approximation have the same coefficient matrix, which simplifies the algebraic problem.

If the solution u of the boundary value problem (1.1)-(1.2) is sufficiently smooth, then it can be shown that the local truncation error τ_{j,h}[u] has an asymptotic expansion of the form

    τ_{j,h}[u] = Σ_{ν=1}^{m} h^{2ν} Φ_ν[u(x_j)] + O(h^{2m+2}),

and, if m > 1, deferred corrections can be extended to compute approximations of higher order than 4.
1.5.2 Numerov's method

A second higher-order finite difference method is easily constructed in the case p = 0, when (1.1) becomes

    Lu(x) ≡ −u''(x) + q(x)u(x) = f(x),  x ∈ I.

Let δ_h^2 denote the central second difference operator,

    δ_h^2 u(x_j) ≡ [u(x_{j+1}) − 2u(x_j) + u(x_{j−1})]/h^2
                 = u''(x_j) + (1/12) h^2 u^(4)(x_j) + O(h^4).   (1.24)
From the differential equation, we have

    u^(4)(x) = [q(x)u(x) − f(x)]''

and therefore

    u^(4)(x_j) = δ_h^2 [q(x_j)u(x_j) − f(x_j)] + O(h^2).   (1.25)

Thus, from (1.24) and (1.25),

    u''(x_j) = δ_h^2 u(x_j) − (1/12) h^2 δ_h^2 [q(x_j)u(x_j) − f(x_j)] + O(h^4).
We define {u_j}_{j=0}^{N+1} by

    L_h u_j ≡ −δ_h^2 u_j + (1 + (1/12) h^2 δ_h^2) q_j u_j = (1 + (1/12) h^2 δ_h^2) f(x_j),  j = 1, ..., N,   (1.26)

    u_0 = g_0,  u_{N+1} = g_1,
which is commonly known as Numerov's method. Equations (1.26) are symmetric tridiagonal and may be written in the form

    ĉ_j u_{j−1} + d̂_j u_j + ê_j u_{j+1} = (1/12) h^2 [f_{j+1} + 10 f_j + f_{j−1}],  j = 1, ..., N,

where

    ĉ_j = −(1 − (1/12) h^2 q_{j−1}),  d̂_j = 2 + (5/6) h^2 q_j,  ê_j = −(1 − (1/12) h^2 q_{j+1}).
It is easy to show that, for h sufficiently small, the coefficient matrix of this system is strictly diagonally dominant and hence {u_j}_{j=0}^{N+1} is unique. From a similar analysis, it follows that the difference operator L_h is stable. Also, since

    L_h[u_j − u(x_j)] = (1 + (1/12) h^2 δ_h^2) f(x_j) − L_h u(x_j) = O(h^4),

the stability of L_h implies that

    |u_j − u(x_j)| = O(h^4).
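A sketch of the linear Numerov scheme (Python with NumPy; the manufactured problem with q = 1 and exact solution u = sin(πx) is an assumed example) illustrates the O(h^4) behaviour:

```python
import numpy as np

def numerov_linear(q, f, g0, g1, N):
    """Numerov's method (1.26) for -u'' + q(x) u = f(x), u(0)=g0, u(1)=g1."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)
    qx, fx = q(x), f(x)
    c = -(1.0 - h**2 * qx[:-2] / 12.0)    # c-hat_j, uses q_{j-1}
    d = 2.0 + 5.0 * h**2 * qx[1:-1] / 6.0  # d-hat_j
    e = -(1.0 - h**2 * qx[2:] / 12.0)     # e-hat_j, uses q_{j+1}
    b = h**2 / 12.0 * (fx[2:] + 10.0 * fx[1:-1] + fx[:-2])
    b[0] -= c[0] * g0                     # fold in boundary values
    b[-1] -= e[-1] * g1
    A = np.diag(d) + np.diag(e[:-1], 1) + np.diag(c[1:], -1)
    u = np.empty(N + 2)
    u[0], u[-1] = g0, g1
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

# Assumed test problem: q = 1, exact solution u = sin(pi x).
q = lambda x: 1.0 + 0.0 * x
f = lambda x: (np.pi**2 + 1.0) * np.sin(np.pi * x)
x, u = numerov_linear(q, f, 0.0, 0.0, 19)
err = np.max(np.abs(u - np.sin(np.pi * x)))
```

Halving h should reduce the error by roughly a factor of 16, consistent with fourth-order accuracy.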
1.6 Second-Order Nonlinear Equations

In this section, we discuss finite difference methods for the solution of the second-order nonlinear two-point boundary value problem

    Lu(x) ≡ −u'' + f(x, u) = 0,  x ∈ I,
    u(0) = g_0,  u(1) = g_1.                (1.27)

The basic second-order finite difference approximation to (1.27) takes the form

    L_h u_j ≡ −δ_h^2 u_j + f(x_j, u_j) = 0,  j = 1, ..., N,
    u_0 = g_0,  u_{N+1} = g_1.              (1.28)
If

    τ_{j,h}[u] ≡ L_h u(x_j) − Lu(x_j),

then clearly

    τ_{j,h}[u] = −(h^2/12) u^(4)(ξ_j),  ξ_j ∈ (x_{j−1}, x_{j+1}),

if u ∈ C^4(I). Stability of the nonlinear difference problem is defined in the following way.

Definition 1.6. A difference problem defined by the nonlinear difference operator L_h is stable if, for sufficiently small h, there exists a positive constant K, independent of h, such that, for all mesh functions {v_j}_{j=0}^{N+1} and {w_j}_{j=0}^{N+1},

    |v_j − w_j| ≤ K { max(|v_0 − w_0|, |v_{N+1} − w_{N+1}|) + max_{1≤i≤N} |L_h v_i − L_h w_i| }.

Notice that when L_h is linear, this definition reduces to the definition of Section 1.3 applied to the mesh function {v_j − w_j}_{j=0}^{N+1}.
Theorem 1.5. If f_u ≡ ∂f/∂u is continuous on I × (−∞, ∞) and satisfies

    0 < q* ≤ f_u,

then the difference problem (1.28) is stable, with K = max(1, 1/q*).
Proof. For mesh functions {v_j} and {w_j}, we have

    L_h v_j − L_h w_j = −δ_h^2 (v_j − w_j) + f(x_j, v_j) − f(x_j, w_j)
                      = −δ_h^2 (v_j − w_j) + f_u(x_j, ṽ_j)(v_j − w_j),

on using the mean value theorem, where ṽ_j lies between v_j and w_j. Then

    h^2 [L_h v_j − L_h w_j] = c̄_j [v_{j−1} − w_{j−1}] + d̄_j [v_j − w_j] + ē_j [v_{j+1} − w_{j+1}],

where c̄_j = ē_j = −1 and

    d̄_j = 2 + h^2 f_u(x_j, ṽ_j).

Clearly

    |c̄_j| + |ē_j| = 2 < |d̄_j|.
The remainder of the proof is similar to that of Theorem 1.2.

An immediate consequence of the stability of L_h is the following:

Corollary 1.1. If the difference approximation {u_j} defined by (1.28) exists, then it is unique.

For a proof of the existence of the solution, see Keller (1968).
Corollary 1.2. If u is the solution of (1.27) and {u_j}_{j=0}^{N+1} the solution of (1.28) then, under the assumptions of Theorem 1.5,

    |u_j − u(x_j)| ≤ (h^2/12) K max_{x∈I} |u^(4)(x)|,  j = 0, ..., N + 1,

provided u ∈ C^4(I).
In order to determine the difference approximation {u_j}_{j=0}^{N+1}, we must solve the system of nonlinear equations (1.28), which may be written as

    Ju + h^2 f(u) = g,   (1.29)

where

    J = [  2  −1              ]
        [ −1   2  −1          ]
        [      ⋱   ⋱   ⋱      ]
        [         −1   2  −1  ]
        [             −1   2  ],

    u = (u_1, u_2, ..., u_{N−1}, u_N)^T,
    f(u) = (f(x_1, u_1), f(x_2, u_2), ..., f(x_{N−1}, u_{N−1}), f(x_N, u_N))^T,
    g = (g_0, 0, ..., 0, g_1)^T.   (1.30)

This system is usually solved using Newton's method (or a variant of it), which is described in Section 1.8.
1.7 Numerov's Method for Second-Order Nonlinear Equations

Numerov's method for the solution of (1.27) may be derived in the following way. If u is the solution of (1.27) and f is sufficiently smooth, then

    u^(4)(x_j) = d^2/dx^2 f(x, u)|_{x=x_j} = δ_h^2 f(x_j, u(x_j)) + O(h^2).

Since

    δ_h^2 u(x_j) = u''(x_j) + (1/12) h^2 u^(4)(x_j) + O(h^4),

we have

    δ_h^2 u(x_j) = f(x_j, u(x_j)) + (1/12) h^2 δ_h^2 f(x_j, u(x_j)) + O(h^4).
Based on this equation, Numerov's method is defined by the nonlinear equations

    −δ_h^2 u_j + [1 + (1/12) h^2 δ_h^2] f(x_j, u_j) = 0,  j = 1, ..., N,
    u_0 = g_0,  u_{N+1} = g_1,                  (1.31)

which may be written in the form

    Ju + h^2 B f(u) = g,   (1.32)

where J and u are as in (1.30), and

    B = (1/12) [ 10   1              ]
               [  1  10   1          ]
               [      ⋱   ⋱   ⋱      ]
               [          1  10   1  ]
               [              1  10  ],

    g = (g_0 − (1/12) h^2 f(0, g_0), 0, ..., 0, g_1 − (1/12) h^2 f(1, g_1))^T.   (1.33)
Again Newton's method is used to solve the nonlinear system (1.32).

Stepleman (1976) formulated and analyzed a fourth-order difference method for the solution of the equation

    −u'' + f(x, u, u') = 0,  x ∈ I,

subject to general linear boundary conditions. When applied to the boundary value problem (1.27), this method reduces to Numerov's method. Higher-order finite difference approximations to nonlinear equations have also been derived by Doedel (1979) using techniques different from those employed by Stepleman.
1.8 Newton's Method for Systems of Nonlinear Equations

In the case of a single equation, Newton's method consists in linearizing the given equation

    φ(u) = 0

by approximating φ(u) by

    φ(u^(0)) + φ'(u^(0))(u − u^(0)),

where u^(0) is an approximation to the actual solution, and solving the linearized equation

    φ(u^(0)) + φ'(u^(0)) Δu = 0.

The value u^(1) = u^(0) + Δu is then accepted as a better approximation, and the process is continued if necessary.
Consider now the N equations

    φ_i(u_1, u_2, ..., u_N) = 0,  i = 1, ..., N,   (1.34)

for the unknowns u_1, u_2, ..., u_N, which we may write in vector form as

    φ(u) = 0.

If we linearize the ith equation, we obtain

    φ_i(u_1^(0), u_2^(0), ..., u_N^(0)) + Σ_{j=1}^{N} (∂φ_i/∂u_j)(u_1^(0), ..., u_N^(0)) Δu_j = 0,  i = 1, ..., N.   (1.35)

If Γ(u) denotes the matrix with (i, j) element (∂φ_i/∂u_j)(u), then (1.35) can be written in the form

    Γ(u^(0)) Δu = −φ(u^(0)).
If Γ(u^(0)) is nonsingular, then

    Δu = −[Γ(u^(0))]^{−1} φ(u^(0)),

and

    u^(1) = u^(0) + Δu

is taken as the new approximation. If the matrices Γ(u^(ν)), ν = 1, 2, ..., are nonsingular, one hopes to determine a sequence of successively better approximations u^(ν), ν = 1, 2, ..., from the algorithm

    u^(ν+1) = u^(ν) + Δu^(ν),  ν = 0, 1, 2, ...,

where Δu^(ν) is obtained by solving the system of linear equations

    Γ(u^(ν)) Δu^(ν) = −φ(u^(ν)).

This procedure is known as Newton's method for the solution of the system of nonlinear equations (1.34). It can be shown that, as in the scalar case, this procedure converges quadratically if u^(0) is chosen sufficiently close to u, the solution of (1.34).
Now consider the system of equations

    φ_i(u_1, ..., u_N) ≡ −u_{i−1} + 2u_i − u_{i+1} + (h^2/12)[f(x_{i−1}, u_{i−1}) + 10 f(x_i, u_i) + f(x_{i+1}, u_{i+1})],  i = 1, ..., N,

arising in Numerov's method (1.31). In this case, the (i, j) element of the Jacobian is given by

    ∂φ_i/∂u_j = { 2 + (5/6) h^2 f_u(x_i, u_i),    if j = i,
                { −1 + (1/12) h^2 f_u(x_j, u_j),  if |i − j| = 1,
                { 0,                              if |i − j| > 1.

Thus

    Γ(u) = J + h^2 B F(u),

where

    F(u) = diag(f_u(x_i, u_i)).
In this case, Newton's method becomes

    [J + h^2 B F(u^(ν))] Δu^(ν) = −[J u^(ν) + h^2 B f(u^(ν)) − g],
    u^(ν+1) = u^(ν) + Δu^(ν),  ν = 0, 1, 2, ....             (1.36)
For the system (1.28),

    φ_i(u_1, ..., u_N) ≡ −u_{i−1} + 2u_i − u_{i+1} + h^2 f(x_i, u_i) = 0,  i = 1, ..., N,

and Newton's method is given by (1.36) with B = I and g = (g_0, 0, ..., 0, g_1)^T. Note that in each case the Jacobian has the same structure and properties as the matrix arising in the linear case.
It is customary to stop the iteration (1.36) when

    max_{1≤i≤N} |Δu_i^(ν)| ≤ XTOL (1 + max_{1≤i≤N} |u_i^(ν)|)

and

    max_{1≤i≤N} |φ_i(u^(ν+1))| ≤ FTOL,

where XTOL and FTOL are prescribed tolerances.
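The full Newton iteration for (1.28), including a stopping test of the form above, can be sketched as follows (Python with NumPy; the model problem −u'' + e^u = r(x), with r chosen so that u = sin(πx) is the exact solution, is an assumed example):

```python
import numpy as np

# Newton's method (1.36) with B = I for the difference equations (1.28),
# written as J u + h^2 f(u) = g.  Assumed model problem: f(x, u) = exp(u) - r(x),
# with r chosen so the exact solution is u(x) = sin(pi x), g_0 = g_1 = 0.
N = 199
h = 1.0 / (N + 1)
x = np.linspace(0.0, 1.0, N + 2)[1:-1]
r = np.pi**2 * np.sin(np.pi * x) + np.exp(np.sin(np.pi * x))
J = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # tridiag(-1, 2, -1)
g = h**2 * r                 # boundary values g_0 = g_1 = 0 contribute nothing

XTOL = 1e-12
u = np.zeros(N)              # initial guess u^(0)
for nu in range(30):
    phi = J @ u + h**2 * np.exp(u) - g           # residual of (1.29)
    jac = J + h**2 * np.diag(np.exp(u))          # Jacobian J + h^2 F(u)
    du = np.linalg.solve(jac, -phi)
    u += du
    if np.max(np.abs(du)) <= XTOL * (1.0 + np.max(np.abs(u))):
        break
err = np.max(np.abs(u - np.sin(np.pi * x)))
```

Note f_u = e^u > 0, so the hypothesis of Theorem 1.5 holds on the range of interest, and the iteration converges in a handful of steps from the zero initial guess.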
2 Algorithms for Solving Tridiagonal Linear Systems

2.1 General Tridiagonal Systems

The algorithm used to solve a general tridiagonal system of the form

    Au = b,   (2.1)
where

    A = [ d_1  e_1                          ]
        [ c_2  d_2  e_2                     ]
        [      ⋱    ⋱    ⋱                  ]
        [          c_{N−1}  d_{N−1}  e_{N−1} ]
        [                   c_N      d_N    ],   (2.2)
is a version of Gaussian elimination with partial pivoting. In the decomposition step of the algorithm, the matrix A is factored in the form

    A = PLR,   (2.3)

where P is a permutation matrix recording the pivotal strategy, and L is unit lower bidiagonal and contains on its subdiagonal the multipliers used in the elimination process. If no row interchanges are performed, then R is upper bidiagonal but, in general, this matrix may also have nonzero elements on its second superdiagonal. Thus fill-in may occur in this process; that is, the factored form of A may require more storage than A, in addition to a vector to record the pivotal strategy.

Using (2.3), the system (2.1) is equivalent to the systems

    PLv = b,  Ru = v,

which can be easily solved for v and the desired solution u. Software, sgtsv, based on this algorithm is available in LAPACK [4].
2.2 Diagonally Dominant Systems

Finite difference methods discussed in Section 1 give rise to tridiagonal linear systems with certain properties which can be exploited to develop stable algorithms in which no pivoting is required. In this section, we consider the case in which the elements of the coefficient matrix A satisfy

    (i)   |d_1| > |e_1|,
    (ii)  |d_i| ≥ |c_i| + |e_i|,  i = 2, ..., N − 1,   (2.4)
    (iii) |d_N| > |c_N|,

and

    c_i e_{i−1} ≠ 0,  i = 2, ..., N;   (2.5)

that is, A is irreducibly diagonally dominant and hence nonsingular. We show that, without pivoting, such a matrix can be factored in a stable fashion in the form

    A = L̃R̃,   (2.6)
where L̃ is lower bidiagonal and R̃ is unit upper bidiagonal. The matrices L̃ and R̃ are given by

    L̃ = [ α_1               ]
        [ c_2  α_2          ]
        [      ⋱    ⋱       ]
        [          c_N  α_N ],

    R̃ = [ 1  γ_1                ]
        [    1  γ_2             ]
        [       ⋱    ⋱          ]
        [            1  γ_{N−1} ]
        [               1       ],   (2.7)

where

    (i)   α_1 = d_1,
    (ii)  γ_i = e_i / α_i,  i = 1, ..., N − 1,   (2.8)
    (iii) α_{i+1} = d_{i+1} − c_{i+1} γ_i,  i = 1, ..., N − 1.

Since

    det A = det L̃ det R̃ = Π_{i=1}^{N} α_i

and A is nonsingular, it follows that none of the α_i vanishes; hence the recursion in (2.8) is well-defined.
We now show that conditions (2.4) ensure that the quantities α_i and γ_i are bounded, and that the bounds are independent of the order of the matrix A.

Theorem 2.1. If the elements of A satisfy conditions (2.4), then

    |γ_i| < 1,   (2.9)

and

    0 < |d_i| − |c_i| < |α_i| < |d_i| + |c_i|,  i = 2, ..., N.   (2.10)

Proof. Since

    γ_1 = e_1/α_1 = e_1/d_1,

we have

    |γ_1| = |e_1|/|d_1| < 1

from (2.4(i)). Now suppose that

    |γ_j| < 1,  j = 1, ..., i − 1.

Then, with (2.8(iii)) in (2.8(ii)), we have

    γ_i = e_i / (d_i − c_i γ_{i−1})

and

    |γ_i| ≤ |e_i| / (|d_i| − |c_i||γ_{i−1}|) < |e_i| / (|d_i| − |c_i|)

from the induction hypothesis. Thus, using (2.4(ii)), it follows that |γ_i| < 1, and (2.9) follows by induction.

From (2.8(iii)), (2.9) and (2.4(ii)), it is easy to show that (2.10) holds.
The system (2.1) with A given by (2.6) is equivalent to

    L̃v = b,  R̃u = v,   (2.11)

from which the desired solution u is obtained in the following fashion:

    v_1 = b_1/α_1
    For i = 2 to N do:
        v_i = (b_i − c_i v_{i−1})/α_i
    end
    u_N = v_N
    For i = 1 to N − 1 do:
        u_{N−i} = v_{N−i} − γ_{N−i} u_{N−i+1}
    end                                          (2.12)

The nonzero elements of L̃ and R̃ overwrite the corresponding elements of A, and no additional storage is required.
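The factorization (2.8) and the substitutions (2.12) can be sketched as follows (Python with NumPy; the sample system is an assumed example):

```python
import numpy as np

def tridiag_solve(c, d, e, b):
    """Solve A u = b for A tridiagonal with sub-diagonal c (c[0] unused),
    diagonal d and super-diagonal e (e[-1] unused), via the pivot-free
    factorization (2.8) and the substitutions (2.12)."""
    n = len(d)
    alpha = np.empty(n)              # alpha_i: diagonal of L~
    gamma = np.empty(max(n - 1, 0))  # gamma_i: superdiagonal of R~
    alpha[0] = d[0]
    for i in range(n - 1):
        gamma[i] = e[i] / alpha[i]                      # (2.8(ii))
        alpha[i + 1] = d[i + 1] - c[i + 1] * gamma[i]   # (2.8(iii))
    v = np.empty(n)                  # forward solve  L~ v = b
    v[0] = b[0] / alpha[0]
    for i in range(1, n):
        v[i] = (b[i] - c[i] * v[i - 1]) / alpha[i]
    u = np.empty(n)                  # back substitution  R~ u = v
    u[-1] = v[-1]
    for i in range(n - 2, -1, -1):
        u[i] = v[i] - gamma[i] * u[i + 1]
    return u

# Example: a diagonally dominant system of the kind arising in (1.6).
n = 64
c = np.full(n, -1.0); d = np.full(n, 4.0); e = np.full(n, -1.0)
b = np.ones(n)
u = tridiag_solve(c, d, e, b)
A = np.diag(d) + np.diag(e[:-1], 1) + np.diag(c[1:], -1)
residual = np.max(np.abs(A @ u - b))
```

In a production code the factors would overwrite c, d, e in place, exactly as noted above.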
2.3 Positive Definite Systems

In Section 1, we saw that if in equation (1.1) p = 0, then, in addition to possessing properties (2.4)-(2.5), the matrix A is also symmetric (so that c_i = e_{i−1}, i = 2, ..., N) and its diagonal elements d_i, i = 1, ..., N, are positive. In this case, the matrix A is positive definite.
It is well known that when A is positive definite a stable decomposition is obtained using Gaussian elimination without pivoting. The LDL^T decomposition of A can easily be obtained from the recursion (2.8). If

    L = [ 1                   ]
        [ l_1  1              ]
        [      ⋱    ⋱         ]
        [          l_{N−1}  1 ]

and D = diag(δ_1, ..., δ_N), then, by identifying l_i with γ_i and δ_i with α_i, and setting c_i = e_{i−1}, we obtain from (2.8)

    δ_1 = d_1
    For i = 1 to N − 1 do:
        l_i = e_i/δ_i
        δ_{i+1} = d_{i+1} − l_i e_i
    end                                  (2.13)

Since A is positive definite, D is also positive definite¹, and hence the diagonal elements of D are positive. Thus the recursions (2.13) are well defined.
Using the LDL^T decomposition of A, the solution of the system (2.1) is determined by first solving

    Lv = b,

which gives

    v_1 = b_1,
    v_{i+1} = b_{i+1} − l_i v_i,  i = 1, ..., N − 1;

then

    D L^T u = v

yields

    u_N = v_N/δ_N

and, for i = 1, ..., N − 1,

    u_{N−i} = v_{N−i}/δ_{N−i} − l_{N−i} u_{N−i+1}
            = (v_{N−i} − l_{N−i} δ_{N−i} u_{N−i+1})/δ_{N−i}
            = (v_{N−i} − e_{N−i} u_{N−i+1})/δ_{N−i},

where in the last step we have used from (2.13) the fact that l_{N−i} δ_{N−i} = e_{N−i}.

¹ If u = L^T v with v ≠ 0, then u^T D u = v^T L D L^T v = v^T A v > 0, and u ≠ 0 since L is nonsingular; hence D is positive definite.
Thus,

    v_1 = b_1
    For i = 1 to N − 1 do:
        v_{i+1} = b_{i+1} − l_i v_i
    end
    u_N = v_N/δ_N
    For i = 1 to N − 1 do:
        u_{N−i} = (v_{N−i} − e_{N−i} u_{N−i+1})/δ_{N−i}
    end                                  (2.14)

The code sptsv in LAPACK is based on (2.13)-(2.14).
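The recursions (2.13)-(2.14) translate directly into code (Python with NumPy; the sample matrix is an assumed example):

```python
import numpy as np

def spd_tridiag_solve(d, e, b):
    """Solve A u = b for a symmetric positive definite tridiagonal A with
    diagonal d and off-diagonal e, via the LDL^T recursions (2.13)-(2.14)."""
    n = len(d)
    delta = np.empty(n)              # diagonal of D
    l = np.empty(max(n - 1, 0))      # subdiagonal of unit L
    delta[0] = d[0]
    for i in range(n - 1):           # factorization (2.13)
        l[i] = e[i] / delta[i]
        delta[i + 1] = d[i + 1] - l[i] * e[i]
    v = np.empty(n)                  # forward solve  L v = b
    v[0] = b[0]
    for i in range(n - 1):
        v[i + 1] = b[i + 1] - l[i] * v[i]
    u = np.empty(n)                  # back substitution  D L^T u = v, as in (2.14)
    u[-1] = v[-1] / delta[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (v[i] - e[i] * u[i + 1]) / delta[i]
    return u

# Example: the matrix of (1.6) with p = 0, q = 1 and h = 0.1.
h = 0.1
n = 9
d = np.full(n, 2.0 + h**2)
e = np.full(n - 1, -1.0)
b = np.full(n, h**2)
u = spd_tridiag_solve(d, e, b)
A = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
residual = np.max(np.abs(A @ u - b))
```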
3 The Finite Element Galerkin Method

3.1 Introduction

Consider the two-point boundary value problem

    Lu ≡ −u'' + p(x)u' + q(x)u = f(x),  x ∈ I,
    u(0) = u(1) = 0,                                (3.1)

where the functions p, q and f are smooth on I, and (1.3) holds. Let H¹₀(I) denote the space of all piecewise continuously differentiable functions on I which vanish at 0 and 1. If v ∈ H¹₀(I), then

    −u''v + pu'v + quv = fv,

and

    ∫₀¹ [−u''v + pu'v + quv] dx = ∫₀¹ fv dx.

On integrating by parts, we obtain

    ∫₀¹ u'v' dx − [u'v]₀¹ + ∫₀¹ pu'v dx + ∫₀¹ quv dx = ∫₀¹ fv dx.

Since v(0) = v(1) = 0, we have

    (u', v') + (pu', v) + (qu, v) = (f, v),  v ∈ H¹₀(I),   (3.2)

where (φ, ψ) ≡ ∫₀¹ φ(x)ψ(x) dx. If we set

    a(v, w) ≡ (v', w') + (pv', w) + (qv, w),  v, w ∈ H¹₀(I),   (3.3)

then (3.2) may be written a(u, v) = (f, v), v ∈ H¹₀(I).
Suppose that S_h is a finite-dimensional subspace of H¹₀(I). The finite element Galerkin method consists in finding the element u_h ∈ S_h, the Galerkin approximation to the solution u of (3.1), satisfying

    a(u_h, v) = (f, v),  v ∈ S_h.   (3.4)

Suppose {w_1, ..., w_s} is a basis for S_h, and let

    u_h(x) = Σ_{j=1}^{s} ξ_j w_j(x).   (3.5)
Then, on substituting (3.5) in (3.4) with v = w_i, we have

    (Σ_{j=1}^{s} ξ_j w_j', w_i') + (p Σ_{j=1}^{s} ξ_j w_j', w_i) + (q Σ_{j=1}^{s} ξ_j w_j, w_i) = (f, w_i),  i = 1, ..., s,

from which it follows that

    Σ_{j=1}^{s} [(w_j', w_i') + (p w_j', w_i) + (q w_j, w_i)] ξ_j = (f, w_i),  i = 1, ..., s;

that is,

    𝒜ξ = f,   (3.6)

where the (i, j) element of the matrix 𝒜 is a(w_j, w_i), and the vectors ξ and f are given by

    ξ = (ξ_1, ..., ξ_s)^T,  f = ((f, w_1), ..., (f, w_s))^T.

The system of equations (3.6) is often referred to as the Galerkin equations.
3.2 Spaces of Piecewise Polynomial Functions

The choice of the subspace S_h is a key element in the success of the Galerkin method. It is essential that S_h be chosen so that the Galerkin approximation can be computed efficiently. Secondly, S_h should possess good approximation properties, as the accuracy of the Galerkin approximation u_h depends on how well u can be approximated by elements of S_h. The subspace S_h is usually chosen to be a space of piecewise polynomial functions. To define such spaces, let P_r denote the set of polynomials of degree ≤ r, let

    π : 0 = x_0 < x_1 < ... < x_N < x_{N+1} = 1   (3.7)

denote a partition of I, and set

    I_j = [x_{j−1}, x_j],  j = 1, ..., N + 1,  h_j = x_j − x_{j−1},  h = max_j h_j.

We define

    M_k^r(π) = { v ∈ C^k(I) : v|_{I_j} ∈ P_r, j = 1, ..., N + 1 },
where C^k(I) denotes the space of functions which are k times continuously differentiable on I, 0 ≤ k < r, and v|_{I_j} denotes the restriction of the function v to the interval I_j. We denote by M_k^{r,0}(π) the space

    { v ∈ M_k^r(π) : v(0) = v(1) = 0 }.

It is easy to see that M_k^r(π) and M_k^{r,0}(π) are linear spaces of dimensions N(r − k) + r + 1 and N(r − k) + r − 1, respectively. These spaces have the following approximation properties.
Theorem 3.1. For any u ∈ W_p^j(I), there exist ū ∈ M_k^r(π) and a constant C, independent of h and u, such that

    ||(u − ū)^(ν)||_{L_p(I)} ≤ C h^{j−ν} ||u^(j)||_{L_p(I)},   (3.8)

for all integers ν and j such that 0 ≤ ν ≤ k + 1 and ν ≤ j ≤ r + 1. If u ∈ W_p^j(I) ∩ { v : v(0) = v(1) = 0 }, then there exists ū ∈ M_k^{r,0}(π) such that (3.8) holds.
Commonly used spaces are the spaces of piecewise Hermite polynomials M_{m−1}^{2m−1}(π), m ≥ 1. A convenient basis for M_{m−1}^{2m−1}(π) is

    { S_{i,k}(x; m; π) },  i = 0, ..., N + 1,  k = 0, ..., m − 1,   (3.9)

where

    D^l S_{i,k}(x_j; m; π) = δ_{ij} δ_{lk},  0 ≤ l ≤ m − 1,  0 ≤ j ≤ N + 1,

and δ_{mn} denotes the Kronecker delta. It is easy to see that S_{i,k}(x; m; π) is zero outside [x_{i−1}, x_{i+1}]. A basis for M_{m−1}^{2m−1,0}(π) is obtained by omitting from (3.9) the functions S_{i,0}(x; m; π), i = 0, N + 1, which are nonzero at x = 0 and x = 1.
Example 1. M_0^1(π): the space of piecewise linear functions. The dimension of this space is N + 2 and, if θ_i(x) = (x − x_i)/h_{i+1}, then the basis functions S_{i,0}(x; 1; π), i = 0, ..., N + 1, are the piecewise linear functions defined by

    S_{0,0}(x; 1; π) = 1 − θ_0(x) for x ∈ I_1, and 0 otherwise;

    S_{i,0}(x; 1; π) = θ_{i−1}(x) for x ∈ I_i, 1 − θ_i(x) for x ∈ I_{i+1}, and 0 otherwise, 1 ≤ i ≤ N;

    S_{N+1,0}(x; 1; π) = θ_N(x) for x ∈ I_{N+1}, and 0 otherwise.

Thus there is one basis function associated with each node of the partition π. For convenience, we set

    w_i(x) = S_{i,0}(x; 1; π),  i = 0, ..., N + 1.   (3.10)

Note that if u_h(x) = Σ_{j=0}^{N+1} ξ_j w_j(x), then

    ξ_j = u_h(x_j),  j = 0, ..., N + 1,

since w_j(x_i) = δ_{ij}. A basis for M_0^{1,0}(π) is obtained by omitting from (3.10) the functions w_0(x) and w_{N+1}(x).
Example 2. M_1^3(π): the space of piecewise Hermite cubic functions. This space has dimension 2N + 4. If

    g_1(x) = −2x^3 + 3x^2  and  g_2(x) = x^3 − x^2,

the basis functions S_{i,0}(x; 2; π), i = 0, ..., N + 1, are the piecewise cubic functions defined by

    S_{0,0}(x; 2; π) = g_1(1 − θ_0(x)) for x ∈ I_1, and 0 otherwise;

    S_{i,0}(x; 2; π) = g_1(θ_{i−1}(x)) for x ∈ I_i, g_1(1 − θ_i(x)) for x ∈ I_{i+1}, and 0 otherwise, 1 ≤ i ≤ N;

    S_{N+1,0}(x; 2; π) = g_1(θ_N(x)) for x ∈ I_{N+1}, and 0 otherwise,

and the functions S_{i,1}(x; 2; π), i = 0, ..., N + 1, are the piecewise cubic functions

    S_{0,1}(x; 2; π) = −h_1 g_2(1 − θ_0(x)) for x ∈ I_1, and 0 otherwise;

    S_{i,1}(x; 2; π) = h_i g_2(θ_{i−1}(x)) for x ∈ I_i, −h_{i+1} g_2(1 − θ_i(x)) for x ∈ I_{i+1}, and 0 otherwise, 1 ≤ i ≤ N;

    S_{N+1,1}(x; 2; π) = h_{N+1} g_2(θ_N(x)) for x ∈ I_{N+1}, and 0 otherwise.
For notational convenience, we write

    v_i(x) = S_{i,0}(x; 2; π),  s_i(x) = S_{i,1}(x; 2; π),  i = 0, ..., N + 1.   (3.11)

The functions v_i and s_i are known as the value function and the slope function, respectively, associated with the point x_i. Note that if

    u_h(x) = Σ_{j=0}^{N+1} [ξ_j v_j(x) + η_j s_j(x)],

then

    ξ_j = u_h(x_j),  η_j = u_h'(x_j),  j = 0, ..., N + 1,

since

    v_j(x_i) = δ_{ij},  v_j'(x_i) = 0,  s_j(x_i) = 0,  s_j'(x_i) = δ_{ij},  i, j = 0, ..., N + 1.

A basis for M_1^{3,0}(π) is obtained by omitting from (3.11) the functions v_0(x) and v_{N+1}(x).
3.3 The Algebraic Problem

First we examine the structure of the coefficient matrix 𝒜 of the Galerkin equations (3.6) in the case in which S_h = M_0^{1,0}(π). With the basis functions given by (3.10), it is clear that

    (w_i', w_j') + (p w_i', w_j) + (q w_i, w_j) = 0  if |i − j| > 1,

since w_i(x) = 0, x ∉ (x_{i−1}, x_{i+1}). Thus 𝒜 is tridiagonal. Since the tridiagonal matrices

    A = ((w_i', w_j')),  B = ((w_i, w_j)),   (3.12)
corresponding to p = 0, q = 1, occur quite often in practice, their elements are given in Appendix A. If the partition is uniform and

    J = h^{−2} [  2  −1              ]
               [ −1   2  −1          ]
               [      ⋱   ⋱   ⋱      ]
               [         −1   2  −1  ]
               [             −1   2  ],   (3.13)

then it is easy to see from Appendix A that

    A = hJ,  B = h(I − (h^2/6) J) = (h/6) tridiag(1  4  1).   (3.14)
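Using (3.14), the piecewise-linear Galerkin system on a uniform partition can be assembled without quadrature. The sketch below (Python with NumPy; the data p = 0, q = 1 with manufactured solution u = sin(πx) is an assumed example, and the load vector is approximated by B f(x_i), i.e. f is replaced by its piecewise-linear interpolant) solves (3.6) for S_h = M_0^{1,0}(π):

```python
import numpy as np

# Galerkin method with hat functions (3.10) on a uniform mesh for
# -u'' + u = f, u(0) = u(1) = 0.  By (3.14), the system (3.6) is
# (A + B) xi = F with A = hJ (stiffness) and B = (h/6) tridiag(1 4 1)
# (mass); F is approximated here by B f(x_i), an assumed quadrature choice.
N = 99
h = 1.0 / (N + 1)
x = np.linspace(0.0, 1.0, N + 2)[1:-1]     # interior nodes
tri = lambda off, diag: diag * np.eye(N) + off * np.eye(N, k=1) + off * np.eye(N, k=-1)
A = tri(-1.0, 2.0) / h                     # ((w_i', w_j'))
B = (h / 6.0) * tri(1.0, 4.0)              # ((w_i, w_j))
f = (np.pi**2 + 1.0) * np.sin(np.pi * x)   # manufactured: u = sin(pi x)
xi = np.linalg.solve(A + B, B @ f)         # nodal values xi_j = u_h(x_j)
err = np.max(np.abs(xi - np.sin(np.pi * x)))
```

Since the nodal value ξ_j equals u_h(x_j), the error can be read off directly at the nodes.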
Now consider the case in which S_h = M_1^3(π). It is convenient to consider the basis {w_i}_{i=0}^{2N+3}, where

    w_{2i} = v_i,  w_{2i+1} = s_i,  i = 0, ..., N + 1.   (3.15)

For this ordering of the basis functions (3.11), it follows that, since v_i and s_i vanish outside [x_{i−1}, x_{i+1}], 1 ≤ i ≤ N, the quantities (w_k', w_l'), (p w_k', w_l) and (q w_k, w_l) are nonzero only if w_k and w_l are basis functions associated with the same node or adjacent nodes of the partition π. Consequently

    a(w_k, w_l) = 0  if |k − l| > 3.
Thus the matrix 𝒜 whose (i, j) element is a(w_i, w_j) is a band matrix of bandwidth seven. More precisely, in the 2ith and (2i + 1)th rows of 𝒜, only the elements in columns 2i + j − 2, j = 0, 1, ..., 5, can be nonzero. Thus we can partition the (2N + 4) × (2N + 4) matrix 𝒜 in the form

    𝒜 = [ 𝒜_{00}  𝒜_{01}                          ]
        [ 𝒜_{10}  𝒜_{11}  𝒜_{12}                  ]
        [          ⋱       ⋱          ⋱            ]
        [                  ⋱          𝒜_{N,N+1}    ]
        [               𝒜_{N+1,N}  𝒜_{N+1,N+1}    ];   (3.16)

that is, 𝒜 is block tridiagonal, the submatrices 𝒜_{ii} and 𝒜_{i,i+1} being 2 × 2.
The matrices A = ((w_i', w_j')) and B = ((w_i, w_j)) are given in Appendix B for the case in which the partition is uniform. If S_h = M_1^{3,0}(π), the corresponding (2N + 2) × (2N + 2) matrix 𝒜 is obtained from (3.16) by omitting the first and the (2N + 3)th rows and columns. It is easy to see that if S_h = M_{m−1}^{2m−1}(π) and the basis given by (3.9) is chosen, then 𝒜 is block tridiagonal with m × m diagonal blocks.
3.4 Treatment of Inhomogeneous Dirichlet Boundary Conditions and Flux Boundary Conditions

The Galerkin method can be easily modified to treat boundary conditions other than the homogeneous Dirichlet conditions. Consider, for example, the inhomogeneous Dirichlet boundary conditions

    u(0) = g_0,  u(1) = g_1,  g_0 g_1 ≠ 0,   (3.17)

and choose S_h = M_1^{3,0}(π). If we set

    w = u − g,

where

    g(x) = g_0 v_0(x) + g_1 v_{N+1}(x),
then
w(0) = w(1) = 0,
and, from (3.2),
(p(w +g)
, v
, v
i=0
i
v
i
(x) +
i
s
i
(x)
then
0
= g
0
and
N+1
= g
1
.
With the basis functions ordered as in (3.15), the coecient matrix of the
Galerkin equations in this case is the matrix A of (3.16) with the elements
of rows 1 and 2N + 3 replaced by
a
1j
=
1j
, j = 1, . . . , 2N + 4
and
a
2N+3,j
=
2N+3,j
, j = 1, . . . , 2N + 4,
respectively, and the right hand side vector has as its rst and (2N + 3)
th
components, g
0
and g
1
, respectively.
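The row-replacement bookkeeping described above can be sketched in a few lines. For brevity this illustration (ours) uses the piecewise-linear hat basis and the trivial problem $-u'' = 0$ instead of Hermite cubics: the first and last rows of the assembled system are overwritten with identity rows carrying the boundary data, and the interior rows are left untouched.

```python
import numpy as np

# Inhomogeneous Dirichlet data via boundary-row replacement (hat basis sketch):
# for -u'' = 0, u(0) = g0, u(1) = g1, the exact solution is linear in x, and
# the modified Galerkin system reproduces it at the nodes.
N = 9
h = 1.0 / (N + 1)
x = h * np.arange(N + 2)
g0, g1 = 2.0, -1.0

n = N + 2                                     # all hat functions v_0, ..., v_{N+1}
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h       # ((v_i', v_j')) on a uniform mesh
A[0, :] = 0.0;  A[0, 0] = 1.0                 # replace row 1: a_{1j} = delta_{1j}
A[-1, :] = 0.0; A[-1, -1] = 1.0               # replace last row likewise
b = np.zeros(n)
b[0], b[-1] = g0, g1                          # boundary values on the right-hand side

u = np.linalg.solve(A, b)
assert np.allclose(u, g0 + (g1 - g0) * x)     # exact linear solution recovered
print("boundary rows carry the Dirichlet data; interior rows are untouched")
```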
In the case in which the general linear boundary conditions
$$\alpha_0 u(0) - p(0)u'(0) = \gamma_0, \qquad \alpha_1 u(1) + p(1)u'(1) = \gamma_1, \qquad (3.18)$$
are prescribed, then, as in the derivation of (3.2), we have
$$(pu', v') - [pu'v]_0^1 + (qu, v) = (f, v), \qquad (3.19)$$
for $v \in H^1(I)$. Using (3.18) in (3.19), we obtain
$$(pu', v') + \alpha_1 u(1)v(1) + \alpha_0 u(0)v(0) + (qu, v) = (f, v) + \gamma_0 v(0) + \gamma_1 v(1), \qquad v \in H^1(I).$$
If $S_h$ is now a subspace of $H^1(I)$, then the Galerkin approximation $U \in S_h$ is defined by
$$(pU', v') + \alpha_1 U(1)v(1) + \alpha_0 U(0)v(0) + (qU, v) = (f, v) + \gamma_0 v(0) + \gamma_1 v(1), \qquad v \in S_h. \qquad (3.20)$$
Thus, if $S_h = \mathcal{M}^3_1(\Delta)$ and the basis functions are again ordered as in (3.15), the matrix of coefficients of the Galerkin equations is obtained from (3.16) by adding $\alpha_0$ and $\alpha_1$ to the $(1,1)$ element and the $(2N+3, 2N+3)$ element, respectively. The term $\gamma_0 v(0) + \gamma_1 v(1)$ on the right-hand side of (3.20) merely adds $\gamma_0$ to the first component and $\gamma_1$ to the $(2N+3)$th component of the right-hand side vector.
3.5 The Uniqueness of the Galerkin Approximation

We now show that there exists a unique solution $u_h \in S_h$ of (3.4) for $h$ sufficiently small. Since the problem under consideration is linear, it is sufficient to show uniqueness.

Let $U_1$ and $U_2$ be two solutions of (3.4) and set $Y = U_1 - U_2$. Then
$$a(Y, v) = 0, \qquad v \in S_h. \qquad (3.21)$$
Let $\phi \in H^1_0(I)$ be the solution of
$$L^* \phi = Y(x), \quad x \in I, \qquad \phi(0) = \phi(1) = 0,$$
where $L^* u \equiv -u'' - (pu)' + qu$.

From the hypotheses on $p$ and $q$, it follows that $\phi \in H^2(I) \cap H^1_0(I)$ and
$$\|\phi\|_{H^2} \le C \|Y\|_{L^2}. \qquad (3.22)$$
Thus, using integration by parts and (3.21), we have
$$\|Y\|_{L^2}^2 = (L^*\phi, Y) = a(Y, \phi) = a(Y, \phi - \chi),$$
where $\chi \in S_h$, and since
$$a(\phi, \psi) \le C \|\phi\|_{H^1} \|\psi\|_{H^1}, \qquad (3.23)$$
for $\phi, \psi \in H^1(I)$, it follows that
$$\|Y\|_{L^2}^2 \le C \|Y\|_{H^1} \|\phi - \chi\|_{H^1}.$$
From Theorem 3.1, we can choose $\chi \in S_h$ such that
$$\|\phi - \chi\|_{H^1} \le C h \|\phi\|_{H^2}.$$
Thus
$$\|Y\|_{L^2}^2 \le C h \|Y\|_{H^1} \|\phi\|_{H^2} \le C h \|Y\|_{H^1} \|Y\|_{L^2},$$
where in the last inequality we have used (3.22), and we obtain
$$\|Y\|_{L^2} \le C h \|Y\|_{H^1}. \qquad (3.24)$$
Since $Y \in S_h$,
$$a(Y, Y) = 0,$$
from which it follows that
$$\|Y'\|_{L^2}^2 = -(pY', Y) - (qY, Y) \le C \{\|Y'\|_{L^2} + \|Y\|_{L^2}\} \|Y\|_{L^2}.$$
Using this inequality and (3.24), we obtain
$$\|Y\|_{H^1}^2 \le C h \|Y\|_{H^1}^2.$$
Thus, for $h$ sufficiently small,
$$\|Y\|_{H^1} = 0,$$
and hence, from Sobolev's inequality,
$$Y = 0.$$
For the self-adjoint boundary value problem
$$Lu \equiv -(pu')' + q(x)u = f(x), \quad x \in I, \qquad u(0) = u(1) = 0,$$
the uniqueness of the Galerkin approximation $u_h \in S_h$ satisfying
$$(pu_h', v') + (qu_h, v) = (f, v), \qquad v \in S_h,$$
is much easier to prove. In this case, the coefficient matrix $\mathcal{A}$ of the Galerkin equations is positive definite, from which it follows immediately that the Galerkin approximation is unique. This result is proved in the following theorem.

Theorem 3.2. The matrix $\mathcal{A}$ is positive definite.

Proof. Suppose $\xi \in \mathbb{R}^s$ and $\xi \ne 0$. Then, if $\mathcal{A} = (a_{ij})$ and $w = \sum_{j=1}^s \xi_j w_j$, then
$$\xi^T \mathcal{A} \xi = \sum_{i,j=1}^s a_{ij} \xi_i \xi_j = \sum_{i,j=1}^s \{(p\, \xi_i w_i', \xi_j w_j') + (q\, \xi_i w_i, \xi_j w_j)\} = (pw', w') + (qw, w) = \|p^{1/2} w'\|_{L^2}^2 + \|q^{1/2} w\|_{L^2}^2 \ge p_* \|w'\|_{L^2}^2.$$
The proof is complete if $\|w'\| > 0$. Suppose $\|w'\| = 0$. Then $w' = 0$ and $w = C$, where $C$ is a constant. Since $w \in S_h$, $w(0) = w(1) = 0$, from which it follows that $C = 0$. Therefore $w = 0$; that is,
$$\sum_{j=1}^s \xi_j w_j = 0.$$
Since $\{w_1, \ldots, w_s\}$ is a basis for $S_h$ and is therefore linearly independent, this implies that $\xi_j = 0$, $j = 1, \ldots, s$, which is a contradiction. Therefore $\|w'\| > 0$, which completes the proof.

Let $\psi \in H^1_0(I)$ be the solution of
$$L^* \psi = e(x), \quad x \in I, \qquad \psi(0) = \psi(1) = 0,$$
where $e = u - u_h$. Then, as in Section 3.5, but with $\psi$ and $e$ replacing $\phi$ and $Y$, respectively, we have
$$\|e\|_{L^2} \le C h \|e\|_{H^1}, \qquad (3.26)$$
since
$$a(e, v) = 0, \qquad v \in S_h.$$
Since
$$a(e, e) = a(e, u - \chi), \qquad \chi \in S_h,$$
it follows that
$$\|e'\|_{L^2}^2 = -(pe', e) - (qe, e) + a(e, u - \chi) \le C \left\{ [\|e'\|_{L^2} + \|e\|_{L^2}] \|e\|_{L^2} + \|e\|_{H^1} \|u - \chi\|_{H^1} \right\}$$
on using Schwarz's inequality and (3.23). On using (3.26), we have, for $h$ sufficiently small,
$$\|e\|_{H^1} \le C \|u - \chi\|_{H^1}. \qquad (3.27)$$
Since $u \in H^{r+1}(I) \cap H^1_0(I)$, from Theorem 3.1, $\chi$ can be chosen so that
$$\|u - \chi\|_{H^1} \le C h^r \|u\|_{H^{r+1}},$$
and hence
$$\|e\|_{H^1} \le C h^r \|u\|_{H^{r+1}}. \qquad (3.28)$$
The use of this estimate in (3.26) completes the proof.
3.6.2 Optimal $L^\infty$ error estimate

In this section, we derive an optimal $L^\infty$ estimate.

Lemma 3.1. Let $Pu$ be the $L^2$ projection of $u$ into $\mathcal{M}^r_k(\Delta)$; that is,
$$(Pu - u, v) = 0, \qquad v \in \mathcal{M}^r_k(\Delta),$$
where $-1 \le k \le r - 1$. Then, if $u \in W^{r+1}_\infty(I)$,
$$\|Pu - u\|_{L^\infty(I)} \le C h^{r+1} \|u\|_{W^{r+1}_\infty(I)},$$
provided $\Delta$ is chosen from a quasiuniform collection of partitions of $I$.

We now prove the following theorem.

Theorem 3.4. Suppose $S_h = \mathcal{M}^{r,0}_k(\Delta)$, with $\Delta$ chosen from a quasiuniform collection of partitions. If $u \in W^{r+1}_\infty(I) \cap H^1_0(I)$, then
$$\|u - u_h\|_{L^\infty} \le C h^{r+1} \|u\|_{W^{r+1}_\infty}. \qquad (3.29)$$
Proof. Let $W \in S_h$ satisfy
$$(W', v') = (u', v'), \qquad v \in S_h.$$
Since
$$a(u - u_h, v) = 0, \qquad v \in S_h,$$
it follows that
$$((W - u_h)', v') + (u - u_h, qv - (pv)') = 0,$$
for all $v \in S_h$. If we set $v = W - u_h$, then
$$\|(W - u_h)'\|_{L^2}^2 \le C \|u - u_h\|_{L^2} \|W - u_h\|_{H^1}.$$
Therefore, since $\|\cdot\|_{H^1_0}$ and $\|\cdot\|_{H^1}$ are equivalent norms on $H^1_0$,
$$\|W - u_h\|_{H^1} \le C \|u - u_h\|_{L^2},$$
and from Sobolev's inequality, we obtain
$$\|W - u_h\|_{L^\infty} \le C \|u - u_h\|_{L^2}.$$
Then, from Theorem 3.3, it follows that
$$\|W - u_h\|_{L^\infty} \le C h^{r+1} \|u\|_{H^{r+1}}. \qquad (3.30)$$
Since
$$\|u - u_h\|_{L^\infty} \le \|u - W\|_{L^\infty} + \|W - u_h\|_{L^\infty}, \qquad (3.31)$$
we need to estimate $\|u - W\|_{L^\infty}$ to complete the proof.

Note that since
$$((u - W)', 1) = 0,$$
it follows that $W'$ is the $L^2$ projection of $u'$ into $\mathcal{M}^{r-1}_{k-1}(\Delta)$, and hence, from Lemma 3.1, we obtain
$$\|(u - W)'\|_{L^\infty} \le C h^r \|u'\|_{W^r_\infty} \le C h^r \|u\|_{W^{r+1}_\infty}. \qquad (3.32)$$
Now suppose $g \in L^1$ and define $G$ by
$$-G'' = g(x), \quad x \in I, \qquad G(0) = G(1) = 0.$$
Then
$$\|G\|_{W^2_1} \le C \|g\|_{L^1}, \qquad (3.33)$$
and, for $\chi \in S_h$,
$$(u - W, g) = -(u - W, G'') = ((u - W)', (G - \chi)').$$
On using Hölder's inequality, we obtain
$$(u - W, g) \le \|(u - W)'\|_{L^\infty} \|(G - \chi)'\|_{L^1}.$$
From Theorem 3.1, we can choose $\chi$ so that
$$\|(G - \chi)'\|_{L^1} \le C h \|G\|_{W^2_1}.$$
Hence, on using (3.33), it follows that
$$|(u - W, g)| \le C h \|(u - W)'\|_{L^\infty} \|g\|_{L^1}.$$
On using (3.32) and duality, we have
$$\|u - W\|_{L^\infty} \le C h^{r+1} \|u\|_{W^{r+1}_\infty}. \qquad (3.34)$$
The desired result now follows from (3.30), (3.31) and (3.34).
3.6.3 Superconvergence results

The error estimates of Theorems 3.3 and 3.4 are optimal and consequently no better global rates of convergence are possible. However, there can be identifiable points at which the approximate solution converges at rates that exceed the optimal global rate. In the following theorem, we derive one such superconvergence result.

Theorem 3.5. If $S_h = \mathcal{M}^{r,0}_0(\Delta)$ and $u \in H^{r+1}(I) \cap H^1_0(I)$, then, for $h$ sufficiently small,
$$|(u - u_h)(x_i)| \le C h^{2r} \|u\|_{H^{r+1}}, \qquad i = 0, \ldots, N+1. \qquad (3.35)$$
Proof. Let $G(x, \cdot)$ denote the Green's function for (3.1); that is,
$$u(x) = (Lu, G(x, \cdot)) = a(u, G(x, \cdot))$$
for sufficiently smooth $u$. This representation is valid for $u \in H^1_0(I)$ and hence it can be applied to $e = u - u_h$. Thus, for $\chi \in S_h$,
$$e(x_i) = a(e, G(x_i, \cdot)) = a(e, G(x_i, \cdot) - \chi),$$
since
$$a(e, \chi) = 0, \qquad \chi \in S_h.$$
Thus,
$$|e(x_i)| \le C \|e\|_{H^1} \|G(x_i, \cdot) - \chi\|_{H^1}. \qquad (3.36)$$
From the smoothness assumptions on $p$ and $q$, it follows that
$$G(x_i, \cdot) \in H^{r+1}([0, x_i]) \cap H^{r+1}([x_i, 1]),$$
and
$$\|G(x_i, \cdot)\|_{H^{r+1}([0, x_i])} + \|G(x_i, \cdot)\|_{H^{r+1}([x_i, 1])} \le C.$$
Hence there exists $\chi \in S_h$ (obtained, for example, by Lagrange interpolation on each subinterval) such that
$$\|G(x_i, \cdot) - \chi\|_{H^1} \le C h^r. \qquad (3.37)$$
From Theorem 3.3, we have
$$\|e\|_{H^1} \le C h^r \|u\|_{H^{r+1}}, \qquad (3.38)$$
for $h$ sufficiently small, and hence, combining (3.36)-(3.38), we obtain
$$|e(x_i)| \le C h^{2r} \|u\|_{H^{r+1}},$$
as desired.
A method which involves very simple auxiliary computations using the Galerkin solution can be used to produce superconvergent approximations to the derivative [24]. First we consider approximations to $u'(0)$ and $u'(1)$. Motivated by the fact that
$$(f, 1-x) = (-u'' + pu' + qu, 1-x) = u'(0) + a(u, 1-x),$$
we approximate $u'(0)$ by
$$\Gamma_0 = (f, 1-x) - a(u_h, 1-x),$$
where $u_h$ is the solution to (3.4). Also, with $1-x$ replaced by $x$ in the above, we find that we may approximate $u'(1)$ by
$$\Gamma_{N+1} = a(u_h, x) - (f, x).$$
It can be shown that if $u \in H^{r+1}(I)$, then
$$|\Gamma_j - u'(x_j)| \le C h^{2r} \|u\|_{H^{r+1}}, \qquad (3.39)$$
$j = 0, N+1$, when, for example, $S_h = \mathcal{M}^{r,0}_k(\Delta)$.

If $S_h = \mathcal{M}^{r,0}_0(\Delta)$, a procedure can be defined which at the nodes $x_i$, $i = 1, \ldots, N$, gives superconvergence results similar to (3.39). Specifically, if $\Gamma_j$, an approximation to $u'(x_j)$, $j = 1, \ldots, N$, is defined by
$$\Gamma_j = \frac{a(u_h, x)_{I_j} - (f, x)_{I_j}}{x_j},$$
where the subscript $I_j$ denotes that the inner products are taken over $I_j = (0, x_j)$, then (3.39) holds for $j = 1, \ldots, N$. The approximation $\Gamma_j$ is motivated by the fact that
$$(Lu, x)_{I_j} = (f, x)_{I_j},$$
and, after integration by parts,
$$-u'(x_j)\, x_j + a(u, x)_{I_j} = (f, x)_{I_j}.$$
3.6.4 Quadrature Galerkin methods

In most cases, the integrals occurring in (3.6) cannot be evaluated exactly and one must resort to an approximation technique for their evaluation. In this section, the effect of the use of certain quadrature rules in the Galerkin method is discussed. To illustrate the concepts, we consider the problem
$$-(pu')' = f, \quad x \in I, \qquad u(0) = u(1) = 0,$$
where we shall assume that there exists a constant $p_0$ such that
$$0 < p_0 \le p(x), \quad x \in I.$$
The Galerkin method in this case takes the form
$$(pu_h', v') = (f, v), \qquad v \in S_h, \qquad (3.40)$$
where we shall take $S_h$ to be $\mathcal{M}^{r,0}_0(\Delta)$. Let
$$0 \le \rho_1 < \rho_2 < \ldots < \rho_\mu \le 1,$$
and set
$$\rho_{ij} = x_{i-1} + h_i \rho_j, \qquad i = 1, \ldots, N+1, \quad j = 1, \ldots, \mu.$$
Suppose $\omega_j > 0$, $j = 1, \ldots, \mu$, and denote by
$$\langle \phi, \psi \rangle_i = h_i \sum_{j=1}^{\mu} \omega_j (\phi\psi)(\rho_{ij})$$
a quadrature rule on $I_i$ which is exact for $\phi\psi \in P_t(I_i)$, the polynomials of degree at most $t$ on $I_i$. Set
$$\langle \phi, \psi \rangle = \sum_{i=1}^{N+1} \langle \phi, \psi \rangle_i.$$
If this quadrature formula is used in (3.40), the problem becomes that of finding $z_h \in S_h$ such that
$$\langle p z_h', v' \rangle = \langle f, v \rangle, \qquad v \in S_h. \qquad (3.41)$$
If $t$ is sufficiently large, the existence and uniqueness of $z_h$ follow from the next lemma.

Lemma 3.2. If $t \ge 2r - 2$,
$$\langle pV', V' \rangle \ge p_0 \|V'\|_{L^2}^2, \qquad V \in S_h.$$
Proof. Since the weights $\omega_j$, $j = 1, \ldots, \mu$, are positive, and $(V')^2 \in P_{2r-2}(I_i)$,
$$\langle pV', V' \rangle \ge p_0 \langle V', V' \rangle = p_0 \|V'\|_{L^2}^2.$$
From this lemma, it follows that, if $t \ge 2r - 2$, there exists a unique solution $z_h \in S_h$ of (3.41). In order to have $t \ge 2r - 2$, it is necessary to have at least $r$ quadrature points; the use of fewer leads to nonexistence, as is shown by Douglas and Dupont (1974).

It can be shown (cf. Douglas and Dupont, 1974) that if $t \ge 2r - 1$ then
$$\|u - z_h\|_{L^2} \le C h^{r+1} \|f\|_{H^{r+1}}.$$
In addition, the superconvergence indicated in Theorem 3.5 is preserved; more specifically,
$$|(u - z_h)(x_i)| \le C h^{2r} \|f\|_{H^{t+1}}.$$
Notice that the use of an $r$-point Gaussian quadrature rule produces the desired accuracy and satisfies the condition of Lemma 3.2.
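The exactness requirement of Lemma 3.2 is easy to verify numerically: an $r$-point Gauss-Legendre rule is exact for polynomials of degree $2r-1$, so it satisfies $t \ge 2r-2$ with one degree to spare. The following sketch (ours) checks this on $[0,1]$ for a few values of $r$, using monomials as test integrands.

```python
import numpy as np

# r-point Gauss-Legendre rules: exact for degree 2r-1, not for degree 2r.
for r in (2, 3, 4):
    z, w = np.polynomial.legendre.leggauss(r)   # nodes/weights on [-1, 1]
    rho = 0.5 * (z + 1.0)                       # map nodes to [0, 1]
    omega = 0.5 * w                             # rescale weights for [0, 1]
    quad = lambda d: np.sum(omega * rho**d)     # quadrature of x^d on [0, 1]
    exact = lambda d: 1.0 / (d + 1)             # integral of x^d on [0, 1]
    assert abs(quad(2 * r - 1) - exact(2 * r - 1)) < 1e-12   # exact at 2r-1
    assert abs(quad(2 * r) - exact(2 * r)) > 1e-6            # fails at 2r
print("r-point Gauss rules are exact precisely up to degree 2r-1")
```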
3.7 The Galerkin Method for Nonlinear Problems

Consider the boundary value problem
$$-u'' + f(x, u) = 0, \quad x \in I, \qquad u(0) = u(1) = 0.$$
It is easy to show that the weak form of this boundary value problem is
$$(u', v') + (f(u), v) = 0, \qquad v \in H^1_0(I). \qquad (3.42)$$
As before, let $S_h$ be a finite dimensional subspace of $H^1_0(I)$ with basis $\{w_1, \ldots, w_s\}$. The Galerkin approximation to $u$ is the element $u_h \in S_h$ such that
$$(u_h', v') + (f(u_h), v) = 0, \qquad v \in S_h, \qquad (3.43)$$
and if
$$u_h(x) = \sum_{j=1}^s \alpha_j w_j(x),$$
we obtain from (3.43), with $v = w_i$, the nonlinear system
$$\sum_{j=1}^s (w_i', w_j')\, \alpha_j + \left( f\Big( \sum_{\ell=1}^s \alpha_\ell w_\ell \Big), w_i \right) = 0, \qquad i = 1, \ldots, s, \qquad (3.44)$$
for $\alpha_1, \ldots, \alpha_s$. Newton's method for the solution of (3.44) can be easily derived by linearizing (3.43) to obtain
$$(u_h^{(n)\prime}, v') + (f(u_h^{(n-1)}), v) + (f_u(u_h^{(n-1)})(u_h^{(n)} - u_h^{(n-1)}), v) = 0, \qquad v \in S_h,$$
where $u_h^{(0)}$ is arbitrary. If
$$u_h^{(k)} = \sum_{j=1}^s \alpha_j^{(k)} w_j,$$
then
$$(A + B_n)\, \alpha^{(n)} = -F_n + B_n \alpha^{(n-1)}, \qquad (3.45)$$
where
$$A = ((w_i', w_j')), \qquad \alpha^{(n)} = (\alpha_1^{(n)}, \ldots, \alpha_s^{(n)})^T,$$
$$B_n = \left( \left( f_u\Big( \sum_{\ell=1}^s \alpha_\ell^{(n-1)} w_\ell \Big) w_i, w_j \right) \right), \qquad F_n = ((f(u_h^{(n-1)}), w_i)).$$
Note that (3.45) may be written in the form
$$(A + B_n)(\alpha^{(n)} - \alpha^{(n-1)}) = -(A \alpha^{(n-1)} + F_n).$$
A comprehensive account of the Galerkin method for second order nonlinear problems is given by Fairweather (1978).
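A minimal sketch of the iteration (3.45) follows. It is ours, not from the text: it uses the piecewise-linear hat basis, a lumped (nodal) quadrature for the nonlinear terms, and the manufactured problem $-u'' + u^3 = g$ with $g$ chosen so that $u(x) = \sin(\pi x)$; all of these choices are illustrative simplifications.

```python
import numpy as np

# Newton iteration (3.45) for -u'' + u^3 = g(x), u(0) = u(1) = 0,
# with g chosen so the exact solution is u(x) = sin(pi x).
N = 50
h = 1.0 / (N + 1)
x = h * np.arange(1, N + 1)                        # interior nodes
g = np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x)**3

# Stiffness matrix A = ((w_i', w_j')) for the hat basis, uniform mesh
A = (np.diag(2 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h

f = lambda u: u**3 - g                             # f(x, u) of (3.42)
fu = lambda u: 3 * u**2                            # partial derivative f_u

alpha = np.zeros(N)                                # u_h^(0) = 0
for n in range(20):
    F = h * f(alpha)                               # lumped (f(u_h^(n-1)), w_i)
    B = h * np.diag(fu(alpha))                     # lumped (f_u(u_h^(n-1)) w_i, w_j)
    delta = np.linalg.solve(A + B, -(A @ alpha + F))
    alpha += delta
    if np.max(np.abs(delta)) < 1e-12:
        break

err = np.max(np.abs(alpha - np.sin(np.pi * x)))
print(f"converged in {n + 1} iterations, max nodal error {err:.2e}")
assert err < 1e-2
```

With the lumped quadrature the interior equations coincide with the standard second-order finite difference scheme, so the nodal error is $O(h^2)$; an exact Galerkin quadrature would be used in practice.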
4 The Orthogonal Spline Collocation Method

4.1 Introduction

Consider the linear second order two-point boundary value problem
$$Lu \equiv -u'' + p(x)u' + q(x)u = f(x), \quad x \in I, \qquad (4.1)$$
$$\alpha_0 u(0) + \beta_0 u'(0) = g_0, \qquad \alpha_1 u(1) + \beta_1 u'(1) = g_1, \qquad (4.2)$$
where the functions $p$, $q$ and $f$ are smooth on $\bar{I}$. Let
$$\Delta : 0 = x_0 < x_1 < \ldots < x_N < x_{N+1} = 1$$
denote a partition of $\bar{I}$, and set $h_i = x_i - x_{i-1}$. In the orthogonal spline collocation method for (4.1)-(4.2), the approximate solution $u_h \in \mathcal{M}^r_1(\Delta)$, $r \ge 3$. If $\{w_j\}_{j=1}^s$ is a basis for $\mathcal{M}^r_1(\Delta)$, where $s = (N+1)(r-1) + 2$, we may write
$$u_h(x) = \sum_{j=1}^s u_j w_j(x). \qquad (4.3)$$
Then the coefficients $\{u_j\}_{j=1}^s$ in (4.3) are determined by requiring that $u_h$ satisfy (4.1) at the points $\{\xi_j\}_{j=1}^{s-2}$, and the boundary conditions (4.2):
$$\alpha_0 u_h(0) + \beta_0 u_h'(0) = g_0,$$
$$Lu_h(\xi_j) = f(\xi_j), \qquad j = 1, 2, \ldots, s-2,$$
$$\alpha_1 u_h(1) + \beta_1 u_h'(1) = g_1,$$
where
$$\xi_{(i-1)(r-1)+k} = x_{i-1} + h_i \sigma_k, \qquad i = 1, 2, \ldots, N+1, \quad k = 1, 2, \ldots, r-1, \qquad (4.4)$$
and $\{\sigma_k\}_{k=1}^{r-1}$ are the nodes for the $(r-1)$-point Gauss-Legendre quadrature rule on the interval $[0, 1]$.

As an example, consider the case when $r = 3$, for which the collocation points are
$$\xi_{2i-1} = x_{i-1} + \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}}\right) h_i, \qquad \xi_{2i} = x_{i-1} + \frac{1}{2}\left(1 + \frac{1}{\sqrt{3}}\right) h_i, \qquad i = 1, 2, \ldots, N+1. \qquad (4.5)$$
With the usual basis (3.15) for $\mathcal{M}^3_1(\Delta)$, we set
$$u_h(x) = \sum_{j=0}^{N+1} \{\alpha_j v_j(x) + \beta_j s_j(x)\}.$$
Since only four basis functions,
$$v_{i-1}, \; s_{i-1}, \; v_i, \; s_i, \qquad (4.6)$$
are nonzero on $[x_{i-1}, x_i]$, the coefficient matrix of the collocation equations is of the form
$$\begin{bmatrix} D_0 & & & \\ S_1 \; T_1 & & & \\ & S_2 \; T_2 & & \\ & & \ddots & \\ & & & S_{N+1} \; T_{N+1} \\ & & & D_1 \end{bmatrix}, \qquad (4.7)$$
with $D_0 = [\alpha_0 \;\; \beta_0]$, $D_1 = [\alpha_1 \;\; \beta_1]$, and, for $i = 1, 2, \ldots, N+1$,
$$S_i = \begin{bmatrix} Lv_{i-1}(\xi_{2i-1}) & Ls_{i-1}(\xi_{2i-1}) \\ Lv_{i-1}(\xi_{2i}) & Ls_{i-1}(\xi_{2i}) \end{bmatrix}, \qquad T_i = \begin{bmatrix} Lv_i(\xi_{2i-1}) & Ls_i(\xi_{2i-1}) \\ Lv_i(\xi_{2i}) & Ls_i(\xi_{2i}) \end{bmatrix}.$$
For $r > 3$, with the commonly used B-spline basis [8], the coefficient matrix has the form
$$\begin{bmatrix} D_0 & & & \\ W_{11} \;\; W_{12} \;\; W_{13} & & & \\ & W_{21} \;\; W_{22} \;\; W_{23} & & \\ & & \ddots & \\ & & & W_{N+1,1} \;\; W_{N+1,2} \;\; W_{N+1,3} \\ & & & D_1 \end{bmatrix}, \qquad (4.8)$$
with $W_{i1} \in \mathbb{R}^{(r-1) \times 2}$, $W_{i2} \in \mathbb{R}^{(r-1) \times (r-3)}$, $W_{i3} \in \mathbb{R}^{(r-1) \times 2}$. The matrices (4.7) and (4.8) are called almost block diagonal, and the collocation equations form an almost block diagonal linear system. There exist efficient algorithms for solving such systems based on the idea of alternate row and column elimination; see Section 5.

For the case in which the partition is uniform, the values at the collocation points of the basis functions (4.6) and their first and second derivatives are given in Appendix C.
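The following sketch (ours) carries out the $r = 3$ construction end to end for the model problem $-u'' = \pi^2 \sin(\pi x)$, $u(0) = u(1) = 0$: it assembles the almost block diagonal matrix (4.7) as a dense array, with the Hermite-cubic value/slope unknowns at each node and two Gauss collocation points (4.5) per subinterval. The shape-function formulas are the standard Hermite cubics, written here in a local coordinate; a production code would use the structured solvers of Section 5 rather than a dense solve.

```python
import numpy as np

# Orthogonal spline collocation, r = 3 (Hermite cubics), for
#   -u'' = pi^2 sin(pi x),  u(0) = u(1) = 0   (exact: u = sin(pi x)).
M = 20                                   # number of subintervals
h = 1.0 / M
nodes = h * np.arange(M + 1)
sig = 0.5 * np.array([1 - 1/np.sqrt(3), 1 + 1/np.sqrt(3)])   # Gauss points (4.5)

def shape_dd(t):
    """Second x-derivatives of the four Hermite shape functions on one
    interval, ordered (u_left, u'_left, u_right, u'_right); t in [0,1]."""
    return np.array([(-6 + 12*t) / h**2, (-4 + 6*t) / h,
                     ( 6 - 12*t) / h**2, ( 6*t - 2) / h])

n = 2 * (M + 1)                          # unknowns (u_i, u_i'), i = 0..M
C = np.zeros((n, n))
rhs = np.zeros(n)
C[0, 0] = 1.0                            # boundary row D_0: u(0) = 0
for i in range(M):                       # two collocation rows per interval
    for k, t in enumerate(sig):
        row = 1 + 2*i + k
        C[row, 2*i:2*i+4] = -shape_dd(t)              # L = -d^2/dx^2
        xi = nodes[i] + h * t
        rhs[row] = np.pi**2 * np.sin(np.pi * xi)
C[n-1, n-2] = 1.0                        # boundary row D_1: u(1) = 0
sol = np.linalg.solve(C, rhs)

u_nodes = sol[0::2]                      # nodal values u_0, ..., u_M
err = np.max(np.abs(u_nodes - np.sin(np.pi * nodes)))
print(f"max nodal error {err:.2e}")      # O(h^4) at the nodes for r = 3
assert err < 1e-4
```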
4.2 The Existence and Uniqueness of the Collocation Approximation

To demonstrate analytical tools used in the proof of the existence and uniqueness of the collocation approximation and in the derivation of error estimates, we consider the linear two-point boundary value problem
$$Lu = -u'' + p(x)u' + q(x)u = f(x), \quad x \in I, \qquad u(0) = u(1) = 0. \qquad (4.9)$$
Then, for $v \in H^2(I) \cap H^1_0(I)$, there exists a constant $C_0$ such that
$$\|v\|_{H^2(I)} \le C_0 \|Lv\|_{L^2(I)}. \qquad (4.10)$$
We define the discrete inner product $\langle \cdot, \cdot \rangle$ by
$$\langle \phi, \psi \rangle = \sum_{i=1}^{N+1} \langle \phi, \psi \rangle_i,$$
where
$$\langle \phi, \psi \rangle_i = \sum_{k=1}^{r-1} h_i \omega_k\, (\phi\psi)(\xi_{(i-1)(r-1)+k}), \qquad i = 1, \ldots, N+1,$$
the $\{\xi_j\}_{j=1}^{s-2}$ are the collocation points given in (4.4), and $\{\omega_k\}_{k=1}^{r-1}$ are the weights for the $(r-1)$-point Gauss-Legendre quadrature rule on the interval $[0, 1]$.

In order to demonstrate the existence and uniqueness of the collocation approximation $u_h \in S_h \equiv \mathcal{M}^{r,0}_1(\Delta)$ satisfying
$$Lu_h(\xi_j) = f(\xi_j), \qquad j = 1, \ldots, s-2,$$
we use the following lemma [13].

Lemma 4.1. There exists a constant $C$ such that
$$\left| \langle LY, LY \rangle - \|LY\|_{L^2(I)}^2 \right| \le C h \|Y\|_{H^2(I)}^2, \qquad (4.11)$$
for all $Y \in S_h$.

From (4.10) and this lemma, we find that, for $Y \in S_h$,
$$\|Y\|_{H^2(I)}^2 \le C_0 \|LY\|_{L^2(I)}^2 \le C_0 \left( \langle LY, LY \rangle + C h \|Y\|_{H^2(I)}^2 \right).$$
Thus, for $h$ sufficiently small,
$$\|Y\|_{H^2(I)}^2 \le C \langle LY, LY \rangle. \qquad (4.12)$$
If
$$LY(\xi_j) = 0, \qquad j = 1, \ldots, s-2,$$
then (4.12) implies that $Y = 0$, which proves the uniqueness and hence existence of the collocation approximation $u_h$.
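The discrete inner product above inherits the exactness of the $(r-1)$-point Gauss rule: $\langle \phi, \psi \rangle = (\phi, \psi)$ whenever $\phi\psi$ is a polynomial of degree at most $2r-3$ on each subinterval. The following sketch (ours) checks this for $r = 4$ on a uniform partition, with monomials as an illustrative test pair.

```python
import numpy as np

# Composite (r-1)-point Gauss rule: exact for products of degree <= 2r-3.
r, M = 4, 5                                   # r as in the text, M subintervals
h = 1.0 / M
z, w = np.polynomial.legendre.leggauss(r - 1) # (r-1)-point rule on [-1, 1]
sig, om = 0.5 * (z + 1.0), 0.5 * w            # nodes/weights mapped to [0, 1]

def disc(phi, psi):
    """<phi, psi> = sum_i h * sum_k om_k (phi*psi)(x_{i-1} + h*sig_k)."""
    pts = (h * np.arange(M)[:, None] + h * sig[None, :]).ravel()
    return h * np.sum(np.tile(om, M) * phi(pts) * psi(pts))

phi = lambda x: x**2                          # degree 2
psi = lambda x: x**3                          # degree 3; product degree 5 = 2r-3
assert abs(disc(phi, psi) - 1.0 / 6) < 1e-12  # integral of x^5 on [0, 1]
print("composite Gauss rule reproduces (phi, psi) exactly up to degree 2r-3")
```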
4.3 Optimal $H^2$ Error Estimates

We now derive an optimal $H^2$ error estimate.

Theorem 4.1. Suppose $u \in H^{r+1}(I) \cap H^1_0(I)$. Then there exists a constant $C$ such that, for $h$ sufficiently small,
$$\|u - u_h\|_{H^2(I)} \le C h^{r-1} \|u\|_{H^{r+1}(I)}. \qquad (4.13)$$
Proof. Let $T_{r,h}$ be the interpolation operator from $C^1(\bar{I})$ to $S_h$ determined by the conditions

(i) $(T_{r,h}v)(x_i) = v(x_i)$, $(T_{r,h}v)'(x_i) = v'(x_i)$, $i = 0, \ldots, N+1$,

(ii) $(T_{r,h}v)(\eta_{ij}) = v(\eta_{ij})$, $i = 1, \ldots, N+1$, $j = 1, \ldots, r-3$,

for $v \in C^1(\bar{I})$, where the points $\eta_{ij}$, $j = 1, \ldots, r-3$, are certain specific points in $I_i$; see [13] for details. Clearly $T_{r,h}$ is local in the sense that $T_{r,h}v$ is determined on $I_i$ by $v$ on $I_i$. (Note that when $r = 3$, $T_{r,h}v$ is the Hermite cubic interpolant of $v$.) It can be shown that, if $W = T_{r,h}u$ and $u \in H^{r+1}(I)$,
$$\sum_{k=0}^{2} h^k \|(u - W)^{(k)}\|_{L^\infty(I_i)} \le C h_i^{r+\frac{1}{2}} \|u^{(r+1)}\|_{L^2(I_i)}, \qquad i = 1, \ldots, N+1, \qquad (4.14)$$
and
$$\sum_{k=0}^{2} h^k \|(u - W)^{(k)}\|_{L^2(I)} \le C h^{r+1} \|u^{(r+1)}\|_{L^2(I)}. \qquad (4.15)$$
Then, using (4.14), it follows that
$$|L(u_h - W)(\xi_j)| = |L(u - W)(\xi_j)| \le C \sum_{k=0}^{2} \|(u - W)^{(k)}\|_{L^\infty(I_i)} \le C h^{r - \frac{3}{2}} \|u^{(r+1)}\|_{L^2(I_i)}.$$
Hence
$$\langle L(u_h - W), L(u_h - W) \rangle = \sum_{i=1}^{N+1} \sum_{j=1}^{r-1} h_i \omega_j [L(u_h - W)(\xi_{ij})]^2 \le C \sum_{i=1}^{N+1} \sum_{j=1}^{r-1} h_i^{2r-2} \omega_j \|u^{(r+1)}\|_{L^2(I_i)}^2 \le C h^{2r-2} \|u^{(r+1)}\|_{L^2(I)}^2,$$
and, from (4.12), it follows that
$$\|u_h - W\|_{H^2(I)} \le C h^{r-1} \|u^{(r+1)}\|_{L^2(I)}. \qquad (4.16)$$
The required result is now obtained from the triangle inequality, (4.15) and (4.16).
4.4 Superconvergence Results

Douglas and Dupont [13] construct a quasi-interpolant, $\tilde{u} \in S_h$, which has the properties that, for $u \in W^{2r-1}_\infty(I)$,
$$|(u - \tilde{u})^{(j)}(x_i)| \le C h^{2r-2} \|u\|_{W^{2r-1}_\infty(I)}, \qquad j = 0, 1, \qquad (4.17)$$
and, if $u \in W^{2r}_\infty(I)$,
$$L(u_h - \tilde{u})(\xi_j) = \epsilon(\xi_j), \quad j = 1, \ldots, s-2, \qquad (u_h - \tilde{u})(0) = (u_h - \tilde{u})(1) = 0, \qquad (4.18)$$
where
$$|\epsilon(\xi_j)| \le C h^{2r-2} \|u\|_{W^{2r}_\infty(I)}.$$
From (4.12) and (4.18), we obtain
$$\|u_h - \tilde{u}\|_{H^2(I)} \le C \langle L(u_h - \tilde{u}), L(u_h - \tilde{u}) \rangle^{\frac{1}{2}} = C \langle \epsilon, \epsilon \rangle^{\frac{1}{2}} \le C h^{2r-2} \|u\|_{W^{2r}_\infty(I)},$$
and hence, using Sobolev's inequality,
$$\|u_h - \tilde{u}\|_{W^1_\infty(I)} \le C h^{2r-2} \|u\|_{W^{2r}_\infty(I)}; \qquad (4.19)$$
that is, the quasi-interpolant $\tilde{u}$ differs from the collocation solution $u_h$, along with first derivatives, uniformly by $O(h^{2r-2})$. For $r \ge 3$, this represents a higher order in $h$ than is possible for $u - u_h$. However, using the triangle inequality, (4.17) and (4.19), we see that a superconvergence phenomenon occurs at the nodes, namely
$$|(u - u_h)^{(j)}(x_i)| \le C h^{2r-2} \|u\|_{W^{2r}_\infty(I)}, \qquad j = 0, 1. \qquad (4.20)$$
4.5 $L^2$ and $H^1$ Error Estimates

From the inequality
$$\left( \int_{I_i} g(x)^2 \, dx \right)^{\frac{1}{2}} \le C \left\{ h_i^{\frac{1}{2}} \left( |g(x_{i-1})| + |g(x_i)| \right) + h_i^2 \left( \int_{I_i} g''(x)^2 \, dx \right)^{\frac{1}{2}} \right\},$$
valid for any $g \in C^2(I_i)$, the $H^2$ estimate (4.13) and the superconvergence result (4.20), it follows that
$$\|u - u_h\|_{L^2} \le C \left\{ h^{2r-1} \|u\|_{W^{2r}_\infty(I)} + h^{r+1} \|u\|_{H^{r+1}(I)} \right\}.$$
From this inequality, (4.13) and the inequality [1],
$$\int_{I_i} |g'(x)|^2 \, dx \le 54 \left\{ h_i^{-2} \int_{I_i} |g(x)|^2 \, dx + h_i^2 \int_{I_i} |g''(x)|^2 \, dx \right\},$$
we obtain the estimate
$$\|u - u_h\|_{H^1(I)} \le C \left\{ h^{2r-2} \|u\|_{W^{2r}_\infty(I)} + h^r \|u\|_{H^{r+1}(I)} \right\}.$$
In these estimates, the exponent of $h$ is optimal but the smoothness requirements on $u$ are not minimal. It is shown in [13] that, by modifying the collocation procedure, the smoothness requirements on the solution can be reduced to the minimal ones required for approximation when global estimates are sought. To describe this modification, let $\tilde{f}$ denote the standard $L^2$ projection of $f$ into $\mathcal{M}^{r-2}_{-1}(\Delta)$. The smoothed collocation method consists of finding $u_h \in S_h$ such that
$$(Lu_h)(\xi_j) = \tilde{f}(\xi_j), \qquad j = 1, \ldots, s-2.$$
Douglas and Dupont [13] prove that, for $h$ sufficiently small,
$$\|u - u_h\|_{H^j(I)} \le C_j h^{r+1-j} \|f\|_{H^{r-1}(I)}, \qquad j = 0, 1, 2,$$
and
$$|(u - u_h)(x_i)| \le C h^{2r} \|f\|_{H^{r-1}(I)}, \qquad i = 1, \ldots, N.$$
Figure 1: Structure of a general ABD matrix

5 Algorithms for Solving Almost Block Diagonal Linear Systems

5.1 Introduction

In several numerical methods for solving two-point boundary value problems, there arise systems of linear algebraic equations with coefficient matrices which have a certain block structure, almost block diagonal (ABD) [8]; see, for example, Section 4.1.

The most general ABD matrix, shown in Figure 1, has the following characteristics: the nonzero elements lie in blocks which may be of different sizes; each diagonal entry lies in a block; any column of the matrix intersects no more than two blocks (which are successive); and the overlap between successive blocks (that is, the number of columns of the matrix common to two successive blocks) need not be constant. In commonly used methods for solving two-point boundary value problems, the most frequently occurring ABD structure is shown in Figure 2, where the blocks $W^{(i)}$, $i = 1, 2, \ldots, N$, are all of equal size, and the overlap between successive blocks is constant and equal to the sum of the number of rows in TOP and BOT.

In this section, we outline efficient methods for the solution of systems with coefficient matrices having this structure. These algorithms are all variants of Gaussian elimination with partial pivoting, and are more efficient than conventional Gaussian elimination primarily because they introduce no fill-in.
We describe the essential features of the algorithms using a simple example with a coefficient matrix of the form in Figure 2, namely the matrix of Figure ?? in which there are two $4 \times 7$ blocks $W^{(1)}$, $W^{(2)}$, and TOP and BOT are $2 \times 3$ and $1 \times 3$, respectively. The overlap between successive blocks is thus 3.

Figure 2: Special ABD structure arising in BVODE solvers

Figure 3: Fill-in introduced by SOLVEBLOK
5.2 Gaussian Elimination with Partial Pivoting

The procedure implemented in solveblok [9, 10] uses conventional Gaussian elimination with partial pivoting. Fill-in may be introduced in the positions indicated in Figure 3. The possible fill-in, and consequently the possible additional storage and work, depends on the number of rows, $N_T$, in the block TOP.
Figure 4: Structure of the reduced matrix

5.3 Alternate Row and Column Elimination

This stable elimination procedure, based on the approach of Varah [23], generates no fill-in for the matrix $\mathcal{A}$ of Figure ??. Suppose we choose a pivot from the first row. If we interchange the first column and the column containing the pivot, there is no fill-in. Moreover, if instead of performing row elimination as in conventional Gaussian elimination, we reduce the (1,2) and (1,3) elements to zero by column elimination, the corresponding multipliers are bounded in magnitude by unity. We repeat this process in the second step, choosing a pivot from the elements in the (2,2) and (2,3) positions, interchanging columns 2 and 3 if necessary and eliminating the (2,3) element. If this procedure were adopted in the third step, fill-in could be introduced in the (i,3) positions, $i = 7, 8, 9, 10$. To avoid this, we switch to row elimination with partial pivoting to eliminate the (4,3), (5,3), (6,3) elements, which does not introduce fill-in. We continue using row elimination with partial pivoting until a step is reached when fill-in could occur, at which point we switch back to the column-pivoting, column-elimination scheme. This strategy leads to a decomposition
$$\mathcal{A} = P L \bar{B} U Q,$$
where $P$, $Q$ are permutation matrices recording the row and column interchanges, respectively, the unit lower and unit upper triangular matrices $L$, $U$ contain the multipliers used in the row and column eliminations, respectively, and the matrix $\bar{B}$ has the structure shown in Figure 4, where $\times$ denotes a zeroed element. Since there is only one row or column interchange at each step, the pivotal information can be stored in a single vector of the order of the matrix, as in conventional Gaussian elimination. The nonzero elements of $L$, $U$ can be stored in the zeroed positions in $\mathcal{A}$. The pattern of row and column eliminations is determined by $N_T$ and the number of rows $N_W$ in a general block $W^{(i)}$ (cf. Figure 2). In general, a sequence of $N_T$ column eliminations is alternated with a sequence of $N_W - N_T$ row eliminations.

To solve $\mathcal{A}x = b$, we solve
$$PLz = b, \qquad \bar{B}w = z, \qquad UQx = w.$$
The second step requires particular attention. If the components of $w$ are ordered so that those associated with the column eliminations precede those associated with the row eliminations, and the equations are ordered accordingly, the system is reducible. In our example, if we use the ordering
$$w = [w_1, w_2, w_5, w_6, w_9, w_{10}, w_3, w_4, w_7, w_8, w_{11}]^T,$$
the coefficient matrix of the reordered equations has the structure in Figure 5. Thus, the components $w_1, w_2, w_5, w_6, w_9, w_{10}$ are determined by solving a lower triangular system and the remaining components by solving an upper triangular system.

Figure 5: The coefficient matrix of the reordered system
5.4 Modified Alternate Row and Column Elimination

The alternate row and column elimination procedure can be made more efficient by using the fact that, after each sequence of row or column eliminations, a reducible matrix results. This leads to a reduction in the number of arithmetic operations because some operations involving matrix-matrix multiplications in the decomposition phase can be deferred to the solution phase, where only matrix-vector multiplications are required. After the first sequence of column eliminations, involving a permutation matrix $Q_1$ and multiplier matrix $U_1$, say, the resulting matrix is
$$B_1 = \mathcal{A} Q_1 U_1 = \begin{bmatrix} C_1 & O \\ \begin{bmatrix} M_1 \\ O \end{bmatrix} & \mathcal{A}_1 \end{bmatrix},$$
where $C_1 \in \mathbb{R}^{2 \times 2}$ is lower triangular, $M_1 \in \mathbb{R}^{4 \times 2}$ and $\mathcal{A}_1 \in \mathbb{R}^{9 \times 9}$. The equation $\mathcal{A}x = b$ becomes
$$B_1 \tilde{x} = b, \qquad \tilde{x} = U_1^{-1} Q_1^T x = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix},$$
where $\tilde{x}_1 \in \mathbb{R}^2$. Setting $b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$, where $b_1 \in \mathbb{R}^2$, we obtain
$$C_1 \tilde{x}_1 = b_1, \qquad \mathcal{A}_1 \tilde{x}_2 = b_2 - \begin{bmatrix} M_1 \tilde{x}_1 \\ 0 \end{bmatrix} \equiv \tilde{b}_2. \qquad (5.1)$$
The next sequence of eliminations, that is, the first sequence of row eliminations, is applied only to $\mathcal{A}_1$ to produce another reducible matrix
$$L_1 P_1 \mathcal{A}_1 = \begin{bmatrix} R_1 & [\,N_1 \;\; O\,] \\ O & \mathcal{A}_2 \end{bmatrix},$$
where $R_1 \in \mathbb{R}^{2 \times 2}$ is upper triangular, $N_1 \in \mathbb{R}^{2 \times 3}$ and $\mathcal{A}_2 \in \mathbb{R}^{7 \times 7}$. If
$$\tilde{x}_2 = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix}, \qquad L_1 P_1 \tilde{b}_2 = \begin{bmatrix} \hat{b}_1 \\ \hat{b}_2 \end{bmatrix},$$
where $\hat{x}_1, \hat{b}_1 \in \mathbb{R}^2$, system (5.1) becomes
$$\mathcal{A}_2 \hat{x}_2 = \hat{b}_2, \qquad R_1 \hat{x}_1 = \hat{b}_1 - [\,N_1 \;\; O\,]\, \hat{x}_2, \qquad (5.2)$$
and the next sequence of eliminations is applied to $\mathcal{A}_2$, which has the structure of the original matrix with one $W$ block removed. Since row operations are not performed on $M_1$ in the second elimination step, and column operations are not performed on $N_1$ in the third, etc., there are savings in arithmetic operations [11]. The decomposition phase differs from that in alternating row and column elimination in that, if the $r$th elimination step is a column (row) elimination, it leaves unaltered the first $r-1$ rows (columns) of the matrix. The matrix in the modified procedure has the same structure and diagonal blocks as $\bar{B}$, and the permutation and multiplier matrices are identical.

In [11, 12], this modified alternate row and column elimination procedure was developed and implemented in the package colrow for systems with matrices of the form in Figure 2, and in the package arceco for ABD systems in which the blocks are of varying dimensions, and the first and last blocks protrude, as shown in Figure 1.

A comprehensive survey of the occurrence of, and solution techniques for, ABD linear systems is given in [3].
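The two-stage solve (5.1) is ordinary block forward substitution, which the following sketch (ours) demonstrates on a randomly generated matrix with the dimensions used in the example above ($C_1$ lower triangular $2 \times 2$, $M_1$ of size $4 \times 2$, $\mathcal{A}_1$ of size $9 \times 9$); diagonal shifts are added only to keep the random blocks well conditioned.

```python
import numpy as np

# Block forward substitution as in (5.1): solve the small triangular system
# first, update the right-hand side with one matrix-vector product, then
# solve the trailing system.
rng = np.random.default_rng(0)
C1 = np.tril(rng.standard_normal((2, 2))) + 3 * np.eye(2)  # lower triangular
M1 = rng.standard_normal((4, 2))
A1 = rng.standard_normal((9, 9)) + 9 * np.eye(9)

B1 = np.zeros((11, 11))          # assemble the reducible matrix of (5.1)
B1[:2, :2] = C1
B1[2:6, :2] = M1                 # [M_1; O] occupies the first two columns
B1[2:, 2:] = A1
b = rng.standard_normal(11)

x1 = np.linalg.solve(C1, b[:2])  # C_1 x~_1 = b_1
b2 = b[2:].copy()
b2[:4] -= M1 @ x1                # b~_2 = b_2 - [M_1 x~_1; 0]
x2 = np.linalg.solve(A1, b2)     # A_1 x~_2 = b~_2
x = np.concatenate([x1, x2])

assert np.allclose(B1 @ x, b)    # agrees with a direct solve of B_1 x = b
print("block forward substitution reproduces the direct solve")
```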
6 First Order Systems

6.1 Introduction

Consider the two-point boundary value problem for the first order system
$$u' = f(x, u), \quad x \in I, \qquad (6.1)$$
subject to the separated boundary conditions
$$g_0(u(0)) = 0, \qquad g_1(u(1)) = 0, \qquad (6.2)$$
where $u$ and $f$ have $m$ components, and $g_0$ and $g_1$ have $m_1$ and $m_2$ components, respectively, with $m_1 + m_2 = m$. A boundary value problem for a higher order equation
$$u^{(k)} = f(x, u, u', \ldots, u^{(k-1)}),$$
subject to appropriate boundary conditions, can be written in this form. By introducing
$$u_1 = u, \quad u_2 = u', \quad \ldots, \quad u_k = u^{(k-1)},$$
the first order system corresponding to such an equation is
$$u_i' = u_{i+1}, \quad i = 1, 2, \ldots, k-1, \qquad u_k' = f(x, u_1, \ldots, u_k),$$
subject to rewritten boundary conditions. In many cases, it is desirable to reduce higher-order equations in this way and to use a solution approach that is suitable for first order systems. However, it should be noted that there are situations where such a reduction is not necessary and approaches applicable to the original higher-order equation are more appropriate.
6.2 The Trapezoidal Rule

If
$$\Delta : 0 = x_0 < x_1 < \ldots < x_{N+1} = 1$$
is a partition of $\bar{I}$ and $h_i = x_i - x_{i-1}$, then the trapezoidal rule for the solution of (6.1)-(6.2) takes the form
$$\phi_i(U) \equiv U_i - U_{i-1} - \tfrac{1}{2} h_i \{ f(x_i, U_i) + f(x_{i-1}, U_{i-1}) \} = 0, \qquad i = 1, \ldots, N+1,$$
$$\phi_L(U) \equiv g_0(U_0) = 0, \qquad \phi_R(U) \equiv g_1(U_{N+1}) = 0,$$
where $U_i \approx u(x_i)$, $i = 0, 1, \ldots, N+1$. The determination of the approximation
$$U \equiv [U_0^T, U_1^T, \ldots, U_{N+1}^T]^T$$
requires the solution of a system of $m(N+2)$ equations of the form
$$\Phi(U) \equiv \begin{bmatrix} \phi_L(U) \\ \phi_1(U) \\ \vdots \\ \phi_{N+1}(U) \\ \phi_R(U) \end{bmatrix} = 0. \qquad (6.3)$$
This system is usually solved by Newton's method (or a variant of it), which takes the form
$$\Psi(U^{\nu-1})\, \Delta U^{\nu} = -\Phi(U^{\nu-1}), \qquad U^{\nu} = U^{\nu-1} + \Delta U^{\nu}, \qquad \nu = 1, 2, \ldots, \qquad (6.4)$$
where $U^0$ is an initial approximation to $U$ and the (block) matrix
$$\Psi(U) \equiv \frac{\partial \Phi(U)}{\partial U} = \left( \frac{\partial \phi_i}{\partial U_j} \right)$$
is the Jacobian of the nonlinear system (6.3). In this case, this matrix has the ABD structure
$$\begin{bmatrix} A & & & \\ L_1 & R_1 & & \\ & L_2 & R_2 & \\ & & \ddots & \ddots \\ & & L_{N+1} & R_{N+1} \\ & & & B \end{bmatrix} \qquad (6.5)$$
where
$$A = \frac{\partial g_0}{\partial U_0} \ \text{is } m_1 \times m, \qquad L_i = -\left[ I + \frac{1}{2} h_i \frac{\partial f}{\partial U_{i-1}}(x_{i-1}, U_{i-1}) \right] \ \text{is } m \times m,$$
$$R_i = I - \frac{1}{2} h_i \frac{\partial f}{\partial U_i}(x_i, U_i) \ \text{is } m \times m, \qquad B = \frac{\partial g_1}{\partial U_{N+1}} \ \text{is } m_2 \times m.$$
It can be shown that the approximation $U$ determined from the trapezoidal rule satisfies
$$u(x_i) - U_i = O(h^2),$$
where $h = \max_j h_j$. Under sufficient smoothness conditions on $u$, higher order approximations to $u$ on the mesh $\Delta$ can be generated using the method of deferred corrections (cf. Section 1.5.1), which we shall describe briefly; full details are given by Lentini and Pereyra (1977).

The local truncation error of the trapezoidal rule is defined as
$$\tau_{\Delta,i}[u] = h_i^{-1} \phi_i(u) - [u' - f(x, u)]\big|_{x = x_{i-1/2}}, \qquad i = 1, \ldots, N+1,$$
where $x_{i-1/2} = x_{i-1} + \tfrac{1}{2} h_i$. By expanding in Taylor series about $x_{i-1/2}$, it is easy to show that
$$\tau_{\Delta,i}[u] = \sum_{\ell=1}^{L} T_\ell(x_{i-1/2})\, h_i^{2\ell} + O(h^{2L+2}), \qquad i = 1, \ldots, N+1, \qquad (6.6)$$
where
$$T_\ell(x_{i-1/2}) = -\frac{2\ell}{2^{2\ell}(2\ell+1)!} \, \frac{d^{2\ell}}{dx^{2\ell}} f(x, u)\Big|_{x = x_{i-1/2}}.$$
If $S_k(U^{(k-1)})$ is a finite difference approximation of order $h^{2k+2}$ to the first $k$ terms in the expansion (6.6) of the local truncation error, then the solution $U^{(k)}$ of the system
$$\Phi(v) = S_k(U^{(k-1)}), \qquad k = 1, 2, \ldots, L, \qquad (6.7)$$
where $U^{(0)} = U$, the solution of (6.3), is an order $h^{2k+2}$ approximation to $u$ on the mesh $\Delta$. Details of the construction of the operators $S_k$ by numerical differentiation are given by Lentini and Pereyra (1974).

Note that the systems (6.3) and (6.7) are similar, the right hand side of (6.7) being simply a known vector. Once the first system (6.3) is solved, the remaining systems (6.7) are small perturbations of (6.3), and considerable computational effort can be saved by keeping the Jacobian, and therefore its LU decomposition, fixed during the iteration (6.4). Of course, if the mesh is changed, the Jacobian must be recomputed and refactored. In principle, this modification degrades the quadratic convergence of Newton's method, but because of the accuracy of the approximate Jacobian and the initial approximation, convergence is still rapid.
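The following sketch (ours) assembles the ABD Jacobian (6.5) as a dense matrix and applies the trapezoidal scheme to the linear model problem $u_1' = u_2$, $u_2' = -u_1$ with $u_1(0) = 0$, $u_1(1) = \sin 1$, whose exact solution is $u_1 = \sin x$; because the problem is linear, a single Newton step from $U = 0$ solves (6.3) exactly.

```python
import numpy as np

# Trapezoidal rule for the linear system u1' = u2, u2' = -u1,
# u1(0) = 0, u1(1) = sin(1); exact solution u1 = sin(x).
m, N = 2, 49
h = 1.0 / (N + 1)
x = h * np.arange(N + 2)

fU = np.array([[0.0, 1.0], [-1.0, 0.0]])        # constant Jacobian of f
I2 = np.eye(m)
L = -(I2 + 0.5 * h * fU)                        # blocks L_i of (6.5)
R = I2 - 0.5 * h * fU                           # blocks R_i of (6.5)

n = m * (N + 2)
Jac = np.zeros((n, n))
rhs = np.zeros(n)
Jac[0, 0] = 1.0                                 # A = dg0/dU_0:  u1(0) = 0
for i in range(1, N + 2):                       # rows for phi_i(U) = 0
    r = m * (i - 1) + 1
    Jac[r:r+m, m*(i-1):m*i] = L
    Jac[r:r+m, m*i:m*(i+1)] = R
Jac[n-1, m*(N+1)] = 1.0                         # B = dg1/dU_{N+1}
rhs[n-1] = np.sin(1.0)

U = np.linalg.solve(Jac, rhs).reshape(N + 2, m) # one Newton step, linear case

err = np.max(np.abs(U[:, 0] - np.sin(x)))
print(f"max error in u1 at the mesh points: {err:.2e}")   # O(h^2)
assert err < 1e-3
```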
6.3 Orthogonal Spline Collocation
This method produces an approximation to u which is a piecewise poly
nomial vector function U(x) = (U
1
(x), . . . , U
m
(x)), where, for each i, i =
1, . . . , m, U
i
/
r
0
(). Each piecewise polynomial U
i
is represented in the
form
U
i
(x) =
s
l=1
li
B
l
(x), (6.8)
where s = r(N + 1) + 1, and B
l
(x)
M
l=1
is a convenient basis, commonly
known as the Bspline basis (de Boor, 1978), for the piecewise polynomial
space /
r
0
(). The coecients
li
are determined by requiring that the
approximate solution U satisfy the boundary conditions, and the dierential
equation at r specic points, the collocation points, in each subinterval.
These points are the Gauss points dened by
(i1)(r1)+k
= x
i1
+h
i
k
, i = 1, . . . , N + 1, k = 1, . . . , r,
where
k
r
k=1
are the r zeros of the Legendre polynomial of degree r on
the interval [0, 1] (cf., Section 4.1). The collocation equations then take the
55
form
0
(U) g
0
(U(0)) = 0,
l
(U) U
(
j
) f (
j
, U(
j
)) = 0, j = 1, 2, . . . , s 2,
s
(U) g
1
(U(1)) = 0.
(6.9)
Again a variant of Newtons method is usually used to solve the nonlinear
system (6.9). Certain properties of the Bsplines ensure that the Jacobian
of (6.9) is almost block diagonal but having a more general structure than
that of the matrix (6.5). These properties are (de Boor, 1978):
On each subinterval [x
i1
, x
i
] only r+1 Bsplines are nonzero, namely
B
r(i2)+1
, . . . , B
r(i1)+1
.
At x = 0, B
i
(0) =
i,1
and at x = 1, B
i
(1) =
i,s
where
ij
denotes the
Kronecker delta.
If the unknown coecients in (6.8) are ordered by columns, that is, rst
with respect to i and then with respect to l, the Jacobian of (6.9) has the
form
_
_
A
W
11
W
12
W
13
W
21
W
22
W
23
W
N+1,1
W
N+1,2
W
N+1,3
B
_
_
where the matrices A and B are m
1
m and m
2
m respectively, and
W
i1
, W
i2
and W
i3
are mr m, mr m(r 1) and mr m, respectively.
Under sufficient smoothness conditions, it can be shown (see, for example, Ascher et al., 1979) that the error in U for $x \in [x_{i-1}, x_i]$ is given by

\[
u_j(x) - U_j(x) = c(x)\, u_j^{(r+1)}(x_{i-1})\, h_i^{r+1} + O(h^{r+2}), \quad j = 1, \ldots, m,
\]

where c(x) is a known bounded function of x, and at each mesh point $x_i$,

\[
u_j(x_i) - U_j(x_i) = O(h^{2r}), \quad j = 1, \ldots, m, \quad i = 0, 1, \ldots, N+1.
\]
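The shifted Gauss points used above are easy to generate numerically. The following sketch (ours, not part of the text) uses NumPy's `leggauss`, whose nodes on [-1, 1] are mapped to [0, 1] to give the points rho_k:

```python
# Sketch: computing the collocation points of Section 6.3 with NumPy.
import numpy as np

def gauss_points(r, mesh):
    """Return xi = x_{i-1} + h_i * rho_k for a mesh x_0 < ... < x_{N+1}
    and r Gauss points per subinterval."""
    t, _ = np.polynomial.legendre.leggauss(r)   # nodes on [-1, 1]
    rho = (t + 1.0) / 2.0                       # shifted to [0, 1]
    pts = []
    for a, b in zip(mesh[:-1], mesh[1:]):
        pts.extend(a + (b - a) * rho)
    return np.array(pts)

mesh = np.linspace(0.0, 1.0, 5)                 # N + 1 = 4 subintervals
xi = gauss_points(3, mesh)
print(xi.shape)                                 # 3 points per subinterval
```

For r = 1 this reduces to the subinterval midpoints.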
6.4 Multiple Shooting

This procedure involves finding an approximation $U_i$ to $u(x_i)$, $i = 0, 1, \ldots, N+1$, in the following way (cf. Stoer and Bulirsch, 1980). Let $u_i(x; U_i)$ denote the solution of the initial value problem

\[
u' = f(x, u), \quad x \in [x_i, x_{i+1}], \qquad u(x_i) = U_i. \qquad (6.10)
\]

The $U_i$ are determined by requiring that the resulting piecewise solution satisfy the boundary conditions and be continuous at the interior mesh points; that is,

\[
g_0(U_0) = 0, \qquad u_i(x_{i+1}; U_i) - U_{i+1} = 0, \quad i = 0, \ldots, N, \qquad g_1(U_{N+1}) = 0.
\]

When a variant of Newton's method is applied to this nonlinear system, the coefficient matrix of the linear system to be solved at each iteration has the almost block diagonal structure

\[
\begin{bmatrix}
A & & & & \\
L_0 & -I & & & \\
 & L_1 & -I & & \\
 & & \ddots & \ddots & \\
 & & & L_N & -I \\
 & & & & B
\end{bmatrix}
\]

where

\[
A = \frac{\partial g_0}{\partial U_0} \ \text{is } m_1 \times m, \qquad
L_i = \frac{\partial u_i(x_{i+1}; U_i)}{\partial U_i} \ \text{is } m \times m, \qquad
B = \frac{\partial g_1}{\partial U_{N+1}} \ \text{is } m_2 \times m,
\]

and I denotes the $m \times m$ unit matrix. In order to determine the $m \times m$ matrices $L_i$, $i = 0, 1, \ldots, N$, one must solve the initial value problems

\[
L_i'(x) = J(x, u_i(x; U_i))\, L_i(x), \qquad L_i(x_i) = I, \quad i = 0, \ldots, N, \qquad (6.14)
\]

where J is the Jacobian matrix $J = \partial f/\partial u$. Then $L_i = L_i(x_{i+1})$. Given the function f and its Jacobian J, one can solve the initial value problems (6.10) and (6.14) simultaneously using the values of U calculated from the previous Newton iteration. In practice, these initial value problems are solved numerically using an initial value routine, usually a Runge-Kutta code.
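To make the procedure concrete, here is a minimal multiple-shooting sketch (our own illustration, not one of the production codes discussed below) for the linear test problem u'' = u, u(0) = 0, u(1) = 1, with two shooting intervals. The matrices L_i of (6.14) are integrated together with (6.10) by a hand-written RK4 routine; since the problem is linear, one Newton step determines the U_i:

```python
# Multiple shooting sketch for u'' = u, u(0) = 0, u(1) = 1,
# whose solution is u(x) = sinh(x)/sinh(1).
import math

def rk4(f, y0, a, b, n=200):
    """Integrate y' = f(x, y) from a to b with n RK4 steps; y is a list."""
    h = (b - a) / n
    y, x = list(y0), a
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h/2, [yi + h/2*ki for yi, ki in zip(y, k1)])
        k3 = f(x + h/2, [yi + h/2*ki for yi, ki in zip(y, k2)])
        k4 = f(x + h,   [yi + h*ki   for yi, ki in zip(y, k3)])
        y = [yi + h/6*(a1 + 2*a2 + 2*a3 + a4)
             for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]
        x += h
    return y

def rhs(x, w):
    # w = (u, v, L11, L12, L21, L22): the IVP (6.10) for u' = v, v' = u,
    # augmented with the variational equations L' = J L, J = [[0,1],[1,0]].
    u, v, a, b, c, d = w
    return [v, u, c, d, a, b]

def gauss_solve(A, b):
    """Tiny dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j]*x[j] for j in range(i+1, n))) / M[i][i]
    return x

def solve(U):                       # one Newton step (the problem is linear)
    (u0, v0), (u1, v1) = U
    w0 = rk4(rhs, [u0, v0, 1, 0, 0, 1], 0.0, 0.5)
    w1 = rk4(rhs, [u1, v1, 1, 0, 0, 1], 0.5, 1.0)
    # residuals: g0, continuity at x = 1/2 (two components), g1
    F = [u0, w0[0] - u1, w0[1] - v1, w1[0] - 1.0]
    Jac = [[1, 0, 0, 0],
           [w0[2], w0[3], -1, 0],           # [L_0, -I] rows
           [w0[4], w0[5], 0, -1],
           [0, 0, w1[2], w1[3]]]            # first row of L_1
    dU = gauss_solve(Jac, [-r for r in F])
    return [(u0 + dU[0], v0 + dU[1]), (u1 + dU[2], v1 + dU[3])]

U = solve([(0.0, 0.0), (0.0, 0.0)])
print(U[1][0], math.sinh(0.5) / math.sinh(1.0))   # u(1/2) vs exact value
```

The 4x4 Newton matrix mirrors the almost block diagonal structure displayed above, with the rows of A and B reduced to single boundary conditions.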
6.5 Software Packages

(This section is outdated.) In recent years, several general purpose, portable software packages capable of handling nonlinear boundary value problems involving mixed order systems of ordinary differential equations have been developed. The use of these packages has led to the solution of problems which were difficult and/or costly to solve, if not intractable, using standard solution procedures, and has removed much of the guesswork from solving boundary value problems for ordinary differential equations (see, for example, Davis and Fairweather (1981), Denison et al. (1983), Muir et al. (1983), Fairweather and Vedha-Nayagam (1987), Bhattacharya et al. (1986), Akyurtlu et al. (1986), Vedha-Nayagam et al. (1987)).
In this section, we shall discuss three packages which are widely available and are now in common use. In addition to implementing one of the basic numerical methods described in Sections 6.2-6.4, each package incorporates a sophisticated nonlinear equations solver, which involves a linear equations solver, and a complex, fully automatic mesh selection and error estimation algorithm upon which the success of the code largely depends.
The IMSL package, DVCPR, called BVPFD in the new IMSL Library, implements a variable order finite difference method based on the trapezoidal rule with deferred corrections. Like the package DD04AD in the Harwell Subroutine Library (1978) and D02RAF in the NAG Library, this package is a version of the code PASVA3 (Pereyra, 1979). The code PASVA3 has a fairly long history, and a series of predecessors has been in use for several years. The package COLSYS (Ascher et al., 1981), which implements spline collocation at Gauss points, and IMSL's multiple shooting code, DTPTB, called BVPMS in the new IMSL Library, are of a more recent vintage. All three codes are documented elsewhere, and in this section we shall only briefly mention some of their similarities and differences, and other noteworthy features of the codes. It should be noted that Bader and Ascher (1987) have developed a new version of COLSYS called COLNEW which differs from COLSYS principally in its use of certain monomial basis functions instead of B-splines. This new code, which is reputed to be somewhat more robust than its predecessor, shares many of its features, and, in the remainder of this section, any comments referring to COLSYS apply equally well to COLNEW.
The codes DVCPR and DTPTB require that the boundary value problem be formulated as a first-order system, and they can handle nonseparated boundary conditions, i.e., boundary conditions for (1.1), for example, of the form

\[
g(u(0), u(1)) = 0,
\]

where g is a vector function of order m. On the other hand, COLSYS can handle a mixed-order system of multipoint boundary value problems without first reducing it to a first-order system, but requires that the boundary conditions be separated. This restriction is not a serious one, as a boundary value problem with nonseparated boundary conditions can be reformulated so that only separated boundary conditions occur, as we shall see in Section 7.2. However, this reformulation does increase the size of the problem.
The algebraic problems arising in the codes are similar, but there is no uniformity in the manner in which they are solved. Each code uses some variant of Newton's method for the solution of the nonlinear systems, and a Gauss elimination-based algorithm to solve the almost block diagonal linear systems. Since the codes were written, the package COLROW (Diaz et al., 1983) has been developed specifically for the solution of such linear systems; see Section 9.4. Its use in the existing codes has not been investigated, but may improve their efficiency significantly.
Both DVCPR and COLSYS solve the boundary value problem on a sequence of meshes until user-specified error tolerances are satisfied. Detailed descriptions and theoretical justification of the automatic mesh-selection procedures used in DVCPR and COLSYS are presented by Lentini and Pereyra (1974) and Ascher et al. (1979), respectively. It is worth noting that DVCPR constructs meshes in such a way that each mesh generated contains the initial mesh, which is not necessarily the case in COLSYS.

In DTPTB, the shooting points $x_1, \ldots, x_N$ are chosen by the user or, for linear problems, by the code itself. Any adaptivity in the code appears in the initial value solver, which in this case is IMSL's Runge-Kutta code, DVERK.
In DVCPR, one can specify only an absolute tolerance, $\epsilon$ say, which is imposed on all components of the solution. The code attempts to find an approximate solution U such that the absolute error in each of its components is bounded by $\epsilon$. On the other hand, COLSYS allows the user to specify a different tolerance for each component of the solution, and in fact allows one to impose no tolerance at all on some or all of the components. This code attempts to obtain an approximate solution U such that

\[
\max_{x \in [x_i, x_{i+1}]} |u_l(x) - U_l(x)| \le \epsilon_l + \epsilon_l \max_{x \in [x_i, x_{i+1}]} |U_l(x)|
\]

for each component l on which a tolerance $\epsilon_l$ is imposed. This criterion has a definite advantage in the case of components with different orders of magnitude, or when components differ in magnitude over the interval of definition. Moreover, it is often more convenient to provide an array of tolerances than to rescale the problem. On successful termination, both DVCPR and COLSYS return estimates of the errors in the components of the approximate solution.
In DTPTB, the same absolute tolerance is imposed on all components
of the solution. In addition, a boundary condition tolerance is specied,
and, on successful termination, the solution returned will also satisfy the
boundary conditions to within this tolerance.
Each code offers the possibility of providing an initial approximation to the solution and an initial subdivision of the interval of definition (the shooting points in the case of DTPTB). With COLSYS, one may also specify the number of collocation points per subinterval. In many instances, parameter continuation is used to generate initial approximations and subdivisions. That is, when solving a parameterized family of problems in increasing order of difficulty, the subdivision and initial approximation for a particular problem are derived from the final mesh and the approximate solution calculated for the previous problem. This is an exceedingly useful technique which improves the robustness of the packages. It should be noted that COLSYS requires a continuous initial approximation, which is sometimes an inconvenience.
It has been the authors' experience that DTPTB is the least robust of the three codes. However, W. H. Enright and T. F. Fairgrieve (private communication) have made some modifications to this code which have greatly improved its performance, making it competitive with both DVCPR and COLSYS. In general, there is little to choose between DVCPR and COLSYS, and it is recommended that both be used to solve a given boundary value problem. Driver programs for these codes are straightforward to write. Moreover, there are definite advantages to using both codes. For example, each provides its own insights when solving a problem, simple programming errors can be detected more quickly by comparing results, and considerable confidence can be attached to a solution obtained by two different methods. With the wide availability of these software packages, it is rarely advisable or even necessary for the practitioner to develop ad hoc computer programs for solving boundary value problems for ordinary differential equations.
7 Reformulation of Boundary Value Problems into Standard Form

7.1 Introduction

Boundary value problems arising in various applications are frequently not in the form required by existing general purpose software packages. However, many problems can easily be converted to such a form, thus enabling the user to take advantage of the availability of this software and avoiding the need to develop a special purpose code.

In this section, we describe the conversion to standard form of some rather common nonstandard problems which the authors have encountered in applications in chemical and mechanical engineering. Some of the material presented is adapted from the excellent survey article [5], where additional examples may be found.
7.2 Nonseparated Boundary Conditions

The standard form required by several boundary value problem codes is

\[
u' = f(x, u), \quad x \in I, \qquad g(u(0), u(1)) = 0, \qquad (7.1)
\]

where u, f and g have m components and f and/or g may be nonlinear. The boundary conditions in (7.1) are said to be nonseparated.

While COLSYS/COLNEW can handle mixed order systems directly, it does not accept nonseparated boundary conditions. A simple way to convert a BVP of the form (7.1) with nonseparated boundary conditions to one with separated boundary conditions, where each condition involves only one point, is the following. We introduce the constant function v such that

\[
v(x) = u(1), \quad x \in I,
\]

and obtain the equivalent boundary value problem

\[
u' = f(x, u), \quad v' = 0, \quad x \in I, \qquad
g(u(0), v(0)) = 0, \quad u(1) = v(1).
\]

The penalty in this approach is that the transformed problem is twice the order of the original problem. On the other hand, as we have seen, the application of standard techniques to boundary value problems with separated boundary conditions involves the solution of almost block diagonal linear systems. The algebraic problem is significantly more involved when these techniques are applied directly to the boundary value problem (7.1).
7.3 Singular Coefficients

In problems involving cylindrical or spherical geometry, ordinary differential equations with singular coefficients often occur. A prime example is the problem of diffusion and reaction in porous catalytic solids, which gives rise to a boundary value problem of the form

\[
u'' + \frac{s}{x}u' = R(u), \quad x \in I, \qquad (7.2)
\]

subject to boundary conditions of the form

\[
u'(0) = 0, \qquad u(1) = 1. \qquad (7.3)
\]

The coefficient s/x is singular at x = 0 but, since u'(0) = 0,

\[
\lim_{x \to 0} \frac{u'(x)}{x} = u''(0),
\]

and the differential equation at x = 0 is therefore replaced by

\[
u'' = \frac{R(u)}{1+s}.
\]

With this modification, the codes DVCPR (a predecessor of BVPFD) and DTPTB (a predecessor of BVPMS) have been used successfully to solve problems of the form (7.2)-(7.3); see, for example, [15, 18].
7.4 Continuation

Many problems arising in engineering applications depend on one (or more) parameters. Consider, for example, the boundary value problem

\[
u' = f(x, u; \lambda), \quad x \in I, \qquad g(u(0), u(1); \lambda) = 0, \qquad (7.4)
\]

where $\lambda$ is a parameter. For the desired value of $\lambda$, $\lambda_1$ say, the problem (7.4) might be exceedingly difficult to solve without a very good initial approximation to the solution u. If the solution of (7.4) is easily determined for another value of $\lambda$, $\lambda_0$ say, then a chain of continuation steps along the homotopy path $(\lambda, u(\cdot\,; \lambda))$ may be initiated from $(\lambda_0, u(\cdot\,; \lambda_0))$. At each step of the chain, standard software can be used to determine an approximation to $u(\cdot\,; \lambda)$, which is then used, at the next step, as the initial approximation to $u(\cdot\,; \lambda + \Delta\lambda)$. This procedure can also be used when the solution of (7.4) is required for a sequence of values of $\lambda$, and the problems are solved in increasing order of difficulty.
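The continuation chain can be sketched in a few lines (our illustration with an assumed finite difference discretization, not one of the packages discussed earlier), using the cylindrical Bratu-type problem u'' + u'/x + lam*e^u = 0, u'(0) = 0, u(1) = 0 of the example below, with the singular point x = 0 treated as in Section 7.3:

```python
# Finite differences + Newton + continuation in lam (a hedged sketch).
# At x = 0 the equation is replaced by (1+s) u'' + lam e^u = 0, with
# u'' ~ 2(u_1 - u_0)/h^2 from the symmetry condition u'(0) = 0.
import numpy as np

def solve_bratu(lam, s=1, N=100, u=None, tol=1e-10):
    h = 1.0 / (N + 1)
    x = np.arange(1, N + 1) * h          # interior points x_1..x_N
    if u is None:
        u = np.zeros(N + 1)              # unknowns u_0..u_N; u_{N+1} = 0
    for _ in range(20):
        F = np.empty(N + 1)
        J = np.zeros((N + 1, N + 1))
        F[0] = (1 + s)*2*(u[1] - u[0])/h**2 + lam*np.exp(u[0])
        J[0, 0] = -2*(1 + s)/h**2 + lam*np.exp(u[0])
        J[0, 1] = 2*(1 + s)/h**2
        for m in range(1, N + 1):
            um1, up1 = u[m - 1], (u[m + 1] if m < N else 0.0)
            F[m] = ((um1 - 2*u[m] + up1)/h**2
                    + s/x[m - 1]*(up1 - um1)/(2*h) + lam*np.exp(u[m]))
            J[m, m - 1] = 1/h**2 - s/x[m - 1]/(2*h)
            J[m, m] = -2/h**2 + lam*np.exp(u[m])
            if m < N:
                J[m, m + 1] = 1/h**2 + s/x[m - 1]/(2*h)
        du = np.linalg.solve(J, -F)
        u = u + du
        if np.max(np.abs(du)) < tol:
            break
    return u

u = None
for lam in [0.1, 0.2, 0.3, 0.4, 0.5]:    # continuation chain in lam
    u = solve_bratu(lam, u=u)
print(u[0])                               # center temperature u(0) at lam = 0.5
```

Each converged solution serves as the initial Newton guess for the next value of lam, exactly as described above.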
As an example, consider the boundary value problem

\[
u'' + \frac{s}{x}u' + \lambda e^{u} = 0, \quad x \in I, \qquad
u'(0) = 0, \quad u(1) = 0, \qquad (7.5)
\]

which describes the steady-state temperature distribution in the radial direction in the interior of a cylinder of unit radius when s = 1, and a sphere of unit radius when s = 2, with heat generation according to the exponential law [21]. There is no solution for $\lambda > 1.7$ when s = 1 and for $\lambda > 3.0$ when s = 2. The quantity of interest in this problem is the dimensionless center temperature u(0). The boundary value problem (7.5) has been solved for various values of $\lambda$ in [18] using DVCPR and DTPTB (replacing the differential equation in (7.5) by

\[
u'' + \frac{\lambda e^{u}}{1+s} = 0
\]

when x = 0, as described in Section 7.3) and COLSYS, and comparisons made with results obtained in [21]. The results for the case s = 1 are presented in Table 1.

$\lambda$    [21]         COLSYS
0.1     0.252(-1)    0.254822(-1)
0.2     0.519(-1)    0.519865(-1)
0.3     0.796(-1)    0.796091(-1)
0.4     0.109        0.108461
0.5     0.138        0.138673
0.6     0.169        0.170397
0.7     0.2035       0.203815
0.8     0.238        0.239148
1.0     0.65         0.316694
1.5     1.1          0.575364
1.7     2.0          0.731577

Table 1: Dimensionless center temperature u(0) for the case s = 1.

For each value of $\lambda$, COLSYS converged without difficulty, while it was necessary to use continuation with the other codes: DVCPR was run starting with $\lambda = 1.7$ and decreasing $\lambda$ in steps of 0.1 to $\lambda = 0.1$, but did not converge for $\lambda = 1.7$, while DTPTB was run with $\lambda$ increasing in steps of 0.1 from 0.1. In most cases, the results produced by DVCPR (except for $\lambda = 1.7$) and DTPTB agree to six significant figures with those obtained by COLSYS, casting doubt on the validity of the results of [21] for $\lambda \ge 1$. Details of these numerical experiments and a discussion of the case s = 2 are given in [18].
7.5 Boundary Conditions at Infinity

Consider the following boundary value problem on the semi-infinite interval $[0, \infty)$:

\[
f''' + \tfrac{1}{3}(m+2)ff'' - \tfrac{1}{3}(2m+1)(f')^2 = 0, \quad x \in [0, \infty), \qquad
f(0) = 0, \quad f'(0) = 1, \quad f'(\infty) = 0, \qquad (7.7)
\]

where m is a parameter. The boundary condition at infinity is replaced by $f'(L) = 0$ for a sufficiently large value L, and the change of variable $z = x/L$ maps [0, L] onto I. This yields the boundary value problem

\[
f''' + \tfrac{1}{3}L(m+2)ff'' - \tfrac{1}{3}L(2m+1)(f')^2 = 0, \quad z \in I, \qquad
f(0) = 0, \quad f'(0) = L, \quad f'(1) = 0, \qquad (7.8)
\]

in which primes now denote differentiation with respect to z, and which can now be solved directly by COLSYS or reduced to a first order system and solved by DVCPR or DTPTB. The quantity of interest in this problem is f''(0).

7.6 Unknown Parameters

Some boundary value problems contain a constant whose value is not known in advance and must be determined as part of the solution. Consider, for example, the boundary value problem
\[
f''' + k = S\,F(f, g), \quad x \in I, \qquad
g''' + k = S\,F(g, f), \quad x \in I, \qquad (7.9)
\]

where

\[
F(\phi, \psi) = 2\phi' + \tfrac{1}{2}(\phi')^2 - \tfrac{1}{2}(\phi + \psi)\phi'',
\]

subject to the boundary conditions

\[
f(0) = f'(0) = g(0) = g'(0) = 0, \qquad
f(1) + g(1) = 2, \quad f'(1) = g'(1) = 0. \qquad (7.10)
\]

In (7.9), the parameters $\alpha$ and S are prescribed, and k is an unknown constant. If we add to the boundary value problem (7.9)-(7.10) the trivial differential equation

\[
k' = 0,
\]

the augmented boundary value problem can be solved using standard software. In Table 2, we give the values of k for $\alpha = 1$ and various values of S determined by:

(a) solving the augmented boundary value problem using COLSYS and DVCPR;

(b) [6].

S      -0.5      0.0       1.0       25.0
(a)    1.3023    3.0000    6.2603    73.8652
(b)    1.3036    3.0018    6.2778    74.0414

Table 2: Values of k for $\alpha = 1$ and various values of S.

Since COLSYS and DVCPR produce the same results to the number of figures presented, it seems reasonable to assume that the results given by (a) are the correct values.
7.7 Integral Constraints

In many boundary value problems, a term of the form

\[
\int_0^1 G(u(t), t)\,dt = R,
\]

where R is a given constant, occurs as a constraint. In such cases, we define

\[
w(x) = \int_0^x G(u(t), t)\,dt,
\]

and replace the constraint by the differential equation

\[
w'(x) = G(u(x), x)
\]

and the boundary conditions

\[
w(0) = 0, \qquad w(1) = R.
\]
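The w-augmentation can be checked on a case where the integral is known in closed form (our illustration, not from the text): with G(u, t) = u(t) and u(t) = e^t, integrating w' = G from w(0) = 0 must reproduce w(1) = R = e - 1:

```python
# Replacing an integral constraint by an ODE for w and integrating it.
import math

def integrate_w(G, u, n=1000):
    """Integrate w' = G(u(x), x), w(0) = 0 on [0, 1] with RK4."""
    h = 1.0 / n
    w, x = 0.0, 0.0
    for _ in range(n):
        k1 = G(u(x), x)
        k2 = G(u(x + h/2), x + h/2)
        k3 = k2                    # w' does not depend on w, so k3 = k2
        k4 = G(u(x + h), x + h)
        w += h/6 * (k1 + 2*k2 + 2*k3 + k4)
        x += h
    return w

R = integrate_w(lambda u, t: u, math.exp)
print(R, math.e - 1)
```

In an actual BVP code, w simply becomes one more component of the first-order system, with separated boundary conditions w(0) = 0 and w(1) = R.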
As an example, consider the following boundary value problem which arises in a study of membrane separation processes:

\[
u'' + \frac{1}{x}u' + c_1 + c_2\left(1 - e^{-\gamma(x,k)}\right)\left[\phi(u, \beta) - 1\right] + c_3\,\phi(u, \beta)\,u = 0, \qquad (7.11)
\]

where

\[
\gamma(x, k) = k/(c_4 - xk), \qquad
\phi(u, \beta) = e^{\beta u}\big/\left[1 + c_5 e^{\beta}\left(e^{\beta u} - 1\right)\right],
\]

subject to boundary conditions on u, together with an integral constraint which, as above, is replaced by the differential equation

\[
w' = x\,F(u, x, k), \qquad w(0) = 0, \quad w(1) = 0,
\]

and the trivial ordinary differential equation

\[
k' = 0.
\]

This augmented boundary value problem was solved successfully in [7] using COLSYS.
7.8 Interface Conditions

In problems involving layered media, for example, one has interface conditions at known interior points. Such a problem is

\[
Lu = 0, \quad x \in [0, \delta], \qquad
Lu - \beta_1^2 u = 0, \quad x \in [\delta, 1], \qquad (7.14)
\]

where

\[
Lu = \frac{d^2u}{dx^2} + \frac{1}{x+\alpha}\frac{du}{dx},
\]

and $\alpha$ and $\beta_1$ are known constants, subject to

\[
u(0) = 1, \quad \frac{du}{dx}(1) = 0, \quad u(\delta^-) = u(\delta^+), \quad
\frac{du}{dx}(\delta^-) = \frac{du}{dx}(\delta^+). \qquad (7.15)
\]
To obtain a standard boundary value problem, we map the problem (7.14)-(7.15) to [0, 1] by setting $z = x/\delta$, which maps $[0, \delta]$ to [0, 1], and then $z = (1-x)/(1-\delta)$, which maps $[\delta, 1]$ to [1, 0]. Writing $u_1(z) = u(\delta z)$ and $u_2(z) = u(1-(1-\delta)z)$, and setting

\[
L_0 u = \frac{d^2u}{dz^2} + \frac{\delta}{\delta z + \alpha}\frac{du}{dz}
\]

and

\[
L_1 u = \frac{d^2u}{dz^2} + \frac{\delta - 1}{(\delta-1)z + 1 + \alpha}\frac{du}{dz},
\]

we obtain

\[
L_0 u_1 = 0, \qquad L_1 u_2 - \beta_1^2(\delta-1)^2 u_2 = 0, \qquad (7.16)
\]

and

\[
u_1(0) = 1, \quad \frac{du_2}{dz}(0) = 0, \quad u_1(1) = u_2(1), \quad
\frac{1}{\delta}\frac{du_1}{dz}(1) = -\frac{1}{1-\delta}\frac{du_2}{dz}(1). \qquad (7.17)
\]
An additional complication arises when the location of the interface, $x = \delta$, is unknown. Such a problem is discussed in [2], where, in addition to (7.14)-(7.15), we have

\[
Lv - \beta_2^2 u = 0, \quad x \in [\delta, 1], \qquad (7.18)
\]

and the boundary conditions

\[
v(\delta) = 0, \quad \frac{dv}{dx}(\delta) = 0, \quad v(1) = 1, \qquad (7.19)
\]

where $\beta_2$ is a given constant. Equation (7.18) is transformed to

\[
L_1 v_1 - \beta_2^2(\delta-1)^2 u_2 = 0, \quad z \in I, \qquad (7.20)
\]

and (7.19) becomes

\[
v_1(0) = 1, \quad v_1(1) = 0, \quad \frac{dv_1}{dz}(1) = 0. \qquad (7.21)
\]

Since $\delta$ is unknown, to the boundary value problem consisting of (7.16), (7.17), (7.20) and (7.21), we add the trivial ordinary differential equation

\[
\frac{d\delta}{dz} = 0, \quad z \in I.
\]
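The chain-rule computations behind the transformed operators can be verified numerically (our check, with an arbitrary test function and parameter values): the maps give L_0 u_1(z) = delta^2 (Lu)(delta z) and L_1 u_2(z) = (1-delta)^2 (Lu)(x) with x = 1-(1-delta)z:

```python
# Verifying the transformed operators of Section 7.8 on u(x) = cos(2x).
import math

alpha, delta = 0.3, 0.4
u   = lambda x: math.cos(2*x)
du  = lambda x: -2*math.sin(2*x)
d2u = lambda x: -4*math.cos(2*x)

def L(x):                                  # Lu = u'' + u'/(x + alpha)
    return d2u(x) + du(x) / (x + alpha)

def L0u1(z):                               # u1(z) = u(delta*z)
    x = delta * z
    return delta**2 * d2u(x) + delta/(delta*z + alpha) * (delta * du(x))

def L1u2(z):                               # u2(z) = u(1 - (1-delta)*z)
    x = 1 - (1 - delta) * z
    return ((1 - delta)**2 * d2u(x)
            + (delta - 1)/((delta - 1)*z + 1 + alpha) * (-(1 - delta)) * du(x))

for z in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(L0u1(z) - delta**2 * L(delta*z)) < 1e-12
    assert abs(L1u2(z) - (1 - delta)**2 * L(1 - (1 - delta)*z)) < 1e-12
print("transformed operators consistent")
```

Note that $(\delta-1)z + 1 + \alpha = x + \alpha$ under the second map, which is why the singular coefficient transforms cleanly.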
8 Matrix Decomposition Algorithms for Poisson's Equation

8.1 Introduction

In this section, we consider the use of finite difference, finite element Galerkin and orthogonal spline collocation methods on uniform partitions for the solution of Poisson's equation in the unit square subject to Dirichlet boundary conditions:

\[
-\Delta u = f(x, y), \quad (x, y) \in \Omega, \qquad
u(x, y) = 0, \quad (x, y) \in \partial\Omega, \qquad (8.1)
\]

where $\Delta$ denotes the Laplacian and $\Omega = (0,1)^2$. Each method gives rise to a system of linear equations of the form

\[
(A \otimes B + B \otimes A)\mathbf{u} = \mathbf{f}, \qquad (8.2)
\]

where A and B are square matrices of order M, say, and $\mathbf{u}$ and $\mathbf{f}$ are vectors of order $M^2$ given by

\[
\mathbf{u} = [u_{1,1}, \ldots, u_{1,M}, \ldots, u_{M,1}, \ldots, u_{M,M}]^T, \qquad (8.3)
\]
\[
\mathbf{f} = [f_{1,1}, \ldots, f_{1,M}, \ldots, f_{M,1}, \ldots, f_{M,M}]^T, \qquad (8.4)
\]

and $\otimes$ denotes the tensor product; see Appendix D for the definition of $\otimes$ and its properties. A matrix decomposition algorithm is a fast direct method for solving systems of the form (8.2) which reduces the problem to one of solving a set of independent one-dimensional problems. We first develop a framework for matrix decomposition algorithms. To this end, suppose the real nonsingular matrix E is given, and assume that a real diagonal matrix $\Lambda$ and a real nonsingular matrix Z can be determined so that

\[
AZ = BZ\Lambda \qquad (8.5)
\]

and

\[
Z^T E B Z = I, \qquad (8.6)
\]

where I is the identity matrix of order M. Premultiplying (8.5) by $Z^T E$ and using (8.6), we obtain

\[
Z^T E A Z = \Lambda. \qquad (8.7)
\]

The system of equations (8.2) can then be written in the form

\[
(Z^T E \otimes I)(A \otimes B + B \otimes A)(Z \otimes I)(Z^{-1} \otimes I)\mathbf{u} = (Z^T E \otimes I)\mathbf{f}, \qquad (8.8)
\]

which becomes, on using the properties of $\otimes$, (8.6) and (8.7),

\[
(\Lambda \otimes B + I \otimes A)(Z^{-1} \otimes I)\mathbf{u} = (Z^T E \otimes I)\mathbf{f}. \qquad (8.9)
\]

From the preceding, we obtain the following algorithm for solving (8.2):

MATRIX DECOMPOSITION ALGORITHM

1. Determine the matrices $\Lambda$ and Z satisfying (8.5) and (8.6).
2. Compute $\mathbf{g} = (Z^T E \otimes I)\mathbf{f}$.
3. Solve $(\Lambda \otimes B + I \otimes A)\mathbf{v} = \mathbf{g}$.
4. Compute $\mathbf{u} = (Z \otimes I)\mathbf{v}$.
In the discretization methods considered in this section, the matrices $\Lambda$ and Z are known explicitly. If $\Lambda = \mathrm{diag}\{\lambda_j\}_{j=1}^{M}$, Step 3 reduces to a set of independent problems of the form

\[
(A + \lambda_j B)\mathbf{v}_j = \mathbf{g}_j, \quad j = 1, \ldots, M.
\]

It is shown that, for each method, these systems correspond to discretizations of two-point boundary value problems of the form

\[
-u'' + \lambda u = g(x), \quad x \in (0, 1), \qquad u(0) = u(1) = 0. \qquad (8.10)
\]

8.2 The Finite Difference Method

On the uniform partition with $h = 1/(N+1)$, $x_m = mh$, $y_n = nh$, the standard second-order finite difference approximation to (8.1) is defined by

\[
-\frac{u_{m-1,n} - 2u_{m,n} + u_{m+1,n}}{h^2}
-\frac{u_{m,n-1} - 2u_{m,n} + u_{m,n+1}}{h^2} = f(x_m, y_n), \quad m, n = 1, \ldots, N, \qquad (8.11)
\]

where $u_{0,n} = u_{N+1,n} = u_{m,0} = u_{m,N+1} = 0$. If we set M = N and introduce vectors $\mathbf{u}$ and $\mathbf{f}$ as in (8.3) and (8.4), respectively, with $f_{m,n} = f(x_m, y_n)$, then the finite difference equations (8.11) may be written in the form

\[
(A \otimes I + I \otimes A)\mathbf{u} = \mathbf{f}, \qquad (8.12)
\]

with A = J, where J is the tridiagonal matrix of order N given by (3.13). It is well known that (8.5) and (8.6) are satisfied if B = E = I,

\[
\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N), \qquad (8.13)
\]

where

\[
\lambda_j = \frac{4}{h^2}\sin^2\frac{j\pi h}{2}, \quad j = 1, \ldots, N, \qquad (8.14)
\]

and Z is the symmetric orthogonal matrix given by

\[
Z = S, \qquad (8.15)
\]

where

\[
S = \left(\frac{2}{N+1}\right)^{1/2}\left[\sin\frac{mn\pi}{N+1}\right]_{m,n=1}^{N}. \qquad (8.16)
\]

From the Matrix Decomposition Algorithm with $\Lambda$ and Z defined by (8.13) and (8.15), respectively, we obtain the following matrix decomposition algorithm for solving (8.12):

FINITE DIFFERENCE ALGORITHM

1. Compute $\mathbf{g} = (S \otimes I)\mathbf{f}$.
2. Solve $(\Lambda \otimes I + I \otimes A)\mathbf{v} = \mathbf{g}$.
3. Compute $\mathbf{u} = (S \otimes I)\mathbf{v}$.

Note that Steps 1 and 3 can be carried out using FFTs at a cost of $O(N^2 \log N)$ operations. Step 2 consists of N tridiagonal linear systems, each of which can be solved in O(N) operations, so that the total cost of the algorithm is $O(N^2 \log N)$ operations. Each tridiagonal system corresponds to the standard finite difference approximation to (8.10).
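For small N, the FINITE DIFFERENCE ALGORITHM can be transcribed directly into NumPy (a sketch: Steps 1 and 3 are done here as explicit multiplications by S tensor I rather than by FFTs):

```python
# Matrix decomposition solution of (8.12), checked against a direct solve.
import numpy as np

N = 15
h = 1.0 / (N + 1)
m = np.arange(1, N + 1)
J = (np.diag(2*np.ones(N)) - np.diag(np.ones(N-1), 1)
     - np.diag(np.ones(N-1), -1)) / h**2                # tridiagonal A = J
S = np.sqrt(2.0/(N+1)) * np.sin(np.outer(m, m) * np.pi / (N + 1))
lam = 4.0/h**2 * np.sin(m * np.pi * h / 2.0)**2         # (8.14)

x = m * h
f = np.outer(np.sin(np.pi*x), np.sin(2*np.pi*x)).reshape(-1)  # sample RHS

g = np.kron(S, np.eye(N)) @ f                 # Step 1
v = np.empty_like(g)
for j in range(N):                            # Step 2: N tridiagonal systems
    v[j*N:(j+1)*N] = np.linalg.solve(J + lam[j]*np.eye(N), g[j*N:(j+1)*N])
u = np.kron(S, np.eye(N)) @ v                 # Step 3

A2 = np.kron(J, np.eye(N)) + np.kron(np.eye(N), J)      # direct check of (8.12)
print(np.max(np.abs(A2 @ u - f)))
```

Since S is symmetric and orthogonal, $S \otimes I$ serves as both the transform and its inverse, which is what makes Steps 1 and 3 identical.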
8.3 Finite Element Galerkin Methods with Piecewise Bilinear Elements

To obtain the weak form of the Poisson problem (8.1), let $H_0^1(\Omega)$ denote the space of all piecewise continuously differentiable functions defined on $\Omega$ which vanish on $\partial\Omega$. Then, for all $v \in H_0^1(\Omega)$, u(x, y) satisfies

\[
-\int_\Omega \Delta u(x, y)\, v(x, y)\,dx\,dy = \int_\Omega f(x, y)\, v(x, y)\,dx\,dy,
\]

and hence, by Green's first identity,

\[
\int_\Omega \left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial x}
+ \frac{\partial u}{\partial y}\frac{\partial v}{\partial y}\right)dx\,dy
= \int_\Omega f(x, y)\, v(x, y)\,dx\,dy. \qquad (8.18)
\]

If we set

\[
(g_1, g_2) = \int_\Omega g_1(x, y)\cdot g_2(x, y)\,dx\,dy
\]

for functions $g_1$ and $g_2$, with the dot denoting scalar multiplication in the case of vector functions, then (8.18) can be written in the form

\[
(\nabla u, \nabla v) = (f, v) \quad \text{for all } v \in H_0^1(\Omega). \qquad (8.19)
\]

This is the weak form of (8.1) on which the finite element Galerkin method is based.

Now let $\{x_k\}_{k=0}^{N+1}$ be a uniform partition of the interval [0, 1], so that $x_k = kh$, $k = 0, \ldots, N+1$, where the stepsize $h = 1/(N+1)$. Let $\{w_n\}_{n=1}^{N}$ denote the standard basis for the space of piecewise linear functions defined on this partition which vanish at 0 and 1; see (3.10). Then the $C^0$ piecewise bilinear Galerkin approximation

\[
u_h(x, y) = \sum_{m=1}^{N}\sum_{n=1}^{N} u_{m,n}\, w_m(x)\, w_n(y)
\]
to the solution u of (8.1) is obtained by requiring that

\[
(\nabla u_h, \nabla v) = (f, v) \qquad (8.20)
\]

for all piecewise bilinear functions v which vanish on $\partial\Omega$. If M = N, and $\mathbf{u}$ and $\mathbf{f}$ are as in (8.3) and (8.4), respectively, with $f_{m,n} = (f, w_m w_n)$, then we obtain the linear system (8.2) in which the matrices A and B are given by (3.14). If E = I,

\[
\Lambda = \mathrm{diag}\left\{\lambda_j\Big/\left[1 - \tfrac{1}{6}h^2\lambda_j\right]\right\}_{j=1}^{N}, \qquad (8.21)
\]

and

\[
Z = S\,\mathrm{diag}\left\{\left(h\left[1 - \tfrac{1}{6}h^2\lambda_j\right]\right)^{-1/2}\right\}_{j=1}^{N}, \qquad (8.22)
\]

where $\lambda_j$ and S are given by (8.14) and (8.16), respectively, then (8.5) and (8.6) are satisfied. Thus, with $\Lambda$ and Z defined by (8.21) and (8.22), respectively, we have the following matrix decomposition algorithm:

FINITE ELEMENT GALERKIN ALGORITHM

1. Compute $\mathbf{g} = (Z^T \otimes I)\mathbf{f}$.
2. Solve $(\Lambda \otimes B + I \otimes A)\mathbf{v} = \mathbf{g}$.
3. Compute $\mathbf{u} = (Z \otimes I)\mathbf{v}$.

As in the finite difference method, Step 2 involves the solution of N tridiagonal systems, and consequently the computational cost of the algorithm is $O(N^2 \log N)$ operations. In this case, each tridiagonal system arises from the Galerkin approximation of (8.10).
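The pair (8.21)-(8.22) can be checked numerically; the matrices (3.14) are not reproduced in this chunk, so the check below assumes (our assumption) that they are the standard uniform-mesh piecewise linear stiffness and mass matrices of Appendix A:

```python
# Verify A Z = B Z Lam (8.5) and Z^T B Z = I (8.6) with E = I.
import numpy as np

N = 12
h = 1.0 / (N + 1)
j = np.arange(1, N + 1)
A = (np.diag(2*np.ones(N)) - np.diag(np.ones(N-1), 1)
     - np.diag(np.ones(N-1), -1)) / h                   # stiffness (assumed (3.14))
B = h * (np.diag(4*np.ones(N)) + np.diag(np.ones(N-1), 1)
         + np.diag(np.ones(N-1), -1)) / 6               # mass (assumed (3.14))

lam = 4.0/h**2 * np.sin(j*np.pi*h/2.0)**2               # (8.14)
S = np.sqrt(2.0/(N+1)) * np.sin(np.outer(j, j)*np.pi/(N+1))
Lam = np.diag(lam / (1 - h**2/6 * lam))                 # (8.21)
Z = S @ np.diag((h * (1 - h**2/6 * lam))**-0.5)         # (8.22)

print(np.max(np.abs(A @ Z - B @ Z @ Lam)),
      np.max(np.abs(Z.T @ B @ Z - np.eye(N))))
```

Both residuals should be at rounding level; the diagonal scaling in (8.22) is exactly what normalizes the generalized eigenvectors so that (8.6) holds.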
8.4 Orthogonal Bicubic Spline Collocation Method

Again, let $\{x_k\}_{k=0}^{N+1}$ be a uniform partition of the interval [0, 1], and let $\{\phi_n\}_{n=1}^{M}$, where M = 2N+2, be a basis for the space of piecewise Hermite cubics on this partition which vanish at 0 and 1, given by

\[
\phi_n = v_n, \quad n = 1, \ldots, N, \qquad
\phi_{N+n+1} = s_n, \quad n = 0, \ldots, N+1. \qquad (8.23)
\]
This is the standard basis, but the ordering of the basis functions is nonstandard. The Hermite bicubic orthogonal collocation approximation

\[
u_h(x, y) = \sum_{m=1}^{M}\sum_{n=1}^{M} u_{m,n}\,\phi_m(x)\,\phi_n(y)
\]

to the solution u of (8.1) is obtained by requiring that

\[
-\Delta u_h(\xi_m, \xi_n) = f(\xi_m, \xi_n), \quad m, n = 1, \ldots, M. \qquad (8.24)
\]

With $\mathbf{u}$ and $\mathbf{f}$ as in (8.3) and (8.4), respectively, where $f_{m,n} = f(\xi_m, \xi_n)$, equations (8.24) can be written in the form (8.2) with

\[
A = (a_{mn})_{m,n=1}^{M}, \quad a_{mn} = -\phi_n''(\xi_m), \qquad
B = (b_{mn})_{m,n=1}^{M}, \quad b_{mn} = \phi_n(\xi_m). \qquad (8.25)
\]
Let

\[
\Lambda = \mathrm{diag}(\lambda_1^-, \ldots, \lambda_N^-, \lambda_0, \lambda_1^+, \ldots, \lambda_N^+, \lambda_{N+1}), \qquad (8.26)
\]

where

\[
\lambda_j^{\mp} = 12\left(\frac{8 + \sigma_j \mp \zeta_j}{7 - \sigma_j}\right)h^{-2}, \quad j = 1, \ldots, N, \qquad
\lambda_0 = 36h^{-2}, \qquad \lambda_{N+1} = 12h^{-2},
\]

and

\[
\sigma_j = \cos\left(\frac{j\pi}{N+1}\right), \qquad
\zeta_j = \left(43 + 40\sigma_j - 2\sigma_j^2\right)^{1/2}.
\]
To describe the matrix Z, let

\[
\Gamma^{\mp} = \mathrm{diag}(\gamma_1^{\mp}, \ldots, \gamma_N^{\mp}), \qquad
\Theta^{\mp} = \mathrm{diag}(\theta_1^{\mp}, \ldots, \theta_N^{\mp}), \qquad
\widehat{\Theta}^{+} = \mathrm{diag}(1, \theta_1^{+}, \ldots, \theta_N^{+}, 1/\sqrt{3}),
\]

where

\[
\gamma_j^{\mp} = (5 + 4\sigma_j \mp \zeta_j)/\nu_j^{\mp}, \qquad
\theta_j^{\mp} = 18\sin\left(\frac{j\pi}{N+1}\right)\Big/\nu_j^{\mp},
\]

and

\[
\nu_j^{\mp} = \left\{27(1+\sigma_j)(8+\sigma_j\mp\zeta_j)^2
+ (1-\sigma_j)(11+7\sigma_j\mp 4\zeta_j)^2\right\}^{1/2}.
\]

Then

\[
Z = 3\sqrt{3}
\begin{bmatrix}
S\Gamma^- & \bigl[\,0 \;\; S\Gamma^+ \;\; 0\,\bigr] \\
\widehat{C}\,\Theta^- & C\,\widehat{\Theta}^+
\end{bmatrix}, \qquad (8.27)
\]
where 0 denotes the N-dimensional zero column vector, S is given by (8.16), and

\[
C = \left(\frac{2}{N+1}\right)^{1/2}\left[\cos\frac{mn\pi}{N+1}\right]_{m,n=0}^{N+1}, \qquad
\widehat{C} = \left(\frac{2}{N+1}\right)^{1/2}\left[\cos\frac{mn\pi}{N+1}\right]_{m=0,\,n=1}^{N+1,\,N}.
\]
It can be shown that if A, B, $\Lambda$, Z are given by (8.25), (8.26), (8.27), and $E = B^T$, then equations (8.5) and (8.6) are satisfied. Thus, from the Matrix Decomposition Algorithm, we obtain:

ORTHOGONAL SPLINE COLLOCATION ALGORITHM

1. Compute $\mathbf{g} = (Z^T B^T \otimes I)\mathbf{f}$.
2. Solve $(\Lambda \otimes B + I \otimes A)\mathbf{v} = \mathbf{g}$.
3. Compute $\mathbf{u} = (Z \otimes I)\mathbf{v}$.

Since there are at most four nonzero elements in each row of the matrix $B^T$, the matrix-vector multiplications involving $B^T$ in Step 1 require a total of $O(N^2)$ arithmetic operations. From (8.27), it follows that FFT routines can be used to perform multiplications by the matrix $Z^T$ in Step 1 and by the matrix Z in Step 3, the corresponding cost of each step being $O(N^2 \log N)$ operations. Step 2 involves the solution of M independent almost block diagonal linear systems with coefficient matrices of the form (4.7), arising from the orthogonal spline collocation approximation of (8.10), which can be solved in a total of $O(N^2)$ operations. Thus the total cost of this algorithm is also $O(N^2 \log N)$ operations.
References

1. S. Agmon, Lectures on Elliptic Boundary-Value Problems, Van Nostrand, Princeton, New Jersey, 1965.

2. A. Akyurtlu, J. F. Akyurtlu, C. E. Hamrin Jr. and G. Fairweather, Reformulation and the numerical solution of the equations for a catalytic, porous wall, gas-liquid reactor, Computers Chem. Eng., 10 (1986), 361-365.

3. P. Amodio, J. R. Cash, G. Roussos, R. W. Wright, G. Fairweather, I. Gladwell, G. L. Kraut and M. Paprzycki, Almost block diagonal linear systems: sequential and parallel solution techniques, and applications, Numerical Linear Algebra with Applications, to appear.

4. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov and D. Sorensen, LAPACK Users' Guide, SIAM Publications, Philadelphia, 1995.

5. U. Ascher and R. D. Russell, Reformulation of boundary value problems into standard form, SIAM Rev., 23 (1981), 238-254.

6. A. Aziz and T. Y. Na, Squeezing flow of a viscous fluid between elliptic plates, J. Comput. Appl. Math., 7 (1981), 115-119.

7. D. Bhattacharyya, M. Jevtitch, J. T. Schrodt and G. Fairweather, Prediction of membrane separation characteristics by pore distribution measurements and surface force-pore flow model, Chem. Eng. Commun., 42 (1986), 111-128.

8. C. de Boor, A Practical Guide to Splines, Applied Math. Sciences 27, Springer-Verlag, New York, 1978.

9. C. de Boor and R. Weiss, SOLVEBLOK: A package for solving almost block diagonal linear systems, ACM Trans. Math. Software, 6 (1980), 80-87.

10. C. de Boor and R. Weiss, Algorithm 546: SOLVEBLOK, ACM Trans. Math. Software, 6 (1980), 88-91.

11. J. C. Diaz, G. Fairweather and P. Keast, FORTRAN packages for solving certain almost block diagonal linear systems by modified alternate row and column elimination, ACM Trans. Math. Software, 9 (1983), 358-375.

12. J. C. Diaz, G. Fairweather and P. Keast, Algorithm 603: COLROW and ARCECO: FORTRAN packages for solving certain almost block diagonal linear systems by modified alternate row and column elimination, ACM Trans. Math. Software, 9 (1983), 376-380.

13. J. Douglas Jr. and T. Dupont, Collocation Methods for Parabolic Equations in a Single Space Variable, Lecture Notes in Mathematics 385, Springer-Verlag, New York, 1974.

14. J. Douglas Jr., T. Dupont and L. Wahlbin, Optimal $L_\infty$ error estimates for Galerkin approximations to solutions of two-point boundary value problems, Math. Comp., 29 (1975), 475-483.

15. K. S. Denison, C. E. Hamrin Jr. and G. Fairweather, Solution of boundary value problems using software packages: DD04AD and COLSYS, Chem. Eng. Commun., 22 (1983), 1-9.

16. E. J. Doedel, Finite difference collocation methods for nonlinear two point boundary value problems, SIAM J. Numer. Anal., 16 (1979), 173-185.

17. G. Fairweather, Finite Element Galerkin Methods for Differential Equations, Lecture Notes in Pure and Applied Mathematics, Vol. 34, Marcel Dekker, New York, 1978.

18. G. Fairweather and M. Vedha-Nayagam, An assessment of numerical software for solving two-point boundary value problems arising in heat transfer, Numerical Heat Transfer, 11 (1987), 281-293.

19. H. B. Keller, Numerical Methods for Two-Point Boundary Value Problems, Dover, New York, 1992.

20. T. Y. Na and I. Pop, Free convection flow past a vertical flat plate embedded in a saturated porous medium, Int. J. Engrg. Sci., 21 (1983), 517-526.

21. T. Y. Na and S. C. Tang, A method for the solution of conduction heat transfer with nonlinear heat generation, Z. Angew. Math. Mech., 49 (1969), 45-52.

22. R. S. Stepleman, Tridiagonal fourth order approximations to general two-point nonlinear boundary value problems with mixed boundary conditions, Math. Comp., 30 (1976), 92-103.

23. J. M. Varah, Alternate row and column elimination for solving certain linear systems, SIAM J. Numer. Anal., 13 (1976), 71-75.

24. M. F. Wheeler, A Galerkin procedure for estimating the flux for two-point boundary value problems, SIAM J. Numer. Anal., 11 (1974), 764-768.
Appendix A. Galerkin Matrices for Piecewise Linear Functions.

For the matrix $A = ((w_i', w_j'))$:

\[
(w_i', w_i') = \int_0^1 [w_i'(x)]^2\,dx
= \int_{x_{i-1}}^{x_i} \frac{dx}{(x_i - x_{i-1})^2} + \int_{x_i}^{x_{i+1}} \frac{dx}{(x_{i+1} - x_i)^2}
= \frac{1}{h_i} + \frac{1}{h_{i+1}};
\]

\[
(w_i', w_{i+1}') = -\int_{x_i}^{x_{i+1}} \frac{dx}{(x_{i+1} - x_i)^2} = -\frac{1}{h_{i+1}};
\qquad
(w_i', w_{i-1}') = -\frac{1}{h_i}.
\]

For the matrix $B = ((w_i, w_j))$:

\[
(w_i, w_i) = \int_{x_{i-1}}^{x_i} \left(\frac{x - x_{i-1}}{x_i - x_{i-1}}\right)^2 dx
+ \int_{x_i}^{x_{i+1}} \left(\frac{x - x_{i+1}}{x_i - x_{i+1}}\right)^2 dx
= \frac{1}{3}\left[h_i + h_{i+1}\right];
\]

\[
(w_i, w_{i+1}) = \int_{x_i}^{x_{i+1}} \left(\frac{x_{i+1} - x}{x_{i+1} - x_i}\right)\left(\frac{x - x_i}{x_{i+1} - x_i}\right)dx
= \frac{1}{h_{i+1}^2}\int_{x_i}^{x_{i+1}} (x_{i+1} - x)(x - x_i)\,dx
= \frac{1}{6}h_{i+1}.
\]

Similarly,

\[
(w_i, w_{i-1}) = \frac{1}{6}h_i.
\]
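The entries above can be confirmed by quadrature (our verification sketch): the integrands are piecewise quadratics, so Simpson's rule on each subinterval evaluates the inner products exactly, while the midpoint rule handles the (piecewise constant) derivative products without evaluating w' at the mesh points, where it jumps:

```python
# Checking the piecewise linear Galerkin entries on a nonuniform mesh.
mesh = [0.0, 0.3, 0.55, 1.0]           # x_0 < x_1 < x_2 < x_3

def hat(i, x):
    if mesh[i-1] <= x <= mesh[i]:
        return (x - mesh[i-1]) / (mesh[i] - mesh[i-1])
    if mesh[i] < x <= mesh[i+1]:
        return (mesh[i+1] - x) / (mesh[i+1] - mesh[i])
    return 0.0

def dhat(i, x):
    if mesh[i-1] <= x <= mesh[i]:
        return 1.0 / (mesh[i] - mesh[i-1])
    if mesh[i] < x <= mesh[i+1]:
        return -1.0 / (mesh[i+1] - mesh[i])
    return 0.0

def simpson(f, g):                     # exact for piecewise quadratics
    return sum((b - a)/6 * (f(a)*g(a) + 4*f(m)*g(m) + f(b)*g(b))
               for a, b in zip(mesh[:-1], mesh[1:]) for m in [(a + b)/2])

def midpoint(f, g):                    # exact for piecewise constants
    return sum((b - a) * f(m)*g(m)
               for a, b in zip(mesh[:-1], mesh[1:]) for m in [(a + b)/2])

h1, h2 = mesh[1] - mesh[0], mesh[2] - mesh[1]
w1, w2 = (lambda x: hat(1, x)), (lambda x: hat(2, x))
d1, d2 = (lambda x: dhat(1, x)), (lambda x: dhat(2, x))

assert abs(simpson(w1, w1) - (h1 + h2)/3) < 1e-14   # (w_i, w_i)
assert abs(simpson(w1, w2) - h2/6) < 1e-14          # (w_i, w_{i+1})
assert abs(midpoint(d1, d1) - (1/h1 + 1/h2)) < 1e-14
assert abs(midpoint(d1, d2) + 1/h2) < 1e-14         # (w_i', w_{i+1}') = -1/h_{i+1}
print("Appendix A entries confirmed")
```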
Appendix B. Galerkin Matrices for Piecewise Hermite Cubic Functions.

We consider the case in which the partition of I is uniform, and let $h = x_{i+1} - x_i$, $i = 0, \ldots, N$. The conditioning of the matrices is improved if the slope functions are normalized by dividing by h. Thus we define

\[
\tilde{s}_i(x) = h^{-1} s_i(x), \quad i = 0, \ldots, N+1.
\]

If the Galerkin solution $u_h$ is expressed in the form

\[
u_h(x) = \sum_{i=0}^{N+1} \left[\alpha_i^{(1)} v_i(x) + \alpha_i^{(2)} \tilde{s}_i(x)\right],
\]

then

\[
\alpha_i^{(1)} = u_h(x_i),
\]

as before, but now

\[
\alpha_i^{(2)} = h\,u_h'(x_i), \quad i = 0, \ldots, N+1.
\]

If A and B are partitioned as in (3.16) then, with the normalization of the slope functions, we have, if $A = ((w_i', w_j')) \equiv (A_{ij})$ and $B = ((w_i, w_j)) \equiv (B_{ij})$,

\[
A_{00} = (30h)^{-1}\begin{bmatrix} 36 & 3 \\ 3 & 4 \end{bmatrix}, \qquad
A_{ii} = (15h)^{-1}\begin{bmatrix} 36 & 0 \\ 0 & 4 \end{bmatrix}, \quad i = 1, \ldots, N,
\]
\[
A_{N+1,N+1} = (30h)^{-1}\begin{bmatrix} 36 & -3 \\ -3 & 4 \end{bmatrix}, \qquad
A_{i,i+1} = A_{i+1,i}^T = (30h)^{-1}\begin{bmatrix} -36 & 3 \\ -3 & -1 \end{bmatrix}, \quad i = 0, \ldots, N,
\]

and

\[
B_{00} = \frac{h}{210}\begin{bmatrix} 78 & 11 \\ 11 & 2 \end{bmatrix}, \qquad
B_{ii} = \frac{h}{105}\begin{bmatrix} 78 & 0 \\ 0 & 2 \end{bmatrix}, \quad i = 1, \ldots, N,
\]
\[
B_{N+1,N+1} = \frac{h}{210}\begin{bmatrix} 78 & -11 \\ -11 & 2 \end{bmatrix}, \qquad
B_{i,i+1} = B_{i+1,i}^T = \frac{h}{420}\begin{bmatrix} 54 & -13 \\ 13 & -3 \end{bmatrix}, \quad i = 0, \ldots, N.
\]
Appendix C. Elements of Piecewise Hermite Cubic Orthogonal Spline Collocation Matrices.

On $[x_i, x_{i+1}]$, with $x_i = ih$,

\[
v_i(x) = -2\left(\frac{x_{i+1}-x}{h}\right)^3 + 3\left(\frac{x_{i+1}-x}{h}\right)^2,
\]
\[
v_i'(x) = \frac{6}{h^3}(x_{i+1}-x)^2 - \frac{6}{h^2}(x_{i+1}-x),
\]
\[
v_i''(x) = -\frac{12}{h^3}(x_{i+1}-x) + \frac{6}{h^2}.
\]

From (4.5), $\xi_{2i+1} = x_i + \frac{1}{2}h(1+\rho_1)$, where $\rho_1 = -1/\sqrt{3}$. Then, if $x = \xi_{2i+1}$,

\[
x_{i+1} - x = \tfrac{1}{2}h(1-\rho_1)
\]

and

\[
v_i(\xi_{2i+1}) = -\tfrac{1}{4}(1-\rho_1)^3 + \tfrac{3}{4}(1-\rho_1)^2 = \tfrac{1}{4}(1-\rho_1)^2(2+\rho_1),
\]
\[
v_i'(\xi_{2i+1}) = \frac{3}{2h}(1-\rho_1)^2 - \frac{3}{h}(1-\rho_1) = -\frac{3}{2h}(1-\rho_1^2),
\]
\[
v_i''(\xi_{2i+1}) = -\frac{6}{h^2}(1-\rho_1) + \frac{6}{h^2} = \frac{6\rho_1}{h^2}.
\]

Similar expressions hold for $v_i(\xi_{2i+2})$, $v_i'(\xi_{2i+2})$, $v_i''(\xi_{2i+2})$, where $\xi_{2i+2} = x_i + \frac{1}{2}h(1+\rho_2)$, $\rho_2 = -\rho_1$.
On $[x_i, x_{i+1}]$,

\[
v_{i+1}(x) = -2\left(\frac{x-x_i}{h}\right)^3 + 3\left(\frac{x-x_i}{h}\right)^2,
\]
\[
v_{i+1}'(x) = -\frac{6}{h^3}(x-x_i)^2 + \frac{6}{h^2}(x-x_i),
\]
\[
v_{i+1}''(x) = -\frac{12}{h^3}(x-x_i) + \frac{6}{h^2}.
\]

If $x = \xi_{2i+1}$,

\[
x - x_i = \tfrac{1}{2}h(1+\rho_1).
\]

Therefore,

\[
v_{i+1}(\xi_{2i+1}) = -\tfrac{1}{4}(1+\rho_1)^3 + \tfrac{3}{4}(1+\rho_1)^2 = \tfrac{1}{4}(1+\rho_1)^2(2-\rho_1),
\]
\[
v_{i+1}'(\xi_{2i+1}) = -\frac{3}{2h}(1+\rho_1)^2 + \frac{3}{h}(1+\rho_1) = \frac{3}{2h}(1-\rho_1^2),
\]
\[
v_{i+1}''(\xi_{2i+1}) = -\frac{6}{h^2}(1+\rho_1) + \frac{6}{h^2} = -\frac{6\rho_1}{h^2},
\]

with similar expressions for $v_{i+1}(\xi_{2i+2})$, $v_{i+1}'(\xi_{2i+2})$, $v_{i+1}''(\xi_{2i+2})$.
With $\tilde{s}_i = h^{-1}s_i$ as in Appendix B, on $[x_i, x_{i+1}]$,

\[
\tilde{s}_i(x) = -\left[\left(\frac{x_{i+1}-x}{h}\right)^3 - \left(\frac{x_{i+1}-x}{h}\right)^2\right],
\]
\[
\tilde{s}_i'(x) = \frac{3}{h^3}(x_{i+1}-x)^2 - \frac{2}{h^2}(x_{i+1}-x),
\]
\[
\tilde{s}_i''(x) = -\left[\frac{6}{h^3}(x_{i+1}-x) - \frac{2}{h^2}\right].
\]

With $x_{i+1} - x = \frac{1}{2}h(1-\rho_1)$, we obtain

\[
\tilde{s}_i(\xi_{2i+1}) = -\left[\tfrac{1}{8}(1-\rho_1)^3 - \tfrac{1}{4}(1-\rho_1)^2\right] = \tfrac{1}{8}(1-\rho_1)^2(1+\rho_1),
\]
\[
\tilde{s}_i'(\xi_{2i+1}) = \frac{3}{4h}(1-\rho_1)^2 - \frac{1}{h}(1-\rho_1)
= \frac{1}{4h}(1-\rho_1)\left[3(1-\rho_1) - 4\right]
= -\frac{1}{4h}(1-\rho_1)(1+3\rho_1),
\]
\[
\tilde{s}_i''(\xi_{2i+1}) = -\left[\frac{3}{h^2}(1-\rho_1) - \frac{2}{h^2}\right] = -\frac{1}{h^2}(1-3\rho_1),
\]

with similar expressions for $\tilde{s}_i(\xi_{2i+2})$, $\tilde{s}_i'(\xi_{2i+2})$, and $\tilde{s}_i''(\xi_{2i+2})$.
On $[x_i, x_{i+1}]$,

\[
\tilde{s}_{i+1}(x) = \left(\frac{x-x_i}{h}\right)^3 - \left(\frac{x-x_i}{h}\right)^2,
\]
\[
\tilde{s}_{i+1}'(x) = \frac{3}{h^3}(x-x_i)^2 - \frac{2}{h^2}(x-x_i),
\]
\[
\tilde{s}_{i+1}''(x) = \frac{6}{h^3}(x-x_i) - \frac{2}{h^2}.
\]

Then, with $x - x_i = \frac{1}{2}h(1+\rho_1)$, we have

\[
\tilde{s}_{i+1}(\xi_{2i+1}) = \tfrac{1}{8}(1+\rho_1)^3 - \tfrac{1}{4}(1+\rho_1)^2 = -\tfrac{1}{8}(1+\rho_1)^2(1-\rho_1),
\]
\[
\tilde{s}_{i+1}'(\xi_{2i+1}) = \frac{3}{4h}(1+\rho_1)^2 - \frac{1}{h}(1+\rho_1) = -\frac{1}{4h}(1+\rho_1)(1-3\rho_1),
\]
\[
\tilde{s}_{i+1}''(\xi_{2i+1}) = \frac{1}{h^2}\left[3(1+\rho_1) - 2\right] = \frac{1}{h^2}(1+3\rho_1),
\]

with similar expressions for $\tilde{s}_{i+1}(\xi_{2i+2})$, $\tilde{s}_{i+1}'(\xi_{2i+2})$, $\tilde{s}_{i+1}''(\xi_{2i+2})$.
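These values can be spot-checked numerically (our verification, assuming only the formulas derived above):

```python
# Evaluating the Hermite cubics at the Gauss point xi_{2i+1} and comparing
# with the closed forms of Appendix C.
import math

h = 0.37                               # arbitrary mesh spacing
xi0 = 0.25                             # x_i
rho1 = -1.0 / math.sqrt(3.0)
x = xi0 + 0.5 * h * (1 + rho1)         # the collocation point xi_{2i+1}
A = xi0 + h - x                        # x_{i+1} - x

v_i   = -2*(A/h)**3 + 3*(A/h)**2
dv_i  = 6/h**3 * A**2 - 6/h**2 * A
d2v_i = -12/h**3 * A + 6/h**2

assert abs(v_i - 0.25*(1 - rho1)**2 * (2 + rho1)) < 1e-12
assert abs(dv_i - (-1.5/h * (1 - rho1**2))) < 1e-12
assert abs(d2v_i - 6*rho1/h**2) < 1e-12

s_i   = -((A/h)**3 - (A/h)**2)         # normalized slope function s_i / h
ds_i  = 3/h**3 * A**2 - 2/h**2 * A
d2s_i = -6/h**3 * A + 2/h**2

assert abs(s_i - 0.125*(1 - rho1)**2 * (1 + rho1)) < 1e-12
assert abs(ds_i - (-0.25/h * (1 - rho1)*(1 + 3*rho1))) < 1e-12
assert abs(d2s_i - (-(1 - 3*rho1)/h**2)) < 1e-12
print("Hermite values at xi_{2i+1} confirmed")
```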
Appendix D. Properties of the Tensor Product.

If $A = (a_{ij})$ is an $M \times M$ matrix and B is an $N \times N$ matrix, then the tensor product of A and B, $A \otimes B$, is the $MN \times MN$ block matrix whose (i, j) block is $a_{ij}B$. The tensor product has the following properties:

\[
A \otimes (B + C) = A \otimes B + A \otimes C;
\]
\[
(B + C) \otimes A = B \otimes A + C \otimes A;
\]
\[
(A \otimes B)(C \otimes D) = AC \otimes BD, \quad \text{provided the matrix products are defined.}
\]
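The mixed-product rule, which underlies the derivation of (8.9), is easy to confirm with NumPy's `kron` (our illustration):

```python
# Confirming (A x B)(C x D) = AC x BD for random matrices.
import numpy as np

rng = np.random.default_rng(0)
A, C = rng.standard_normal((2, 3, 3))
B, D = rng.standard_normal((2, 4, 4))

lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
print(np.max(np.abs(lhs - rhs)))
```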