
10 Matrix norms and Conditioning

10.1 Matrix norms


10.1.1 Definition of a matrix norm
The (Euclidean) norm of a vector is well known: $\|a\|^2 = a^T a$.
For matrices various norms exist, but the one in which we are interested is expressible in terms of the norm of a vector. Consider the vector x as well as Ax. Normally Ax points in a different direction from x and it has a different length. In fact x and Ax may even lie in different spaces if A is rectangular. If we only consider the lengthening of the vector x and disregard its direction change, we may calculate
$$R = \frac{\|Ax\|}{\|x\|}.$$
We now test A on all possible vectors x, of course excluding x = 0. It should be clear
that R does not depend on the particular length of x, but only on its direction, since
if we double x both the numerator and the denominator are doubled.
If we consider the value of R for all possible directions of x, it should attain a maximum for some direction. We now define the norm of A as the maximum lengthening that A can produce, or
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}. \qquad (10.1)$$
Note that $\|A\|^2$ may be expressed as
$$\|A\|^2 = \max_{x \neq 0} \frac{x^T A^T A x}{x^T x}. \qquad (10.2)$$
The expression
$$\frac{x^T B x}{x^T x}$$
is known as the Rayleigh quotient of the matrix B with respect to the vector x. The square of the norm of a matrix A is therefore the maximum Rayleigh quotient of $A^T A$.
Since the length of x is unimportant, it suffices to consider only x with unit length, so that we may also write
$$\|A\| = \max_{\|x\| = 1} \|Ax\|. \qquad (10.3)$$
It is usually easier to work with the square of the norm,
$$\|A\|^2 = \max_{\|x\| = 1} \|Ax\|^2 = \max_{\|x\| = 1} x^T A^T A x. \qquad (10.4)$$
We shall mostly make use of (10.4) as the definition.
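As a quick numerical illustration of (10.4), here is a small sketch (assuming NumPy; the matrix is an arbitrary example) that samples random unit vectors and compares the largest lengthening found with the exact norm:

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Sample random unit vectors and record the largest lengthening ||Ax||.
best = 0.0
for _ in range(100_000):
    x = rng.standard_normal(2)
    x /= np.linalg.norm(x)               # restrict x to the unit circle
    best = max(best, np.linalg.norm(A @ x))

print(best)                  # approaches ||A|| from below
print(np.linalg.norm(A, 2))  # the exact 2-norm of A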
10.1.2 The norm and singular values
Calculating the norm:
Consider the SVD of A,
$$A = U \Sigma V^T,$$
then
$$A^T A = V \Lambda V^T, \qquad \text{with } \Lambda = \Sigma^T \Sigma.$$
Recall that $\Lambda$ is square and has eigenvalues $\sigma_1^2, \sigma_2^2, \ldots$
Then
$$\|A\|^2 = \max_{\|x\| = 1} \left( x^T A^T A x \right) = \max_{\|x\| = 1} \left( (x^T V) \Lambda (V^T x) \right).$$
Let $y = V^T x$. Since V is orthogonal, $\|y\| = \|x\| = 1$, and then
$$\|A\|^2 = \max_{\|y\| = 1} \left( y^T \Lambda y \right).$$
Let $y = [y_1\ y_2\ \ldots\ y_n]^T$, then
$$\|A\|^2 = \max_{\|y\| = 1} \left( [y_1\ y_2\ \ldots\ y_n] \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \right) = \max_{\|y\| = 1} \left( \sigma_1^2 y_1^2 + \sigma_2^2 y_2^2 + \ldots + \sigma_n^2 y_n^2 \right) = \sigma_1^2,$$
when $y_1 = 1$ and all the other $y_j = 0$.
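The derivation can be checked numerically; the following sketch (assuming NumPy, with an arbitrary random matrix) confirms that the eigenvalues of $A^T A$ are the squared singular values and that $\|A\|^2 = \sigma_1^2$:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))               # any rectangular matrix

sigma = np.linalg.svd(A, compute_uv=False)    # singular values, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]       # eigenvalues of A^T A, descending

print(lam)                                    # equals sigma**2
print(sigma**2)
print(np.linalg.norm(A, 2)**2)                # equals sigma[0]**2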
Geometric interpretation of the norm:
You may recall that a (2×2)-matrix acting on the unit vectors (i.e. the circle with radius one) produces an ellipse. What is the greatest lengthening the matrix can produce? It is the major semi-axis of the ellipse, which is in fact the largest singular value of the matrix.
A 3×3 matrix acting on the unit vectors in $\mathbb{R}^3$ will map the unit sphere to an ellipsoid in 3D space. Figure 1 shows the unit sphere and also the ellipsoid that is produced when the matrix
$$A = \begin{bmatrix} 0.5 & 0.1 & 1.6 \\ 0.7 & -0.1 & -1.5 \\ 0.8 & -1.0 & -2.5 \end{bmatrix}$$
maps the unit sphere into 3D space. The singular values of A are $\sigma_1 = 3.509$, $\sigma_2 = 0.967$, and $\sigma_3 = 0.460$. These are exactly the lengths of the three semi-axes of the ellipsoid.
Once again, the greatest lengthening the matrix can produce occurs when it sends a vector to the longest semi-axis of the ellipsoid. In general,
$$\|A\| = \sigma_1.$$
Figure 1. The transformation ellipsoid for A.
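As a check, a short sketch (assuming NumPy, and using the matrix A as printed above) computes the singular values and the norm directly:

import numpy as np

A = np.array([[0.5,  0.1,  1.6],
              [0.7, -0.1, -1.5],
              [0.8, -1.0, -2.5]])

print(np.linalg.svd(A, compute_uv=False))  # approx [3.509, 0.967, 0.460]
print(np.linalg.norm(A, 2))                # ||A|| = sigma_1 = 3.509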
The norm of the inverse:
If A is an invertible square matrix, then $A^{-1}$ exists, and A has no zero singular values. Then the singular values of $A^{-1}$ can be found from
$$A^{-1} = (U \Sigma V^T)^{-1} = V \Sigma^{-1} U^T.$$
If the singular values of A in descending order are
$$\sigma_1, \sigma_2, \ldots, \sigma_n,$$
then the singular values of $A^{-1}$ in descending order are
$$\frac{1}{\sigma_n}, \frac{1}{\sigma_{n-1}}, \ldots, \frac{1}{\sigma_1}.$$
The largest singular value of $A^{-1}$ is therefore the reciprocal of the smallest singular value of A.
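A short numerical confirmation (a sketch assuming NumPy, reusing the matrix of Figure 1):

import numpy as np

A = np.array([[0.5,  0.1,  1.6],
              [0.7, -0.1, -1.5],
              [0.8, -1.0, -2.5]])

s = np.linalg.svd(A, compute_uv=False)                    # sigma_1 >= ... >= sigma_n
s_inv = np.linalg.svd(np.linalg.inv(A), compute_uv=False)

print(s_inv)          # approx [1/0.460, 1/0.967, 1/3.509]
print(1.0 / s[::-1])  # reciprocals of A's singular values, reversed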
Matrices with norm = 1:
Any matrix that maps the unit sphere to the unit sphere has a norm of 1. In particular, for the identity matrix
$$\|I\| = 1.$$
For orthogonal matrices (rotations and reflections) we have $Q^T Q = I$, so that
$$\|Q\|^2 = \max_{\|x\| = 1} \left( x^T I x \right) = 1.$$
Consider a projection matrix. The greatest lengthening a projection can produce is for vectors already in the projection space. Such vectors retain their length after projection, so that the norm of a projection matrix is
$$\|P\| = 1.$$
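Both facts are easy to verify numerically; in this sketch (assuming NumPy) the rotation matrix and the projection matrix are arbitrary illustrations:

import numpy as np

# A rotation (orthogonal) matrix has norm 1:
t = 0.7
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
print(np.linalg.norm(Q, 2))  # 1.0

# An orthogonal projection onto the column space of M also has norm 1:
M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
P = M @ np.linalg.inv(M.T @ M) @ M.T
print(np.linalg.norm(P, 2))  # 1.0: vectors already in C(M) keep their length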
4 Matrix norms and Conditioning
10.1.3 Inequalities for norms
The following inequalities are often employed when working with vector norms.
The Triangle inequality:
$$\|x + y\| \le \|x\| + \|y\| \qquad (10.5)$$
Scaling:
$$\|\alpha x\| = |\alpha|\,\|x\| \qquad (10.6)$$
For matrices we also have
The Triangle inequality:
$$\|A + B\| \le \|A\| + \|B\| \qquad (10.7)$$
Scaling:
$$\|\alpha A\| = |\alpha|\,\|A\| \qquad (10.8)$$
These rules follow directly from the same rules for vector norms.
From the definition of a matrix norm it follows that
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}, \qquad (10.9)$$
but $\|x\|$ is positive, so that we may multiply both sides of the equation by it and retain the notion of a maximum,
$$\|A\|\,\|x\| = \max_{x \neq 0} \|Ax\|. \qquad (10.10)$$
Turning this into an inequality that holds for any x, we get
$$\|A\|\,\|x\| \ge \|Ax\|. \qquad (10.11)$$
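Inequality (10.11) is worth testing on a few random vectors; a minimal sketch (assuming NumPy):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
normA = np.linalg.norm(A, 2)

for _ in range(5):
    x = rng.standard_normal(3)
    # ||Ax|| never exceeds ||A|| ||x||; equality holds only along
    # the maximizing direction of A.
    print(np.linalg.norm(A @ x) <= normA * np.linalg.norm(x))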
10.2 The Condition of a Matrix
10.2.1 Ill-conditioned systems
An (n×n)-system of linear equations, Ax = b, has a solution only if $b \in C(A)$, and that solution is unique only if A is nonsingular. If the matrix A is singular, i.e. det(A) = 0, then it has no solution for most values of b.
What if the matrix is close to singular? In a strict mathematical sense it still has only one unique solution, but this solution may be difficult to find numerically. What is more important is that a small change in b results in a large change in x. The following is an example: we solve $Ax_1 = b_1$ and also $Ax_2 = b_2$, with
$$A = \begin{bmatrix} 13 & 1 \\ 1 & 0.077 \end{bmatrix}, \qquad b_1 = \begin{bmatrix} 14 \\ 1.077 \end{bmatrix}, \qquad b_2 = \begin{bmatrix} 14 \\ 1 \end{bmatrix}.$$
The solutions are
$$x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} 78 \\ -1000 \end{bmatrix}.$$
One may understand why the solutions differ so much in spite of the fact that the right-hand vectors differ by only 0.077 in one element. The two equations of the first system represent two lines of the form
$$y = -13x + 14$$
$$y = -12.987x + 13.987$$
Note that it is the matrix that is responsible for determining the slopes of the lines, and it is the right-hand vector that determines the y-intercepts of the lines. These lines have almost the same slopes, therefore the position of their intersection will change drastically if the y-intercept of either equation is changed only slightly. The system of equations is said to be ill-conditioned.
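The effect is easy to reproduce; a sketch (assuming NumPy) solving both systems:

import numpy as np

A = np.array([[13.0, 1.0],
              [1.0, 0.077]])
b1 = np.array([14.0, 1.077])
b2 = np.array([14.0, 1.0])

print(np.linalg.solve(A, b1))  # approx [1, 1]
print(np.linalg.solve(A, b2))  # approx [78, -1000]
print(np.linalg.cond(A))       # large (about 1.7e5): ill-conditioned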
10.2.2 Determinants and conditioning
One might argue that a square system where the matrix has a zero determinant cannot have a unique solution. So, if the determinant is close to zero, it may be difficult to find a solution numerically. Note that det(A) = 0.001 in the example above. The determinant is indeed small, i.e. close to zero. Does this signify ill-conditioning?
Consider another matrix C with a small determinant as well. Find the solutions of $Cx_1 = b_1$ and $Cx_2 = b_2$, where
$$C = \begin{bmatrix} 0.04 & 0.02 \\ -0.01 & 0.02 \end{bmatrix}, \qquad b_1 = \begin{bmatrix} 14 \\ 1.077 \end{bmatrix}, \qquad b_2 = \begin{bmatrix} 14 \\ 1 \end{bmatrix}.$$
Here, det(C) = 0.001 as well, but the solutions are
$$x_1 = \begin{bmatrix} 258.46 \\ 183.08 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} 260 \\ 180 \end{bmatrix}.$$
A matrix with small elements has a small determinant. The best matrix to use for creating a system to be solved is the identity matrix, whose determinant is one. However, the 2×2 matrix 0.001I is just as good a matrix, but its determinant is 0.000001. We must conclude that the determinant alone says nothing about the condition of a matrix. We need a different number ascribed to the matrix, one that measures the sensitivity of the solution of a system with that matrix with respect to small changes in the right-hand vector.
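This can be made concrete by comparing determinants and condition numbers side by side; a sketch (assuming NumPy) for the two matrices above and for 0.001I:

import numpy as np

A = np.array([[13.0, 1.0], [1.0, 0.077]])
C = np.array([[0.04, 0.02], [-0.01, 0.02]])

for name, M in [("A", A), ("C", C), ("0.001*I", 0.001 * np.eye(2))]:
    print(name, np.linalg.det(M), np.linalg.cond(M))
# A:       det = 0.001, cond ~ 1.7e5 -> ill-conditioned
# C:       det = 0.001, cond ~ 2    -> well-conditioned
# 0.001*I: det = 1e-6,  cond = 1    -> perfectly conditioned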
10.2.3 The condition number
A perturbed system:
Consider a square system of equations,
$$Ax = b. \qquad (10.12)$$
We shall only be concerned with small changes in the right-hand vector b. (Small changes in the elements of the matrix A constitute yet another problem that may be discussed, but that case will not be covered here.)
If the right-hand vector is perturbed to $b + \delta b$, the solution also changes, to $x + \delta x$, i.e.
$$A(x + \delta x) = b + \delta b. \qquad (10.13)$$
If (10.13) is multiplied out and (10.12) is subtracted, we obtain
$$A\,\delta x = \delta b. \qquad (10.14)$$
It is obvious that if we double b in length, then the solution x is also doubled in length; and if we double the perturbation $\delta b$, then the error $\delta x$ in the solution is doubled. We are therefore only interested in the relative errors in x and b. Furthermore, it is the norms of the relative errors that are important, therefore we shall specifically consider
$$\text{relative error in } x = \frac{\|\delta x\|}{\|x\|}$$
and
$$\text{relative error in } b = \frac{\|\delta b\|}{\|b\|}.$$
Definition of the condition number:
We shall define the condition number of a matrix A as the number cond(A) such that
$$\frac{\|\delta x\|}{\|x\|} \le \text{cond}(A)\, \frac{\|\delta b\|}{\|b\|}. \qquad (10.15)$$
When solving a perturbed system Ax = b, the relative error in x may be several times the relative error in b. The condition number says how many times.
A formula for the condition number:
Take the norm of Ax = b and use inequality (10.11):
$$\|A\|\,\|x\| \ge \|b\|. \qquad (10.16)$$
Invert $A\,\delta x = \delta b$,
$$\delta x = A^{-1}\,\delta b,$$
then take the norm and use inequality (10.11) as well:
$$\|A^{-1}\|\,\|\delta b\| \ge \|\delta x\|. \qquad (10.17)$$
Since both (10.16) and (10.17) consist of positive numbers, we may multiply the larger sides together and the product will still be larger than the product of the smaller sides, hence
$$\|A\|\,\|x\|\,\|A^{-1}\|\,\|\delta b\| \ge \|b\|\,\|\delta x\|.$$
Divide by $\|x\|\,\|b\|$ and rearrange,
$$\frac{\|\delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\, \frac{\|\delta b\|}{\|b\|}.$$
Comparing this to (10.15), it is clear that
$$\text{cond}(A) = \|A\|\,\|A^{-1}\|. \qquad (10.18)$$
Expressed in terms of the singular values of A, we have
$$\text{cond}(A) = \frac{\sigma_1}{\sigma_n}, \qquad (10.19)$$
where $\sigma_n$ is the smallest singular value. The condition number gives an indication of how elongated the transformation ellipsoid is, regardless of its actual size. The transformation ellipsoid of 2A has semi-axes that are twice as long as those of the transformation ellipsoid of A, but the condition number is a ratio of two singular values, so that the absolute scaling disappears.
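Formulas (10.18) and (10.19), and the scale invariance, can all be checked at once; a sketch (assuming NumPy, reusing the matrix of Figure 1):

import numpy as np

A = np.array([[0.5,  0.1,  1.6],
              [0.7, -0.1, -1.5],
              [0.8, -1.0, -2.5]])

s = np.linalg.svd(A, compute_uv=False)

print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))  # (10.18)
print(s[0] / s[-1])                                                # (10.19)
print(np.linalg.cond(A))                                           # approx 7.6225
print(np.linalg.cond(2 * A))                                       # unchanged by scaling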
Condition numbers of good matrices:
The best matrices for solving a system of equations are those that have all their singular values the same. Such a matrix maps the unit sphere to another sphere. Since the largest and smallest singular values are the same, the condition number will be one. A condition number cannot be smaller than one.
An orthogonal matrix may be singular-value-decomposed as
$$Q = \underbrace{Q}_{U}\,\underbrace{I}_{\Sigma}\,\underbrace{I^T}_{V^T}.$$
The U for an orthogonal matrix is therefore the matrix itself, and its singular values are all ones. Then cond(Q) = 1.
Also, any multiple $\alpha I$ of the identity matrix has cond$(\alpha I) = 1$.
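A two-line check (a sketch assuming NumPy; the rotation angle is an arbitrary choice):

import numpy as np

t = 1.2
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
print(np.linalg.cond(Q))              # 1.0 for any orthogonal matrix
print(np.linalg.cond(5 * np.eye(3)))  # 1.0 for any multiple of I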
10.2.4 Good and bad algorithms
If a matrix is ill-conditioned, there is nothing one can do about it. The solution of the system is simply sensitive to small changes in the right-hand vector. Often the small changes come in the form of round-off errors introduced by the floating-point system used by the computer.
However, in solving a system one can use a bad algorithm that allows the round-off errors to accumulate faster than other algorithms do. For example, if one solves the system by doing LU-decomposition of A, and then solving Lc = b followed by Ux = c, one may actually create two systems that are worse to solve than the original one. Consider for example again the matrix A (used in Figure 1) as well as its
LU-decomposition,
$$A = \begin{bmatrix} 0.5 & 0.1 & 1.6 \\ 0.7 & -0.1 & -1.5 \\ 0.8 & -1.0 & -2.5 \end{bmatrix} = \begin{bmatrix} 1.0000 & 0 & 0 \\ 1.4000 & 1.0000 & 0 \\ 1.6000 & 4.8333 & 1.0000 \end{bmatrix} \begin{bmatrix} 0.5000 & 0.1000 & 1.6000 \\ 0 & -0.2400 & -3.7400 \\ 0 & 0 & 13.0167 \end{bmatrix}$$
We have
$$\text{cond}(A) = 7.6225, \qquad \text{cond}(L) = 39.7260, \qquad \text{cond}(U) = 60.3774.$$
In solving the system by LU-decomposition we have created two systems with worse condition numbers than the original problem. The fact that the diagonal elements of U differ so much in absolute magnitude is already a sign that we are solving a bad system. (In practice, LU-decomposition is usually done with partial pivoting, which improves the performance of the algorithm.)
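The factor condition numbers can be reproduced with an unpivoted LU-decomposition; the helper below is a small sketch (assuming NumPy), written out by hand because library routines such as scipy.linalg.lu pivot by default and would return different factors:

import numpy as np

def lu_nopivot(A):
    # Doolittle elimination without pivoting, as in the text.
    U = A.astype(float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

A = np.array([[0.5,  0.1,  1.6],
              [0.7, -0.1, -1.5],
              [0.8, -1.0, -2.5]])

L, U = lu_nopivot(A)
print(np.linalg.cond(A))  # approx 7.6225
print(np.linalg.cond(L))  # approx 39.73: worse than A itself
print(np.linalg.cond(U))  # approx 60.38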
Can QR-decomposition do better? Recall: let A = QR; then Ax = b becomes (after premultiplying by $Q^T$) $Rx = Q^T b$. The problem is decomposed into one matrix multiplication on a vector, $Q^T b$, and the solution of one triangular system, Rx = (new right-hand side).
Even when multiplying a matrix on a vector, the condition number controls the maximum relative error. But cond$(Q^T) = 1$, so that the first part does not aggravate the problem.
For the second part, the condition number of R can be found as follows: $R = Q^T A$, and $R^{-1} = (Q^T A)^{-1} = A^{-1} Q$, then
$$\|R\|^2 = \max_{\|x\| = 1} x^T A^T Q Q^T A x = \max_{\|x\| = 1} x^T A^T A x = \|A\|^2,$$
and since the norm of the transpose is the same as the norm of the matrix for square matrices,
$$\|R^{-1}\|^2 = \|(R^{-1})^T\|^2 = \max_{\|x\| = 1} x^T A^{-1} Q Q^T (A^{-1})^T x = \max_{\|x\| = 1} x^T A^{-1} (A^{-1})^T x = \|A^{-1}\|^2.$$
Therefore
$$\text{cond}(R) = \text{cond}(A).$$
This means that the second part of the problem is no worse than the original problem. Using QR-decomposition to solve a system is therefore better in terms of round-off error, especially when the original problem is ill-conditioned.
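A quick verification of cond(R) = cond(A) (a sketch assuming NumPy, reusing the matrix of Figure 1):

import numpy as np

A = np.array([[0.5,  0.1,  1.6],
              [0.7, -0.1, -1.5],
              [0.8, -1.0, -2.5]])

Q, R = np.linalg.qr(A)
print(np.linalg.cond(Q))  # 1.0
print(np.linalg.cond(R))  # equals cond(A)
print(np.linalg.cond(A))  # approx 7.6225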
An example:
Solve Ax = b, with
$$A = \begin{bmatrix} -4.0 & 1.0 & 7.0 \\ 3.0 & 2.0 & -2.5 \\ 7.0 & 4.0 & -6.5 \end{bmatrix}, \qquad b = \begin{bmatrix} 4.0 \\ 2.5 \\ 4.5 \end{bmatrix}.$$
Check that the correct solution is
$$x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
The problem is extremely ill-conditioned, because $\text{cond}(A) = 4.1092 \times 10^{17}$.
Let us first use standard LU-decomposition,
$$U = \begin{bmatrix} -4.00 & 1.00 & 7.00 \\ 0 & 2.75 & 2.75 \\ 0 & 0 & 9.992 \times 10^{-16} \end{bmatrix}, \qquad L = \begin{bmatrix} 1.0000 & 0 & 0 \\ -0.7500 & 1.0000 & 0 \\ -1.7500 & 2.0909 & 1.0000 \end{bmatrix}$$
with $\text{cond}(U) = 1.0776 \times 10^{33}$ and $\text{cond}(L) = 7.9806$.
Solving Lc = b gives
$$c = \begin{bmatrix} 4.0 \\ 5.5 \\ 4.4409 \times 10^{-16} \end{bmatrix}$$
and solving Ux = c gives
$$x = \begin{bmatrix} 10.809 \times 10^{16} \\ -7.206 \times 10^{16} \\ 7.206 \times 10^{16} \end{bmatrix}.$$
Let us now use QR-decomposition:
$$Q = \begin{bmatrix} -0.4650 & 0.8819 & 0.0782 \\ 0.3487 & 0.2636 & -0.8994 \\ 0.8137 & 0.3909 & 0.4301 \end{bmatrix}, \qquad R = \begin{bmatrix} 8.6023 & 3.4874 & -9.4161 \\ 0 & 2.9729 & 2.9729 \\ 0 & 0 & 0.0000 \end{bmatrix}$$
with $\text{cond}(Q) = 1$ and $\text{cond}(R) = 9.9847 \times 10^{16}$.
Multiply out $Q^T b$,
$$b' = Q^T b = \begin{bmatrix} 2.6737 \\ 5.9457 \\ 2.498 \times 10^{-16} \end{bmatrix}$$
and solve $Rx = b'$,
$$x = \begin{bmatrix} 0.8649 \\ 1.0900 \\ 0.9100 \end{bmatrix},$$
which is much closer to the exact solution $x = [1\ 1\ 1]^T$.
To summarize: ill-conditioned problems should rather be solved by more stable solution
methods, such as QR-decomposition.
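The whole example can be reproduced in a few lines; this sketch (assuming NumPy) follows the QR route, noting that the exact floating-point digits depend on the QR implementation and rounding, so they may differ slightly from the values quoted above:

import numpy as np

A = np.array([[-4.0, 1.0,  7.0],
              [ 3.0, 2.0, -2.5],
              [ 7.0, 4.0, -6.5]])
b = np.array([4.0, 2.5, 4.5])  # constructed so that x = [1, 1, 1] exactly

print(np.linalg.cond(A))       # astronomically large: A is singular in exact arithmetic

Q, R = np.linalg.qr(A)
# R[2, 2] is tiny round-off rather than an exact zero, so the triangular
# solve goes through and the result stays close to [1, 1, 1].
x = np.linalg.solve(R, Q.T @ b)
print(x)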