Contents

1 Matrix Operations  1
  1.1 Examples of linear equations  1
  1.2 Basic matrix operations  3
  1.3 Transpose and adjoint  10
  1.4 Elementary row operations  13
  1.5 Row reduced echelon form  15
  1.6 Determinant  20
  1.7 Computing inverse of a matrix  23
2 Rank and Linear Equations  27
4 Orthogonalization  87
  4.1 Inner products  87
  4.2 Gram-Schmidt orthogonalization  91
  4.3 Best approximation  97
  4.4 QR factorization and least squares  110
Short Bibliography  123
Index  124
1 Matrix Operations

1.1 Examples of linear equations
Linear equations are everywhere, starting from mental arithmetic problems to advanced defense applications. We start with an example. The system of linear equations
    x1 + x2 = 3
    x1 - x2 = 1
has a unique solution x1 = 2, x2 = 1. Substituting these values for the unknowns, we
see that the equations are satisfied; but why are there no other solutions? Well, we
have not merely guessed this solution; we have solved it! The details are as follows.
Suppose the pair (x1, x2) is a solution of the system. Subtracting the second
equation from the first, we get another equation: 2x2 = 2. It implies x2 = 1.
Then from either of the equations, we get x1 = 2. To proceed systematically,
we would like to replace the
original system with the following:
x1 + x2 = 3
x2 = 1
Substituting x2 = 1 in the first equation of the new system, we get x1 = 2. In fact,
substituting these values of x1 and x2 , we see that the original equation is satisfied.
Convinced? The only solution of the system is x1 = 2, x2 = 1. What about the
system
    x1 + x2 = 3
    x1 - x2 = 1
    2x1 - x2 = 3
The first two equations have a unique solution and that satisfies the third. Hence this
system also has a unique solution x1 = 2, x2 = 1. So the extra equation does not put
any constraint on the solutions that we obtained earlier.
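Such a unique solution is easy to confirm numerically. Here is a minimal sketch using numpy (an assumption: the text itself uses no software, and any linear-algebra library would do):

```python
import numpy as np

# The system  x1 + x2 = 3,  x1 - x2 = 1  in matrix form.
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([3.0, 1.0])

# The coefficient matrix is invertible, so the solution is unique.
x = np.linalg.solve(A, b)
print(x)  # -> [2. 1.]
```

The third equation 2x1 - x2 = 3 adds no constraint: substituting x1 = 2, x2 = 1 satisfies it as well.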
But what about our systematic solution method? We aim at eliminating the first
unknown from all but the first equation. We replace the second equation with the one
obtained by second minus the first. We also replace the third by third minus twice
the first. It results in
    x1 + x2 = 3
    -2x2 = -2
    -3x2 = -3

Notice that the second and the third equations are multiples of each other;
each of them says x2 = 1. Hence the conclusion. We
give another twist. Consider the system
    x1 + x2 = 3
    x1 - x2 = 1
    2x1 + x2 = 3
The first two equations again have the same solution x1 = 2, x2 = 1. But this time, the
third is not satisfied by these values of the unknowns. So, the system has no solution.
Also, by using our elimination method, we obtain the equations:

    x1 + x2 = 3
    -2x2 = -2
    -x2 = -3

The last two equations are inconsistent: the second gives x2 = 1, while the
third gives x2 = 3. So, the original system has no solution.
Finally, instead of adding another equation, we drop one. Consider the linear
equation
x1 + x2 = 3
having only one equation. The old solution x1 = 2, x2 = 1 is still a solution of this
system. But x1 = 1, x2 = 2 is also a solution. Moreover, since x1 = 3 - x2,
assigning any real number to x2 gives a corresponding value for x1, which
together give a solution. Thus, it has infinitely many solutions. Notice that
the same conclusion holds if we have more equations, each a multiple of the
only given equation.
For example,
x1 + x2 = 3
2x1 + 2x2 = 6
3x1 + 3x2 = 9
We see that what really matters is not the number of equations, but the number
of independent equations.
Caution: the notion of independent equations has not yet been made precise;
nonetheless, we have some working idea.
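The working idea of "independent equations" will later be made precise through the rank of the coefficient matrix. As a preview sketch (numpy assumed; `matrix_rank` is not introduced in the text yet):

```python
import numpy as np

# Coefficient matrix of the three equations
#   x1 + x2 = 3,  2x1 + 2x2 = 6,  3x1 + 3x2 = 9.
A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])

# Three equations, but only one independent one.
print(np.linalg.matrix_rank(A))  # -> 1
```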
It is also not very clear when a system of equations has a solution, a unique
solution, infinitely many solutions, or no solution at all. And why can a
system of equations not have more than one but finitely many solutions? How do
we use our elimination method to obtain infinitely many solutions? To answer
these questions, we will introduce matrices. Matrices will help us represent
the problem in a compact way and will also lead to a definitive answer. We
will also study the eigenvalue problem for matrices, which comes up often in
applications. These concerns will allow us to represent matrices in elegant
forms.
1.2 Basic matrix operations
As usual, R denotes the set of all real numbers and C denotes the set of all complex
numbers. We will write F for either R or C. The numbers in F will also be referred
to as scalars.
A matrix is a rectangular array of symbols. For us these symbols are real numbers
or, in general, complex numbers. The individual numbers in the array are called the
entries of the matrix. Each entry of a matrix is a scalar. The number of rows and the
number of columns in any matrix are necessarily positive integers. A matrix with m
rows and n columns is called an m×n matrix, and it may be written as

    A = [ a11 ... a1n ]
        [  :        : ]
        [ am1 ... amn ]

or as A = [aij] for short, with aij ∈ F for i = 1, . . . , m and
j = 1, . . . , n. The number aij, which occurs at the ith row and the jth
column, is referred to as the (i, j)th entry of the matrix [aij].
Any matrix with m rows and n columns will be referred to as an m×n matrix. The
set of all m×n matrices with entries from F will be denoted by F^{m×n}.
A row vector of size n is a matrix in F^{1×n}. Similarly, a column vector of
size n is a matrix in F^{n×1}. The vectors in F^{1×n} (row vectors) will be
written as [a1, . . . , an] or as [a1 ... an]. Two matrices A = [aij] and
B = [bij] of the same size are equal iff aij = bij for each i ∈ {1, . . . , m}
and for each j ∈ {1, . . . , n}. Thus matrices of different sizes are unequal.
The zero matrix is a matrix each entry of which is 0. We write 0 for all zero
matrices of all sizes. The size is to be understood from the context.
Let A = [aij] ∈ F^{n×n} be a square matrix of order n. The entries aii are
called the diagonal entries of A. The diagonal of A consists of all diagonal
entries; the first entry on the diagonal is a11, and the last diagonal entry
is ann. The entries of A which are not on the diagonal are called the
off-diagonal entries of A; they are aij for i ≠ j. In the following matrix,
the diagonal consists of the entries 1, 3, 5:

    [ 1 2 3 ]
    [ 2 3 4 ]
    [ 3 4 5 ]

Here, 1 is the first diagonal entry, 3 is the second diagonal entry, and 5 is
the third and last diagonal entry.
The super-diagonal of a matrix consists of the entries just above the
diagonal. That is, the entries ai,i+1, for i = 1, . . . , n-1, constitute the
super-diagonal of an n×n matrix A = [aij]. In the following matrix, the
super-diagonal consists of the entries 2 and 4:

    [ 1 2 3 ]
    [ 2 3 4 ]
    [ 3 4 5 ]
If all off-diagonal entries of A are 0, then A is said to be a diagonal matrix. Only
a square matrix can be a diagonal matrix. There is a way to generalize this notion to
any matrix, but we do not require it. Notice that the diagonal entries in a diagonal
matrix need not all be nonzero. For example, the zero matrix of order n is also a
diagonal matrix. The following is a diagonal matrix; we follow the convention
of not showing the 0 entries off the diagonal:

    [ 1     ]   [ 1 0 0 ]
    [   3   ] = [ 0 3 0 ]
    [     0 ]   [ 0 0 0 ]
A diagonal matrix whose diagonal entries are all equal is called a scalar
matrix. For example, the following is a scalar matrix of order 4:

    [ 3       ]
    [   3     ]
    [     3   ]
    [       3 ]

It is also written as diag(3, 3, 3, 3). If A, B ∈ F^{m×m} and A is a scalar
matrix, then AB = BA. Conversely, if A ∈ F^{m×m} is such that AB = BA for all
B ∈ F^{m×m}, then A must be a scalar matrix. This fact is not obvious, and its
proof will require much more than discussed until now.
A matrix A ∈ F^{m×n} is said to be upper triangular iff all its entries below
the diagonal are zero. That is, A = [aij] is upper triangular when aij = 0 for
i > j. In writing such a matrix, we simply do not show the zero entries below
the diagonal. Similarly, a matrix is called lower triangular iff all its
entries above the diagonal are zero. Both upper triangular and lower
triangular matrices are referred to as triangular matrices. A diagonal matrix
is both upper and lower triangular. The following are examples of a lower
triangular matrix L and an upper triangular matrix U, both of order 3:

    L = [ 1     ]        U = [ 1 2 3 ]
        [ 2 3   ] ,          [   3 4 ]
        [ 3 4 5 ]            [     5 ]
Sum of two matrices of the same size is a matrix whose entries are obtained by
adding the corresponding entries in the given two matrices. That is, if A = [ai j ] and
B = [bij] are in F^{m×n}, then

    A + B = [aij + bij] ∈ F^{m×n}.

For example,

    [ 1 2 3 ]   [ 3 1 2 ]   [ 4 3 5 ]
    [ 2 3 1 ] + [ 2 1 3 ] = [ 4 4 4 ]
We informally say that matrices are added entry-wise. Matrices of different sizes can
never be added.
It then follows that
A + B = B + A.
Similarly, matrices can be multiplied by a scalar entry-wise. If
A = [aij] ∈ F^{m×n} and α ∈ F, then

    αA = [α aij] ∈ F^{m×n}.

Therefore, a scalar matrix with α on the diagonal is written as αI. Notice
that

    A + 0 = 0 + A = A

for all matrices A ∈ F^{m×n}, with an implicit understanding that
0 ∈ F^{m×n}. For A = [aij], the matrix -A ∈ F^{m×n} is taken as the one whose
(i, j)th entry is -aij. Thus

    -A = (-1)A   and   A + (-A) = -A + A = 0.

For example,

    3 [ 1 2 3 ]   [ 3 1 2 ]   [ 0 5 7 ]
      [ 2 3 1 ] - [ 2 1 3 ] = [ 4 8 0 ]
The addition and scalar multiplication as defined above satisfy the following
properties. Let A, B, C ∈ F^{m×n}. Let α, β ∈ F.
1. A + B = B + A.
2. (A + B) + C = A + (B + C).
3. A + 0 = 0 + A = A.
4. A + (-A) = (-A) + A = 0.
5. α(βA) = (αβ)A.
6. α(A + B) = αA + αB.
7. (α + β)A = αA + βA.
8. 1·A = A.
Notice that whatever we discuss here for matrices applies to row vectors and
column vectors, in particular. But remember that a row vector cannot be added
to a column vector, unless both are of size 1×1, when both become numbers
in F.
Another operation that we have on matrices is multiplication of matrices, which
is a bit involved. Let A = [aik] ∈ F^{m×n} and B = [bkj] ∈ F^{n×r}. Then their
product AB is the matrix [cij] ∈ F^{m×r}, where the entries are

    cij = ai1 b1j + · · · + ain bnj = Σ_{k=1}^{n} aik bkj.
Notice that the matrix product AB is defined only when the number of columns in A
is equal to the number of rows in B.
A particular case might be helpful. Suppose A is a row vector in F^{1×n} and B
is a column vector in F^{n×1}. Then their product AB ∈ F^{1×1}; it is a matrix
of size 1×1. Often we will identify such matrices with numbers. The product
now looks like:

                    [ b1 ]
    [ a1 · · · an ] [  : ] = [ a1 b1 + · · · + an bn ].
                    [ bn ]
This is helpful in visualizing the general case, which looks like:

    [ a11 ... a1n ]                           [ c11 ... c1j ... c1r ]
    [  :        : ] [ b11 ... b1j ... b1r ]   [  :       :        : ]
    [ ai1 ... ain ] [  :       :        : ] = [ ci1 ... cij ... cir ]
    [  :        : ] [ bn1 ... bnj ... bnr ]   [  :       :        : ]
    [ am1 ... amn ]                           [ cm1 ... cmj ... cmr ]
The ith row of A multiplied with the jth column of B gives the (i, j)th entry in AB.
Thus to get AB, you have to multiply all m rows of A with all r columns of B. Besides
writing a linear system in compact form, we will see later why matrix multiplication
is defined this way. For example,

    [ 1 2 3 ] [ 1 0 ]   [  4  5 ]
    [ 4 5 6 ] [ 0 1 ] = [ 10 11 ] ,
              [ 1 1 ]

and

              [ 1 ]            [ 1 ]             [  3  6 1 ]
    [ 3 6 1 ] [ 2 ] = [ 19 ] , [ 2 ] [ 3 6 1 ] = [  6 12 2 ] .
              [ 4 ]            [ 4 ]             [ 12 24 4 ]
It shows clearly that matrix multiplication is not commutative. Commutativity can
break down due to various reasons. First of all when AB is defined, BA may not be
defined. Secondly, even when both AB and BA are defined, they may not be of the
same size; and thirdly, even when they are of the same size, they need not be equal.
For example,
1 2 0 1
4 7
0 1 1 2
2 3
=
but
=
.
2 3 2 3
6 11
2 3 2 3
8 13
It does not mean that AB is never equal to BA. There can be some particular
matrices A and B, both in F^{n×n}, such that AB = BA.
Observe that if A ∈ F^{m×n}, then A In = A and Im A = A. Look at the columns
of In in this product. They say that

    A ej = the jth column of A,   for j = 1, . . . , n.

Here, ej is the jth standard basis vector, the jth column of the identity
matrix of order n; its jth component is 1 and all other components are 0.
Also, directly multiplying A with ej, we see that

    A ej = [ a11 ... a1j ... a1n ] [ 0 ]   [ a1j ]
           [  :       :        : ] [ : ]   [  :  ]
           [ ai1 ... aij ... ain ] [ 1 ] = [ aij ] = the jth column of A.
           [  :       :        : ] [ : ]   [  :  ]
           [ am1 ... amj ... amn ] [ 0 ]   [ amj ]
Thus A can be written in block form as

    A = [ Ae1 · · · Aej · · · Aen ].

Unlike numbers, the product of two nonzero matrices can be a zero matrix. For
example,

    [ 1 0 ] [ 0 0 ]   [ 0 0 ]
    [ 0 0 ] [ 0 1 ] = [ 0 0 ] .
It is easy to verify the following properties of matrix multiplication:
1. If A ∈ F^{m×n}, B ∈ F^{n×r} and C ∈ F^{r×p}, then (AB)C = A(BC).
2. If A, B ∈ F^{m×n} and C ∈ F^{n×r}, then (A + B)C = AC + BC.
3. If A ∈ F^{m×n} and B, C ∈ F^{n×r}, then A(B + C) = AB + AC.
4. If α ∈ F, A ∈ F^{m×n} and B ∈ F^{n×r}, then α(AB) = (αA)B = A(αB).
You can see matrix multiplication in a block form. Suppose A ∈ F^{m×n} and
B ∈ F^{n×r}. Write the ith row of a matrix as Ai* and its kth column as A*k.
Then we can write a matrix as a row of its columns, and also as a column of
its rows:

    A = [ A*1 · · · A*n ] ,   B = [ B1* ]
                                  [  :  ]
                                  [ Bn* ] .

With this notation, the product can be computed block-wise:

    AB = [ AB*1 · · · AB*r ] = [ A1* B ]
                               [   :   ]
                               [ Am* B ] .

When writing this way, we ignore the extra brackets [ and ].
Powers of square matrices can be defined inductively by taking

    A^0 = I   and   A^n = A A^{n-1} for n ∈ N.
Example 1.1

    Let A = [ 1 1 0 ]                   [ 1 n n(n-1) ]
            [ 0 1 2 ] . Show that A^n = [ 0 1   2n   ] for n ∈ N.
            [ 0 0 1 ]                   [ 0 0    1   ]

We use induction on n. The basis case n = 1 is obvious. Suppose A^n is as
given. Now,

    A^{n+1} = A A^n = [ 1 1 0 ] [ 1 n n(n-1) ]   [ 1 n+1 (n+1)n ]
                      [ 0 1 2 ] [ 0 1   2n   ] = [ 0  1  2(n+1) ] .
                      [ 0 0 1 ] [ 0 0    1   ]   [ 0  0     1   ]

Notice that taking n = 0 in the formula for A^n, we see that A^0 = I.
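The closed form in Example 1.1 is easy to spot-check for small n; a sketch with numpy (assumed available):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 2],
              [0, 0, 1]])

def formula(n):
    # Claimed closed form for A**n from Example 1.1.
    return np.array([[1, n, n * (n - 1)],
                     [0, 1, 2 * n],
                     [0, 0, 1]])

for n in range(6):
    assert (np.linalg.matrix_power(A, n) == formula(n)).all()
```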
A square matrix A of order m is called invertible iff there exists a matrix B of
order m such that
AB = I = BA.
Such a matrix B is called an inverse of A. If C is another inverse of A, then
C = CI = C(AB) = (CA)B = IB = B.
Therefore, the inverse of a matrix, when it exists, is unique; it is denoted
by A^{-1}. We talk of invertibility of square matrices only, and not all
square matrices are invertible. For example, I is invertible but 0 is not. If
AB = 0 for nonzero square matrices A and B, then neither A nor B is
invertible.
If both A, B ∈ F^{n×n} are invertible, then (AB)^{-1} = B^{-1}A^{-1}. Reason:

    (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I,  and
    (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = I.
Invertible matrices play a crucial role in solving linear systems uniquely. We will
come back to the issue later.
Exercises for 1.2
2. Let Eij be the n×n matrix whose (i, j)th entry is 1 and all other entries
   are 0. Show that each A = [aij] ∈ C^{n×n} can be expressed as
   A = Σ_{i=1}^{n} Σ_{j=1}^{n} aij Eij. Also show that Eij Ekm = 0 if j ≠ k,
   and Eij Ejm = Eim.
3. Let A ∈ C^{m×n}, B ∈ C^{n×p}. Let B1, . . . , Bp be the columns of B. Show
   that AB1, . . . , ABp are the columns of AB.
4. Let A ∈ C^{m×n}, B ∈ C^{n×p}. Let A1, . . . , Am be the rows of A. Show
   that A1B, . . . , AmB are the rows of AB.
5. Construct two 3×3 matrices A and B such that AB = 0 but BA ≠ 0.
1.3 Transpose and adjoint

The transpose of a matrix A = [aij] ∈ F^{m×n} is the matrix in F^{n×m}
obtained by interchanging the rows and columns of A; it is denoted by A^t, and
its (i, j)th entry is aji. For example,

    A = [ 1 2 3 ]        A^t = [ 1 2 ]
        [ 2 3 1 ] ,            [ 2 3 ]
                               [ 3 1 ] .
It then follows that the transpose of the transpose is the original matrix.
The following are some of the properties of this operation of transpose.
1. (A^t)^t = A.
2. (A + B)^t = A^t + B^t.
3. (αA)^t = α A^t.
4. (AB)^t = B^t A^t.
5. If A is invertible, then A^t is invertible, and (A^t)^{-1} = (A^{-1})^t.
In the above properties, we assume that the operations are allowed; that is,
in (2), A and B must be of the same size. Similarly, in (4), the number of
columns in A must be equal to the number of rows in B; and in (5), A must be a
square matrix.
It is easy to see all the above properties, except perhaps the fourth one. For
this, let A ∈ F^{m×n} and B ∈ F^{n×r}. Now, the (j, i)th entry in (AB)^t is
the (i, j)th entry in AB, and it is given by

    ai1 b1j + · · · + ain bnj.

On the other side, the (j, i)th entry in B^t A^t is obtained by multiplying
the jth row of B^t with the ith column of A^t. This is the same as multiplying
the entries in the jth column of B with the corresponding entries in the ith
row of A, and then taking the sum. Thus it is

    b1j ai1 + · · · + bnj ain.
This is the same as computed earlier.
The fifth one follows from the fourth one and the fact that (AB)1 = B1 A1 .
Observe that transpose of a lower triangular matrix is an upper triangular matrix,
and vice versa.
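Property (4) can also be spot-checked numerically. A sketch with numpy (assumed) and random integer matrices of mismatched sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))

# (AB)^t = B^t A^t; note that A^t B^t is not even defined here,
# since the sizes 3x2 and 4x3 do not match.
assert ((A @ B).T == B.T @ A.T).all()
```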
Close to the operation of transpose of a matrix is the adjoint. Let
A = [aij] ∈ F^{m×n}. The adjoint of A is denoted by A*, and is defined by

    the (i, j)th entry of A* = the complex conjugate of the (j, i)th entry of A.

We write ᾱ for the complex conjugate of a scalar α; that is, the conjugate of
a + ib is a - ib. Thus, if aij ∈ R, then its conjugate is aij itself; so when
A has only real entries, A* = A^t. Also, the ith column of A^t is the column
vector (ai1, . . . , ain)^t. For example,

    A = [ 1 2 3 ]        A* = [ 1 2 ]
        [ 2 3 1 ] ,           [ 2 3 ]
                              [ 3 1 ] ;

    A = [ 1+i 2  3  ]        A* = [ 1-i  2  ]
        [ 2   3 1-i ] ,           [ 2    3  ]
                                  [ 3   1+i ] .
Similar to the transpose, the adjoint satisfies the following properties:
1. (A*)* = A.
2. (A + B)* = A* + B*.
3. (αA)* = ᾱ A*.
4. (AB)* = B* A*.
5. If A is invertible, then A* is invertible, and (A*)^{-1} = (A^{-1})*.
Here also, in (2), the matrices A and B must be of the same size, and in (4),
the number of columns in A must be equal to the number of rows in B. The
adjoint of A is also called the conjugate transpose of A. Notice that if
A ∈ R^{m×n}, then A* = A^t.
Occasionally, we will write Ā for the matrix obtained from A by taking the
complex conjugate of each entry. That is, the (i, j)th entry of Ā is the
complex conjugate of the (i, j)th entry of A. Hence A* = (Ā)^t.
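In numpy (assumed), the adjoint is the conjugate transpose. For the complex matrix from the example above:

```python
import numpy as np

A = np.array([[1 + 1j, 2, 3],
              [2, 3, 1 - 1j]])

# The adjoint (conjugate transpose) A* is A.conj().T in numpy.
A_star = A.conj().T
assert A_star.shape == (3, 2)
assert A_star[0, 0] == 1 - 1j and A_star[2, 1] == 1 + 1j
```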
Exercises for 1.3

1. Compute A^t and A* for the following matrices:

    (a) A = [ 1 2 3 1 ]    (b) A = [ 1    2+i  3-i ]
            [ 2 1 0 3 ]            [ i    1-i  2i  ]
            [ 0 1 3 1 ]            [ 1+3i -i   3   ]
                                   [ 2    0    -i  ]

2. Let A ∈ C^{m×n}. Suppose AA* = Im. Does it follow that A*A = In?
1.4 Elementary row operations
Recall that while solving linear equations in two or three variables, you try to eliminate a variable from all but one equation by adding an equation to the other, or
even adding a constant times one equation to another. We do similar operations on
the rows of a matrix. These are achieved by multiplying a given matrix with some
special matrices, called elementary matrices.
Let e1, . . . , em ∈ F^{m×1} be the standard basis vectors. Let 1 ≤ i, j ≤ m.
The product ei ej^t is an m×m matrix whose (i, j)th entry is 1 and all other
entries are 0. We write such a matrix as Eij. For instance, when m = 3, we
have

    e2 e3^t = [ 0 ]             [ 0 0 0 ]
              [ 1 ] [ 0 0 1 ] = [ 0 0 1 ] = E23.
              [ 0 ]             [ 0 0 0 ]

There are three kinds of elementary matrices:

    E[i, j] = I - Eii - Ejj + Eij + Eji, obtained from I by exchanging its ith
              and jth rows;
    Eα[i]   = I + (α - 1) Eii, obtained from I by multiplying its ith row with
              a nonzero scalar α;
    Eα[i, j] = I + α Eij, obtained from I by adding α times its jth row to its
              ith row.

For instance, when m = 3,

    E[1, 2] = [ 0 1 0 ]    E_{-1}[2] = [ 1  0 0 ]    E_2[3, 1] = [ 1 0 0 ]
              [ 1 0 0 ] ,              [ 0 -1 0 ] ,              [ 0 1 0 ]
              [ 0 0 1 ]                [ 0  0 1 ]                [ 2 0 1 ] .
We call these operations of pre-multiplying a matrix by an elementary matrix
elementary row operations. Thus there are three kinds of elementary row
operations, as listed above. Sometimes, we will refer to them as operations of
Type-1, 2, or 3, respectively. Also, in computations, we will write

    A --E--> B

to mean that the matrix B has been obtained from A by an elementary row
operation E, that is, B = EA.
Example 1.3
See the following applications of elementary row operations:

    [ 1 1 1 ]                 [ 1 1 1 ]                 [ 1 1 1 ]
    [ 2 2 2 ] --E_{-3}[3,1]-> [ 2 2 2 ] --E_{-2}[2,1]-> [ 0 0 0 ]
    [ 3 3 3 ]                 [ 0 0 0 ]                 [ 0 0 0 ]

Often we will apply elementary row operations in a sequence. In this way, the
above operations could be shown in one step as E_{-3}[3, 1], E_{-2}[2, 1].
However, remember that the result of applying this sequence of elementary row
operations on a matrix A is E_{-2}[2, 1] E_{-3}[3, 1] A; the products are in
reverse order.
Elementary row operations can be undone by other elementary row operations.
The reason: each elementary matrix is invertible. In fact, the inverses of the
elementary matrices are as follows:

    (E[i, j])^{-1} = E[i, j],   (Eα[i])^{-1} = E_{1/α}[i],
    (Eα[i, j])^{-1} = E_{-α}[i, j].
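The three kinds of elementary matrices and their inverses can be sketched in numpy (assumed; note that Python indices are 0-based, while the text's are 1-based):

```python
import numpy as np

def E_swap(m, i, j):        # E[i, j]: exchange rows i and j of I
    E = np.eye(m)
    E[[i, j]] = E[[j, i]]
    return E

def E_scale(m, i, a):       # E_a[i]: multiply row i of I by a != 0
    E = np.eye(m)
    E[i, i] = a
    return E

def E_add(m, i, j, a):      # E_a[i, j]: add a times row j to row i of I
    E = np.eye(m)
    E[i, j] = a
    return E

m = 3
# Each elementary matrix is invertible, with an elementary inverse:
assert np.allclose(E_swap(m, 0, 1) @ E_swap(m, 0, 1), np.eye(m))
assert np.allclose(E_scale(m, 1, 2.0) @ E_scale(m, 1, 0.5), np.eye(m))
assert np.allclose(E_add(m, 2, 0, 5.0) @ E_add(m, 2, 0, -5.0), np.eye(m))
```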
Exercises for 1.4

2. Argue in general terms why the following are true:
   (a) E[i, j] A is the matrix obtained from A by exchanging its ith and jth
       rows.
   (b) Eα[i] A is the matrix obtained from A by replacing its ith row with α
       times the ith row.
   (c) Eα[i, j] A is the matrix obtained from A by replacing its ith row with
       the ith row plus α times the jth row.
3. Describe how A E[i, j], A Eα[i] and A Eα[i, j] are obtained from A.
1.5 Row reduced echelon form

The first nonzero entry (from the left) in a nonzero row of a matrix is called
a pivot of that row. A matrix is said to be in row reduced echelon form (RREF)
iff the following conditions are satisfied:

1. Each pivot is equal to 1.
2. In a column containing a pivot, all entries other than the pivot are 0.
3. The pivot in any nonzero row occurs to the right of the pivots in the
   earlier rows.
4. All zero rows occur below the nonzero rows.

For example, the matrices

    [ 1 2 0 ]    [ 0 1 3 0 ]
    [ 0 0 1 ] ,  [ 0 0 0 1 ]
    [ 0 0 0 ]    [ 0 0 0 0 ]

are in row reduced echelon form, whereas

    [ 1 3 0 ]    [ 0 1 3 1 ]    [ 0 0 1 ]
    [ 0 0 2 ] ,  [ 0 0 0 1 ] ,  [ 1 0 0 ]
    [ 0 0 0 ]    [ 0 0 0 0 ]    [ 0 0 0 ]

are not: in the first, a pivot is not 1; in the second, a pivotal column
contains another nonzero entry; in the third, a pivot occurs to the left of
the pivot in an earlier row.
Example 1.4
We reduce a matrix A to its row reduced echelon form B:

    A = [ 1 1 2 0 ]  --R1-->  [ 1 1 2 0 ]  --E_{1/2}[2]-->  [ 1 1  2   0  ]
        [ 3 5 7 1 ]           [ 0 2 1 1 ]                   [ 0 1 1/2 1/2 ]
        [ 2 2 4 1 ]           [ 0 0 0 1 ]                   [ 0 0  0   1  ]

    --E_{-1}[1,2]-->  [ 1 0 3/2 -1/2 ]  --R2-->  [ 1 0 3/2 0 ]
                      [ 0 1 1/2  1/2 ]           [ 0 1 1/2 0 ] = B
                      [ 0 0  0    1  ]           [ 0 0  0  1 ]

Here, R1 = E_{-3}[2, 1], E_{-2}[3, 1]; and R2 = E_{1/2}[1, 3], E_{-1/2}[2, 3].
The matrix B is the RREF of A; and

    B = E_{-1/2}[2, 3] E_{1/2}[1, 3] E_{-1}[1, 2] E_{1/2}[2] E_{-2}[3, 1] E_{-3}[2, 1] A.

The products are in reverse order.
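The whole reduction can be automated. A sketch in plain Python with exact rational arithmetic (`fractions` from the standard library), following the three operation types used in this section:

```python
from fractions import Fraction

def rref(A):
    """Row reduce A (a list of rows) to its row reduced echelon form."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        # Find a pivot in column c, at or below row r.
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]                # Type-1: exchange rows
        A[r] = [x / A[r][c] for x in A[r]]     # Type-2: make the pivot 1
        for i in range(rows):                  # Type-3: clear the column
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

B = rref([[1, 1, 2], [3, 5, 7], [2, 2, 4]])
assert B == [[1, 0, Fraction(3, 2)], [0, 1, Fraction(1, 2)], [0, 0, 0]]
```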
1.6 Determinant
There are two important quantities associated with a square matrix. One is the trace
and the other is the determinant.
The sum of all diagonal entries of a square matrix is called the trace of the
matrix. That is, if A = [aij] ∈ F^{m×m}, then

    tr(A) = a11 + · · · + amm = Σ_{k=1}^{m} akk.
The second quantity, called the determinant of a square matrix
A = [aij] ∈ F^{n×n}, written as det(A), is defined inductively as follows:

    If n = 1, then det(A) = a11.
    If n > 1, then det(A) = Σ_{j=1}^{n} (-1)^{1+j} a1j det(A1j),

where the matrix A1j ∈ F^{(n-1)×(n-1)} is obtained from A by deleting the
first row and the jth column of A.
When A = [aij] is written showing all its entries, we also write det(A) by
replacing the two big closing brackets [ and ] by two vertical bars | and |.
For a 2×2 matrix, the determinant is seen as follows:

    | a11 a12 |
    | a21 a22 | = (-1)^{1+1} a11 det[a22] + (-1)^{1+2} a12 det[a21]
                = a11 a22 - a12 a21.
Similarly, for a 3×3 matrix, we need to compute three 2×2 determinants. For
example,

    det [ 1 2 3 ]   | 1 2 3 |
        [ 2 3 1 ] = | 2 3 1 |
        [ 3 1 2 ]   | 3 1 2 |

    = (-1)^{1+1} 1 | 3 1 | + (-1)^{1+2} 2 | 2 1 | + (-1)^{1+3} 3 | 2 3 |
                   | 1 2 |                | 3 2 |                | 3 1 |

    = 1 (3·2 - 1·1) - 2 (2·2 - 1·3) + 3 (2·1 - 3·3)
    = 5 - 2·1 + 3·(-7) = -18.
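The inductive definition translates directly into a recursive function. A sketch in plain Python (fine for small matrices; the expansion does n! work, which is why row reduction is preferred in practice):

```python
def det(A):
    """Determinant by expansion in the first row (the inductive definition)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor A_1j: delete the first row and the jth column.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1)**j matches (-1)^{1+j} with 1-based column numbering.
        total += (-1) ** j * A[0][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == -2
assert det([[1, 2, 3], [2, 3, 1], [3, 1, 2]]) == -18
```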
For a lower triangular matrix, expanding in the first row each time, we see
that

    | a11             |
    | a21 a22         |        | a22           |
    | a31 a32 a33     | = a11  | a32 a33       | = · · · = a11 a22 · · · ann.
    |  :          .   |        |  :        .   |
    | an1 ...     ann |        | an2 ...   ann |

In general, the determinant of any triangular matrix (upper or lower) is the
product of its diagonal entries. In particular, the determinant of a diagonal
matrix is also the product of its diagonal entries. Thus, if I is the identity
matrix of order n, then det(I) = 1 and det(-I) = (-1)^n.
Our definition of the determinant expands it in the first row. In fact, the
same result may be obtained by expanding it in any other row, or even in any
column. Along with this, some more properties of the determinant are listed in
the following.
Let A ∈ F^{n×n}. The sub-matrix of A obtained by deleting the ith row and the
jth column is called the (i, j)th minor of A, and is denoted by Aij. The
(i, j)th co-factor of A is (-1)^{i+j} det(Aij); it is denoted by Cij(A).
Sometimes, when the matrix A is fixed in a context, we write Cij(A) as Cij.
The adjugate of A is the n×n matrix obtained by taking the transpose of the
matrix whose (i, j)th entry is Cij(A); it is denoted by adj(A). That is,
adj(A) ∈ F^{n×n} is the matrix whose (i, j)th entry is the (j, i)th co-factor
Cji(A). Also, we write Ai(x) for the matrix obtained from A by replacing its
ith row by a row vector x of appropriate size.
Let A ∈ F^{n×n}. Let i, j ∈ {1, . . . , n}. Let E[i, j], Eα[i] and Eα[i, j] be
the elementary matrices of order n with 1 ≤ i ≠ j ≤ n and α ≠ 0, a scalar.
Then the following statements are true.
1. det(E[i, j] A) = -det(A).
2. det(Eα[i] A) = α det(A).
3. det(Eα[i, j] A) = det(A).
4. If some row of A is the zero vector, then det(A) = 0.
5. If one row of A is a scalar multiple of another row, then det(A) = 0.
6. For any i ∈ {1, . . . , n}, det( Ai(x + y) ) = det( Ai(x) ) + det( Ai(y) ).
7. det(A^t) = det(A).
8. If A is a triangular matrix, then det(A) is equal to the product of the
   diagonal entries of A.
9. det(AB) = det(A) det(B) for any matrix B ∈ F^{n×n}.
Example 1.6
We compute a determinant by row reduction. Only Type-3 operations are used, so
by Property (3) the determinant does not change:

    | 1 1 1 1 |      | 1 1 1 1 |      | 1 1 1 1 |      | 1 1 1 1 |
    | 1 2 2 2 |  R1  | 0 1 1 1 |  R2  | 0 1 1 1 |  R3  | 0 1 1 1 |
    | 1 2 4 4 |  =   | 0 1 3 3 |  =   | 0 0 2 2 |  =   | 0 0 2 2 |  = 8.
    | 1 2 4 8 |      | 0 1 3 7 |      | 0 0 2 6 |      | 0 0 0 4 |

Here, R1 = E_{-1}[2, 1]; E_{-1}[3, 1]; E_{-1}[4, 1], R2 = E_{-1}[3, 2];
E_{-1}[4, 2], and R3 = E_{-1}[4, 3]. Finally, the upper triangular matrix has
the required determinant: 1·1·2·4 = 8.
Example 1.7
See that the following is true, verifying Property (6) mentioned above:

    | 3 1 2 4 |   | 1 0 0 1 |   | 2 1 2 3 |
    | 1 1 0 1 |   | 1 1 0 1 |   | 1 1 0 1 |
    | 1 1 1 1 | = | 1 1 1 1 | + | 1 1 1 1 | ,
    | 1 1 1 1 |   | 1 1 1 1 |   | 1 1 1 1 |

since the first rows satisfy [3, 1, 2, 4] = [1, 0, 0, 1] + [2, 1, 2, 3], while
the other rows are the same in all three matrices.
Exercises for 1.6

5. Determine A^{-1} using adj(A), where

    A = [ 1 0 0 ]
        [ 1 1 0 ]
        [ 1 1 1 ] .

6. Compute the determinant of the n×n matrix

    [       1 ]
    [     1   ]
    [   .     ]
    [ 1       ]

   where the anti-diagonal entries are all 1 and all other entries are 0.
1.7 Computing inverse of a matrix
The adjugate property of the determinant provides a way to compute the inverse
of a matrix, provided it is invertible. However, it is very inefficient. We
may use elementary row operations to compute the inverse. Our computation of
the inverse is based on the following fact.

Theorem 1.2
A square matrix is invertible iff it is a product of elementary matrices.

Proof. Each elementary matrix is invertible, since E[i, j] is its own inverse,
E_{1/α}[i] is the inverse of Eα[i], and E_{-α}[i, j] is the inverse of
Eα[i, j]. Therefore, any product of elementary matrices is invertible.
Conversely, suppose that A is an invertible matrix. Let E A^{-1} be the RREF
of A^{-1}, where E is the product of the corresponding elementary matrices. If
E A^{-1} has a zero row, then (E A^{-1}) A also has a zero row. That is, E has
a zero row. But E is a product of elementary matrices, which is invertible; it
does not have a zero row. Therefore, E A^{-1} does not have a zero row. Then
each row in the square matrix E A^{-1} has a pivot. But the only square matrix
in RREF having a pivot in each row is the identity matrix. Therefore,
E A^{-1} = I. That is, A = E, a product of elementary matrices.
The computation of the inverse will be easier if we write the matrix A and the
identity matrix I side by side and apply the elementary row operations on both
of them simultaneously. For this purpose, we introduce the notion of an
augmented matrix.
If A ∈ F^{m×n} and B ∈ F^{m×k}, then the matrix [A|B] ∈ F^{m×(n+k)} obtained
from A and B by writing first all the columns of A and then the columns of B,
in that order, is called an augmented matrix. The vertical bar shows the
separation of the columns of A from those of B, though it is conceptually
unnecessary.
For computing the inverse of a matrix, start with the augmented matrix [A|I].
Apply elementary row operations to reduce A to its row reduced echelon form,
while simultaneously applying the same operations on the entries of I. This
means we pre-multiply the matrix [A|I] with a product B of elementary
matrices. In block form, our result is the augmented matrix [BA|BI]. If
BA = I, then BI = A^{-1}. That is, the part that contained I originally will
give the matrix A^{-1} after the elementary row operations have been applied.
If, after the row reduction, it turns out that BA ≠ I, then A is not
invertible; this information comes as a bonus.
Example 1.8
For illustration, consider the following square matrices:

    A = [ 1  1  2  0 ]        B = [ 1  1  2  0 ]
        [ 1  0  0  2 ]            [ 1  0  0  2 ]
        [ 2 -1 -1  2 ] ,          [ 2 -1 -1  2 ] .
        [ 1  2  4 -2 ]            [ 0  2  0 -2 ]

We want to find the inverses of the matrices, if at all they are invertible.
Augment A with an identity matrix to get

    [ 1  1  2  0 | 1 0 0 0 ]
    [ 1  0  0  2 | 0 1 0 0 ]
    [ 2 -1 -1  2 | 0 0 1 0 ]
    [ 1  2  4 -2 | 0 0 0 1 ]

Use E_{-1}[2, 1], E_{-2}[3, 1], E_{-1}[4, 1] to zero-out the entries below the
first pivot:

    [ 1  1  2  0 |  1 0 0 0 ]
    [ 0 -1 -2  2 | -1 1 0 0 ]
    [ 0 -3 -5  2 | -2 0 1 0 ]
    [ 0  1  2 -2 | -1 0 0 1 ]

The next pivot is -1 in (2, 2) position. Use E_{-1}[2] to make the pivot 1:

    [ 1  1  2  0 |  1  0 0 0 ]
    [ 0  1  2 -2 |  1 -1 0 0 ]
    [ 0 -3 -5  2 | -2  0 1 0 ]
    [ 0  1  2 -2 | -1  0 0 1 ]

Use E_{-1}[1, 2], E_3[3, 2], E_{-1}[4, 2] to zero-out all non-pivot entries in
the pivotal column:

    [ 1 0 0  2 |  0  1 0 0 ]
    [ 0 1 2 -2 |  1 -1 0 0 ]
    [ 0 0 1 -4 |  1 -3 1 0 ]
    [ 0 0 0  0 | -2  1 0 1 ]

Since a zero row has appeared in the A portion of the augmented matrix, we
conclude that A is not invertible. You see that the second portion of the
augmented matrix has no meaning now. However, it records the elementary row
operations which were carried out in the reduction process. Verify that this
matrix is equal to

    E_{-1}[4, 2] E_3[3, 2] E_{-1}[1, 2] E_{-1}[2] E_{-1}[4, 1] E_{-2}[3, 1] E_{-1}[2, 1]

and that the first portion is equal to this matrix times A.
For B, we proceed similarly. The augmented matrix [B|I] after the first pivot
looks like:

    [ 1  1  2  0 |  1 0 0 0 ]
    [ 0 -1 -2  2 | -1 1 0 0 ]
    [ 0 -3 -5  2 | -2 0 1 0 ]
    [ 0  2  0 -2 |  0 0 0 1 ]

Next, the pivot is -1 in (2, 2) position. Use E_{-1}[2] to make the pivot 1:

    [ 1  1  2  0 |  1  0 0 0 ]
    [ 0  1  2 -2 |  1 -1 0 0 ]
    [ 0 -3 -5  2 | -2  0 1 0 ]
    [ 0  2  0 -2 |  0  0 0 1 ]

Use E_{-1}[1, 2], E_3[3, 2], E_{-2}[4, 2] to zero-out the other entries in the
pivotal column:

    [ 1 0  0  2 |  0  1 0 0 ]
    [ 0 1  2 -2 |  1 -1 0 0 ]
    [ 0 0  1 -4 |  1 -3 1 0 ]
    [ 0 0 -4  2 | -2  2 0 1 ]

The next pivot is 1 in (3, 3) position. Use E_{-2}[2, 3], E_4[4, 3] to
zero-out the other entries in the pivotal column:

    [ 1 0 0   2 |  0   1  0 0 ]
    [ 0 1 0   6 | -1   5 -2 0 ]
    [ 0 0 1  -4 |  1  -3  1 0 ]
    [ 0 0 0 -14 |  2 -10  4 1 ]

The next pivot is -14 in (4, 4) position. Use E_{-1/14}[4] to make the pivot
1:

    [ 1 0 0  2 |   0    1    0    0   ]
    [ 0 1 0  6 |  -1    5   -2    0   ]
    [ 0 0 1 -4 |   1   -3    1    0   ]
    [ 0 0 0  1 | -1/7  5/7 -2/7 -1/14 ]

Use E_{-2}[1, 4], E_{-6}[2, 4], E_4[3, 4] to zero-out the other entries in the
pivotal column:

    [ 1 0 0 0 |  2/7 -3/7  4/7  1/7  ]
    [ 0 1 0 0 | -1/7  5/7 -2/7  3/7  ]
    [ 0 0 1 0 |  3/7 -1/7 -1/7 -2/7  ]
    [ 0 0 0 1 | -1/7  5/7 -2/7 -1/14 ]

Thus

    B^{-1} = (1/7) [  2 -3  4   1  ]
                   [ -1  5 -2   3  ]
                   [  3 -1 -1  -2  ]
                   [ -1  5 -2 -1/2 ] .

Verify that B^{-1} B = B B^{-1} = I.
Observe that if a matrix is not invertible, then our algorithm for reduction
to RREF produces a pivot in the I portion of the augmented matrix.
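This Gauss-Jordan procedure can be sketched in plain Python with exact rational arithmetic (`fractions` from the standard library); the singular case is detected exactly as in the text, by failing to find a pivot:

```python
from fractions import Fraction

def inverse(A):
    """Invert A by row reducing the augmented matrix [A | I]."""
    n = len(A)
    # Build [A | I] with exact rational arithmetic.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        M[c], M[p] = M[p], M[c]                 # exchange rows
        M[c] = [x / M[c][c] for x in M[c]]      # make the pivot 1
        for i in range(n):                      # clear the pivotal column
            if i != c and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]               # the I portion is now A^{-1}

Ainv = inverse([[1, 1], [1, -1]])
assert Ainv == [[Fraction(1, 2), Fraction(1, 2)],
                [Fraction(1, 2), Fraction(-1, 2)]]
```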
Exercises for 1.7

2. Let A = [ 0 1 0 ]
           [ 0 0 1 ]   where b, c ∈ C. Show that A^{-1} = A^2 + cA + bI.
           [ 1 -b -c ] ,
3. Show that if a matrix A is upper triangular and invertible, then so is A1 .
4. Show that if a matrix A is lower triangular and invertible, then so is A1 .
5. Show that every n×n matrix can be written as a sum of two invertible
   matrices.
6. Show that every n×n invertible matrix can be written as a sum of two
   non-invertible matrices.
2 Rank and Linear Equations

2.1 Linear independence
In the reduction to RREF, why are some rows reduced to zero rows while others
are not? To see what is going on in such a reduction, we need to introduce
some more concepts. If A is an m×n matrix with entries from F, then its rows
are vectors in F^{1×n} and its columns are vectors in F^{m×1}. Recall that to
talk about row and column vectors at the same time, we write F^n for both
F^{1×n} and F^{n×1}. The elements of F^n are written as (a1, . . . , an). That
is, such an n-tuple of numbers from F is interpreted as either a row vector
with n components or a column vector with n components, as the case demands.
Let v1, . . . , vm be vectors in F^n. Let α1, . . . , αm ∈ F be scalars.
Recall that the vector

    α1 v1 + · · · + αm vm

is called a linear combination of v1, . . . , vm.
For example, in F^{1×2}, one linear combination of v1 = [1, 1] and
v2 = [1, -1] is as follows:

    2[1, 1] + 1[1, -1].

This linear combination evaluates to [3, 1]. Thus [3, 1] is a linear
combination of v1, v2.
Is [4, -2] a linear combination of v1 and v2? Yes, since

    [4, -2] = 1[1, 1] + 3[1, -1].

In fact, every vector in F^{1×2} is a linear combination of v1 and v2. Reason:

    [a, b] = (a+b)/2 [1, 1] + (a-b)/2 [1, -1].

However, not every vector in F^{1×2} is a linear combination of [1, 1] and
[2, 2]. Reason? Any linear combination of these two vectors is a multiple of
[1, 1]. Then [1, 0] is not a linear combination of these two vectors.
Now, you see that a zero row in a row echelon form matrix is a linear
combination of the earlier rows. Conversely, if a row is a linear combination
of earlier rows in any matrix, then in the RREF of the matrix, this row is
reduced to a zero row. However, during the reduction process, there can be row
exchanges. In that case, instead of talking about a linear combination of
earlier rows, we may think of a linear combination of all other rows.
The vectors v1, . . . , vm in F^n are called linearly dependent iff at least
one of them is a linear combination of the others. The vectors are called
linearly independent iff none of them is a linear combination of the others.
For example, [1, 1], [1, -1], [4, -2] are linearly dependent vectors, whereas
[1, 1], [1, -1] are linearly independent vectors in F^{1×2}.
If α1 = · · · = αm = 0, then the linear combination α1 v1 + · · · + αm vm
evaluates to 0. That is, the zero vector can always be written as a trivial
linear combination.
Suppose the vectors v1, . . . , vm are linearly dependent. Then one of them,
say vi, is a linear combination of the others. That is,

    vi = α1 v1 + · · · + αi-1 vi-1 + αi+1 vi+1 + · · · + αm vm.

Then

    α1 v1 + · · · + αi-1 vi-1 + (-1) vi + αi+1 vi+1 + · · · + αm vm = 0.

Here, we see that a linear combination becomes zero, where at least one of the
coefficients, namely the ith one, is nonzero.
Conversely, suppose that we have scalars α1, . . . , αm, not all zero, such
that

    α1 v1 + · · · + αm vm = 0.

Suppose that the kth scalar αk is nonzero. Then

    vk = -(1/αk) (α1 v1 + · · · + αk-1 vk-1 + αk+1 vk+1 + · · · + αm vm).

Therefore, the vectors are linearly independent iff the only linear
combination of them that evaluates to the zero vector is the trivial one. To
show that given vectors are linearly independent, equate an arbitrary linear
combination of them to the zero vector and try to prove that each coefficient
must be 0. If it is not possible, then from its proof you must be able to find
a way of expressing one of the vectors as a linear combination of the others,
showing that the vectors are linearly dependent.
Example 2.1
Are the vectors [1, 1, 1], [2, 1, 1], [3, 1, 0] linearly independent?
We start with an arbitrary linear combination and equate it to the zero vector.
Solve the resulting linear equations to determine whether all the coefficients
are necessarily 0 or not.
So, let
a[1, 1, 1] + b[2, 1, 1] + c[3, 1, 0] = [0, 0, 0].
Comparing the components, we have
a + 2b + 3c = 0, a + b + c = 0, a + b = 0.
The last two equations imply that c = 0. Substituting in the first, we see that
a + 2b = 0.
This and the equation a + b = 0 give b = 0. Then it follows that a = 0.
We conclude that the given vectors are linearly independent.
Example 2.2
Are the vectors [1, 1, 1], [2, 1, 1], [3, 2, 2] linearly independent?
Clearly, the third one is the sum of the first two. So, the given vectors are
linearly dependent.
To illustrate our method, we start with an arbitrary linear combination and
equate it to the zero vector. We then solve the resulting linear equations to
determine whether all the coefficients are necessarily 0 or not.
So, as earlier, let
a[1, 1, 1] + b[2, 1, 1] + c[3, 2, 2] = [0, 0, 0].
Comparing the components, we have
a + 2b + 3c = 0, a + b + 2c = 0, a + b + 2c = 0.
The last equation is redundant. From the first and the second, we have
b + c = 0.
We may choose b = 1, c = -1 to satisfy this equation. Then from the second
equation, we have a = 1. Our starting equation then says that the third vector is
the sum of the first two.
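The vanishing combination found in Example 2.2 can be verified directly; a quick sketch (plain Python, names ours):

```python
v1, v2, v3 = [1, 1, 1], [2, 1, 1], [3, 2, 2]

# Example 2.2 found the coefficients a = 1, b = 1, c = -1; the resulting
# combination a*v1 + b*v2 + c*v3 vanishes although not all coefficients are 0,
# so the three vectors are linearly dependent.
combo = [1 * x + 1 * y + (-1) * z for x, y, z in zip(v1, v2, v3)]
print(combo)  # [0, 0, 0]
```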
Be careful with the direction of implication here. Your work-out must be in the
form
α1 v1 + ··· + αm vm = 0 ⇒ α1 = ··· = αm = 0.
And that would prove linear independence.
To see how linear independence is used in solving linear equations, consider a system of three equations in the unknowns x1, x2, x3, x4 whose coefficient rows are
v1 = [1, 2, 3, 2],
v2 = [2, -1, -2, 3],
v3 = [4, 3, 4, 7].
Here, we find that the third equation is redundant, since 2 times the first plus the
second gives the third. That is, the third one linearly depends on the first two. (You
can of course choose any other equation here as linearly depending on the other two, but
that is not important.) In terms of the row vectors of coefficients, v3 = 2v1 + v2, as it should be. We see that the vectors v1, v2, v3 are
linearly dependent. But the vectors v1 , v2 are linearly independent. Thus, solving the
given system of linear equations is the same thing as solving the system with only
first two equations. For solving linear systems, it is of primary importance to find out
which equations linearly depend on others. Once determined, such equations can be
thrown away, and the rest can be solved.
2.2
We may use elementary row operations to check linear independence. Given m row
vectors v1, . . . , vm ∈ F^{1×n}, we form a matrix A with its ith row as vi. Then using elementary row operations, we bring it to its RREF. Observe that exchanging vi with vj
in the list of vectors does not change linear independence of the vectors. Multiplying
vi by a nonzero scalar does not affect linear independence. Also, replacing vi with
vi + αvj does not alter linear independence.
To see the last one, suppose v1, . . . , vm are linearly independent. Let wi = vi + αvj,
i ≠ j. To show the linear independence of v1, . . . , vi-1, wi, vi+1, . . . , vm, suppose that
β1 v1 + ··· + βi-1 vi-1 + βi wi + βi+1 vi+1 + ··· + βm vm = 0.
Then
β1 v1 + ··· + βi-1 vi-1 + βi (vi + αvj) + βi+1 vi+1 + ··· + βm vm = 0.
Simplifying, we have
β1 v1 + ··· + βi vi + ··· + (βj + αβi) vj + ··· + βm vm = 0.
Using linear independence of v1, . . . , vm, we obtain
βk = 0 for each k ≠ j, along with βj + αβi = 0.
As βi = 0, it follows that βj = 0 as well; so all the βs are zero. Thus v1, . . . , wi, . . . , vm are
linearly independent. Similarly, the converse also holds.
Thus, we take these vectors as the rows of a matrix and apply our reduction to
RREF algorithm. From the RREF, we know that all rows where a pivot occurs are
linearly independent. If you want to determine exactly which vectors among these
are linearly independent, you must keep track of the row exchanges. A summary of
the discussion in terms of a matrix is as follows.
Theorem 2.2
Let v1, . . . , vm ∈ F^{1×n}. Let A ∈ F^{m×n} be the matrix whose jth row is vj. Then
v1, . . . , vm are linearly independent iff the RREF of A has no zero row.
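Theorem 2.2 yields a direct computational test: reduce the rows to RREF and look for a zero row. A minimal sketch in Python (the helper names are ours; exact Fraction arithmetic avoids losing a pivot to rounding):

```python
from fractions import Fraction

def rref(rows):
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0                                   # row where the next pivot goes
    for c in range(n):
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        A[r], A[p] = A[p], A[r]             # row exchange
        A[r] = [x / A[r][c] for x in A[r]]  # scale the pivot to 1
        for i in range(m):                  # zero out the rest of the column
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

def independent(rows):
    # Theorem 2.2: independent iff the RREF has no zero row
    return all(any(x != 0 for x in row) for row in rref(rows))

print(independent([[1, 1, 1], [2, 1, 1], [3, 1, 0]]))  # True  (Example 2.1)
print(independent([[1, 1, 1], [2, 1, 1], [3, 2, 2]]))  # False (Example 2.2)
```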
Example 2.3
To determine whether the vectors [1, 1, 0, -1], [0, 1, 1, 1] and [1, 3, -2, 1] are
linearly independent or not, we proceed as follows. The matrix

[ 1  1  0 -1 ]
[ 0  1  1  1 ]
[ 1  3 -2  1 ]

is transformed by E-1[3,1] and R1 into

[ 1  0 -1 -2 ]
[ 0  1  1  1 ]
[ 0  0 -4  0 ]

and then by R2 into

[ 1  0  0 -2 ]
[ 0  1  0  1 ]
[ 0  0  1  0 ].

Here, R1 = E-1[1,2], E-2[3,2] and R2 = E-1/4[3], E1[1,3], E-1[2,3].
The last matrix is in RREF, in which each row has a pivot. Therefore, the
original vectors are linearly independent.
Though we have formulated Theorem 2.2 for row vectors, it is applicable to column
vectors as well. All we do is start with the transposes of the column vectors and
apply the theorem.
Example 2.4
Are the vectors [1, 1, 0, -1]t, [0, 1, 1, 1]t and [2, -1, -3, -5]t linearly independent?
The vectors are in F^{4×1}. These are linearly independent iff their transposes
are. Forming a matrix with the transposes of the given vectors as rows, and
reducing it to its RREF, we see that

[ 1  1  0 -1 ]
[ 0  1  1  1 ]
[ 2 -1 -3 -5 ]

is transformed by E-2[3,1] into

[ 1  1  0 -1 ]
[ 0  1  1  1 ]
[ 0 -3 -3 -3 ]

and then by R1 into

[ 1  0 -1 -2 ]
[ 0  1  1  1 ]
[ 0  0  0  0 ].

Here, R1 = E-1[1,2], E3[3,2]. Since a zero row has appeared, the original vectors are linearly dependent. Also, notice that no row exchanges were carried
out in the reduction process. Therefore, the third vector is a linear combination of the first two vectors, which are linearly independent.
Due to our observations on RREF, we may follow an alternative. Instead of
taking transposes of the given column vectors, we proceed with the vectors themselves, thus forming a matrix with the given vectors as its columns. If, in the RREF of the
matrix, each column is a pivotal column, then the vectors are linearly
independent. Moreover, if a column remains non-pivotal, then such a non-pivotal
column is a linear combination of the pivotal columns that precede it. The entries in the non-pivotal column give the coefficients of such a linear combination.
We solve Example 2.4 once more to illustrate this point.
Example 2.5
To determine whether v1 = [1, 1, 0, -1]t, v2 = [0, 1, 1, 1]t and v3 = [2, -1, -3, -5]t
are linearly independent or not, we form the matrix [v1 v2 v3] and then reduce
it to its RREF. The operation R1 transforms

[  1  0  2 ]
[  1  1 -1 ]
[  0  1 -3 ]
[ -1  1 -5 ]

into

[ 1  0  2 ]
[ 0  1 -3 ]
[ 0  1 -3 ]
[ 0  1 -3 ]

and then R2 yields

[ 1  0  2 ]
[ 0  1 -3 ]
[ 0  0  0 ]
[ 0  0  0 ].

Here, R1 = E-1[2,1], E1[4,1] and R2 = E-1[3,2], E-1[4,2]. The third column is
non-pivotal. Thus, the corresponding vector v3 is a linear combination of the
vectors that correspond to the pivotal columns, i.e., of v1 and v2. Moreover,
the entries in the non-pivotal column say that v3 = 2v1 - 3v2. You can
easily verify it.
2.3
Rank of a matrix
Suppose B is the RREF of a matrix A. Keeping track of the row exchanges, suppose
that the jth row of A has become a zero row in B. In that case, the jth row of A
is a linear combination of other rows. Conversely, if the jth row of A is a linear
combination of other rows, then in B, this row becomes a zero row. If B has r number
of pivots, then A has r number of linearly independent rows and other rows are linear
combinations of these r rows.
We define the rank of A as the number of pivots in the RREF of A, and denote it
by rank(A).
For an m×n matrix A, the number n - rank(A) is called the nullity of the matrix A.
The nullity of A is the number of un-pivoted columns in the RREF of A. We will connect
the nullity of a matrix to the solutions of the homogeneous linear system Ax = 0 later.
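Counting pivots mechanizes these definitions. A small sketch (the `rank` helper is ours; exact arithmetic), applied to the matrix of Example 2.6 below:

```python
from fractions import Fraction

def rank(rows):
    # forward elimination; the number of pivots found is rank(A)
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(A[0]) if A else 0):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, 1, 1, 2, 1],
     [1, 2, 1, 1, 1],
     [3, 5, 3, 4, 3],
     [1, 0, 1, 3, 1]]
n = len(A[0])
print(rank(A), n - rank(A))  # rank = 2, nullity = 3
```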
Example 2.6
Let
A =
[ 1 1 1 2 1 ]
[ 1 2 1 1 1 ]
[ 3 5 3 4 3 ]
[ 1 0 1 3 1 ].
We compute its RREF as follows. The operation R1 transforms A into

[ 1  1 1  2 1 ]
[ 0  1 0 -1 0 ]
[ 0  2 0 -2 0 ]
[ 0 -1 0  1 0 ]

and then R2 yields the RREF

[ 1 0 1  3 1 ]
[ 0 1 0 -1 0 ]
[ 0 0 0  0 0 ]
[ 0 0 0  0 0 ].

Here, R1 is E-1[2,1], E-3[3,1], E-1[4,1] and R2 is E-1[1,2], E-2[3,2], E1[4,2].
Thus rank(A) = 2.
Example 2.7
Determine the rank of the matrix A in Example 1.5, and point out which
rows of A are linear combinations of other rows, and which columns are linear
combinations of other columns, by reducing A to its RREF.
From Example 1.5, we have seen that

    [ 1 1 2 0 ]          [ 1 0 3/2 0 ]
A = [ 3 5 7 1 ]   -E->   [ 0 1 1/2 0 ]
    [ 1 5 4 5 ]          [ 0 0  0  1 ]
    [ 2 8 7 9 ]          [ 0 0  0  0 ].

The row operation E is given by
E = E-3[2,1], E-1[3,1], E-2[4,1], E1/2[2], E-1[1,2], E-4[3,2], E-6[4,2],
E1/3[3], E1/2[1,3], E-1/2[2,3], E-6[4,3].
We see that rank(A) = 3, the number of pivots in the RREF of A. In this
reduction, no row exchanges have been used. Thus the first three rows of A
are the required rows. The fourth row is a linear combination of these three
rows. In fact,
row(4) = 3 row(1) + (-1) row(2) + 2 row(3).
The RREF also says that the third column is a linear combination of the first and the second.
Notice that the coefficients in such a linear combination are given by the
entries of the third column in the RREF. We can easily check that
col(3) = (3/2) col(1) + (1/2) col(2).
Let A ∈ F^{m×n}. If rank(A) = r, then there are r linearly independent rows
in the RREF of A, and the other rows are linear combinations of these r rows. In A, the
corresponding r rows (with row exchanges taken care of) are linearly independent, and
the other rows are linear combinations of these r rows. Therefore, the maximum
number of linearly independent rows in A is r.
Looking at the columns in the RREF of A, we see that if rank(A) = r, then the
pivoted columns in the RREF are the standard basis vectors e1, . . . , er. Thus, the un-pivoted columns in the RREF are linear combinations of the pivoted ones. It shows
that in the RREF, there are r linearly independent columns and all other
columns are linear combinations of these r columns. Is the same true in A?
The RREF of A can be expressed as EA, where the matrix E is a product of elementary matrices; E is invertible. If the jth column of A is vj, then the jth column
of EA is Evj. Without loss of generality, suppose that v1, . . . , vr are linearly independent and each of vr+1, . . . , vn is a linear combination of v1, . . . , vr. We claim that the
columns of EA also have this property. To see this, suppose that
α1 Ev1 + ··· + αr Evr = 0.
As E is invertible, multiplying this equation with E^{-1}, we have
α1 v1 + ··· + αr vr = 0.
Then the linear independence of v1, . . . , vr implies that α1 = ··· = αr = 0. That is,
the vectors Ev1, . . . , Evr are linearly independent. Next, for j > r, if
vj = α1 v1 + ··· + αr vr,
then it follows that
Evj = α1 Ev1 + ··· + αr Evr.
That is, each of the vectors Evr+1, . . . , Evn is a linear combination of Ev1, . . . , Evr. This
proves our claim.
It then follows that if k is the maximum number of linearly independent columns
in A, then k is also the maximum number of linearly independent columns in EA. Conversely, if EA has
a maximum of k linearly independent columns, then A = E^{-1}(EA) also
has a maximum of k linearly independent columns.
Therefore, we conclude that the maximum number of linearly independent columns
in A is the same as the maximum number of linearly independent columns in the RREF
of A, which is equal to rank(A).
The maximum number of linearly independent rows of a matrix is called the row
rank of the matrix. Similarly, the column rank of a matrix is the maximum number
of linearly independent columns. Using this terminology, we note down the above
discussion as our next theorem.
Theorem 2.3
Let A ∈ F^{m×n}. Then rank(A) is equal to the row rank of A, which is also equal
to the column rank of A.
It then follows that rank(A*) = rank(At) = rank(A). Moreover, our discussion also
reveals that if P is any invertible matrix of order m, then rank(PA) = rank(A). Now,
if Q is an invertible matrix of order n, then
rank(AQ) = rank((AQ)t) = rank(Qt At) = rank(At) = rank(A),
since Qt is invertible.
We summarize the result as follows.
Theorem 2.4
Let A ∈ F^{m×n}. Let P ∈ F^{m×m} and Q ∈ F^{n×n} be invertible matrices. Then
rank(PAQ) = rank(PA) = rank(AQ) = rank(A).
In general, if P ∈ F^{k×m} and Q ∈ F^{n×s}, then rank(PA) ≤ rank(A) and rank(AQ) ≤
rank(A). These are left as exercises for you.
Further, it follows that a matrix A ∈ F^{n×n} is invertible iff rank(A) = n.
1. Determine the rank r of
[ 1 2 1 1 1 ]
[ 3 5 3 4 3 ]
[ 1 1 1 2 1 ]
[ 5 8 5 7 5 ].
Find the r linearly independent rows, and also the r linearly independent
columns of the matrix. Then express the remaining 4 - r rows as linear combinations of those r rows, and the remaining 5 - r columns as linear combinations of those r columns.
2. Let A ∈ F^{n×n}. Prove that A is invertible iff rank(A) = n iff det(A) ≠ 0.
3. Let A ∈ F^{m×n}. Let P ∈ F^{m×m}. Is it true that the RREF of PA is the same as the
RREF of A?
4. Let A ∈ F^{m×n} and B ∈ F^{n×k}. Prove that rank(AB) ≤ min{rank(A), rank(B)}.
2.4
We can now use our knowledge about matrices to settle some issues regarding solvability of linear systems. A linear system with m equations in n unknowns looks like
Ax = b,
where A ∈ F^{m×n} is the matrix of coefficients, x = [x1, . . . , xn]t is the vector of
unknowns, and b ∈ F^{m×1} is the vector of right-hand sides.
(7) Notice that a matrix A ∈ F^{n×n} is invertible iff rank(A) = n iff
det(A) ≠ 0. Then the statement follows from (5).
A system of linear equations Ax = b is said to be consistent iff rank([A|b]) = rank(A).
Theorem 2.5(1) says that only consistent systems have solutions. Conversely, if a
system has a solution, then the system must be consistent. The statement in Theorem 2.5(4) is sometimes informally stated as follows:
A consistent system has n - rank(A) linearly independent solutions.
The unknowns that correspond to the pivots are called the basic variables, and the
unknowns that correspond to the un-pivoted columns are called the free variables.
Thus there are rank(A) basic variables and n - rank(A) free variables,
which may be assigned arbitrary values. Therefore, the number of free variables is equal
to the nullity of A.
To summarize, suppose that a homogeneous linear system Ax = 0 has m
equations and n unknowns. If m < n, then the system has a nonzero
solution, and hence infinitely many solutions. If m ≥ n, then the number of solutions
depends on the rank of the system matrix: if rank(A) < n, then Ax = 0
has infinitely many solutions; and if rank(A) = n, then it has a unique solution, which
is the trivial solution.
For non-homogeneous linear systems the same conclusion is drawn provided that
the system is consistent. To say it explicitly, let the linear system Ax = b have m
equations and n unknowns, with rank(A) = r. Then it has no solution iff rank([A|b]) > r;
it has a unique solution iff rank([A|b]) = r = n; and it has infinitely many solutions
iff rank([A|b]) = r < n. Notice that the number m of equations plays no role; the
number r, which is the number of linearly independent equations, is what matters here.
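These rank criteria can be checked mechanically. The sketch below (helper names are ours, and the signs of the sample system follow the reduction of Example 2.9) classifies a system by comparing rank(A) with rank([A|b]):

```python
from fractions import Fraction

def rank(rows):
    # forward elimination with exact arithmetic; pivots counted give the rank
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(A[0]) if A else 0):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def classify(A, b):
    # no solution iff rank([A|b]) > rank(A); unique iff both equal n
    n = len(A[0])
    ra = rank(A)
    raug = rank([row + [bi] for row, bi in zip(A, b)])
    if raug > ra:
        return "no solution"
    return "unique solution" if ra == n else "infinitely many solutions"

A = [[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]]
print(classify(A, [7, 11, -15]))  # infinitely many solutions
print(classify(A, [7, 11, 8]))    # no solution
```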
2.5
Gauss-Jordan elimination
The augmented matrix and its reduction are as follows. The operation R1 transforms

[ 5  2 -3  1 |  7 ]
[ 1 -3  2 -2 | 11 ]
[ 3  8 -7  5 |  8 ]

into

[ 1   2/5  -3/5   1/5 |  7/5 ]
[ 0 -17/5  13/5 -11/5 | 48/5 ]
[ 0  34/5 -26/5  22/5 | 19/5 ]

and then R2 yields

[ 1 0  -5/17  -1/17 |  43/17 ]
[ 0 1 -13/17  11/17 | -48/17 ]
[ 0 0    0      0   |   23   ],

with R1 = E1/5[1], E-1[2,1], E-3[3,1] and R2 = E-5/17[2], E-2/5[1,2], E-34/5[3,2]. The
last row says that 0 = 23; hence the system is inconsistent.
Example 2.9
We change the last equation in the previous example to make it consistent.
We consider the new system
5x1 + 2x2 - 3x3 + x4 = 7
x1 - 3x2 + 2x3 - 2x4 = 11
3x1 + 8x2 - 7x3 + 5x4 = -15
The reduction to row reduced echelon form is as follows. The operation R1
transforms the augmented matrix

[ 5  2 -3  1 |   7 ]
[ 1 -3  2 -2 |  11 ]
[ 3  8 -7  5 | -15 ]

into

[ 1   2/5  -3/5   1/5 |   7/5 ]
[ 0 -17/5  13/5 -11/5 |  48/5 ]
[ 0  34/5 -26/5  22/5 | -96/5 ]

and then R2 yields

[ 1 0  -5/17  -1/17 |  43/17 ]
[ 0 1 -13/17  11/17 | -48/17 ]
[ 0 0    0      0   |    0   ],

with R1 = E1/5[1], E-1[2,1], E-3[3,1] and R2 = E-5/17[2], E-2/5[1,2], E-34/5[3,2]
as the row operations. This expresses the fact that the third equation is
redundant. Now, solving the new system in row reduced echelon form is
easier. Writing it as linear equations, we have
x1 - (5/17)x3 - (1/17)x4 = 43/17,
x2 - (13/17)x3 + (11/17)x4 = -48/17.
The unknowns corresponding to the pivots, that is, x1 and x2, are the basic
variables, and the other unknowns, x3 and x4, are the free variables. The number
of basic variables is equal to the number of pivots, which is the rank of the
system matrix. By assigning the free variables xi arbitrary values, say
αi, the basic variables can be evaluated in terms of the αi.
We assign x3 = α and x4 = β. Then we have
x1 = 43/17 + (5/17)α + (1/17)β,
x2 = -48/17 + (13/17)α - (11/17)β.
Writing the solution as the vector y := [x1, x2, x3, x4]t, we have, for α, β ∈ F,

y = [ 43/17, -48/17, 0, 0 ]t + α [ 5/17, 13/17, 1, 0 ]t + β [ 1/17, -11/17, 0, 1 ]t.
Here, the first vector is a particular solution of the original system. The two
vectors
[ 5/17, 13/17, 1, 0 ]t and [ 1/17, -11/17, 0, 1 ]t
are linearly independent solutions of the corresponding homogeneous system.
There should be exactly two such linearly independent solutions of the homogeneous system, because the nullity of the system matrix is the number of
unknowns minus its rank, which is 4 - 2 = 2.
It is easy to devise a mechanical way to write out the solution set of a consistent
linear system using Gauss-Jordan elimination. One has to add the necessary number of
zero rows, or delete some, so that the RREF becomes a square matrix. Then, identifying the
free and basic variables, one can write out the solution set by reversing the signs
of the entries in the un-pivoted columns and changing one of the 0s to a 1. This is left
as an exercise for you.
There are variations of Gauss-Jordan elimination. Instead of reducing the augmented matrix to its row reduced echelon form, if we reduce it to another intermediary form, called the row echelon form, then we obtain the method of Gaussian elimination. In the row echelon form, we do not require the entries above a pivot to be 0;
also, the pivots need not be equal to 1. In that case, we require back-substitution
in solving a linear system. To illustrate this process, we redo Example 2.9 starting
with the augmented matrix, as follows.
The operation R1 transforms the augmented matrix

[ 5  2 -3  1 |   7 ]
[ 1 -3  2 -2 |  11 ]
[ 3  8 -7  5 | -15 ]

into

[ 5   2    -3     1   |    7  ]
[ 0 -17/5  13/5 -11/5 |  48/5 ]
[ 0  34/5 -26/5  22/5 | -96/5 ]

and then E2[3,2] yields

[ 5   2    -3     1   |   7  ]
[ 0 -17/5  13/5 -11/5 | 48/5 ]
[ 0   0     0     0   |   0  ].

Here, R1 = E-1/5[2,1], E-3/5[3,1]. The augmented matrix is now in row echelon
form. It is a consistent system, since no entry in the b portion is a pivot. The pivots
say that x1 and x2 are basic variables while x3 and x4 are free variables. We assign x3 = α and
x4 = β. Writing the pivotal rows in equation form, we have
x1 = (1/5)(7 - 2x2 + 3x3 - x4),   x2 = -48/17 + (13/17)x3 - (11/17)x4.
First we determine x2 and then back-substitute. We obtain
x1 = 43/17 + (5/17)α + (1/17)β,
x2 = -48/17 + (13/17)α - (11/17)β,
x3 = α,
x4 = β.
As you see we end up with the same set of solutions as in Gauss-Jordan elimination.
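Both eliminations arrive at the same closed-form solution, which can be verified by direct multiplication. A minimal check in Python (exact rationals; the coefficient signs are those used in the reduction of Example 2.9):

```python
from fractions import Fraction
F = Fraction

A = [[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]]
b = [7, 11, -15]

def mat_vec(M, x):
    # multiply the matrix M by the column vector x
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# particular solution (alpha = beta = 0) and the two homogeneous solutions
particular = [F(43, 17), F(-48, 17), F(0), F(0)]
homogeneous = [[F(5, 17), F(13, 17), F(1), F(0)],
               [F(1, 17), F(-11, 17), F(0), F(1)]]

print(mat_vec(A, particular) == b)                           # True
print(all(mat_vec(A, v) == [0, 0, 0] for v in homogeneous))  # True
```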
3
Subspace and Dimension
3.1
Recall that F stands for either R or C, and Fn denotes either F^{1×n} or F^{n×1}. Also,
recall that a typical row vector in F^{1×n} is written as [a1, . . . , an] and a column vector in F^{n×1} is written as [a1, . . . , an]t. Both row and column vectors are written
uniformly as (a1, . . . , an); these constitute the vectors in Fn. In Fn, we have a special vector, called the zero vector, which we denote by 0 := (0, . . . , 0). And if
x = (a1, . . . , an) ∈ Fn, then its additive inverse is -x := (-a1, . . . , -an).
The operations of addition and scalar multiplication in Fn enjoy the following
properties. For u, v, w ∈ Fn and α, β ∈ F,
1. u + v = v + u.
2. (u + v) + w = u + (v + w).
3. u + 0 = 0 + u = u.
4. u + (-u) = -u + u = 0.
5. α(βu) = (αβ)u.
6. α(u + v) = αu + αv.
7. (α + β)u = αu + βu.
8. 1·u = u.
9. (-1)u = -u.
10. If u + v = u + w, then v = w.
11. If αu = 0, then α = 0 or u = 0.
It so happens that the last three properties follow from the earlier ones. Any
nonempty set on which the two operations of addition and scalar multiplication are defined, and which enjoys the first eight properties above, is called a vector space. In
this sense, both F^{1×n} and F^{n×1} are vector spaces. In such a general setting,
a nonempty subset of a vector space that is closed under both the operations is called a
subspace. We may not need these general notions. However, we define a subspace
of our two specific vector spaces.
Let V be a nonempty subset of Fn. We say that V is a subspace of Fn iff the
following properties are satisfied:
1. For each u, v ∈ V, u + v ∈ V.
2. For each α ∈ F and for each v ∈ V, αv ∈ V.
Example 3.1
1. {0} and Fn are subspaces of Fn.
2. Let V = {(a, b, c) : 2a + 3b + 5c = 0, a, b, c ∈ F}. Clearly, (0, 0, 0) ∈ V. So,
V ≠ ∅. If (a1, b1, c1), (a2, b2, c2) ∈ V, then
2a1 + 3b1 + 5c1 = 0 and 2a2 + 3b2 + 5c2 = 0.
Adding the two equations, we see that 2(a1 + a2) + 3(b1 + b2) + 5(c1 + c2) = 0; and
multiplying the first by α ∈ F, we see that 2(αa1) + 3(αb1) + 5(αc1) = 0. Thus V is
closed under both operations, and hence is a subspace of F3.
3. Similarly, one can check that the set {α(1, 1) : α ∈ F}
is a subspace of F2 . This is the set of all linear combinations of the vector (1, 1).
Recall that a linear combination of vectors v1, . . . , vm is any vector of the form
α1 v1 + ··· + αm vm
for scalars α1, . . . , αm. We give a name to the set of all linear combinations of given
vectors.
If S is any nonempty subset of Fn, we define span(S) as the set of all linear combinations of finitely many vectors from S. We read span(S) as the span of S. That
is,
span(S) = { α1 v1 + ··· + αm vm : m ∈ N, v1, . . . , vm ∈ S, α1, . . . , αm ∈ F }.
If S = ∅, then we define span(∅) = {0}.
When S = {v1, . . . , vm}, we also write span(S) as span{v1, . . . , vm}. We see that
span{v1, . . . , vm} = { α1 v1 + ··· + αm vm : α1, . . . , αm ∈ F }.
For instance, v1 + v2 + ··· + vm and v1 + 5v2 are in span{v1, . . . , vm}. In the first case,
each αi is equal to 1, whereas in the second case, α1 = 1, α2 = 5, and all other αs
are 0.
Notice that S ⊆ span(S), since each u ∈ S is a linear combination of itself with the
coefficient 1. Similarly, 0 ∈ span(S), since 0 = 0·u for any u ∈ S. However,
this argument is valid provided there exists such a u in S; otherwise, by definition,
span(∅) = {0}. Therefore, 0 ∈ span(S) for every subset S of Fn.
Suppose S ⊆ Fn. If u, v ∈ span(S), then both of them are linear combinations of
vectors from S. Their sum u + v is also a linear combination of vectors from S. Hence,
u + v ∈ span(S). Similarly, αu ∈ span(S) for each α ∈ F. Therefore, span(S) is a subspace of Fn.
Moreover, span(span(S)) = span(S). In general, the span of any subspace is the subspace
itself.
Let V be a subspace of Fn and let S ⊆ V. We say that S is a spanning subset of V,
or that S spans V, iff V = span(S). In this case, each vector in V can be expressed as
a linear combination of vectors from S. We also informally say that the vectors in S
span V whenever span(S) = V.
We just saw that
span{ [1, 1]t } = { α [1, 1]t : α ∈ F } ≠ F^{2×1},   span{ [1, 1]t, [1, -1]t } = F^{2×1}.
Notice that the vectors [1, 1]t, [1, -1]t, [4, -2]t also span F^{2×1}. In fact, since the
first two vectors span F^{2×1}, any list of vectors containing these two will also span
F^{2×1}.
Similarly, the vectors e1, . . . , en in F^{n×1} span F^{n×1}, where ei is the column vector
in F^{n×1} whose ith component is 1 and all other components are 0.
In this terminology, vectors v1 , . . . , vn are linearly dependent iff one of the vectors
in this list is in the span of the rest. If no vector in the list is in the span of the rest,
then the vectors are linearly independent.
3.2
We bring in some flexibility in using the phrases linearly dependent and linearly independent. When the vectors v1, . . . , vm in Fn are linearly independent, we say that the list
v1, . . . , vm is linearly independent, and also that the set {v1, . . . , vm} is linearly independent. Similarly, the vectors v1, . . . , vm are linearly dependent iff the list v1, . . . , vm is
linearly dependent. If there are no repetitions of vectors in this list, then this is also
equivalent to asserting that the set {v1, . . . , vm} is linearly dependent.
Let V be a subspace of Fn. Let S be a subset of V. The subset S may or may not
span V. If it spans V, it is possible that a proper subset of S also spans V. For
instance,
S = {[1, 2, -3], [1, 0, -1], [-2, 4, -2], [0, -2, 2]}
spans the subspace V = {[a, b, c] : a + b + c = 0} of F^{1×3}. Also, the subset
{[1, 2, -3], [1, 0, -1], [-2, 4, -2]}
spans the same subspace V. Notice that S is linearly dependent. Reason:
[0, -2, 2] = (-1)[1, 2, -3] + [1, 0, -1].
On the other hand, the linearly independent set {[1, 2, -3]} does not span V. For
instance,
[1, 0, -1] ≠ α [1, 2, -3] for any α ∈ F.
That is, a spanning subset may be superfluous, and a linearly independent set may
be deficient. A linearly independent set which also spans a subspace is just
adequate in spanning the subspace.
Let V be a subspace of Fn. Let B be a list of vectors from V. We say that B is a
basis of V iff B is linearly independent and B spans V. We write a basis using the set
notation, though it is a list of vectors. However, we remember that a basis is a list, an
ordered set, where the ordering of the vectors is as they are written. For instance, if
{v1, v3, v2} is a basis for a subspace U, then we consider v1 as the first basis vector,
v3 as the second basis vector, and v2 as the third basis vector.
Example 3.2
1. It is easy to check that B = {e1, . . . , en} is a basis of F^{n×1}. Similarly,
E = {e1t, . . . , ent} is a basis of F^{1×n}.
2. We show that B = {[1, 2, -3], [1, 0, -1]} is a basis of
V = {[a, b, c] : a + b + c = 0, a, b, c ∈ F}.
First, B ⊆ V. Second, any vector in V is of the form [a, b, -a - b] for
a, b ∈ F. Now,
[a, b, -a - b] = (b/2) [1, 2, -3] + (a - b/2) [1, 0, -1]
shows that span(B) = V. For linear independence, suppose
α [1, 2, -3] + β [1, 0, -1] = [0, 0, 0].
Then α + β = 0, 2α = 0, -3α - β = 0. It implies that α = β = 0.
3. Also, E = {[1, -1, 0], [0, 1, -1]} is a basis for the subspace V in (2).
Let B be a basis of a subspace V of Fn. If C is any proper superset of B, then any
vector in C \ B is a linear combination of vectors from B. So, C is linearly dependent.
On the other hand, if D is any proper subset of B, then each vector in B \ D fails
to be a linear combination of vectors from D. For, otherwise, B would be linearly
dependent. We thus say that
A basis is a maximal linearly independent set.
A basis is a minimal spanning set.
The zero subspace {0} has a single basis, namely ∅. But other subspaces do not have a unique
basis. For instance, the subspace V in Example 3.2 has at least two bases. However,
something remains the same in all these bases. In that example, both the bases have
exactly two vectors. Is it true that all bases of a subspace have the same number of
vectors?
Theorem 3.1
If a subspace V of Fn has a basis of k vectors, then any list of vectors from V
having more than k vectors is linearly dependent.
Proof: Let B = {u1, . . . , uk} be a basis for the subspace V. Let E =
{v1, . . . , vm} ⊆ V, where m > k. Each vj is a linear combination of the u's. So, we have
scalars aij for i = 1, . . . , k and j = 1, . . . , m such that
vj = a1j u1 + a2j u2 + ··· + akj uk   for j = 1, 2, . . . , m.
Since m > k, the homogeneous linear system
ai1 β1 + ai2 β2 + ··· + aim βm = 0   for i = 1, . . . , k,
of k equations in the m unknowns β1, . . . , βm, has a nonzero solution. For such a
solution β1, . . . , βm,
β1 v1 + ··· + βm vm = (a11 β1 + ··· + a1m βm) u1 + ··· + (ak1 β1 + ··· + akm βm) uk = 0.
Since not all of β1, . . . , βm are zero, the list E is linearly dependent.
It follows that any two bases of a subspace V have the same number of vectors. This
common number is called the dimension of V, denoted by dim(V).
Since {e1, . . . , en} is a basis for F^{n×1}, dim(F^{n×1}) = n. Similarly, dim(F^{1×n}) = n.
Remember that when we consider C^{n×1} or C^{1×n}, the scalars are complex numbers,
and for R^{n×1} or R^{1×n}, the scalars are real numbers.
Example 3.3
1. The dimension of the zero space is 0. That is, dim({0}) = 0.
2. The subspace U := {[a, b, c, d] : a - 2b + 3c = 0 = d + a, a, b, c, d ∈ F} can
be written as
U = {[2b - 3c, b, c, -2b + 3c] : b, c ∈ F}
  = {b [2, 1, 0, -2] + c [-3, 0, 1, 3] : b, c ∈ F}.
The vectors [2, 1, 0, -2] and [-3, 0, 1, 3] are linearly independent. Therefore, U has a basis {[2, 1, 0, -2], [-3, 0, 1, 3]}. So, dim(U) = 2.
For any subset B of a subspace V of Fn, the following statements should now be
obvious.
1. If B has fewer vectors than dim(V), then span(B) is a proper subspace of V.
2. If B has more vectors than dim(V), then B is linearly dependent.
3. If B has dim(V) vectors and span(B) = V, then B is a basis for V.
4. If B has dim(V) vectors and B is linearly independent, then B is a
basis of V.
5. If B is a proper superset of a spanning set of V, then B is linearly dependent.
6. If B is a proper subset of a linearly independent subset of V, then B is linearly
independent and span(B) is a proper subspace of V.
7. If U is a subspace of V, then dim(U) ≤ dim(V) ≤ n.
8. If B is a spanning set of V, then it contains a basis for V.
9. If B is linearly independent, then there exists a superset of B which is a basis
of V.
To see (8), suppose B is a spanning subset of V. If V = {0}, then ∅ ⊆ B is a basis of V. Otherwise, choose a nonzero vector v1 from B. Take
C := {v1}. If V = span(C), then C is a basis of V. Else, choose a (nonzero) vector
v2 from B \ span(C). Update C to C ∪ {v2}. Notice that C is linearly independent.
Continue this process to obtain a basis C for V. This process terminates since V is
finite dimensional.
52
Incidentally, the same process applies to prove (9), starting from the linearly
independent subset B of V. Observe that a linearly independent subset is a basis for
its own span. Therefore, we conclude the following statement from (9).
Theorem 3.2
(Basis Extension Theorem) Let V be a subspace of Fn . Then each basis of a
subspace of V can be extended to a basis of V.
You can use the methods of the last section, based on elementary row operations, to extract
a basis for a subspace which is given as the span of finitely many
vectors. The trick is to throw away the vectors which are linear combinations of the
selected ones. That is, write the vectors as row vectors and form a matrix; convert the
matrix to its RREF; and then throw away the zero rows, or the rows corresponding
to the zero rows, by monitoring row exchanges.
Example 3.4
Find a basis for the subspace U of F4, where
U = span{(1, 1, 1, 1), (2, 1, 0, 3), (1, 0, -1, 2), (0, 3, 2, 1)}.
We start with the matrix with these vectors as its rows and convert it to
its RREF as follows. The operation R1 transforms

[ 1 1  1 1 ]
[ 2 1  0 3 ]
[ 1 0 -1 2 ]
[ 0 3  2 1 ]

into

[ 1  1  1 1 ]
[ 0 -1 -2 1 ]
[ 0 -1 -2 1 ]
[ 0  3  2 1 ],

then R2 yields

[ 1 0 -1  2 ]
[ 0 1  2 -1 ]
[ 0 0  0  0 ]
[ 0 0 -4  4 ],

and R3 yields the RREF

[ 1 0 0  1 ]
[ 0 1 0  1 ]
[ 0 0 1 -1 ]
[ 0 0 0  0 ].

Here, R1 is E-2[2,1], E-1[3,1]; R2 is E-1[2], E-1[1,2], E1[3,2], E-3[4,2]; and
R3 is E[3,4], E-1/4[3], E1[1,3], E-2[2,3].
Taking the pivoted rows, we see that {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, -1)} is
a basis for the given subspace. Notice that only one row exchange has been
done in this reduction process, which means that the third row in the RREF
corresponds to the fourth vector and the fourth row corresponds to the third
vector. Thus the pivoted rows correspond to the first, second, and fourth
vectors originally. This says that a basis for the subspace is also given by
{(1, 1, 1, 1), (2, 1, 0, 3), (0, 3, 2, 1)}.
The reduction process confirms that the third vector is a linear combination
of the first, second, and the fourth.
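The same extraction can be done greedily: keep a vector exactly when it is not a linear combination of the vectors already kept, that is, when it raises the rank. A sketch (helper names ours; we assume the third vector is (1, 0, -1, 2)):

```python
from fractions import Fraction

def rank(rows):
    # forward elimination; the number of pivots found is the rank
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(A[0]) if A else 0):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

def extract_basis(vectors):
    # keep a vector iff it increases the rank of the kept ones,
    # i.e. iff it is not a linear combination of the vectors kept so far
    basis = []
    for v in vectors:
        if rank(basis + [v]) > rank(basis):
            basis.append(v)
    return basis

vs = [[1, 1, 1, 1], [2, 1, 0, 3], [1, 0, -1, 2], [0, 3, 2, 1]]
print(extract_basis(vs))  # keeps the first, second and fourth vectors
```

Unlike the RREF route, this needs no bookkeeping of row exchanges to identify which of the original vectors survive.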
3.3
Let A ∈ F^{m×n}. We may view the matrix A as a function from F^{n×1} to F^{m×1}. It goes
as follows.
Let x ∈ F^{n×1}. Define the function A : F^{n×1} → F^{m×1} by
A(x) = Ax.
That is, the value of the function A at any vector x ∈ F^{n×1} is the vector Ax in F^{m×1}.
Since the matrix product Ax is well defined, such a function is meaningful. Due to the properties of the matrix product, the following are true:
1. A(u + v) = A(u) + A(v) for all u, v ∈ F^{n×1}.
2. A(αv) = αA(v) for all v ∈ F^{n×1} and for all α ∈ F.
In this manner a matrix is considered as a linear map. In fact, any function A from a
vector space to another (both over the same field) satisfying the above two properties
is called a linear transformation or a linear map.
To see the connection between the matrix as a rectangular array and as a function,
consider the values of the matrix A at the standard basis vectors e1, . . . , en of F^{n×1}.
Recall that ej is the column vector in F^{n×1} whose jth entry is 1 and all other entries
are 0. Let A = [aij] ∈ F^{m×n}. Then Aej is the jth column of A. We thus observe the
following:
A matrix A ∈ F^{m×n} is viewed as the linear map A : F^{n×1} → F^{m×1}, where
A(ej) is the jth column of A, and A(v) = Av for each v ∈ F^{n×1}.
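A tiny illustration of this viewpoint, with arbitrary sample entries:

```python
def apply(A, x):
    # the m-by-n matrix A acts on x in F^{n x 1} and returns Ax in F^{m x 1}
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2],
     [3, 4],
     [5, 6]]
e1, e2 = [1, 0], [0, 1]

print(apply(A, e1))  # [1, 3, 5] -- the first column of A
print(apply(A, e2))  # [2, 4, 6] -- the second column of A

# linearity: A(2*e1 + 3*e2) = 2*A(e1) + 3*A(e2)
lhs = apply(A, [2, 3])
rhs = [2 * p + 3 * q for p, q in zip(apply(A, e1), apply(A, e2))]
print(lhs == rhs)    # True
```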
The range of the matrix A (of the linear map A) is the set R(A) = {Ax : x ∈ F^{n×1}}.
Now, each vector x = [α1, . . . , αn]t ∈ F^{n×1} can be written as
x = α1 e1 + ··· + αn en.
If y ∈ R(A), then there exists an x ∈ F^{n×1} such that y = Ax. Such a y is written as
y = Ax = α1 Ae1 + ··· + αn Aen.
Conversely, we see that each vector α1 Ae1 + ··· + αn Aen is in R(A). Since Aej is the
jth column of A, we find that
R(A) = { α1 A1 + ··· + αn An : α1, . . . , αn ∈ F },
where A1, . . . , An are the n columns of A. Thus, R(A) is the span of the columns of A;
and hence, it is a subspace of F^{m×1}. We thus refer to R(A) as the range space of A.
In this terminology, rank(A), which is the maximum number of linearly independent columns of A, is the
maximum number of linearly independent vectors in R(A). That is,
rank(A) = dim(R(A)).
In the RREF of A, the un-pivoted columns are linear combinations of the pivoted ones. If rank(A) = r, then the pivoted columns of the RREF are e_1, ..., e_r. Thus, the columns of A that correspond to the pivoted ones form a basis of R(A).
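This recipe can be sketched with SymPy's exact row reduction (the matrix below is an arbitrary example, not one from the text): `Matrix.rref()` returns the RREF together with the pivot column indices, and the corresponding columns of A give a basis for R(A).

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 0, 1]])

rref, pivots = A.rref()              # pivots: indices of the pivoted columns
basis_RA = [A[:, j] for j in pivots] # the corresponding columns of A

assert A.rank() == len(pivots)
# Every column of A is a linear combination of the basis columns:
B = sp.Matrix.hstack(*basis_RA)
for j in range(A.cols):
    sol = B.solve_least_squares(A[:, j])
    assert B * sol == A[:, j]
```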
The null space N(A) = {x ∈ F^{n×1} : Ax = 0} of A is simply the set of solutions of the homogeneous system Ax = 0. If u, v ∈ N(A), then A(u + v) = Au + Av = 0. For any scalar α, A(αu) = αAu = 0. Therefore, N(A) is a subspace of F^{n×1}. And we refer to N(A) as the null space of A.
The nullity of A, denoted by null(A) := n − rank(A), is the maximum number of linearly independent vectors in N(A). That is,
null(A) = dim(N(A)).
As we know from the properties of a linear system, null(A) gives the number of linearly independent solutions of the homogeneous system Ax = 0. In fact, by Gauss-Jordan elimination we can construct a basis for N(A).
Explicitly, if A has more rows than columns, then we neglect the last m − n zero rows in the RREF of A; and if A has fewer rows than columns, we put in n − m zero rows at the bottom of the RREF of A, to get a square matrix. From the resulting square matrix B, we collect all un-pivoted columns, and if an un-pivoted column had the column index j in B, then we change its jth entry from 0 to −1. These changed un-pivoted column vectors form a basis of N(A).
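The described construction can be sketched as follows (the example matrix is an assumed one, and the final check verifies that each constructed vector indeed lies in N(A)):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 0, 1]])   # 3x3, rank 2, so null(A) = 3 - 2 = 1

n = A.cols
R, pivots = A.rref()
# A is square here, so R is already n x n; otherwise pad with zero rows
# (or drop trailing zero rows) to obtain an n x n matrix B.
B = R[:n, :] if R.rows >= n else R.col_join(sp.zeros(n - R.rows, n))

basis_NA = []
for j in range(n):
    if j not in pivots:
        v = B[:, j]
        v[j] = -1          # change the jth entry from 0 to -1
        basis_NA.append(v)

assert len(basis_NA) == n - A.rank()
for v in basis_NA:
    assert A * v == sp.zeros(A.rows, 1)   # each vector solves Ax = 0
```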
Example 3.5
Consider the system matrix A of Example 2.9. Reducing A to its row reduced echelon form and marking the pivots, we have
A =
 5 2 3 1
 1 3 2 2
 3 8 7 5
→
 1 0 5/17  1/17
 0 1 13/17 11/17
 0 0 0    0
= RREF(A).
The first two columns in RREF(A) are the pivoted columns. So, the first two columns in A form a basis for R(A). That is,
Basis for R(A) is {[5, 1, 3]^t, [2, 3, 8]^t}.
For a basis of N(A), we adjoin a zero row to the RREF to make it a square matrix and obtain
B =
 1 0 5/17  1/17
 0 1 13/17 11/17
 0 0 0    0
 0 0 0    0
The un-pivoted columns of B are the third and the fourth. Changing the jth entry of the jth un-pivoted column from 0 to −1 gives the basis
{[5/17, 13/17, −1, 0]^t, [1/17, 11/17, 0, −1]^t}
of N(A). Check whether these vectors came up in writing the solution set of the system in Example 2.9.
It follows that dim(R(A)) + dim(N(A)) equals the dimension of the domain space of the linear map A, which is the same as the number of columns of A. This statement is referred to as the rank-nullity theorem.
3. Determine the rank r of A =
 1 2 1 1 1
 3 5 3 4 3
 1 1 1 2 1
 5 8 5 7 5
Then express suitable 4 − r rows of A as linear combinations of the other r rows. Also, express suitable 5 − r columns of A as linear combinations of the other r columns.
4. If E ∈ F^{m×m} is an elementary matrix and A ∈ F^{m×n}, then show that the row rank of EA is equal to the row rank of A.
5. If B ∈ F^{m×m} is an invertible matrix and A ∈ F^{m×n}, then show that the column rank of BA is equal to the column rank of A.
6. From the previous two exercises, conclude that an elementary row operation alters neither the row rank nor the column rank of a matrix.
7. Let A ∈ F^{m×n}. Prove that the linear map A : F^{n×1} → F^{m×1} given by A(x) := Ax for each x ∈ F^{n×1} is one-one iff N(A) = {0}.
8. Let A ∈ F^{n×n}. Prove that the linear map A : F^{n×1} → F^{n×1} given by A(x) := Ax is one-one and onto iff A maps any basis onto another (may be the same) basis of F^{n×1}.
9. Let A ∈ F^{n×n}. Prove that the linear map A : F^{n×1} → F^{n×1} given by A(x) := Ax for each x ∈ F^{n×1} is one-one iff it is onto.
3.4
Change of basis
e_j = c_1 u_1 + ⋯ + c_n u_n,    Q = [v_1 ⋯ v_m],
P[e_j]_B = e_j,    Q[f_i]_C = f_i,
w = Σ_{j=1}^n a_j e_j,    Aw = Σ_{i=1}^m b_i f_i.
Then
P[w]_B = P[Σ_{j=1}^n a_j e_j]_B = P(Σ_{j=1}^n a_j [e_j]_B) = Σ_{j=1}^n a_j P[e_j]_B = Σ_{j=1}^n a_j e_j = w,
Q[Aw]_C = Q[Σ_{i=1}^m b_i f_i]_C = Σ_{i=1}^m b_i Q[f_i]_C = Σ_{i=1}^m b_i f_i = Aw = AP[w]_B.
Find the change of basis matrix M when the basis changes from O to N. Also find the matrix B that represents the linear map given by
A =
 1 1 1
 1 0 1
 0 1 0
and verify that
[(1, 2, 3)^t]_N = M[(1, 2, 3)^t]_O,    [A(1, 2, 3)^t]_N = B[(1, 2, 3)^t]_O.
We consider the transposes of the basis vectors and work in R^{3×1}; here N = {(1, 1, 1)^t, (1, 1, −1)^t, (1, −1, 1)^t} and O = {(1, 1, 0)^t, (0, 1, 1)^t, (1, 0, 1)^t}. As per the construction in Theorem 3.5, with P = [n_1 n_2 n_3] and Q = [o_1 o_2 o_3], the change of basis matrix is M = P^{-1}Q:
M =
 1/2  1    1/2
 1/2 −1/2  0
 0   −1/2  1/2
Since [(1, 2, 3)^t]_O = (0, 2, 1)^t, we find
M[(1, 2, 3)^t]_O = M(0, 2, 1)^t = (5/2, −1, −1/2)^t = [(1, 2, 3)^t]_N.
According to Theorem 3.5,
B = P^{-1}AQ =
 1   1   1
 1/2 1/2 1
 1/2 1/2 0
As to the verification,
A(1, 2, 3)^t = (6, 4, 2)^t = 3(1, 1, 1)^t + 2(1, 1, −1)^t + 1(1, −1, 1)^t.
We find that
B[(1, 2, 3)^t]_O = B(0, 2, 1)^t = (3, 2, 1)^t = [A(1, 2, 3)^t]_N.
Observe that the matrix B now represents the given linear map from R^{3×1} with basis O to the space R^{3×1} with basis N.
3.5
In view of Theorem 3.5, we say that two matrices A, B ∈ F^{m×n} are equivalent iff there exist invertible matrices P ∈ F^{n×n} and Q ∈ F^{m×m} such that B = Q^{-1}AP.
Observe that equivalent matrices represent the same linear map (matrix) with respect to possibly different pairs of bases. Therefore, the ranks of two equivalent matrices are the same.
We can construct a matrix of rank r relatively easily. Let r ≤ min{m, n}. The matrix R_r ∈ F^{m×n} whose first r columns are the standard basis vectors e_1, ..., e_r of F^{m×1}, and whose other columns are zero columns, has rank r. That is, in block form,
R_r =
 I_r 0
 0   0
Such a matrix is called a rank echelon matrix. For notational ease, we do not show the size of a rank echelon matrix; we rather specify it in different contexts. From Theorem 2.4, it follows that any matrix which is equivalent to R_r also has rank r.
Conversely, if a row of a matrix is a linear combination of the other rows, then the rank of the matrix is the same as that of the matrix obtained by deleting such a row. Similarly, deleting a column which is a linear combination of the other columns does not change the rank of the matrix. It is thus possible to perform elementary row and column operations to bring a matrix of rank r to its rank echelon form R_r. We state this result and give a rigorous proof.
Theorem 3.6
(Rank factorization) A matrix is of rank r iff it is equivalent to the rank echelon
matrix Rr of the same size.
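One direction of the theorem is easy to check numerically (a sketch; the random invertible P and Q below are assumed, not taken from the text): any matrix of the form Q^{-1} R_r P has rank r.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 4, 5, 2

# The rank echelon matrix R_r in block form [[I_r, 0], [0, 0]].
Rr = np.zeros((m, n))
Rr[:r, :r] = np.eye(r)
assert np.linalg.matrix_rank(Rr) == r

# Equivalent matrices B = Q^{-1} Rr P have the same rank r.
Q = rng.standard_normal((m, m))
P = rng.standard_normal((n, n))   # random matrices are invertible with probability 1
B = np.linalg.inv(Q) @ Rr @ P
assert np.linalg.matrix_rank(B) == r
```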
Proof Let A ∈ F^{m×n}. Suppose rank(A) = r. Convert A to its row reduced echelon form C := E_1A, where E_1 is a suitable product of elementary matrices. Now, each non-pivotal column is a linear combination of the r pivotal columns.
Consider the matrix C^t. The pivotal columns of C are now the pivotal rows e_i^t in C^t. Each other row in C^t is a linear combination of these pivotal rows. Exchange the rows of C^t so that the first r rows of the new C^t are e_1^t, ..., e_r^t, in that order. Use suitable elementary row operations to zero-out all non-pivotal
P = [v_1 ⋯ v_n].
The matrix A as a linear map now takes the form P^{-1}AP, where the columns of P form a basis, or a new coordinate system, in F^{n×1}. This leads to similarity of two matrices.
We say that two matrices A, B ∈ F^{n×n} are similar iff B = P^{-1}AP for some invertible matrix P ∈ F^{n×n}.
We emphasize that if B = P^{-1}AP is a matrix similar to A, then the matrix A as a linear map on F^{n×1} with the standard basis, and the matrix B as a linear map on F^{n×1} with an ordered basis as the columns of P, are the same linear map.
Let N be the ordered basis whose jth element is the jth column of P. We see that for each vector v ∈ F^{n×1}, [Av]_N = P^{-1}AP [v]_N.
Example 3.8
Consider the basis N = {[1, 1, 1]^t, [1, 1, −1]^t, [1, −1, 1]^t} for R^{3×1}. To determine the matrix similar to
A =
 1 1 1
 1 0 1
 0 1 0
when the basis has changed from the standard basis to N, we construct the matrix P by taking the basis vectors of N as its columns:
P =
 1  1  1
 1  1 −1
 1 −1  1
Then the matrix similar to A is
B = P^{-1}AP = (1/2)
 0  1  1
 1  0 −1
 1 −1  0
· A · P =
 3/2 1/2  1/2
 1   0    1
 1/2 1/2 −1/2
With [(1, 2, 3)^t]_N = (5/2, −1, −1/2)^t, we compute
B[(1, 2, 3)^t]_N = (3, 2, 1)^t = [A(1, 2, 3)^t]_N.
This verifies the condition [Au]_N = B[u]_N for the vector u = [1, 2, 3]^t.
Though equivalence is easy to characterize by the rank, similarity is much more difficult to characterize. We postpone this to a later chapter.
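One easily checked necessary condition is that similar matrices share the same characteristic polynomial, hence the same eigenvalues. A numerical sketch (both matrices are arbitrary choices):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
P = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0]])
B = np.linalg.inv(P) @ A @ P      # B is similar to A

# Same characteristic polynomial coefficients, hence same eigenvalues.
assert np.allclose(np.poly(A), np.poly(B))
assert np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B)))
```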
4
Orthogonalization
4.1
Inner products
The dot product in R^3 is used to define length and angle. In particular, the dot product is used to determine when two vectors are perpendicular to each other. This notion can be generalized to F^n.
For vectors u, v ∈ F^{1×n}, we define their inner product as
⟨u, v⟩ = uv*.
For example, if u = [1, 2, 3], v = [2, 1, 3], then ⟨u, v⟩ = 1·2 + 2·1 + 3·3 = 13.
Similarly, for x, y ∈ F^{n×1}, we define their inner product as
⟨x, y⟩ = y*x.
In case F = R, in the definition of the inner product, x* becomes x^t. The inner product satisfies the following properties:
For x, y, z ∈ F^n and α ∈ F,
1. ⟨x, x⟩ ≥ 0.
2. ⟨x, x⟩ = 0 iff x = 0.
3. ⟨x, y⟩ and ⟨y, x⟩ are complex conjugates of each other.
4. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
5. ⟨z, x + y⟩ = ⟨z, x⟩ + ⟨z, y⟩.
6. ⟨αx, y⟩ = α⟨x, y⟩.
7. ⟨x, αy⟩ = ᾱ⟨x, y⟩.
Any vector space V with a map ⟨·, ·⟩ : V × V → F that satisfies Properties (1)–(4) and (6) is called an inner product space. Properties (5) and (7) follow from the others.
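These properties can be spot-checked numerically (a sketch; the vectors are arbitrary). With column vectors, ⟨x, y⟩ = y*x translates to `np.vdot(y, x)`, which conjugates its first argument:

```python
import numpy as np

def ip(x, y):
    # <x, y> = y* x : np.vdot conjugates its first argument.
    return np.vdot(y, x)

rng = np.random.default_rng(1)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
a = 2.0 - 1.5j

assert ip(x, x).real >= 0 and abs(ip(x, x).imag) < 1e-12   # <x,x> >= 0
assert np.isclose(ip(x, y), np.conj(ip(y, x)))             # conjugate symmetry
assert np.isclose(ip(x + y, z), ip(x, z) + ip(y, z))       # additivity
assert np.isclose(ip(a * x, y), a * ip(x, y))              # homogeneity in 1st slot
assert np.isclose(ip(x, a * y), np.conj(a) * ip(x, y))     # conjugate in 2nd slot
```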
The inner product gives rise to the length of a vector as in the familiar case of R^{1×3}. We now call the generalized version of length the norm.
For u ∈ F^n, we define its norm, denoted by ‖u‖, as the nonnegative square root of ⟨u, u⟩. That is,
‖u‖ = √⟨u, u⟩.
The norm satisfies the following properties:
For x, y ∈ F^n and α ∈ F,
1. ‖x‖ ≥ 0.
2. ‖x‖ = 0 iff x = 0.
3. ‖αx‖ = |α| ‖x‖.
4. |⟨x, y⟩| ≤ ‖x‖ ‖y‖. (Cauchy-Schwarz inequality)
5. ‖x + y‖ ≤ ‖x‖ + ‖y‖. (Triangle inequality)
A proof of the Cauchy-Schwarz inequality goes as follows:
If y = 0, then the inequality clearly holds. Else, ⟨y, y⟩ ≠ 0. Write α = ⟨x, y⟩/⟨y, y⟩. Then ᾱ = ⟨y, x⟩/⟨y, y⟩ and ᾱ⟨x, y⟩ = |α|² ‖y‖². Then
0 ≤ ⟨x − αy, x − αy⟩ = ⟨x, x⟩ − ᾱ⟨x, y⟩ − α(⟨y, x⟩ − ᾱ⟨y, y⟩)
= ‖x‖² − ᾱ⟨x, y⟩ = ‖x‖² − |α|² ‖y‖² = ‖x‖² − |⟨x, y⟩|²/‖y‖².
Therefore, |⟨x, y⟩|² ≤ ‖x‖² ‖y‖²; that is, |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
other vector in the set. An orthogonal set of vectors is called an orthonormal set if the norm of each vector is 1. For example,
{[1, 2, 3]^t, [2, −1, 0]^t}
is an orthogonal set in F^{3×1}. And
{[1/√14, 2/√14, 3/√14]^t, [2/√5, −1/√5, 0]^t}
is an orthonormal set in F^{3×1}.
3. Write a vector u as u =
4. Is the set {[1, 2, 3, −1]^t, [2, −1, 0, 0]^t, [0, 0, 1, 3]^t} an orthogonal set in F^{4×1}? Is it also a linearly independent set?
5. Prove that each orthogonal set of nonzero vectors in F^n is linearly independent.
6. Construct an orthonormal set from {[1, 2, 3, −1]^t, [2, −1, 0, 0]^t, [0, 0, 1, 3]^t}.
7. If an orthogonal set is given, how do we construct an orthonormal set from it?
8. Let B = {v_1, ..., v_m} be an orthonormal set in F^n. Let V = span(B). Let x ∈ F^n. Prove the following:
(a) Fourier expansion: If x ∈ V, then x = Σ_{j=1}^m ⟨x, v_j⟩ v_j.
(b) Parseval's identity: If x ∈ V, then ‖x‖² = Σ_{j=1}^m |⟨x, v_j⟩|².
(c) Bessel's inequality: ‖x‖² ≥ Σ_{j=1}^m |⟨x, v_j⟩|².
4.2
Gram-Schmidt orthogonalization
It is easy to see that if the nonzero vectors v1 , . . . , vn are orthogonal, then they are
also linearly independent. For, suppose v1 , . . . , vn are nonzero orthogonal vectors.
Assume that
α_1 v_1 + ⋯ + α_n v_n = 0.
Let j ∈ {1, ..., n}. Take the inner product of both sides of the above equation with v_j. If i ≠ j, then ⟨v_i, v_j⟩ = 0. So, we have α_j ⟨v_j, v_j⟩ = 0. But v_j ≠ 0 implies that ⟨v_j, v_j⟩ ≠ 0. Therefore, α_j = 0. That is,
α_1 = ⋯ = α_n = 0.
Therefore, the vectors v_1, ..., v_n are linearly independent.
Conversely, given n linearly independent vectors u_1, ..., u_n (necessarily all nonzero), we can orthogonalize them. If u_1, ..., u_k are linearly independent but u_1, ..., u_k, u_{k+1} are linearly dependent, then we will see that our orthogonalization process yields the (k + 1)th vector as the zero vector. We now discuss this method, called Gram-Schmidt orthogonalization.
Given two linearly independent vectors u_1, u_2 in the plane, how do we construct two orthogonal vectors? Keep v_1 = u_1. Take out the projection of u_2 on u_1 to get v_2. Now, v_2 ⊥ v_1.
What is the projection of u_2 on u_1? Its length is |⟨u_2, u_1⟩|/‖u_1‖, and its direction is that of u_1. Thus taking v_1 = u_1 and
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1
does the job. You can now verify that ⟨v_2, v_1⟩ = 0. We may continue this process of taking away projections in F^n. It results in the following process.
Theorem 4.1
(Gram-Schmidt orthogonalization) Let u_1, u_2, ..., u_n be linearly independent vectors in F^n. Define
v_1 = u_1
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1
⋮
v_n = u_n − (⟨u_n, v_1⟩/⟨v_1, v_1⟩) v_1 − ⋯ − (⟨u_n, v_{n−1}⟩/⟨v_{n−1}, v_{n−1}⟩) v_{n−1}.
Then {v_1, ..., v_n} is an orthogonal set of nonzero vectors.
Proof
⟨v_2, v_1⟩ = ⟨u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1, v_1⟩ = ⟨u_2, v_1⟩ − ⟨u_2, v_1⟩ = 0.
Example 4.1
The vectors u_1 = [1, 0, 0], u_2 = [1, 1, 0], u_3 = [1, 1, 1] are linearly independent in R^{1×3}. Apply Gram-Schmidt orthogonalization.
v_1 = [1, 0, 0].
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1 = [1, 1, 0] − 1·[1, 0, 0] = [0, 1, 0].
v_3 = u_3 − (⟨u_3, v_1⟩/⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩/⟨v_2, v_2⟩) v_2 = [1, 1, 1] − [1, 0, 0] − [0, 1, 0] = [0, 0, 1].
Example 4.2
The vectors u_1 = [1, 1, 0], u_2 = [0, 1, 1], u_3 = [1, 0, 1] form a basis for F^{1×3}. Apply Gram-Schmidt orthogonalization.
v_1 = [1, 1, 0].
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1 = [0, 1, 1] − (1/2)[1, 1, 0] = [−1/2, 1/2, 1].
v_3 = u_3 − (⟨u_3, v_1⟩/⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩/⟨v_2, v_2⟩) v_2
= [1, 0, 1] − (1/2)[1, 1, 0] − (1/3)[−1/2, 1/2, 1] = [2/3, −2/3, 2/3].
The set {[1, 1, 0], [−1/2, 1/2, 1], [2/3, −2/3, 2/3]} is orthogonal.
Example 4.3
Apply Gram-Schmidt orthogonalization process on the vectors u1 = [1, 1, 0, 1],
u2 = [0, 1, 1, 1] and u3 = [1, 3, 2, 1].
70
v1 = [1, 1, 0, 1].
hu2 , v1 i
v1 = [0, 1, 1, 1] 0 [1, 1, 0, 1] = [0, 1, 1, 1].
v2 = u2
hv1 , v1 i
hu3 , v2 i
hu3 , v1 i
v1
v2
v3 = u3
hv1 , v1 i
hv2 , v2 i
h(1, 3, 2, 1), (1, 1, 0, 1)i
= (1, 3, 2, 1)
(1, 1, 0, 1)
h(1, 1, 0, 1), (1, 1, 0, 1)i
h(1, 3, 2, 1), (0, 1, 1, 1)i
(0, 1, 1, 1)
Discarding v3 , which is the zero vector, we have only two linearly independent
vectors out of u1 , u2 , u3 . They are u1 and u2 ; and u3 is a linear combination of
these two. In fact, the process also revealed that u3 = u1 2u2 .
An orthogonal set can be made orthonormal by dividing each vector by its norm. Also, you can modify the Gram-Schmidt orthogonalization process to directly output orthonormal vectors.
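Such a modified process can be sketched as a small orthonormalizing routine (a hypothetical helper, not from the text); a zero residual flags a linearly dependent input vector, as when u_3 = u_1 + 2u_2:

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors`; skip (near-)zero residuals from dependent inputs."""
    ortho = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for q in ortho:
            v = v - np.dot(v, q) * q       # subtract the projection on q
        norm = np.linalg.norm(v)
        if norm > tol:                     # v == 0 signals linear dependence
            ortho.append(v / norm)
    return ortho

# A dependent third vector (u3 = u1 + 2*u2), so only two vectors survive.
u1, u2, u3 = [1, 1, 0, 1], [0, 1, 1, -1], [1, 3, 2, -1]
Q = gram_schmidt([u1, u2, u3])
assert len(Q) == 2
for i in range(len(Q)):
    for j in range(len(Q)):
        expected = 1.0 if i == j else 0.0
        assert abs(np.dot(Q[i], Q[j]) - expected) < 1e-10
```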
Let V be a subspace of F^n. An orthogonal subset of V which is also a basis of V is called an orthogonal basis of V. Similarly, when an orthonormal set is a basis of V, the set is said to be an orthonormal basis of V.
For example, the standard basis {e_1, ..., e_n} of F^{n×1} is an orthonormal basis of F^{n×1}. Similarly, {e_1^t, ..., e_n^t} is an orthonormal basis of F^{1×n}.
The Gram-Schmidt procedure constructs an orthogonal or an orthonormal basis from a given basis of any subspace of F^n. It also shows that every subspace of F^n has an orthogonal (orthonormal) basis.
In an orthonormal basis, the components of the coordinate vector of any vector can be expressed through the inner product. Revisit Exercise 8(a) of Section 4.1. The Fourier expansion there says that the coordinate vector of any vector x with respect to an orthonormal basis has as its jth component the inner product of x with the jth basis vector.
2. Find u ∈ R^{1×3} so that [1/√3, 1/√3, 1/√3], [1/√2, 0, −1/√2], u are orthonormal. Form a matrix with these vectors as its rows, in that order. Verify that the columns of the matrix are also orthonormal.
4.3
Best approximation
To find the point in a given plane closest to a point in space, we draw a perpendicular
from the point to the plane. The foot of the perpendicular is the closest point in the
plane.
Let U be a subspace of F^n. Let v ∈ F^n. A vector u ∈ U is a best approximation of v iff ‖v − u‖ ≤ ‖v − x‖ for each x ∈ U.
We show that our intuition of taking a perpendicular can be used to compute a best
approximation.
Theorem 4.2
Let U be a subspace of F^n. A vector u ∈ U is a best approximation of v ∈ F^n iff v − u ⊥ U. Moreover, a best approximation is unique.
Proof
Suppose v − u ⊥ U. Let x ∈ U. Now, u − x ∈ U. By the Pythagoras theorem,
‖v − x‖² = ‖(v − u) + (u − x)‖² = ‖v − u‖² + ‖u − x‖² ≥ ‖v − u‖². (4.1)
⟨v − u, u_j⟩ = ⟨v − Σ_{i=1}^m ⟨v, u_i⟩u_i, u_j⟩ = ⟨v, u_j⟩ − Σ_{i=1}^m ⟨v, u_i⟩⟨u_i, u_j⟩ = ⟨v, u_j⟩ − ⟨v, u_j⟩ = 0.
Σ_{j=1}^m ⟨u_j, u_i⟩ α_j = ⟨v, u_i⟩, for i = 1, ..., m.
Theorem 4.3 guarantees that this linear system has a unique solution. Notice that the system matrix of this linear system is A = [a_{ij}], where a_{ij} = ⟨u_j, u_i⟩. Such a matrix, which results from a basis by taking the inner products of basis vectors, is called a Gram matrix. Our result shows that a Gram matrix is invertible. Can you prove directly that a Gram matrix is invertible?
Example 4.4
Find the best approximation of v = (1, 0) ∈ R² from U = {(a, a) : a ∈ R}.
We seek (α, α) so that ‖(1, 0) − (α, α)‖ ≤ ‖(1, 0) − (β, β)‖ for all β. That is, we find (α, α) so that ⟨(1 − α, −α), (1, 1)⟩ = 0. So, α = 1/2. The best approximation here is (1/2, 1/2).
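The computation generalizes: to project v onto span{u_1, ..., u_m}, one solves the Gram-matrix system Σ_j ⟨u_j, u_i⟩ α_j = ⟨v, u_i⟩. A sketch (the second subspace and vector below are arbitrary choices):

```python
import numpy as np

def best_approximation(v, basis):
    """Return the best approximation of v from span(basis), via the Gram matrix."""
    U = np.column_stack(basis)
    G = U.T @ U                       # Gram matrix: G[i, j] = <u_j, u_i> (real case)
    alpha = np.linalg.solve(G, U.T @ v)
    return U @ alpha

# Example 4.4: project (1, 0) onto the line {(a, a)}.
u = best_approximation(np.array([1.0, 0.0]), [np.array([1.0, 1.0])])
assert np.allclose(u, [0.5, 0.5])

# The residual v - u is orthogonal to the subspace.
v = np.array([1.0, 2.0, 3.0])
basis = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
u = best_approximation(v, basis)
for b in basis:
    assert abs(np.dot(v - u, b)) < 1e-10
```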
4.4
QR factorization and least squares
Notice that we have discussed two ways of extracting a basis for the span of a finite number of vectors from F^n. One is the method of elementary row operations and the other is Gram-Schmidt orthogonalization. The orthogonalization is a superior tool, though computationally more demanding. We will now see one of its applications in factorizing a matrix.
Let u_1, u_2, ..., u_n be the columns of A ∈ F^{m×n}, where m ≥ n. Suppose that the columns are linearly independent. Using the Gram-Schmidt process and then orthonormalizing, we get the vectors v_1, v_2, ..., v_n. Since span{u_1, ..., u_k} = span{v_1, ..., v_k} for each k = 1, ..., n, there exist scalars a_{ij}, 1 ≤ i ≤ j ≤ n, such that
u_1 = a_{11} v_1
u_2 = a_{12} v_1 + a_{22} v_2
⋮
u_n = a_{1n} v_1 + a_{2n} v_2 + ⋯ + a_{nn} v_n.
We take a_{ij} = 0 for i > j. Writing R = [a_{ij}] for i, j = 1, 2, ..., n, and Q = [v_1 v_2 ⋯ v_n], we see that
A = [u_1 u_2 ⋯ u_n] = QR.
Since the columns of Q are orthonormal, Q ∈ F^{m×n}, R ∈ F^{n×n}, Q*Q = I, and R is upper triangular.
The QR factorization of a matrix A is the determination of a matrix Q with orthonormal columns and an upper triangular matrix R so that A = QR.
Recall that if Q ∈ R^{m×n} has orthonormal columns, we have Q^tQ = I. The above discussion boils down to the following result.
Theorem 4.4
Any matrix A ∈ F^{m×n} with linearly independent columns has a QR factorization. Moreover, R is invertible.
Example 4.5
Let A =
 1 1
 0 1
 1 1
Orthonormalization of the columns of A yields
Q =
 1/√2 0
 0    1
 1/√2 0
Since A = QR and Q^tQ = I, we have
R = Q^tA =
 √2 √2
 0  1
It is easy to check that A = QR.
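The factorization of Example 4.5 can be reproduced with NumPy (a sketch; note that `np.linalg.qr` may flip the signs of matching columns of Q and rows of R, so only the product and the magnitudes are compared):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0]])

Q, R = np.linalg.qr(A)        # reduced QR: Q is 3x2, R is 2x2 upper triangular
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))          # orthonormal columns
assert np.allclose(R, np.triu(R))               # R is upper triangular
assert abs(abs(R[0, 0]) - np.sqrt(2)) < 1e-12   # |r11| = sqrt(2), up to sign
```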
The QR factorization and best approximation together give an efficient procedure for approximating a solution of a system of linear equations. In order to discuss this, we first define the so-called least squares approximation of a solution of a linear system.
Let A ∈ F^{m×n}. A vector u ∈ F^{n×1} is called a least squares solution of the linear system Ax = b iff ‖Au − b‖ ≤ ‖Az − b‖ for all z ∈ F^{n×1}.
Notice that if u is a least squares solution of Ax = b, then Au is simply the best approximation of b from R(A). Then Theorem 4.2 yields the following result.
Theorem 4.5
Let A ∈ F^{m×n}, and let b ∈ F^{m×1}.
1. The linear system Ax = b has a least squares solution.
2. A vector u ∈ F^{n×1} is a least squares solution iff Au − b ⊥ R(A).
3. A least squares solution is unique iff N(A) = {0}.
Recall that the null space N(A) of A is the set of all solutions of the homogeneous system Ax = 0. Thus the condition N(A) = {0} says that the homogeneous system Ax = 0 does not have a nonzero solution.
Least squares solutions can be computed by solving a related linear system.
Theorem 4.6
Let A ∈ F^{m×n} and let b ∈ F^{m×1}. A vector u ∈ F^{n×1} is a least squares solution of Ax = b iff A*Au = A*b.
Proof The columns u_1, ..., u_n of A span R(A). Due to Theorem 4.5,
u is a least squares solution of Ax = b
iff ⟨Au − b, u_i⟩ = 0 for i = 1, ..., n
iff u_i*(Au − b) = 0 for i = 1, ..., n
iff A*(Au − b) = 0
iff A*Au = A*b.
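Theorem 4.6 suggests a direct computation: solve the normal equations A*Au = A*b. A sketch comparing this against `np.linalg.lstsq` (the data are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])
b = np.array([1.0, 2.0, 0.0, 1.0])

# Normal equations: A^t A u = A^t b  (A* = A^t over the reals).
u = np.linalg.solve(A.T @ A, A.T @ b)

u_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(u, u_ref)
# The residual Au - b is orthogonal to every column of A.
assert np.allclose(A.T @ (A @ u - b), 0.0)
```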
1 1
2
0 1
1 0 2
0 1 1
0
1 1 1
3
1
1
1
1 0 1
2 , b = 0
(a) A = 1
(b) A =
1 1 0 , b = 1 .
2 1
2
2
0 1 1
4. Let A ∈ R^{n×n}. Let b ∈ R^{n×1} with b ≠ 0. Show that if b is orthogonal to each column of A, then Ax = b is inconsistent. What are the least squares solutions of Ax = b?
5
Eigenvalues and Eigenvectors
5.1
Eigenvalues
Let A =
 0 1
 1 0
We view A as a linear transformation; A : R^{2×1} → R^{2×1}. It transforms straight lines to straight lines or points. Does there exist a straight line which is transformed to itself?
A [x y]^t = [y x]^t.
Thus, the line {(x, x) : x ∈ R} never moves. So also the line {(x, −x) : x ∈ R}. Observe that
A [x x]^t = 1 · [x x]^t and A [x −x]^t = (−1) · [x −x]^t.
Let A ∈ F^{n×n}. A scalar λ ∈ F is called an eigenvalue of A iff there exists a non-zero vector v ∈ F^{n×1} such that Av = λv. Such a vector v is called an eigenvector of A for (or, associated with, or, corresponding to) the eigenvalue λ.
Example 5.1
Consider the matrix A =
 1 0 0
 1 1 0
 1 1 1
It has an eigenvector [0, 0, 1]^t associated with the eigenvalue 1. Is [0, 0, c]^t, for c ≠ 0, also an eigenvector associated with the same eigenvalue 1?
In fact, corresponding to an eigenvalue, there are infinitely many eigenvectors.
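The defining relation Av = λv is easy to test numerically. A sketch, assuming a triangular matrix for which [0, 0, 1]^t is an eigenvector for the eigenvalue 1, along with the matrix from the opening discussion:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

# [0, 0, 1]^t is an eigenvector for the eigenvalue 1, and so is any [0, 0, c]^t, c != 0.
v = np.array([0.0, 0.0, 1.0])
assert np.allclose(A @ v, 1.0 * v)
assert np.allclose(A @ (5.0 * v), 1.0 * (5.0 * v))

# The matrix [[0, 1], [1, 0]] discussed above has eigenvalues 1 and -1.
S = np.array([[0.0, 1.0], [1.0, 0.0]])
assert np.allclose(np.sort(np.linalg.eigvals(S).real), [-1.0, 1.0])
```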
5.2
Characteristic polynomial
A =
 1 0 0
 1 1 0
 1 1 1
Its characteristic polynomial is
det(A − tI) = det
 1−t 0   0
 1   1−t 0
 1   1   1−t
= (1 − t)³.
A =
 0 1
 1 0
For computing the inverse, suppose that A is invertible. Then det(A) ≠ 0. Since det(A) is the product of all eigenvalues of A, λ = 0 is not an eigenvalue of A. It implies that t is not a factor of the characteristic polynomial of A. Therefore, the constant term a_0 in the characteristic polynomial of A is nonzero. Then we can rewrite the above equation as
a_0 I + A(a_1 I + a_2 A + ⋯ + a_n A^{n−1}) = 0.
Multiplying by A^{-1} and simplifying, we obtain
A^{-1} = −(1/a_0)(a_1 I + a_2 A + ⋯ + a_n A^{n−1}).
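This inverse formula can be sketched with `np.poly`, which returns the coefficients of the monic characteristic polynomial det(tI − A) = t^n + c_1 t^{n−1} + ⋯ + c_n; that convention differs from the a_0, ..., a_n above only in ordering and an overall sign, which the code accounts for. The example matrix is an arbitrary invertible choice:

```python
import numpy as np

def inverse_via_charpoly(A):
    """Compute A^{-1} from the characteristic polynomial via Cayley-Hamilton."""
    n = A.shape[0]
    c = np.poly(A)            # c[0]=1, ..., c[n]: det(tI - A) = sum_k c[k] t^(n-k)
    # Cayley-Hamilton gives A(A^{n-1} + c1 A^{n-2} + ... + c_{n-1} I) = -c_n I, so:
    B = np.zeros_like(A)
    for k in range(n):        # Horner-style accumulation of the bracketed sum
        B = B @ A + c[k] * np.eye(n)
    return -B / c[n]

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(inverse_via_charpoly(A) @ A, np.eye(3))
```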
1. Find the eigenvalues and corresponding eigenvectors of
 3 0 0 0
 0 2 0 0
 0 0 0 2
 0 0 2 0
5.3
Consider the matrices
B =
 1 2 3
 2 3 4
 3 4 5
, C =
 0  2  3
 −2 0  4
 −3 −4 0
, D =
 1   2i 3
 −2i 3  4
 3   4  5
, E =
 0    2+i 3
 −2+i i   4i
 −3   4i  0
Here B is real symmetric, C is skew-symmetric, D is hermitian, and E is skew-hermitian.
Notice that a skew-symmetric matrix must have a zero diagonal, and the diagonal entries of a skew-hermitian matrix must be 0 or purely imaginary. Reason: a_ii = −a_ii implies a_ii = 0 in the skew-symmetric case, while a_ii = −ā_ii implies 2 Re(a_ii) = 0 in the skew-hermitian case.
Let A be a square matrix. Since A + A^t is symmetric and A − A^t is skew-symmetric, every square matrix can be written as a sum of a symmetric matrix and a skew-symmetric matrix:
A = (1/2)(A + A^t) + (1/2)(A − A^t).
Similar rewriting is possible with hermitian and skew-hermitian matrices:
A = (1/2)(A + A*) + (1/2)(A − A*).
A square matrix A is called unitary iff A*A = I = AA*. In addition, if A is real, then it is called an orthogonal matrix. That is, an orthogonal matrix is a matrix with real entries satisfying A^tA = I = AA^t. Notice that a square matrix is unitary iff it is invertible and its inverse is equal to its adjoint. Similarly, a real matrix is orthogonal iff it is invertible and its inverse is its transpose. In the following, B is a unitary matrix of order 2, and C is an orthogonal matrix (also unitary) of order 3:
B = (1/2)
 1+i 1−i
 1−i 1+i
, C = (1/3)
 2 1  2
 2 −2 −1
 1 2  −2
The following are examples of orthogonal 2×2 matrices:
O_1 =
 cos θ −sin θ
 sin θ  cos θ
, O_2 =
 cos θ  sin θ
 sin θ −cos θ
If A = [a_{ij}] is an orthogonal matrix of order 2, then A^tA = I implies
a_{11}² + a_{21}² = 1 = a_{12}² + a_{22}²,  a_{11}a_{12} + a_{21}a_{22} = 0.
Thus, there exist θ and φ such that a_{11} = cos θ, a_{21} = sin θ, a_{12} = cos φ, a_{22} = sin φ, and cos(θ − φ) = 0. It then follows that A is in the form of either O_1 or O_2.
Let (a, b) be the vector in the plane that starts at the origin and ends at the point (a, b). Writing the point (a, b) as a column vector [a b]^t, we see that the matrix product O_1[a b]^t is the end-point of the vector obtained by rotating the vector (a, b) by an angle θ. Similarly, O_2[a b]^t gives the point obtained by reflecting (a, b) along a straight line that makes an angle θ/2 with the x-axis. Thus, O_1 is said to be a rotation by the angle θ, and O_2 is called a reflection along the line making an angle of θ/2 with the x-axis.
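A sketch verifying these facts for a sample angle (the angle is an arbitrary choice): O_1 and O_2 are orthogonal, det O_1 = 1 (rotation), and det O_2 = −1 (reflection).

```python
import numpy as np

t = 0.7  # an arbitrary angle, in radians
O1 = np.array([[np.cos(t), -np.sin(t)],
               [np.sin(t),  np.cos(t)]])   # rotation by t
O2 = np.array([[np.cos(t),  np.sin(t)],
               [np.sin(t), -np.cos(t)]])   # reflection

for O in (O1, O2):
    assert np.allclose(O.T @ O, np.eye(2))  # both are orthogonal
assert np.isclose(np.linalg.det(O1), 1.0)
assert np.isclose(np.linalg.det(O2), -1.0)

# O1 rotates (1, 0) to (cos t, sin t); rotations preserve the norm.
p = O1 @ np.array([1.0, 0.0])
assert np.allclose(p, [np.cos(t), np.sin(t)])
assert np.isclose(np.linalg.norm(p), 1.0)
```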
If A ∈ F^{m×n}, then A*A = I is equivalent to asserting that the columns of A are orthonormal; and AA* = I is equivalent to the fact that the rows of A are orthonormal.
Unitary and orthogonal matrices preserve the inner product and also the norm. This is the reason unitary matrices are also called isometries. We prove these facts about unitary matrices in the following theorem.
Theorem 5.3
Let A ∈ F^{n×n} be a unitary or an orthogonal matrix.
1. For each pair of vectors x, y ∈ F^{n×1}, ⟨Ax, Ay⟩ = ⟨x, y⟩.
2. For each x ∈ F^{n×1}, ‖Ax‖ = ‖x‖.
3. The columns of A are orthonormal.
4. The rows of A are orthonormal.
5. |det(A)| = 1.
Proof
Theorem 5.4
Let A ∈ F^{n×n}. Let λ be any complex eigenvalue of A.
1. If A is hermitian or real symmetric, then λ ∈ R. Moreover, there exists a real eigenvector corresponding to λ.
2. If A is skew-hermitian or skew-symmetric, then λ is purely imaginary or zero.
3. If A is unitary or orthogonal, then |λ| = 1.
Proof Let A ∈ F^{n×n}. Let λ be any complex eigenvalue of A with an eigenvector v ∈ C^{n×1}. Now, Av = λv. Pre-multiplying with v*, we have v*Av = λ v*v ∈ C.
(1) If A is hermitian, then A = A*. Now,
(v*Av)* = v*A*v = v*Av and (v*v)* = v*v.
Ay = λy.
6
Canonical Forms
6.1
Schur triangularization
Eigenvalues and eigenvectors can be used to bring a matrix to nice forms using similarity transformations. A very general result in this direction is Schur's unitary triangularization. It says that using a suitable similarity transformation, we can represent a square matrix by an upper triangular matrix. Then the diagonal entries of the upper triangular matrix must be the eigenvalues of the given matrix. This information can be used to construct the appropriate similarity transformation.
Theorem 6.1
(Schur triangularization) Let A ∈ C^{n×n}. Then there exists a unitary matrix P ∈ C^{n×n} such that P*AP is upper triangular. Moreover, if A ∈ R^{n×n} has only real eigenvalues, then P can be chosen to be an orthogonal matrix.
Proof Our proof is by induction on n. If n = 1, then clearly A is an upper triangular matrix, and we take P = [1], the identity matrix with a single entry 1, which is both unitary and orthogonal.
Assume that for all B ∈ C^{m×m}, m ≥ 1, we have a unitary matrix Q ∈ C^{m×m} such that Q*BQ is upper triangular. Let A ∈ C^{(m+1)×(m+1)}. Let λ ∈ C be an eigenvalue of A with an associated eigenvector u. Consider C^{(m+1)×1} as an inner product space with the usual inner product ⟨w, z⟩ = z*w. Let v = u/‖u‖, so that v is an eigenvector of A of norm 1 associated with the eigenvalue λ. Extend the set {v} to obtain an orthonormal ordered basis E = {v, v_1, ..., v_m} for C^{(m+1)×1}. Here, you may have to use an extension of a basis, and then the Gram-Schmidt orthonormalization process. Now, construct the matrix R ∈ C^{(m+1)×(m+1)} by taking these basis vectors as its columns, in that order; that is, let
R = [v v_1 ⋯ v_m].
Since E is an orthonormal set, R is unitary. With respect to the basis E, the matrix representation of A is R^{-1}AR = R*AR. The first column of R*AR is
R*ARe_1 = R*Av = R*λv = λR*v = λR*Re_1 = λe_1,
so that R*AR has the block form
R*AR =
 λ y
 0 C
for some y ∈ C^{1×m} and some C ∈ C^{m×m}. By the induction hypothesis, there exists a unitary matrix S ∈ C^{m×m} such that S*CS is upper triangular. Taking
P = R
 1 0
 0 S
we obtain
P*AP =
 1 0
 0 S*
(R*AR)
 1 0
 0 S
=
 λ yS
 0 S*CS
Since S*CS is upper triangular, the induction proof is complete for the case A ∈ C^{n×n}.
When A ∈ R^{n×n}, and all the eigenvalues of A are real, we use the transpose instead of the adjoint everywhere in the above proof. Thus, P can be chosen to be an orthogonal matrix.
To eradicate possible misunderstanding, we recall that "A has only real eigenvalues" means that when we consider this A as a matrix in C^{n×n}, all its complex eigenvalues turn out to be real numbers. This again means that all zeros of the characteristic polynomial of A are real.
Further, during the course of the proof of Schur's triangularization, once we obtain a matrix similar to A in the form
 λ y
 0 S*CS
we look for whether λ is still an eigenvalue of S*CS. If so, we choose this eigenvalue over others for further reduction. In the next step we obtain a matrix similar to A in the form
 λ y z
 0 λ x
 0 0 M
where M is an (n − 2) × (n − 2) matrix. Continuing further this way, we see that a Schur triangularization of A exists, where on the diagonal of the final upper triangular matrix equal eigenvalues occur together. Of course, the construction allows an upper triangular form where the eigenvalues can be chosen to occur on the diagonal in any given order. However, this particular form, where equal eigenvalues occur together, will be helpful later.
Example 6.1
Consider the matrix A =
 2 1 0
 2 3 0
 1 1 1
for Schur triangularization.
We find that the characteristic polynomial of A is (1 − t)²(4 − t). All eigenvalues of A are real; thus there exists an orthogonal matrix P such that P^tAP is upper triangular. To determine such a matrix P, we take one of the eigenvalues, say 1. An associated eigenvector of norm 1 is v = [0, 0, 1]^t. We extend {v} to an orthonormal basis for C^{3×1}. For convenience, we take the (ordered) orthonormal basis as
{[0, 0, 1]^t, [1, 0, 0]^t, [0, 1, 0]^t}.
Taking the basis vectors as columns, we form the matrix R as
R =
 0 1 0
 0 0 1
 1 0 0
We then find that
R^tAR =
 1 1 1
 0 2 1
 0 2 3
Now, we try to triangularize the matrix C =
 2 1
 2 3
It has eigenvalues 1 and 4. The eigenvector of unit norm associated with the eigenvalue 1 is [1/√2, −1/√2]^t, which we extend to the orthonormal basis
{[1/√2, −1/√2]^t, [1/√2, 1/√2]^t}
for C^{2×1}. Then we construct the matrix S by taking these basis vectors as its columns, that is,
S =
 1/√2  1/√2
 −1/√2 1/√2
We find that
S^tCS =
 1 −1
 0 4
which is an upper triangular matrix. Then
P = R
 1 0
 0 S
=
 0 1/√2  1/√2
 0 −1/√2 1/√2
 1 0     0
Computing P^tAP, we have
P^tAP =
 1 0 √2
 0 1 −1
 0 0 4
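In practice a Schur form is computed with `scipy.linalg.schur` (a sketch on the matrix of Example 6.1; the Q and T returned may differ from the hand computation by the choices discussed above, but A = QTQ^t always holds):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, 1.0, 0.0],
              [2.0, 3.0, 0.0],
              [1.0, 1.0, 1.0]])

T, Q = schur(A)                               # real Schur form: A = Q T Q^t
assert np.allclose(Q @ T @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(3))        # Q is orthogonal
assert np.allclose(T, np.triu(T))             # T is upper triangular (real eigenvalues)
assert np.allclose(np.sort(np.diag(T)), [1.0, 1.0, 4.0])  # eigenvalues on the diagonal
```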
Further, there is nothing sacred about being upper triangular. For, given a matrix A ∈ C^{n×n}, consider using Schur triangularization of A*. There exists a unitary matrix P such that P*A*P is upper triangular. Then, taking the adjoint, we see that P*AP is lower triangular. That is,
any square matrix is unitarily similar to a lower triangular matrix.
Analogously, a real square matrix having only real eigenvalues is also orthogonally similar to a lower triangular matrix. We remark that the lower triangular form of a matrix need not be the transpose or the adjoint of its upper triangular form.
Moreover, neither the unitary matrix P nor the upper triangular matrix P*AP in Schur triangularization is unique. That is, there can be unitary matrices P and Q such that both P*AP and Q*AQ are upper triangular, with P ≠ Q and P*AP ≠ Q*AQ. The non-uniqueness stems from the choices involved in the associated eigenvectors and in extending these to an orthonormal basis. For instance, in Example 6.1, if you extend {[0, 0, 1]^t} to the ordered orthonormal basis
{[0, 0, 1]^t, [0, 1, 0]^t, [1, 0, 0]^t},
then you end up with
P =
 0 −1/√2 1/√2
 0 1/√2  1/√2
 1 0     0
,
P^tAP =
 1 0 √2
 0 1 1
 0 0 4
6.2
Diagonalizability
As you see from Schur triangularization, each matrix with complex entries is similar
to an upper triangular matrix. Moreover, a matrix A with real entries is similar to
an upper triangular real matrix provided all zeros of the characteristic polynomial
are real. The upper triangular matrix similar to a given matrix A takes a better form
when A is hermitian.
Theorem 6.2
(Spectral theorem for hermitian matrices) Each hermitian matrix is unitarily similar to a diagonal matrix.
Proof Let A ∈ C^{n×n} be a hermitian matrix. Due to Schur triangularization, we have a unitary matrix P such that D = P*AP is upper triangular. Now,
D* = P*A*P = P*AP = D.
Since D is upper triangular and D* = D, we see that D is a diagonal matrix.
Observe that the diagonal entries in the diagonal matrix D are the eigenvalues of A. Since A is hermitian, all of them are real numbers. When A is real symmetric, the associated eigenvectors can be chosen to be in R^{n×1}; in that case the matrix P, which consists of real eigenvectors of A, is an orthogonal matrix.
It thus follows that each real symmetric matrix is orthogonally similar to a diagonal matrix. The spectrum of a matrix is the multi-set of its eigenvalues. Theorem 6.2
is called the spectral theorem for hermitian matrices since it explicitly uses the eigenvalues of the matrix.
A matrix A ∈ F^{n×n} is called diagonalizable iff there exists an invertible matrix P ∈ F^{n×n} such that P^{-1}AP is a diagonal matrix. When P^{-1}AP is a diagonal matrix, we say that A is diagonalized by P. In this language, the spectral theorem for hermitian matrices may be stated as follows:
Every hermitian matrix is unitarily diagonalizable.
To see how the eigenvalues and eigenvectors are involved in the diagonalization process, we proceed as follows.
Let A ∈ F^{n×n}. Let λ_1, ..., λ_n be all the complex eigenvalues (not necessarily distinct) of A. Let v_1, ..., v_n ∈ F^{n×1} be corresponding eigenvectors. Construct the n × n matrices
P := [v_1 v_2 ⋯ v_n],  D := diag(λ_1, λ_2, ..., λ_n).
Then AP = [Av_1 ⋯ Av_n] = [λ_1v_1 ⋯ λ_nv_n]. That is,
AP = PD.
If P is invertible, then P^{-1}AP = D, a diagonal matrix. That is, A is similar to a diagonal matrix. Moreover, P is invertible iff its columns form a basis for F^{n×1}. We thus obtain the following result.
Theorem 6.3
A matrix A ∈ F^{n×n} is diagonalizable iff there exists a basis of F^{n×1} consisting of eigenvectors of A.
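Theorem 6.3 is easy to test numerically. The following sketch (the matrix is chosen here purely for illustration) builds P from the eigenvectors that numpy returns and checks that P^{-1}AP is diagonal:

```python
import numpy as np

# An illustrative 3x3 matrix with three distinct eigenvalues 2, 3, 5,
# hence with three linearly independent eigenvectors.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])

lam, P = np.linalg.eig(A)        # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P     # similar to A via P

# Off-diagonal entries vanish up to round-off: A is diagonalizable.
assert np.allclose(D, np.diag(lam))
```

Invertibility of P is exactly the basis condition of Theorem 6.3; for a defective matrix, the returned eigenvector matrix would be (numerically) singular.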
The question is: when are there n linearly independent eigenvectors of A? The spectral theorem provides a partial answer. We can generalize the spectral theorem to the so-called normal matrices.
A matrix A ∈ C^{n×n} is called a normal matrix iff A*A = AA*. We observe the following.
Theorem 6.4
Each upper triangular normal matrix is diagonal.
Proof Let U ∈ C^{n×n} be an upper triangular matrix. If n = 1, then clearly U is a diagonal matrix. Lay out the induction hypothesis that each upper triangular normal matrix of order k is diagonal. Let U be an upper triangular normal matrix of order k+1. Write U in partitioned form as
U = [V u; 0 a],
where V ∈ C^{k×k}, u ∈ C^{k×1}, 0 is the zero row vector in C^{1×k}, and a ∈ C. Then
U*U = [V*V  V*u; u*V  u*u + |a|²],  UU* = [VV* + uu*  āu; au*  |a|²].
Comparing the (k+1, k+1)th entries of U*U = UU* implies that u*u + |a|² = |a|². That is, u = 0. Plugging u = 0 in the above equation, we see that V*V = VV*. Since V is upper triangular, by the induction hypothesis, V is a diagonal matrix. Then with u = 0, U is also a diagonal matrix.
Using this result on upper triangular normal matrices, we can generalize the spectral theorem to normal matrices.
Theorem 6.5
(Spectral theorem for normal matrices) A square matrix is unitarily diagonalizable iff it is a normal matrix.
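Both directions of Theorem 6.5 can be probed numerically. A small sketch with a rotation matrix (my own choice of example) follows; since its two eigenvalues ±i are distinct, the unit-norm eigenvectors returned by numpy are automatically orthonormal, so the eigenvector matrix is unitary:

```python
import numpy as np

# A real normal matrix that is not symmetric: rotation by 90 degrees.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
assert np.allclose(A @ A.conj().T, A.conj().T @ A)    # A*A = AA*: normal

lam, V = np.linalg.eig(A)                  # eigenvalues are +i and -i
assert np.allclose(V.conj().T @ V, np.eye(2))         # V is unitary
assert np.allclose(V.conj().T @ A @ V, np.diag(lam))  # unitarily diagonalized
```

For normal matrices with repeated eigenvalues, a general-purpose eigensolver need not return orthonormal vectors within an eigenspace; the theorem only guarantees that an orthonormal choice exists.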
Canonical Forms
The converse direction concerns unitary diagonalizability only: a matrix may well be diagonalizable without being normal. For instance, with
A = [1 0 0; 4 −3 2; 2 −1 0],  P = [3 0 0; 4 1 2; 2 1 1],
we see that A*A ≠ AA* and P*P ≠ I, but
P^{-1}AP = diag(1, −1, −2).  (6.1)
Here the columns of P are eigenvectors of A associated with the eigenvalues 1, −1, −2, respectively.
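The phenomenon behind (6.1) — a matrix that is diagonalizable although it is not normal — can be checked directly; a numpy sketch with an illustrative matrix pair:

```python
import numpy as np

# A is diagonalizable but not normal; the columns of P are eigenvectors of A.
A = np.array([[1.0, 0.0, 0.0],
              [4.0, -3.0, 2.0],
              [2.0, -1.0, 0.0]])
P = np.array([[3.0, 0.0, 0.0],
              [4.0, 1.0, 2.0],
              [2.0, 1.0, 1.0]])

assert not np.allclose(A @ A.T, A.T @ A)      # A is not normal
assert not np.allclose(P.T @ P, np.eye(3))    # P is not orthogonal
assert np.allclose(np.linalg.inv(P) @ A @ P, np.diag([1.0, -1.0, -2.0]))
```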
The matrix A = [−1 1 1; 1 −1 1; 1 1 −1] is real symmetric. It has eigenvalues 1, −2, −2, with associated (normalized) eigenvectors
(1/√3)(1, 1, 1)ᵗ,  (1/√2)(1, −1, 0)ᵗ,  (1/√6)(1, 1, −2)ᵗ.
They form an orthonormal basis for R³. Taking
P = [1/√3 1/√2 1/√6; 1/√3 −1/√2 1/√6; 1/√3 0 −2/√6],
we see that P^{-1} = Pᵗ and P^{-1}AP = PᵗAP = diag(1, −2, −2).
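For a real symmetric matrix such as this one, numpy.linalg.eigh returns the eigenvalues in ascending order together with an orthonormal set of eigenvectors, so the computation above can be verified in a few lines:

```python
import numpy as np

A = np.array([[-1.0, 1.0, 1.0],
              [ 1.0, -1.0, 1.0],
              [ 1.0, 1.0, -1.0]])

# eigh is the right tool for symmetric/hermitian matrices.
lam, P = np.linalg.eigh(A)

assert np.allclose(lam, [-2.0, -2.0, 1.0])
assert np.allclose(P.T @ P, np.eye(3))          # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(lam))   # P^t A P is diagonal
```

The two eigenvectors for the repeated eigenvalue −2 may differ from the hand-picked ones, since any orthonormal basis of that eigenspace works.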
Diagonalizability cannot be read off from the characteristic polynomial alone. Consider the matrices [1 1; 0 1] and [1 0; 0 1]. The characteristic polynomials of both are (t − 1)², yet the second matrix is diagonal while the first is not even diagonalizable, since its eigenvectors span only a one-dimensional subspace.
1. [Three 3×3 matrices (a), (b), (c); the statement and entries of this exercise are not recoverable from the source.]
2. Diagonalize A = [7 −5 15; 6 −4 15; 0 0 1]. Then compute A⁶.
3. Show that each of the following matrices is diagonalizable:
(a) [0 1 1; 1 0 1; 1 1 0]  (b) [7 −2 0; −2 6 −2; 0 −2 5]
4. Check whether each of the following matrices is diagonalizable. If diagonalizable, find a basis of eigenvectors for the space C^{3×1}:
(a) [1 1 1; 1 1 1; 1 1 1]  (b) [1 1 1; 0 1 1; 0 0 1]  (c) [1 0 1; 1 1 0; 0 1 1]
5. Show that each of the following matrices is diagonalizable with a matrix in R^{3×3}. Also find a basis of eigenvectors for R^{3×1}.
(a) [3/2 1/2 0; 1/2 3/2 0; 1/2 1/2 1]  (b) [entries not recoverable from the source]  (c) [2 −1 0; −1 2 0; 2 2 3]
6. Prove that if a normal matrix has only real eigenvalues, then it is hermitian.
Conclude that if a real normal matrix has only real eigenvalues, then it is symmetric.
6.3 Jordan form
Each matrix A ∈ C^{n×n} with distinct eigenvalues λ1, ..., λp turns out to be similar to a block diagonal matrix
J = diag(J1(λ1), ..., J_{s1}(λ1), ..., J1(λp), ..., J_{sp}(λp))
for some si; each matrix Jj(λi) here has the form
Jj(λi) = [λi 1; λi 1; ⋱ ⋱; λi 1; λi],
with λi at each diagonal position and 1 at each position just above the diagonal. The missing entries are all 0. Such a matrix Jj(λi) is called a Jordan block with diagonal entries λi. Any matrix which is in the above block diagonal form is said to be in Jordan form. We will see that the number of Jordan blocks with diagonal entries λi is the geometric multiplicity of the eigenvalue λi.
Example 6.4
The following 7×7 matrix is one in Jordan form:
diag([1], [1 1; 0 1], [1], [2 1 0; 0 2 1; 0 0 2]).
It has three Jordan blocks for the eigenvalue 1, of which two are of size 1×1 and one is of size 2×2; and it has one block of size 3×3 for the eigenvalue 2. Notice that the eigenvalue 1 has geometric multiplicity 3 and algebraic multiplicity 4, while 2 has geometric multiplicity 1 and algebraic multiplicity 3.
Theorem 6.8
(Jordan form) Each matrix A ∈ C^{n×n} is similar to a matrix J in Jordan form, where the diagonal entries of J are the eigenvalues of A. Moreover, if mk(λ) is the number of Jordan blocks of order k with diagonal entry λ, then
mk(λ) = rank((A − λI)^{k−1}) − 2 rank((A − λI)^k) + rank((A − λI)^{k+1}) for k = 1, ..., n.
In particular, the Jordan form of A is unique up to a permutation of the blocks.
In the formula for mk we use the convention that for any matrix B of order n, B⁰ is the identity matrix of order n.
Proof First, we will show the existence of a Jordan form, and then we will
come back to the formula mk , which will show the uniqueness of a Jordan form
up to a permutation of Jordan blocks.
Due to Schur triangularization, we may assume that A is upper triangular, with equal diagonal entries occurring consecutively. Partition A into diagonal blocks A1, A2, ..., Ak, where each Ai is upper triangular with a single eigenvalue on its diagonal. A nonzero entry x of A may still occur above the diagonal, outside all the blocks; we first zero-out all such entries.
If such an x occurs as the (r, s)th entry in A, then r < s. Moreover, the corresponding diagonal entries arr and ass are eigenvalues of A occurring in different blocks Ai and Aj. Thus arr ≠ ass. Further, all entries below the diagonals of Ai and of Aj are 0. We use a combination similarity to obtain
Eα[r, s] A E−α[r, s]  with  α := x / (arr − ass).
This similarity transformation replaces the rth row with the rth row plus α times the sth row, and then replaces the sth column with the sth column minus α times the rth column. Since r < s, it changes the entries of A in the rth row to the right of the sth column, and the entries in the sth column above the rth row. Thus the upper triangular nature of the matrix does not change. Further, it replaces the (r, s)th entry x with
ars + α(ass − arr) = x + (x/(arr − ass))(ass − arr) = x − x = 0.
We use a sequence of such similarity transformations, starting from the last row of A_{k−1} with the least column index and ending in the first row with the largest column index. Observe that an entry beyond the blocks which was 0 previously can become nonzero after a single such similarity transformation. Such an entry will eventually be zeroed-out. Finally, each position which is not inside any of the blocks ends up 0.
It remains to bring each diagonal block Ai, say of order m, to Jordan form by similarity transformations; we proceed by induction on m, writing
Ai = [B u; 0 a],
where B is upper triangular of order m − 1 with the single eigenvalue a, and u is a column vector. By the induction hypothesis, there is an invertible matrix Q such that
Q^{-1} B Q = diag(B1, B2, ..., Bℓ),
where each Bj = [a 1; a 1; ⋱ ⋱; a 1; a] is a Jordan block with diagonal entries a. Then
C := [Q 0; 0 1]^{-1} Ai [Q 0; 0 1] = [Q^{-1}BQ  Q^{-1}u; 0  a].
Thus C has a at each diagonal position; each super-diagonal entry, say σ, is 0 or 1; and its last column is [b1 b2 ··· b_{m−1} a]ᵗ. Here, the sequence of σ's on the super-diagonal, read from top left to bottom right, comprises a block of 1s followed by a 0, then a block of 1s followed by a 0, and so on. The number of 1s depends on the sizes of B1, B2, etc. That is, when B1 is complete and B2 starts, we have a 0. Also, we have written Q^{-1}u as [b1 ··· b_{m−1}]ᵗ. Our goal is to zero-out all the bj's except b_{m−1}, which may be made a 0 or a 1.
In the next sub-stage, call it the third stage, we apply similarity transformations to zero-out (all or all except one of) the entries b1, ..., b_{m−2}. In any row of C, the entry just above the diagonal (the σ there) is either 0 or 1. The σ is 0 at the last row of each block Bj. We leave all such b's alone right now; they are to be tackled separately. So, suppose in the rth row, br ≠ 0 and the (r, r+1)th entry (the σ above the diagonal) is a 1. We wish to zero-out each such br.
After each such br has been zeroed-out by a combination similarity that uses the 1 at the (r, r+1)th position, the matrix takes the form
F := [B1 c1; B2 c2; ⋱ ⋮; Bℓ cℓ; a],
that is, a block diagonal part diag(B1, ..., Bℓ) bordered by a last column whose entry against the last row of Bj is cj, with a at the bottom right corner.
Notice that if Bj is a 1×1 block, then the corresponding entry on the last column is already 0. In the next sub-stage, call it the fourth stage, we keep the nonzero c corresponding to the last block (the c entry with the highest column index) and zero-out all the other c's. Let Bq be the last block whose corresponding c entry is cq ≠ 0, occurring in the sth row. (It may not be cℓ; in that case, all of c_{q+1}, ..., cℓ are already 0.) We first make cq a 1 by using a dilation similarity:
G := E1/cq[s] F Ecq[s].
In G, the earlier cq at the (s, m)th position is now 1. Let Bp be any block other than Bq with cp ≠ 0 in the rth row. Our goal in this sub-stage, call it the fifth stage, is to zero-out cp. We use two combination similarity transformations as shown below:
H := E−cp[r−1, s−1] E−cp[r, s] G Ecp[r, s] Ecp[r−1, s−1].
This similarity transformation brings cp to 0 and keeps the other entries intact. We do this for each such cp. Thus in the mth column of H, we have only one nonzero entry, a 1 at the (s, m)th position. If this happens to be at the last row, then we have obtained a Jordan form. Otherwise, in the next sub-stage, call it the sixth stage, we move this 1 to the (s, s+1)th position by the following sequence of permutation similarities:
K := E[m−1, m] ··· E[s+2, m] E[s+1, m] H E[s+1, m] E[s+2, m] ··· E[m−1, m].
This transformation exchanges the rows and columns beyond the sth so that the 1 in the (s, m)th position moves to the (s, s+1)th position, making up a bigger block; the other entries remain as they were earlier.
Here ends the proof by induction that each block Ai can be brought to a Jordan form by similarity transformations. From a similarity transformation for Ai, a similarity transformation can be constructed for the block diagonal matrix
Ã := diag(A1, A2, ..., Ak)
by putting identity matrices of suitable order and the similarity transformation for Ai in a block form. As these transformations do not affect any other rows and columns of Ã, a sequence of such transformations brings Ã to its Jordan form, proving the existence part in the theorem.
For the formula for mk, let λ be an eigenvalue of A. Suppose k ∈ {1, ..., n}. Observe that A − λI is similar to J − λI. Thus, rank((A − λI)^i) = rank((J − λI)^i) for each i. Therefore, it is enough to prove the formula for J instead of A. We use induction on n. In the basis case, J = [λ]. Here, k = 1 and mk = m1 = 1. On the right hand side, due to the convention,
(J − λI)^{k−1} = I = [1],  (J − λI)^k = [0]¹ = [0],  (J − λI)^{k+1} = [0]² = [0].
So, the formula gives 1 − 2·0 + 0 = 1, and it holds for n = 1.
Lay out the induction hypothesis that for all matrices in Jordan form of
order less than n, the formula holds. Let J be a matrix of order n, which is in
Jordan form. We consider two cases.
Case 1: Let J have a single Jordan block corresponding to λ. That is,
J = [λ 1; λ 1; ⋱ ⋱; λ 1; λ],  J − λI = [0 1; 0 1; ⋱ ⋱; 0 1; 0].
Then (J − λI)^j has 1s on its jth super-diagonal and 0s elsewhere, so that rank((J − λI)^j) = n − j for 0 ≤ j ≤ n, and rank((J − λI)^j) = 0 for j > n. For k < n, the right hand side of the formula evaluates to (n − k + 1) − 2(n − k) + (n − k − 1) = 0 = mk. And for k = n, it evaluates to 1 − 2·0 + 0 = 1 = mn. So, the formula holds in this case.
Case 2: Suppose J has more than one Jordan block corresponding to λ. Suppose that the first Jordan block in J corresponding to λ has order r < n. Then J − λI can be written in block form as
J − λI = [C 0; 0 D],
where C is the Jordan block of order r with diagonal entries 0, and D is the matrix of order n − r in Jordan form consisting of the other blocks of J − λI. Then, for any j,
(J − λI)^j = [C^j 0; 0 D^j].
Therefore,
rank((J − λI)^j) = rank(C^j) + rank(D^j).
Write mk(C) and mk(D) for the number of Jordan blocks of order k for the eigenvalue λ that appear in C and in D, respectively. Then
mk = mk(C) + mk(D).
By the induction hypothesis,
mk(C) = rank(C^{k−1}) − 2 rank(C^k) + rank(C^{k+1}),
mk(D) = rank(D^{k−1}) − 2 rank(D^k) + rank(D^{k+1}).
It then follows that
mk = rank((J − λI)^{k−1}) − 2 rank((J − λI)^k) + rank((J − λI)^{k+1}).
Since the number of Jordan blocks of order k corresponding to each eigenvalue of A is uniquely determined, the Jordan form of A is also uniquely
determined up to a permutation of blocks.
To obtain a Jordan form of a given matrix, we may use the construction of similarity transformations as used in the proof of Theorem 6.8, or we may use the formula
for mk as given there. We illustrate these methods in the following examples.
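The rank formula lends itself to a small routine. A sketch follows (the matrix is chosen here for illustration; for large or inexact matrices, exact arithmetic is safer, since numerical rank near a defective eigenvalue is delicate):

```python
import numpy as np

def num_jordan_blocks(A, lam, k):
    """m_k(lam) = rank((A-lam I)^(k-1)) - 2 rank((A-lam I)^k) + rank((A-lam I)^(k+1))."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    r = lambda j: n if j == 0 else np.linalg.matrix_rank(np.linalg.matrix_power(B, j))
    return r(k - 1) - 2 * r(k) + r(k + 1)

# J = diag(J2(5), J1(5)): one 2x2 block and one 1x1 block for eigenvalue 5.
J = np.array([[5.0, 1.0, 0.0],
              [0.0, 5.0, 0.0],
              [0.0, 0.0, 5.0]])
assert num_jordan_blocks(J, 5.0, 1) == 1   # one 1x1 block
assert num_jordan_blocks(J, 5.0, 2) == 1   # one 2x2 block
```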
Example 6.5
Consider a 9×9 upper triangular matrix A with diagonal entries (2, 2, 2, 2, 2, 2, 3, 3, 3) and certain nonzero entries above the diagonal, some of which (the encircled ones) lie outside the two diagonal blocks of eigenvalues 2 and 3. [The display of A is not recoverable from the source.]
Following the proof of Theorem 6.8, we first zero-out the circled entries,
starting from the entry on the third row. Here, the row index is r = 3, the
column index is s = 7, the eigenvalues are arr = 2, ass = 3, and the entry to
be zeroed-out is x = 2. Thus, α = 2/(2 − 3) = −2. We use the combination similarity:
M1 = E−2[3, 7] A E2[3, 7].
That is, in A, we replace row(3) with row(3) − 2·row(7) and then replace col(7) with col(7) + 2·col(3) to obtain M1.
[The display of M1 is not recoverable from the source.]
Notice that the similarity transformation made a previously 0 entry nonzero. It brought in a new nonzero entry, −2, in the (3, 8)th position. We will zero it out before proceeding to the originally encircled ones. The suitable combination similarity is
M2 = E2[3, 8] M1 E−2[3, 8],
which replaces row(3) with row(3) + 2·row(8) and then replaces col(8) with col(8) − 2·col(3). Verify that it zeroes-out that entry but introduces 2 at the (3, 9)th position. Once more, we use a combination similarity. This time we use
M3 = E−2[3, 9] M2 E2[3, 9],
replacing row(3) with row(3) − 2·row(9) and then replacing col(9) with col(9) + 2·col(3).
[The display of M3 is not recoverable from the source.]
Similar to the above, we use the combination similarities to reduce M3 to M4, where
M4 = E1[2, 9] M3 E−1[2, 9].
To zero-out the encircled 2, we use the combination similarity
M5 = E2[1, 8] M4 E−2[1, 8].
Finally,
M6 = E−2[1, 9] M5 E2[1, 9].
[The display of M6 is not recoverable from the source.]
Now, the matrix M6 is in block diagonal form. We focus on each of the blocks, though we will be working with the whole matrix. We consider the block corresponding to the eigenvalue 2 first. Since this step is inductive, we scan this block from the top left corner. The 2×2 principal sub-matrix of this block is already in Jordan form. The 3×3 principal sub-matrix is also in Jordan form. We see that the principal sub-matrices of sizes 4×4 and 5×5 are also in Jordan form, but the 6×6 sub-matrix, which is the block itself, is not in Jordan form. We wish to bring the sixth column to its proper shape. Recall that our strategy is to zero-out all those entries on the sixth column which are opposite to a 1 on the super-diagonal of this block. There is only one such entry.
The row index of this entry is r = 1, its column index is m = 6, and the
entry itself is br = 1. We use the combination similarity
M7 = E1[2, 6] M6 E−1[2, 6] = diag([2 1 0 0 0 0; 0 2 0 0 0 5; 0 0 2 1 0 0; 0 0 0 2 0 2; 0 0 0 0 2 0; 0 0 0 0 0 2], [3 1 1; 0 3 1; 0 0 3]).
Next, among the nonzero entries 5 and 2 at the positions (2, 6) and (4, 6), we
wish to zero-out the 5 and keep 2 as the row index of 2 is higher. First, we
use a dilation similarity to make this entry 1 as in the following:
M8 = E1/2 [4] M7 E2 [4].
It replaces row(4) with 1/2 times itself, and then replaces col(4) with 2 times itself, thus making the (4, 6)th entry 1. Next, we zero-out the 5 at the (2, 6)th position by using two combination similarities. Here, cp = 5, r = 2, s = 4; thus
M9 = E−5[1, 3] E−5[2, 4] M8 E5[2, 4] E5[1, 3] = diag([2 1 0 0 0 0; 0 2 0 0 0 0; 0 0 2 1 0 0; 0 0 0 2 0 1; 0 0 0 0 2 0; 0 0 0 0 0 2], [3 1 1; 0 3 1; 0 0 3]).
Notice that M9 has been obtained from M8 by replacing row(2) with row(2) − 5·row(4), col(4) with col(4) + 5·col(2), row(1) with row(1) − 5·row(3), and then col(3) with col(3) + 5·col(1).
Next, we move this 1 at the (4, 6)th position to the (4, 5)th position by similarity. Here, s = 4, m = 6. Thus the sequence of permutation similarities boils down to only one, i.e., exchanging row(5) with row(6) and then exchanging col(5) with col(6). Observe that we would have to use more permutation similarities if the difference between m and s were more than 2. We thus obtain
M10 = E[5, 6] M9 E[5, 6] = diag([2 1 0 0 0 0; 0 2 0 0 0 0; 0 0 2 1 0 0; 0 0 0 2 1 0; 0 0 0 0 2 0; 0 0 0 0 0 2], [3 1 1; 0 3 1; 0 0 3]).
It remains to bring the 3×3 block to Jordan form by zeroing-out its (7, 9)th entry. The suitable combination similarity transformation is
M11 = E1[8, 9] M10 E−1[8, 9] = diag([2 1 0 0 0 0; 0 2 0 0 0 0; 0 0 2 1 0 0; 0 0 0 2 1 0; 0 0 0 0 2 0; 0 0 0 0 0 2], [3 1 0; 0 3 1; 0 0 3]).
Now, M11 is in Jordan form: M11 = diag(J2(2), J3(2), J1(2), J3(3)).
Example 6.6
We consider the same matrix A of Example 6.5. Here, we compute the number mk of Jordan blocks of size k corresponding to each eigenvalue. For this purpose, we require the ranks of the matrices (A − λI)^k for successive k and for each eigenvalue λ of A. We see that A has the two eigenvalues 2 and 3. You may compute the successive powers of (A − 2I) and of (A − 3I) and their ranks using packages such as Matlab or Scilab. We find that for the eigenvalue 2,
rank((A − 2I)⁰) = rank(I) = 9,  rank(A − 2I) = 6,  rank((A − 2I)²) = 4,  rank((A − 2I)^{3+k}) = 3 for k = 0, 1, 2, ...;
and for the eigenvalue 3,
rank((A − 3I)⁰) = 9,  rank(A − 3I) = 8,  rank((A − 3I)²) = 7,  rank((A − 3I)^{3+k}) = 6 for k = 0, 1, 2, ....
The formula of Theorem 6.8 now gives
m1(2) = 9 − 2·6 + 4 = 1,  m2(2) = 6 − 2·4 + 3 = 1,  m3(2) = 4 − 2·3 + 3 = 1,
m1(3) = 9 − 2·8 + 7 = 0,  m2(3) = 8 − 2·7 + 6 = 0,  m3(3) = 7 − 2·6 + 6 = 1.
Thus the Jordan form of A consists of the blocks J1(2), J2(2), J3(2) and J3(3), agreeing with the matrix M11 of Example 6.5.
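Since rank((A − λI)^k) is invariant under similarity, the rank data above can be cross-checked on a block diagonal matrix built from the blocks J2(2), J3(2), J1(2), J3(3); a numpy sketch:

```python
import numpy as np

def jordan_block(lam, k):
    """k x k Jordan block with diagonal entries lam."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

def block_diag(*blocks):
    """Assemble square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    M, i = np.zeros((n, n)), 0
    for b in blocks:
        k = b.shape[0]
        M[i:i + k, i:i + k] = b
        i += k
    return M

J = block_diag(jordan_block(2, 2), jordan_block(2, 3),
               jordan_block(2, 1), jordan_block(3, 3))

ranks2 = [np.linalg.matrix_rank(np.linalg.matrix_power(J - 2 * np.eye(9), k))
          for k in range(4)]
ranks3 = [np.linalg.matrix_rank(np.linalg.matrix_power(J - 3 * np.eye(9), k))
          for k in range(4)]
assert ranks2 == [9, 6, 4, 3]
assert ranks3 == [9, 8, 7, 6]
```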
Suppose that a matrix A ∈ C^{n×n} has a Jordan form J = P^{-1}AP, in which the first Jordan block is of size k with diagonal entries λ. Suppose P = [v1 ··· vn]. Then AP = PJ implies that
A v1 = λv1,  A v2 = v1 + λv2,  ...,  A vk = v_{k−1} + λvk.
If the next Jordan block in J has diagonal entries μ (which may or may not be equal to λ), then we have A v_{k+1} = μv_{k+1}, A v_{k+2} = v_{k+1} + μv_{k+2}, ..., and so on.
The list of vectors v1, ..., vk above is called a Jordan string that starts with v1 and ends with vk. The number k is called the length of the Jordan string. In such a Jordan string, we see that
v1 ∈ N(A − λI),  v2 ∈ N((A − λI)²),  ...,  vk ∈ N((A − λI)^k).
Any vector in N((A − λI)^j), for some j, is called a generalized eigenvector corresponding to the eigenvalue λ of A.
The columns of P are all generalized eigenvectors of A corresponding to the eigenvalues of A. Moreover, the columns of P can be constructed this way, by looking at the subspaces N((A − λI)^j). One may start with linearly independent vectors satisfying (A − λI)v = 0. Corresponding to each solution v1 of this linear system, one determines linearly independent vectors satisfying (A − λI)v = v1. Next, corresponding to each such solution v2, one solves (A − λI)v = v2, and so on. The process stops when n linearly independent vectors have been obtained this way. These vectors form the matrix P.
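The string construction can be imitated numerically on a small example (chosen here for illustration; floating point is adequate only for such exact, small matrices):

```python
import numpy as np

# A = J2(4): eigenvalue 4, one eigenvector, one Jordan string of length 2.
A = np.array([[4.0, 1.0],
              [0.0, 4.0]])
lam = 4.0
B = A - lam * np.eye(2)

v1 = np.array([1.0, 0.0])          # eigenvector: B v1 = 0
assert np.allclose(B @ v1, 0)

# Next vector in the string: solve B v2 = v1 (least squares picks one solution).
v2, *_ = np.linalg.lstsq(B, v1, rcond=None)
assert np.allclose(A @ v2, v1 + lam * v2)   # A v2 = v1 + lam v2
```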
In the first stage, if the geometric multiplicity of the eigenvalue λ is γ, then there are γ linearly independent eigenvectors associated with λ. These are the possible candidates for v1. Thus there are γ Jordan strings associated with λ. These strings give rise to the Jordan blocks with diagonal entries λ. Thus, in J there are exactly γ Jordan blocks with diagonal entries λ.
You can prove this fact from J directly, by first showing that the geometric multiplicity of the eigenvalue λ of A is the same as the geometric multiplicity of the eigenvalue λ of J.
The uniqueness of a Jordan form can be made exact by first ordering the eigenvalues of A and then arranging the blocks corresponding to each eigenvalue (which now
appear together on the diagonal) in some order, say in ascending order of their size.
In doing so, the Jordan form of any matrix becomes unique. Such a Jordan form is
called the Jordan canonical form of a matrix. It then follows that if two matrices
are similar, then they have the same Jordan canonical form. Moreover, uniqueness
also implies that two dissimilar matrices will have different Jordan canonical forms.
Therefore, Jordan form characterizes similarity of matrices.
As an application of the Jordan form, we will show that each matrix is similar to its transpose. Suppose J = P^{-1}AP. Now, Jᵗ = Pᵗ Aᵗ (P^{-1})ᵗ = Pᵗ Aᵗ (Pᵗ)^{-1}. That is, Aᵗ is similar to Jᵗ. Thus it is enough to show that Jᵗ is similar to J. First, let us see it for a
single Jordan block. So, let
J̃ = [λ 1; λ 1; ⋱ ⋱; λ 1; λ],  Q = [0 ··· 0 1; 0 ··· 1 0; ⋰; 1 0 ··· 0],
where the entries on the anti-diagonal of Q are all 1 and all other entries are 0. We see that Q² = I. Thus Q^{-1} = Q. Further,
Q^{-1} J̃ Q = Q J̃ Q = (J̃)ᵗ.
Therefore, each Jordan block is similar to its transpose. Now, construct a matrix R
by putting such a matrix as its blocks matching the orders of each Jordan block in J.
Then it follows that R^{-1} J R = Jᵗ.
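The reversal trick is easy to verify for a single Jordan block; a numpy sketch:

```python
import numpy as np

k, lam = 4, 7.0
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)   # a single Jordan block
Q = np.fliplr(np.eye(k))                            # anti-diagonal of 1s

assert np.allclose(Q @ Q, np.eye(k))    # Q^2 = I, so Q^{-1} = Q
assert np.allclose(Q @ J @ Q, J.T)      # each Jordan block ~ its transpose
```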
It also follows from the Jordan form that one can always choose m linearly independent generalized eigenvectors corresponding to the eigenvalue λ, where m is the algebraic multiplicity of λ. Further, it is guaranteed that
if the linear system (A − λI)^k x = 0 has r < m linearly independent solutions, then (A − λI)^{k+1} x = 0 has at least r + 1 linearly independent solutions.
This result is often more useful in computing the exponential of a matrix rather than
using explicitly the Jordan form, which is comparatively difficult to compute.
1. Find the Jordan form of each of the following matrices:
(a) [0 0 0; 1 0 0; 2 1 0]  (b) [3×3 matrix; entries not recoverable from the source]
2. Let A be a 7×7 matrix with characteristic polynomial (t − 2)⁴(3 − t)³. In the Jordan form of A, the largest block for each of the eigenvalues is of order 2. Show that there are only two possible Jordan forms for A; and determine those Jordan forms.
3. Let A be a 5×5 matrix whose first two rows are [0, 1, 1, 0, 1] and [0, 0, 1, 1, 1]; all other rows are zero rows. What is the Jordan form of A?
4. Determine the matrix P ∈ C^{3×3} such that P^{-1}AP is in Jordan form, where A is the matrix in Exercise 1(b).
5. Let A ∈ C^{n×n} have an eigenvalue λ. Suppose the numbers mk for this eigenvalue are known for each k ∈ N. Show that for each j, both rank((A − λI)^j) and null((A − λI)^j) are uniquely determined.
6. Let A, B ∈ C^{n×n}. Show that A and B are similar iff they have the same eigenvalues and, for each eigenvalue λ and each k ∈ N, rank((A − λI)^k) = rank((B − λI)^k).
7. Let A, B ∈ C^{n×n}. Show that A and B are similar iff they have the same eigenvalues and, for each eigenvalue λ and each k ∈ N, null((A − λI)^k) = null((B − λI)^k).
8. Let λ be an eigenvalue of a matrix A ∈ C^{n×n} having algebraic multiplicity m. Then prove that null((A − λI)^m) = m.
9. Let J be a Jordan form of a matrix A ∈ C^{n×n}. Let λ be an eigenvalue of A. Show that the geometric multiplicity of λ as an eigenvalue of A is the same as the geometric multiplicity of λ as an eigenvalue of J.
10. Let J be a matrix in Jordan form. Let λ be an eigenvalue of J. Show that the geometric multiplicity of λ is equal to the number of Jordan blocks in J having λ as the diagonal entries.
11. Conclude from the previous two exercises that if λ is an eigenvalue of a matrix A and J is the Jordan form of A, then the number of Jordan blocks with diagonal entry λ in J is the geometric multiplicity of λ.
6.4 Singular value decomposition
Given an m×n matrix A with complex entries, there are two hermitian matrices that can be constructed naturally from it, namely, A*A and AA*. We wish to study the eigenvalues and eigenvectors of these matrices and their relations to certain parameters associated with A. We will see that these concerns yield a factorization of A.
The hermitian matrix A*A ∈ C^{n×n} has only real eigenvalues. If λ ∈ R is such an eigenvalue with an associated eigenvector v ∈ C^{n×1}, then A*Av = λv implies that
λ‖v‖² = v*(λv) = v*A*Av = (Av)*(Av) = ‖Av‖².
Since ‖v‖ > 0, we see that λ ≥ 0. The eigenvalues of A*A can thus be arranged in a decreasing list
λ1 ≥ λ2 ≥ ··· ≥ λr > λ_{r+1} = ··· = λn = 0
for some r, 0 ≤ r ≤ n. Notice that all of λ1, ..., λr are positive and the rest are all equal to 0. Conventionally, each λi is written as si² for si ∈ R; such an si is determined only up to sign, and we take si ≥ 0. We first give a name to these square roots of the eigenvalues of A*A and then relate the number r of positive eigenvalues of A*A with the rank of the matrix A. Of course, we could have started with the eigenvalues of AA* instead of A*A.
Let A ∈ C^{m×n}. Let s1² ≥ ··· ≥ sn² be the n eigenvalues of A*A. The non-negative real numbers s1, ..., sn are called the singular values of A.
Theorem 6.9
Let A ∈ C^{m×n}. Then rank(A) = rank(A*A) = rank(AA*) = the number of positive singular values of A.
Proof As linear transformations, A : C^{n×1} → C^{m×1} and A* : C^{m×1} → C^{n×1}, so that A*A maps C^{n×1} into C^{n×1} with R(A*A) = A*(R(A)). Since applying A* to a subspace cannot increase its dimension,
rank(A*A) = dim(R(A*A)) ≤ dim(R(A)) = rank(A).
For the other inequality, let v ∈ N(A*A). That is, A*Av = 0. Then v*A*Av = 0 implies that ‖Av‖² = 0, giving Av = 0. That is, N(A*A) ⊆ N(A). It implies that null(A*A) ≤ null(A). Notice that A*A ∈ C^{n×n}. By the rank-nullity theorem,
rank(A) = n − null(A) ≤ n − null(A*A) = rank(A*A).
Combining both the inequalities, we obtain rank(A*A) = rank(A).
Now, consider A* instead of A. What we just proved implies that
rank(AA*) = rank((A*)*A*) = rank(A*).
But rank(Aᵗ) = rank(A); also, rank(Ā) = rank(A). Thus, rank(A*) = rank(A). Therefore,
rank(AA*) = rank(A).
Notice that A*A is hermitian. So, it is unitarily similar to the diagonal matrix
D = diag(s1², ..., sr², s²_{r+1}, ..., sn²)
with s1² ≥ ··· ≥ sr² ≥ s²_{r+1} ≥ ··· ≥ sn². Since rank(A*A) = rank(D) = r, we see that s1 ≥ ··· ≥ sr > 0 and s_{r+1} = ··· = sn = 0. That is, A has exactly r positive singular values.
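Theorem 6.9 suggests the practical way of computing rank: count the singular values that stand clearly above round-off. A sketch (the matrix is a random rank-2 product, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A random 5x3 matrix of rank 2: product of 5x2 and 2x3 factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))

s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
positive = int(np.sum(s > 1e-10 * s[0]))    # those well above round-off

assert np.linalg.matrix_rank(A) == 2
assert positive == 2
assert np.linalg.matrix_rank(A.conj().T @ A) == 2   # rank(A*A) = rank(A)
```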
Not only is the number of positive eigenvalues of A*A and of AA* the same, but something more can be said about these eigenvalues.
Suppose λ > 0 is an eigenvalue of A*A with an associated eigenvector v. Then AA*(Av) = A(A*Av) = λ(Av), and Av ≠ 0 since v*A*Av = λ‖v‖² > 0; so λ is also an eigenvalue of AA*. This observation leads to the following factorization.
Theorem 6.10
(Singular value decomposition) Let A ∈ C^{m×n} be of rank r with positive singular values s1 ≥ ··· ≥ sr, and let S := diag(s1, ..., sr). Then there exist unitary matrices P ∈ C^{m×m} and Q ∈ C^{n×n} such that
A = P Σ Q*,  Σ := [S 0; 0 0] ∈ C^{m×n}.
Further, the columns of P are eigenvectors of AA* that form an orthonormal basis of C^{m×1}, and the columns of Q are eigenvectors of A*A that form an orthonormal basis of C^{n×1}.
Proof The matrix A has singular values s1 ≥ ··· ≥ sr > 0, 0, ..., 0. Thus, the eigenvalues of AA* and of A*A are s1² ≥ ··· ≥ sr² > 0, 0, ..., 0. In the case of AA* there are m − r zeros, and in the case of A*A there are n − r zeros.
The matrix A*A is hermitian. There exists an orthonormal basis {v1, ..., vn} for C^{n×1} such that
A*A vi = si² vi for i = 1, ..., r;  A*A vj = 0 for j = r + 1, ..., n.
For i = 1, ..., r, set
ui := (1/si) Avi.
Then,
AA* ui = (1/si) A A*A vi = (1/si) si² Avi = si² ui.
That is, u1, ..., ur are eigenvectors of AA* associated with the eigenvalues s1², ..., sr², respectively. Further, since {v1, ..., vr} is an orthonormal set, for i, j = 1, ..., r, we have
uj* ui = (1/(si sj)) (Avj)*(Avi) = (1/(si sj)) vj*(A*Avi) = (si²/(si sj)) vj* vi = (si/sj) vj* vi.
Hence uj*ui = 1 when i = j and 0 otherwise; that is, {u1, ..., ur} is an orthonormal set in C^{m×1}. Extend it to an orthonormal basis {u1, ..., um} of C^{m×1}. Notice that
ui* A vj = si if 1 ≤ i = j ≤ r;  ui* A vj = 0 otherwise.
Next, take P as the matrix whose ith column is ui, and Q as the matrix whose jth column is vj. We obtain
P*AQ = [S 0; 0 0],  S := diag(s1, ..., sr).
Since P ∈ C^{m×m} has orthonormal columns, it is unitary. Similarly, Q is unitary. That is, P*P = PP* = I and Q*Q = QQ* = I. Multiplying by P on the left and by Q* on the right, we obtain the required factorization of A.
In the singular value decomposition of a matrix A, the columns ui of P are eigenvectors of AA*; these are called the left singular vectors of A. Analogously, the columns vj of Q are eigenvectors of A*A, and are called the right singular vectors of A.
Observe that the columns r + 1 onwards in the matrices P and Q produce the zero blocks in the product P*AQ. Thus, taking
P̃ = [u1 ··· ur],  Q̃ = [v1 ··· vr],
we see that a simplified decomposition of A can also be given. It is as follows:
A = P̃ S Q̃*.
Such a decomposition is called the tight SVD of the matrix A.
In the tight SVD, A ∈ C^{m×n}, P̃ ∈ C^{m×r}, S ∈ C^{r×r} and Q̃* ∈ C^{r×n} are matrices, each of rank r. Write B = P̃S and C = SQ̃* to obtain
A = B Q̃* = P̃ C,
where B ∈ C^{m×r} and C ∈ C^{r×n} have rank r. It shows that each m×n matrix of rank r can be written as a product of an m×r matrix of rank r and an r×n matrix, which is also of rank r. Recall that this factorization is named the full rank factorization of a matrix.
Example 6.7
To obtain an SVD, the tight SVD, and a full rank factorization of A = [2 −1; −2 1; 4 −2], we consider
A*A = [24 −12; −12 6].
Its eigenvalues are 30 and 0, with associated orthonormal eigenvectors
v1 = (1/√5)(2, −1)ᵗ,  v2 = (1/√5)(1, 2)ᵗ.
Thus the singular values of A are s1 = √30 and s2 = 0, and r = rank(A) = 1. Take
u1 = (1/√30) Av1 = (1/√30)(1/√5)(5, −5, 10)ᵗ = (1/√6)(1, −1, 2)ᵗ.
Extending {u1} to an orthonormal basis of R³, we may take
u2 := (1/√2)(1, 1, 0)ᵗ,  u3 := (1/√3)(1, −1, −1)ᵗ.
Then we have the SVD
A = [2 −1; −2 1; 4 −2] = [1/√6 1/√2 1/√3; −1/√6 1/√2 −1/√3; 2/√6 0 −1/√3] [√30 0; 0 0; 0 0] [2/√5 −1/√5; 1/√5 2/√5].
For the tight SVD, P̃ has as its r columns the first r columns of P, Q̃ has as its r columns the first r columns of Q, and S is the usual r×r block consisting of the positive singular values of A as its diagonal entries. With r = rank(A) = 1, we thus have the tight SVD
A = [2 −1; −2 1; 4 −2] = (1/√6)(1, −1, 2)ᵗ · [√30] · (1/√5)(2, −1).
In the tight SVD, using associativity of the matrix product, we get the full rank factorizations
A = [√5; −√5; 2√5] [2/√5 −1/√5] = [1/√6; −1/√6; 2/√6] [2√6 −√6].
You should check that the columns of P are eigenvectors of AA*.
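numpy computes this factorization directly; note that numpy.linalg.svd returns P, the singular values, and Q* (not Q), and that the signs of singular vectors may differ from the hand computation:

```python
import numpy as np

A = np.array([[ 2.0, -1.0],
              [-2.0,  1.0],
              [ 4.0, -2.0]])

U, s, Vh = np.linalg.svd(A)              # full SVD: U is 3x3, Vh is 2x2
assert np.allclose(s, [np.sqrt(30), 0])  # singular values sqrt(30), 0

# Tight (rank-r) SVD and full rank factorization A = B C, with r = 1:
r = 1
B = U[:, :r] * s[:r]                     # 3x1, rank 1
C = Vh[:r, :]                            # 1x2, rank 1
assert np.allclose(B @ C, A)
```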
Like the tight SVD, another simplification can be made in the SVD. Let A ∈ C^{m×n} with m ≤ n. Suppose A = PΣQ* is an SVD of A. Let the ith row of Q* be denoted by vi* ∈ C^{1×n}. Write
P1 := P,  Q1 := [v1*; ⋮; vm*] ∈ C^{m×n},  Σ1 := diag(s1, ..., sr, 0, ..., 0) ∈ C^{m×m}.
Notice that P1 is unitary and the m rows of Q1 are orthonormal. In block form, we have
Q* = [Q1; Q2],  Σ = [Σ1 0],
where Q2 = [v_{m+1}*; ⋮; vn*]. Then
A = PΣQ* = P1 [Σ1 0] [Q1; Q2] = P1 Σ1 Q1.
Similarly, when m ≥ n, we may curtail P accordingly. That is, suppose the ith column of P is denoted by ui ∈ C^{m×1}. Write
P2 := [u1 ··· un] ∈ C^{m×n},  Σ2 := diag(s1, ..., sr, 0, ..., 0) ∈ C^{n×n},  Q2 := Q.
Then A = P2 Σ2 Q2*.
1.-3. [Exercises asking for singular values and SVDs of given matrices; the statements and entries are not recoverable from the source.]
4. Show that the matrices [1 0; 2 1] and [1 1; 0 1] are similar but they have different singular values.
5. Show that a matrix A ∈ C^{m×n} is of rank 1 iff there exist nonzero vectors u ∈ C^{m×1} and v ∈ C^{n×1} such that A = uv*.
6. Let A ∈ C^{m×n} be a matrix of rank r with positive singular values s1, ..., sr. Suppose A = P [S 0; 0 0] Q* is an SVD of A, where S = diag(s1, ..., sr). Define
A† := Q [S^{-1} 0; 0 0] P*.
Prove that A† satisfies the following properties:
(AA†)* = AA†,  (A†A)* = A†A,  AA†A = A,  A†AA† = A†.
A† is called the generalized inverse of A.
7. Let A ∈ F^{m×n}. Prove that there exists a unique matrix A† ∈ F^{n×m} satisfying the four identities mentioned in the previous exercise.
6.5 Polar decomposition
Square matrices behave like complex numbers in many ways. One example is a powerful representation of square matrices using a stretch and a rotation. This mimics the polar representation of a complex number as z = re^{iθ}. In this representation, r is a non-negative real number, and thus it represents the stretch; and e^{iθ} is a rotation. Similarly, a square matrix can be written as a product of a positive semidefinite matrix and a unitary matrix. The positive semidefinite matrix is a stretch and the unitary matrix is a rotation. We slightly generalize the representation to any m×n matrix.
A hermitian matrix P ∈ F^{n×n} is called positive semidefinite iff x*Px ≥ 0 for each x ∈ F^{n×1}.
Recall that a matrix U has orthonormal rows iff UU* = I; it has orthonormal columns iff U*U = I; and it is unitary iff its rows are orthonormal and its columns are orthonormal, iff UU* = U*U = I.
Theorem 6.11
(Polar decomposition) Let A ∈ C^{m×n}. Then there exist positive semidefinite matrices P ∈ C^{m×m}, Q ∈ C^{n×n}, and a matrix U ∈ C^{m×n} such that
A = PU = UQ,
where P² = AA*, Q² = A*A, and U satisfies the following:
1. If m = n, then the n×n matrix U is unitary.
2. If m < n, then the rows of U are orthonormal.
3. If m > n, then the columns of U are orthonormal.
Proof (1) Let m = n. Let A = BDE* be an SVD of A, where B, E ∈ C^{n×n} are unitary and D = diag(s1, ..., sr, 0, ..., 0). Set U := BE*, P := BDB*, and Q := EDE*. Then
PU = BDB*BE* = BDE* = A,  UQ = BE*EDE* = BDE* = A,
and
UU* = BE*EB* = BB* = I.
Thus, U is unitary.
Clearly, both P and Q are hermitian. For the other properties of P and Q, let x ∈ C^{n×1}. Then
x*Px = x*BDB*x = (B*x)* D (B*x).
Write B*x := (a1, ..., an)ᵗ ∈ C^{n×1}. Then
x*Px = (ā1, ..., ān) D (a1, ..., an)ᵗ = |a1|²s1 + ··· + |ar|²sr ≥ 0.
Therefore, P is positive semidefinite. Also,
P² = BDB*BDB* = BDDB* = BDE*EDB* = (BDE*)(BDE*)* = AA*.
Similarly, it follows that Q is positive semidefinite and Q² = A*A.
(2) Let m < n. Write the n×n matrix E in block form as
E = [E1 E2],
where E1 ∈ C^{n×m} comprises the first m columns of E and E2 ∈ C^{n×(n−m)} comprises the rest of the columns. Since E is unitary, the columns of E1 are orthonormal; that is, E1*E1 = I. Notice that E1E1* need not be I. Further, write D in block form with D1 ∈ C^{m×m} as the matrix obtained from D by retaining the first m columns and deleting the next n − m columns. That is,
D = [D1 0],  D1 = diag(s1, ..., sr, 0, ..., 0).
Consequently,
DE* = D1 E1*,  ED* = (DE*)* = (D1E1*)* = E1D1.
Set U := BE1* and Q := E1DE*. Now, U ∈ C^{m×n}, Q ∈ C^{n×n}; and
A = BDE* = BE1*E1DE* = UQ.
We find that
UU* = (BE1*)(BE1*)* = BE1*E1B* = BB* = I,
so the rows of U are orthonormal. Similarly, with P := BD1B*, we have PU = BD1B*BE1* = BD1E1* = BDE* = A. That is,
A = PU = UQ.
(3) Let m > n. Then A* ∈ C^{n×m}, where n < m. By (2), there exist a matrix U′ with orthonormal rows and positive semidefinite matrices P′, Q′ such that A* = P′U′ = U′Q′. Taking adjoints, and writing P := Q′, Q := P′ and U := (U′)*, we obtain
A = UQ = PU,
where U ∈ C^{m×n} has orthonormal columns, and P ∈ C^{m×m}, Q ∈ C^{n×n} satisfy P² = AA*, Q² = A*A.
Notice that the proof of Theorem 6.11(2) is also valid when m = n; thus the proof of (1) is redundant, as it would follow from (2) and (3). Also, (2) can be proved more easily by using the thin SVD. Further, in the proof of (3) we have not constructed the matrices P and Q explicitly. To unfold the proof, we start with A = BDE*, which yields A* = ED*B*. It is in the form
A* = B̃ D̃ Ẽ*,  with  B̃ = E,  D̃ = D*,  Ẽ = B.
Next, we follow the construction in (2) with m and n interchanged. It asks us to take Ẽ1 as the first n columns of Ẽ = B, and D̃1 as the first n columns of D̃. Then A* = P̃Ũ = ŨQ̃ with
Ũ = B̃ Ẽ1*,  P̃ = B̃ D̃1 B̃*,  Q̃ = Ẽ1 D̃ Ẽ*.
We also write B1 for the matrix formed by taking the first n columns of B, and D1 for the matrix formed by taking the first n rows of D; then Ẽ1 = B1 and D̃1 = D1. Now, taking adjoints, we have A = PU = UQ with
U = Ũ* = Ẽ1 B̃* = B1 E*,
P = Q̃* = (Ẽ1 D̃ Ẽ*)* = B D B1*,
Q = P̃* = B̃ D̃1 B̃* = E D1 E*.
With these U, P and Q, you can give a direct proof of (3) in Theorem 6.11.
The construction of the polar decomposition from an SVD may be summarized as follows:
If A ∈ C^{m×n} has an SVD A = BDE*, then A = PU = UQ, where
m ≤ n:  U = BE1*,  P = BD1B*,  Q = E1DE*;
m ≥ n:  U = B1E*,  P = BDB1*,  Q = ED1E*.
Here E1 and D1 consist of the first m columns of E and of D when m ≤ n, while B1 consists of the first n columns of B and D1 of the first n rows of D when m ≥ n.
Example 6.8
Consider the matrix A = [2 −1; −2 1; 4 −2] of Example 6.7. We had obtained its SVD as A = BDE*, where
B = [1/√6 1/√2 1/√3; −1/√6 1/√2 −1/√3; 2/√6 0 −1/√3],  D = [√30 0; 0 0; 0 0],  E = [2/√5 1/√5; −1/√5 2/√5].
We follow the notation used in the proof of Theorem 6.11. Here, A ∈ C^{3×2}. Thus Theorem 6.11(3) is applicable; see the discussion following the proof of the theorem. We construct the matrix B1 by taking the first two columns of B, and D1 by taking the first two rows of D, as in the following:
1/ 6
1/ 2
30 0
1
1
/ 6
/ 2 , D1 =
B1 =
.
0
0
2/ 6
0
Then
1
U = B1 E = 16 1
2
P = BDB1 = 5 1
2
2 + 3 1 + 23
1
1
2 1
= 130 2 + 3
3
1 + 2 3 ,
1 2
5
0
4
2
1 1 2
0
1
1
1
2
0
1 2 ,
= 56 1
3
3 0
6
0
2 2 4
$$Q = ED_1E^* = \sqrt{6}\begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}E^* = \sqrt{\frac{6}{5}}\begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix}.$$
To verify, since $\sqrt{5/6}\,\bigl(1/\sqrt{30}\bigr) = 1/6$,
$$PU = \frac{1}{6}\begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 2 \\ 2 & 2 & 4 \end{bmatrix}\begin{bmatrix} 2-\sqrt{3} & 1+2\sqrt{3} \\ 2+\sqrt{3} & 1-2\sqrt{3} \\ 4 & 2 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 12 & 6 \\ 12 & 6 \\ 24 & 12 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \\ 4 & 2 \end{bmatrix} = A.$$
Similarly, since $\bigl(1/\sqrt{30}\bigr)\sqrt{6/5} = 1/5$,
$$UQ = \frac{1}{5}\begin{bmatrix} 2-\sqrt{3} & 1+2\sqrt{3} \\ 2+\sqrt{3} & 1-2\sqrt{3} \\ 4 & 2 \end{bmatrix}\begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 10 & 5 \\ 10 & 5 \\ 20 & 10 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \\ 4 & 2 \end{bmatrix} = A.$$
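As a numerical cross-check of this example, the NumPy sketch below rebuilds $B$, $D$, $E$ as displayed above and verifies both polar factorizations. (The sign pattern in the second and third columns of $B$ is the one shown above; any orthonormal completion of the first column would give the same $P$ and $Q$, since $D$ annihilates the remaining columns.)

```python
import numpy as np

# Worked example: A is 3 x 2, so the m >= n branch applies with
# B1 = first two columns of B and D1 = first two rows of D.
A = np.array([[2.0, 1.0], [2.0, 1.0], [4.0, 2.0]])

s2, s3, s5, s6, s30 = map(np.sqrt, (2.0, 3.0, 5.0, 6.0, 30.0))
B = np.array([[1/s6,  1/s2,  1/s3],
              [1/s6, -1/s2,  1/s3],
              [2/s6,  0.0,  -1/s3]])
D = np.array([[s30, 0.0], [0.0, 0.0], [0.0, 0.0]])
E = np.array([[2/s5, -1/s5], [1/s5, 2/s5]])

assert np.allclose(B @ D @ E.T, A)   # the SVD from Example 6.7

B1, D1 = B[:, :2], D[:2, :]
U = B1 @ E.T                         # columns are orthonormal
P = B @ D @ B1.T
Q = E @ D1 @ E.T

assert np.allclose(P @ U, A) and np.allclose(U @ Q, A)
assert np.allclose(P, np.sqrt(5/6) * np.array([[1, 1, 2], [1, 1, 2], [2, 2, 4]]))
assert np.allclose(Q, np.sqrt(6/5) * np.array([[4, 2], [2, 1]]))
```

The last two assertions match the closed forms of $P$ and $Q$ computed above.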
Short Bibliography
[1] S. Axler, Linear Algebra Done Right, Springer Int. Ed., Indian Reprint, 2013.
[2] R.A. Brualdi, The Jordan canonical form: an old proof, The American Mathematical Monthly, 94:3 (1987), 257-267.
[3] S. D. Conte, C. de Boor, Elementary Numerical Analysis: An algorithmic approach, McGraw-Hill Book Company, Int. Student Ed., 1981.
[4] J. W. Demmel, Numerical Linear Algebra, SIAM Pub., Philadelphia, 1996.
[5] A. F. Filippov, A short proof of the theorem on reduction of a matrix to Jordan form, Vestnik Moskov. Univ. Ser. I Mat. Meh. 26:2 (1971), 18-19. MR 43 No. 4839.
[6] F. R. Gantmacher, Matrix Theory, Vol. 1-2, American Math. Soc., 2000.
[7] G. H. Golub, C. F. Van Loan, Matrix Computations, Hindustan Book Agency,
Texts and Readings in Math. - 43, New Delhi, 2007.
[8] P. R. Halmos, Finite Dimensional Vector Spaces, Springer Int. Ed., Indian
Reprint, 2013.
[9] J. Hefferon, Linear Algebra, http://joshua.smcvt.edu/linearalgebra, 2014.
[10] R. Horn, C. Johnson, Matrix Analysis, Cambridge University Press, New York,
1985.
[11] K. Jänich, Linear Algebra, Undergraduate Texts in Math., Springer, 1994.
[12] S. Kumaresan, Linear Algebra: A geometric approach, PHI, 2000.
[13] S. Lang, Introduction to Linear Algebra, 2nd Ed., Springer-Verlag, 1986.
[14] D. Lewis, Matrix Theory, World Scientific, 1991.
[15] C. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM Pub., 2000.
[16] R. Piziak, P.L. Odell, Matrix Theory: From Generalized Inverses to Jordan
Form, Chapman and Hall / CRC, 2007.
[17] G. Strang, Linear Algebra and its Applications, 4th Ed., Cengage Learning,
2006.
Index
adjoint of a matrix, 12
adjugate, 21
algebraic multiplicity, 95
angle between vectors, 66
basic variable, 39
basis, 49
best approximation, 71
Cayley-Hamilton, 79
change of basis matrix, 59
characteristic polynomial, 78
co-factor, 21
column rank, 35
column vector, 3
combination similarity, 97
complex conjugate, 12
complex eigenvalue, 78
conjugate transpose, 12
consistent system, 38
coordinate vector, 57
Determinant, 20
diagonalizable, 91
diagonalized by, 91
diagonal entries, 4
diagonal matrix, 4
diagonal of a matrix, 4
dilation similarity, 97
dimension, 50
eigenvalue, 77
eigenvector, 77
elementary matrix, 13
elementary row operation, 14
equal matrices, 4
equivalent matrices, 61
free variable, 39
full rank factorization, 62, 114
Gaussian elimination, 42
geometric multiplicity, 95
Gram-Schmidt orthogonalization, 68
Gram matrix, 72
Homogeneous system, 37
identity matrix, 5
inner product, 65
Jordan block, 98
Jordan form, 98
least squares, 74
linearly dependent, 28
linearly independent, 28
linear combination, 18, 27
linear map, 54
Linear system, 36
Matrix, 3
    augmented, 23
    entry, 3
    hermitian, 82
    inverse, 9
    invertible, 9
    lower triangular, 5
    multiplication, 7
    multiplication by scalar, 6
    normal, 92
    order, 4
    orthogonal, 82
    real symmetric, 82
    size, 4
    skew hermitian, 82
    skew symmetric, 82
    sum, 6
    symmetric, 82
    trace, 20
    unitary, 82
minor, 21
norm, 66
null space, 55
Spectral theorem, 92
standard basis, 5
standard basis vectors, 5
subspace, 46
super-diagonal, 4
system matrix, 37