
Notes on Linear Algebra

Claudio Bartocci

Contents

1 Prerequisites, assumptions, and notation 2

2 Linear operators 3

3 Upper triangularization 5

4 The Cayley-Hamilton theorem 7

5 Jordan-Chevalley decomposition S 10

6 Hermitian vector spaces and self-adjoint operators 17

7 Unitary operators and polar decomposition 23

8 Euclidean vector spaces and symmetric operators 26

2012 Claudio Bartocci



1 Prerequisites, assumptions, and notation

“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
Alice's Adventures in Wonderland, Chapter 6

The reader is supposed to be reasonably well acquainted with:

- elementary matrix theory;1

- basics of vector spaces.2

Throughout these notes3 we shall adopt the following assumptions and notation:

- K is a subfield of the field of complex numbers C, unless otherwise stated;

- all K-vector spaces are finite-dimensional over K (i.e., dim V = dim_K V < +∞); for a generic vector space V, it is often tacitly understood that its dimension is n (rather obviously, the dimension of V_i will be n_i);

- the set of all vector space homomorphisms A : V → W, endowed with its natural structure of K-vector space, is denoted by Hom_K(V, W); the dual of V is V* = Hom_K(V, K);

- the natural bilinear pairing ⟨ , ⟩ : V* × V → K is defined by letting ⟨φ, v⟩ = φ(v) for any φ ∈ V*, v ∈ V (since this pairing is nondegenerate, it provides a canonical isomorphism of V with its double dual, V ≅ V**; with this identification in mind, we write ⟨v, φ⟩ = ⟨φ, v⟩);

- the transpose of a matrix M is denoted by M^T; vectors are usually indicated by bold italic lowercase letters (u, v, ..., x, ..., e_i, f_j, ..., φ, ψ, ...); homomorphisms by bold uppercase letters (A, B, ..., L, ..., Φ, ...).

Finally, a few special symbols will be used:

 = Definition;
 = Remark (or Example);
 = Historical remark;
S = Supplementary (sometimes more advanced) material;
! = Try to check it!
 = Try to prove it!
1 Main topics: operations of sum and product; determinant; trace; rank; eigenvectors and eigenvalues;
characteristic polynomial.
2 Main topics: homomorphisms; bases; rank-nullity theorem (i.e., for any homomorphism A : V → W between finite-dimensional vector spaces V and W, one has dim Ker A + dim Im A = dim V); inner products and Gram-Schmidt orthogonalization.
3 The author wishes to thank Valeriano Lanza for his helpful remarks.

2 Linear operators

A homomorphism (or, more precisely, an endomorphism) A : V → V will be called a linear operator (or simply an operator). The K-vector space End_K(V) = Hom_K(V, V) can be given a structure of (associative, noncommutative) K-algebra by taking as product the composition of operators. The identity operator will be denoted by I : V → V.

A non-zero vector v is an eigenvector of A if there is λ ∈ K such that Av = λv. We say that λ is the eigenvalue of A belonging to the eigenvector v. It is not hard to check that λ is uniquely determined by v (!). Symmetrically, we say that λ ∈ K is an eigenvalue of A if there exists a non-zero vector v such that Av = λv. Given an eigenvalue λ ∈ K, we can consider the subspace V_λ of V generated by all eigenvectors of A having λ as eigenvalue; we call V_λ the eigenspace of A associated with λ.

Lemma 2.1. Let A : V → V be an operator. Assume that A admits an eigenspace V_λ associated with some λ ∈ K. Then every non-zero element of V_λ is an eigenvector of A having λ as eigenvalue. If there exist eigenspaces V_{λ_1}, V_{λ_2} associated to distinct eigenvalues λ_1 ≠ λ_2, then any two non-zero vectors v_1 ∈ V_{λ_1} and v_2 ∈ V_{λ_2} are linearly independent.

Proof.

It is not difficult to find examples of operators having no eigenvectors at all. For instance, this is the case when V is a real vector space and J : V → V is an operator such that J² = −I (!). Notice that under these hypotheses the dimension of V has to be even. However, when K = C, every operator has at least one eigenvector, as we shall see in Theorem 2.3.

Every operator L : K^n → K^n is uniquely associated with an n × n matrix M(L) whose entries belong to K; actually, given any vector x = (x_1, ..., x_n), the entries M(L)_{ij} of M(L) are defined by the assignment

    Lx = ( Σ_{i=1}^{n} M(L)_{1i} x_i , ... , Σ_{i=1}^{n} M(L)_{ni} x_i ) ,

i.e., Lx is obtained by multiplying the matrix (M(L)_{ij}) by the column vector (x_1, ..., x_n)^T.

A basis E = {e_1, ..., e_n} for V yields an isomorphism Φ_E : V → K^n: the vector v = Σ_{i=1}^{n} v_i e_i is mapped to the n-tuple (v_1, ..., v_n) of its components. So, the operator A : V → V can be represented by the matrix M(A, E) associated with the operator Φ_E ∘ A ∘ Φ_E^{-1} : K^n → K^n. In practice, the entry M(A, E)_{ij} is equal to the component along e_i of the vector A(e_j).

Let A : V → V be an operator. The dual operator (or conjugate operator) of A is the operator A* : V* → V* defined by the assignment ⟨A*φ, v⟩ = ⟨φ, Av⟩ for any φ ∈ V*, v ∈ V. Let E = {e_1, ..., e_n} be a basis for V, and let E* = {e*_1, ..., e*_n} be its dual basis for V* (this basis is determined by the requirement that ⟨e*_i, e_j⟩ = δ_{ij}). If M(A, E) is the matrix representing A with respect to the basis E, then M(A, E)^T is the matrix representing A* with respect to the basis E* (!).
If another basis F = {f_1, ..., f_n} for V is assigned, it is clear that the matrix M(A, F) will be, in general, different from M(A, E). However, the two matrices are always similar (i.e., M(A, F) = N^{-1} M(A, E) N, where N is an invertible matrix), as it is easy to check (!); hence, they have the same trace, the same determinant, and the same characteristic polynomial.4 The following definition is therefore well-posed.

The determinant det A and the trace tr A of the operator A : V → V are defined, respectively, as the determinant and the trace of the matrix M(A, E), for any arbitrary choice of a basis E for the vector space V. The characteristic polynomial of A is the polynomial p_A(X) = det(X I − A) ∈ K[X].

For any matrix M(A, E) representing the operator A : V → V one has

    rank(M(A, E)) = dim Im A   (!).

Theorem 2.2. Let A : V → V be an operator. Then λ ∈ K is an eigenvalue of A if and only if λ is a root of the characteristic polynomial p_A(X).

Proof. For any given λ ∈ K, p_A(λ) = det(λI − A) = 0 if and only if the operator λI − A is not an isomorphism. Now, if λI − A is not an isomorphism, then its kernel contains a non-zero vector v such that (λI − A)v = λv − Av = 0. Hence, λ is an eigenvalue of A belonging to the eigenvector v. Conversely, if Av = λv for some non-zero vector v, then λI − A is not invertible. □

As straightforward implications of Theorem 2.2 we deduce the following two results.

Theorem 2.3. Let V be a complex vector space. Any operator A : V → V has at least one eigenvector.

Proof. The field C is algebraically closed. So, the characteristic polynomial p_A(X) has at least one root λ. Theorem 2.2 implies that there exists a corresponding non-zero vector v_1 ∈ V such that Av_1 = λv_1. □

An eigenvalue λ of A is said to have multiplicity k if λ is a root of multiplicity k of p_A(X). When A : V → V is an operator on a complex vector space of dimension n, its distinct eigenvalues λ_1, ..., λ_s, 1 ≤ s ≤ n, have multiplicities k_1, ..., k_s such that k_1 + ··· + k_s = n (!).

4 Recall that the characteristic polynomial of a matrix M is p_M(X) = det(X I − M). For the matrix M' = N^{-1} M N one has p_{M'}(X) = det(X I − N^{-1} M N) = det(X N^{-1} N − N^{-1} M N) = det(N^{-1}(X I − M) N) = p_M(X). So, two similar matrices have the same characteristic polynomial.
Corollary 2.4. If the characteristic polynomial p_A(X) of an operator A : V → V has n distinct roots λ_1, ..., λ_n, then the corresponding eigenvectors v_1, ..., v_n constitute a basis V for V. The matrix M(A, V) representing the operator A with respect to that basis is diagonal, namely

    M(A, V) = | λ_1  0  ...  0  |
              |  0  λ_2 ...  0  |
              |  .   .   ..  .  |
              |  0   0  ... λ_n |

Proof. (Hint: apply Lemma 2.1.) □
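The following numpy sketch (not part of the notes; the sample matrix and names are arbitrary) illustrates Theorem 2.2 and Corollary 2.4 on a concrete operator with distinct eigenvalues.

```python
import numpy as np

# Sample operator on C^3 (arbitrary choice) with distinct eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

# Eigenvalues of A versus roots of the characteristic polynomial det(X*I - A)
# (Theorem 2.2): the two sets coincide up to ordering and round-off.
eigvals, eigvecs = np.linalg.eig(A)
char_poly_coeffs = np.poly(A)          # coefficients of det(X*I - A)
print(np.sort(eigvals), np.sort(np.roots(char_poly_coeffs)))

# Since the eigenvalues are distinct, the eigenvectors form a basis and the
# matrix representing A in that basis is diagonal (Corollary 2.4).
N = eigvecs                            # columns = eigenvectors
D = np.linalg.inv(N) @ A @ N
print(np.round(D, 10))                 # numerically diagonal, diag = eigenvalues
```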

Let V be a K-vector space of dimension n. An operator A : V → V is said to be nilpotent if there is a non-negative integer r such that A^r = 0. If A is a nilpotent operator, then it is a simple matter to check that I − A is invertible (indeed, its inverse is I + A + A² + ··· + A^{r−1}); therefore, for any non-zero t ∈ K, the operator t I − A is invertible as well. It follows that p_A(X) = det(X I − A) has no non-zero roots. One can show that the converse is true as well. Let M = M(A, E) be the matrix representing A w.r.t. a basis E for V. It is clear that A is nilpotent if and only if M is a nilpotent matrix. We can think of M as a complex matrix (via the immersion K ↪ C). Now, if λ ∈ C is a non-zero root of p_M(X) and z ∈ C^n is an eigenvector associated with λ, then M^k z = λ^k z for all k > 0; this implies that M is not nilpotent. In conclusion, A is nilpotent if and only if it has only the eigenvalue 0 with multiplicity n.
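A quick numerical illustration of this remark (again not from the notes; the strictly upper triangular matrix is an arbitrary example):

```python
import numpy as np

# Arbitrary strictly upper triangular (hence nilpotent) matrix on C^4.
Nmat = np.array([[0, 1, 2, 3],
                 [0, 0, 4, 5],
                 [0, 0, 0, 6],
                 [0, 0, 0, 0]], dtype=float)

print(np.linalg.matrix_power(Nmat, 4))   # N^4 = 0, so N is nilpotent
print(np.linalg.eigvals(Nmat))           # all eigenvalues are 0
print(np.poly(Nmat))                     # characteristic polynomial X^4: [1, 0, 0, 0, 0]
```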

3 Upper triangularization

When K = C, the fundamental theorem of algebra ensures that the characteristic polynomial p_A(X) has precisely n roots λ_1, ..., λ_n, counted with multiplicity. If those roots are all distinct, then we can apply the previous Corollary to diagonalize (the matrix representing) the operator A; but, if some root has multiplicity greater than 1, there is in general no hope to find a basis of eigenvectors providing a diagonalization of A. To have a simple example in mind, just think of the operator L : C² → C² defined by the matrix

    | 0 1 |
    | 0 0 |

whose characteristic polynomial is p_L(X) = X² (the same as that of the zero operator). So, we have to devise a different strategy in order to get a matrix suitable to represent linear operators in a nice and effective way.

Let us come back to the general case of an operator A : V → V, where V is a vector space over a subfield K ⊂ C. A subspace W ⊂ V is called an A-invariant subspace if AW ⊂ W. For instance, every eigenspace V_λ of A is an A-invariant subspace (but, of course, the converse is not true!).

A set {V_1, ..., V_n} of subspaces of V is an A-invariant complete flag in V if the following two conditions are satisfied:
1) the set {V_1, ..., V_n} is a complete flag in V, i.e., V_i ⊂ V_{i+1} and dim V_i = i (so, V_n = V);
2) each V_i is an A-invariant subspace of V.

Lemma 3.1. If the operator A : V → V admits an A-invariant complete flag in V, then there is a basis E = {e_1, ..., e_n} for V such that the corresponding matrix M(A, E) representing A is upper triangular.

Proof. Let {V_1, ..., V_n} be an A-invariant complete flag. It is possible to choose a basis E = {e_1, ..., e_n} for V such that {e_1, ..., e_i} is a basis for V_i for all i = 1, ..., n: actually, it is enough to pick a generator e_1 of V_1, to complete it to a basis {e_1, e_2} for V_2, and to proceed inductively in the same way (recall that V_i ⊂ V_{i+1}). Now, by the very definition of A-invariant subspace, one has:

    Ae_1 = a_11 e_1
    Ae_2 = a_12 e_1 + a_22 e_2
    ...
    Ae_i = Σ_{k=1}^{i} a_ki e_k .                                   (3.1)

Hence, the matrix M(A, E) representing A with respect to the basis E is upper triangular, namely

    M(A, E) = | a_11 a_12 ... a_1n |
              |  0   a_22 ... a_2n |                                (3.2)
              |  .    .    ..   .  |
              |  0    0   ... a_nn |

□

In the case when K = C we can now prove a fundamental result.

Theorem 3.2. Let V be a complex vector space. Any operator A : V → V can be represented by a triangular matrix.

Proof. In view of Lemma 3.1 we just have to find an A-invariant complete flag in V. We proceed by induction on the dimension n of V. The case n = 1 is completely trivial, and we assume the inductive hypothesis: if U is a complex vector space of dimension n − 1, then any operator B : U → U admits a B-invariant complete flag. We pass now to study the case of an operator A : V → V, where dim V = n. By Theorem 2.3 (where we use the fact that K = C!) A has at least one eigenvector v_1; so, Av_1 = λv_1, for some λ ∈ C. Let V_1 be the subspace generated by v_1; we decompose V into the direct sum V = V_1 ⊕ U, where dim U = n − 1, and consider the canonical projections π_1 : V → V_1, π_2 : V → U, which are both linear operators. After restricting π_2 A to U, we get an operator π_2 A : U → U that satisfies the inductive hypothesis. Let {U_1, ..., U_{n−1}} be a π_2 A-invariant complete flag in U; we define the subspaces V_{i+1} = V_1 ⊕ U_i for i = 1, ..., n − 1. It is clear that {V_1, ..., V_n} is a complete flag in V. In order to complete the proof we have to show that each V_{i+1} is A-invariant. We can uniquely decompose any v ∈ V_{i+1}, i ≥ 1, into the sum v = a v_1 + u_i, with u_i ∈ U_i. Since π_1 + π_2 = I, one has

    Av = (π_1 + π_2) Av = π_1 Av + a π_2 Av_1 + π_2 A u_i ,

where π_1 Av ∈ V_1, a π_2 Av_1 = a λ π_2 v_1 = 0, and π_2 A u_i ∈ U_i by the inductive hypothesis, so that Av ∈ V_{i+1}. In conclusion, {V_1, ..., V_n} is an A-invariant complete flag. □
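For complex matrices, an upper triangularization as in Theorem 3.2 can be computed with the Schur decomposition, which even provides a unitary change of basis; a minimal scipy sketch (not part of the notes, and a stronger statement than the bare triangularization proved here):

```python
import numpy as np
from scipy.linalg import schur

# Arbitrary complex matrix standing in for an operator A : C^3 -> C^3.
A = np.array([[1, 2, 0],
              [0, 1, 3],
              [4, 0, 1]], dtype=complex)

# Complex Schur form: A = Q T Q*, with Q unitary and T upper triangular.
T, Q = schur(A, output='complex')
print(np.round(T, 6))                          # upper triangular; diagonal = eigenvalues
print(np.allclose(Q @ T @ Q.conj().T, A))      # True: same operator in the new basis
```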

4 The Cayley-Hamilton theorem

Theorem 3.2 can be exploited to give a relatively elementary proof of the classical Cayley-Hamilton theorem. Let p_A(X) = Σ_{k=0}^{n} s_k(A) X^k, with s_k(A) ∈ K (and s_n(A) = 1), be the characteristic polynomial of an operator A : V → V; we define the operator

    p_A(A) = Σ_{k=0}^{n} s_k(A) A^k

(keep in mind that A^0 = I).

Theorem 4.1 (Cayley-Hamilton). Let V be a complex vector space. For any operator A : V → V one has p_A(A) = 0.

Proof (after [10]). By Theorem 3.2, after choosing an A-invariant complete flag {V_1, ..., V_n} in V, the operator A is represented with respect to an adapted basis E = {e_1, ..., e_n} (cf. Lemma 3.1) by an upper triangular matrix of the form shown in Eq. (3.2). Then the characteristic polynomial of A is readily expressed by the formula

    p_A(X) = det | X − a_11   −a_12   ...   −a_1n  |
                 |    0     X − a_22  ...   −a_2n  |  =  Π_{k=1}^{n} (X − a_kk) .      (4.1)
                 |    .        .       ..     .    |
                 |    0        0      ... X − a_nn |

Thus, one has p_A(A) = Π_{k=1}^{n} (A − a_kk I). In order to prove that p_A(A) = 0 it suffices to show that p_A(A)v = 0 for all vectors v ∈ V. We do that by proving that

    for all j = 1, ..., n and all v ∈ V_j :    Π_{k=1}^{j} (A − a_kk I) v = 0 .

We proceed by induction on j. When j = 1, any v ∈ V_1 is of the form v = b v_1, where v_1 is the generator of V_1 and, by construction, an eigenvector of A whose eigenvalue is a_11: therefore, (A − a_11 I)v = b(Av_1 − a_11 v_1) = 0. Now, we assume that the statement holds true for j − 1, and we prove it for j. Any v ∈ V_j is of the form v = w + c v_j, with w ∈ V_{j−1} (again by construction). Notice that, by Eq. (3.1), we have that

    (A − a_jj I) v_j = Σ_{k=1}^{j} a_kj v_k − a_jj v_j = Σ_{k=1}^{j−1} a_kj v_k ,

so that (A − a_jj I)(c v_j) = u_1 ∈ V_{j−1}. On the other hand, as w ∈ V_{j−1} and V_{j−1} is A-invariant, it is clear that (A − a_jj I)w = Aw − a_jj w = u_2 lies in V_{j−1}. Putting all together, it turns out that

    Π_{k=1}^{j} (A − a_kk I) v = Π_{k=1}^{j−1} (A − a_kk I) (u_1 + u_2) ,

with (u_1 + u_2) ∈ V_{j−1}. So, by the inductive hypothesis, we get Π_{k=1}^{j} (A − a_kk I) v = 0 for any v ∈ V_j. Hence, p_A(A) = 0. □

It should be clear that the Cayley-Hamilton theorem holds true not just in the complex case, but for any field K ⊂ C. Let A : V → V be an operator, V being an n-dimensional K-vector space. After selecting a basis for V, the operator A is represented by a matrix M, and p_A = p_M. We can view M as an operator M : C^n → C^n and apply the previous theorem (of course, the characteristic polynomial of M does not change when M is regarded as a complex matrix). In conclusion, p_A(A) = 0.
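A numerical sanity check of the Cayley-Hamilton theorem (a sketch, not from the notes; the test matrix is arbitrary):

```python
import numpy as np

# Arbitrary 3x3 test matrix.
A = np.array([[2, 1, 0],
              [1, 3, 1],
              [0, 1, 4]], dtype=float)

# Coefficients s_n, ..., s_0 of the characteristic polynomial det(X*I - A),
# ordered from the highest power down (s_n = 1).
coeffs = np.poly(A)

# Evaluate p_A(A) = sum_k s_k A^k; by Cayley-Hamilton it must be the zero matrix.
n = A.shape[0]
pA_of_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.allclose(pA_of_A, np.zeros_like(A)))   # True (up to round-off)
```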

It seems that William Rowan Hamilton (1805-1865) was the first to prove a special case of the theorem that today we call the Hamilton-Cayley theorem. But the formalism of his Lectures on Quaternions (1853) is so perplexingly obscure that for a modern reader it is difficult even to spot the page where the result is stated. Five years later, Arthur Cayley (1821-1895), in his “Memoir on the theory of matrices” (Philosophical Transactions of the Royal Society of London, 148 (1858), pp. 17-37), provided a fairly complete algebraic theory of matrices. In this paper he claimed that “the determinant having for its matrix a given matrix less the same matrix considered as a single quantity involving the matrix unity, is equal to zero” [4, p. 482]. Cayley verified the result by hand for 2 × 2 and 3 × 3 matrices but he thought it unnecessary “to undertake the labour of a formal proof of the theorem in the general case of a matrix of any degree” [ibid., p. 483]. It was only in 1878 that Ferdinand Georg Frobenius (1849-1917), seemingly unaware of Cayley's work, gave a complete proof of the theorem in his paper “Über lineare Substitutionen und bilineare Formen” [5].

Let V be an n-dimensional K-vector space. By Theorem 3.2, any operator A : V → V can be represented by an n × n upper triangular matrix M (of course, the entries of M may fail to lie in K; in general, they belong to C). If we use M (cf. Eq. (4.1)) to compute the characteristic polynomial of A, one finds out that

    p_A(X) = Σ_{k=0}^{n} s_k(A) X^k = Π_{k=1}^{n} (X − a_kk) =
           = X^n − σ_1(a_11, ..., a_nn) X^{n−1} + σ_2(a_11, ..., a_nn) X^{n−2} − ···
             + (−1)^{n−1} σ_{n−1}(a_11, ..., a_nn) X + (−1)^n σ_n(a_11, ..., a_nn) ,

where σ_j(a_11, ..., a_nn) is the j-th elementary symmetric polynomial of a_11, ..., a_nn.5 In particular, we immediately get

    s_n(A) = 1   (as we already know)
    s_{n−1}(A) = −σ_1(a_11, ..., a_nn) = −(a_11 + ··· + a_nn) = −tr(A)
    s_0(A) = (−1)^n σ_n(a_11, ..., a_nn) = (−1)^n (a_11 ··· a_nn) = (−1)^n det(A) .

To come up with an intrinsic expression for all the polynomials σ_j(a_11, ..., a_nn) is less obvious. It can be shown that

    s_k(A) = (−1)^{n−k} σ_{n−k}(a_11, ..., a_nn) = (−1)^{n−k} tr(Λ^{n−k} A) ,

where Λ^k A : Λ^k V → Λ^k V is the induced linear operator between the k-th components of the exterior algebra Λ V of V (see [6, pp. 117-118] and [3, A III.107]). Clearly, if the operator A is diagonalizable with eigenvalues λ_1, ..., λ_n, one has that s_k(A) = (−1)^{n−k} σ_{n−k}(λ_1, ..., λ_n).
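The identities for the low-order coefficients are easy to check numerically; a small sketch (not from the notes, arbitrary test matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
n = A.shape[0]

# np.poly returns the coefficients of det(X*I - A), highest power first:
# [s_n, s_{n-1}, ..., s_0] with s_n = 1.
coeffs = np.poly(A)
print(coeffs[1], -np.trace(A))                   # s_{n-1} = -tr(A)
print(coeffs[-1], (-1)**n * np.linalg.det(A))    # s_0 = (-1)^n det(A)
```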

S Algebraic proof of the Cayley-Hamilton theorem. A quite serious drawback of the proof of Theorem 4.1 we have supplied is that the algebraic closedness of C seems to play there an essential role. This is misleading, because the Cayley-Hamilton theorem is true for matrices with entries in any commutative ring. So, it can be instructive to provide a purely algebraic proof valid in the general case (the proof is adapted from [8, p. 101]).
First, we recall the definition of adjoint matrix (this has nothing to do with the notion of adjoint operator we shall introduce later on). Let M be an n × n matrix with entries in a commutative ring R. Let C_ij be the (i, j)-th minor of M; define adj(M) as the matrix whose entries are adj(M)_ij = (−1)^{i+j} C_ji (note the swap of indices). The following identity can be proved directly from the definition of determinant:

    M adj(M) = adj(M) M = det(M) I .                          (4.2)

Take now the matrix t I − M, where t is an indeterminate over the ring R. Each entry of adj(t I − M) is of degree at most n − 1 in t, so that we can write

    adj(t I − M) = Σ_{k=0}^{n−1} t^k N_k

for suitable matrices N_0, ..., N_{n−1} whose entries are of degree 0 in t.

5 By definition, in the product

    Π_{k=1}^{n} (X − t_k) = X^n − σ_1 X^{n−1} + σ_2 X^{n−2} − ··· + (−1)^n σ_n

the polynomials σ_j = σ_j(t_1, ..., t_n) are called the elementary symmetric polynomials of t_1, ..., t_n. It is easy to verify that each σ_j is homogeneous of degree j in t_1, ..., t_n. Actually, one has

    σ_j(t_1, ..., t_n) = Σ_{1 ≤ l_1 < ··· < l_j ≤ n} t_{l_1} ··· t_{l_j} .

For more details see e.g. [9, pp. 204-206].


The identity (4.2), applied to the matrix t I − M, reads as follows:

    (t I − M) adj(t I − M) = (t I − M) Σ_{k=0}^{n−1} t^k N_k =
        = t^n N_{n−1} + Σ_{k=1}^{n−1} t^k (N_{k−1} − M N_k) − M N_0 =
        = det(t I − M) I = Σ_{k=0}^{n} t^k s_k(M) I .

By comparing the summands having the same degree in t, we get a set of n + 1 equations:

    Eq. E_0 :   −M N_0 = s_0(M) I
    ...
    Eq. E_k :   N_{k−1} − M N_k = s_k(M) I
    ...
    Eq. E_n :   N_{n−1} = I .

By multiplying Eq. E_k on the left by M^k and then summing up the resulting equations we get

    p_M(M) = M^n + Σ_{k=1}^{n−1} s_k(M) M^k + s_0(M) I        (sum of the RHSs of Eqs. E_0, ..., E_n)
           = M^n N_{n−1} + Σ_{k=1}^{n−1} (M^k N_{k−1} − M^{k+1} N_k) − M N_0 = 0 ,

the last sum being the sum of the LHSs of Eqs. E_0, ..., E_n: it is a telescoping sum!

Thus, we have proved that p_M(M) = 0.
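Since the theorem holds over any commutative ring, it can also be checked symbolically, e.g. for a generic 2 × 2 matrix with indeterminate entries; a sympy sketch (not part of the notes):

```python
import sympy as sp

a, b, c, d, X = sp.symbols('a b c d X')

# Generic 2x2 matrix over the polynomial ring Z[a, b, c, d].
M = sp.Matrix([[a, b],
               [c, d]])

# Characteristic polynomial p_M(X) = det(X*I - M) = X^2 - (a + d)X + (ad - bc).
p = M.charpoly(X).as_expr()
print(sp.expand(p))

# Substitute M for X: p_M(M) = M^2 - (a + d)M + (ad - bc)I must vanish identically.
pM_of_M = M**2 - (a + d) * M + (a * d - b * c) * sp.eye(2)
print(pM_of_M.expand())    # zero matrix
```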

5 Jordan-Chevalley decomposition S

In most of this section we assume that K is an arbitrary field (not necessarily a subfield of C). Let V be a K-vector space of finite dimension > 0. An operator A : V → V induces the ring homomorphism

    ρ_A : K[X] → K[A] ⊂ End_K V ,    p(X) ↦ p(A) .

The ring K[A] is the subring of End_K V generated by A; it is a commutative ring, because A^j A^k = A^k A^j for all non-negative integers j, k. The kernel of ρ_A is a non-zero ideal in K[X]; indeed, the characteristic polynomial p_A(X) of A belongs to Ker ρ_A. Since K[X] is a principal ideal domain, Ker ρ_A is generated by a unique monic polynomial μ_A(X), that is called the minimal polynomial of A. It is clear that the minimal polynomial μ_A(X) divides the characteristic polynomial p_A(X), so that deg μ_A(X) ≤ deg p_A(X); in Theorem 5.9 we shall see that the two polynomials have the same irreducible factors.

The key point is that the homomorphism ρ_A induces a K[X]-module structure on V. Actually, for any p(X) = X^m + a_{m−1} X^{m−1} + ··· + a_1 X + a_0 ∈ K[X] and for any v ∈ V, we set

    p(X) ·_A v = p(A)v = A^m v + a_{m−1} A^{m−1} v + ··· + a_1 Av + a_0 v .

Lemma 5.1. Under the previous assumptions, any K[X]-submodule W ⊂ V is an A-invariant subspace, and, conversely, any A-invariant subspace W ⊂ V is a K[X]-submodule.

Proof. First, we observe that any K[X]-submodule W ⊂ V is a vector subspace of V: indeed, any a ∈ K can be thought of as a constant polynomial. Since K[X] is generated by X over K, a subspace W is a K[X]-submodule if and only if X ·_A W ⊂ W; this condition is equivalent to the fact that W is A-invariant, because X ·_A v = Av. □

Suppose V is given a structure of K[X]-module: so, for any p(X) ∈ K[X] and for any v ∈ V, one can perform the multiplication p(X) · v. Then, the multiplication by X induces a linear operator A : V → V, defined by setting Av = X · v for all v ∈ V. It is straightforward to check (!) that the K[X]-module structure on V induced by A coincides with the original one.

To proceed further we need to exploit a fundamental result in commutative algebra, namely, the structure theorem for finitely generated modules over principal ideal domains. For a proof of this theorem we refer to [10, Chap. XV, §2], or to [1, Chap. 6].

Theorem 5.2. Let M be a finitely generated module over a principal ideal domain R. Then M is isomorphic to the direct sum of a finite number of cyclic R-modules6 C_1, ..., C_q and a free R-module L. More precisely, M admits the following two decompositions:
invariant factor decomposition: M ≅ R/(d_1) ⊕ ··· ⊕ R/(d_k) ⊕ R^r, where r is a non-negative integer and d_1, ..., d_k are elements of R that are not units and not zero and such that d_i divides d_{i+1} for all i = 1, ..., k − 1;
primary decomposition: M ≅ R/p_1^{s_1} ⊕ ··· ⊕ R/p_h^{s_h} ⊕ R^r, where r is a non-negative integer (the same as above), p_1, ..., p_h are (not necessarily distinct) prime ideals in R, and s_1, ..., s_h are non-negative integers.

Notice that the elements d_1, ..., d_k in the invariant factor decomposition are unambiguously determined up to multiplication by units. They are called the invariant factors of the module M. The R-modules R/(d_i) are of course not necessarily indecomposable; on the contrary, the R-modules R/p_j^{s_j} of the primary decomposition are indecomposable. It should be emphasized that, for a module M over a principal ideal domain R, the existence of the primary decomposition is a consequence of the existence of the invariant factor decomposition, and conversely.

6 A cyclic R-module N is a module generated by a single element, say x. This is equivalent to saying that N ≅ R/I, where I is the ideal of elements of R annihilating x, i.e., I = {a ∈ R | ax = 0}. In our case, R is a principal ideal domain, so that I = (d), where d is uniquely determined up to multiplication by a unit.
When the ring R is Z, Theorem 5.2 corresponds to the structure theorem for finitely generated abelian groups (which, by the way, can be proved in a more elementary way). Actually, any such group G admits a primary decomposition

    G ≅ (Z/q_1 Z) ⊕ ··· ⊕ (Z/q_s Z) ⊕ Z^r ,

where the integers q_1, ..., q_s are powers of (not necessarily distinct) prime numbers p_1, ..., p_s, and an invariant factor decomposition

    G ≅ (Z/d_1 Z) ⊕ ··· ⊕ (Z/d_k Z) ⊕ Z^r ,

where d_i divides d_{i+1} for all i = 1, ..., k − 1. The two decompositions can be easily transformed into each other (in fact, if the p_j are distinct primes and d = Π_j p_j^{s_j}, then one has Z/dZ ≅ ⊕_j Z/p_j^{s_j} Z).
Theorem 5.2, when applied to our case, yields the following result.

Theorem 5.3. Let A : V → V be a linear operator. If V is endowed with the K[X]-module structure induced by A, it admits a direct sum decomposition

    V = V_1 ⊕ ··· ⊕ V_k ,

where V_i is a cyclic K[X]-module isomorphic to K[X]/(q_i(X)). The q_1(X), ..., q_k(X) are monic non-constant polynomials uniquely determined by the pair (V, A); moreover, q_i(X) divides q_{i+1}(X) for all i = 1, ..., k − 1. Finally, q_k(X) is the minimal polynomial μ_A(X) of A.

Proof. First, we may make use of Theorem 5.2 because K[X] is a principal ideal domain (actually, a Euclidean domain) and V is finitely generated over K[X] (this follows immediately from the fact that V is a finite-dimensional vector space). Next, it is clear that in the invariant factor decomposition of V there is no free direct summand; in fact, p_A(X) ·_A v = 0 for all v ∈ V. So, in view of Theorem 5.2, V = V_1 ⊕ ··· ⊕ V_k, where V_i is a cyclic K[X]-module isomorphic to K[X]/(q_i(X)), and q_1(X), ..., q_k(X) are non-constant, non-zero polynomials such that q_i(X) divides q_{i+1}(X) for all i = 1, ..., k − 1; each polynomial q_i(X) is uniquely determined up to multiplication by a unit, so we can assume it is monic. To prove the last statement we notice that the polynomial q_i(X) annihilates all vectors in V_i. As q_k(X) is a multiple of q_i(X) for all i = 1, ..., k − 1, it follows that q_k(X) ·_A v = 0 for all v ∈ V; equivalently, q_k(A) = 0. Thus, the minimal polynomial μ_A(X) divides q_k(X) (and they are both monic). But no polynomial of lower degree than q_k(X) can annihilate all of V_k, so that μ_A(X) = q_k(X). □

In order to obtain some convenient matrix expression for the operator A : V → V we have to find suitable bases for the direct summands V_i occurring in the invariant factor decomposition presented in Theorem 5.3. We start by studying the case of an operator B : W → W such that W, endowed with the K[X]-module structure induced by B, is isomorphic to K[X]/(p(X)), where p(X) is a polynomial of degree m = dim W. We make the identification W = K[X]/(p(X)); as usual, we denote by [f(X)] the class of f(X) in the quotient.

Lemma 5.4. Let w_0 = [1]. The elements

    w_0 , w_1 = X ·_B w_0 , w_2 = X ·_B w_1 , ... , w_{m−1} = X ·_B w_{m−2}

constitute a basis for W = K[X]/(p(X)) as a K-vector space.

Proof. Any linear combination Σ_{i=0}^{m−1} α_i w_i, with α_i ∈ K, can be equivalently written in the form f(X) ·_B w_0, i.e., f(X)[1], with f(X) = α_{m−1} X^{m−1} + ··· + α_0. So, if Σ_{i=0}^{m−1} α_i w_i = 0 with some α_i non-zero, then f(X)[1] = [f(X)] = 0, i.e., p(X) divides the non-zero polynomial f(X); but this is absurd, as deg f(X) < deg p(X). So, w_0, w_1, ..., w_{m−1} are linearly independent over K. To show that they generate W, it is enough to observe that any w ∈ W can be thought of as a class [g_w(X)] ∈ K[X]/(p(X)). Since K[X] is a Euclidean domain, the polynomial g_w(X) can be written as g_w(X) = h(X) p(X) + r_w(X), where r_w(X) has degree at most m − 1, i.e. r_w(X) = β_{m−1} X^{m−1} + ··· + β_0, with some β_i possibly zero. Of course, [g_w(X)] = [r_w(X)]. Thus, we have w = Σ_{i=0}^{m−1} β_i w_i. □

Let p(X) = X^m + a_{m−1} X^{m−1} + ··· + a_1 X + a_0. The identity p(X) ·_B w_0 = 0 yields the relation

    X ·_B w_{m−1} = −a_{m−1} w_{m−1} − ··· − a_1 w_1 − a_0 w_0 .

Since we have X ·_B w = Bw for all w ∈ W, we easily get

    Bw_0 = w_1 ,
    Bw_1 = w_2 ,
    ...
    Bw_{m−2} = w_{m−1} ,
    Bw_{m−1} = −a_{m−1} w_{m−1} − ··· − a_1 w_1 − a_0 w_0 .

Thus, with respect to the basis {w_0, w_1, ..., w_{m−1}} the operator B is represented by the m × m matrix

    | 0 0 ... 0 0 −a_0     |
    | 1 0 ... 0 0 −a_1     |
    | 0 1 ... 0 0 −a_2     |
    | . .  ..  . .    .    |                        (5.1)
    | 0 0 ... 1 0 −a_{m−2} |
    | 0 0 ... 0 1 −a_{m−1} |

Let us come back to the general case of a pair (V, A) with the associated invariant factor decomposition V = V_1 ⊕ ··· ⊕ V_k described in Theorem 5.3; let n_i = dim V_i. For each V_i there exists a basis V^(i) = {v_1^(i), ..., v_{n_i}^(i)} such that A|_{V_i} is represented by a matrix F_i of the type (5.1). Thus, A = ⊕_i A|_{V_i} is represented by the block matrix

    F = | F_1  0   ...  0  |
        |  0   F_2 ...  0  |
        |  .   .    ..  .  |
        |  0   0   ... F_k |

Summing up, we have the following theorem.

Theorem 5.5. Let A : V → V be an operator. There exists a basis for V such that A is represented by a block matrix whose blocks are of the type (5.1).

The block matrix obtained in the previous Theorem is called the Frobenius canonical form (aka rational canonical form) of the operator A. As Michael Artin puts it, “it isn't particularly nice, but it is the best form available for an arbitrary field” [1, p. 479].
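Each block of type (5.1) is the companion matrix of the corresponding invariant factor; a small numpy sketch (not from the notes) that builds the companion matrix of a monic polynomial and checks that its characteristic polynomial is that polynomial again:

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of the monic polynomial X^m + a_{m-1}X^{m-1} + ... + a_0,
    given as coeffs = [a_0, a_1, ..., a_{m-1}] (a block of type (5.1))."""
    m = len(coeffs)
    C = np.zeros((m, m))
    C[1:, :-1] = np.eye(m - 1)        # ones on the subdiagonal
    C[:, -1] = -np.asarray(coeffs)    # last column: -a_0, ..., -a_{m-1}
    return C

# Example: p(X) = X^3 - 2X^2 - 5X + 6 = (X - 1)(X + 2)(X - 3).
C = companion([6.0, -5.0, -2.0])
print(C)
print(np.poly(C))    # [1, -2, -5, 6]: the characteristic polynomial is p(X) again
```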

According to Theorem 5.2 any finitely generated K[X]-module admits, besides its invariant factor decomposition, also its primary decomposition. By simply rephrasing the last part of Theorem 5.2, we obtain the following result.

Theorem 5.6. Let A : V → V be a linear operator. If V is endowed with the K[X]-module structure induced by A, it admits a direct sum decomposition

    V = W_1 ⊕ ··· ⊕ W_h ,

where W_i is a cyclic K[X]-module isomorphic to K[X]/p_i^{s_i} and p_1, ..., p_h are (not necessarily distinct) prime ideals in K[X]. Thus, each p_i is generated by a unique monic irreducible polynomial p_i(X) ∈ K[X].

Proof.

Corollary 5.7. The polynomials p_1(X), ..., p_h(X) are the irreducible factors of the minimal polynomial μ_A(X).

Proof. By comparing the invariant factor decomposition of Theorem 5.3 with the primary decomposition of Theorem 5.6, it turns out (!) that

    p_1^{s_1}(X) ··· p_h^{s_h}(X) = q_1(X) ··· q_{k−1}(X) μ_A(X) .

Since each q_i(X) divides μ_A(X) and each p_i(X) is irreducible (by hypothesis), one can conclude that p_1(X), ..., p_h(X) are the irreducible factors of μ_A(X). □

In order to take the best advantage of the primary decomposition obtained in Theorem 5.6 we analyze more closely the case when K is algebraically closed. Then, each irreducible polynomial in K[X] is a polynomial of degree 1 (actually, this statement is equivalent to the fact that K is algebraically closed). Therefore, each p_i(X) in Theorem 5.6 is of the form p_i(X) = X − λ_i, for some λ_i ∈ K (recall that repetitions are possible!).
As before, we start by considering the case of an operator B : W → W such that the vector space W, endowed with the K[X]-module structure induced by B, is isomorphic to K[X]/((X − λ)^s). We set w_0 = [1] and define

    w_1 = (X − λ) ·_B w_0 , w_2 = (X − λ) ·_B w_1 , ... , w_{s−1} = (X − λ) ·_B w_{s−2} .

By using the same argument as in Lemma 5.4, it is again a simple matter to prove that the set {w_0, ..., w_{s−1}} is a basis for W. Moreover, one has (X − λ)^s ·_B w_0 = 0, so that (X − λ) ·_B w_{s−1} = 0.

In terms of the operator B, we get the equations

    Bw_0 = w_1 + λ w_0 ,
    Bw_1 = w_2 + λ w_1 ,
    ...
    Bw_{s−2} = w_{s−1} + λ w_{s−2} ,
    Bw_{s−1} = λ w_{s−1} .

We conclude that, with respect to the basis {w_0, w_1, ..., w_{s−1}}, the operator B is represented by the lower triangular s × s matrix

    | λ 0 0 ... 0 0 |
    | 1 λ 0 ... 0 0 |
    | 0 1 λ ... 0 0 |
    | . . .  ..  . . |                        (5.2)
    | 0 0 0 ... λ 0 |
    | 0 0 0 ... 1 λ |

By applying this procedure to each direct summand in the primary decomposition V = W_1 ⊕ ··· ⊕ W_h associated with the pair (V, A), we obtain the Jordan-Chevalley decomposition theorem.

Theorem 5.8 (Jordan-Chevalley). Let V be a finite-dimensional vector space over an algebraically closed field K. Let A : V → V be an operator. There exists a basis for V such that A is represented by a block matrix whose blocks are of the type (5.2).

Proof.

The block matrix in Theorem 5.8 is called the Jordan canonical form of the operator A : V → V; its blocks are usually named the Jordan blocks of A.

A remarkable implication of Theorem 5.8 is that any operator A : V → V (when the base field is algebraically closed) can be decomposed into the sum

    A = A_d + A_n ,

where the operator A_d is diagonalizable, A_n is nilpotent, and

    A_d A_n = A_n A_d .

To show this, let J be the Jordan canonical form of A with respect to a basis W. It is clear that J can be written in the form

    J = M + K ,

where K is a strictly lower triangular matrix (hence, nilpotent) and M is a diagonal matrix, whose diagonal entries are the roots of the minimal polynomial. Notice that one has K = 0 only when all the roots of the minimal polynomial are distinct ( ). By direct computation it is not hard to check that M and K commute,

    K M = M K .

So, the statement is proved.

It is worthwhile to point out that the Jordan-Chevalley decomposition generalizes to semisimple Lie algebras over an algebraically closed field (cf. [7]).
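A sympy sketch (not from the notes; the matrix is an arbitrary example) of the decomposition A = A_d + A_n read off from the Jordan form. Note that sympy's jordan_form returns an upper triangular Jordan matrix, i.e. the transpose convention of (5.2):

```python
import sympy as sp

# Arbitrary matrix with a repeated eigenvalue, so that the nilpotent part is non-zero.
A = sp.Matrix([[5, 4, 2, 1],
               [0, 1, -1, -1],
               [-1, -1, 3, 0],
               [1, 1, -1, 2]])

P, J = A.jordan_form()          # A = P * J * P^{-1}, J upper triangular Jordan matrix
M = sp.diag(*[J[i, i] for i in range(J.shape[0])])   # diagonal part of J
K = J - M                                            # nilpotent part of J

A_d = P * M * P.inv()           # diagonalizable summand
A_n = P * K * P.inv()           # nilpotent summand
print(A_d + A_n == A)           # True: A = A_d + A_n
print(A_d * A_n == A_n * A_d)   # True: the two summands commute
```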

If V is a vector space over a field K which is not algebraically closed, then it may well happen that an operator A : V → V cannot be put in Jordan canonical form. Nonetheless, one can get some useful information about A by regarding it as an operator on a vector space defined over an algebraic closure K̄ of K (so, K̄ is an extension of K which is algebraic and algebraically closed; it is unique only up to isomorphisms inducing the identity on K). As usual, we shall identify K with its image under the immersion K ↪ K̄.

For simplicity's sake, we assume V = K^n; so the operator A : K^n → K^n is uniquely associated with a matrix A (with entries in K). It is clear that the matrix A determines an operator Ā : K̄^n → K̄^n. We contend that the invariants q_1(X), ..., q_k(X) of the pair (K^n, A) are the “same” as the invariants of the pair (K̄^n, Ā); the cautionary quotation marks surrounding the word “same” warn that in the first case q_i(X) is a polynomial in K[X], while in the second case it is a polynomial in K̄[X]. To prove our claim let us consider the corresponding invariant factor decomposition of K^n,

    K^n = V_1 ⊕ ··· ⊕ V_k ,

where V_i ≅ K[X]/(q_i). With respect to the basis W described in Lemma 5.4, the operator A is put in Frobenius canonical form, i.e., it is represented by a block matrix F of the form described in Theorem 5.5. This means that there is an invertible matrix N with entries in K such that F = N^{-1} A N. Then, the matrix F, when acting on vectors in K̄^n, represents the operator Ā (with respect to the basis W viewed as a basis for K̄^n); the corresponding decomposition of K̄^n is

    K̄^n = V̄_1 ⊕ ··· ⊕ V̄_k ,

where V̄_i ≅ K̄[X]/(q_i). The claim then follows from the uniqueness of the invariant factors.

We have observed in Corollary 5.7 that the polynomials p_1(X), ..., p_h(X) occurring in the primary decomposition of A are the irreducible factors (in K[X]!) of the minimal polynomial μ_A(X) = q_k(X); moreover, one has

    p_1^{s_1}(X) ··· p_h^{s_h}(X) = q_1(X) ··· q_{k−1}(X) μ_A(X) ,            (5.3)

where the exponents s_1, ..., s_h are uniquely determined by the primary decomposition itself. In K̄[X] we can go further (unless all roots of the minimal polynomial are in K). Let us consider the primary decomposition of K̄^n induced by the operator Ā,

    K̄^n = W̄_1 ⊕ ··· ⊕ W̄_l ,

where W̄_i ≅ K̄[X]/((X − λ_i)^{r_i}) and λ_i ∈ K̄. Also in this case, of course, we have

    (X − λ_1)^{r_1} ··· (X − λ_l)^{r_l} = q_1(X) ··· q_{k−1}(X) μ_A(X) .

If we put Ā in Jordan canonical form, its characteristic polynomial p_Ā can easily be computed by hand (in fact, all blocks are lower triangular matrices). We get

    p_Ā = (X − λ_1)^{r_1} ··· (X − λ_l)^{r_l} .                                (5.4)

Now, p_Ā = p_A. So, by comparing Eqs. (5.3) and (5.4), and keeping in mind Corollary 5.7, we get the following result.

Theorem 5.9. Let V be a finite-dimensional vector space over an arbitrary field K. Let A : V → V be an operator. Then the characteristic polynomial p_A(X) of A is equal to the product of the invariant factors of the pair (V, A),

    p_A(X) = q_1(X) ··· q_k(X) ,

where q_k(X) coincides with the minimal polynomial μ_A(X) of A. Furthermore, p_A(X) and μ_A(X) have the same irreducible factors in K[X].
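A small sympy illustration of Theorem 5.9 (not part of the notes; the block matrix is an arbitrary example): the characteristic and minimal polynomials below share the irreducible factors X − 2 and X − 3, and the minimal polynomial divides the characteristic one.

```python
import sympy as sp

X = sp.symbols('X')

# Block diagonal example: a 2x2 Jordan block for 2 (lower triangular, as in (5.2)),
# a 1x1 block for 2, and a 1x1 block for 3.
A = sp.Matrix([[2, 0, 0, 0],
               [1, 2, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 3]])

print(sp.factor(A.charpoly(X).as_expr()))    # (X - 2)**3 * (X - 3)

# The minimal polynomial here is (X - 2)^2 (X - 3): it annihilates A,
# while the product of the bare irreducible factors (X - 2)(X - 3) does not.
I = sp.eye(4)
print((A - 2*I)**2 * (A - 3*I) == sp.zeros(4, 4))   # True
print((A - 2*I) * (A - 3*I) == sp.zeros(4, 4))      # False
```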

6 Hermitian vector spaces and self-adjoint operators

In this section V is a complex vector space of dimension n > 0.

A Hermitian product on V is a map

    h : V × V → C ,    (v, w) ↦ h(v, w) ,

such that
1) it is conjugate symmetric, i.e., h(v, w) is the complex conjugate of h(w, v) for all v, w ∈ V;
2) for any w ∈ V, the map h(·, w) : V → C is linear;
3) it is positive definite, i.e., h(v, v) > 0 for all v ≠ 0.

Notice that properties 1) and 2) imply that, for all v, w, u ∈ V and z ∈ C,

    h(v, z w) = z̄ h(v, w) ;    h(v, w + u) = h(v, w) + h(v, u) .

The pair (V, h) is called a Hermitian vector space.
The prototypical example is provided by the standard Hermitian product on C^n:

    h(v, w) = Σ_{k=1}^{n} v_k w̄_k    for v = (v_1, ..., v_n), w = (w_1, ..., w_n) ∈ C^n .

A Hermitian product h on V induces the norm ‖·‖_h : V → R defined by the formula

    ‖v‖_h = √h(v, v)    for each v ∈ V

(of course, since h(v, v) is real and ≥ 0 for all v ∈ V, the square root is the only non-negative real number whose square is h(v, v)). In what follows we shall write just ‖·‖ instead of ‖·‖_h, whenever no confusion may arise.

As usual, we say that two vectors v, w ∈ V are orthogonal (w.r.t. the Hermitian product h) if h(v, w) = 0.

Theorem 6.1 (Pythagorean theorem). If two vectors v, w ∈ V are orthogonal, then

    ‖v + w‖² = ‖v‖² + ‖w‖² .

Proof. 
Theorem 6.2. For any vectors v, w ∈ V and for any complex number z ∈ C we have
1) |h(v, w)| ≤ ‖v‖ ‖w‖ (Schwarz inequality);
2) ‖v‖ = 0 if and only if v = 0;
3) ‖z v‖ = |z| ‖v‖;
4) ‖v + w‖ ≤ ‖v‖ + ‖w‖ (triangle inequality).

Proof. 1) For w = 0 the statement is trivial. Assume first that ‖w‖ = 1. If we let h(v, w) = α ∈ C, then v − αw and αw are orthogonal vectors. By the Pythagorean theorem, one has

    ‖v‖² = ‖v − αw‖² + |α|² ,

so that |α| = |h(v, w)| ≤ ‖v‖. For a general vector w ≠ 0, one takes the unit vector w/‖w‖ and obtains that

    |h(v, w/‖w‖)| ≤ ‖v‖ .

Thus, one concludes that |h(v, w)| ≤ ‖v‖ ‖w‖. The proofs of 2), 3), and 4) are left to the reader as (easy) exercises ( ). □

It is well known that a real vector space endowed with a Euclidean scalar product admits orthogonal bases. The analogous property for a complex vector space with a Hermitian product is stated in the following theorem.

Theorem 6.3. Let (V, h) be a Hermitian vector space. Then V admits an orthogonal basis.
Proof (Gram-Schmidt process). Let {v_1, ..., v_n} be any basis. We let

    e_1 = v_1
    e_2 = v_2 − (h(v_2, e_1)/h(e_1, e_1)) e_1
    ...
    e_n = v_n − (h(v_n, e_{n−1})/h(e_{n−1}, e_{n−1})) e_{n−1} − ··· − (h(v_n, e_1)/h(e_1, e_1)) e_1 .

Then {e_1, ..., e_n} is an orthogonal basis. □

It is clear that we can obtain an orthonormal basis {f_1, ..., f_n} for V simply by taking any orthogonal basis {e_1, ..., e_n} and letting f_i = e_i / ‖e_i‖.
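A minimal numpy sketch of the Gram-Schmidt process for the standard Hermitian product on C^n (not from the notes; the helper names and sample vectors are arbitrary):

```python
import numpy as np

def hermitian_product(v, w):
    """Standard Hermitian product on C^n, linear in the first argument."""
    return np.sum(v * np.conj(w))

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors as in Theorem 6.3."""
    basis = []
    for v in vectors:
        e = v.astype(complex).copy()
        for prev in basis:
            e -= (hermitian_product(v, prev) / hermitian_product(prev, prev)) * prev
        basis.append(e)
    return basis

vs = [np.array([1, 1j, 0]), np.array([1, 0, 1]), np.array([0, 1, 2])]
es = gram_schmidt(vs)
# All pairwise Hermitian products of distinct basis vectors vanish (up to round-off).
print([abs(hermitian_product(es[i], es[j])) < 1e-12
       for i in range(3) for j in range(3) if i != j])
```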

Theorem 6.4. Let W ⊂ V be a subspace of dimension m ≤ n = dim V. Let W⊥ be the subspace of all vectors that are orthogonal to W, i.e.

    W⊥ = {v ∈ V | h(v, w) = 0 for all w ∈ W} .

Then V = W ⊕ W⊥.

Proof. Let {w_1, ..., w_m} be an orthogonal basis for W; by using the Gram-Schmidt process we complete it to an orthogonal basis {w_1, ..., w_m, v_{m+1}, ..., v_n} for V. It suffices to verify that the set {v_{m+1}, ..., v_n} constitutes a basis for W⊥ ( ). □

The subspace W⊥ is called the orthogonal complement of W in V.

Hermitian products may seem to be much like Euclidean products on real vector spaces. However, they differ from the latter in at least one important respect, as is shown by the next lemma.

Lemma 6.5. Let (V, h) be a Hermitian vector space. Let A : V → V be an operator such that h(Av, v) = 0 for all v ∈ V. Then A = 0.

Proof. It is immediate to check by direct computation (!) that, for any operator B, the Hermitian product satisfies the so-called polarization identity:

    h(B(v + w), v + w) − h(Bv, v) − h(Bw, w) = h(Bv, w) + h(Bw, v)    for all v, w ∈ V .      (6.1)

If this identity is applied to an operator A fulfilling our hypotheses, one has

    h(Av, w) + h(Aw, v) = 0

for all v, w ∈ V. If we multiply by i, we get:

    0 = i (h(Av, w) + h(Aw, v)) = i h(Av, w) + i h(Aw, v) =
      = h(A(i v), w) − h(Aw, i v) = h(A(i v), w) + h(A(i v), w) = 2 h(A(i v), w) .

As v is arbitrary, this amounts to saying that h(Av, w) = 0 for all v, w ∈ V. In other words, Av has to be orthogonal to all vectors w ∈ V: so, Av = 0 for all v ∈ V. We conclude that A = 0. □

The previous Lemma is not true in the case of a Euclidean product on a real vector space. For example, we can take, on R² with the standard Euclidean product (·, ·)_Eucl, the operator defined by the matrix

    J = |  0  1 |
        | −1  0 |

One has (J x, x)_Eucl = 0 for all vectors x ∈ R².

Let A : V → V be a linear operator. Its adjoint operator (w.r.t. the Hermitian product h) is the operator A* uniquely determined by the formula

    h(Av, w) = h(v, A*w)    for all v, w ∈ V .

It is straightforward to check that A* : V → V is a linear operator and that the adjoint operation has the following properties (!):
1) (A + B)* = A* + B*;
2) (AB)* = B*A*;
3) if A is invertible, then (A^{-1})* = (A*)^{-1};
4) (A*)* = A;
5) (zA)* = z̄ A*.

If M(A, E) is the matrix representing the operator A with respect to any orthonormal basis E = {e_1, ..., e_n}, then the conjugate transpose matrix M̄(A, E)^T represents the adjoint operator A* (!). Accordingly, for any complex matrix M we shall denote its conjugate transpose M̄^T by M*.

An operator A : V → V is called self-adjoint (w.r.t. the given Hermitian product h) if A = A*.

According to the previous remark, a self-adjoint operator is represented w.r.t. an orthonormal basis by a matrix M that fulfills the condition M = M*. A matrix of this kind is said to be Hermitian. The entries of M have to satisfy the equalities M_ij = M̄_ji; in particular, the diagonal entries must be real. Thus, the product zM of a Hermitian matrix M by a complex number z fails to be Hermitian, unless z is real or M = 0. On the other hand, the sum M + N of two Hermitian matrices is Hermitian as well. Therefore, we see that the space Herm(n) of n × n Hermitian matrices carries a natural structure of real vector space; its dimension is n² ( ).

From this observation it follows immediately that the space of self-adjoint operators on a Hermitian vector space (V, h) is a real vector space of dimension equal to the square of the dimension of V.

The crucial property of self-adjoint operators is stated in the following lemma.

Lemma 6.6. An operator A : V → V is self-adjoint if and only if h(Av, v) is real for all v ∈ V.

Proof. Suppose A is self-adjoint. Then we have

    h(Av, v) = h(v, A*v) = h(v, Av) ,

and, by conjugate symmetry, h(v, Av) is the complex conjugate of h(Av, v); therefore, h(Av, v) is real. Conversely, suppose that h(Av, v) is real for any v ∈ V. Then one has

    h(Av, v) = h(v, Av) = h(A*v, v)

(the first equality holds because h(Av, v) is real, the second follows from (A*)* = A), so that h((A − A*)v, v) = 0 for all v ∈ V. By Lemma 6.5 it follows that A − A* = 0. □

Corollary 6.7. Let A : V → V be a self-adjoint operator. Then all its eigenvalues are real.

Proof. Let λ be an eigenvalue of A: there exists a non-zero vector v such that Av = λv. We have that h(Av, v) = h(λv, v) = λ h(v, v). Since h(Av, v) and h(v, v) are both real numbers, and h(v, v) ≠ 0, it follows that λ is real as well. □

One can give a more direct proof of Corollary 6.7 by noticing that, for any operator B : V → V, the eigenvalues of its adjoint B* are the complex conjugates of the eigenvalues of B ( ).

We are now going to prove a key result, often known under the name of spectral theorem: every self-adjoint operator is diagonalizable w.r.t. an orthogonal basis of eigenvectors. First we need an easy technical lemma.

Lemma 6.8. Let A : V → V be a self-adjoint operator.
1) Let v ∈ V be an eigenvector of A. If w is orthogonal to v, then Aw is orthogonal to v as well.
2) If W ⊂ V is an A-invariant subspace, then W⊥ is A-invariant as well.

Proof.

Theorem 6.9 (Spectral theorem). Let A : V → V be a self-adjoint operator w.r.t. a Hermitian product h on V. Then V admits an orthogonal basis E = {e_1, ..., e_n} such that each e_i is an eigenvector of A. W.r.t. such a basis A is represented by a diagonal matrix whose entries are real.

Proof. Recall that, by Theorem 2.3, any complex operator has at least one eigenvector. We prove the first claim by induction on n = dim V. The case n = 1 is trivial. Suppose the statement is true for any self-adjoint operator on a complex vector space of dimension n − 1. Now, let V be a complex vector space of dimension n; let A : V → V be a self-adjoint operator. We take an eigenvector v of A and consider the subspace V_1 ⊂ V generated by v. The subspace V_1 is A-invariant; its orthogonal complement V_1⊥ is A-invariant (by Lemma 6.8) and we have V_1 ⊕ V_1⊥ = V (by Theorem 6.4). Therefore, dim V_1⊥ = n − 1, so that the inductive hypothesis is satisfied for the operator A|_{V_1⊥} : V_1⊥ → V_1⊥. Let {f_1, ..., f_{n−1}} be an orthogonal basis for V_1⊥ whose elements are eigenvectors of A|_{V_1⊥}; obviously, they are eigenvectors of A as well. Then the set of vectors {v, f_1, ..., f_{n−1}} is an orthogonal basis for V and its elements are eigenvectors of A. The second claim follows straightforwardly from Corollary 6.7. □
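Numerically, the spectral theorem corresponds to numpy.linalg.eigh, which diagonalizes a Hermitian matrix with an orthonormal basis of eigenvectors; a brief sketch (not from the notes, arbitrary matrix):

```python
import numpy as np

# Arbitrary Hermitian matrix (M = M*).
M = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])

eigvals, U = np.linalg.eigh(M)       # real eigenvalues, orthonormal eigenvector columns
print(eigvals)                                            # real numbers (Corollary 6.7)
print(np.allclose(U.conj().T @ U, np.eye(2)))             # True: orthonormal eigenvectors
print(np.allclose(U.conj().T @ M @ U, np.diag(eigvals)))  # True: diagonal in that basis
```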

Let us now introduce a particular class of self-adjoint operators.

A self-adjoint operator A : V → V is called positive if h(Av, v) is positive for any v ≠ 0.

Clearly, the identity operator I is a positive self-adjoint operator w.r.t. any Hermitian product.

Theorem 6.10. Let A : V → V be a self-adjoint operator.
1) A is positive if and only if all its eigenvalues are positive.
2) If A is positive, then it is invertible; its inverse is a positive self-adjoint operator.
3) If A is positive, then there exists a uniquely determined positive self-adjoint operator R : V → V such that R² = A. The operator R is called the square root of A. Moreover, one has that RA = AR.
4) If A is positive and if B is any positive self-adjoint operator, then A + B is a positive self-adjoint operator; if moreover A and B commute, then AB is a positive self-adjoint operator as well.

Proof. All claims can be easily proved once an orthogonal basis {e_1, ..., e_n} of eigenvectors of A has been fixed (this is possible in view of Theorem 6.9).
1) Let Ae_i = λ_i e_i. We have

    h(Ae_i, e_i) = λ_i h(e_i, e_i) ,

where h(e_i, e_i) > 0. If A is positive, then one has λ_i > 0 for all i = 1, ..., n. Conversely, if the eigenvalues λ_1, ..., λ_n are positive, then, for any non-zero v = Σ_{i=1}^{n} v_i e_i ∈ V, we get:

    h(Av, v) = h(A(Σ_i v_i e_i), Σ_j v_j e_j) = Σ_{i,j} v_i v̄_j λ_i h(e_i, e_j) = Σ_i λ_i |v_i|² h(e_i, e_i) > 0 .

2) It follows from 1) that det A ≠ 0, so that A is invertible. The inverse A^{-1} of A is readily characterized. Indeed, if we define the operator B : V → V by letting Be_i = (1/λ_i) e_i, then AB = BA = I. Hence, B = A^{-1}. By construction it is clear that A^{-1} is a positive self-adjoint operator.
3) Define the operator R : V → V by letting Re_i = √λ_i e_i; then R² = A. Moreover, R is a positive self-adjoint operator and commutes with A.
4) □

A self-adjoint operator A : V → V is said to be positive semidefinite if h(Av, v) ≥ 0 for any v ∈ V. By proceeding in an analogous way as in the proof of Theorem 6.10, it is easy to show that:
1) a self-adjoint operator is positive semidefinite if and only if all its eigenvalues are ≥ 0;
2) every positive semidefinite self-adjoint operator has a uniquely determined square root, which is a positive semidefinite self-adjoint operator.
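The square root in Theorem 6.10.3 can be computed exactly as in the proof, by taking square roots of the eigenvalues in an orthonormal eigenbasis; a numpy sketch (not from the notes, arbitrary positive matrix):

```python
import numpy as np

# Arbitrary positive definite Hermitian matrix.
A = np.array([[4.0, 1 + 1j],
              [1 - 1j, 3.0]])

eigvals, U = np.linalg.eigh(A)
R = U @ np.diag(np.sqrt(eigvals)) @ U.conj().T   # R e_i = sqrt(lambda_i) e_i
print(np.allclose(R @ R, A))                     # True: R^2 = A
print(np.allclose(R, R.conj().T))                # True: R is Hermitian
print(np.linalg.eigvalsh(R) > 0)                 # all True: R is positive
```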

Let M be an n × n Hermitian matrix. It follows from Theorem 6.9 that there exists an invertible complex matrix N such that

    N^{-1} M N = diag(λ_1, ..., λ_n) ,

where λ_1, ..., λ_n are the eigenvalues of M. If all the eigenvalues of a Hermitian matrix M are > 0 (resp. ≥ 0), then we say that M is positive (resp. positive semidefinite). Theorem 6.10 and the previous remark imply that every positive (resp. positive semidefinite) Hermitian matrix M admits a square root; namely, there exists a uniquely determined positive (resp. positive semidefinite) Hermitian matrix R such that R² = M.

7 Unitary operators and polar decomposition

As may be expected, a special role in the theory of Hermitian vector spaces is played by those operators that preserve the Hermitian product.

Let (V, h) be a Hermitian vector space. An operator U : V → V is called unitary (w.r.t. h) if h(Uv, Uw) = h(v, w) for all vectors v, w ∈ V.

Theorem 7.1. The following properties of an operator U : V → V are equivalent:
1) U is unitary;
2) UU* = U*U = I;
3) U is norm preserving, i.e., ‖Uv‖ = ‖v‖ for all v ∈ V;
4) U maps unit norm vectors to unit norm vectors, i.e., ‖Uv‖ = 1 for all v ∈ V such that ‖v‖ = 1.

Proof. The proofs of the logical equivalences 1) ⇔ 2) and 3) ⇔ 4) are left to the reader (!). The implication 1) ⇒ 3) is obvious. It remains only to show the implication 3) ⇒ 1). The following identity can be easily checked by direct computation ( ):

    ‖u_1 + u_2‖² − ‖u_1 − u_2‖² + i ‖u_1 + i u_2‖² − i ‖u_1 − i u_2‖² = 4 h(u_1, u_2)    for all u_1, u_2 ∈ V .

Suppose now that U is norm preserving and apply the previous identity to u_1 = Uv, u_2 = Uw. We get

    4 h(Uv, Uw) = ‖U(v + w)‖² − ‖U(v − w)‖² + i ‖U(v + i w)‖² − i ‖U(v − i w)‖² =
                = ‖v + w‖² − ‖v − w‖² + i ‖v + i w‖² − i ‖v − i w‖² =
                = 4 h(v, w) .

Therefore, U is unitary, as was to be proved. □

The condition UU* = U*U = I shows that every unitary operator U is invertible; its inverse is its adjoint U*, which is a unitary operator as well. The composition (product) U_1 U_2 of two unitary operators U_1, U_2 is unitary, because one has

    (U_1 U_2)(U_1 U_2)* = U_1 U_2 U_2* U_1* = I .

So, unitary operators on a Hermitian space (V, h(·, ·)) form a group, which we shall denote by U(V, h) (or simply by U(V) whenever there is no risk of ambiguity).

Let {e_1, ..., e_n} be an orthonormal basis for the Hermitian vector space (V, h). Let U : V → V be a unitary operator. If we let f_i = Ue_i for all i = 1, ..., n, then it turns out (by 4) of Theorem 7.1) that the set of vectors {f_1, ..., f_n} is an orthonormal basis for V. Conversely, if {f_1, ..., f_n} is any orthonormal basis for V, there exists a unitary operator U such that f_i = Ue_i for all i = 1, ..., n. Therefore, the group U(V) can be thought of as acting on the space of orthonormal bases for V; the action is free and transitive (so that there is a one-to-one correspondence between U(V) and the space of orthonormal bases).

Unitary operators on C^n equipped with its standard Hermitian product are associated with n × n complex matrices U satisfying the relation UU* = U*U = I. The group U(n) consisting of those matrices is the unitary group of order n. Since det(UU*) = 1 and det U* is the complex conjugate of det U, we deduce that the determinant of a unitary matrix is a unit complex number:

    |det U| = 1    for all U ∈ U(n) .

When n = 1, the group U(1) coincides with the circle group of unit complex numbers. The determinant map det : U(n) → U(1) is a group homomorphism (!); its kernel is the special unitary group SU(n) ⊂ U(n) of unitary matrices having determinant equal to 1.
In view of the previous remark, there is a one-to-one correspondence between U(n) and the space of orthonormal bases of C^n (w.r.t. the standard Hermitian product). Actually, the columns of every unitary matrix give the components of an orthonormal basis with respect to the canonical basis {e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), ..., e_n = (0, 0, ..., 1)}, and every orthonormal basis is obtained in this way.

Let M be a Hermitian matrix. Then there exists a unitary matrix U such that U*MU is a diagonal matrix (whose diagonal entries are, of course, the eigenvalues of M).

It is a familiar fact that any complex number z ≠ 0 can be uniquely expressed as the product of a positive real number and a unit complex number: z = ρ e^{iθ}, with ρ² = z z̄. This result can be generalized to invertible operators on a Hermitian vector space of arbitrary dimension.

Theorem 7.2 (Polar decomposition). Let (V, h) be a Hermitian vector space. Any invertible operator A : V → V can be uniquely factored as

    A = R U ,

where R is a positive self-adjoint operator and U is a unitary operator.

Proof. The operator AA* is self-adjoint. Moreover, it is positive, because A and, therefore, A* are invertible. Indeed, for any v ≠ 0, one has A*v ≠ 0, so that

    h(AA*v, v) = h(A*v, A*v) > 0 .

Let R be the square root of AA*; to make it explicit, R is the unique positive self-adjoint operator such that R² = AA*. Let U = R^{-1}A. One has that U* = A*(R^{-1})* = A*R^{-1} (this follows from the fact that R^{-1} is self-adjoint). Hence, we get

    UU* = R^{-1} A A* R^{-1} = R^{-1} R² R^{-1} = I ,

so that U is unitary. Suppose there is a different factorization A = R̂ Û with R̂ positive self-adjoint and Û unitary. Then

    R² = AA* = R̂ Û Û* R̂ = R̂² ,

so that, by the uniqueness of the square root, R̂ = R, and hence Û = R^{-1}A = U. □

Notice that, if A = RU as in the above Theorem, then we have det A = det R · det U, where det R is a positive real number and det U is a unit complex number.

S Even in the case of a positive semidefinite operator A : V → V it is possible to get a polar decomposition A = RU. Here R is the uniquely determined positive semidefinite self-adjoint operator such that R² = AA*; indeed, AA* is a positive semidefinite self-adjoint operator, so that it admits a unique square root with the same properties. It is not difficult to show that Ker R = Ker A* = Ker A. On the contrary, the unitary operator U is not uniquely determined. On the subspace (Ker A)⊥ we define the unitary operator U′ as U′ = R^{-1}A; on Ker A we arbitrarily take any unitary operator U″ : Ker A → Ker A. Finally, on Ker A ⊕ (Ker A)⊥ = V we let U = U′ ⊕ U″.

In the proof of Theorem 7.2, instead of the square root of AA*, we may consider the square root of A*A. Let R̃ be the positive self-adjoint operator such that R̃² = A*A. The operator Ũ = A R̃^{-1} is unitary. Thus, any invertible operator A can also be factored as

    A = Ũ R̃ ,

where R̃ is a positive self-adjoint operator and Ũ is a unitary operator.

Theorem 7.2 is equivalent to the fact that every invertible complex matrix A can be factored as A = RU, where R is a positive definite Hermitian matrix and U is a unitary matrix. We have seen that it is always possible to find a unitary matrix Z such that Z*RZ = D is a diagonal matrix. Letting S = Z*U, we get

    A = Z D S ,

where Z and S are unitary matrices and D is diagonal. The diagonal entries of D are the square roots of the eigenvalues of AA* (so, they are positive real numbers): they are called the singular values of A. Accordingly, the decomposition A = Z D S is called the singular value decomposition of A.
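Both decompositions are easy to obtain numerically from the SVD; a numpy sketch (not from the notes, arbitrary invertible matrix), using the fact that A = (Z D Z*)(Z S) gives the polar factors R = Z D Z* and U = Z S:

```python
import numpy as np

# Arbitrary invertible complex matrix.
A = np.array([[1 + 1j, 2],
              [0, 3 - 1j]], dtype=complex)

# Singular value decomposition A = Z D S, with Z, S unitary and D diagonal >= 0.
Z, d, S = np.linalg.svd(A)
D = np.diag(d)
print(np.allclose(Z @ D @ S, A))                  # True

# Polar decomposition A = R U (Theorem 7.2): R = Z D Z* is positive Hermitian,
# U = Z S is unitary.
R = Z @ D @ Z.conj().T
U = Z @ S
print(np.allclose(R @ U, A))                      # True
print(np.allclose(R, R.conj().T), np.all(d > 0))  # R Hermitian, singular values > 0
print(np.allclose(U @ U.conj().T, np.eye(2)))     # U unitary
```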

8 Euclidean vector spaces and symmetric operators

In this Section we shall prove the analogous versions of the spectral theorem and of the polar decomposition theorem in the context of real vector spaces. Even though not strictly needed for this purpose, we start by studying a generalization of the notion of Euclidean product.

Let V be a real vector space of dimension n > 0. A pseudo-Euclidean (inner) product on V is a bilinear map g : V × V → R such that
1) it is symmetric, i.e., g(v, w) = g(w, v) for all v, w ∈ V;
2) it is nondegenerate, i.e., if for some v ∈ V one has g(v, w) = 0 for all w ∈ V, then v = 0.

There are at least two fundamental examples.
The first is, of course, the standard Euclidean product (·, ·)_Eucl on R^n:

    (x, y)_Eucl = Σ_{k=1}^{n} x_k y_k    for x = (x_1, ..., x_n), y = (y_1, ..., y_n) ∈ R^n .

The second is the Minkowski pseudo-Euclidean product (·, ·)_Mink on R^n:

    (x, y)_Mink = Σ_{k=1}^{n−1} x_k y_k − x_n y_n    for x = (x_1, ..., x_n), y = (y_1, ..., y_n) ∈ R^n .

Two vectors v, w in a pseudo-Euclidean vector space (V, g) are called orthogonal if g(v, w) = 0. However, unless g is positive or negative definite, it may happen that a non-zero vector v is orthogonal to itself. We say that v is a null vector (w.r.t. g) if g(v, v) = 0.

Let W be a subspace of V. Its orthogonal space (w.r.t. g) is by definition the subspace

    W⊥ = {v ∈ V | g(v, w) = 0 for all w ∈ W} .

Lemma 8.1. Let W be a subspace of V. If W does not contain any non-zero null vector, then W ∩ W⊥ = {0}; in particular, in that case one has W ⊕ W⊥ = V.

Proof.

Lemma 8.2. Let W be a subspace of (V, g) such that every w ∈ W is a null vector. Then g(w_1, w_2) = 0 for all w_1, w_2 ∈ W. In particular, there is at least one vector in V which is not a null vector.

Proof. We have the following identity:

    g(w_1, w_2) = (1/2) [ g(w_1 + w_2, w_1 + w_2) − g(w_1, w_1) − g(w_2, w_2) ] = 0 .

As for the second statement, it suffices to observe that, if every vector v ∈ V were a null vector, then g would be zero, contradicting nondegeneracy. □

When g is definite positive, it is well known that V admits orthogonal bases , i.e.,
bases {e1 , . . . , en } such that g(ei , e j ) = 0 for all i , j (this can be proved in the same way
as we did for Theorem 6.3). The same result holds true also in the general case.

Theorem 8.3. Any pseudo-Euclidean vector space (V, g) has an orthogonal basis.

Proof. Instead of making use of a suitable modification of the Gram-Schmidt process (as we could), we proceed by induction on the dimension n of V. The case n = 1 is trivial. Now, suppose that the statement is true for n − 1, and prove it for n. By Lemma 8.2 there exists a vector e_1 ∈ V that is not a null vector. Let V_1 be the subspace generated by e_1. Lemma 8.1 implies that V_1 ⊕ V_1^⊥ = V, so that dim V_1^⊥ = n − 1. Moreover, the restriction of g to V_1^⊥ is again nondegenerate: a vector of V_1^⊥ orthogonal to all of V_1^⊥ is also orthogonal to e_1, hence to the whole of V, and so it is zero. By the inductive hypothesis V_1^⊥ has an orthogonal basis {e_2, ..., e_n}. Since, by construction, one has g(e_1, e_j) = 0 for all j = 2, ..., n, the basis {e_1, ..., e_n} is an orthogonal basis for V w.r.t. g. N
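The inductive argument above is effectively an algorithm. The following Python sketch carries it out numerically (everything here — the function name, the tolerance, the encoding of the form by a Gram matrix G with g(x, y) = x^T G y — is an ad hoc illustration, under the assumption that G is symmetric and nondegenerate):

    import numpy as np

    def pseudo_orthogonalize(basis, G, tol=1e-12):
        """Orthogonal basis w.r.t. g(x, y) = x^T G y (G symmetric, nondegenerate)."""
        g = lambda x, y: float(x @ G @ y)
        vecs = [np.asarray(v, dtype=float) for v in basis]
        ortho = []
        while vecs:
            # Step 1: pick a non-null vector e in the span of the remaining vectors.
            idx = next((i for i, v in enumerate(vecs) if abs(g(v, v)) > tol), None)
            if idx is None:
                # every v_i is null; by the identity in the proof of Lemma 8.2
                # some g(v_i, v_j) is non-zero, and then v_i + v_j is not null
                i, j = next((i, j) for i in range(len(vecs))
                                   for j in range(i + 1, len(vecs))
                                   if abs(g(vecs[i], vecs[j])) > tol)
                vecs[i] = vecs[i] + vecs[j]
                idx = i
            e = vecs.pop(idx)
            ortho.append(e)
            # Step 2: project the remaining vectors onto the g-orthogonal complement of e.
            vecs = [v - (g(v, e) / g(e, e)) * e for v in vecs]
        return ortho

    # Example: the hyperbolic plane form, for which both standard basis vectors are null.
    G = np.array([[0., 1.], [1., 0.]])
    e1, e2 = pseudo_orthogonalize([np.array([1., 0.]), np.array([0., 1.])], G)
    print(e1, e2, float(e1 @ G @ e2))   # the last value is g(e1, e2) = 0.0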

No element e_i of an orthogonal basis {e_1, ..., e_n} can be a null vector: if this were the case, one would have g(e_i, v) = 0 for all v ∈ V. But this is absurd, since g is nondegenerate. So, given any orthogonal basis {e_1, ..., e_n} we can produce an orthonormal basis {f_1, ..., f_n} by letting

f_i = e_i / √|g(e_i, e_i)|   for any i = 1, ..., n.

Hence, the matrix representing the pseudo-Euclidean product g w.r.t. the basis {f_1, ..., f_n}, up to a rearrangement of the elements of {f_1, ..., f_n}, is a diagonal matrix of the form

I_{p,q} = ( I_p    0   )
          ( 0    −I_q )

where p + q = n and I_r, r = p, q, is the r × r identity matrix. The integers p and q do not depend on the choice of the basis {f_1, ..., f_n}. In fact, p is the maximal dimension of a subspace on which g is positive definite (); similarly, q is the maximal dimension of a subspace on which g is negative definite. The pair (p, q) is called the signature of the pseudo-Euclidean product g.
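By Sylvester's law of inertia, the signature can also be read off from the signs of the eigenvalues of the Gram matrix of g in any basis; the following quick check uses an arbitrarily chosen Gram matrix:

    import numpy as np

    G = np.array([[0., 1., 0.],
                  [1., 0., 0.],
                  [0., 0., 2.]])        # symmetric and nondegenerate
    eig = np.linalg.eigvalsh(G)
    p, q = int(np.sum(eig > 0)), int(np.sum(eig < 0))
    print(p, q)                         # 2 1: this form has signature (2, 1)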

 Let g_{p,q} be a pseudo-Euclidean product of signature (p, q) on the vector space V, whose dimension is n = p + q > 0. A g_{p,q}-isometry is an operator Φ : V → V such that

g_{p,q}(Φv, Φw) = g_{p,q}(v, w)   for all v, w ∈ V .

Since g_{p,q} is nondegenerate, any g_{p,q}-isometry is invertible (!). Moreover, the product (composition) of g_{p,q}-isometries is again a g_{p,q}-isometry. The group of g_{p,q}-isometries is denoted by O(V, g_{p,q}). When q = 0, one uses the simpler notation g instead of g_{p,0}; accordingly, the group of g-isometries is denoted by O(V, g) and is called the orthogonal group of (V, g). By fixing a suitably arranged orthonormal basis {f_1, ..., f_n} for (V, g_{p,q}), it is easy to see that the group O(V, g_{p,q}) is isomorphic to the matrix group

O(p, q) = { M ∈ GL(p + q; R) | M^T I_{p,q} M = I_{p,q} } .



In the case q = 1, one gets the (generalized) Lorentz groups O(p, 1). When q = 0, we recover the usual definition of the orthogonal group:

O(n) = { M ∈ GL(n; R) | M^T I_n M = I_n } .

It is not difficult to show that there are group isomorphisms O(p, q) ≅ O(q, p) for any p, q; in particular, O(n) ≅ O(0, n).
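As a small numerical illustration (the matrix below is an ad hoc example), a hyperbolic rotation of the Minkowski plane belongs to O(1, 1):

    import numpy as np

    t = 0.7                                    # an arbitrary "rapidity"
    L = np.array([[np.cosh(t), np.sinh(t)],
                  [np.sinh(t), np.cosh(t)]])
    I_11 = np.diag([1.0, -1.0])                # the matrix I_{1,1} introduced above
    print(np.allclose(L.T @ I_11 @ L, I_11))   # True: L^T I_{1,1} L = I_{1,1}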

From now on we assume that (V, g) is a Euclidean vector space, namely, that g is positive definite.

 Let A : V → V be an operator. The Euclidean adjoint of A is the operator A^T such that

g(Av, w) = g(v, A^T w)   for all v, w ∈ V .

An operator A : V → V is called symmetric if A = A^T.

 Let A : V → V be an operator. Let M(A, E) be the matrix representing A w.r.t. an orthonormal basis E = {e_1, ..., e_n}. Then the matrix representing A^T is M(A, E)^T. The operator A is symmetric if and only if the matrix M(A, E) is symmetric. Note that Q is a g-isometry, i.e., Q ∈ O(V, g), if and only if Q Q^T = I (!). 
In order to apply to the Euclidean framework the results of the previous two sections, we need to find a way to produce a Hermitian vector space out of a Euclidean vector space. Allowing ourselves to use only elementary concepts, we can do that in the following manner. On the direct sum V ⊕ V one introduces the complex scalar multiplication defined by the rule

(a + i b)(v, w) = (av − bw, bv + aw)   for z = a + i b ∈ C and (v, w) ∈ V ⊕ V

(notice that, when b = 0, this is the ordinary real scalar multiplication). Let V^C be the resulting complex vector space. To make notation less cumbersome, the pair (v, w) will be denoted as v + i w. So,

(a + i b)(v + i w) = (av − bw) + i (bv + aw) ,

as required. Any basis for V is a basis for V^C over the complex numbers: indeed, if {e_1, ..., e_n} is a basis for V, one has v + i w = Σ_{j=1}^{n} (v_j + i w_j) e_j for any vector v + i w ∈ V^C. The converse is not true: there are bases for V^C that do not stem from bases for V (the reason for that is that GL(n; C) is much bigger than GL(n; R)).

Now, let g be a Euclidean product on V. We define a Hermitian product h on V^C by the following assignment: for all v + i w, x + i y ∈ V^C,

h(v + i w, x + i y) = g(v, x) + g(w, y) + i (g(w, x) − g(v, y)) .

It can be readily checked (!) that h(·, x + i y) : V^C → C is C-linear for all x + i y and that h(v + i w, x + i y) is the complex conjugate of h(x + i y, v + i w) for all x + i y, v + i w.

Moreover, for all v + i w ≠ 0, one has

h(v + i w, v + i w) = g(v, v) + g(w, w) > 0 .

Note that any Euclidean orthogonal basis {e_1, ..., e_n} for (V, g) is a Hermitian orthogonal basis for (V^C, h).
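For instance, when (V, g) = (R^n, (·, ·)_Eucl), the space V^C is C^n and h is the standard Hermitian product: writing z = v + i w and u = x + i y componentwise, one finds h(z, u) = Σ_{k=1}^{n} z_k ū_k.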

S The knowledgeable reader should be aware that V^C is nothing but the complexification of V, namely, the complex vector space V ⊗_R C. However, h is not the complexification g^C of g. Indeed, by definition, g^C : V^C × V^C → C is a C-bilinear map (because g is R-bilinear). On the contrary, h is C-linear in the first argument and C-antilinear in the second argument.

Any operator A : V → V induces a complex operator Ã : V^C → V^C given by

Ã(v + i w) = Av + i Aw .

Lemma 8.4. Under the previous assumptions, if A : V → V is symmetric, then Ã : V^C → V^C is self-adjoint.

Proof. From the definition of our Hermitian product h it follows that

h(Ã(v + i w), x + i y) = g(Av, x) + g(Aw, y) + i (g(Aw, x) − g(Av, y))
                       = g(v, Ax) + g(w, Ay) + i (g(w, Ax) − g(v, Ay))
                       = h(v + i w, Ã(x + i y)) .

Therefore, Ã is self-adjoint. N

Theorem 8.5 (Euclidean spectral theorem). Let (V, g) be a Euclidean vector space. A symmetric operator A : V → V has n real eigenvalues (counted with multiplicity). Moreover, there exists a basis for V consisting of eigenvectors of A.

Proof. Let Ã : V^C → V^C be the self-adjoint operator associated with A. The spectral theorem 6.9 ensures that Ã has n real eigenvalues λ_1, ..., λ_n and that the corresponding eigenvectors {v_1, ..., v_n} are a basis for V^C. Let v_j = x_j + i y_j. The equality Ã v_j = λ_j v_j reads as follows:

Ã(x_j + i y_j) = A x_j + i A y_j = λ_j x_j + i λ_j y_j ,

so that A x_j = λ_j x_j and A y_j = λ_j y_j for all j = 1, ..., n. We conclude that A has n real eigenvalues. As for the second claim, we notice that neither {x_1, ..., x_n} nor {y_1, ..., y_n} is, in general, a basis for V (over R). However, the vectors x_1, ..., x_n, y_1, ..., y_n generate V, and each non-zero vector among them is an eigenvector of A. Indeed, any u ∈ V, thought of as a vector in V^C, can be expressed as u = Σ_{j=1}^{n} z_j (x_j + i y_j), where z_j = a_j + i b_j ∈ C. Now, the imaginary part of this sum must vanish, so that u = Σ_{j=1}^{n} (a_j x_j − b_j y_j). Thus, it is possible to extract from this set of generators a basis for V, whose elements are eigenvectors of A. N

Corollary 8.6. Under the same hypotheses as in Theorem 8.5, there is an orthonormal
basis for V consisting of eigenvectors of A.

Proof. By Theorem 8.5 A has at least one eigenvector v_1. Let V_1 be the subspace generated by v_1. Its orthogonal complement V_1^⊥ is A-invariant (cf. Lemma 6.8) and dim V_1^⊥ = dim V − 1. Hence, the statement can be easily proved by induction on dim V. N
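Numerically, this is exactly what numpy's eigh routine computes for a real symmetric matrix (the matrix below is just an illustrative choice): real eigenvalues together with an orthonormal basis of eigenvectors.

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])
    eigvals, Q = np.linalg.eigh(A)                    # the columns of Q are eigenvectors of A
    print(np.allclose(Q.T @ Q, np.eye(3)))            # True: the basis is orthonormal
    print(np.allclose(A @ Q, Q @ np.diag(eigvals)))   # True: A Q = Q D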

 We say that a symmetric operator A : V → V is positive if g(Av, v) > 0 for all v ≠ 0. By proceeding as in the proof of 3) of Theorem 6.10 one has the following result.

Lemma 8.7. Let A : V → V be a positive symmetric operator. Then A admits a uniquely determined square root, i.e., there exists a unique positive symmetric operator S such that S^2 = A.
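Concretely, if {f_1, ..., f_n} is an orthonormal basis of eigenvectors of A with (necessarily positive) eigenvalues λ_1, ..., λ_n, the operator S acting as S f_i = √λ_i f_i is such a square root; in matrix terms, if A = Q D Q^T with Q orthogonal and D = diag(λ_1, ..., λ_n), then S = Q diag(√λ_1, ..., √λ_n) Q^T is positive symmetric and satisfies S^2 = A.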

Putting everything together, we can now obtain the Euclidean analogue of the polar decomposition theorem.

Theorem 8.8 (Euclidean polar decomposition). Let (V, g) be a Euclidean vector space. Any invertible operator A : V → V can be uniquely factored as

A = S Q ,

where S is positive symmetric and Q ∈ O(V, g).

Proof. The proof is essentially the same as that of Theorem 7.2. The operator A A^T is positive symmetric; let S be its square root. Now, we define Q = S^{-1} A. We have

Q Q^T = S^{-1} A A^T S^{-1} = S^{-1} S^2 S^{-1} = I ,

so that Q ∈ O(V, g). N
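A quick numerical rendition of this proof (the invertible matrix A is an arbitrary example; the square root of A A^T is built from its spectral decomposition, as in the remark following Lemma 8.7):

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [1., 0., 1.]])                # invertible (its determinant is 7)

    w, Q0 = np.linalg.eigh(A @ A.T)             # A A^T is positive symmetric
    S = Q0 @ np.diag(np.sqrt(w)) @ Q0.T         # its positive symmetric square root
    Q = np.linalg.inv(S) @ A                    # Q = S^{-1} A

    print(np.allclose(A, S @ Q))                # True: A = S Q
    print(np.allclose(Q @ Q.T, np.eye(3)))      # True: Q is orthogonal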

 We can rephrase the last three results in the setting of (real) matrices.
Theorem 8.5 is equivalent to the fact that any symmetric matrix can be diagonalized by an invertible real matrix.
Corollary 8.6 amounts to saying that every symmetric matrix can be diagonalized by an orthogonal matrix. One more remark is in order here. After fixing a basis for V, a Euclidean product g is uniquely determined by a positive definite symmetric matrix G. So, Corollary 8.6 entails that any two matrices A and G, with A symmetric and G positive definite symmetric, can be simultaneously diagonalized, in the sense of congruence: there is a single invertible matrix P such that P^T G P = I and P^T A P is diagonal.
Needless to say, Theorem 8.8 is equivalent to the fact that any invertible matrix can be factored as the product of a positive definite symmetric matrix and an orthogonal matrix.
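A minimal sketch of how such a simultaneous diagonalization can be computed (the matrices A and G below, and the use of a Cholesky factorization, are illustrative choices):

    import numpy as np

    A = np.array([[0., 1.], [1., 0.]])           # symmetric
    G = np.array([[2., 1.], [1., 2.]])           # positive definite symmetric

    L = np.linalg.cholesky(G)                    # G = L L^T
    Linv = np.linalg.inv(L)
    d, U = np.linalg.eigh(Linv @ A @ Linv.T)     # the symmetric matrix L^{-1} A L^{-T}
    P = Linv.T @ U                               # the congruence sought

    print(np.allclose(P.T @ G @ P, np.eye(2)))   # True
    print(np.allclose(P.T @ A @ P, np.diag(d)))  # True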

