
This engaging textbook for advanced undergraduate students and beginning graduates

covers the core subjects in linear algebra. The author motivates the concepts by

drawing clear links to applications and other important areas.

The book places particular emphasis on integrating ideas from analysis wherever

appropriate and features many novelties in its presentation. For example, the notion of

determinant is shown to appear from calculating the index of a vector field which leads

to a self-contained proof of the Fundamental Theorem of Algebra; the

Cayley–Hamilton theorem is established by recognizing the fact that the set of

complex matrices of distinct eigenvalues is dense; the existence of a real eigenvalue of

a self-adjoint map is deduced by the method of calculus; the construction of the Jordan

decomposition is seen to boil down to understanding nilpotent maps of degree two;

and a lucid and elementary introduction to quantum mechanics based on linear algebra

is given.

The material is supplemented by a rich collection of over 350 mostly proof-oriented

exercises, suitable for readers from a wide variety of backgrounds. Selected solutions

are provided at the back of the book, making it ideal for self-study as well as for use as

a course text.

A Concise Text on

Advanced Linear Algebra

YISONG YANG

Polytechnic School of Engineering, New York University

Cambridge University Press is part of the University of Cambridge.

It furthers the University's mission by disseminating knowledge in the pursuit of

education, learning and research at the highest international levels of excellence.

www.cambridge.org

Information on this title: www.cambridge.org/9781107087514

© Yisong Yang 2015

This publication is in copyright. Subject to statutory exception

and to the provisions of relevant collective licensing agreements,

no reproduction of any part may take place without the written

permission of Cambridge University Press.

First published 2015

Printed in the United Kingdom by Clays, St Ives plc

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data

Yang, Yisong.

A concise text on advanced linear algebra / Yisong Yang, Polytechnic School

of Engineering, New York University.

pages cm

Includes bibliographical references and index.

ISBN 978-1-107-08751-4 (Hardback) ISBN 978-1-107-45681-5 (Paperback)

1. Algebras, Linear–Textbooks. 2. Algebras, Linear–Study and teaching (Higher).

3. Algebras, Linear–Study and teaching (Graduate). I. Title.

II. Title: Advanced linear algebra.

QA184.2.Y36 2015

512′.5–dc23 2014028951

ISBN 978-1-107-08751-4 Hardback

ISBN 978-1-107-45681-5 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy

of URLs for external or third-party internet websites referred to in this publication,

and does not guarantee that any content on such websites is, or will remain,

accurate or appropriate.

For Sheng,

Peter, Anna, and Julia

Contents

Preface
Notation and convention

1 Vector spaces
1.1 Vector spaces
1.2 Subspaces, span, and linear dependence
1.3 Bases, dimensionality, and coordinates
1.4 Dual spaces
1.5 Constructions of vector spaces
1.6 Quotient spaces
1.7 Normed spaces

2 Linear mappings
2.1 Linear mappings
2.2 Change of basis
2.3 Adjoint mappings
2.4 Quotient mappings
2.5 Linear mappings from a vector space into itself
2.6 Norms of linear mappings

3 Determinants
3.1 Motivational examples
3.2 Definition and properties of determinants
3.3 Adjugate matrices and Cramer's rule
3.4 Characteristic polynomials and Cayley–Hamilton theorem

4 Scalar products
4.1 Scalar products and basic properties
4.2 Non-degenerate scalar products
4.3 Positive definite scalar products
4.4 Orthogonal resolutions of vectors
4.5 Orthogonal and unitary versus isometric mappings

5 Real quadratic forms and self-adjoint mappings
5.1 Bilinear and quadratic forms
5.2 Self-adjoint mappings
5.3 Positive definite quadratic forms, mappings, and matrices
5.4 Alternative characterizations of positive definite matrices
5.5 Commutativity of self-adjoint mappings
5.6 Mappings between two spaces

6 Complex quadratic forms and self-adjoint mappings
6.1 Complex sesquilinear and associated quadratic forms
6.2 Complex self-adjoint mappings
6.3 Positive definiteness
6.4 Commutative self-adjoint mappings and consequences
6.5 Mappings between two spaces via self-adjoint mappings

7 Jordan decomposition
7.1 Some useful facts about polynomials
7.2 Invariant subspaces of linear mappings
7.3 Generalized eigenspaces as invariant subspaces
7.4 Jordan decomposition theorem

8 Selected topics
8.1 Schur decomposition
8.2 Classification of skew-symmetric bilinear forms
8.3 Perron–Frobenius theorem for positive matrices
8.4 Markov matrices

9 Excursion: Quantum mechanics in a nutshell
9.1 Vectors in Cn and Dirac bracket
9.2 Quantum mechanical postulates
9.3 Non-commutativity and uncertainty principle
9.4 Heisenberg picture for quantum mechanics

Solutions to selected exercises
Bibliographic notes
References
Index

Preface

This book is written to provide core course materials for a year-long course in Linear Algebra for senior undergraduate and beginning

graduate students in mathematics, science, and engineering. Students who gain

profound understanding and grasp of the concepts and methods of this course

will acquire an essential knowledge foundation to excel in their future academic endeavors.

Throughout the book, methods and ideas of analysis are greatly emphasized

and used, along with those of algebra, wherever appropriate, and a delicate

balance is cast between abstract formulation and practical origins of various

subject matters.

The book is divided into nine chapters. The first seven chapters embody a

traditional course curriculum. An outline of the contents of these chapters is

sketched as follows.

In Chapter 1 we cover basic facts and properties of vector spaces. These

include definitions of vector spaces and subspaces, concepts of linear dependence, bases, coordinates, dimensionality, dual spaces and dual bases,

quotient spaces, normed spaces, and the equivalence of the norms of a finite-dimensional normed space.

In Chapter 2 we cover linear mappings between vector spaces. We start from

the definition of linear mappings and discuss how linear mappings may be concretely represented by matrices with respect to given bases. We then introduce

the notion of adjoint mappings and quotient mappings. Linear mappings from

a vector space into itself comprise a special but important family of mappings

and are given a separate treatment later in this chapter. Topics studied there

include invariance and reducibility, eigenvalues and eigenvectors, projections,

nilpotent mappings, and polynomials of linear mappings. We end the chapter

with a discussion of the concept of the norms of linear mappings and use it

to show that being invertible is a generic property of a linear mapping and


then to show how the exponential of a linear mapping may be constructed and

understood.

In Chapter 3 we cover determinants. As a non-traditional but highly

motivating example, we show that the calculation of the topological degree

of a differentiable map from a closed curve into the unit circle in R2 involves

computing a two-by-two determinant, and the knowledge gained allows us to

prove the Fundamental Theorem of Algebra. We then formulate the definition

of a general determinant inductively, without resorting to the notion of permutations, and establish all its properties. We end the chapter by establishing the

Cayley–Hamilton theorem. Two independent proofs of this important theorem

are given. The first proof is analytic and consists of two steps. In the first step,

we show that the theorem is valid for a matrix of distinct eigenvalues. In the

second step, we show that any matrix may be regarded as a limiting point of a

sequence of matrices of distinct eigenvalues. Hence the theorem follows again

by taking the limit. The second proof, on the other hand, is purely algebraic.

In Chapter 4 we discuss vector spaces with scalar products. We start from the

most general notion of scalar products without requiring either non-degeneracy

or positive definiteness. We then carry out detailed studies on non-degenerate

and positive definite scalar products, respectively, and elaborate on adjoint

mappings in terms of scalar products. We end the chapter with a discussion

of isometric mappings in both real and complex space settings and noting their

subtle differences.

In Chapter 5 we focus on real vector spaces with positive definite scalar

products and quadratic forms. We first establish the main spectral theorem for

self-adjoint mappings. We will not take the traditional path of first using the

Fundamental Theorem of Algebra to assert that there is an eigenvalue and then

applying the self-adjointness to show that the eigenvalue must be real. Instead

we shall formulate an optimization problem and use calculus to prove directly

that a self-adjoint mapping must have a real eigenvalue. We then present a

series of characteristic conditions for a symmetric bilinear form, a symmetric

matrix, or a self-adjoint mapping, to be positive definite. We end the chapter

by a discussion of the commutativity of self-adjoint mappings and the usefulness of self-adjoint mappings for the investigation of linear mappings between

different spaces.

In Chapter 6 we study complex vector spaces with Hermitian scalar products

and related notions. Much of the theory here is parallel to that of the real space

situation with the exception that normal mappings can only be fully understood

and appreciated within a complex space formalism.

In Chapter 7 we establish the Jordan decomposition theorem. We start with

a discussion of some basic facts regarding polynomials. We next show how


to reduce a linear mapping over its generalized eigenspaces via the Cayley–Hamilton theorem and the prime factorization of the characteristic polynomial

of the mapping. We then prove the Jordan decomposition theorem. The key

and often the most difficult step in this construction is a full understanding

of how a nilpotent mapping is reduced canonically. We approach this problem

inductively with the degree of a nilpotent mapping and show that it is crucial to

tackle a mapping of degree 2. Such a treatment eases the subtlety of the subject

considerably.

In Chapter 8 we present four selected topics that may be used as materials for some optional extra-curricular study when time and interest permit. In

the first section we present the Schur decomposition theorem, which may be

viewed as a complement to the Jordan decomposition theorem. In the second

section we give a classification of skew-symmetric bilinear forms. In the third

section we state and prove the Perron–Frobenius theorem regarding the principal eigenvalues of positive matrices. In the fourth section we establish some

basic properties of the Markov matrices.

In Chapter 9 we present yet another selected topic for the purpose of

optional extra-curricular study: a short excursion into quantum mechanics

using gadgets purely from linear algebra. Specifically we will use Cn as the

state space and Hermitian matrices as quantum mechanical observables to formulate the over-simplified quantum mechanical postulates including Bohr's

statistical interpretation of quantum mechanics and the Schrödinger equation

governing the time evolution of a state. We next establish Heisenberg's uncertainty principle. Then we prove the equivalence of the Schrödinger description

via the Schrödinger equation and the Heisenberg description via the Heisenberg equation of quantum mechanics.

Also provided in the book is a rich collection of mostly proof-oriented

exercises to supplement and consolidate the main course materials. The

diversity and elasticity of these exercises aim to satisfy the needs and interests of students from a wide variety of backgrounds.

At the end of the book, solutions to some selected exercises are presented.

These exercises and solutions provide additional illustrative examples, extend

main course materials, and render convenience for the reader to master the

subjects and methods covered in a broader range.

Finally some bibliographic notes conclude the book.

This text may be curtailed to meet the time constraint of a semester-long

course. Here is a suggested list of selected sections for such a plan: Sections 1.1–1.5, 2.1–2.3, 2.5, 3.1.2, 3.2, and 3.3 (present the concept of adjugate

matrices only), Section 3.4 (give the second proof of the Cayley–Hamilton theorem only, based on an adjugate matrix expansion), Sections 4.3, 4.4, 5.1, 5.2


(omit the analytic proof that a self-adjoint mapping must have an eigenvalue

but resort to Exercise 5.2.1 instead), Sections 5.3, 6.1, 6.2, 6.3.1, and 7.1–7.4.

Depending on the pace of lectures and time available, the instructor may

decide in the later stage of the course to what extent the topics in Sections

7.1–7.4 (the Jordan decomposition) can be presented productively.

The author would like to take this opportunity to thank Patrick Lin, Thomas

Otway, and Robert Sibner for constructive comments and suggestions, and

Roger Astley of Cambridge University Press for valuable editorial advice,

which helped improve the presentation of the book.

West Windsor, New Jersey

Yisong Yang

Notation and convention

We use N to denote the set of natural numbers,

N = {0, 1, 2, . . . },

and Z the set of all integers,

Z = {. . . , −2, −1, 0, 1, 2, . . . }.

For a complex number c = a + ib where a, b are real numbers we use

c̄ = a − ib

to denote the complex conjugate of c. We use ℜ{c} and ℑ{c} to denote the real and imaginary parts of the complex number c = a + ib. That is,

ℜ{c} = a,   ℑ{c} = b.

We use i, j, k, l, m, n to denote integer-valued indices or space dimension numbers, a, b, c scalars, u, v, w, x, y, z vectors, A, B, C, D matrices, P , R, S, T mappings, and U, V , W, X, Y, Z vector spaces, unless otherwise stated.

We use t to denote the variable in a polynomial or a function or the transpose operation on a vector or a matrix.

When X or Y is given, we use X ≡ Y to denote that Y , or X, is defined to be X, or Y , respectively.

Occasionally, we use the symbol ∀ to express 'for all'.

Let X be a set and Y, Z subsets of X. We use Y \ Z to denote the subset of elements in Y which are not in Z.

1

Vector spaces

In this chapter we study vector spaces and their basic properties and structures.

We start by stating the definition and a discussion of the examples of vector

spaces. We next introduce the notions of subspaces, linear dependence, bases,

coordinates, and dimensionality. We then consider dual spaces, direct sums,

and quotient spaces. Finally we cover normed vector spaces.

1.1 Vector spaces

A vector space is a non-empty set consisting of elements called vectors which

can be added and multiplied by some quantities called scalars. In this section,

we start with a study of vector spaces.

1.1.1 Fields

The scalars to operate on vectors in a vector space are required to form a field,

which may be denoted by F, where two operations, usually called addition,

denoted by +, and multiplication, denoted by · or simply omitted, over F are performed between scalars, such that the following axioms are satisfied.

(1) (Closure) If a, b ∈ F, then a + b ∈ F and ab ∈ F.

(2) (Commutativity) For a, b ∈ F, there hold a + b = b + a and ab = ba.

(3) (Associativity) For a, b, c ∈ F, there hold (a + b) + c = a + (b + c) and a(bc) = (ab)c.

(4) (Distributivity) For a, b, c ∈ F, there holds a(b + c) = ab + ac.

(5) (Existence of zero) There is a scalar, called zero, denoted by 0, such that a + 0 = a for any a ∈ F.

(6) (Existence of unity) There is a scalar different from zero, called one, denoted by 1, such that 1a = a for any a ∈ F.

(7) (Existence of additive inverse) For any a ∈ F, there is a scalar, denoted by −a or (−a), such that a + (−a) = 0.

(8) (Existence of multiplicative inverse) For any a ∈ F \ {0}, there is a scalar, denoted by a⁻¹, such that aa⁻¹ = 1.

It is easily seen that zero, unity, additive and multiplicative inverses are all

unique. Besides, a field consists of at least two elements.

With the usual addition and multiplication, the sets of rational numbers, real

numbers, and complex numbers, denoted by Q, R, and C, respectively, are all

fields. These fields are infinite fields. However, the set of integers, Z, is not a

field because there is a lack of multiplicative inverses for its non-unit elements.

Let p be a prime (p = 2, 3, 5, . . . ) and set pZ = {n ∈ Z | n = kp, k ∈ Z}.

Classify Z into the so-called cosets modulo pZ, that is, some non-overlapping subsets of Z represented as [i] (i ∈ Z) such that

[i] = {j ∈ Z | i − j ∈ pZ}.   (1.1.1)

It is clear that Z is divided into exactly p cosets, [0], [1], . . . , [p − 1]. Use Zp to denote the set of these cosets and pass the additive and multiplicative operations in Z over naturally to the elements in Zp so that

[i] + [j] = [i + j],   [i][j] = [ij].   (1.1.2)

It can be verified that, with these operations, Zp becomes a field with its obvious zero and unit elements, [0] and [1]. Of course, p[1] = [1] + · · · + [1] (p terms) = [p] = [0]. In fact, p is the smallest positive integer whose multiplication with unit element results in zero element. A number of such a property is called the characteristic of the field. Thus, Zp is a field of characteristic p. For Q, R, and C, since no such integer exists, we say that these fields are of characteristic 0.
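As a quick illustration of these finite fields, the following short Python sketch (added here for illustration, not part of the original text, with the prime p and helper names chosen freely) builds arithmetic in Z_5 and finds multiplicative inverses by direct search.

```python
# A minimal sketch of arithmetic in the finite field Z_p (here p = 5).
# The coset [i] is represented by the remainder i mod p.

p = 5  # any prime; primality is what guarantees multiplicative inverses

def add(i, j):
    return (i + j) % p        # [i] + [j] = [i + j]

def mul(i, j):
    return (i * j) % p        # [i][j] = [ij]

def inverse(i):
    # For [i] != [0], search for [j] with [i][j] = [1].
    for j in range(1, p):
        if mul(i, j) == 1:
            return j
    raise ValueError("no inverse for [0]")

# Characteristic: p copies of [1] add up to [0].
assert sum([1] * p) % p == 0

# Inverses of the nonzero cosets in Z_5; e.g. [4]^(-1) = [4] since 4*4 = 16 = 1 mod 5.
print({i: inverse(i) for i in range(1, p)})
```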

1.1.2 Vector spaces

Let F be a field. Consider the set of n-tuples, denoted by Fⁿ, with elements called vectors arranged in row or column forms such as

(a1 , . . . , an ) or its column-vector counterpart, where a1 , . . . , an ∈ F.   (1.1.3)

Furthermore, we can define the addition of two vectors and the scalar multiplication of a vector by a scalar following the rules

(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn ),   (1.1.4)

α(a1 , . . . , an ) = (αa1 , . . . , αan ),   where α ∈ F,   (1.1.5)

and likewise for column vectors.

The set Fⁿ, modeled over the field F and equipped with the above operations, is a prototype example of a vector space.

More generally, we say that a set U is a vector space over a field F if U is non-empty and there is an operation called addition, denoted by +, between the elements of U , called vectors, and another operation called scalar multiplication between elements in F, called scalars, and vectors, such that the following axioms hold.

(1) (Closure) For u, v ∈ U , we have u + v ∈ U . For u ∈ U and a ∈ F, we have au ∈ U .

(2) (Commutativity) For u, v ∈ U , we have u + v = v + u.

(3) (Associativity of addition) For u, v, w ∈ U , we have u + (v + w) = (u + v) + w.

(4) (Existence of zero vector) There is a vector, called zero and denoted by 0, such that u + 0 = u for any u ∈ U .

(5) (Existence of additive inverse) For any u ∈ U , there is a vector, denoted as (−u), such that u + (−u) = 0.

(6) (Associativity of scalar multiplication) For any a, b ∈ F and u ∈ U , we have a(bu) = (ab)u.

(7) (Property of unit scalar) For any u ∈ U , we have 1u = u.

(8) (Distributivity) For any a, b ∈ F and u, v ∈ U , we have (a + b)u = au + bu and a(u + v) = au + av.

As in the case of the definition of a field, we see that it readily follows from the definition that zero vector and additive inverse vectors are all unique in a vector space. Besides, any vector multiplied by zero scalar results in zero vector. That is, 0u = 0 for any u ∈ U .

Other examples of vector spaces (with obviously defined vector addition and scalar multiplication) include the following.

(1) The set of all polynomials with coefficients in F defined by

P = {a0 + a1 t + · · · + an tⁿ | a0 , a1 , . . . , an ∈ F, n ∈ N}.   (1.1.6)

(2) The set of all real-valued continuous functions over the interval [a, b] for a, b ∈ R and a < b, usually denoted by C[a, b].

(3) The set of real-valued solutions to the differential equation

an dⁿx/dtⁿ + · · · + a1 dx/dt + a0 x = 0,   a0 , a1 , . . . , an ∈ R.   (1.1.7)

(4) In addition, we can also consider the set of arrays of scalars in F consisting of m rows of vectors in Fⁿ or n columns of vectors in Fᵐ of the form

(aij ) = ( a11  a12  . . .  a1n
           a21  a22  . . .  a2n
           . . . . . . . . . . .
           am1  am2  . . .  amn ),   (1.1.8)

where aij ∈ F, i = 1, . . . , m, j = 1, . . . , n, called an m by n or m × n matrix and each aij is called an entry or component of the matrix. The set of all m × n matrices with entries in F may be denoted by F(m, n). In particular, F(m, 1) or F(1, n) is simply Fᵐ or Fⁿ. Elements in F(n, n) are also called square matrices.

1.1.3 Matrices

Here we consider some of the simplest manipulations on, and properties of,

matrices.

Let A be the matrix given in (1.1.8). Then Aᵗ, called the transpose of A, is defined to be

Aᵗ = ( a11  a21  . . .  am1
       a12  a22  . . .  am2
       . . . . . . . . . . .
       a1n  a2n  . . .  amn ).   (1.1.9)

Of course, Aᵗ ∈ F(n, m). Simply put, Aᵗ is a matrix obtained from taking the row (column) vectors of A to be its corresponding column (row) vectors.

For A ∈ F(n, n), we say that A is symmetric if A = Aᵗ, or skew-symmetric or anti-symmetric if Aᵗ = −A. The sets of symmetric and anti-symmetric matrices are denoted by FS (n, n) and FA (n, n), respectively.

It is clear that (Aᵗ)ᵗ = A.

It will now be useful to introduce the notion of dot product. For any two vectors u = (a1 , . . . , an ) and v = (b1 , . . . , bn ) in Fⁿ, their dot product u · v ∈ F is defined to be

u · v = a1 b1 + · · · + an bn .   (1.1.10)

The dot product satisfies the following basic properties.

(1) (Commutativity) u · v = v · u for any u, v ∈ Fⁿ.

(2) (Associativity and homogeneity) u · (av + bw) = a(u · v) + b(u · w) for any u, v, w ∈ Fⁿ and a, b ∈ F.

With the notion of dot product, we can define the product of two matrices A ∈ F(m, k) and B ∈ F(k, n) by

C = (cij ) = AB,   i = 1, . . . , m,   j = 1, . . . , n,   (1.1.11)

where cij is the dot product of the ith row of A and the j th column of B. Thus AB ∈ F(m, n).

Alternatively, if we use u, v to denote column vectors in Fⁿ, then

u · v = uᵗ v.   (1.1.12)

That is, the dot product of u and v may be viewed as a matrix product of the 1 × n matrix uᵗ and the n × 1 matrix v as well.

Matrix product (or matrix multiplication) enjoys the following properties.

(1) (Associativity of scalar multiplication) a(AB) = (aA)B = A(aB) for any a ∈ F and any A ∈ F(m, k), B ∈ F(k, n).

(2) (Distributivity) A(B + C) = AB + AC for any A ∈ F(m, k) and B, C ∈ F(k, n); (A + B)C = AC + BC for any A, B ∈ F(m, k) and C ∈ F(k, n).

(3) (Associativity) A(BC) = (AB)C for any A ∈ F(m, k), B ∈ F(k, l), C ∈ F(l, n).

Alternatively, if we express A ∈ F(m, k) and B ∈ F(k, n) as made of m row vectors and n column vectors, respectively, rewritten as

A = ( A1
      ...
      Am ),   B = (B1 , . . . , Bn ),   (1.1.13)

then

AB = ( A1
       ...
       Am ) (B1 , . . . , Bn ) = ( A1 B1   A1 B2   . . .   A1 Bn
                                   A2 B1   A2 B2   . . .   A2 Bn
                                   . . . . . . . . . . . . . . .
                                   Am B1   Am B2   . . .   Am Bn ),   (1.1.14)

which suggests that matrix multiplication may be carried out with legitimate multiplications executed over appropriate matrix blocks.

If A ∈ F(m, k) and B ∈ F(k, n), then Aᵗ ∈ F(k, m) and Bᵗ ∈ F(n, k) so that BᵗAᵗ ∈ F(n, m). Regarding how AB and BᵗAᵗ are related, here is the conclusion.

Theorem 1.1 For A ∈ F(m, k) and B ∈ F(k, n), there holds

(AB)ᵗ = BᵗAᵗ.   (1.1.15)
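As a quick numerical sanity check (an illustrative sketch added here, not part of the original text), one may verify identity (1.1.15) on randomly chosen real matrices with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A is m x k and B is k x n, so AB is m x n and (AB)^t is n x m.
m, k, n = 3, 4, 2
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))

lhs = (A @ B).T          # (AB)^t
rhs = B.T @ A.T          # B^t A^t

# The two sides agree up to floating-point rounding.
print(np.allclose(lhs, rhs))   # True
```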

Other matrices in F(n, n) having interesting properties include the following.

(1) Diagonal matrices are of the form A = (aij ) with aij = 0 whenever i ≠ j . The set of diagonal matrices is denoted as FD (n, n).

(2) Lower triangular matrices are of the form A = (aij ) with aij = 0 whenever j > i. The set of lower triangular matrices is denoted as FL (n, n).

(3) Upper triangular matrices are of the form A = (aij ) with aij = 0 whenever i > j . The set of upper triangular matrices is denoted as FU (n, n).

There is a special element in F(n, n), called the identity matrix, or unit matrix, and denoted by In , or simply I , which is a diagonal matrix whose diagonal entries are all 1 (unit scalar) and off-diagonal entries are all 0. For any A ∈ F(n, n), we have AI = I A = A.

We say that A ∈ F(n, n) is invertible if there is some B ∈ F(n, n) such that

AB = BA = I.   (1.1.16)

In this situation, B is unique (cf. Exercise 1.1.7) and called the inverse of A and denoted by A⁻¹.

If A, B ∈ F(n, n) are such that AB = I , then we say that A is a left inverse of B and B a right inverse of A. It can be shown that a left or right inverse is simply the inverse. In other words, if A is a left inverse of B, then both A and B are invertible and the inverses of each other.

If A ∈ R(n, n) enjoys the property AAᵗ = AᵗA = I , then A is called an orthogonal matrix. For A = (aij ) ∈ C(m, n), we adopt the notation Ā = (āij ) for taking the complex conjugate of A and use A† to denote taking the complex conjugate of the transpose of A, A† = Āᵗ, which is also commonly referred to as taking the Hermitian conjugate of A. If A ∈ C(n, n), we say that A is Hermitian symmetric, or simply Hermitian, if A† = A, and skew-Hermitian or anti-Hermitian, if A† = −A. If A ∈ C(n, n) enjoys the property AA† = A†A = I , then A is called a unitary matrix. We will see the importance of these notions later.

Exercises

1.1.1 Show that it follows from the definition of a field that zero, unit, additive,

and multiplicative inverse scalars are all unique.

1.1.2 Let p ∈ N be a prime and [n] ∈ Zp . Find −[n] and prove the existence of [n]⁻¹ when [n] ≠ [0]. In Z5 , find −[4] and [4]⁻¹.

1.1.3 Show that it follows from the definition of a vector space that both zero

and additive inverse vectors are unique.

1.1.4 Prove the associativity of matrix multiplication by showing that

A(BC) = (AB)C for any A ∈ F(m, k), B ∈ F(k, l), C ∈ F(l, n).

1.1.5 Prove Theorem 1.1.

1.1.6 Let A ∈ F(n, n) (n ≥ 2) and rewrite A as

A = ( A1  A2
      A3  A4 ),   (1.1.17)

where A1 ∈ F(k, k), A2 ∈ F(k, l), A3 ∈ F(l, k), A4 ∈ F(l, l), k, l ≥ 1, k + l = n. Show that

Aᵗ = ( A1ᵗ  A3ᵗ
       A2ᵗ  A4ᵗ ).   (1.1.18)


1.1.7 Prove that the inverse of an invertible matrix is unique by showing the

fact that if A, B, C F(n, n) satisfy AB = I and CA = I then B = C.

1.1.8 Let A ∈ C(n, n). Show that A is Hermitian if and only if iA is anti-Hermitian.

1.2 Subspaces, span, and linear dependence

Let U be a vector space over a field F and V ⊂ U a non-empty subset of U .

We say that V is a subspace of U if V is a vector space over F with the

inherited addition and scalar multiplication from U . It is worth noting that,

when checking whether a subset V of a vector space U becomes a subspace, one only needs to verify the closure axiom (1) in the definition of a

vector space since the rest of the axioms follow automatically as a consequence of (1).

The two trivial subspaces of U are those consisting only of zero vector, {0},

and U itself. A nontrivial subspace is also called a proper subspace.

Consider the subset Pn (n ∈ N) of P defined by

Pn = {a0 + a1 t + · · · + an tⁿ | a0 , a1 , . . . , an ∈ F}.   (1.2.1)

Then Pn is a subspace of P.

Consider the set Sa of all vectors (x1 , . . . , xn ) in Fⁿ satisfying the equation

x1 + · · · + xn = a,   (1.2.2)

where a ∈ F is a given scalar. It is readily checked that Sa is a subspace of Fⁿ if and only if a = 0.

Let u1 , . . . , uk be vectors in U . The linear span of {u1 , . . . , uk }, denoted by Span{u1 , . . . , uk }, is the subspace of U defined by

Span{u1 , . . . , uk } = {u ∈ U | u = a1 u1 + · · · + ak uk , a1 , . . . , ak ∈ F}.   (1.2.3)

Thus, if u ∈ Span{u1 , . . . , uk }, then there are a1 , . . . , ak ∈ F such that

u = a1 u1 + · · · + ak uk .   (1.2.4)

In this situation, we say that u is linearly dependent on the vectors u1 , . . . , uk . Therefore, zero vector 0 is linearly dependent on any finite set of vectors.

If U = Span{u1 , . . . , uk }, we also say that U is generated by the vectors u1 , . . . , uk .

For Pn defined in (1.2.1), we have Pn = Span{1, t, . . . , tⁿ}. Thus Pn is generated by 1, t, . . . , tⁿ. Naturally, for two elements p = a0 + a1 t + · · · + an tⁿ and q = b0 + b1 t + · · · + bn tⁿ in Pn , we identify p and q if and only if all the coefficients of p and q of like powers of t coincide in F, or, ai = bi for all i = 0, 1, . . . , n.

In Fⁿ, define

e1 = (1, 0, 0, . . . , 0),   e2 = (0, 1, 0, . . . , 0),   . . . ,   en = (0, 0, . . . , 0, 1).   (1.2.5)

Thus, for S0 defined in (1.2.2), we have

(x1 , x2 , . . . , xn ) = −(x2 + · · · + xn )e1 + x2 e2 + · · · + xn en
                       = x2 (e2 − e1 ) + · · · + xn (en − e1 ),   (1.2.6)

which shows that

S0 = Span{e2 − e1 , . . . , en − e1 }.   (1.2.7)

For F(m, n), we define Mij ∈ F(m, n) to be the vector such that all its entries vanish except that its entry at the position (i, j ) (at the ith row and j th column) is 1, i = 1, . . . , m, j = 1, . . . , n. We have

F(m, n) = Span{Mij | i = 1, . . . , m, j = 1, . . . , n}.   (1.2.8)

The notion of spans can be extended to cover some useful situations. Let U be a vector space and S be a (finite or infinite) subset of U . Define

Span(S) = the set of linear combinations of all possible finite subsets of S.   (1.2.9)

Then Span(S) is a subspace of U , and we say that this subspace is spanned or generated by the set of vectors S.

As an example, we have

P = Span{1, t, . . . , tⁿ, . . . }.   (1.2.10)

Note also that

P = ∪_{n=0}^{∞} Pn .   (1.2.11)

Definition 1.3 Let u1 , . . . , um be m vectors in the vector space U over a field F. We say that these vectors are linearly dependent if one of them may be written as a linear span of the rest of them or, equivalently, if there are scalars a1 , . . . , am ∈ F where (a1 , . . . , am ) ≠ (0, . . . , 0) such that

a1 u1 + · · · + am um = 0.   (1.2.12)

Otherwise, the vectors u1 , . . . , um are said to be linearly independent, that is, the only possible vector (a1 , . . . , am ) ∈ Fᵐ to make (1.2.12) fulfilled is the zero vector, (0, . . . , 0).

To proceed further, we need to consider the following system of linear equations

a11 x1 + · · · + a1n xn = 0,
. . . . . . . . . . . . . . . .
am1 x1 + · · · + amn xn = 0,   (1.2.13)

over F with unknowns x1 , . . . , xn .

Theorem 1.4 In the system (1.2.13), if m < n, then the system has a nontrivial solution (x1 , . . . , xn ) ≠ (0, . . . , 0).

Proof We prove the theorem by using induction on m + n.

The beginning situation is m + n = 3 when m = 1 and n = 2. It is clear that

we always have a nontrivial solution.

Assume that the statement of the theorem is true when m + n k where

k 3.

Let m + n = k + 1. If k = 3, the condition m < n implies m = 1, n = 3 and the existence of a nontrivial solution is obvious. Assume then k ≥ 4. If all the coefficients of the variable x1 in (1.2.13) are zero, i.e. a11 = · · · = am1 = 0, then x1 = 1, x2 = · · · = xn = 0 is a nontrivial solution. So we may assume one of the coefficients of x1 is nonzero. Without loss of generality, we assume a11 ≠ 0. If m = 1, there is again nothing to show. Assume m ≥ 2. Dividing the first equation in (1.2.13) by a11 if necessary, we can further assume a11 = 1. Then, adding the (−ai1 )-multiple of the first equation into the ith equation, in (1.2.13), for i = 2, . . . , m, we arrive at

x1 + a12 x2 + · · · + a1n xn = 0,
     b22 x2 + · · · + b2n xn = 0,
     . . . . . . . . . . . . . .
     bm2 x2 + · · · + bmn xn = 0.   (1.2.14)

The system below the first equation in (1.2.14) contains m − 1 equations and n − 1 unknowns x2 , . . . , xn . Of course, m − 1 < n − 1. So, in view of the inductive assumption, it has a nontrivial solution (x2 , . . . , xn ) ≠ (0, . . . , 0). Substituting this solution into the first equation in (1.2.14) to determine the remaining unknown x1 , we see that the existence of a nontrivial solution to the original system (1.2.13) follows.
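The following NumPy sketch (added for illustration; not from the original text) finds such a nontrivial solution numerically for a random underdetermined homogeneous system by taking a right singular vector associated with a zero singular value.

```python
import numpy as np

rng = np.random.default_rng(1)

# An m x n homogeneous system with m < n always has a nontrivial solution.
m, n = 3, 5
A = rng.standard_normal((m, n))

# When m < n, the last right singular vectors of A span its null space.
_, _, Vt = np.linalg.svd(A)
x = Vt[-1]                      # a nontrivial solution (of unit length)

print(np.allclose(A @ x, 0))    # True: A x = 0
print(np.linalg.norm(x))        # 1.0, so x is not the zero vector
```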

The importance of Theorem 1.4 is seen in the following.

Theorem 1.5 Any set of more than m vectors in Span{u1 , . . . , um } must be

linearly dependent.

Proof Let v1 , . . . , vn ∈ Span{u1 , . . . , um } be n vectors where n > m.

Consider the possible linear dependence relation

x1 v1 + · · · + xn vn = 0,   (1.2.15)

for some x1 , . . . , xn ∈ F.

Since each vj ∈ Span{u1 , . . . , um }, j = 1, . . . , n, there are scalars aij ∈ F (i = 1, . . . , m, j = 1, . . . , n) such that

vj = Σ_{i=1}^{m} aij ui ,   j = 1, . . . , n.   (1.2.16)

Inserting (1.2.16) into (1.2.15), we obtain

Σ_{j=1}^{n} xj ( Σ_{i=1}^{m} aij ui ) = 0,   (1.2.17)

that is,

Σ_{i=1}^{m} ( Σ_{j=1}^{n} aij xj ) ui = 0.   (1.2.18)

Hence the relation (1.2.15) holds whenever x1 , . . . , xn satisfy

Σ_{j=1}^{n} aij xj = 0,   i = 1, . . . , m.   (1.2.19)

This system of equations is exactly the system (1.2.13) which allows a nontrivial solution in view of Theorem 1.4. Hence the proof follows.

We are now prepared to study in the next section several fundamental properties of vector spaces.


Exercises

1.2.1 Let U1 and U2 be subspaces of a vector space U . Show that U1 ∪ U2 is a subspace of U if and only if U1 ⊂ U2 or U2 ⊂ U1 .

1.2.2 Let Pn denote the vector space of the polynomials of degrees up to n

over a field F expressed in terms of a variable t. Show that the vectors

1, t, . . . , t n in Pn are linearly independent.

1.2.3 Show that the vectors in Fn defined in (1.2.5) are linearly independent.

1.2.4 Show that S0 defined in (1.2.2) may also be expressed as

S0 = Span{e1 − en , . . . , en−1 − en },   (1.2.20)

1

1

1

1, , . . . , n2 , 2 n1 1 , e1 en , . . . , en1 en , (1.2.21)

2

2

2

1.2.5 Show that FS (n, n), FA (n, n), FD (n, n), FL (n, n), and FU (n, n) are all

subspaces of F(n, n).

1.2.6 Let u1 , . . . , un (n ≥ 2) be linearly independent vectors in a vector space U and set

vi−1 = ui−1 + ui ,   i = 2, . . . , n;   vn = un + u1 .   (1.2.22)

Investigate the linear dependence of the vectors v1 , . . . , vn .

1.2.7 Let F be a field. For any two vectors u = (a1 , . . . , an ) and v = (b1 , . . . , bn ) in Fⁿ (n ≥ 2), viewed as matrices, we see that the matrix product A = uᵗ v lies in F(n, n). Prove that any two row vectors of A are linearly dependent. What happens to the column vectors of A?

1.2.8 Consider a slightly strengthened version of the second part of Exercise 1.2.1 above: Let U1 , U2 be subspaces of a vector space U , U1 ≠ U , U2 ≠ U . Show without using Exercise 1.2.1 that there exists a vector in U which lies outside U1 ∪ U2 . Explain how you may apply the conclusion of this exercise to prove that of Exercise 1.2.1.

1.2.9 (A challenging extension of the previous exercise) Let U1 , . . . , Uk be k subspaces of a vector space U over a field of characteristic 0. If Ui ≠ U for i = 1, . . . , k, show that there is a vector in U which lies outside ∪_{i=1}^{k} Ui .

1.2.10 For A ∈ F(m, n) and B ∈ F(n, m) with m > n show that AB as an element in F(m, m) can never be invertible.

1.3 Bases, dimensionality, and coordinates

Let U be a vector space over a field F, take u1 , . . . , un ∈ U , and set V = Span{u1 , . . . , un }. Eliminating linearly dependent vectors from the set {u1 , . . . , un } if necessary, we can certainly assume that the vectors u1 , . . . , un are already made linearly independent. Thus, any vector u ∈ V may take the form

u = a1 u1 + · · · + an un ,   a1 , . . . , an ∈ F.   (1.3.1)

We assert that such an expression for u must be unique. In fact, if we also have

u = b1 u1 + · · · + bn un ,   b1 , . . . , bn ∈ F,   (1.3.2)

then subtraction gives us (a1 − b1 )u1 + · · · + (an − bn )un = 0. Since u1 , . . . , un are linearly independent, we obtain a1 = b1 , . . . , an = bn and the uniqueness follows.

Furthermore, if there is another set of vectors v1 , . . . , vm in U such that

Span{v1 , . . . , vm } = Span{u1 , . . . , un },   (1.3.3)

and v1 , . . . , vm are also linearly independent, then m = n. This observation leads to the following.

Definition 1.6 If there are linearly independent vectors u1 , . . . , un U such

that U = Span{u1 , . . . , un }, then U is said to be finitely generated and the set

of vectors {u1 , . . . , un } is called a basis of U . The number of vectors in any

basis of a finitely generated vector space, n, is independent of the choice of the

basis and is referred to as the dimension of the finitely generated vector space,

written as dim(U ) = n. A finitely generated vector space is also said to be

of finite dimensionality or finite dimensional. If a vector space U is not finite

dimensional, it is said to be infinite dimensional, also written as dim(U ) = ∞.

As an example of an infinite-dimensional vector space, we show that when R is regarded as a vector space over Q, then dim(R) = ∞. In fact, recall that a real number is called an algebraic number if it is the zero of a polynomial with coefficients in Q. We also know that there are many non-algebraic numbers in R, called transcendental numbers. Let π be such a transcendental number. Then for any n = 1, 2, . . . the numbers 1, π, π², . . . , πⁿ are linearly independent in the vector space R over the field Q. Indeed, if there are r0 , r1 , r2 , . . . , rn ∈ Q, not all zero, so that

r0 + r1 π + r2 π² + · · · + rn πⁿ = 0,   (1.3.4)

then π would be a zero of the nontrivial polynomial

p(t) = r0 + r1 t + r2 t² + · · · + rn tⁿ,   (1.3.5)

which contradicts the assumption that π is transcendental. Consequently, R is infinite dimensional over Q.

The following theorem indicates that it is fairly easy to construct a basis for

a finite-dimensional vector space.

Theorem 1.7 Let U be an n-dimensional vector space over a field F. Any n

linearly independent vectors in U form a basis of U .

Proof Let u1 , . . . , un ∈ U be linearly independent vectors. We only need to show that they span U . In fact, take any u ∈ U . We know that u1 , . . . , un , u are linearly dependent. So there is a nonzero vector (a1 , . . . , an , a) ∈ Fⁿ⁺¹ such that

a1 u1 + · · · + an un + au = 0.   (1.3.6)

Of course, a ≠ 0, otherwise it contradicts the assumption that u1 , . . . , un are linearly independent. So u = (−a⁻¹)(a1 u1 + · · · + an un ). Thus u ∈ Span{u1 , . . . , un }.

Definition 1.8 Let {u1 , . . . , un } be a basis of the vector space U . Given u ∈ U there are unique scalars a1 , . . . , an ∈ F such that

u = a1 u1 + · · · + an un .   (1.3.7)

We call a1 , . . . , an the coordinates, and (a1 , . . . , an ) ∈ Fⁿ the coordinate vector, of the vector u with respect to the basis {u1 , . . . , un }.

It will be interesting to investigate the relation between the coordinate vectors of a vector under different bases.

Let U = {u1 , . . . , un } and V = {v1 , . . . , vn } be two bases of the vector space U . For u ∈ U , let (a1 , . . . , an ) ∈ Fⁿ and (b1 , . . . , bn ) ∈ Fⁿ be the coordinate vectors of u with respect to U and V, respectively. Thus

u = a1 u1 + · · · + an un = b1 v1 + · · · + bn vn .   (1.3.8)

Moreover, each vector of the basis V may be expressed in terms of the basis U through

vj = Σ_{i=1}^{n} aij ui ,   j = 1, . . . , n,   (1.3.9)

for some scalars aij ∈ F, which form an n × n matrix A = (aij ). Inserting (1.3.9) into (1.3.8), we have

Σ_{i=1}^{n} ai ui = Σ_{i=1}^{n} ( Σ_{j=1}^{n} aij bj ) ui .   (1.3.10)

Since u1 , . . . , un are linearly independent, this gives us

ai = Σ_{j=1}^{n} aij bj ,   i = 1, . . . , n.   (1.3.11)

Note that the relation (1.3.9) between bases may be formally and conveniently expressed in a matrix form as

(v1 , . . . , vn ) = (u1 , . . . , un )A,   (1.3.12)

or concisely V = U A, or

(v1 , . . . , vn )ᵗ = Aᵗ (u1 , . . . , un )ᵗ,   (1.3.13)

where multiplications between scalars and vectors are made in a well defined manner. On the other hand, the relation (1.3.11) between coordinate vectors may be rewritten as

(a1 , . . . , an )ᵗ = A (b1 , . . . , bn )ᵗ,   (1.3.14)

or

(a1 , . . . , an ) = (b1 , . . . , bn )Aᵗ.   (1.3.15)
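To make the change-of-basis bookkeeping concrete, here is a small NumPy sketch (an added illustration, not from the original text) in R³ using the bases of Exercise 1.3.5 below; it checks that the two coordinate vectors are related by (1.3.14).

```python
import numpy as np

# Basis U = {e1, e2, e3} and basis V = {e1, e1+e2, e1+e2+e3} of R^3,
# written as columns with respect to the standard basis.
U = np.eye(3)
V = np.array([[1, 1, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

# Transition matrix A with V = U A, i.e. v_j = sum_i a_ij u_i.
A = np.linalg.solve(U, V)

u = np.array([1.0, 2.0, 3.0])     # the vector u = (1, 2, 3)
a = np.linalg.solve(U, u)         # coordinates of u with respect to U
b = np.linalg.solve(V, u)         # coordinates of u with respect to V

# Relation (1.3.14): a = A b.
print(np.allclose(a, A @ b))      # True
```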

Exercises

1.3.1 Let U be a vector space with dim(U ) = n ≥ 2 and V a subspace of U with a basis {v1 , . . . , vn−1 }. Prove that for any u ∈ U \ V the vectors u, v1 , . . . , vn−1 form a basis for U .

1.3.2 Show that dim(F(m, n)) = mn.

1.3.3 Determine dim(FS (n, n)), dim(FA (n, n)), and dim(FD (n, n)).

1.3.4 Let P be the vector space of all polynomials with coefficients in a field

F. Show that dim(P) = ∞.


1.3.5 Consider the vector space R3 and the bases U = {e1 , e2 , e3 } and

V = {e1 , e1 + e2 , e1 + e2 + e3 }. Find the basis transition matrix A from

U into V satisfying V = U A. Find the coordinate vectors of the given

vector (1, 2, 3) R3 with respect to the bases U and V, respectively,

and relate these vectors with the matrix A.

1.3.6 Prove that a basis transition matrix must be invertible.

1.3.7 Let U be an n-dimensional vector space over a field F where n ≥ 2 (say). Consider the following construction.

(i) Take u1 ∈ U \ {0}.
(ii) Take u2 ∈ U \ Span{u1 }.
(iii) Take (if any) u3 ∈ U \ Span{u1 , u2 }.
(iv) In general, take ui ∈ U \ Span{u1 , . . . , ui−1 } (i ≥ 2).

Show that this construction terminates after exactly n steps, that is, it will not be possible anymore to get un+1 , and that the vectors u1 , u2 , . . . , un so obtained form a basis of U .

1.4 Dual spaces

Let U be an n-dimensional vector space over a field F. A functional (also called a form or a 1-form) f over U is a linear function f : U → F satisfying

f (u + v) = f (u) + f (v),   u, v ∈ U ;      f (au) = af (u),   a ∈ F,  u ∈ U.   (1.4.1)

Given two functionals f, g over U , we define the sum of f and g, denoted by f + g, by

(f + g)(u) = f (u) + g(u),   u ∈ U.   (1.4.2)

Similarly, for a ∈ F and a functional f over U , we define what is called the scalar multiple of a with f , denoted by af , by

(af )(u) = af (u),   u ∈ U.   (1.4.3)

It is a simple exercise to check that these two operations make the set of all functionals over U a vector space over F. This vector space is called the dual space of U , denoted by U ′.

Let {u1 , . . . , un } be a basis of U . For any f ∈ U ′ and any u = a1 u1 + · · · + an un ∈ U , we have

f (u) = f ( Σ_{i=1}^{n} ai ui ) = Σ_{i=1}^{n} ai f (ui ).   (1.4.4)

So f is completely determined by its values on the basis vectors,

f1 = f (u1 ), . . . , fn = f (un ),   (1.4.5)

since

f (u) = Σ_{i=1}^{n} ai fi   for any u = Σ_{i=1}^{n} ai ui ∈ U.   (1.4.6)

Conversely, any prescribed scalars f1 , . . . , fn ∈ F define through (1.4.6) a functional f over U whose values at u1 , . . . , un are exactly those given in (1.4.5).

Thus, we have seen a well-defined 1-1 correspondence

U ′ ↔ Fⁿ,   f ↔ (f1 , . . . , fn ).   (1.4.7)

Especially we may use u1′ , . . . , un′ to denote the elements in U ′ corresponding to the vectors e1 , . . . , en in Fⁿ given by (1.2.5). Then we have

ui′ (uj ) = δij = 0 if i ≠ j, and 1 if i = j,   i, j = 1, . . . , n.   (1.4.8)

It is clear that u1′ , . . . , un′ are linearly independent and span U ′ because an element f of U ′ satisfying (1.4.5) is simply given by

f = f1 u1′ + · · · + fn un′ .   (1.4.9)

In other words, {u1′ , . . . , un′ } is a basis of U ′, commonly called the dual basis of U ′ with respect to the basis {u1 , . . . , un } of U . In particular, we have seen that U ′ and U are of the same dimensionality.
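Concretely, when vectors of U are written in coordinates with respect to {u1 , . . . , un }, each functional corresponds to a row vector and the dual basis to the rows of the identity matrix. The following NumPy sketch (an added illustration, not part of the original text) checks the defining relation (1.4.8) for F = R and n = 3.

```python
import numpy as np

n = 3
# Represent the basis vectors u_1, ..., u_n of U = R^n as columns of the identity,
# and the dual basis functionals u'_1, ..., u'_n as rows; then u'_i(u_j) = row_i . col_j.
basis = np.eye(n)          # columns are u_j
dual_basis = np.eye(n)     # rows are u'_i

values = dual_basis @ basis                 # entry (i, j) equals u'_i(u_j)
print(np.array_equal(values, np.eye(n)))    # True: u'_i(u_j) = delta_ij

# A functional f with values f_i = f(u_i) is the row vector (f_1, ..., f_n),
# i.e. f = f_1 u'_1 + ... + f_n u'_n, as in (1.4.9).
f = np.array([2.0, -1.0, 5.0])
u = np.array([1.0, 1.0, 1.0])     # u = u_1 + u_2 + u_3
print(f @ u)                      # f(u) = 2 - 1 + 5 = 6.0
```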

Let U = {u1 , . . . , un } and V = {v1 , . . . , vn } be two bases of the vector space U . Let their dual bases be denoted by U ′ = {u1′ , . . . , un′ } and V ′ = {v1′ , . . . , vn′ }, respectively. Suppose that the dual bases U ′ and V ′ are related through

uj′ = Σ_{i=1}^{n} ãij vi′ ,   j = 1, . . . , n.   (1.4.10)

Using (1.3.9) we find, on the one hand,

ui′ (vj ) = ui′ ( Σ_{k=1}^{n} akj uk ) = Σ_{k=1}^{n} akj ui′ (uk ) = Σ_{k=1}^{n} akj δik = aij ,   (1.4.11)

and, on the other hand,

ui′ (vj ) = Σ_{k=1}^{n} ãki vk′ (vj ) = Σ_{k=1}^{n} ãki δkj = ãji .   (1.4.12)

Hence ãji = aij , and we arrive at the correspondence relation

uj′ = Σ_{i=1}^{n} aji vi′ ,   j = 1, . . . , n.   (1.4.13)

In matrix form this reads

(u1′ , . . . , un′ ) = (v1′ , . . . , vn′ )Aᵗ,   (1.4.14)

or

(u1′ , . . . , un′ )ᵗ = A (v1′ , . . . , vn′ )ᵗ.   (1.4.15)

Now take any u′ ∈ U ′ and let (a1 , . . . , an ) and (b1 , . . . , bn ) be its coordinate vectors with respect to the dual bases U ′ and V ′, that is,

u′ = a1 u1′ + · · · + an un′ = b1 v1′ + · · · + bn vn′ .   (1.4.16)

Then the discussion in the previous section and the above immediately allow us to get

bi = Σ_{j=1}^{n} aji aj ,   i = 1, . . . , n.   (1.4.17)

In matrix form,

(b1 , . . . , bn ) = (a1 , . . . , an )A,   (1.4.18)

or

(b1 , . . . , bn )ᵗ = Aᵗ (a1 , . . . , an )ᵗ.   (1.4.19)

Comparing the above results with those established in the previous section, we see that, with respect to bases and dual bases, the coordinate vectors in U and U ′ follow opposite rules of correspondence. For this reason, coordinate vectors in U ′ are often called covariant vectors, and those in U contravariant vectors.

Using the relation stated in (1.4.8), we see that we may naturally view u1 , . . . , un as elements in (U ′)′ = U ″ so that they form a basis of U ″ dual to {u1′ , . . . , un′ } since

ui (uj′ ) ≡ uj′ (ui ) = δij = 0 if i ≠ j, and 1 if i = j,   i, j = 1, . . . , n.   (1.4.20)

In this way, an element of U ″ with coordinates a1 , . . . , an with respect to this basis is identified with the element u of U given by

u = a1 u1 + · · · + an un .   (1.4.21)

Thus we have seen that the identification just made spells out the relationship

U ″ = U.   (1.4.22)

Notationally, for u ∈ U and u′ ∈ U ′, it is often convenient to rewrite u′ (u), which is linear in both the u and u′ arguments, as

u′ (u) = ⟨u′ , u⟩.   (1.4.23)

With the identification (1.4.22), we also have

u′ (u) = u(u′ ),   u ∈ U,   u′ ∈ U ′,   (1.4.24)

so that

⟨u′ , u⟩ = ⟨u, u′ ⟩,   u ∈ U,   u′ ∈ U ′.   (1.4.25)

For a nonempty subset S ⊂ U , we define the annihilator of S, denoted by S⁰, to be the subset of U ′ given by

S⁰ = {u′ ∈ U ′ | ⟨u′ , u⟩ = 0,  ∀u ∈ S}.   (1.4.26)

It is clear that S⁰ is in fact a subspace of U ′. Likewise, for any nonempty subset S′ ⊂ U ′, we can define the annihilator of S′, denoted by S′⁰, as the subset

S′⁰ = {u ∈ U | ⟨u′ , u⟩ = 0,  ∀u′ ∈ S′ }   (1.4.27)

of U , which is a subspace of U .

Exercises

1.4.1 Let F be a field. Describe the dual spaces F′ and (F²)′.

1.4.2 Let U be a finite-dimensional vector space. Prove that for any vectors u, v ∈ U (u ≠ v) there exists an element f ∈ U ′ such that f (u) ≠ f (v).

1.4.3 Let U be a finite-dimensional vector space and f, g ∈ U ′. Suppose that, for any v ∈ U , f (v) = 0 if and only if g(v) = 0. Show that f and g are linearly dependent.


1.4.4 Let V be the subspace of Fⁿ defined by

V = {(x1 , . . . , xn ) ∈ Fⁿ | x1 + · · · + xn = 0}.   (1.4.28)

Show that an element f of (Fⁿ)′ lies in the annihilator V ⁰ if and only if it is of the form

f (x1 , . . . , xn ) = c Σ_{i=1}^{n} xi ,   (x1 , . . . , xn ) ∈ Fⁿ,   (1.4.29)

for some c ∈ F.

1.4.5 Let U = P2 and f, g, h ∈ U ′ be defined by

f (p) = p(−1),   g(p) = p(0),   h(p) = p(1),   p(t) ∈ P2 .   (1.4.30)

(i) Show that B′ = {f, g, h} is a basis of U ′.

(ii) Find a basis B of U which is dual to B′.

1.4.6 Let U be an n-dimensional vector space and V be an m-dimensional subspace of U . Show that the annihilator V ⁰ is an (n − m)-dimensional subspace of U ′. In other words, there holds the dimensionality equation

dim(V ) + dim(V ⁰) = dim(U ).   (1.4.31)

1.4.7 Let U be a finite-dimensional vector space and V a subspace of U . Show that V ⁰⁰ = (V ⁰)⁰ = V .

1.5 Constructions of vector spaces

Let U be a vector space and V , W its subspaces. It is clear that V ∩ W is also a subspace of U but V ∪ W in general may fail to be a subspace of U . The smallest subspace of U that contains V ∪ W should contain all vectors in U of the form v + w where v ∈ V and w ∈ W . Such an observation motivates the following definition.

Definition 1.9 If U is a vector space and V , W its subspaces, the sum of V and W , denoted by V + W , is the subspace of U given by

V + W ≡ {u ∈ U | u = v + w,  v ∈ V ,  w ∈ W }.   (1.5.1)

The verification that V + W is indeed the smallest subspace of U containing V ∪ W will be assigned as an exercise.

Assume now that V and W are finite-dimensional subspaces of U . Let {u1 , . . . , uk } be a basis of V ∩ W and extend it to obtain bases for V and W , respectively, of the forms

BV = {u1 , . . . , uk , v1 , . . . , vl },   BW = {u1 , . . . , uk , w1 , . . . , wm }.   (1.5.2)

Then

V + W = Span{u1 , . . . , uk , v1 , . . . , vl , w1 , . . . , wm },   (1.5.3)

and we claim that the spanning vectors displayed above form a basis of V + W . In fact, we only need to show that the vectors u1 , . . . , uk , v1 , . . . , vl , w1 , . . . , wm are linearly independent. For this purpose, consider the linear relation

a1 u1 + · · · + ak uk + b1 v1 + · · · + bl vl + c1 w1 + · · · + cm wm = 0,   (1.5.4)

where a1 , . . . , ak , b1 , . . . , bl , c1 , . . . , cm are scalars. We claim that

w ≡ c1 w1 + · · · + cm wm = 0.   (1.5.5)

In fact, from (1.5.4) we have w = −(a1 u1 + · · · + ak uk + b1 v1 + · · · + bl vl ) ∈ V , while of course we also have w ∈ W . So w ∈ V ∩ W = Span{u1 , . . . , uk }, which forces c1 = · · · = cm = 0, hence (1.5.5), since u1 , . . . , uk , w1 , . . . , wm are linearly independent. Applying (1.5.5) in (1.5.4), we immediately have a1 = · · · = ak = b1 = · · · = bl = 0.

Therefore we can summarize the above discussion by concluding with the

following theorem.

Theorem 1.10 The following general dimensionality formula

dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W )   (1.5.6)

is valid for the sum of any two subspaces V and W of finite dimensions in a

vector space.

Of great importance is the situation when dim(V ∩ W ) = 0 or V ∩ W = {0}. In this situation, the sum is called direct sum, and rewritten as V ⊕ W . Thus, we have

dim(V ⊕ W ) = dim(V ) + dim(W ).   (1.5.7)

Theorem 1.11 The sum of two subspaces V and W of U is a direct sum if and only if each vector u in V + W may be expressed as the sum of a unique vector v ∈ V and a unique vector w ∈ W .

Proof Assume first that V ∩ W = {0}. Suppose that some u in V + W may be expressed as

u = v1 + w1 = v2 + w2 ,   v1 , v2 ∈ V ,   w1 , w2 ∈ W.   (1.5.8)

Then v1 − v2 = w2 − w1 ∈ V ∩ W = {0}. Hence v1 − v2 = 0 and w2 − w1 = 0 and the stated uniqueness follows.

Conversely, suppose that any u ∈ V + W can be expressed as u = v + w for some unique v ∈ V and w ∈ W . If V ∩ W ≠ {0}, take x ∈ V ∩ W with x ≠ 0. Then zero vector 0 may be expressed as 0 = x + (−x) with x ∈ V and (−x) ∈ W , which violates the stated uniqueness since 0 = 0 + 0 with 0 ∈ V and 0 ∈ W , as well.

Let V be a subspace of an n-dimensional vector space U and BV = {v1 , . . . , vk } be a basis of V . Extend BV to get a basis of U , say {v1 , . . . , vk , w1 , . . . , wl }, where k + l = n. Define

W = Span{w1 , . . . , wl }.   (1.5.9)

Then U = V ⊕ W , and W is called a linear complement, or simply complement, of V in U . Besides, the subspaces V and W are said to be mutually complementary in U .

We may also build up a vector space from any two vector spaces, say V and W , over the same field F, as a direct sum of V and W . To see this, we construct vectors of the form

u = (v, w),   v ∈ V,   w ∈ W,   (1.5.10)

and define vector addition and scalar multiplication componentwise by

u1 + u2 = (v1 , w1 ) + (v2 , w2 ) = (v1 + v2 , w1 + w2 ),   v1 , v2 ∈ V ,   w1 , w2 ∈ W,   (1.5.11)

au = a(v, w) = (av, aw),   v ∈ V,   w ∈ W,   a ∈ F.   (1.5.12)

It is clear that the set U of all vectors of the form (1.5.10) equipped with the vector addition (1.5.11) and scalar multiplication (1.5.12) is a vector space over F. Naturally we may identify V and W with the subspaces of U given by

V = {(v, 0) | v ∈ V },   W = {(0, w) | w ∈ W }.   (1.5.13)

After this identification the two subspaces intersect only at the zero vector, so we may rewrite this relation as U = V ⊕ W as anticipated. Sometimes the vector space U so constructed is also referred to as the direct product of V and W and rewritten as U = V × W . In this way, R² may naturally be viewed as R × R, for example.
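As a concrete illustration of a direct sum (a sketch added here; it anticipates Exercise 1.5.2 below rather than reproducing text from the book), every real square matrix splits uniquely into a symmetric part and an anti-symmetric part, so that F(n, n) = FS(n, n) ⊕ FA(n, n) when the characteristic of F is not 2.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))

S = (M + M.T) / 2     # symmetric part, lies in FS(n, n)
A = (M - M.T) / 2     # anti-symmetric part, lies in FA(n, n)

print(np.allclose(S, S.T))        # True
print(np.allclose(A, -A.T))       # True
print(np.allclose(S + A, M))      # True: the decomposition recovers M

# Uniqueness: the only matrix that is both symmetric and anti-symmetric is 0,
# so FS(n, n) and FA(n, n) intersect trivially and the sum is direct.
```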

Let V1 , . . . , Vk be subspaces of a vector space U . In a similar manner we can define the sum

V = V1 + · · · + Vk .   (1.5.14)

If each v ∈ V may be written as v = v1 + · · · + vk for uniquely determined v1 ∈ V1 , . . . , vk ∈ Vk , then we say that V is the direct sum of V1 , . . . , Vk and rewrite such a relation as

V = V1 ⊕ · · · ⊕ Vk .   (1.5.15)

Some caution has to be exercised when checking whether the sum (1.5.14) is a direct sum. For example, the naive condition

Vi ∩ Vj = {0},   i ≠ j,   i, j = 1, . . . , k,   (1.5.16)

is not sufficient to guarantee that the sum is a direct sum.

To illustrate this subtlety, let us consider V = F² and take

V1 = Span{(1, 0)ᵗ},   V2 = Span{(1, 1)ᵗ},   V3 = Span{(1, −1)ᵗ}.   (1.5.17)

It is clear that V1 , V2 , V3 satisfy (1.5.16) and V = V1 + V2 + V3 but V cannot be a direct sum of V1 , V2 , V3 .

In fact, in such a general situation, a correct condition which replaces (1.5.16) is

Vi ∩ ( Σ_{1≤j≤k, j≠i} Vj ) = {0},   i = 1, . . . , k.   (1.5.18)

In other words, when the condition (1.5.18) is fulfilled, then (1.5.15) is valid. The proof of this fact is left as an exercise.

Exercises

1.5.1 For

U = {(x1 , . . . , xn ) ∈ Fⁿ | x1 + · · · + xn = 0},
V = {(x1 , . . . , xn ) ∈ Fⁿ | x1 = · · · = xn },   (1.5.19)

prove that Fⁿ = U ⊕ V .


1.5.2 Consider the vector space of all n × n matrices over a field F, denoted by F(n, n). As before, use FS (n, n) and FA (n, n) to denote the subspaces of symmetric and anti-symmetric matrices. Assume that the characteristic of F is not equal to 2. For any M ∈ F(n, n), rewrite M as

M = (1/2)(M + Mᵗ) + (1/2)(M − Mᵗ).   (1.5.20)

Check that (1/2)(M + Mᵗ) ∈ FS (n, n) and (1/2)(M − Mᵗ) ∈ FA (n, n). Use this fact to prove the decomposition

F(n, n) = FS (n, n) ⊕ FA (n, n).   (1.5.21)

1.5.3 Show that FL (n, n) ∩ FU (n, n) = FD (n, n).

1.5.4 Use FL (n, n) and FU (n, n) in F(n, n) to give an example for the dimensionality relation (1.5.6).

1.5.5 Let X = C[a, b] (a, b ∈ R and a < b) be the vector space of all real-valued continuous functions over the interval [a, b] and

Y = { f ∈ X | ∫_a^b f (t) dt = 0 }.   (1.5.22)

(i) Prove that if R is identified with the set of all constant functions over [a, b] then X = R ⊕ Y .

(ii) For a = 0, b = 1, and f (t) = t² + t − 1, find the unique c ∈ R and g ∈ Y such that f (t) = c + g(t) for all t ∈ [0, 1].

1.5.6 Let U be a vector space and V , W its subspaces such that U = V + W . If X is a subspace of U , is it true that X = (X ∩ V ) + (X ∩ W )?

1.5.7 Let U be a vector space and V , W, X some subspaces of U such that

U = V ⊕ W,   U = V ⊕ X.   (1.5.23)

Can one conclude that W = X? Explain why or why not.

1.5.8 Let V1 , . . . , Vk be some subspaces of U and set V = V1 + · · · + Vk . Show that this sum is a direct sum if and only if one of the following statements is true.

(i) V1 , . . . , Vk satisfy (1.5.18).

(ii) If non-overlapping sets of vectors

{v_1^1 , . . . , v_{l_1}^1 }, . . . , {v_1^k , . . . , v_{l_k}^k }   (1.5.24)

are bases of V1 , . . . , Vk , respectively, then their union

{v_1^1 , . . . , v_{l_1}^1 } ∪ · · · ∪ {v_1^k , . . . , v_{l_k}^k }   (1.5.25)

is a basis of V .

(iii) There holds the dimensionality relation

dim(V ) = dim(V1 ) + · · · + dim(Vk ).   (1.5.26)

1.6 Quotient spaces

In order to motivate the introduction of the concept of quotient spaces, we first

consider a concrete example in R2 .

Let v ∈ R² be any nonzero vector. Then

V = Span{v}   (1.6.1)

represents the line passing through the origin and along (or opposite to) the direction of v. More generally, for any u ∈ R², the coset

[u] = {u + w | w ∈ V } = {x ∈ R² | x − u ∈ V }   (1.6.2)

represents the line passing through the vector u and parallel to the vector v.

Naturally, we define [u1 ] + [u2 ] = {x + y | x ∈ [u1 ], y ∈ [u2 ]} and claim

[u1 ] + [u2 ] = [u1 + u2 ].   (1.6.3)

In fact, let z ∈ [u1 ] + [u2 ]. Then there exist x ∈ [u1 ] and y ∈ [u2 ] such that z = x + y. Rewrite x, y as x = u1 + w1 , y = u2 + w2 for some w1 , w2 ∈ V . Hence z = (u1 + u2 ) + (w1 + w2 ), which implies z ∈ [u1 + u2 ]. Conversely, if z ∈ [u1 + u2 ], then there is some w ∈ V such that z = (u1 + u2 ) + w = (u1 + w) + u2 . Since u1 + w ∈ [u1 ] and u2 ∈ [u2 ], we see that z ∈ [u1 ] + [u2 ]. Hence the claim follows.

From the property (1.6.3), we see clearly that the coset [0] = V serves as an

additive zero element among the set of all cosets.

Similarly, we may also naturally define a[u] = {ax | x ∈ [u]} for a ∈ R where a ≠ 0. Note that this last restriction is necessary because otherwise 0[u] would be a single-point set consisting of zero vector only. We claim

a[u] = [au],   a ∈ R \ {0}.   (1.6.4)

In fact, if z ∈ a[u], there is some x ∈ [u] such that z = ax. Since x ∈ [u], there is some w ∈ V such that x = u + w. So z = au + aw which implies z ∈ [au]. Conversely, if z ∈ [au], then there is some w ∈ V such that z = au + w. Since z = a(u + a⁻¹w), we get z ∈ a[u]. So (1.6.4) is established.


Since the coset [0] is already seen to be the additive zero when adding cosets, we are prompted to define

0[u] = [0].   (1.6.5)

With this convention, we have

a[u] = [au],   a ∈ R,   u ∈ R².   (1.6.6)

We may examine how the above introduced addition between cosets and

scalar multiplication with cosets make the set of all cosets into a vector space

over R, denoted by R2 /V , and called the quotient space of R2 modulo V . As

investigated, the geometric meaning of R2 /V is that it is the set of all the lines

in R2 parallel to the vector v and that these lines can be added and multiplied

by real scalars so that the set of lines enjoys the structure of a real vector space.

There is no difficulty extending the discussion to the case of R3 with V a

line or plane through the origin.

More generally, the above quotient-space construction may be formulated

as follows.

Definition 1.12 Let U be a vector space over a field F and V a subspace of U . The set of cosets represented by u ∈ U given by

[u] = {u + w | w ∈ V } = {u} + V ≡ u + V ,   (1.6.7)

equipped with the operations

[u] + [v] = [u + v],   u, v ∈ U,      a[u] = [au],   a ∈ F,   u ∈ U,   (1.6.8)

forms a vector space over F, called the quotient space of U modulo V , and is denoted by U/V .

Let BV = {v1 , . . . , vk } be a basis of V . Extend BV to get a basis of U , say

BU = {v1 , . . . , vk , u1 , . . . , ul }.   (1.6.9)

We claim that the cosets [u1 ], . . . , [ul ] form a basis of U/V . It is evident that these vectors span U/V . So we only need to show their linear independence.

Consider the relation

a1 [u1 ] + · · · + al [ul ] = 0,   a1 , . . . , al ∈ F.   (1.6.10)

This means that a1 u1 + · · · + al ul ∈ V , so there are b1 , . . . , bk ∈ F such that

a1 u1 + · · · + al ul = b1 v1 + · · · + bk vk .   (1.6.11)

Since v1 , . . . , vk , u1 , . . . , ul are linearly independent, we conclude that a1 = · · · = al = 0, as desired.

As a consequence of the afore-going discussion, we arrive at the following basic dimensionality relation

dim(U/V ) + dim(V ) = dim(U ).   (1.6.12)

The discussion above also gives a practical way to find a basis for U/V . The quantity dim(U/V ) = dim(U ) − dim(V ) is sometimes called the codimension of the subspace V in U . The quotient space U/V may be viewed as constructed from the space U after collapsing or suppressing its subspace V .

Exercises

1.6.1 Let V be the subspace of R2 given by V = Span{(1, 1)}.

(i) Draw V in R2 .

(ii) Draw the cosets

S1 = (1, 1) + V ,

S2 = (2, 1) + V .

(1.6.13)

(iv) Draw the coset S3 = (2)S1 + S2 .

(v) Determine whether S3 is equal to (1, 0) + V (explain why).

1.6.2 Let V be the subspace of P2 (set of polynomials of degrees up to 2 and with coefficients in R) consisting of those elements p satisfying

∫_{−1}^{1} p(t) dt = 0,   p(t) ∈ P2 .   (1.6.14)

(i) Find a basis for V .

(ii) Find a basis for the quotient space P2 /V and verify (1.6.12).

1.6.3 Describe the plane given in terms of the coordinates (x, y, z) ∈ R³ by the equation

ax + by + cz = d,   (a, b, c) ≠ (0, 0, 0),   (1.6.15)

where a, b, c, d ∈ R, as a coset in a suitably constructed quotient space.

1.6.4 Let U be a vector space and V and W some subspaces of U . For u ∈ U , use [u]V and [u]W to denote the cosets u + V and u + W , respectively. Show that, if V ⊂ W , and u1 , . . . , uk ∈ U , then that [u1 ]V , . . . , [uk ]V are linearly dependent implies that [u1 ]W , . . . , [uk ]W are linearly dependent. In particular, if U is finite dimensional, then dim(U/V ) ≥ dim(U/W ), if V ⊂ W , as may also be seen from using (1.6.12).


1.7 Normed spaces

It will be desirable to be able to evaluate the length or magnitude or

amplitude of any vector in a vector space. In other words, it will be useful to associate to each vector a quantity that resembles the notion of length of

a vector in (say) R3 . Such a quantity is generically called norm.

In this section, we take the field F to be either R or C.

Definition 1.13 Let U be a vector space over the field F. A norm over U is a correspondence ‖ · ‖ : U → R such that we have the following.

(1) (Positivity) ‖u‖ ≥ 0 for u ∈ U and ‖u‖ = 0 only for u = 0.

(2) (Homogeneity) ‖au‖ = |a| ‖u‖ for a ∈ F and u ∈ U .

(3) (Triangle inequality) ‖u + v‖ ≤ ‖u‖ + ‖v‖ for u, v ∈ U .

A vector space equipped with a norm is called a normed space. If ‖ · ‖ is the specific norm of the normed space U , we sometimes spell this fact out by stating the normed space (U, ‖ · ‖).

Definition 1.14 Let (U, ) be a normed space and {uk } a sequence in U .

For some u0 U , we say that uk u0 or {uk } converges to u0 as k if

lim u0 uk = 0,

(1.7.1)

also said to be the limit of the sequence {uk }.

The notions of convergence and limit are essential for carrying out calculus

in a normed space.

Let U be a finite-dimensional space. We can easily equip U with a norm.

For example, assume that B = {u1 , . . . , un } is a basis of U . For any u U ,

define

u1 =

n

i=1

|ai |,

where

u=

n

ai ui .

More generally, for p 1, we may set

n

1

n

p

p

|ai |

, where u =

ai ui .

up =

i=1

(1.7.2)

i=1

(1.7.3)

i=1

It can be shown that p also defines a norm over U . We will not check this

fact. What interests us here, however, is the limit

29

(1.7.4)

To prove (1.7.4), we note that the right-hand side of (1.7.4) is simply |ai0 |

for some i0 {1, . . . , n}. Thus, in view of (1.7.3), we have

n

1

p

1

p

up

|ai0 |

= |ai0 |n p .

(1.7.5)

i=1

lim sup up |ai0 |.

(1.7.6)

1

up |ai0 |p p = |ai0 |.

(1.7.7)

Thus,

lim inf up |ai0 |.

(1.7.8)

the notation

n

ai ui ,

(1.7.9)

u = max{|ai | | i = 1, . . . , n}, where u =

i=1

lim up = u ,

u U.

(1.7.10)

Thus we have seen that there are many ways to introduce a norm over a

vector space. So it will be important to be able to compare norms. For calculus,

it is obvious that the most important thing with regard to a norm is the notion

of convergence. In particular, assume there are two norms, say and ,

equipped over the vector space U . We hope to know whether convergence with

respect to norm implies convergence with respect to norm . This desire

motivates the introduction of the following concept.

Definition 1.15 Let U be a vector space and and two norms over U .

We say that is stronger than if convergence with respect to norm

implies convergence with respect to norm . More precisely, any convergent

sequence {uk } with limit u0 in (U, ) is also a convergent sequence with the

same limit in (U, ).

Regarding the above definition, we have the following.

30

Vector spaces

Theorem 1.16 Let U be a vector space and and two norms over U .

Then norm is stronger than norm if and only if there is a constant

C > 0 such that

u Cu,

u U.

(1.7.11)

Suppose that is stronger than but (1.7.11) does not hold. Thus there

is a sequence {uk } in U such that uk

= 0 (k = 1, 2, . . . ) and

uk

k,

uk

k = 1, 2, . . . .

(1.7.12)

Define

vk =

1

uk ,

uk

k = 1, 2, . . . .

(1.7.13)

vk

1

,

k

k = 1, 2, . . . .

(1.7.14)

Consequently, vk 0 as k with respect to norm . However, according to (1.7.13), we have vk = 1, k = 1, 2, . . . , and vk

0 as k with

respect to norm . This reaches a contradiction.

Definition 1.17 Let U be a vector space and and two norms over U .

We say that norms and are equivalent if convergence in norm is

equivalent to convergence in norm .

In view of Theorem 1.16, we see that norms and are equivalent if

and only if the inequality

C1 u u C2 u,

u U,

(1.7.15)

The following theorem regarding norms over finite-dimensional spaces is of

fundamental importance.

Theorem 1.18 Any two norms over a finite-dimensional space are equivalent.

Proof Let U be an n-dimensional vector space and B = {u1 , . . . , un } a basis

for U . Define the norm 1 by (1.7.2). Let be any given norm over U .

Then by the properties of norm we have

u

n

|ai |ui 2

i=1

n

31

|ai | = 2 u1 ,

(1.7.16)

i=1

2 = max{ui | i = 1, . . . , n}.

(1.7.17)

In order to show that is also stronger than 1 , we need to prove the

existence of a constant 1 > 0 such that

1 u1 u,

u U.

(1.7.18)

Suppose otherwise that (1.7.18) fails to be valid. In other words, the set of

ratios

u

u

U,

u

=

0

(1.7.19)

u1

does not have a positive infimum. Then there is a sequence {vk } in U such that

1

vk 1 vk ,

k

vk

= 0,

k = 1, 2, . . . .

(1.7.20)

Now set

wk =

1

vk ,

vk 1

k = 1, 2, . . . .

(1.7.21)

On the other hand, we have wk 1 = 1 (k = 1, 2, . . . ). Express wk with

respect to basis B as

wk = a1,k u1 + + an,k un ,

a1,k , . . . , an,k F,

k = 1, 2, . . . . (1.7.22)

1, 2, . . . ). Hence, by the BolzanoWeierstrass theorem, there is a subsequence

of {k}, denoted by {ks }, such that ks as s and

ai,ks some ai,0 F

as

s ,

i = 1, . . . , n.

(1.7.23)

Moreover, the triangle inequality gives us

w0 1 wks 1 wks w0 1 = 1 wks w0 1 ,

s = 1, 2, . . . .

(1.7.24)

However, substituting u = w0 wks in (1.7.16) and letting s , we see

that wks w0 as s with respect to norm as well which is false

because we already know that wk 0 as k with respect to norm .

32

Vector spaces

Summarizing the above study, we see that there are some constants 1 , 2 >

0 such that

1 u1 u 2 u1 ,

u U.

(1.7.25)

1 , 2 > 0 such that

1 u1 u 2 u1 ,

u U.

(1.7.26)

2

1

u u

u,

2

1

u U,

(1.7.27)

As an immediate application of Theorem 1.18, we have the following.

Theorem 1.19 A finite-dimensional normed space (U, ) is locally compact.

That is, any bounded sequence in U contains a convergent subsequence.

Proof Let {u1 , . . . , un } be any basis of U and introduce norm 1 by the

expression (1.7.2). Let {vk } be a bounded sequence in (U, ). Then Theorem 1.18 says that {vk } is also bounded in (U, 1 ). If we rewrite vk as

vk = a1,k u1 + + an,k un ,

a1,k , . . . , an,k F,

k = 1, 2, . . . , (1.7.28)

all bounded in F. Thus the BolzanoWeierstrass theorem indicates that there

are subsequences {ai,ks } (i = 1, . . . , n) which converge to some ai,0 F

(i = 1, . . . , n) as s . Consequently, setting v0 = a1,0 u1 + + an,0 un ,

we have v0 vks 1 0 as s . Thus v0 vks Cv0 vks 1 0

as s as well.

Let (U, ) be a finite-dimensional normed vector space and V a subspace

of U . Consider the quotient space U/V . As an exercise, it may be shown that

[u] = inf{v | v [u]},

[u] U/V ,

u U,

(1.7.29)

Exercises

1.7.1 Consider the vector space C[a, b] of the set of all real-valued continuous

functions over the interval [a, b] where a, b R and a < b. Define two

norms 1 and by setting

u1 =

|u(t)| dt,

t[a,b]

33

u C[a, b].

(1.7.30)

1.7.2 Over the vector space C[a, b] again, define norm p (p 1) by

b

p1

p

|u(t)| dt

, u C[a, b].

(1.7.31)

up =

a

lim up = u ,

u C[a, b].

(1.7.32)

1.7.3 Let (U, ) be a normed space and use U to denote the dual space of

U . For each u U , define

u = sup{|u (u)| | u U, u = 1}.

(1.7.33)

1.7.4 Let U be a vector space and U its dual space which is equipped with a

norm, say . Define

u = sup{|u (u)| | u U , u = 1}.

(1.7.34)

1.7.5 Prove that for any finite-dimensional normed space (U, ) with a given

subspace V we may indeed use (1.7.29) to define a norm for the quotient

space U/V .

1.7.6 Let denote the Euclidean norm of R2 which is given by

(1.7.35)

u = a12 + a22 , u = (a1 , a2 ) R2 .

Consider the subspace V = {(x, y) R2 | 2x y = 0} and the coset

[(1, 1)] in R2 modulo V .

(i) Find the unique vector v R2 such that v = [(1, 1)].

(ii) Draw the coset [(1, 1)] in R2 and the vector v found in part (i) and

explain the geometric content of the results.

2

Linear mappings

stating the definition and a discussion of the structural properties of linear mappings. We then introduce the notion of adjoint mappings and illustrate some

of their applications. We next focus on linear mappings from a vector space

into itself and study a series of important concepts such as invariance and

reducibility, eigenvalues and eigenvectors, projections, nilpotent mappings,

and polynomials of linear mappings. Finally we discuss the use of norms of

linear mappings and present a few analytic applications.

A linear mapping may be regarded as the simplest correspondence

between two vector spaces. In this section we start our study with the definition of linear mappings. We then discuss the matrix representation of a

linear mapping, composition of linear mappings, and the rank and nullity of

a linear mapping.

Let U and V be two vector spaces over the same field F. A linear mapping or

linear map or linear operator is a correspondence T from U into V , written as

T : U V , satisfying the following.

(1) (Additivity) T (u1 + u2 ) = T (u1 ) + T (u2 ), u1 , u2 U .

(2) (Homogeneity) T (au) = aT (u), a F, u U .

34

35

may also say that a linear mapping respects or preserves vector addition and

scalar multiplication.

The set of all linear mappings from U into V will be denoted by L(U, V ).

For S, T L(U, V ), we define S + T to be a mapping from U into V

satisfying

(S + T )(u) = S(u) + T (u),

u U.

(2.1.1)

V satisfying

(aT )(u) = aT (u),

u U.

(2.1.2)

We can directly check that the mapping addition (2.1.1) and scalar-mapping

multiplication (2.1.2) make L(U, V ) a vector space over F. We adopt the

notation L(U ) = L(U, U ).

As an example, consider the space of matrices, F(m, n). For

A = (aij ) F(m, n), define

a11

TA (x) = Ax =

am1

amn

a1n

x1

..

. ,

x1

.

n

x=

.. F .

xn

xn

(2.1.3)

have

a11

a1n

.

.

(2.1.4)

TA (e1 ) =

.. , . . . , TA (en ) = .. .

am1

amn

In other words, the images of e1 , . . . , en under the linear mapping TA are simply the column vectors of the matrix A, respectively.

Conversely, take any element T L(Fn , Fm ). Let v1 , . . . , vn Fm be

images of e1 , . . . , en under T such that

a11

m

.

=

.

T (e1 ) = v1 =

ai1 ei , . . . ,

.

am1

i=1

36

Linear mappings

a1n

m

.

=

.

ain ei ,

T (en ) = vn =

.

(2.1.5)

i=1

amn

has the image

x1

n

n

.

T (x) = T

xj en =

xj vj = (v1 , . . . , vn )

..

j =1

a11

am1

j =1

a1n

amn

xn

x1

..

. .

(2.1.6)

xn

of column vectors as images of e1 , . . . , en under T given in (2.1.5).

It may also be examined that mapping addition and scalar-mapping multiplication correspond to matrix addition and scalar-matrix multiplication.

In this way, as vector spaces, F(m, n) and L(Fn , Fm ) may be identified with

each other.

In general, let BU = {u1 , . . . , un } and BV = {v1 , . . . , vm } be bases of the

finite-dimensional vector spaces U and V over the field F, respectively. For

any T L(U, V ), we can write

T (uj ) =

m

aij vi ,

aij F,

i = 1, . . . , m,

j = 1, . . . , n.

(2.1.7)

i=1

see that the matrix A = (aij ) completely determines the mapping T . In other

words, to each mapping in L(U, V ), there corresponds a unique matrix in

F(m, n).

Conversely, for any A = (aij ) F(m, n), we define T (u1 ), . . . , T (un ) by

n

(2.1.7). Moreover, for any x U given by x =

xj uj for x1 , . . . , xn F,

we set

j =1

T (x) = T

n

j =1

xj uj =

n

xj T (uj ).

(2.1.8)

j =1

37

Thus we again see, after specifying bases for U and V , that we may identify

L(U, V ) with F(m, n) in a natural way.

After the above general description of linear mappings, especially their identification with matrices, we turn our attention to some basic properties of linear

mappings.

Let U, V , W be vector spaces over a field F of respective dimensions n, l, m.

For T L(U, V ) and S L(V , W ), we can define the composition of T and

S with the understanding

(S T )(x) = S(T (x)),

x U.

(2.1.9)

terms of the matrices of S and T .

To this end, let BU = {u1 , . . . , un }, BV = {v1 , . . . , vl }, BW = {w1 , . . . , wm }

be the bases of U, V , W , respectively, and A = (aij ) F(l, n) and B =

(bij ) F(m, l) be the correspondingly associated matrices of T and S, respectively. Then we have

l

l

aij vi =

aij S(vi )

(S T )(uj ) = S(T (uj )) = S

i=1

m

l

i=1 k=1

aij bki wk =

i=1

l

m

bik akj wi .

(2.1.10)

i=1 k=1

associated to the linear mapping S T with respect to the bases BU and BW ,

then C = BA. Hence the composition of linear mappings corresponds to the

multiplication of their associated matrices, in the same order. For this reason, it

is also customary to use ST to denote S T , when there is no risk of confusion.

The composition of linear mappings obviously enjoys the associativity

property

R (S T ) = (R S) T , R L(W, X), S L(V , W ), T L(U, V ),

(2.1.11)

as can be seen from

(R (S T ))(u) = R((S T )(u)) = R(S(T (u)))

(2.1.12)

38

Linear mappings

and

((R S) T )(u) = (R S)(T (u)) = R(S(T (u)))

(2.1.13)

for any u U .

Thus, when applying (2.1.11) to the situation of linear mappings between

finite-dimensional vector spaces and using their associated matrix representations, we obtain another proof of the associativity of matrix multiplication.

For T L(U, V ), the subset

N (T ) = {x U | T (x) = 0}

(2.1.14)

the kernel of T , and denoted as Kernel(T ) or T 1 (0). Here and in the sequel,

note that, in general, T 1 (v) denotes the set of preimages of v V under T .

That is,

T 1 (v) = {x U | T (x) = v}.

(2.1.15)

(2.1.16)

of U under T , and denoted as Image(T ) or T (U ).

For T L(U, V ), we say that T is one-to-one, or 1-1, or injective, if T (x)

=

T (y) whenever x, y U with x

= y. It is not hard to show that T is 1-1 if and

only if N(T ) = {0}. We say that T is onto or surjective if R(T ) = V .

A basic problem in linear algebra is to investigate whether the equation

T (x) = v

(2.1.17)

we see that (2.1.17) has a solution if and only if v R(T ), and the solution

is unique if and only if N (T ) = {0}. More generally, if v R(T ) and u U

is any particular solution of (2.1.17), then the set of all solutions of (2.1.17) is

simply the coset

[u] = {u} + N (T ) = u + N (T ).

(2.1.18)

set of solutions of (2.1.17) completely.

39

U/N (T ) = {{T 1 (v)} | v R(T )}.

(2.1.19)

v2 , T (u2 ) = v2 . Then [u1 ] = T 1 (v1 ), [u2 ] = T 1 (v2 ), and [u1 ] + [u2 ] =

[u1 + u2 ] = T 1 (v1 + v2 ); for a F, v R(T ), and u T 1 (v), we have

au T 1 (av) and a[u] = [au] = T 1 (av), which naturally give us the

construction of U/N (T ) as a vector space.

Next, we consider the quotient space V /R(T ), referred to as cokernel of

T , written as Cokernel(T ). Recall that V /R(T ) is the set of non-overlapping

cosets in V modulo R(T ) so that exactly [0] = R(T ). Therefore we have

V \ R(T ) =

[v],

(2.1.20)

where the union of cosets on the right-hand side of (2.1.20) is made over all

[v] V /R(T ) except the zero element [0]. In other words, the union of all

cosets in V /R(T ) \ {[0]} gives us the set of all such vectors v in V that the

equation (2.1.17) fails to have a solution in U . With set notation, this last statement is

[v].

(2.1.21)

{v V | T 1 (v) = } =

Thus, loosely speaking, the quotient space or cokernel V /R(T ) measures

the size of the set of vectors in V that are missed by the image of U

under T .

Definition 2.1 Let U and V be finite-dimensional vector spaces over a field F.

The nullity and rank of a linear mapping T : U V , denoted by n(T ) and

r(T ), respectively, are the dimensions of N (T ) and R(T ). That is,

n(T ) = dim(N (T )),

(2.1.22)

n(T ) = 0 and T is onto if and only if r(T ) = dim(V ) or dim(Cokernel(T )) =

0.

The following simple theorem will be of wide usefulness.

Theorem 2.2 Let U, V be vector spaces over a field F and T L(U, V ).

(1) If v1 , . . . , vk V are linearly independent and u1 , . . . , uk U are such

that T (u1 ) = v1 , . . . , T (uk ) = vk , then u1 , . . . , uk are linearly independent as well. In other words, the preimages of linear independent vectors

are also linearly independent.

40

Linear mappings

v1 = T (u1 ), . . . , vk = T (uk ) V are linearly independent as well. In

other words, the images of linearly independent vectors under a 1-1 linear

mapping are also linearly independent.

Proof (1) Consider the relation a1 u1 + + ak uk = 0, a1 , . . . , ak F.

Applying T to the above relation, we get a1 v1 + + ak vk = 0. Hence

a1 = = ak = 0.

(2) We similarly consider a1 v1 + + ak vk = 0. This relation immediately

gives us T (a1 u1 + + ak uk ) = 0. Using N (T ) = {0}, we deduce

a1 u1 + +ak uk = 0. Thus the linear independence of u1 , . . . , uk implies

a1 = = ak = 0.

The fact stated in the following theorem is known as the nullity-rank equation or simply rank equation.

Theorem 2.3 Let U, V be finite-dimensional vector spaces over a field F and

T L(U, V ). Then

n(T ) + r(T ) = dim(U ).

(2.1.23)

for the full space U written as {u1 , . . . , uk , w1 , . . . , wl }. Thus

R(T ) = Span{T (w1 ), . . . , T (wl )}.

(2.1.24)

We now show that T (w1 ), . . . , T (wl ) form a basis for R(T ) by establishing

their linear independence. To this end, consider b1 T (w1 ) + + bl T (wl ) = 0

for some b1 , . . . , bl F. Hence T (b1 w1 + + bl wl ) = 0 or b1 w1 + +

bl wl N(T ). So there are a1 , . . . , ak F such that b1 w1 + + bl wl =

a1 u1 + + ak uk . Since u1 , . . . , uk , w1 , . . . , wl are linearly independent, we

arrive at a1 = = ak = b1 = = bl = 0. In particular, r(T ) = l and the

rank equation is valid.

As an immediate application of the above theorem, we have the following.

Theorem 2.4 Let U, V be finite-dimensional vector spaces over a field F. If

there is some T L(U, V ) such that T is both 1-1 and onto, then there must

hold dim(U ) = dim(V ). Conversely, if dim(U ) = dim(V ), then there is some

T L(U, V ) which is both 1-1 and onto.

Proof If T is 1-1 and onto, then n(T ) = 0 and r(T ) = dim(V ). Thus

dim(U ) = dim(V ) follows from the rank equation.

41

{v1 , . . . , vn } be any bases of U and V , respectively. We can define a unique

linear mapping T : U V by specifying the images of u1 , . . . , un under T

to be v1 , . . . , vn . That is,

T (u1 ) = v1 , . . . , T (un ) = vn ,

(2.1.25)

so that

T (u) =

n

ai vi ,

u =

i=1

n

ai ui U.

(2.1.26)

i=1

We can slightly modify the proof of Theorem 2.4 to prove the following.

Theorem 2.5 Let U, V be finite-dimensional vector spaces over a field F so

that dim(U ) = dim(V ). If T L(U, V ) is either 1-1 or onto, then T must be

both 1-1 and onto. In this situation, there is a unique S L(V , U ) such that

S T : U U and T S : V V are identity mappings, denoted by IU and

IV , respectively, satisfying IU (u) = u, u U and IV (v) = v, v V .

Proof Suppose dim(U ) = dim(V ) = n. If T L(U, V ) is 1-1, then n(T ) =

0. So the rank equation gives us r(T ) = n = dim(V ). Hence T is onto. If

T L(U, V ) is onto, then r(T ) = n = dim(U ). So the rank equation gives us

n(T ) = 0. Hence T is 1-1. Now let {u1 , . . . , un } be any basis of U . Since T

is 1-1 and onto, we know that v1 , . . . , vn V defined by (2.1.25) form a basis

for V . Define now S L(V , U ) by setting

S(v1 ) = u1 , . . . , S(vn ) = un ,

(2.1.27)

so that

S(v) =

n

i=1

bi ui ,

v =

n

bi vi V .

(2.1.28)

i=1

S L(V , U ) follows.

If R L(V , U ) is another mapping such that R T = IU and T

R = IV . Then the associativity of the composition of linear mappings (2.1.11)

gives us the result R = R IV = R (T S) = (R T ) S = IU

S = S. Thus the stated uniqueness of S L(V , U ) follows as well.

Let U, V be vector spaces over a field F and T L(U, V ). We say that T is

invertible if there is some S L(V , U ) such that S T = IU and T S = IV .

42

Linear mappings

as T 1 .

The notion of inverse can be slightly relaxed to something refereed to as a

left or right inverse: We call an element S L(V , U ) a left inverse of T

L(U, V ) if

S T = IU ;

(2.1.29)

T R = IV .

(2.1.30)

It is interesting that if T is known to be invertible, then the left and right inverses coincide and are simply the inverse of T . To see this, we let T 1

L(V , U ) denote the inverse of T and use the associativity of composition of

mappings to obtain from (2.1.29) and (2.1.30) the results

S = S IV = S (T T 1 ) = (S T ) T 1 = IV T 1 = T 1 ,

(2.1.31)

and

R = IU R = (T 1 T ) R = T 1 (T R) = T 1 IV = T 1 ,

(2.1.32)

respectively.

It is clear that, regarding T L(U, V ), the condition (2.1.29) implies that

T is 1-1, and the condition (2.1.30) implies that T is onto. In view of Theorem 2.5, we see that when U, V are finite-dimensional and dim(U ) = dim(V ),

then T is invertible. Thus we have S = R = T 1 . On the other hand, if

dim(U )

= dim(V ), T can never be invertible and the notion of left and right

inverses is of separate interest.

As an example, we consider the linear mappings S : F3 F2 and T :

2

F F3 associated with the matrices

1 0

1 0 0

A=

, B = 0 1 ,

(2.1.33)

0 1 0

0 0

respectively according to (2.1.3). Then we may examine that S is a left inverse

of T (thus T is a right inverse of S). However, it is absurd to talk about the

invertibility of these mappings.

Let U and V be two vector spaces over the same field. An invertible element

T L(U, V ) (if it exists) is also called an isomorphism. If there is an isomorphism between U and V , they are said to be isomorphic to each other, written

as U V .

43

spaces over the same field are isomorphic if and only if they have the same

dimension.

Let U, V be finite-dimensional vector spaces over a field F and {u1 , . . . , un },

{v1 , . . . , vm } any bases of U, V , respectively. For fixed i = 1, . . . , m, j =

1, . . . , n, define Tij L(U, V ) by setting

Tij (uj ) = vi ;

Tij (uk ) = 0,

1 k n,

k

= j.

(2.1.34)

particular, we have

dim(L(U, V )) = mn.

(2.1.35)

we have L(U, F) = U . Thus dim(L(U, F)) = dim(U ) = dim(U ) = n as

obtained earlier.

Exercises

2.1.1 For A = (aij ) F(m, n), define a mapping MA : Fm Fn by setting

a11 a1n

MA (x) = xA = (x1 , . . . , xm ) , x Fm ,

am1

amn

(2.1.36)

l-component row vectors. Show that MA is linear and the row

vectors of the matrix A are the images of the standard vectors

e1 , . . . , em of Fm again taken to be row vectors. Describe, as vector

spaces, how F(m, n) may be identified with L(Fm , Fn ).

2.1.2 Let U, V be vector spaces over the same field F and T L(U, V ).

Prove that T is 1-1 if and only if N (T ) = {0}.

2.1.3 Let U, V be vector spaces over the same field F and T L(U, V ). If

Y is a subspace of V , is the subset of U given by

X = {x U | T (x) Y } T 1 (Y ),

(2.1.37)

necessarily a subspace of U ?

2.1.4 Let U, V be finite-dimensional vector spaces and T L(U, V ). Prove

that r(T ) dim(U ). In particular, if T is onto, then dim(U )

dim(V ).

44

Linear mappings

onto.

2.1.6 Consider A F(n, n). Prove that, if there is B or C F(n, n) such

that either AB = In or CA = In , then A is invertible and B = A1 or

C = A1 .

2.1.7 Let U be an n-dimensional vector space (n 2) and U its dual space.

For f U recall that f 0 = N (f ) = {u U | f (u) = 0}. Let g U

be such that f and g are linearly independent. Hence f 0

= g 0 .

(i) Show that U = f 0 + g 0 must hold.

(ii) Establish dim(f 0 g 0 ) = n 2.

2.1.8 For T L(U ), where U is a finite-dimensional vector space, show that

N (T 2 ) = N (T ) if and only if R(T 2 ) = R(T ).

2.1.9 Let U, V be finite-dimensional vector spaces and S, T L(U, V ).

Show that

r(S + T ) r(S) + r(T ).

(2.1.38)

and T L(U, V ), S L(V , W ). Establish the rank and nullity

inequalities

r(S T ) min{r(S), r(T )}, n(S T ) n(S) + n(T ).

(2.1.39)

min{r(A), r(B)}. In particular, if A = BC or A = CB where C

F(n, n) is nonsingular, then r(A) = r(B).

2.1.12 Let A F(m, n) and B F(n, m). Prove that AB F(m, m) must be

singular when m > n.

2.1.13 Let T L(U, V ) and S L(V , W ) where dim(U ) = n and

dim(V ) = m. Prove that

r(S T ) r(S) + r(T ) m.

(2.1.40)

2.1.14 Let U, V , W, X be finite-dimensional vector spaces over the same field

and T L(U, V ), S L(V , W ), R L(W, X). Establish the rank

inequality

r(R S) + r(S T ) r(S) + r(R S T ).

(2.1.41)

(2.1.40) may be deduced as a special case of (2.1.41).

45

{w1 , . . . , wl } be two sets of linearly independent vectors. That is,

the dimensions of the subspaces V = Span{v1 , . . . , vk } and W =

Span{w1 , . . . , wl } of U are k and l, respectively. Consider the subspace

of Fk+l defined by

k

l

yi vi +

zj w j = 0 .

S = (y1 , . . . , yk , z1 , . . . , zl ) Fk+l

i=1

j =1

(2.1.42)

Show that dim(S) = dim(V W ).

2.1.16 Let T L(U, V ) be invertible and U = U1 Uk . Show that

there holds V = V1 Vk where Vi = T (Ui ), i = 1, . . . , k.

2.1.17 Let U be finite-dimensional and T L(U ). Show that U = N (T )

R(T ) if and only if N (T ) R(T ) = {0}.

It will be important to investigate how the associated matrix changes with

respect to a change of basis for a linear mapping.

Let {u1 , . . . , un } and {u 1 , . . . , u n } be two bases of the n-dimensional vector

space U over a field F. A change of bases from {u1 , . . . , un } to {u 1 , . . . , u n } is

simply a linear mapping R L(U, U ) such that

R(uj ) = u j ,

j = 1, . . . , n.

(2.2.1)

Moreover, if we rewrite (2.2.1) in a matrix form, we have

u k =

n

bj k uj ,

k = 1, . . . , n,

(2.2.2)

j =1

where the matrix B = (bj k ) F(n, n) is called the basis transition matrix

from the basis {u1 , . . . , un } to the basis {u 1 , . . . , u n }, which is necessarily

invertible.

Let V be an m-dimensional vector space over F and take T L(U, V ).

Let A = (aij ) and A = (a ij ) be the m n matrices of the linear mapping T

associated with the pairs of the bases

{u1 , . . . , un }

and

{v1 , . . . , vm },

{u 1 , . . . , u n } and

{v1 , . . . , vm },

(2.2.3)

46

Linear mappings

T (uj ) =

m

T (u j ) =

aij vi ,

m

i=1

a ij vi ,

j = 1, . . . , n.

(2.2.4)

i=1

n

m

a ij vi = T (u j ) = T

bkj uk

i=1

k=1

=

=

n

m

bkj

aik vi

k=1

i=1

n

m

aik bkj vi ,

i=1

j = 1, . . . , n.

(2.2.5)

j = 1, . . . , n,

(2.2.6)

k=1

a ij =

n

i = 1, . . . , m,

aik bkj ,

k=1

or

A = AB.

(2.2.7)

T (uj ) =

m

a ij vi ,

j = 1, . . . , n.

(2.2.8)

i=1

Then, with the mapping R L(U, U ) given in (2.2.1), the second relation in

(2.2.4) simply says T = T R, which leads to (2.2.7) immediately.

Similarly, we may consider a change of basis in V . Let {v1 , . . . , vm } be

another basis of V which is related to the basis {v1 , . . . , vm } through an invertible mapping S L(V , V ) with

S(vi ) = vi ,

i = 1, . . . , m,

(2.2.9)

vi =

m

cli vl ,

i = 1, . . . , m.

(2.2.10)

l=1

associated with the pairs of the bases

{u1 , . . . , un }

and

{v1 , . . . , vm },

{u1 , . . . , un } and

{v1 , . . . , vm },

(2.2.11)

47

T (uj ) =

m

aij vi =

i=1

m

a ij vi ,

j = 1, . . . , n.

(2.2.12)

i=1

aij =

m

cil a lj ,

i = 1, . . . , m,

j = 1, . . . , n.

(2.2.13)

l=1

A = C A.

(2.2.14)

T (uj ) =

m

a ij vi ,

j = 1, . . . , n.

(2.2.15)

i=1

Then

(S T )(uj ) =

m

a ij S(vi ) =

i=1

m

a ij vi ,

j = 1, . . . , n.

(2.2.16)

i=1

(2.2.14).

An important special case is when U = V with dim(U ) = n. We investigate

how the associated matrices of an element T L(U, U ) with respect to two

bases are related through the transition matrix between the two bases. For this

purpose, let {u1 , . . . , un } and {u 1 , . . . , u n } be two bases of U related through

the mapping R L(U, U ) so that

R(uj ) = u j ,

j = 1, . . . , n.

(2.2.17)

respect to the bases {u1 , . . . , un }, {u 1 , . . . , u n }, respectively. Thus

T (uj ) =

n

aij ui ,

T (u j ) =

i=1

n

a ij u i ,

j = 1, . . . , n.

(2.2.18)

i=1

T (uj ) =

n

i=1

a ij ui ,

j = 1, . . . , n.

(2.2.19)

48

Linear mappings

R T R 1 = T .

(2.2.20)

(2.2.17), that is,

R(uj ) =

n

bij ui ,

j = 1, . . . , n,

(2.2.21)

i=1

then (2.2.20) gives us the manner in which the two matrices A and A are

related:

1 .

A = B AB

(2.2.22)

Definition 2.6 Two matrices A, B F(n, n) are said to be similar if there is

an invertible matrix C F(n, n) such that A = CBC 1 , written as A B.

It is clear that similarity of matrices is an equivalence relation. In other

words, we have A A, B A if A B, and A C if A B and B C,

for A, B, C F(n, n).

Consequently, we see that the matrices associated to a linear mapping of a

finite-dimensional vector space into itself with respect to different bases are

similar.

Exercises

d

as a linear mapping over

dt

P (the vector space of all real polynomials of variable t) given by

dp(t)

, p P.

(2.2.23)

dt

(i) Find the matrix that represents D L(P2 , P1 ) with respect to the

standard bases

D(p)(t) =

1

BP

= {1, t, t 2 },

2

1

BP

= {1, t}

1

(2.2.24)

of P2 , P1 , respectively.

(ii) If the basis of P2 is changed into

2

BP

= {t 1, t + 1, (t 1)(t + 1)},

2

(2.2.25)

1

find the basis transition matrix from the basis BP

into the basis

2

2

B P2 .

49

2

1

and BP

directly first and then obtain it again by using the

BP

2

1

results of (i) and (ii) in view of the relation stated as in (2.2.7).

(iv) If the basis of P1 is changed into

2

BP

= {t 1, t + 1},

1

(2.2.26)

1

2

and BP

for P2 and P1 respectively.

bases BP

2

1

1

2

into BP

.

(v) Find the basis transition matrix from BP

1

1

(vi) Obtain the matrix that represents D L(P2 , P1 ) with respect to

1

2

and BP

directly first and then obtain it again by using the

BP

2

1

results of (i) and (ii) in view of the relation stated as in (2.2.14).

2.2.2 (Continued from Exercise 2.2.1) Now consider D as an element in

L(P2 , P2 ).

1

2

and BP

,

(i) Find the matrices that represent D with respect to BP

2

2

respectively.

1

2

into the basis BP

(ii) Use the basis transition matrix from the basis BP

2

2

of P2 to verify your results in (i) in view of the similarity relation

stated as in (2.2.22).

2.2.3 Let D : Pn Pn be defined by (2.2.23) and consider the linear mapping T : Pn Pn defined by

T (p) = tD(p) p,

p = p(t) Pn .

(2.2.27)

2.2.4 Show that the real matrices

a b

1 a+b+c+d ab+cd

(2.2.28)

and

2 a+bcd abc+d

c d

are similar through investigating the matrix representations of a certain

linear mapping over R2 under the standard basis {e1 , e2 } and the transformed basis {u1 = e1 + e2 , u2 = e1 e2 }, respectively.

2.2.5 Prove that the matrices

ann an2 an1

and

(2.2.29)

a

2n a22 a21

a1n a12 a11

an1 an2 ann

50

Linear mappings

in F(n, n) are similar by realizing them as the matrix representatives of

a certain linear mapping over Fn with respect to two appropriate bases

of Fn .

Let U, V be finite-dimensional vector spaces over a field F and U , V their

dual spaces. For T L(U, V ) and v V , we see that

T (u), v ,

u U,

(2.3.1)

that

u, u = T (u), v ,

u U.

(2.3.2)

u = T (v ).

(2.3.3)

u, T (v ) = T (u), v ,

u U,

v V .

(2.3.4)

show that T is linear.

In fact, let v1 , v2 V . Then (2.3.4) gives us

u, T (vi ) = T (u), vi ,

u U,

i = 1, 2.

(2.3.5)

Thus

u, T (v1 ) + T (v2 ) = T (u), v1 + v2 ,

u U.

(2.3.6)

Besides, for any a F, we have from (2.3.4) that

u, T (av ) = T (u), av = aT (u), v = au, T (v )

= u, aT (v ),

u U,

v V ,

(2.3.7)

Definition 2.7 For any given T L(U, V ), the linear mapping T : V U

defined by (2.3.4) is called the adjoint mapping, or simply adjoint, of T .

51

T (T ) = T .

(2.3.8)

this purpose, let {u1 , . . . , un } and {v1 , . . . , vm } be bases of U and V , and

} the corresponding dual bases, respectively. Take

{u1 , . . . , un } and {v1 , . . . , vm

) F(n, m) such that

A = (aij ) F(m, n) and A = (akl

T (uj ) =

T (vl ) =

m

i=1

n

aij vi ,

j = 1, . . . , n,

(2.3.9)

akl

uk ,

l = 1, . . . , m.

(2.3.10)

k=1

j = 1, . . . , n). In other words, we get A = At . Thus the adjoint of linear

mapping corresponds to the transpose of a matrix.

We now investigate the relation between N (T ), R(T ) and N (T ), R(T ).

Theorem 2.8 Let U, V be finite-dimensional vector spaces and T L(U, V ).

Then N(T ), N(T ), R(T ), R(T ) and their annihilators enjoy the following

relations.

(1) N (T )0 = R(T ).

(2) N(T ) = R(T )0 .

(3) N(T )0 = R(T ).

(4) N(T ) = R(T )0 .

Proof We first prove (2). Let u N (T ). Then T (v ), u = v , T (u) = 0

for any v V . So u R(T )0 . If u R(T )0 , then u , u = 0 for any

u R(T ). So T (v ), u = 0 for any v V . That is, v , T (u) = 0 for

any v V . Hence u N (T ). So (2) is established.

We now prove (1). If u R(T ), there is some v V such that u =

T (v ). So u , u = T (v ), u = v , T (u) = 0 for any u N (T ). This

shows u N(T )0 . Hence R(T ) N (T )0 . However, using (2) and (1.4.31),

we have

dim(N (T )) = dim(R(T )0 ) = dim(U ) dim(R(T ))

= dim(U ) dim(R(T )),

(2.3.11)

52

Linear mappings

dim(R(T )) = dim(U ) dim(N (T )) = dim(N (T )0 ).

(2.3.12)

Finally, (3) and (4) follow from replacing T by T and using T = T in (1)

and (2).

We note that (1) in Theorem 2.8 may also follow from taking the annihilators

on both sides of (2) and applying Exercise 1.4.7. Thus, part (2) of Theorem 2.8

is the core of all the statements there.

We now use Theorem 2.8 to establish the following result.

Theorem 2.9 Let U, V be finite-dimensional vector spaces. For any T

L(U, V ), the rank of T and that of its dual T L(V , U ) are equal.

That is,

r(T ) = r(T ).

(2.3.13)

Proof Using (2) in Theorem 2.8, we have n(T ) = dim(U ) r(T ). However,

applying the rank equation (2.1.23) or Theorem 2.3, we have n(T ) + r(T ) =

dim(U ). Combining these results, we have r(T ) = r(T ).

As an important application of Theorem 2.8, we consider the notion of rank

of a matrix.

Let T L(Fn , Fm ) be defined by (2.1.6) by the matrix A = (aij ) F(m, n).

Then the images of the standard basis vectors e1 , . . . , en are the column vectors

of A. Thus R(T ) is the vector space spanned by the column vectors of A whose

dimension is commonly called the column rank of A, denoted as corank(A).

Thus, we have

r(T ) = corank(A).

(2.3.14)

On the other hand, since the associated matrix of T is At whose column vectors are the row vectors of A. Hence R(T ) is the vector space spanned by the

row vectors of A (since (Fm ) = Fm ) whose dimension is likewise called the

row rank of A, denoted as rorank(A). Thus, we have

r(T ) = rorank(A).

(2.3.15)

the column and row ranks of a matrix coincide, although they have different

meanings.

Since the column and row ranks of a matrix are identical, they are jointly

called the rank of the matrix.

53

Exercises

2.3.1 Let U be a finite-dimensional vector space over a field F and regard a

given element f U as an element in L(U, F). Describe the adjoint of

f , namely, f , as an element in L(F , U ), and verify r(f ) = r(f ).

2.3.2 Let U, V , W be finite-dimensional vector spaces over a field F.

Show that for T L(U, V ) and S L(V , W ) there holds

(S T ) = T S .

2.3.3 Let U, V be finite-dimensional vector spaces over a field F and T

L(U, V ). Prove that for given v V the non-homogeneous equation

T (u) = v,

(2.3.16)

has a solution for some u U if and only if for any v V such that

T (v ) = 0 one has v, v = 0. In particular, the equation (2.3.16) has

a solution for any v V if and only if T is injective. (This statement is

also commonly known as the Fredholm alternative.)

2.3.4 Let U be a finite-dimensional vector space over a field F and a F.

Define T L(U, U ) by T (u) = au where u U . Show that T is then

given by T (u ) = au for any u U .

2.3.5 Let U = F(n, n) and B U . Define T L(U, U ) by T (A) = ABBA

where A U . For f U given by

f (A) = f, A = Tr(A),

A U,

(2.3.17)

determine T (f ).

2.3.6 Let U = F(n, n). Then f (A) = Tr(AB t ) defines an element f U .

(i) Use Mij to denote the element in F(n, n) whose entry in the ith row

and j th column is 1 but all other entries are zero. Verify the formula

Tr(Mij Mkl ) = il j k ,

i, j, k, k = 1, . . . , n.

(2.3.18)

(ii) Apply (i) to show that for any f U there is a unique element

B U such that f (A) = Tr(AB t ).

(iii) For T L(U, U ), defined by T (A) = At , describe T .

Let U, V be two vector spaces over a field F and T L(U, V ). Suppose that

X, Y are subspaces of U, V , respectively, which satisfy the property T (X) Y

54

Linear mappings

linear mapping

T : U/X V /Y,

(2.4.1)

from T naturally.

As before, we use [] to denote a coset in U/X or V /Y . Define T : U/

X V /Y by setting

T ([u]) = [T (u)],

[u] U/X.

(2.4.2)

We begin by showing that this definition does not suffer any ambiguity by

verifying

T ([u1 ]) = T ([u2 ]) whenever [u1 ] = [u2 ].

(2.4.3)

which implies [T (u1 )] = [T (u2 )]. So (2.4.3) follows.

The linearity of T can now be checked directly.

First, let u1 , u2 U . Then, by (2.4.2), we have

T ([u1 ] + [u2 ]) = T ([u1 + u2 ]) = [T (u1 + u2 )] = [T (u1 ) + T (u2 )]

= [T (u1 )] + [T (u2 )] = T ([u1 ]) + T ([u2 ]).

(2.4.4)

T (a[u]) = T ([au]) = [T (au)] = a[T (u)] = a T ([u]).

(2.4.5)

ensure that (2.4.2) gives us a well-defined mapping T from U/X into V /Y for

T L(U, V ). Indeed, let u X. Then [u] = [0]. Thus [T (u)] = T ([u]) =

T ([0]) = [0] which implies T (u) Y .

In summary, we can state the following basic theorem regarding constructing quotient mappings between quotient spaces.

Theorem 2.10 Let X, Y be subspaces of U, V , respectively, and T L(U, V ).

Then (2.4.2) defines a linear mapping T from U/X into V /Y if and only if the

condition

T (X) Y

(2.4.6)

holds.

In the special situation when X = {0} U and Y is an arbitrary subspace

of V , we see that U/X = U any T L(U, V ) induces the quotient mapping

T : U V /Y,

T (u) = [T (u)].

(2.4.7)

55

Exercises

2.4.1 Let V , W be subspaces of a vector space U . Then the quotient mapping

I : U/V U/W induced from the identity mapping I : U U is

well-defined and given by

I([u]V ) = [u]W ,

u U,

(2.4.8)

U/W , respectively. Show that the fact I([0]V ) = [0]W implies that if

[u1 ]V , . . . , [uk ]V are linearly dependent in U/V then [u1 ]W , . . . , [uk ]W

are linearly dependent in U/W .

2.4.2 Let U be a finite-dimensional vector space and V a subspace of U . Use I

to denote the quotient mapping I : U U/V induced from the identity

mapping I : U U given as in (2.4.7). Apply Theorem 2.3 or the rank

equation (2.1.23) to I to reestablish the relation (1.6.12).

2.4.3 Let U, V be finite-dimensional vector spaces over a field and X, Y the

subspaces of U, V , respectively. For T L(U, V ), assume T (X) Y

and use T to denote the quotient mapping from U/X into V /Y induced

from T . Show that r(T ) r(T ).

Let U be a vector space over a field F. In this section, we study the important

special situation when linear mappings are from U into itself. We shall denote

the space L(U, U ) simply by L(U ). Such mappings are also often called linear

transformations. First, we consider some general properties such as invariance

and reducibility. Then we present some examples.

In this subsection, we consider some situations in which the complexity of a

linear mapping may be reduced somewhat.

Definition 2.11 Let T L(U ) and V be a subspace of U . We say that V is an

invariant subspace of T if T (V ) V .

Given T L(U ), it is clear that the null-space N (T ) and range R(T ) of T

are both invariant subspaces of T .

To see how the knowledge about an invariant subspace reduces the complexity of a linear mapping, we assume that V is a nontrivial invariant subspace of

56

Linear mappings

extend it to get a basis of U , say {u1 , . . . , uk , uk+1 , . . . , un }. With respect to

such a basis, we have

T (ui ) =

k

i =1

bi i ui , i = 1, . . . , k, T (uj ) =

n

cj j uj , j = k + 1, . . . , n,

j =1

(2.5.1)

where B = (bi i ) F(k, k) and C = (cj j ) F(n, [n k]). With respect to

this basis, the associated matrix A F(n, n) becomes

B C1

C1

A=

,

= C.

(2.5.2)

0 C2

C2

Such a matrix is sometimes referred to as boxed upper triangular.

Thus, we see that a linear mapping T over a finite-dimensional vector space

U has a nontrivial invariant subspace if and only if there is a basis of U so that

the associated matrix of T with respect to this basis is boxed upper triangular.

For the matrix A given in (2.5.2), the vanishing of the entries in the leftlower portion of the matrix indeed reduces the complexity of the matrix. We

have seen clearly that such a reduction happens because of the invariance

property

T (Span{u1 , . . . , uk }) Span{u1 , . . . , uk }.

(2.5.3)

property

T (Span{uk+1 , . . . , un }) Span{uk+1 , . . . , un },

(2.5.4)

reduces the complexity of the matrix A.

The above investigation motivates the introduction of the concept of

reducibility of a linear mapping as follows.

Definition 2.12 We say that T L(U ) is reducible if there are nontrivial

invariant subspaces V , W of T so that U = V W . In such a situation, we

also say that T may be reduced over the subspaces V and W . Otherwise, if no

such pair of invariant subspaces exist, we say that T is irreducible.

Thus, if U is n-dimensional and T L(U ) may be reduced over the nontrivial subspaces V and W , then we take {v1 , . . . , vk } and {w1 , . . . , wl } to

be any bases of V and W , so that {v1 , . . . , vk , w1 , . . . , wl } is a basis of U .

57

T (vi ) =

k

bi i vi , i = 1, . . . , k, T (wj ) =

i =1

l

cj j wj , j = 1, . . . , l,

j =1

(2.5.5)

where the matrices B = (bi i ) and C = (cj j ) are in F(k, k) and F(l, l),

respectively. Thus, we see that, with respect to such a basis of U , the associated matrix A F(n, n) assumes the form

B 0

A=

,

(2.5.6)

0 C

which takes the form of a special kind of matrices called boxed diagonal

matrices.

Thus, we see that a linear mapping T over a finite-dimensional vector space

U is reducible if and only if there is a basis of U so that the associated matrix

of T with respect to this basis is boxed diagonal.

An important and useful family of invariant subspaces are called

eigenspaces which may be viewed as an extension of the notion of null-spaces.

Definition 2.13 For T L(U ), a scalar F is called an eigenvalue

of T if the null-space N (T I ) is not the zero space {0}. Any nonzero

vector in N(T I ) is called an eigenvector associated with the eigenvalue

and N(T I ) is called the eigenspace of T associated with the eigenvalue , and often denoted as E . The integer dim(E ) is called the geometric

multiplicity of the eigenvalue .

In particular, for A F(n, n), an eigenvalue, eigenspace, and the eigenvectors associated with and geometric multiplicity of an eigenvalue of the

matrix A are those of the A-induced mapping TA : Fn Fn , defined by

TA (x) = Ax, x Fn .

Let be an eigenvalue of T L(U ). It is clear that E is invariant under

T and T = I (I is the identity mapping) over E . Thus, let {u1 , . . . , uk }

be a basis of E and extend it to obtain a basis for the full space U . Then

the associated matrix A of T with respect to this basis takes a boxed lower

triangular form

Ik C1

A=

,

(2.5.7)

0 C2

where Ik denotes the identity matrix in F(k, k).

58

Linear mappings

the associated matrix of a linear mapping.

Theorem 2.14 Let u1 , . . . , uk be eigenvectors associated with distinct eigenvalues 1 , . . . , k of some T L(U ). Then these vectors are linearly

independent.

Proof We use induction on k. If k = 1, the statement of the theorem is already

valid since u1

= 0. Assume the statement of the theorem is valid for k = m 1.

For k = m + 1, consider the relation

c1 u1 + + cm um + cm+1 um+1 = 0,

c1 , . . . , cm , cm+1 F.

(2.5.8)

c1 1 u1 + + cm m um + cm+1 m+1 um+1 = 0.

(2.5.9)

Multiplying (2.5.8) by m+1 and subtracting the result from (2.5.9), we obtain

c1 (1 m+1 )u1 + + cm (m m+1 )um = 0.

(2.5.10)

= cm = 0 since 1 , . . . , m , m+1 are distinct. Inserting this result into

(2.5.8), we find cm+1 = 0. So the proof follows.

As a corollary of the theorem, we immediately conclude that a linear

mapping over an n-dimensional vector space may have at most n distinct

eigenvalues.

If 1 , . . . , k are the distinct eigenvalues of T , then Theorem 2.14 indicates

that the sum of E1 , . . . , Ek is a direct sum,

E1 + + Ek = E1 Ek .

(2.5.11)

(2.5.12)

U = E1 Ek .

(2.5.13)

expressed as a direct sum of mappings

T = 1 I k I,

(2.5.14)

59

the bases of E1 , . . . , Ek to form a basis of U , we see that the associated

matrix A of T becomes diagonal

A = diag{1 In1 , . . . , k Ink },

(2.5.15)

eigenvalues, 1 , . . . , n , then any n eigenvectors correspondingly associated

with these eigenvalues form a basis of U . With respect to this basis, the associated matrix A of T is simply

A = diag{1 , . . . , n }.

We now present some examples as illustrations.

Consider T L(R2 ) defined by

2 1

x1

R2 .

T (x) =

x, x =

1 2

x2

(2.5.16)

(2.5.17)

E1 = Span{(1, 1)t },

E3 = Span{(1, 1)t }.

(2.5.18)

Thus, with respect to the basis {(1, 1)t , (1, 1)t }, the associated matrix of T is

1 0

.

(2.5.19)

0 3

So T is reducible and reduced over the pair of eigenspaces E1 and E3 .

Next, consider S L(R2 ) defined by

0 1

x1

(2.5.20)

R2 .

S(x) =

x, x =

1 0

x2

We show that S has no nontrivial invariant spaces. In fact, if V is one, then

dim(V ) = 1. Let x V such that x

= 0. Then the invariance of V requires

S(x) = x for some R. Inserting this relation into (2.5.20), we obtain

x2 = x1 ,

x1 = x2 .

(2.5.21)

Hence x1

= 0, x2

= 0. Iterating the two equations in (2.5.21), we obtain

2 + 1 = 0 which is impossible. So S is also irreducible.

60

Linear mappings

2.5.2 Projections

In this subsection, we study an important family of reducible linear mappings

called projections.

Definition 2.15 Let V and W be two complementary subspaces of U . That is,

U = V W . For any u U , express u uniquely as u = v + w, v V , w W ,

and define the mapping P : U U by

P (u) = v.

(2.5.22)

We need to check that the mapping P defined in Definition 2.15 is indeed

linear. To see this, we take u1 , u2 U and express them as u1 = v1 +w1 , u2 =

v2 + w2 , for unique v1 , v2 V , w1 , w2 W . Hence P (u1 ) = v1 , P (u2 ) = v2 .

On the other hand, from u1 +u2 = (v1 +v2 )+(w1 +w2 ), we get P (u1 +u2 ) =

v1 + v2 . Thus P (u1 + u2 ) = P (u1 ) + P (u2 ). Moreover, for any a F and

u U , write u = v + w for unique v V , w W . Thus P (u) = v and

au = av + aw give us P (au) = av = aP (u). So P L(U ) as claimed.

From Definition 2.15, we see that for v V we have P (v) = v. Thus

P (P (u)) = P (u) for any u U . In other words, the projection P satisfies the

special property P P = P . For notational convenience, we shall use T k to

denote the k-fold composition T T for any T L(U ). With this notation,

we see that a projection P satisfies the condition P 2 = P . Any linear mapping

satisfying such a condition is called idempotent.

We now show that being idempotent characterizes a linear mapping being a

projection.

Theorem 2.16 A linear mapping T over a vector space U is idempotent if and

only if it is a projection. More precisely, if T 2 = T and

V = N (I T ),

W = N (T ),

(2.5.23)

Proof

claim

R(T ) = N (I T ),

R(I T ) = N (T ).

(2.5.24)

(I T )u = (I T )(T (x)) = T (x) T 2 (x) = 0. So u N (I T ). If

u N(I T ), then u = T (u) which implies u R(T ) already. If u

61

T (I T )(x) = (T T 2 )(x) = 0 and u N (T ). If u N (T ), then T (u) = 0

which allows us to rewrite u as u = (I T )(u). Thus u R(I T ).

Now consider the identity

I = T + (I T ).

(2.5.25)

V W . Using the definition of V , W in (2.5.23), we get T (u) u = 0 and

T (u) = 0. Thus u = 0. So U = V W .

Finally, for any u U , express u as u = v + w for some unique v V

and w W and let P L(U ) be the projection of U onto V along W . Then

(2.5.23) indicates that T (v) = v and T (w) = 0. So T (u) = T (v + w) =

T (v) + T (w) = v = P (u). That is, T = P and the proof follows.

An easy but interesting consequence of Theorem 2.16 is the following.

Theorem 2.17 Let V , W be complementary subspaces of U . Then P L(U )

is the projection of U onto V along W if and only if I P is the projection of

U onto W along V .

Proof

Since

(I P )2 = I 2P + P 2 ,

(2.5.26)

follows from (2.5.23).

From (2.5.23), we see that when T L(U ) is idempotent, it is reducible

over the null-spaces N (T ) and N (I T ). It may be shown that the converse

is true, which is assigned as an exercise.

Recall that the null-spaces N (T ) and N (I T ), if nontrivial, of an idempotent mapping T L(U ), are simply the eigenspaces E0 and E1 of T associated

with the eigenvalue 0 and 1, respectively. So, with respect to a basis of U consisting of the vectors in any bases of E0 and E1 , the associated matrix A of T

is of the form

0k 0

A=

,

(2.5.27)

0 Il

where 0k is the zero matrix in F(k, k), k = dim(N (T )) = n(T ), l =

dim(N(I T )) = r(T ).

62

Linear mappings

T reduces over E1 , . . . , Ek . Use Pi to denote the projection of U onto Ei

along

E1 Ei Ek ,

(2.5.28)

where []

represent T as

T = 1 P1 + + k Pk .

(2.5.29)

Consider the vector space Pn of the set of all polynomials of degrees up to n

with coefficients in a field F and the differentiation operator

D=

d

so that D(a0 + a1 t + + an t n ) = a1 + 2a2 t + + nan t n1 .

dt

(2.5.30)

example of nilpotent mappings we now study.

Definition 2.18 Let U be a finite-dimensional vector space and T L(U ).

We say that T is nilpotent if there is an integer k 1 such that T k = 0. For a

nilpotent mapping T L(U ), the smallest integer k 1 such that T k = 0 is

called the degree or index of nilpotence of T .

The same definition may be stated for square matrices.

Of course, the degree of a nonzero nilpotent mapping is always at least 2.

Definition 2.19 Let U be a vector space and T L(U ). For any nonzero

vector u U , we say that u is T -cyclic if there is an integer m 1 such that

T m (u) = 0. The smallest such integer m is called the period of u under or

relative to T . If each vector in U is T -cyclic, T is said to be locally nilpotent.

It is clear that a nilpotent mapping must be locally nilpotent. In fact, these

two notions are equivalent in finite dimensions.

Theorem 2.20 If U is finite-dimensional, then a mapping T L(U ) is nilpotent if and only if it is locally nilpotent.

63

{u1 , . . . , un } be any basis of U and m1 , . . . , mn be the periods of u1 , . . . , un ,

respectively. Set

k = max{m1 , . . . , mn }.

(2.5.31)

We now show how to use a cyclic vector to generate an invariant subspace.

Theorem 2.21 For T L(U ), let u U be a nonzero cyclic vector under T

of period m. Then the vectors

u, T (u), . . . , T m1 (u),

(2.5.32)

are linearly independent so that they span an m-dimensional T -invariant subspace of U . In particular, m dim(U ).

Proof We only need to show that the set of vectors in (2.5.32) are linearly

independent. If m = 1, the statement is self-evident. So we now assume m 2.

Let c0 , . . . , cm1 F so that

c0 u + c1 T (u) + + cm1 T m1 (u) = 0.

(2.5.33)

= 0. Let l 0 be the

smallest such integer. Then l m 2 so that (2.5.33) gives us

1

T l (u) = (cl+1 T l+1 (u) + + cm1 T m1 (u)).

cl

This relation implies that

T m1 (u) = T l+([m1]l) (u)

1

= (cl+1 T m (u) + + cm1 T 2(m1)+l (u)) = 0,

cl

(2.5.34)

(2.5.35)

= 0.

We can apply the above results to study the reducibility of a nilpotent

mapping.

Theorem 2.22 Let U be a finite-dimensional vector space and T L(U )

a nilpotent mapping of degree k. If T is reducible over V and W , that is,

U = V W , T (V ) V , and T (W ) W , then

k max{dim(V ), dim(W )}.

In particular, if k = dim(U ), then T is irreducible.

(2.5.36)

64

Linear mappings

respectively. Then, by Theorem 2.21, the periods of these vectors cannot

exceed the right-hand side of (2.5.36). Thus (2.5.36) follows from (2.5.31).

If k = dim(U ), then dim(V ) = k and dim(W ) = 0 or dim(V ) = 0 and

dim(W ) = k, which shows that T is irreducible.

Let dim(U ) = n and T L(U ) be nilpotent of degree n. Then there is a

vector u U of period n. Set

U1 = Span{T n1 (u)}, . . . , Un1 = Span{T (u), . . . , T n1 (u)}.

(2.5.37)

Un1 and dim(U1 ) = 1, . . . , dim(Un1 ) = n 1. Thus, T has invariant

subspaces of all possible dimensions, but yet is irreducible.

Again, let dim(U ) = n and T L(U ) be nilpotent of degree n. Choose a

cyclic vector u U of period n and set

u1 = u, . . . , un = T n1 (u).

(2.5.38)

T (ui ) = ui+1 ,

i = 1, . . . , n 1,

T (un ) = 0.

(2.5.39)

Hence, if we use S = (sij ) to denote the matrix of T with respect to the basis

B, then

1, j = i 1, i = 2, . . . , n,

(2.5.40)

sij =

0, otherwise.

That is,

S=

1

..

.

0

..

.

..

.

..

.

0

..

.

(2.5.41)

u1 = T n1 (u), . . . , un = u.

(2.5.42)

0 1

0 0

0 0

1 0

.. . .

..

..

..

.

.

.

S= .

.

.

.

..

.

.

.

0 1

0

65

(2.5.43)

shift matrix, which clearly indicates that T is nilpotent of degree n.

In general, we shall see later that, if T L(U ) is nilpotent and U is finitedimensional, there are T -invariant subspaces U1 , . . . , Ul such that

U = U1 Ul ,

dim(U1 ) = n1 , . . . , dim(Ul ) = nl ,

(2.5.44)

choose T -cyclic vectors of respective periods n1 , . . . , nl , say u1 , . . . , ul , and

use the vectors

u1 , , T n1 1 (u1 ), . . . , ul , . . . , T nl 1 (ul ),

(2.5.45)

S1 0 0

..

..

.

0

.

0

,

S=

(2.5.46)

..

..

..

...

.

.

.

0

0 Sl

where each Si is an ni ni shift matrix (i = 1, . . . , l).

Let k be the degree of T or the matrix S. From (2.5.46), we see clearly that

the relation

k = max{n1 , . . . , nl }

(2.5.47)

holds.

In this section, we have seen that it is often useful to consider various powers of

a linear mapping T L(U ) as well as some linear combinations of appropriate

66

Linear mappings

polynomials of linear mappings. Specifically, for any p(t) P with the form

p(t) = an t n + + a1 t + a0 ,

a0 , a1 , . . . , an F,

(2.5.48)

p(T ) = an T n + + a1 T + a0 I.

(2.5.49)

variable t can be carried over correspondingly to those over polynomials in

the powers of a linear mapping T over the vector space U . For example, if

f, g, h P satisfy the relation f (t) = g(t)h(t), then

f (T ) = g(T )h(T ),

(2.5.50)

because the powers of T follow the same rule as the powers of t. That is,

T k T l = T k+l , k, l N.

For T L(U ), let F be an eigenvalue of T . Then, for any p(t) P

given as in (2.5.48), p() is an eigenvalue of p(T ). To see this, we assume that

u U is an eigenvector of T associated with . We have

p(T )(u) = (an T n + + a1 T + a0 I )(u)

= (an n + + a1 + a0 )(u) = p()u,

(2.5.51)

as anticipated.

If p P is such that p(T ) = 0, then we say that T is a root of p. Hence, if

T is a root of p, any eigenvalue of T must also be a root of p, p() = 0, by

virtue of (2.5.51).

For example, an idempotent mapping is a root of the polynomial p(t) =

t 2 t, and a nilpotent mapping is a root of a polynomial of the form p(t) = t m

(m N). Consequently, the eigenvalues of an idempotent mapping can only

be 0 and 1, and that of a nilpotent mapping, 0.

For T L(U ), let 1 , . . . , k be some distinct eigenvalues of T

such that T reduces over E1 , . . . , Ek . Then T must be a root of the

polynomial

p(t) = (t 1 ) (t k ).

(2.5.52)

i = 1, . . . , k. Then

p(T )u =

k

67

p(T )ui

i=1

k

(T 1 I ) (T

i I ) (T k I )(T i I )ui

i=1

= 0,

(2.5.53)

It is clear that the polynomial p(t), given in (2.5.52), is the lowest-degree

nontrivial polynomial among all polynomials for which T is a root, which is

often referred to as the minimal polynomial of T , a basic notion to be detailed

later.

Exercises

2.5.1 Let S, T L(U ) and S T = T S. Show that the null-space N (S)

and range R(S) of S are invariant subspaces under T . In particular, an

eigenspace of T associated with eigenvalue is seen to be invariant

under T when taking S = T I .

2.5.2 Let S, T L(U ). Prove that the invertibility of the mappings I + S T

and I + T S are equivalent by showing that if I + S T is invertible

then so is I + T S with

(I + T S)1 = I T (I + S T )1 S.

(2.5.54)

by

cos sin

x1

R (x) =

x, x =

R2 . (2.5.55)

sin

cos

x2

Show that R has no nontrivial invariant subspace in R2 unless = k

(k Z).

2.5.4 Consider the vector space P of the set of all polynomials over a field F.

Define T L(P) by setting

T (p) = tp(t),

p(t) = a0 + a1 t + + an t n P.

(2.5.56)

2.5.5 Let A = (aij ) F(n, n) be invertible and satisfy

n

j =1

aij = a F,

i = 1, . . . , n.

(2.5.57)

68

Linear mappings

(i) Show that a must be an eigenvalue of A and that a

= 0.

(ii) Show that if A1 = (bij ) then

n

bij =

j =1

1

,

a

i = 1, . . . , n.

(2.5.58)

2.5.6 Let S, T L(U ) and assume that S, T are similar. That is, there is an

invertible element R L(U ) such that S = R T R 1 . Show that

F is an eigenvalue of S if and only if it is an eigenvalue of T .

2.5.7 Let U = F(n, n) and T L(U ) be defined by taking matrix transpose,

T (A) = At ,

A U.

(2.5.59)

Prove that T is reducible over the pair E1 and E1 . Can T have an

eigenvalue different from 1? What is the minimal polynomial of T ?

2.5.8 Let U be a vector space over a field F and T L(U ). If 1 , 2

F are distinct eigenvalues and u1 , u2 are the respectively associated

eigenvectors of T , show that, for any nonzero a1 , a2 F, the vector

u = a1 u1 + a2 u2 cannot be an eigenvector of T .

2.5.9 Let U be a finite-dimensional vector space over a field F and T

L(U ). Assume that T is of rank 1.

(i) Prove that T must have an eigenvalue, say , in F.

(ii) If

= 0, show that R(T ) = E .

(iii) Is the statement in (ii), i.e. R(T ) = E , valid when = 0?

2.5.10 Let A C(n, n) be unitary. That is, AA = A A = In . Show that any

eigenvalue of A is of unit modulus.

2.5.11 Show that, if T L(U ) is reducible over the pair of null-spaces N (T )

and N(I T ), then T is idempotent.

2.5.12 Let S, T L(U ) be idempotent. Show that

(i) R(S) = R(T ) if and only if S T = T , T S = S.

(ii) N(S) = N (T ) if and only if S T = S, T S = T .

2.5.13 Let T L(R3 ) be defined by

1 1 1

T (x) = 1 1 1 x,

1

3 1

x1

x = x2 R3 .

(2.5.60)

x3

obtaining a matrix A R(3, 3) that represents S. That is, S(x) = Ax

for x R3 .

69

Let u1 , u2 and u1 , u2 be independent vectors in U and U , respectively.

Define a linear mapping T L(U ) by setting

T (u) = u, u1 u1 + u, u2 u2 ,

u U.

(2.5.61)

projection.

(ii) When T is a projection, determine subspaces V and W of U in

terms of u1 , u2 , u1 , u2 so that T projects U onto V along W .

2.5.15 We may slightly generalize the notion of idempotent mappings to a

mapping T L(U ) satisfying

T 2 = aT ,

T

= 0,

a F,

a

= 0, 1.

(2.5.62)

N(T ) and N (aI T ).

2.5.16 Let T L(U ) and 1 , . . . , k be some distinct eigenvalues of T

such that T is reducible over the eigenspaces E1 , . . . , Ek . Show that

1 , . . . , k are all the eigenvalues of T . In other words, if is any

eigenvalue of T , then is among 1 , . . . , k .

2.5.17 Is the differential operator D : Pn Pn reducible? Find an element

in Pn that is of period n + 1 under D and use it to obtain a basis of Pn .

2.5.18 Let and be nonzero elements in F(n, 1). Then A = t F(n, n).

(i) Prove that the necessary and sufficient condition for A to be nilpotent is t = 0. If A is nilpotent, what is the degree of A?

(ii) Prove that the necessary and sufficient condition for A to be idempotent is t = 1.

2.5.19 Show that the linear mapping T L(P) defined in (2.5.56) and the

differential operator D L(P) satisfy the identity D T T D = I .

2.5.20 Let S, T L(U ) satisfy S T T S = I . Establish the identity

S k T T S k = kS k1 ,

k 2.

(2.5.63)

an idempotent mapping satisfying the equation

T 3 + T 2I = 0.

(2.5.64)

Show that T = I .

2.5.22 Let U be a finite-dimensional vector space over a field F and T L(U )

satisfying

T3 = T

so that 1 cannot be the eigenvalues of T . Show that T = 0.

(2.5.65)

70

Linear mappings

L(U ) satisfying S T . For any p P, show that p(S) p(T ).

2.5.24 Let U be a finite-dimensional vector space over a field F and T

L(U ). For any p P, show that p(T ) = p(T ).

2.5.25 Let U be an n-dimensional vector space over a field F, T L(U )

nilpotent of degree k 2, and n(T ) = l 2. Assume k + l = n + 1.

(i) Show that there are subspaces V and W of U where

V = Span{u, T (u), . . . , T k1 (u)},

(2.5.66)

with u U a T -cyclic vector of period k, and W is an (l 1)dimensional subspace of N (T ), such that T is reducible over the

pair V and W .

(ii) Describe R(T ) and determine r(T ).

2.5.26 Let U be a finite-dimensional vector space over a field F and S, T

L(U ).

(i) Show that if S is invertible and T nilpotent, S T must also be

invertible provided that S and T commute, S T = T S.

(ii) Find an example to show that the condition in (i) that S and T

commute cannot be removed.

2.5.27 Let U be a finite-dimensional vector space, T L(U ), and V a subspace of U which is invariant under T . Show that if U = R(T ) V

then V = N (T ).

In this section, we begin by considering general linear mappings between

arbitrary finite-dimensional normed vector spaces. We then concentrate on

mappings from a normed space into itself.

of linear mappings

Let U, V be finite-dimensional vector spaces over R or C and U , V

norms on U, V , respectively. Assume that B = {u1 , . . . , un } is any basis of U .

n

For any u U , write u as u =

ai ui where a1 , . . . , an are the coordinates

i=1

71

" n

"

n

"

"

"

"

ai ui "

|ai |T (ui )V

T (u)V = "T

"

"

i=1

i=1

V

n

max {T (ui )V }

|ai |

1in

i=1

1in

(2.6.1)

where we have used the fact that norms over a finite-dimensional space are all

equivalent. This estimate may also be restated as

T (u)V

C,

uU

u U,

u

= 0.

(2.6.2)

of a linear mapping T L(U, V ) as follows.

Definition 2.23 For T L(U, V ), we define the norm of T , induced from the

respective norms of U and V , by

T (u)V

T = sup

u U, u

= 0 .

(2.6.3)

uU

To show that (2.6.3) indeed defines a norm for the space L(U, V ), we need

to examine that it fulfills all the properties required of a norm. To this end, we

note from (2.6.3) that

T (u)V T uU ,

u U.

(2.6.4)

(S + T )(u)V S(u)V + T (u)V (S + T )uU ,

(2.6.5)

L(U, V ).

Let a be any scalar. Then aT (u)V = |a|T (u)V for any u U . Hence

aT (u)V

u U, u

= 0

aT = sup

uU

T (u)V

= |a| sup

u U, u

= 0 = |a|T ,

(2.6.6)

uU

which indicates that homogeneity follows.

Finally, it is clear that T = 0 implies T = 0.

72

Linear mappings

space with the norm W . Then S T L(U, W ) and it follows from (2.6.4)

that

(S T )(u)W ST (u)V ST uU ,

u U.

(2.6.7)

Consequently, we get

S T ST .

(2.6.8)

In the rest of the section, we will focus on mappings from U into itself. We

note that, for the identity mapping I L(U ), it is clear that I = 1.

Let U be a finite-dimensional vector space with norm . It has been seen

that the space L(U ) may be equipped with an induced norm which may also be

denoted by since there is no risk of confusion. The availability of a norm of

L(U ) allows one to perform analysis on L(U ) so that a deeper understanding

of L(U ) may be achieved.

As an illustration, in this subsection, we will characterize invertibility of

linear mappings by using the norm.

Theorem 2.24 Let U be a finite-dimensional normed space. An element T

L(U ) is invertible if and only if there is a constant c > 0 such that

T (u) cu,

u U.

(2.6.9)

is invertible. Conversely, assume that T is invertible and T 1 L(U ) is its

inverse. Then 1 = I = T 1 T T 1 T implies that the norm

of an invertible mapping can never be zero. Thus, for any u U , we have

u = (T 1 T )(u) T 1 T (u) or T (u) (T 1 )1 u, u U ,

which establishes (2.6.9).

We now show that invertibility is a generic property for elements in L(U ).

Theorem 2.25 Let U be a finite-dimensional normed space and T L(U ).

(1) For any > 0 there exists an invertible element S L(U ) such that

S T < . This property says that the subset of invertible mappings in

L(U ) is dense in L(U ) with respect to the norm of L(U ).

73

(2) If T L(U ) is invertible, then there is some > 0 such that S L(U ) is

invertible whenever S satisfies S T < . This property says that the

subset of invertible mappings in L(U ) is open in L(U ) with respect to the

norm of L(U ).

Proof For any scalar , consider S = T I . If dim(U ) = n, there are at

most n possible values of for which S is not invertible. Now T S =

||I = ||. So for any > 0, there is a scalar , || < , such that S is

invertible. This proves (1).

We next consider (2). Let T L(U ) be invertible. Then (2.6.9) holds for

some c > 0. Let S L(U ) be such that S T < for some > 0. Then,

for any u U , we have

cu T (u) = (T S)(u) + S(u)

T Su + S(u) u + S(u), u U,

(2.6.10)

of Theorem 2.24 that S is invertible when S T < . Hence (2) follows

as well.

Let T L(U ). For a positive integer m, we consider Tm L(U ) given by

Tm =

m

1 k

T ,

k!

(2.6.11)

k=0

Tl Tm

m

T k

.

k!

(2.6.12)

k=l+1

m

1 k

T

m

k!

lim

(2.6.13)

k=0

1 k

e =

T ,

k!

T

k=0

(2.6.14)

74

Linear mappings

S, T L(U ) are commutative, we can verify the formula

eS eT eS eT = eS+T .

(2.6.15)

mapping T L(U ) is invertible. In fact, the relation (2.6.15) indicates that

(eT )1 = eT .

(2.6.16)

(t) = etT (t R), we have

(1)

(s)

(t) =

(s + t), s, t R,

(2)

(0) = I ,

and we say that

: R L(U ) defines a one-parameter group.

Furthermore, we also have

1

1

(

(t + h)

(t)) = (e(t+h)T etT )

h

h

1

1

= (ehT I )etT = etT (ehT I )

h

h

hk

=T

T k etT

(k + 1)!

k=0

hk

= etT

T k T , h R, h

= 0. (2.6.17)

(k + 1)!

k=0

1

(

(t + h)

(t)) = T etT = etT T .

h0 h

lim

(2.6.18)

(t) =

d tT

e = T etT = etT T = T

(t) =

(t)T ,

dt

t R.

(2.6.19)

In particular,

(0) = T , which intuitively says that T is the initial rate of

change of the one-parameter group

. One also refers to this relationship as

T generates

or T is the generator of

.

The relation (2.6.19) suggests that it is legitimate to differentiate the series

1 k k

(t) =

t T term by term:

k!

k=0

d 1 k k 1 d k k

1 k k

t T =

(t )T = T

t T ,

dt

k!

k! dt

k!

k=0

k=0

k=0

t R.

(2.6.20)

75

functions of T involving the exponentials of T may also be introduced accordingly. For example,

$

1# T

e + eT ,

2

$

1 # iT

cos T =

e + eiT ,

2

cosh T =

$

1# T

e eT ,

2

$

1 # iT

e eiT ,

sin T =

2i

sinh T =

(2.6.21)

(2.6.22)

are all well defined and enjoy similar properties of the corresponding classical

functions.

The matrix version of the discussion here can also be easily formulated analogously and is omitted.

Exercises

2.6.1 Let U be a finite-dimensional normed space and {Tn } L(U ) a

sequence of invertible mappings that converges to a non-invertible

mapping T L(U ). Show that Tn1 as n .

2.6.2 Let U and V be finite-dimensional normed spaces with the norms U

and V , respectively. For T L(U, V ), show that the induced norm

of T may also be evaluated by the expression

T = sup{T (u)V | u U, uU = 1}.

(2.6.23)

x1

.

n

.. R . (2.6.24)

xn

Use the norm for both Rm and Rn , correspondingly, and denote

by T the induced norm of the linear mapping T given in (2.6.24).

Show that

|aij | .

T = max

(2.6.25)

1im

j =1

2.6.4 Let T be the linear mapping defined in (2.6.24). Show that, if we use

the norm 1 for both Rm and Rn , correspondingly, and denote by

T 1 the induce norm of T , then

76

Linear mappings

T 1 = max

1j n

m

|aij | .

(2.6.26)

i=1

2.6.5 Let U be a finite-dimensional normed space over R or C and use N to

denote the subset of L(U ) consisting of all nilpotent mappings. Show

that N is not an open subset of L(U ).

2.6.6 Let U be a finite-dimensional normed space and T L(U ). Prove that

eT eT .

t

2.6.7 Let A R(n, n). Show that (eA )t = eA and that, if A is skewsymmetric, At = A, then eA is orthogonal.

2.6.8 Let A C(n, n). Show that (eA ) = eA and that, if A is antiHermitian, A = A, then eA is unitary.

2.6.9 Let A R(n, n) and consider the initial value problem of the following

system of differential equations

x1 (t)

.

x = x(t) =

.. ;

dx

= Ax,

dt

x1,0

.

x(0) = x0 =

.. .

xn (t)

xn,0

(2.6.27)

(t) = etA , the solution

of the problem (2.6.27) is simply given by x =

(t)x0 .

(ii) Moreover, use x(i) (t) to denote the solution of (2.6.27) when x0 =

ei where {e1 , . . . , en } is the standard basis of Rn , i = 1, . . . , n.

Show that

(t) = etA is the n n matrix with x(1) (t), . . . , x(n) (t)

as the n corresponding column vectors,

(2.6.28)

(This result provides a practical method for computing the matrix exponential, etA , which may also be viewed as the solution of the matrixvalued initial value problem

dX

= AX,

dt

X(0) = In ,

X R(n, n).)

: R R(2, 2) defined by

cos t sin t

(t) =

, t R.

sin t

cos t

(2.6.29)

(2.6.30)

77

(t) is a one-parameter group.

(ii) Find the generator, say A R(2, 2), of

(t).

(iii) Compute etA directly by the formula

etA =

k

t

k=0

k!

Ak

(2.6.31)

and verify

(t) = etA .

(iv) Use the practical method illustrated in Exercise 2.6.9 to obtain

the matrix exponential etA through solving two appropriate initial

value problems as given in (2.6.27).

2.6.11 For the functions of T L(U ) defined in (2.6.21) and (2.6.22), establish the identities

cosh2 T sinh2 T = I,

cos2 T + sin2 T = I.

(2.6.32)

d

d

(sinh tT ) = T cosh tT ,

(cosh tT ) = T sinh tT .

dt

dt

2.6.13 For T L(U ), establish the formulas

d

(sin tT ) = T cos tT ,

dt

d

(cos tT ) = T sin tT .

dt

(2.6.33)

(2.6.34)

3

Determinants

linear algebra the determinants. First we discuss some motivational examples. Next we present the definition and basic properties of determinants. Then

we study some applications of determinants.

We now present some examples occurring in geometry, algebra, and topology

that use determinants as a natural underlying computational tool.

Let u = (a1 , a2 ) and v = (b1 , b2 ) be nonzero vectors in R2 . We consider

the area of the parallelogram formed from using these two vectors as adjacent

edges. First we may express u in polar coordinates as

u = (a1 , a2 ) = u(cos , sin ).

(3.1.1)

Thus, we may easily resolve the vector v along the direction of u and the

direction perpendicular to u as follows

%

# %

&$

&

, sin

v = (b1 , b2 ) = c1 (cos , sin ) + c2 cos

2

2

(3.1.2)

= (c1 cos c2 sin , c1 sin c2 cos ), c1 , c2 R.

Here c2 may be interpreted as the length of the vector in the resolution that is

taken to be perpendicular to u. Hence, from (3.1.2), we can read off the result

c2 = (b2 cos b1 sin ) = |b2 cos b1 sin |.

78

(3.1.3)

79

Therefore, using (3.1.3) and then (3.1.1), we see that the area of the parallelogram under consideration is given by

= c2 u = |u cos b2 u sin b1 | = |a1 b2 a2 b1 |.

(3.1.4)

Thus we see that the quantity a1 b2 a2 b1 formed from the vectors (a1 , a2 )

and (b1 , b2 ) stands out, that will be called the determinant of the matrix

a1 a2

A=

,

(3.1.5)

b1 b2

written as det(A) or denoted by

a

1

b1

a2

.

b2

(3.1.6)

Since det(A) = , it is also referred to as the signed area of the parallelogram formed by the vectors (a1 , a2 ) and (b1 , b2 ).

We now consider volume. We shall apply some vector algebra over R3 to

facilitate our discussion.

We use and to denote the usual dot and cross products between vectors

in R3 . We use i, j, k to denote the standard mutually orthogonal unit vectors in

R3 that form a right-hand system. For any vectors

u = (a1 , a2 , a3 ) = a1 i + a2 j + a3 k,

v = (b1 , b2 , b3 ) = b1 i + b2 j + b3 k,

(3.1.7)

in R3 , we know that

u v = (a2 b3 a3 b2 )i (a1 b3 a3 b2 )j + (a1 b2 a2 b1 )k

(3.1.8)

is perpendicular to both u and v and u v gives us the area of the parallelogram formed from using u, v as two adjacent edges, which generalizes the

preceding discussion in R2 . To avoid the trivial situation, we assume u and v

are linearly independent. So u v

= 0 and

R3 = Span{u, v, u v}.

(3.1.9)

express w as

w = au + bv + c(u v),

a, b, c R.

(3.1.10)

From the geometry of the problem, we see that the volume of the parallelepiped

formed from using u, v, w as adjacent edges is given by

= u vc(u v) = |c|u v2

(3.1.11)

80

Determinants

because c(u v) is the height of the parallelepiped, with the bottom area

u v.

From (3.1.10), we have

w (u v) = cu v2 .

(3.1.12)

= |w (u v)|

= |c1 (a2 b3 a3 b2 ) c2 (a1 b3 a3 b2 ) + c3 (a1 b2 a2 b1 )|.

(3.1.13)

the matrix

c1 c2 c3

(3.1.14)

A = a1 a2 a3 ,

b1

b2

b3

c1 c2 c3

det(A) = a1 a2 a3

b b b

1

In view of the 2 2 determinants already defined, we may rewrite (3.1.15)

in the decomposed form

c1 c2 c3

a a

a a

a a

3

3

2

2

1

1

c2

+ c3

.

a1 a2 a3 = c1

b2 b3

b1 b3

b1 b2

b b b

1

(3.1.16)

Next we remark that a more standard motivational example for determinants is

Cramers rule or Cramers formulas for solving systems of linear equations.

For example, consider the 2 2 system

a1 x1 + a2 x2 = c1 ,

(3.1.17)

b1 x1 + b2 x2 = c2 ,

where a1 , a2 , b1 , b2 , c1 , c2 F. Multiplying the first equation by b1 , the second equation by a1 , and subtracting, we have

(a1 b2 a2 b1 )x2 = a1 c2 b1 c1 ;

(3.1.18)

81

we have

(a1 b2 a2 b1 )x1 = b2 c1 a2 c2 .

(3.1.19)

Thus, with the notation of determinants and in view of (3.1.18) and (3.1.19),

we may express the solution to (3.1.17) elegantly as

c a

a c

2

1

1 1

a a

c2 b2

b1 c2

1

2

, x2 =

, if

x1 =

(3.1.20)

= 0.

a a

a a

b1 b2

2

2

1

1

b1 b2

b1 b2

The extension of these formulas to 3 3 systems will be assigned as an

exercise.

Let f be a real-valued continuously differentiable function over R that maps

the closed interval [, ] ( < ) into [a, b] (a < b) so that the boundary

points are mapped into boundary points as well,

f : {, } {a, b}.

(3.1.21)

The function f maps the interval [, ] to cover the interval [a, b].

At a point t0 [, ] where the derivative of f is positive, f (t0 ) > 0, f

maps a small interval around t0 onto a small interval around f (t0 ) and preserves the orientation (from left to right) of the intervals; if f (t0 ) < 0, it

reverses the orientation. If f (t0 )

= 0, we say that t0 is a regular point of f .

For c [a, b], if f (t)

= 0 for any t f 1 (c), we call c a regular value of

f . It is clear that, f 1 (c) is a finite set when c is a regular value of f . If c is a

regular value of f , we define the integer

N (f, c) = N + (f, c) N (f, c),

(3.1.22)

where

N (f, c) = the number of points in f 1 (c) where f > 0.

(3.1.23)

If f (t)

= 0 for any t (, ), then N (f, c) = 1 and N (f, c) = 0

according to f (t) > 0 (t (, )), which leads to N (f, c) = 1 according

to f (t) > 0 (t (, )). In particular, N (f, c) is independent of c.

If f = 0 at some point, such a point is called a critical point of f . We

assume further that f has finitely many critical points in (, ), say t1 , . . . , tm

82

Determinants

if c is a regular value of f , we have

N (f, c) = the number of intervals (ti , ti+1 ) containing f 1 (c)

and satisfying f (t) > 0, t (ti , ti+1 ), i = 0, . . . , m.

(3.1.24)

Thus, we have seen that, although both N + (f, c) and N (f, c) may depend

on c, the quantity N (f, c) as given in (3.1.22) does not depend on c and is seen

to satisfy

f () = a, f () = b,

1,

N(f, c) =

(3.1.25)

1,

f () = b, f () = a,

0,

f () = f ().

This quantity may summarily be rewritten into the form of an integral

1

1

N(f, c) =

f (t) dt,

(3.1.26)

(f () f ()) =

ba

ba

and interpreted to be the number count for the orientation-preserving times

minus the orientation-reversing times the function f maps the interval [, ]

to cover the interval [a, b]. The advantage of using the integral representation

for N(f, c) is that it is clearly independent of the choice of the regular value

c. Indeed, the right-hand side of (3.1.26) is well defined for any differentiable

function f not necessarily having only finitely many critical points and the

specification of a regular value c for f becomes irrelevant. In fact, the integral

on the right-hand side of (3.1.26) is the simplest topological invariant called

the degree of the function f , which may now be denoted by

1

deg(f ) =

f (t) dt.

(3.1.27)

ba

The word topological is used to refer to the fact that a small alteration of f

cannot perturb the value of deg(f ) since deg(f ) may take only integer values and the right-hand side of (3.1.27), however, relies on the derivative of f

continuously.

As a simple application, we note that it is not hard to see that for any c

[a, b] the equation f (t) = c has at least one solution when deg(f )

= 0.

We next extend our discussion of topological invariants to two-dimensional

situations.

Let and C be two closed differentiable curves in R2 oriented counterclockwise and let

u:C

(3.1.28)

83

over an interval discussed above, we may express the number count for the

orientation-preserving times minus the orientation-reversing times u maps the

curve to cover the curve C in the form of a line integral,

1

du,

(3.1.29)

deg(u) =

|C|

where |C| denotes the length of the curve C and is the unit tangent vector

along the positive direction of C.

In the special situation when C = S 1 (the unit circle in R2 centered at the

origin), we write u as

u = (f, g),

f 2 + g 2 = 1,

(3.1.30)

= (g, f ).

(3.1.31)

over the interval [, ]. Then, inserting (3.1.30) and (3.1.31) into (3.1.29) and

using |S 1 | = 2, we have

1

(g, f ) (f , g ) dt

deg(u) =

2

1

=

(f g gf ) dt

2

1

f g

=

(3.1.32)

dt.

2 f g

Thus we see that the concept of determinant arises naturally again.

Let v be a vector field over R2 . Let be a closed curve in R2 where v

= 0.

Then

1

v =

v

(3.1.33)

v

defines a map from into S 1 . The index of the vector field v along the curve

is then defined as

ind(v| ) = deg(v ),

(3.1.34)

As an example, we consider a vector field v over R2 given by

v(x, y) = (x 2 y 2 , 2xy),

(x, y) R2 .

(3.1.35)

= (0, 0). So for any closed curve

not intersecting the origin, the quantity ind(v| ) is well defined.

84

Determinants

Let SR1 denote the circle of radius R > 0 in R2 centered at the origin. We

may parameterize SR1 by the polar angle : x = R cos , y = R sin ,

[0, 2 ]. With (3.1.35), we have

1 2

(x y 2 , 2xy)

R2

= (cos2 sin2 , 2 cos sin ) = (cos 2, sin 2 ).

vS 1 =

R

2

1

cos 2

ind(v|S 1 ) = deg(vS 1 ) =

R

R

2 0 2 sin 2

(3.1.36)

d = 2. (3.1.37)

2 cos 2

sin 2

For any closed curve enclosing but not intersecting the origin, we can

continuously deform it into a circle SR1 (R > 0), while staying away from

the origin in the process. By continuity or topological invariance, we obtain

ind(v| ) = ind(v|S 1 ) = 2. The meaning of this result will be seen in the

R

following theorem.

Theorem 3.1 Let v be a vector field that is differentiable over a bounded

domain
in R2 and let be a closed curve contained in
. If v

= 0 on

and ind(v| )

= 0, then there must be at least one point enclosed inside

where v = 0.

Proof Assume otherwise that v

= 0 in the domain enclosed by . Let

be another closed curve enclosed inside . Since may be obtained from

through a continuous deformation and v = 0 nowhere inside , we have

ind(v| ) = ind(v| ). On the other hand, if we parameterize the curve using

its arclength s, then

1

1

v (s) ds, v =

v , (3.1.38)

ind(v| ) = deg(v ) =

2

v

where (s) is the unit tangent vector along the unit circle S 1 at the image point

v (s) under the map v : S 1 . Rewrite v as

v (s) = (f (x(s), y(s)), g(x(s), y(s))).

(3.1.39)

Then

v (s) = fx x (s) + fy y (s), gx x (s) + gy y (s) .

(3.1.40)

The assumption on v gives us the uniform boundedness |fx |, |fy |, |gx |, |gy |

inside . Using this property and (3.1.40), we see that there is a -independent

constant C > 0 such that

v (s) C x (s)2 + y (s)2 = C.

In view of (3.1.38) and (3.1.41), we have

1

C

v (s) ds

| |,

1 |ind(v| )|

2

2

85

(3.1.41)

(3.1.42)

which leads to absurdness when the total arclength | | of the curve is made

small enough.

Thus, returning to the example (3.1.35), we conclude that the vector field v

has a zero inside any circle SR1 (R > 0) since we have shown that ind(v|S 1 ) =

R

2

= 0, which can only be the origin as seen trivially in (3.1.35) already.

We now use Theorem 3.1 to establish the celebrated Fundamental Theorem

of Algebra as stated below.

Theorem 3.2 Any polynomial of degree n 1 with coefficients in C of the

form

f (z) = an zn + an1 zn1 + + a0 , a0 , . . . , an1 , an C, an

= 0,

(3.1.43)

must have a zero in C. That is, there is some z0 C such that f (z0 ) = 0.

Proof Without loss of generality and for sake of simplicity, we may assume

an = 1 otherwise we may divide f (z) by an .

Let z = x + iy, x, y R, and write f (z) as

f (z) = P (x, y) + iQ(x, y),

(3.1.44)

v(x, y) = (P (x, y), Q(x, y)).

(3.1.45)

Then it is clear that v(x, y) = |f (z)| and it suffices to show that v vanishes

at some (x0 , y0 ) R2 .

In order to simplify our calculation, we consider a one-parameter deformation of f (z) given by

f t (z) = zn + t (an1 zn1 + + a0 ),

t [0, 1],

(3.1.46)

and denote the correspondingly constructed vector field by v t (x, y). So on the

circle SR1 = {(x, y) R2 | (x, y) = |z| = R} (R > 0), we have the uniform

lower estimate

86

Determinants

v t (x, y) = |f t (z)|

1

1

R n 1 |an1 | |a0 | n

R

R

C(R),

t [0, 1].

(3.1.47)

Thus, when R is sufficiently large, we have C(R) 1 (say). For such a choice

of R, by topological invariance, we have

#

$

#

$

#

$

ind v|S 1 = ind v 1 |S 1 = ind v 0 |S 1 .

(3.1.48)

R

On the other hand, over SR1 we may again use the polar angle : x =

R cos , y = R sin , or z = Rei , to represent f 0 as f 0 (z) = R n ein . Hence

v 0 = R n (cos n, sin n ). Consequently,

vS0 1 =

R

1 0

v = (cos n, sin n ).

v 0

$

# $

#

ind v 0 |S 1 = deg vS0 1

R

R

2

1

cos n

=

2 0 n sin n

d = n.

n cos n

sin n

(3.1.49)

(3.1.50)

rem 3.1, we conclude that v must vanish somewhere inside the circle SR1 .

Use to denote a closed surface in R3 and S 2 the standard unit sphere in R3 .

We may also consider a map u : S 2 . Since the orientation of S 2 is given

by its unit outnormal vector, say , we may analogously express the number

count, for the number of times that u covers S 2 in an orientation-preserving

manner minus the number of times that u covers S 2 in an orientation-reversing

manner, in the form of a surface integral, also called the degree of the map

u : S 2 , by

1

deg(u) = 2

d,

(3.1.51)

|S |

where d is the vector area element over S 2 induced from the map u.

To further facilitate computation, we may assume that is parameterized

by the parameters s, t over a two-dimensional domain
and u = (f, g, h),

where f, g, h are real-valued functions of s, t so that f 2 + g 2 + h2 = 1. At the

image point u, the unit outnormal of S 2 at u, is simply u itself. Moreover, the

vector area element at u under the mapping u can be represented as

d =

87

u u

dsdt.

s

t

(3.1.52)

u u

1

u

dsdt

deg(u) =

4

s

t

f g h

1

=

fs gs hs dsdt.

4

f g h

t

t

t

(3.1.53)

Exercises

3.1.1 For the 3 3 system of equations

a1 x1 + a2 x2 + a3 x3

b1 x1 + b2 x2 + b3 x3

c1 x1 + c2 x2 + c3 x3

d1 ,

d2 ,

d3 ,

(3.1.54)

some 3 3 determinants.

3.1.2 Let f : over R [, ] [a, b] be a real-valued continuously differentiable function satisfying {f (), f ()} {a, b}, where , , a, b R

and < and a < b. Show that if deg(f )

= 0 then f (t) = c has a

solution for any c (a, b).

3.1.3 Consider the vector fields or maps f, g : R2 R2 defined by

f (x, y) = (ax, by), g(x, y) = (a 2 x 2 b2 y 2 , 2abxy), (x, y) R2 ,

(3.1.55)

where a, b > 0 are constants.

(i) Compute the indices of f and g around a suitable closed curve

around the origin of R2 .

(ii) What do you notice? Do the results depend on a, b? In particular,

what are the numbers (counting multiplicities) of zeros of the maps

f and g?

(iii) If we make some small alterations of f and g by a positive parameter > 0 to get (say)

f (x, y) = (ax , by),

g (x, y) = (a 2 x 2 b2 y 2 , 2abxy), (x, y) R2 ,

(3.1.56)

88

Determinants

curve? What happens to the zeros of f and g ?

3.1.4 Consider the following simultaneous system of nonlinear equations

3

x 3xy 2 = 5 cos2 (x + y),

(3.1.57)

2 2

3x 2 y y 3 = 2ex y .

Use the topological method in this section to prove that the system has

at least one solution.

3.1.5 Consider the stereographic projection of S 2 sited in R3 with the Cartesian coordinates x, y, z onto the xy-plane through the south pole

(0, 0, 1) which induces a parameterization of S 2 by R2 given by

f =

2x

2y

1 x2 y2

,

g

=

,

h

=

, (x, y) R2 ,

1 + x2 + y2

1 + x2 + y2

1 + x2 + y2

(3.1.58)

we have deg(u) = 1. Verify this result by computing the integral

g h

f

1

deg(u) =

(3.1.59)

fx gx hx dxdy.

4 R2

f g h

y

3.1.6 The hedgehog map is a map S 2 S 2 defined in terms of the parameterization of R2 by polar coordinates r, by the expression

u = (cos(n ) sin f (r), sin(n ) sin f (r), cos f (r)),

(3.1.60)

satisfying the boundary condition f (0) = and f () = 0. Compute

2

u u

1

u

d dr

(3.1.61)

deg(u) =

4 0

r

0

and explain your result.

Motivated by the practical examples shown in the previous section, we now

systematically develop the notion of determinants. There are many ways to

89

the simplest.

Definition 3.3 Consider A F(n, n) (n 1). If n = 1, then A = (a) (a F)

and the determinant of A, det(A), is defined to be det(A) = a; if A = (aij )

with n 2, the minor Mij of the entry aij is the determinant of the (n 1)

(n 1) submatrix of A obtained from deleting the ith row and j th column of

A occupied by aij and the cofactor Cij is given by

Cij = (1)i+j Mij ,

i, j = 1, . . . , n.

(3.2.1)

det(A) =

n

(3.2.2)

ai1 Ci1 .

i=1

The formula (3.2.2) is also referred to as the cofactor expansion of the determinant according to the first column.

This definition indicates that if a column of an n n matrix A is zero then

det(A) = 0. To show this, we use induction. When n = 1, it is trivial. Assume

that the statement is true at n 1 (n 2). We now prove the statement at

n (n 2). In fact, if the first column of A is zero, then det(A) = 0 simply

by the definition of determinant (see (3.2.2)); if another column rather than

the first column of A is zero, then all the cofactors Ci1 vanish by the inductive

assumption, which still results in det(A) = 0. The definition also implies that if

A = (aij ) is upper triangular then det(A) is the product of its diagonal entries,

det(A) = a11 ann , as may be shown by induction as well.

The above definition of a determinant immediately leads to the following

important properties.

Theorem 3.4 Consider the n n matrices A, B given as

a11

A = ak1

an1

...

...

...

a1n

akn ,

ann

a11

B = rak1

an1

...

...

...

a1n

rakn .

ann

(3.2.3)

90

Determinants

k = 1, . . . , n. Then we have det(B) = r det(A).

Proof We prove the theorem by induction.

When n = 1, the statement of the theorem is trivial.

Assume that the statement is true for (n 1) (n 1) matrices when n 2.

Now for the n n matrices given in (3.2.3), use CijA , and CijB to denote

the cofactors of A and B, respectively, i, j = 1, . . . , n. Then, (3.2.3) and the

inductive assumption give us

B

A

= Ck1

;

Ck1

B

A

Ci1

= rCi1

,

i

= k.

(3.2.4)

Therefore, we arrive at

B

B

A

A

ai1 Ci1

+ rak1 Ck1

=

ai1 rCi1

+ rak1 Ck1

= r det(A).

det(B) =

i

=k

i

=k

(3.2.5)

The proof is complete.

This theorem implies that if a row of a matrix A is zero then det(A) = 0.

As an application, we show that if an n n matrix A = (aij ) is lower

triangular then det(A) = a11 ann . In fact, when n = 1, there is nothing

to show. Assume that the formula is true at n 1 (n 2). At n 2, the

first row of the minor Mi1 of A vanishes for each i = 2, . . . , n. So Mi1 = 0,

i = 2, . . . , n. However, the inductive assumption gives us M11 = a22 ann .

Thus det(A) = a11 (1)1+1 M11 = a11 ann as claimed.

Therefore, if an n n matrix A = (aij ) is either upper or lower triangular,

we infer that there holds det(At ) = det(A), although, later, we will show that

such a result is true for general matrices.

Theorem 3.5 For the n n matrices A = (aij ), B = (bij ), C = (cij ) which

have identical rows except the kth row in which ckj = akj + bkj , j = 1, . . . , n,

we have det(C) = det(A) + det(B).

Proof We again use induction on n.

The statement is clear when n = 1.

Assume that the statement is valid for the n 1 case (n 2).

For A, B, C given in the theorem with n 2, with the notation in the proof

of Theorem 3.4 and in view of the inductive assumption, we have

C

A

Ck1

= Ck1

;

C

A

B

Ci1

= Ci1

+ Ci1

,

i

= k.

(3.2.6)

Consequently,

det(C) =

91

C

C

ai1 Ci1

+ ck1 Ck1

i

=k

A

B

A

ai1 (Ci1

+ Ci1

) + (ak1 + bk1 )Ck1

i

=k

A

A

ai1 Ci1

+ ak1 Ck1

+

i

=k

B

A

ai1 Ci1

+ bk1 Ck1

i

=k

= det(A) + det(B),

(3.2.7)

as asserted.

Theorem 3.6 Let A, B be two n n (n 2) matrices so that B is obtained

from interchanging any two rows of A. Then det(B) = det(A).

Proof We use induction on n.

At n = 2, we can directly check that the statement of the theorem is true.

Assume that the statement is true at n 1 2.

Let A, B be n n (n 3) matrices given by

a11 . . . a1n

a11 . . . a1n

a

i1 . . . ain

j 1 . . . aj n

A=

(3.2.8)

, B = ,

aj 1 . . . aj n

ai1 . . . ain

an1 . . . ann

an1 . . . ann

where j = i + k for some k 1. We observe that it suffices to prove the

adjacent case when k = 1 because when k 2 we may obtain B from A simply by interchanging adjacent rows k times downwardly and then k 1 times

upwardly, which gives rise to an odd number of adjacent row interchanges.

For the adjacent row interchange, j = i +1, the inductive assumption allows

us to arrive at the following relations between the minors of the matrices A and

B immediately,

B

A

= Mk1

,

Mk1

k

= i, i + 1;

B

A

Mi1

= Mi+1,1

,

B

A

Mi+1,1

= Mi1

, (3.2.9)

which implies that the corresponding cofactors of A and B all differ by a sign,

B

A

= Ck1

,

Ck1

k

= i, i + 1;

B

A

Ci1

= Ci+1,1

,

B

A

Ci+1,1

= Ci1

. (3.2.10)

92

Determinants

Hence

det(B) =

B

B

B

ak1 Ck1

+ ai+1,1 Ci1

+ ai1 Ci+1,1

k

=i,i+1

A

A

A

ak1 (Ck1

) + ai+1,1 (Ci+1,1

) + ai1 (Ci1

)

k

=i,i+1

= det(A),

(3.2.11)

as expected.

This theorem indicates that if two rows of an n n (n 2) matrix A are

identical then det(A) = 0. Thus adding a multiple of a row to another row of

A does not alter the determinant ofA:

a11 . . . a1n

...

a1n

a11

a

...

ain

ai1

i1 . . . ain

=

rai1 + aj 1 . . . rain + aj n rai1 . . . rain

...

ann

an1

an1 . . . ann

a11 . . . a1n

a

i1 . . . ain

(3.2.12)

+ = det(A).

aj 1 . . . aj n

an1 . . . ann

The above results provide us with practical computational techniques when

evaluating the determinant of an n n matrix A. In fact, we may perform the

following three types of permissible row operations on A.

(1) Multiply a row of A by a nonzero scalar. Such an operation may also

be realized by multiplying A from the left by the matrix obtained from

multiplying the corresponding row of the n n identity matrix I by the

same scalar.

(2) Interchange any two rows of A when n 2. Such an operation may also

be realized by multiplying A from the left by the matrix obtained from

interchanging the corresponding two rows of the n n identity matrix I .

93

(3) Add a multiple of a row to another row of A when n 2. Such an operation may also be realized by multiplying A from the left by the matrix

obtained from adding the same multiple of the row to another row, correspondingly, of the n n identity matrix I .

The matrices constructed in the above three types of permissible row operations are called elementary matrices of types 1, 2, 3. Let E be an elementary

matrix of a given type. Then E is invertible and E 1 is of the same type. More

precisely, if E is of type 1 and obtained from multiplying a row of I by the

scalar r, then det(E) = r det(I ) = r and E 1 is simply obtained from multiplying the same row of I by r 1 , resulting in det(E 1 ) = r 1 ; if E is of type

2, then E 1 = E and det(E) = det(E 1 ) = det(I ) = 1; if E is of type 3

and obtained from adding an r multiple of the ith row to the j th row (i

= j )

of I , then E 1 is obtained from adding a (r) multiple of the ith row to the

j th row of I and det(E) = det(E 1 ) = det(I ) = 1. In all cases,

det(E 1 ) = det(E)1 .

(3.2.13)

In conclusion, the properties of determinant under permissible row operations may be summarized collectively as follows.

Theorem 3.7 Let A be an n n matrix and E be an elementary matrix of the

same dimensions. Then

det(EA) = det(E) det(A).

(3.2.14)

operations on A to reduce it into an upper triangular matrix, say U , whose

determinant is simply the product of whose diagonal entries. Thus, if we

express A as Ek E1 A = U where E1 , . . . , Ek are some elementary matrices, then Theorem 3.7 gives us the relation

det(Ek ) det(E1 ) det(A) = det(U ).

(3.2.15)

respect to column operations.

Theorem 3.8 The conclusions of Theorems 3.4, 3.5, and 3.6 hold when the

statements about the row vectors are replaced correspondingly with the column

vectors of matrices.

Proof Using induction, it is easy to see that the conclusions of Theorems 3.4

and 3.5 hold. We now prove that the conclusion of Theorem 3.6 holds when

94

Determinants

the row interchange there is replaced with column interchange. That is, we

show that if two columns in an n n matrix A are interchanged, its determinant will change sign. This property is not so obvious since our definition of

determinant is based on the cofactor expansion by the first column vector and

an interchange of the first column with another column alters the first column

of the matrix. The effect of the value of determinant with respect to such an

alteration needs to be examined closely, which will be our task below.

We still use induction.

At n = 2 the conclusion may be checked directly.

Assume the conclusion holds at n 1 1.

We now prove the conclusion at n 3. As before, it suffices to establish the

conclusion for any adjacent column interchange.

If the column interchange does not involve the first column, we see that the

conclusion about the sign change of the determinant clearly holds in view of

the inductive assumption and the cofactor expansion formula (3.2.2) since all

the cofactors Ci1 (i = 1, . . . , n) change their sign exactly once under any pair

of column interchange.

Now consider the effect of an interchange of the first and second columns

of A. It can be checked that such an operation may be carried out through

multiplying A by the matrix F from the right where F is obtained from the

n n identity matrix I by interchanging the first and second columns of I . Of

course, det(F ) = 1.

Let E1 , . . . , Ek be a sequence of elementary matrices and U = (uij ) an

upper triangular matrix so that Ek E1 A = U . Then we have

UF = 0

(3.2.16)

0 u33 u3n .

0

0 unn

Thus the cofactor expansion formula down the first column, as stated in Definition 3.3, and the inductive assumption at n 1 lead us to the result

0 u23

u11 u13

0 u33

0 u33

det(U F ) = u11

u22

0 unn

0 unn

= u11 u22 unn = det(U ).

(3.2.17)

95

det(Ek E1 AF ) = det(U F ) = det(U )

= det(Ek ) det(E1 ) det(A).

(3.2.18)

det(AF ) = det(A) as desired.

Theorem 3.8 gives us additional practical computational techniques when

evaluating the determinant of an nn matrix A because it allows us to perform

the following three types of permissible column operations on A.

(1) Multiply a column of A by a nonzero scalar. Such an operation may also

be realized by multiplying A from the right by the matrix obtained from

multiplying the corresponding column of the n n identity matrix I by

the same scalar.

(2) Interchange any two columns of A when n 2. Such an operation may

also be realized by multiplying A from the right by the matrix obtained

from interchanging the corresponding two columns of the n n identity

matrix I .

(3) Add a multiple of a column to another column of A when n 2. Such

an operation may also be realized by multiplying A from the right by the

matrix obtained from adding the same multiple of the column to another

column, correspondingly, of the n n identity matrix I .

The matrices constructed in the above three types of permissible column

operations are simply the elementary matrices defined and described earlier.

Like those for permissible row operations, the properties of a determinant

under permissible column operations may be summarized collectively as well,

as follows.

Theorem 3.9 Let A be an n n matrix and E be an elementary matrix of the

same dimensions. Then

det(AE) = det(A) det(E).

(3.2.19)

properties of determinants.

First we show that a determinant is invariant under matrix transpose.

Theorem 3.10 Let A be an n n matrix. Then

det(A) = det(At ).

(3.2.20)

96

Determinants

Proof

Ek E1 A = U

(3.2.21)

is an upper triangular matrix. Thus from U t = At E1t Ekt and Theorem 3.9

we have

det(U ) = det(U t ) = det(At E1t Ekt ) = det(At ) det(E1t ) det(Ekt ).

(3.2.22)

Comparing (3.2.22) with (3.2.15) and noting det(El ) = det(Elt ), l = 1, . . . , k,

because an elementary matrix is either symmetric or lower or upper triangular,

we arrive at det(A) = det(At ) as claimed.

Next we show that a determinant preserves matrix multiplication.

Theorem 3.11 Let A and B be two n n matrices. Then

det(AB) = det(A) det(B).

(3.2.23)

holds for an upper triangular matrix U . There are two cases to be treated

separately.

(1) U has a zero row. Then det(U ) = 0. Moreover, by the definition of matrix

multiplication, we see that U B also has a zero row. Hence det(U B) = 0.

On the other hand (3.2.21) leads us to Ek E1 AB = U B. So

det(Ek ) det(E1 ) det(A) = det(U ) = 0,

det(Ek ) det(E1 ) det(AB) = 0.

(3.2.24)

(2) U has no zero row. Hence unn

= 0. Using (type 3) row operations if necessary we may assume uin = 0 for all i n 1. Therefore un1,n1

= 0.

Thus, using more (type 3) row operations if necessary, we may eventually

assume that U is made diagonal with u11

= 0, . . . , unn

= 0. Using (type

1) row operations if necessary we may assume u11 = = unn = 1.

That is, U = I . Thus we get Ek E1 A = I and Ek E1 AB = B.

Consequently,

det(Ek ) det(E1 ) det(A) = 1,

det(Ek ) det(E1 ) det(AB) = det(B),

which immediately lead us to the anticipated conclusion (3.2.23).

The proof is complete.

(3.2.25)

97

The formula (3.2.23) can be used immediately to derive a few simple but

basic conclusions about various matrices.

For example, if A F(n, n) is invertible, then there is some B F(n, n)

such that AB = In . Thus det(A) det(B) = det(AB) = det(In ) = 1, which

implies that det(A)

= 0. In other words, the condition det(A)

= 0 is necessary for any A F(n, n) to be invertible. In the next section, we shall show

that this condition is also sufficient. As another example, if A R(n, n) is orthogonal, then AAt = In . Hence (det(A))2 = det(A) det(At ) = det(AAt ) =

det(In ) = 1. In other words, the determinant of an orthogonal matrix can only

take values 1. Similarly, if A C(n, n) is unitary, then the condition

AA = In leads us to the conclusion

t

(3.2.26)

(cf. (3.2.38)). That is, the determinant of a unitary matrix is of modulus one.

Below we show that we can make a cofactor expansion along any column

or row to evaluate a determinant.

Theorem 3.12 Let A = (aij ) be an n n matrix and C = (Cij ) its cofactor

matrix. Then

det(A) =

=

n

i=1

n

aik Cik

akj Ckj ,

k = 1, . . . , n.

(3.2.27)

j =1

along any column or any row of A.

Proof We first consider the column expansion case.

We will make induction on k = 1, . . . , n.

If k = 1, there is nothing to show.

Assume the statement is true at k 1 and n 2. Interchanging the kth and

(k +1)th columns and using the inductive assumption that a cofactor expansion

can be made along the kth column of the matrix with the two columns already

interchanged, we have

det(A) = a1,k+1 (1)1+k M1,k+1 + + ai,k+1 (1)i+k Mi,k+1

+ + an,k+1 (1)n+k Mn,k+1 ,

(3.2.28)

98

Determinants

det(A) =

n

(3.2.29)

i=1

t

We use MijA to denote the minor of the (i, j )th entry of the matrix At (i, j =

1, . . . , n). Applying Theorem 3.10 and Definition 3.3 we have

det(A) = det(At ) =

n

j =1

t

n

j =1

which establishes the legitimacy of the cofactor expansion formula along the

first row of A. The validity of the cofactor expansion along an arbitrary row

may be proved by induction as done for the column case.

Assume that A F(n, n) takes a boxed upper triangular form,

A1 A3

,

A=

0 A2

(3.2.31)

where A1 F(k, k), A2 F(l, l), A3 F(k, l), and k + l = n. Then we have

the useful formula

det(A) = det(A1 ) det(A2 ).

(3.2.32)

an upper triangular form U1 whose diagonal entries are u11 , . . . , ukk (say).

Thus det(U1 ) = u11 ukk . Likewise, we may also use a few permissible

row operations to reduce A2 into an upper triangular form U2 whose diagonal

entries are uk+1,k+1 , . . . , unn (say). Thus det(U2 ) = uk+1,k+1 unn . Now

apply the same sequences of permissible row operations on A. The boxed

upper triangular form of A allows us to reduce A into the following upper

triangular form

U1 A4

U=

.

(3.2.33)

0 U2

Since the diagonal entries of U are u11 , . . . , ukk , uk+1,k+1 , . . . , unn , we have

det(U ) = u11 ukk uk+1,k+1 unn = det(U1 ) det(U2 ).

(3.2.34)

we see that (3.2.34) implies (3.2.32).

A1 0

,

A=

A3 A2

99

(3.2.35)

where A1 F(k, k), A2 F(l, l), A3 F(l, k), and k + l = n, then (3.2.32)

still holds. Indeed, taking transpose in (3.2.35), we have

At1 At3

t

,

(3.2.36)

A =

0 At2

which becomes a boxed upper triangular matrix studied earlier. Thus, using

Theorem 3.10 and (3.2.32), we obtain

det(A) = det(At ) = det(At1 ) det(At2 ) = det(A1 ) det(A2 ),

(3.2.37)

as anticipated.

Exercises

3.2.1 For A C(n, n) use Definition 3.3 to establish the property

det(A) = det(A),

(3.2.38)

for all entries of A.

3.2.2 Let E F(n, n) be such that each row and each column of E can only

have exactly one nonzero entry which may either be 1 or 1. Show that

det(E) = 1.

3.2.3 In F(n, n), anti-upper triangular and anti-lower triangular matrices

are of the forms

0

0 a1n

.

.

..

..

.

.

.

.

.

..

..

..

.

.

.

0

.

,

,

.

.

.

.

.

.

..

..

..

..

..

..

..

0

an1 0

an1 an2 ann

0

(3.2.39)

respectively. Establish the formulas to express the determinants of these

matrices in terms of the anti-diagonal entries an1 , . . . , a1n .

100

Determinants

1

det

1

1

y

0

1 1

0 0

= txyz yz tz ty,

z 0

0 t

x, y, z, t R.

(3.2.40)

3.2.5 Let A, B F(3, 3) and assume that the first and second columns of

A are same as the first and second columns of B. If det(A) = 5 and

det(B) = 2, find det(3A 2B) and det(3A + 2B).

3.2.6 (Extension of Exercise 3.2.5) Let A, B F(n, n) such that only the j th

columns of them are possibly different. Establish the formula

det(aA + bB) = (a + b)n1 (a det(A) + b det(B)) ,

a, b F.

(3.2.41)

3.2.7 Let A(t) = (aij (t)) R(n, n) be such that each entry aij (t) is a differentiable function of t R. Establish the differentiation formula

n

daij (t)

d

(3.2.42)

det(A(t)) =

Cij (t),

dt

dt

i,j =1

where Cij (t) is the cofactor of the entry aij (t), i, j = 1, . . . , n, in the

matrix A(t).

3.2.8 Prove the formula

x a1 a2 an

a

1 x a2 an

n

n

'

a1 a2 x an

det

ai

(x ai ).

= x +

.

..

.. . .

..

i=1

i=1

.

. .

.

.

.

a1

a2

a3

x

(3.2.43)

with coefficients in C. Show that for any n + 2 numbers c1 , . . . , cn+2

in C, there holds

p1 (c2 )

p1 (cn+2 )

p1 (c1 )

p2 (c1 )

p2 (c2 )

p2 (cn+2 )

det

= 0. (3.2.44)

..

..

..

..

.

.

.

.

101

x 1 0

0

0

x 1

0

0

0

..

0

.

0

x

0

0

det

..

..

..

..

..

..

.

.

.

.

.

.

0

0

x

1

0

a1

a0

a2

= an x + an1 x

n

n1

an1

an

+ + a1 x + a0 .

a2

an

a1

a1

a2

an

det

.

.

..

..

..

..

.

.

a1

a2

an

n

n n1

ai .

= (1)

(3.2.45)

(3.2.46)

i=1

3.2.12 Let A F(n, n) be such that AAt = In and det(A) < 0. Show that

det(A + In ) = 0.

3.2.13 Let A F(n, n) such that the entries of A are either 1 or 1. Show that

det(A) is an even integer when n 2.

3.2.14 Let A = (aij ) R(n, n) satisfy the following diagonally dominant

condition:

|aii | >

|aij |, i = 1, . . . , n.

(3.2.47)

j

=i

= 0. (This result is also known as the Levy

Desplanques theorem.)

3.2.15 (A refined version of the previous exercise) Let A = (aij ) R(n, n)

satisfy the following positive diagonally dominant condition:

|aij |, i = 1, . . . , n.

(3.2.48)

aii >

j

=i

Show that det(A) > 0. (This result is also known as the Minkowski

theorem.)

102

Determinants

What happens when n is even?

3.2.17 Let , F(1, n). Establish the formula

det(In t ) = 1 t .

(3.2.49)

100

x

1

f (x1 , . . . , xn ) = det x2

.

.

.

xn

x1

x2

0

..

.

1

..

.

..

.

..

xn

(3.2.50)

geometrically.

Let A = (aij ) be an n n (n 2) matrix and let C = (Cij ) be the associated

cofactor matrix. For k, l = 1, . . . , n, we may apply Theorem 3.12 to obtain the

relations

n

aik Cil = 0,

k

= l,

(3.3.1)

akj Clj = 0,

k

= l.

(3.3.2)

i=1

n

j =1

In fact, it is easy to see that the left-hand side of (3.3.1) is the cofactor expansion of the determinant along the lth column of such a matrix that is obtained

from A through replacing the lth column by the kth column of A whose value

must be zero and the left-hand side of (3.3.2) is the cofactor expansion of the

determinant along the lth row of such a matrix that is obtained from A through

replacing the lth row by the kth row of A whose value must also be zero.

We can summarize the properties stated in (3.2.27), (3.3.1), and (3.3.2) by

the expressions

C t A = det(A)In ,

AC t = det(A)In .

(3.3.3)

103

adjugate matrix of A, denoted by adj(A), is the transpose of the cofactor matrix

of A:

adj(A) = C t .

(3.3.4)

As a consequence of this definition and (3.3.3) we have

adj(A)A = A adj(A) = det(A)In ,

(3.3.5)

Theorem 3.14 Let A be an n n matrix. Then A is invertible if and only if

det(A)

= 0. Furthermore, if A is invertible, then A1 may be expressed as

A1 =

1

adj(A).

det(A)

(3.3.6)

Proof If det(A)

= 0, from (3.3.5) we arrive at (3.3.6). Conversely, if A1

exists, then from AA1 = In and Theorem 3.11 we have det(A) det(A1 ) = 1.

Thus det(A)

= 0.

As an important application, we consider the unique solution of the system

Ax = b

(3.3.7)

unknowns, and b = (b1 , . . . , bn )t a given non-homogeneous right-hand-side

vector.

In such a situation we can use (3.3.6) to get

x=

1

adj(A)b.

det(A)

(3.3.8)

of A) we may read off to obtain the result

xi =

1

1

Aij bj =

bj Cj i

det(A)

det(A)

n

j =1

j =1

det(Ai )

=

,

det(A)

i = 1, . . . , n,

(3.3.9)

where Ai is the matrix obtained from A after replacing the ith column of A by

the vector b, i = 1, . . . , n.

104

Determinants

The formulas stated in (3.3.9) are called Cramers formulas. Such a solution

method is also called Cramers rule.

Let A F(m, n) and of rank k. Then there are k row vectors of A which are

linearly independent. Use B to denote the submatrix of A consisting of those k

row vectors. Since B is of rank k we know that there are k column vectors of B

which are linearly independent. Use C to denote the submatrix of B consisting

of those k column vectors. Then C is a submatrix of A which lies in F(k, k)

and is of rank k. In particular det(C)

= 0. In other words, we have shown that

if A is of rank k then A has a k k submatrix of nonzero determinant.

To end this section, we consider a practical problem as an application of

determinants: The unique determination of a polynomial by interpolation.

Let p(t) be a polynomial of degree (n 1) 1 over a field F given by

p(t) = an1 t n1 + + a1 t + a0 ,

(3.3.10)

illustration to follow, we may assume that n is sufficiently large (say n 5).

Therefore we have the simultaneous system of equations

p1 ,

a0 + a1 t2 + + an2 t2n2

p2 ,

+ an1 t2n1

(3.3.11)

pn ,

the determinant

det(A) =

t1

t12

t1n2

t2

t22

t2n2

tn

tn2

tnn2

t1n1

t2n1

.

t n1

(3.3.12)

Adding the (t1 ) multiple of the second last column to the last column,. . . ,

the (t1 ) multiple of the second column to the third column, and the (t1 )

multiple of the first column to the second column, we get

det(A)

1

1

=

1

(t2 t1 )

t22 (t2 t1 )

t2n3 (t2 t1 )

tnn3 (tn t1 )

t2n3 t2n2

t3n3 t3n2

(tn t1 ) tn (tn t1 )

1

t22

t2

n

'

1

t3

t32

(ti t1 )

=

i=2

1

tn

tn2

tnn3

105

n2

t2 (t2 t1 )

tnn2 (tn t1 )

0

.

t n2

(3.3.13)

det(A)

n

n

n

'

'

'

(ti t1 )

(tj t2 )

(tk tn2 ) (tn tn1 )

=

j =3

i=2

'

(tj ti ).

k=n1

(3.3.14)

1i<j n

such a condition the coefficients a0 , a1 , . . . , an1 are uniquely determined.

The determinant (3.3.12) is called the Vandermonde determinant.

If A, B F(n, n) are similar, there is an invertible C F(n, n) such that

A = C 1 BC, which gives us det(A) = det(B). That is, similar matrices

have the same determinant. In view of this fact and the fact that the matrix

representations of a linear mapping from a finite-dimensional vector space into

itself with respect to different bases are similar, we may define the determinant

of a linear mapping, say T , denoted by det(T ), to be the determinant of the

matrix representation of T with respect to any basis. In this situation, we see

that T is invertible if and only if det(T )

= 0.

Exercises

3.3.1 Show that A F(m, n) is of rank k if and only if there is a k k

submatrix of A whose determinant is nonzero and that the determinant

of any square submatrix of a larger size (if any) is zero.

3.3.2 Let A R(n, n) be an orthogonal matrix. Prove that adj(A) = At or

At depending on the sign of det(A).

106

Determinants

det(adj(A)) = (det(A))n1 .

(3.3.15)

so.

3.3.4 For A F(n, n) (n 2) prove the rank relations

n,

r(adj(A)) =

1,

0,

r(A) = n,

r(A) = n 1,

(3.3.16)

r(A) n 2.

3.3.6 For A F(n, n) where n 3 show that adj(adj(A)) = (det(A))n2 A.

What happens when n = 2?

3.3.7 For A F(n, n) where n 2 show that adj(At ) = (adj(A))t .

3.3.8 Let A R(n, n) satisfy adj(A) = At . Prove that A

= 0 if and only if

det(A)

= 0.

3.3.9 For A, B R(n, n) show that adj(AB) = adj(B)adj(A). Thus, if A is

idempotent, so is adj(A), and if A is nilpotent, so is adj(A).

3.3.10 For A = (aij ) F(n, n) consider the linear system

a11 x1 + + a1n xn

an1 x1 + + ann xn

3.3.11

3.3.12

3.3.13

3.3.14

0,

0.

(3.3.17)

= 0 then the space of solutions

of (3.3.17) is spanned by any nonzero column vector of adj(A).

Use Cramers rule and the Vandermonde determinant to find the

quadratic polynomial p(t) with coefficients in C whose values at

t = 1, 1 + i, 3 are 2, 0, 5, respectively.

Let p(t) be a polynomial of degree n 1 given in (3.3.10). Use

Cramers rule and the Vandermonde determinant to prove that p(t) cannot have n distinct zeros unless it is the zero polynomial.

Let A F(m, n), B F(n, m), and m > n. Show that det(AB) = 0.

For U = F(n, n) define T L(U ) by T (A) = AB BA for A U

where B U is fixed. Show that for such a linear mapping T we have

det(T ) = 0 no matter how B is chosen.

107

3.4 Characteristic

polynomials and CayleyHamilton theorem

We first consider the concrete case of matrices.

Let A = (aij ) be an n n matrix over a field F. We first consider the linear

mapping TA : Fn Fn induced from A defined by

x1

.

n

(3.4.1)

TA (x) = Ax, x =

.. F .

xn

Recall that an eigenvalue of TA is a scalar F such that the null-space

E = N (TA I ) = {x Fn | TA (x) = x},

(3.4.2)

eigenspace of TA , whose nonzero vectors are called eigenvectors. The eigenvalues, eigenspaces, and eigenvectors of TA are also often simply referred to

as those of the matrix A. The purpose of this section is to show how to use

determinant as a tool to find the eigenvalues of A.

If is an eigenvalue of A and x an eigenvector, then Ax = x. In other

words, x is a nonzero solution of the equation (In A)x = 0. Hence the

matrix In A is singular. Therefore satisfies

det(In A) = 0.

(3.4.3)

Of course the converse is true as well: If satisfies (3.4.3) then (In A)x = 0

has a nontrivial solution which indicates that is an eigenvalue of A. Consequently the eigenvalues of A are the roots of the function

pA () = det(In A),

(3.4.4)

formula (3.2.2) in Definition 3.3. The polynomial pA () defined in (3.4.4) is

called the characteristic polynomial associated with the matrix A, whose roots

are called the characteristic roots of A. So the eigenvalues of A are the characteristic roots of A. In particular, A can have at most n distinct eigenvalues.

Theorem 3.15 Let A = (aij ) be an n n (n 2) matrix and pA () its

characteristic polynomial. Then pA () is of the form

pA () = n Tr(A)n1 + + (1)n det(A).

(3.4.5)

108

Proof

Determinants

Write pA () = an n + an1 n1 + + a0 . Then

a0 = pA (0) = det(A) = (1)n det(A)

(3.4.6)

as asserted. Besides, using Definition 3.3 and induction, we see that the two

leading-degree terms in pA () containing n and n1 can only appear in the

In A

of the entry a11

product of the entry a11 and the cofactor C11

in the matrix In A. Let An1 be the submatrix of A obtained by deleting

In A

=

the row and column vectors of A occupied by the entry a11 . Then C11

n1

n2

and

det(In1 An1 ) whose two leading-degree terms containing

can only appear in the product of a22 and its cofactor in the matrix In1

An1 . Carrying out this process to the end we see that the two leading-degree

terms in pA () can appear only in the product

( a11 ) ( ann ),

(3.4.7)

n ,

(3.4.8)

If A C(n, n) we may also establish Theorem 3.15 by means of calculus.

In fact, we have

pA ()

1

=

det

I

A

.

(3.4.9)

n

n

also gives us the expression

1

1

1

(3.4.10)

an + an1 + + a0 n = det In A .

Thus, replacing

1

by t, we get

q(t) an + an1 t + + a0 t n

1 ta11

ta12

ta21

1 ta22

=

tan1

tan2

ta1n

ta2n

1 tann

.

(3.4.11)

Therefore

an1 = q (0) =

n

i=1

aii

(3.4.12)

109

as before.

We now consider the abstract case.

Let U be an n-dimensional vector space over a field F and T L(U ).

Assume that u U is an eigenvector of T associated to the eigenvalue F.

Given a basis U = {u1 , . . . , un } we express u as

u = x1 u1 + + xn un ,

x = (x1 , . . . , xn )t Fn .

(3.4.13)

basis U so that

T (uj ) =

n

aij ui ,

j = 1, . . . , n.

(3.4.14)

i=1

n

aij xj = xi ,

i = 1, . . . , n,

(3.4.15)

j =1

be obtained from solving for the roots of the characteristic equation (3.4.3).

Let V = {v1 , . . . , vn } be another basis of U and B = (bij ) F(n, n) be the

matrix representation of T with respect to the basis V so that

T (vj ) =

n

bij vi ,

i = 1, . . . , n.

(3.4.16)

i=1

F(n, n) called the basis transition matrix such that

vj =

n

cij ui ,

j = 1, . . . , n.

(3.4.17)

i=1

Following the study of the previous chapter we know that A, B, C are related through the similarity relation A = C 1 BC. Hence we have

pA () = det(In A) = det(In C 1 BC)

= det(C 1 [In B]C)

= det(In B) = pB ().

(3.4.18)

That is, two similar matrices have the same characteristic polynomial. Thus

we may use pA () to define the characteristic polynomial of linear mapping

T L(U ), rewritten as pT (), where A is the matrix representation of T with

respect to any given basis of U , since such a polynomial is independent of the

choice of the basis.

110

Determinants

The following theorem, known as the CayleyHamilton theorem, is of fundamental importance in linear algebra.

Theorem 3.16 Let A C(n, n) and pA () be its characteristic polynomial.

Then

pA (A) = 0.

(3.4.19)

First we assume that A has n distinct eigenvalues, say 1 , . . . , n . We can

write pA () as

pA () =

n

'

( i ).

(3.4.20)

i=1

1 , . . . , n respectively which form a basis of Cn . Then

n

'

pA (A)ui = (A j In ) ui

j =1

'

(A j In ) (A i In )ui = 0,

(3.4.21)

j

=i

Next we consider the general situation when A may have multiple eigenvalues. We begin by showing that A may be approximated by matrices of n

distinct eigenvalues. We show this by induction on n. When n = 1, there is

nothing to do. Assume the statement is true at n 1 1. We now proceed

with n 2.

In view of Theorem 3.2 there exists an eigenvalue of A, say 1 . Let u1 be an

associated eigenvector. Extend {u1 } to get a basis for Cn , say {u1 , u2 , . . . , un }.

Then the linear mapping defined by (3.4.1) satisfies

TA (uj ) = Auj =

n

i=1

(3.4.22)

That is, the matrix B = (bij ) of TA with respect to the basis {u1 , . . . , un } is of

the form

1 b0

B=

,

(3.4.23)

0 B0

111

assumption, for any > 0, there is some C0 C(n 1, n 1) such that

B0 C0 < and C0 has n 1 distinct eigenvalues, say 2 , . . . , n , where

we use to denote any norm of the space of square matrices whenever there

is no risk of confusion. Now set

1 + b0

C=

.

(3.4.24)

0

C0

It is clear that the eigenvalues of C are 1 + , 2 , . . . , n , which are distinct

when > 0 is small enough. Of course there is a constant K > 0 depending

on n only such that B C < K when > 0 is sufficiently small.

On the other hand, if we use the notation uj = (u1j , . . . , unj )t , j =

1, . . . , n, then we have

uj =

n

uij ei ,

j = 1, . . . , n.

(3.4.25)

i=1

A U 1 CU = U 1 (B C)U U 1 U K.

(3.4.26)

the asserted approximation property in the situation of n n matrices is established.

Finally, let {A(l) } be a sequence in C(n, n) such that A(l) A as l

and each A(l) has n distinct eigenvalues (l = 1, 2, . . . ). We have already shown

that pA(l) (A(l) ) = 0 (l = 1, 2, . . . ). Consequently,

pA (A) = lim pA(l) (A(l) ) = 0.

l

(3.4.27)

Of course Theorem 3.16 is valid for A F(n, n) whenever F is a subfield

of C although A may not have an eigenvalue in F. Important examples include

F = Q and F = R.

We have seen that the proof of Theorem 3.16 relies on the assumption of the

existence of an eigenvalue which in general is only valid when the underlying

field is C. In fact, however, the theorem holds universally over any field. Below

we give a proof of it without assuming F = C, which is of independent interest

and importance.

112

Determinants

To proceed, we need to recall and re-examine the definition of polynomials in the most general terms. Given a field F, a polynomial p over F is an

expression of the form

p(t) = a0 + a1 t + + an t n ,

a0 , a1 , . . . , an F,

(3.4.28)

formal symbol that generates the formal symbols t 0 = 1, t 1 = t, . . . , t n =

(t)n (called the powers of t) which participate in all the algebraic operations

following the usual associative, commutative, and distributive laws as if t were

an F-valued parameter or variable. For example, (at i )(bt j ) = abt i+j for any

a, b F and i, j N. The set of all polynomials with coefficients in F and in

the variable t is denoted by P, which is a vector space over F, with addition

and scalar multiplication defined in the usual ways, whose zero element is any

polynomial with zero coefficients. In other words, two polynomials in P are

considered equal if and only if the coefficients of the same powers of t in the

two polynomials all agree.

In the rest of this section we shall assume that the variable of any polynomial

we consider is such a formal symbol. It is clear that this assumption does not

affect the computation we perform.

Let A F(n, n) and be a (formal) variable just introduced. Then we have

(In A)adj(In A) = det(In A)In ,

(3.4.29)

det(In A)In = (n + an1 n1 + + a1 + a0 )In .

(3.4.30)

On the other hand, we may expand the left-hand side of (3.4.29) into the form

(In A)adj(In A)

= (In A)(An1 n1 + + A1 + A0 )

= An1 n +(An2 AAn1 )n1 +(A0 AA1 ) AA0 ,

(3.4.31)

(3.4.30) and (3.4.31) we get the relations

An1 = In ,

A0 AA1 = a1 In ,

...,

AA0 = a0 In .

(3.4.32)

Multiplying from the left the first relation in (3.4.32) by An , the second by

An1 ,. . . , and the second last by A, and then summing up the results, we obtain

0 = An + an1 An1 + + a1 A + a0 In = pA (A),

as anticipated.

(3.4.33)

113

a basis U = {u1 , . . . , un } let A F(n, n) be the matrix that represents T with

respect to U . Then we know that Ai represents T i with respect to U for any

i = 1, 2 . . . . As a consequence, for any polynomial p(t) with coefficients in

F, the matrix p(A) represents p(T ) with respect to U . Therefore the Cayley

Hamilton theorem may be restated in terms of linear mappings as follows.

Theorem 3.17 Let U be a finite-dimensional vector space over an arbitrary

field. Any T L(U ) is trivialized or annihilated by its characteristic polynomial pT (). That is,

pT (T ) = 0.

(3.4.34)

the characteristic polynomial of A. Theorem 3.15 gives us a0 = (1)n det(A).

Inserting this result into the equation pA (A) = 0 we have

A(An1 + an1 An2 + + a1 In ) = (1)n+1 det(A)In ,

(3.4.35)

=

0. Furthermore, in this situation, the relation (3.4.35) implies

A1 =

(1)n+1 n1

+ an1 An2 + + a1 In ),

(A

det(A)

(3.4.36)

or alternatively,

adj(A) = (1)n+1 (An1 + an1 An2 + + a1 In ),

det(A)

= 0.

(3.4.37)

= 0 above for

(3.4.37) to hold is not necessary and can thus be dropped.

Exercises

3.4.1 Let pA () be the characteristic polynomial of a matrix A = (aij )

F(n, n), where n 2 and F = R or C. Show that in pA () = n + +

a1 + a0 we have

a1 = (1)n1

n

Cii ,

(3.4.38)

i=1

where Cii is the cofactor of the entry aii of the matrix A (i = 1, . . . , n).

3.4.2 Consider the subset D of C(n, n) defined by

D = {A C(n, n) | A has n distinct eigenvalues}.

(3.4.39)

114

Determinants

3.4.3 Let A F(2, 2) be given by

A=

(3.4.40)

= 0

and use (3.4.36) to derive the formula that gives A1 .

3.4.4 Show that (3.4.37) is true for any A F(n, n). That is, the condition

det(A)

= 0 in (3.4.37) may actually be removed.

3.4.5 Let U be an n-dimensional vector space over F and T L(U ) is nilpotent of degree n. What is the characteristic polynomial of T ?

3.4.6 For A, B F(n, n), prove that the characteristic polynomials of AB and

BA are identical. That is,

pAB () = pBA ().

(3.4.41)

3.4.7 Let F be a field and , F(1, n). Find the characteristic polynomial

of the matrix t F(n, n).

3.4.8 Consider the matrix

1

2

0

(3.4.42)

A= 0

2

0 .

2

1 1

theorem to find A1 .

(ii) Use the characteristic polynomial of A and the CayleyHamilton

theorem to find A10 .

4

Scalar products

shall start from the most general situation of scalar products. We then consider

the situations when scalar products are non-degenerate and positive definite,

respectively.

In this section, we use F to denote the field R or C.

Definition 4.1 Let U be a vector space over F. A scalar product over U is

defined to be a bilinear symmetric function f : U U F, written simply

as (u, v) f (u, v), u, v U . In other words the following properties hold.

(1) (Symmetry) (u, v) = (v, u) F for u, v U .

(2) (Additivity) (u + v, w) = (u, w) + (v, w) for u, v, w U .

(3) (Homogeneity) (au, v) = a(u, v) for a F and u, v U .

We say that u, v U are mutually perpendicular or orthogonal to each

other, written as u v, if (u, v) = 0. More generally for any non-empty

subset S of U we use the notation

S = {u U | (u, v) = 0 for any v S}.

(4.1.1)

It is obvious that S is a subspace of U for any nonempty subset S of U .

Moreover {0} = U . Furthermore it is easy to show that if the vectors

u1 , . . . , uk are mutually perpendicular and not null then they are linearly

independent.

115

116

Scalar products

Let u, v U so that u is not null. Then we can resolve v into the sum of

two mutually perpendicular vectors, one in Span{u}, say cu for some scalar

c, and one in Span{u} , say w. In fact, rewrite v as v = w + cu and require

(u, w) = 0. We obtain the unique solution c = (u, v)/(u, u). In summary, we

have obtained the orthogonal decomposition

v=w+

(u, v)

u,

(u, u)

w=v

(u, v)

u Span{u} .

(u, u)

(4.1.2)

Theorem 4.2 Let U be a finite-dimensional vector space equipped with a

scalar product (, ) and set U0 = U . Any basis of U0 can be extended to

become an orthogonal basis of U . In other words any two vectors in such a

basis of U are mutually perpendicular.

Proof If U0 = U there is nothing to show. Below we assume U0

= U .

If U0 = {0} we may start from any basis of U . If U0

= {0} let {u1 , . . . , uk }

be a basis of U0 and extend it to get a basis of U , say {u1 , . . . , uk , v1 , . . . , vl }.

That is, U = U0 V , where

V = Span{v1 , . . . , vl }.

(4.1.3)

= 0 otherwise v1 U0 which is false. Thus, without loss of generality, we may

assume (v1 , v2 )

= 0 and consider {v2 , v1 , . . . , vl } instead if (v2 , v2 )

= 0 otherwise we may consider {v1 + v2 , v2 , . . . , vl } as a basis of V because now

(v1 + v2 , v1 + v2 ) = 2(v1 , v2 )

= 0. So we have seen that we may assume

(v1 , v1 )

= 0 to start with after renaming the basis vectors {v1 , . . . , vl } of V if

necessary. Now let w1 = v1 and set

wi = vi

(w1 , vi )

w1 ,

(w1 , w1 )

i = 2, . . . , l.

(4.1.4)

Then wi

= 0 since v1 , vi are linearly independent for all i = 2, . . . , l. It is

clear that wi w1 (i = 2, . . . , l). If (wi , wi )

= 0 for some i = 2, . . . , l,

we may assume i = 2 after renaming the basis vectors {v1 , . . . , vl } of V if

necessary. If (wi , wi ) = 0 for all i = 2, . . . , l, there must be some j

=

i, i, j = 2, . . . , l, such that (wi , wj )

= 0, otherwise wi U0 for i = 2, . . . , l,

which is false. Without loss of generality, we may assume (w2 , w3 )

= 0 and

consider

w2 + w3 = (v2 + v3 )

(w1 , v2 + v3 )

w1 .

(w1 , w1 )

(4.1.5)

117

= 0 and (w2 +w3 ) w1 . Since

{v1 , v2 + v3 , v3 , . . . , vl } is also a basis of V , the above procedure indicates that

we may rename the basis vectors {v1 , . . . , vl } of V if necessary so that we

obtain

(w1 , v2 )

(4.1.6)

w1 ,

w2 = v2

(w1 , w1 )

which satisfies (w2 , w2 )

=

Span{w1 , w2 , v3 , . . . , vl }. Now set

wi = vi

0. Of course Span{v1 , . . . , vl }

(w2 , vi )

(w1 , vi )

w2

w1 ,

(w2 , w2 )

(w1 , w1 )

i = 3, . . . , l.

(4.1.7)

= 0 for some i =

3, . . . , l, by renaming {v1 , . . . , vl } if necessary, we may assume (w3 , w3 )

= 0.

If (wi , wi ) = 0 for all i = 3, . . . , l, then there is some i = 4, . . . , l such

that (w3 , wi )

= 0. We may assume (w3 , w4 )

= 0. Thus (w3 + w4 , w3 +

w4 ) = 2(w3 , w4 )

= 0 and (w3 + w4 ) w1 , (w3 + w4 ) w2 . Of course,

{v1 , v2 , v3 + v4 , v4 . . . , vl } is also a basis of V . Thus we see that we may

rename the basis vectors {v1 , . . . , vl } of V so that we obtain

w3 = v3

(w2 , v3 )

(w1 , v3 )

w2

w1 ,

(w2 , w2 )

(w1 , w1 )

(4.1.8)

= 0 and

Span{v1 , . . . , vl } = Span{w1 , w2 , w3 , v4 , . . . , vl }.

(4.1.9)

be able to carry the above procedure out and obtain a new set of vectors

{w1 , . . . , wl } given by

w1 = v1 ,

wi = vi

i1

(wj , vi )

wj ,

(wj , wj )

i = 2, . . . , l,

(4.1.10)

j =1

(wi , wi )

= 0,

i = 1, . . . , l,

(wi , wj ) = 0,

i

= j, i, j = 1, . . . , l,

(4.1.11)

Span{w1 , . . . , wl } = Span{v1 , . . . , vl }.

In other words {u1 , . . . , uk , w1 , . . . , wl } is seen to be an orthogonal basis of U .

The method described in the proof of Theorem 4.2, especially the scheme

given by the formulas in (4.1.10)(4.1.11), is known as the GramSchmidt

procedure for basis orthogonalization.

118

Scalar products

If U0 = U is a proper subspace of U , that is, U0

= {0} and U0

= U , we

have seen from Theorem 4.2 that we may express an orthogonal basis of U by

(say)

{u1 , . . . , un0 , v1 , . . . , vn+ , w1 , . . . , wn },

(4.1.12)

i = 1, . . . , n+ ,

(vi , vi ) > 0,

(wi , wi ) < 0,

i = 1, . . . , n .

(4.1.13)

U+ = Span{v1 , . . . , vn+ },

U = Span{w1 , . . . , wn },

(4.1.14)

U = U0 U+ U ,

(4.1.15)

an orthogonal basis. Such a statement is also known as the Sylvester theorem.

In fact, it is obvious that n0 is independent of the choice of an orthogonal

basis since n0 is the dimension of U0 = U . Assume that

{u 1 , . . . , u n0 , v1 , . . . , vm+ , w 1 , . . . , w m }

(4.1.16)

(if any)

(vi , vi ) > 0,

i = 1, . . . , m+ ,

(w i , w i ) < 0,

i = 1, . . . , m .

(4.1.17)

To proceed, we assume n+ m+ for definiteness and we need to establish n+ m+ . For this purpose, we show that u1 , . . . , un0 , v1 , . . . , vn+ ,

w 1 , . . . , w m are linearly independent. Indeed, if there are scalars a1 , . . . , an0 ,

b1 , . . . , bn+ , c1 , . . . , cm in R such that

a1 u1 + + an0 un0 + b1 v1 + + bn+ vn+ = c1 w 1 + + cm w m ,

(4.1.18)

then we may take the scalar products of both sides of (4.1.18) with themselves

to get

2

(w m , w m ).

b12 (v1 , v1 ) + + bn2+ (vn+ , vn+ ) = c12 (w 1 , w 1 ) + + cm

(4.1.19)

119

that b1 = = bn+ = c1 = = cm = 0. Inserting this result into

(4.1.18) and using the linear independence of u1 , . . . , un0 , we arrive at a1 =

= an0 = 0. So the asserted linear independence follows. As a consequence,

we have

n0 + n+ + m dim(U ).

(4.1.20)

Thus the integers n0 , n+ , n are determined by the scalar product and

independent of the choice of an orthogonal basis. These integers are sometimes referred to as the indices of nullity, positivity, and negativity of the scalar

product, respectively.

It is clear that for the orthogonal basis (4.1.12) of U we can further rescale

the vectors v1 , . . . , vn+ and w1 , . . . , wn to make them satisfy

(vi , vi ) = 1,

i = 1, . . . , n+ ,

(wi , wi ) = 1,

i = 1, . . . , n . (4.1.21)

Exercises

4.1.1 Let S be a non-empty subset of a vector space U equipped with a scalar

product (, ). Show that S is a subspace of U and S (S ) .

4.1.2 Let u1 , . . . , uk be mutually perpendicular vectors of a vector space U

equipped with a scalar product (, ). Show that if these vectors are not

null then they must be linearly independent.

4.1.3 Let S1 and S2 be two non-empty subsets of a vector space U equipped

with a scalar product (, ). If S1 S2 , show that S1 S2 .

4.1.4 Consider the vector space Rn and define

a1

b1

.

, v = .. Rn ,

.

(u, v) = ut Av, u =

(4.1.22)

.

.

an

bn

(i) Show that (4.1.22) defines a scalar product over Rn if and only if A

is symmetric.

(ii) Show that the subspace U0 = (Rn ) is in fact the null-space of the

matrix A given by

N (A) = {x Rn | Ax = 0}.

(4.1.23)

120

Scalar products

4.1.5 In special relativity, one equips the space R4 with the Minkowski scalar

product or Minkowski metric given by

a1

b1

a2

b2

4

(u, v) = a1 b1 a2 b2 a3 b3 a4 b4 , u =

, v = b R .

a3

3

a4

b4

(4.1.24)

Find an orthogonal basis and determine the indices of nullity, positivity,

and negativity of R4 equipped with this scalar product.

4.1.6 With the notation of the previous exercise, consider the following modified scalar product

(u, v) = a1 b1 a2 b3 a3 b2 a4 b4 .

(4.1.25)

of R4 .

(ii) Compute the indices of nullity, positivity, and negativity.

4.1.7 Let U be a vector space with a scalar product and V , W two subspaces

of U . Show that

(V + W ) = V W .

(4.1.26)

(, ). Show that if n 2 then there must be a vector u U , u

= 0, such

that (u, u) = 0.

Let U be a vector space equipped with the scalar product (, ). In this section,

we examine the special situation when U0 = U = {0}.

Definition 4.3 A scalar product (, ) over U is said to be non-degenerate if

U0 = U = {0}. Or equivalently, if (u, v) = 0 for all v U then u = 0.

The most important consequence of a non-degenerate scalar product is that

it allows us to identify U with its dual space U naturally through the pairing

given by the scalar product. To see this, we note that for each v U

f (u) = (u, v),

uU

(4.2.1)

121

defined this way.

In fact, assume dim(U ) = n and let {u1 , . . . , un } be an orthogonal basis

of U . Since

(ui , ui ) ci

= 0,

i = 1, . . . , n,

(4.2.2)

we may take

vi =

1

ui ,

ci

i = 1, . . . , n,

(4.2.3)

i, j = 1, . . . , n.

(4.2.4)

to achieve

(ui , vj ) = ij ,

Thus, if we define fi U by setting

fi (u) = (u, vi ),

u U,

i = 1, . . . , n,

(4.2.5)

for each f U , there are scalars a1 , . . . , an such that

f = a1 f1 + + an fn .

(4.2.6)

f (u) = (a1 f1 + + an fn )(u) = a1 f1 (u) + + an fn (u)

= a1 (u, v1 ) + + an (u, vn ) = (u, a1 v1 + + an vn ) (u, v),

(4.2.7)

which proves that any element f of U is of the form (4.2.1). Such a statement,

that is, any functional over U may be represented as a scalar product, in the

context of infinite-dimensional spaces, is known as the Riesz representation

theorem.

In order to make our discussion more precise, we denote the dependence of

f on v in (4.2.1) explicitly by f v U and use : U U to express

this correspondence,

(v) = v .

(4.2.8)

u, (v) = u, v = (u, v),

u, v U.

(4.2.9)

u, (v + w) = (u, v + w) = (u, v) + (u, w)

= u, (v) + u, (w) = u, (v) + (w),

u U,

(4.2.10)

122

Scalar products

we have

u, (av) = (u, av) = a(u, v) = au, (v) = u, a(v),

(4.2.11)

Since we have seen that : U U is onto, we conclude that : U U

is an isomorphism, which may rightfully be called the Riesz isomorphism. As

a consequence, we can rewrite (4.2.9) as

u, (v) = u, v = (u, v) = (u, 1 (v )),

u, v U,

v U .

(4.2.12)

On the other hand, for T L(U ), recall that the dual of T , T L(U ),

satisfies

u, T (v ) = T (u), v ,

u U,

v U .

(4.2.13)

(T (u), v) = (u, ( 1 T )(v)),

u, v U.

(4.2.14)

called the dual of T with respect to the non-degenerate scalar product (, )

and determined by the relation

T = 1 T ,

(4.2.15)

(T (u), v) = (u, T (v)),

u, v U.

(4.2.16)

this way, we may view U as its own dual space and describe U as a self-dual

space. Thus, for T L(U ), we may naturally identify T with T as well,

without spelling out the Riesz isomorphism, which leads us to formulate the

following definition.

Definition 4.4 Let U be a vector space with a non-degenerate scalar product

(, ). For a mapping T L(U ), the unique mapping T L(U ) satisfying

(u, T (v)) = (T (u), v),

u, v U,

(4.2.17)

is called the dual or adjoint mapping of T , with respect to the scalar product

(, ).

If T = T , T is said to be a self-dual or self-adjoint mapping with respect to

the scalar product (, ).

123

Definition 4.5 Let T L(U ) where U is a vector space equipped with a scalar

product (, ). We say that T is an orthogonal mapping if (T (u), T (v)) = (u, v)

for any u, v U .

As an immediate consequence of the above definition, we have the following

basic results.

Theorem 4.6 That T L(U ) is an orthogonal mapping is equivalent to one

of the following statements.

(1) (T (u), T (u)) = (u, u) for any u U .

(2) For any orthogonal basis {u1 , . . . , un } of U the vectors T (u1 ), . . . , T (un )

are mutually orthogonal and (T (ui ), T (ui )) = (ui , ui ) for i = 1, . . . , n.

(3) T T = T T = I , the identity mapping over U .

Proof If T is orthogonal, it is clear that (1) holds.

Now assume (1) is valid. Using the properties of the scalar product, we have

the identity

2(u, v) = (u + v, u + v) (u, u) (v, v),

u, v U.

(4.2.18)

2(u, v) for any u, v U . Thus T is orthogonal.

That T being orthogonal implies (2) is trivial.

Assume (2) holds. We express any u, v U as

u=

n

ai ui ,

v=

n

i=1

bi ui ,

ai , bi F,

i = 1, . . . , n.

(4.2.19)

i=1

Therefore we have

n

n

(T (u), T (v)) = T

ai ui , T

bj uj

j =1

i=1

=

=

=

n

ai bj (T (ui ), T (uj ))

i,j =1

n

ai bi (T (ui ), T (ui ))

i=1

n

ai bi (ui , ui )

i=1

n

n

=

ai ui ,

bj uj = (u, v),

i=1

j =1

(4.2.20)

124

Scalar products

We now show that T being orthogonal and (3) are equivalent. In fact, if T is

orthogonal, then (u, (T T )(v)) = (u, v) or (u, (T T I )(v)) = 0 for any

u U . By the non-degeneracy of the scalar product we get (T T I )(v) = 0

for any v U , which proves T T = I . In other words, T is a left inverse

of T . In view of the discussion in Section 2.1, T is also a right inverse of T .

That is, T T = I . So (3) follows. That (3) implies the orthogonality of T is

obvious.

As an example, we consider R2 equipped with the standard Euclidean scalar

product, i.e. the dot product, given as

a1

b1

(u, v)+ u v = a1 b1 + a2 b2 for u =

,v=

R2 .

a2

b2

(4.2.21)

It is straightforward to check that the rotation mapping R : R2 R2

defined by

cos sin

u, R, u R2 ,

(4.2.22)

R (u) =

sin

cos

is an orthogonal mapping with respect to the scalar product (4.2.21). However,

it fails to be orthogonal with respect to the Minkowski scalar product

a1

b1

(u, v) a1 b1 a2 b2 for u =

,v=

R2 . (4.2.23)

a2

b2

Nevertheless, if we modify R into : R2 R2 using hyperbolic cosine and

sine functions and dropping the negative sign, by setting

cosh sinh

u, R, u R2 ,

(4.2.24)

(u) =

sinh cosh

we see that is orthogonal with respect to the scalar product (4.2.23),

although it now fails to be orthogonal with respect to (4.2.21), of course. This

example clearly illustrates the dependence of the form of an orthogonal mapping on the underlying scalar product.

As another example, consider the space R2 with the scalar product

a1

b1

(u, v) = a1 b1 + a1 b2 + a2 b1 a2 b2 , u =

,v=

R2 .

a2

b2

(4.2.25)

125

1 1

t

(u, v) = u

v.

1 1

Thus, for a mapping T L(R2 ) defined by

a b

T (u) =

u Au, u R2 ,

c d

we have

=u

which implies

T (u) =

=

t

a, b, c, d R,

Av

1 1

1

1 1

1

ab+cd

1

1

v,

(4.2.28)

A

u

1 1

1 1

1 a+b+c+d a+bcd

2

(4.2.27)

(4.2.26)

abc+d

u,

u R2 .

(4.2.29)

condition a = b + c + d holds for a but b, c, d are arbitrary.

Exercises

4.2.1 Let U be a vector space equipped with a non-degenerate scalar product,

(, ), and T L(U ).

(i) Show that the pairing (u, v)T = (u, T (v)) (u, v U ) defines a

scalar product on U if and only if T is self-adjoint.

(ii) Show that for a self-adjoint mapping T L(U ) the pairing (, )T

given above defines a non-degenerate scalar product over U if and

only if T is invertible.

4.2.2 Consider the vector space R2 equipped with the non-degenerate scalar

product

a1

b1

(u, v) = a1 b2 + a2 b1 , u =

,v=

R2 . (4.2.30)

a2

b2

126

Scalar products

Let

V = Span

1

0

.

(4.2.31)

may fail to make up the full space.

4.2.3 Let A = (aij ) R(2, 2) and define TA L(R2 ) by

a11 a12

a1

TA (u) =

(4.2.32)

u, u =

R2 .

a21 a22

a2

Find conditions on A such that TA is self-adjoint with respect to the

scalar product (, ) defined in (4.2.23).

4.2.4 Define the scalar product

a1

b1

v=

R2 , (4.2.33)

(u, v)0 = a1 b2 + a2 b1 , u =

a2

b2

over R2 .

(i) Show that the scalar product (, )0 is non-degenerate.

(ii) For TA L(R2 ) given in (4.2.32), obtain the adjoint mapping TA

of TA and find conditions on the matrix A so that TA is self-adjoint

with respect to the scalar product (, )0 .

4.2.5 Use U to denote the vector space of real-valued functions with all

orders of derivatives over the real line R which vanish outside bounded

intervals. Equip U with the scalar product

u(t)v(t) dt, u, v U.

(4.2.34)

(u, v) =

d

: U U is anti-self-dual or

dt

4.2.6 Let U be a finite-dimensional vector space equipped with a scalar product, (, ). Decompose U into the direct sum as stated in (4.1.15). Show

that we may use the scalar product (, ) of U to make the quotient

space U/U0 into a space with a non-degenerate scalar product when

U0

= U , still denoted by (, ), given by

([u], [v]) = (u, v),

(4.2.35)

4.2.7 Let U be a finite-dimensional vector space equipped with a scalar product, (, ), and let V be a subspace of U . Show that (, ) is a nondegenerate scalar product over V if and only if V V = {0}.

127

product (, ). Let : U U be the Riesz isomorphism. Show that

for any subspace V of U there holds

(V ) = V 0 .

(4.2.36)

In particular, dim(V ) = dim(V 0 ). Thus, in view of (1.4.31) and

(4.2.36), we have the dimensionality equation

dim(V ) + dim(V ) = dim(U ).

(4.2.37)

4.2.9 Let U be a finite-dimensional space with a non-degenerate scalar product and let V be a subspace of U . Use (4.2.37) to establish V = (V ) .

4.2.10 Let U be a finite-dimensional space with a non-degenerate scalar product and let V , W be two subspaces of U . Establish the relation

(V W ) = V + W .

(4.2.38)

scalar product (, ). Show that if n 2 then there must be linearly

independent vectors u, v U such that (u, u) = 0 and (v, v) = 0 but

(u, v) = 1.

In this section we consider two types of positive definite scalar products: real

ones and complex ones. Real ones are modeled over the standard Euclidean

scalar product on Rn :

(u, v) = u v = ut v = a1 b1 + + an bn ,

b1

a1

.

.

n

u=

.. , v = .. R ,

an

(4.3.1)

bn

(u, v) = u v = u v = a 1 b1 + + a n bn ,

b1

a1

.

.

n

u=

.. , v = .. C .

an

bn

(4.3.2)

128

Scalar products

for any vector u and (u, u) = 0 only when u = 0. The major difference is

that the former is symmetric but the latter fails to be so. Instead, there holds

the adjusted property (u, v) = (v, u) for any u, v Cn , which is seen to be

naturally implemented to ensure the positivity property. As a consequence, in

both the(Rn and Cn cases, we are able to define the norm of a vector u to be

u = (u, u).

Motivated from the above examples, we can bring forth the following

definition.

Definition 4.7 A positive definite scalar product over a real vector space U

is a scalar product (, ) satisfying (u, u) 0 for u U and (u, u) = 0 only

when u = 0.

A positive definite scalar product over a complex vector space U is a scalar

function (u, v) C, defined for each pair of vectors u, v U , which satisfies

the following conditions.

(1)

(2)

(3)

(4)

(Additivity) (u + v, w) = (u, w) + (v, w) for u, v, w U .

(Partial homogeneity) (u, av) = a(u, v) for a C and u, v U .

(Positivity) (u, u) 0 for u U and (u, u) = 0 only when u = 0.

Since the real case is contained as a special situation of the complex case,

we shall focus our discussion on the complex case, unless otherwise stated.

Needless to say, additivity regarding the second argument in (, ) still holds

since

(u, v + w) = (v + w, u) = (v, u) + (w, u) = (u, v) + (u, w),

u, v, w U.

(4.3.3)

On the other hand, homogeneity regarding the first argument takes a modified

form,

(au, v) = (v, au) = a(v, u) = a(u, v),

a C,

u U.

(4.3.4)

We will extend our study carried out in the previous two sections for general

scalar products to the current situation of a positive definite scalar product that

is necessarily non-degenerate since (u, u) > 0 for any nonzero vector u in U .

First, we see that for u, v U we can still use the condition (u, v) = 0 to

define u, v to be mutually perpendicular vectors. Next, since for any u U we

have (u, u) 0, we can formally define the norm of u as in Cn by

(

(4.3.5)

u = (u, u).

129

It is clearly seen that the norm so defined enjoys the positivity and homogeneity

conditions required of a norm. That it also satisfies the triangle inequality will

be established shortly. Thus (4.3.5) indeed gives rise to a norm of the space U

that is said to be induced from the positive definite scalar product (, ).

Let u, v U be perpendicular. Then we have

u + v2 = u2 + v2 .

(4.3.6)

may be proved by a simple expansion

u + v2 = (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v) = u2 + v2 ,

(4.3.7)

since (u, v) = 0 and (v, u) = (u, v) = 0.

For u, v U with u

= 0, we may use the orthogonal decomposition formula

(4.1.2) to resolve v into the form

(u, v)

(u, v)

(u, v)

v= v

u +

uw+

u,

(4.3.8)

(u, u)

(u, u)

(u, u)

so that (u, w) = 0. Hence, in view of the Pythagoras theorem, we get

2

(u, v) 2

2

2

u2 |(u, v)| .

v = w +

(4.3.9)

(u, u)

u2

In other words, we have

|(u, v)| uv,

(4.3.10)

(4.3.10) is valid when u = 0. Hence, in summary, we may state that (4.3.10)

holds for any u, v U and that the equality is achieved if and only if u, v are

linearly dependent.

The inequality (4.3.10) is the celebrated Schwarz inequality whose derivation is seen to be another direct application of the vector orthogonal decomposition formula (4.1.2).

We now apply the Schwarz inequality to establish the triangle inequality for

the norm induced from a positive definite scalar product (, ).

Let u, v U . Then in view of (4.3.10) we have

u + v2 = u2 + v2 + (u, v) + (v, u)

u2 + v2 + 2|(u, v)|

u2 + v2 + 2uv

= (u + v)2 .

(4.3.11)

130

Scalar products

If {u1 , . . . , un } is a basis of U , we may invoke the GramSchmidt procedure

v1 = u1 ,

vi = ui

i1

(vj , ui )

vj ,

(vj , vj )

i = 2, . . . , n,

(4.3.12)

j =1

validity of this procedure by a simple induction.

When n = 1, there is nothing to show.

Assume the procedure is valid at n = k 1.

At n = k + 1, by the inductive assumption, we know that we may construct

{v1 , . . . , vk } to get an orthogonal basis for Span{u1 , . . . , uk }. Define

vk+1 = uk+1

k

(vj , uk+1 )

j =1

(vj , vj )

vj .

(4.3.13)

vk+1 Span{uk+1 , v1 , . . . , vk } Span{u1 , . . . , uk , uk+1 }.

(4.3.14)

Of course vk+1

= 0 otherwise uk+1 Span{v1 , . . . , vk } = Span{u1 , . . . , uk }.

Thus we have obtained k + 1 nonzero mutually orthogonal vectors

v1 , . . . , vk , vk+1 that make up a basis for Span{u1 , . . . , uk , uk+1 } as asserted.

Thus, we have seen that, in the positive definite scalar product situation,

from any basis {u1 , . . . , un } of U , the GramSchmidt procedure (4.3.12) provides a scheme of getting an orthogonal basis {v1 , . . . , vn } of U so that each

of its subsets {v1 , . . . , vk } is an orthogonal basis of Span{u1 , . . . , uk } for

k = 1, . . . , n.

Let {v1 , . . . , vn } be an orthogonal basis for U . The positivity property allows

us to modify the basis further by setting

wi =

1

vi ,

vi

i = 1, . . . , n,

(4.3.15)

(that is, wi = 1 for i = 1, . . . , n). Such an orthogonal basis is called an

orthonormal basis.

We next examine the dual space U of U in view of the positive definite

scalar product (, ) over U .

For any u U , it is clear that

f (v) = (u, v),

v U,

(4.3.16)

and define fi U by setting

fi (v) = (ui , v),

v U.

131

(4.3.17)

any f U , there are scalars a1 , . . . , an such that

f = a1 f1 + + an fn .

(4.3.18)

Consequently, we have

f (v) = a1 f1 (v) + + an fn (v)

= a1 (u1 , v) + + an (un , v)

= (a 1 u1 + + a n un , v)

(u, v), u = a 1 u1 + + a n un .

(4.3.19)

of a scalar product. In other words, the Riesz representation theorem still holds

here, although homogeneity of such a representation takes an adjusted form,

fi # ui ,

i = 1, . . . , n;

f = a1 f1 + + an fn # a 1 u1 + + a n un .

(4.3.20)

Let U, V be finite-dimensional vector spaces over C with positive definite

scalar products, both denoted by (, ). For T L(U, V ) and any v V , the

expression

f (u) = (v, T (u)),

u U,

(4.3.21)

that f (u) = (w, u). Since w depends on v, we may denote this relation by

w = T (v). Hence

(v, T (u)) = (T (v), u).

(4.3.22)

In fact, for v1 , v2 V , we have

(T (v1 + v2 ), u) = (v1 + v2 , T (u))

= (v1 , T (u)) + (v2 , T (u))

= (T (v1 ), u) + (T (v2 ), u)

= (T (v1 ) + T (v2 ), u),

u U.

(4.3.23)

132

Scalar products

(T (av), u) = (av, T (u)) = a(v, T (u))

= a(T (v), u) = (aT (v), u),

u U.

(4.3.24)

Of particular interest is a mapping from U into itself. In this case we can

define a self-dual or self-adjoint mapping T with respect to the positive definite

scalar product (, ) to be such that T = T . Similar to Definition 4.5, we also

have the following.

Definition 4.8 Let U be a real or complex vector space equipped with

a positive definite scalar product (, ). Assume that T L(U ) satisfies

(T (u), T (v)) = (u, v) for any u, v U .

(1) T is called orthogonal when U is real.

(2) T is called unitary when U is complex.

In analogue to Theorem 4.6, we have the following.

Theorem 4.9 That T L(U ) is orthogonal or unitary is equivalent to one of

the following statements.

(1) T is norm-preserving. That is, T (u) = u for any u U , where

is the norm of U induced from its positive definite scalar product (, ).

(2) T maps an orthonormal basis to another orthonormal basis of U .

(3) T T = T T = I .

Proof We need only to carry out the proof in the complex case because now

the scalar product (, ) fails to be symmetric and the relation (4.2.18) is invalid.

That T being unitary implies (1) is trivial since T (u)2 = (T (u), T (u)) =

(u, u) = u2 for any u U .

Assume (1) holds. From the expansions

u + v2 = u2 + v2 + 2{(u, v)},

(4.3.25)

(4.3.26)

(u, v) =

1

1

(u + v2 u2 v2 ) + i(iu + v2 u2 v2 ),

2

2

u, v U.

(4.3.27)

133

1

(T (u + v)2 T (u)2 T (v)2 )

2

1

+ i(T (iu + v)2 T (u)2 T (v)2 )

2

1

= (u + v2 u2 v2 )

2

1

+ i(iu + v2 u2 v2 )

2

= (u, v), u, v U.

(4.3.28)

(T (u), T (v)) =

Hence T is unitary.

The rest of the proof is similar to that of Theorem 4.6 and thus skipped.

If T L(U ) is orthogonal or unitary and C an eigenvalue of T , then it

is clear that || = 1 since T is norm-preserving.

Let A F(n, n) and define TA L(Fn ) in the usual way TA (u) = Au for

any column vector u Fn .

When F = R, let the positive define scalar product be the Euclidean one

given in (4.3.1). That is,

(u, v) = ut v,

u, v Rn .

(4.3.29)

Thus

(u, TA (v)) = ut Av = (At u)t v = (TA (u), v),

u, v Rn .

(4.3.30)

TA = I which leads to At A = AAt = In . Besides, a self-adjoint mapping

TA = TA is defined by a symmetric matrix, A = At .

The above discussion may be carried over to the abstract setting as follows.

Let U be a real vector space with a positive definite scalar product (, ) and

B = {u1 , . . . , un } an orthonormal basis. Assume T L(U ) is represented by

the matrix A R(n, n) with respect to the basis B so that

T (uj ) =

n

aij ui ,

j = 1, . . . , n.

(4.3.31)

i=1

aij = (ui , T (uj )) = (T (ui ), uj ) = aj i ,

i, j = 1, . . . , n.

(4.3.32)

134

Scalar products

which gives us AAt = At A = In as before. If T is self-adjoint, T = T , then

A is again a symmetric matrix.

When F = C, let the positive define scalar product be the Hermitian one

given in (4.3.2). Then

(u, v) = ut v,

u, v Cn .

(4.3.33)

Thus

# t $t

(u, TA (v)) = ut Av = A u v = (TA (u), v),

u, v Cn .

(4.3.34)

t

t

which leads to A A = AA = In . Besides, a self-adjoint mapping T = T

is defined by a matrix which is symmetric under complex conjugation and

t

transpose of A. That is, A = A .

Similar to the real vector space situation, we leave it as an exercise to examine that, in the abstract setting, the matrix representing the adjoint mapping

with respect to an orthonormal basis is obtained by taking matrix transpose

and complex conjugate of the matrix of the original mapping with respect to

the same basis.

The above calculations lead us to formulate the following concepts, which

were originally introduced in Section 1.1 without explanation.

t

Definition 4.10 A real matrix A R(n, n) is said to be orthogonal if its transpose At is its inverse. That is, AAt = At A = In . It is easily checked that

A is orthogonal if and only if its sets of column and row vectors both form

orthonormal bases of Rn with the standard Euclidean scalar product.

A complex matrix A C(n, n) is said to be unitary if the complex conjugate

t

t

of its transpose A , also called its Hermitian conjugate denoted as A = A ,

unitary if and only if its sets of column and row vectors both form orthonormal

bases of Cn with the standard Hermitian scalar product.

A complex matrix A C(n, n) is called Hermitian if it satisfies the property

A = A or, equivalently, if it defines a self-adjoint mapping TA = TA .

With the above terminology and the GramSchmidt procedure, we may

establish a well-known matrix factorization result, commonly referred to as

the QR factorization, for a non-singular matrix.

In fact, let A C(n, n) be non-singular and use u1 , . . . , un to denote the n

corresponding column vectors of A that form a basis of Cn . Use (, ) to denote

135

are column vectors. Apply the GramSchmidt procedure to set

v1 = u 1 ,

(v1 , u2 )

v1 ,

v2 = u2

(v1 , v1 )

(4.3.35)

(v , u )

(vn1 , un )

vn = un 1 n v1

vn1 .

(v1 , v1 )

(vn1 , vn1 )

Then {v1 , . . . , vn } is an orthogonal basis of Cn . Set wi = (1/vi )vi for

i = 1, . . . , n. We see that {w1 , . . . , wn } is an orthonormal basis of Cn .

Therefore, inverting (4.3.35) and rewriting the resulting relations in terms of

{w1 , . . . , wn }, we get

u1 = v1 w1 ,

(v1 , u2 )

u2 =

(v1 , v1 )

(v1 , un )

(vn1 , un )

un =

v1 w1 + +

vn1 wn1 + vn wn .

(v1 , v1 )

(vn1 , vn1 )

(4.3.36)

For convenience, we may express (4.3.36) in the compressed form

u1 = r11 w1 ,

u = r w +r w ,

2

un

12

22

(4.3.37)

= r1n w1 + + rn1,n wn1 + rnn wn ,

A = (u1 , . . . , un ) = (w1 , . . . , wn )

= QR,

r11

r12

r1n

0

..

.

r22

..

.

..

.

r2n

..

.

rnn

(4.3.38)

mutually perpendicular and of unit length and R = (rij ) C(n, n) is upper

triangular with positive diagonal entries since rii = vi for i = 1, . . . , n.

136

Scalar products

if A is real then the matrices Q and R are also real and Q is orthogonal.

To end this section, we note that the norm of a vector u in a vector space U

equipped with a positive definite scalar product (, ) may be evaluated through

the following useful expression

u = sup{|(u, v)| | v U, v = 1}.

(4.3.39)

Indeed, let denote the right-hand side of (4.3.39). From the Schwarz inequality (4.3.10), we get |(u, v)| u for any v U satisfying v = 1. So

u. To show that u, it suffices to consider the nontrivial situation

u

= 0. In this case, we have

1

u = u.

(4.3.40)

u,

u

Hence, in conclusion, = u and (4.3.39) follows.

Exercises

4.3.1 Let U be a vector space with a positive definite scalar product (, ).

Show that u1 , . . . , uk U are linearly independent if and only if their

associated metric matrix, also called the Gram matrix,

(u1 , u1 ) (u1 , uk )

M =

(4.3.41)

(uk , u1 )

(uk , uk )

is nonsingular.

4.3.2 (Continued from Exercise 4.3.1) Show that if u U lies in

Span{u1 , . . . , uk } then the column vector ((u1 , u), . . . , (uk , u))t lies in

the column space of the metric matrix M. However, the converse is not

true when k < dim(U ).

4.3.3 Let U be a complex vector space with a positive definite scalar product

and B = {u1 , . . . , un } an orthonormal basis of U . For T L(U ), let

A, A C(n, n) be the matrices that represent T , T , respectively, with

respect to the basis B. Show that A = A .

4.3.4 Let U be a finite-dimensional complex vector space with a positive definite scalar product and S L(U ) be anti-self-adjoint. That is, S = S.

Show that I S must be invertible.

4.3.5 Consider the complex vector space C(m, n). Show that

(A, B) = Tr(A B),

A, B C(m, n)

(4.3.42)

137

defines a positive definite scalar product over C(m, n) that extends the

m

traditional Hermitian scalar product over

( C = C(m, 1). With such a

scalar product, the quantity A = (A, A) is sometimes called the

HilbertSchmidt norm of the matrix A C(m, n).

4.3.6 Let (, ) be the standard Hermitian scalar product on Cm and A

C(m, n). Establish the following statement known as the Fredholm

alternative for complex matrix equations: Given b Cm the nonhomogeneous equation Ax = b has a solution for some x Cn if and

only if (y, b) = 0 for any solution y Cm of the homogeneous equation

A y = 0.

4.3.7 For A C(n, n) show that if the column vectors of A form an orthonormal basis of Cn with the standard Hermitian scalar product so do the

row vectors of A.

4.3.8 For the matrix

1 1 1

(4.3.43)

A = 1 1

2 ,

2

obtain a QR factorization.

In this section, we continue to study a finite-dimensional vector space U with a

positive definite scalar product (, ). We focus our attention on the problem of

resolving a vector into the span of a given set of orthogonal vectors. To avoid

the trivial situation, we always assume that the set of the orthogonal vectors

concerned never contains a zero vector unless otherwise stated.

We begin with the following basic orthogonal decomposition theorem.

Theorem 4.11 Let V be a subspace of U . Then there holds the orthogonal

decomposition

U = V V .

(4.4.1)

= {0}. Let

{v1 , . . . , vk } be an orthogonal basis of V . We can then expand {v1 , . . . , vk } into

an orthogonal basis for U , which may be denoted as {v1 , . . . , vk , w1 , . . . , wl },

where k + l = n = dim(U ). Of course, w1 , . . . , wl V . For any u V ,

we rewrite u as

u = a1 v1 + + ak vk + b1 w1 + + bl wl ,

(4.4.2)

138

Scalar products

(i = 1, . . . , k), we obtain a1 = = ak = 0, which establishes

u Span{w1 , . . . , wl }. So V Span{w1 , . . . , wl }. Therefore V =

Span{w1 , . . . , wl } and U = V + V .

Finally, take u V V . Then (u, u) = 0. By the positivity condition, we

get u = 0. Thus (4.4.1) follows.

We now follow (4.4.1) to concretely construct the orthogonal decomposition

of a given vector.

Theorem 4.12 Let V be a nonzero subspace of U with an orthogonal basis

{v1 , . . . , vk }. Any u U may be uniquely decomposed into the form

u = v + w,

v V,

w V ,

(4.4.3)

v=

k

(vi , u)

vi .

(vi , vi )

(4.4.4)

i=1

Moreover, the vector v given in (4.4.4) is the unique solution of the minimization problem

inf{u x | x V }.

(4.4.5)

Proof The validity of the expression (4.4.3) for some unique v V and

w V is already ensured by Theorem 4.11.

We rewrite v as

v=

k

(4.4.6)

ai vi .

i=1

ai =

(vi , v)

,

(vi , vi )

i = 1, . . . , k,

(4.4.7)

For the scalars a1 , . . . , ak given in (4.4.7) and v in (4.4.6), we have from

(4.4.3) the relation

u

k

k

bi vi = w +

(ai bi )vi ,

i=1

i=1

x=

k

i=1

bi vi V .

(4.4.8)

139

Consequently,

"

"2

k

"

"

"

"

u x2 = "u

bi vi "

"

"

i=1

"

"2

k

"

"

"

"

= "w +

(ai bi )vi "

"

"

i=1

= w2 +

k

|ai bi |2 vi 2

i=1

w2 ,

(4.4.9)

and the lower bound w2 is attained only when bi = ai for all i = 1, . . . , k,

or x = v.

So the proof is complete.

Definition 4.13 Let {v1 , . . . , vk } be a set of orthogonal vectors in U . For u

U , the sum

k

ai vi ,

ai =

i=1

(vi , u)

,

(vi , vi )

i = 1, . . . , k,

(4.4.10)

with respect to the orthogonal set {v1 , . . . , vk }.

Of particular interest is when a set of orthogonal vectors becomes a basis.

Definition 4.14 Let {v1 , . . . , vn } be a set of orthogonal vectors in U . The set

is said to be complete if it is a basis of U .

The completeness of a set of orthogonal vectors is seen to be characterized

by the norms of vectors in relation to their Fourier coefficients.

Theorem 4.15 Let U be a vector space with a positive definite scalar product

and V = {v1 , . . . , vn } a set of orthogonal vectors in U . For any u U , let the

scalars a1 , . . . , an be the Fourier coefficients of u with respect to V.

(1) There holds the inequality

n

i=1

(4.4.11)

140

Scalar products

(2) That u Span{v1 , . . . , vn } if and only if the equality in (4.4.11) is

attained. That is,

n

(4.4.12)

i=1

complete if and only if the Parseval identity (4.4.12) holds for any u U .

Proof For given u, use v to denote the Fourier expansion of v as stated in

(4.4.3) and (4.4.4) with k = n. Then u2 = w2 + v2 . In particular,

v2 u2 , which is (4.4.11).

It is clear that u Span{v1 , . . . , vn } if and only if w = 0, which is equivalent

to the fulfillment of the equality u2 = v2 , which is (4.4.12).

If {u1 , . . . , un } is a set of orthogonal unit vectors in U , the Fourier expansion

and Fourier coefficients of a vector u U with respect to {u1 , . . . , un } take

the elegant forms

n

ai ui ,

ai = (ui , u),

i = 1, . . . , n,

(4.4.13)

i=1

n

(4.4.14)

i=1

n

(4.4.15)

i=1

It is interesting to note that, since ai = (vi , u)/vi 2 (i = 1, . . . , n) in

(4.4.11) and (4.4.12), the inequalities (4.4.11) and (4.4.14), and the identities

(4.4.12) and (4.4.15), are actually of the same structures, respectively.

Exercises

4.4.1 Let U be a finite-dimensional vector space with a positive definite scalar

product, (, ), and S = {u1 , . . . , un } an orthogonal set of vectors in U .

Show that the set S is complete if and only if S = {0}.

141

4.4.2 Let V be a nontrivial subspace of a vector space U with a positive definite scalar product. Show that, if {v1 , . . . , vk } is an orthogonal basis of

V , then the mapping P : U U given by its Fourier expansion,

P (u) =

k

(vi , u)

vi ,

(vi , vi )

u U,

(4.4.16)

i=1

N(P ) = V , and R(P ) = V .

4.4.3 Let U be a finite-dimensional vector space with a positive definite scalar

product, (, ), and V a subspace of U . Use PV : U U to denote the

projection of U onto V along V . Prove that if W is a subspace of U

containing V then

u PV (u) u PW (u),

u U,

(4.4.17)

of U is induced from the positive definite scalar product (, ). In other

words, orthogonal projection of a vector into a larger subspace provides

a better approximation of the vector.

4.4.4 Let V be a nontrivial subspace of a finite-dimensional vector space U

with a positive definite scalar product. Recall that over U/V we may

define the norm

[u] = inf{x | x [u]},

[u] U/V ,

(4.4.18)

(i) Prove that for each [u] U/V there is a unique w [u] such that

[u] = w.

(ii) Find a practical method to compute the vector w shown to exist in

part (i) among the coset [u].

4.4.5 Consider the vector space Pn of real-coefficient polynomials in variable

t of degrees up to n 1 with the positive definite scalar product

1

(u, v) =

u(t)v(t) dt, u, v Pn .

(4.4.19)

1

{1, t, . . . , t n } of Pn , we may construct an orthonormal basis, say

{L0 , L1 , . . . , Ln }, of Pn . The set of polynomials {Li (t)} are the wellknown Legendre polynomials. Explain why the degree of each Li (t)

must be i (i = 0, 1, . . . , n) and find Li (t) for i = 0, 1, 2, 3, 4.

142

Scalar products

in terms of the Legendre polynomials.

4.4.7 Let Pn be the vector space with the scalar product defined in (4.4.19).

For f P3 satisfying

f (1) = 1,

f (t) = 2,

f (t 2 ) = 6,

f (t 3 ) = 5,

(4.4.20)

: P3 P3 of f P3 , or (v) = f . That is, there holds

f (u) = (u, v),

u P3 .

(4.4.21)

Let U be a vector space with a positive definite scalar product, (, ), which

induces a norm on U . If T L(U ) is orthogonal or unitary, then it is

clear that

T (u) T (v) = u v,

u, v U.

(4.5.1)

In other words, the distance of the images of any two vectors in U under T

is the same as that between the two vectors. A mapping from U into itself

satisfying such a property is called an isometry or isometric. In this section, we

show that any zero-vector preserving mapping from a real vector space U with

a positive definite scalar product into itself satisfying the property (4.5.1) must

be linear. Therefore, in view of Theorems 4.6, it is orthogonal. In other words,

in the real setting, being isometric characterizes a mapping being orthogonal.

Theorem 4.16 Let U be a real vector space with a positive definite scalar

product. A mapping T from U into itself satisfying the isometric property

(4.5.1) and T (0) = 0 if and only if it is orthogonal.

Proof Assume T satisfies T (0) = 0 and (4.5.1). We show that T must be

linear. To this end, from (4.5.1) and replacing v by 0, we get T (u) = u

for any u U . On the other hand, the symmetry of the scalar product (, )

gives us the identity

(u, v) =

1

(u + v2 u2 v2 ),

2

u, v U.

(4.5.2)

143

1

(T (u) T (v)2 T (u)2 T (v)2 )

2

1

= (u v2 u2 v2 )

2

= (u, v).

(4.5.3)

(T (u), T (v)) =

Hence (T (u), T (v)) = (u, v) for any u, v U . Using this result, we have

T (u + v) T (u) T (v)2

= T (u + v)2 + T (u)2 + T (v)2

2(T (u + v), T (u)) 2(T (u + v), T (v)) + 2(T (u), T (v))

= u + v2 + u2 + v2 2(u + v, u) 2(u + v, v) + 2(u, v)

= (u + v) u v2 = 0,

(4.5.4)

T (u + v) = T (u) + T (v),

u, v U.

(4.5.5)

T (u) for any u U .

Moreover, (4.5.5) also implies that, for any integer m, we have T (mu) =

1

mT (u). Replacing u by u where m is a nonzero integer, we also have

m

1

1

T ( u) = T (u). Combining these results, we conclude that

m

m

T (ru) = rT (u),

r Q,

u U.

(4.5.6)

Then we find

T (au) aT (u) T (au) T (rk u) + rk T (u) aT (u)

= |a rk |u + |rk a|T (u) 0 as k .

(4.5.7)

That is, T (au) = aT (u) for any a R and u U . So homogeneity is established and the proof follows.

We next give an example showing that, in the complex situation, being isometric alone is not sufficient to ensure a mapping to be unitary.

The vector space we consider here is taken to be Cn with the standard Hermitian scalar product (4.3.2) and the mapping T : Cn Cn is given by

T (u) = u for u Cn . Then T satisfies (4.5.1) and T (0) = 0. However,

144

Scalar products

T (au)

= aT (u) whenever a C with (a)

= 0 and u Cn \ {0}. Hence

T

L(U ).

It will be interesting to spell out some conditions in the complex situation in

addition to (4.5.1) that would, when put together with (4.5.1), along with the

zero-vector preserving property, ensure a mapping to be unitary. The following

theorem is such a result.

Theorem 4.17 Let U be a complex vector space with a positive definite scalar

product (, ). A mapping T from U into itself satisfying the isometric property

(4.5.1), T (0) = 0, and

iT (u) T (v) = iu v,

u, v U,

(4.5.8)

Proof It is obvious that if T L(U ) is unitary then both (4.5.1) and (4.5.8)

are fulfilled. We now need to show that the converse is true.

First, setting v = iu in (4.5.8), we obtain

T (iu) = iT (u),

u U.

(4.5.9)

(4.5.1) and (4.5.8), we have

1

(T (u) T (v)2 T (u)2 T (v)2 )

2

1

+ i(iT (u) T (v)2 T (u)2 T (v)2 )

2

1

= (u v2 u2 v2 )

2

1

+ i(u iv2 u2 u2 )

2

= (u, v), u, v U.

(4.5.10)

(T (u), T (v)) =

That is, (T (u), T (v)) = (u, v) for u, v U . Thus a direct expansion gives us

the result

T (u + v) T (u) T (v)2

= T (u + v)2 + T (u)2 + T (v)2

2(T (u + v), T (u)) 2(T (u + v), T (v)) + 2(T (u), T (v))

= u + v2 + u2 + v2 2(u + v, u) 2(u + v, v) + 2(u, v)

= (u + v) u v2 = 0,

(4.5.11)

145

as before. Therefore (4.5.6) holds in the current complex formalism.

In view of the additivity property of T , (4.5.6), and (4.5.9), we have

T ((p + iq)u) = T (pu) + T (iqu) = pT (u) + iqT (u)

= (p + iq)T (u),

p, q Q,

u U.

(4.5.12)

{rk } (rk = pk + iqk with pk , qk Q) such that pk b and qk c or

rk a as k . In view of these and (4.5.12), we see that (4.5.7) holds in

the complex situation as well. In other words, T (au) = aT (u) for a C and

u U.

The proof is complete.

It is worth noting that Theorems 4.16 and 4.17 are valid in general without

restricting to finite-dimensional spaces.

Exercises

4.5.1 Let U be a real vector space with a positive definite scalar product (, ).

Establish the following variant of Theorem 4.16: A mapping T from U

into itself satisfying the property

T (u) + T (v) = u + v,

u, v U,

(4.5.13)

4.5.2 Show that the mapping T : Cn Cn given by T (u) = u for u Cn ,

which is equipped with the standard Hermitian scalar product, satisfies

(4.5.13) but that it is not unitary.

4.5.3 Let U be a complex vector space with a positive definite scalar product

(, ). Establish the following variant of Theorem 4.17: A mapping T

from U into itself satisfying the property (4.5.13) and

iT (u) + T (v) = iu + v,

u, v U,

(4.5.14)

4.5.4 Check to see why the mapping T defined in Exercise 4.5.2 fails to satisfy

the property (4.5.14) with U = Cn .

4.5.5 Let T : Rn Rn (n 2) be defined by

x1

xn

.

.

n

(4.5.15)

T (x) =

.. , x = .. R ,

x1

xn

146

Scalar products

where Rn is equipped with the standard Euclidean scalar product.

(i) Show that T is an isometry.

(ii) Determine all the eigenvalues of T .

(iii) Show that eigenvectors associated to different eigenvalues are mutually perpendicular.

(iv) Determine the minimal polynomial of T .

5

Real quadratic forms and self-adjoint mappings

In this chapter we exclusively consider vector spaces over the field of reals

unless otherwise stated. We first present a general discussion on bilinear and

quadratic forms and their matrix representations. We also show how a symmetric bilinear form may be uniquely represented by a self-adjoint mapping.

We then establish the main spectrum theorem for self-adjoint mappings based

on a proof of the existence of an eigenvalue using calculus. We next focus on

characterizing the positive definiteness of self-adjoint mappings. After these

we study the commutativity of self-adjoint mappings. In the last section we

show the effectiveness of using self-adjoint mappings in computing the norm

of a mapping between different spaces and in the formalism of least squares

approximations.

Let U be a finite-dimensional vector space over R. The simplest real-valued

functions over U are linear functions, which are also called functionals earlier

and have been studied. The next simplest real-valued functions to be studied

are bilinear forms whose definition is given as follows.

Definition 5.1 A function f : U U R is called a bilinear form if it

satisfies, for any u, v, w U and a R, the following conditions.

(1) f (u + v, w) = f (u, w) + f (v, w), f (au, v) = af (u, v).

(2) f (u, v + w) = f (u, v) + f (u, w), f (u, av) = af (u, v).

Let B = {u1 , . . . , un } be a basis of U . For u, v U with coordinate vectors

x = (x1 , . . . , xn )t , y = (y1 , . . . , yn )t Rn with respect to B, we have

147

148

f (u, v) = f

n

i=1

xi ui ,

n

yj uj =

j =1

n

i,j =1

where A = (aij ) = (f (ui , uj )) R(n, n) is referred to as the matrix representation of the bilinear form f with respect to the basis B.

Let B = {u 1 , . . . , u n } be another basis of U so that A = (a ij ) =

n

If x,

y R are the coordinate vectors of u, v U with respect to B and the

basis transition matrix between B and B is B = (bij ) so that

u j =

n

bij ui ,

j = 1, . . . , n,

(5.1.2)

i=1

then x = B x,

y = B y (cf. Section 1.3). Hence we arrive at

f (u, v) = x t A y = x t Ay = x t (B t AB)y,

(5.1.3)

which leads to the relation A = B t AB and gives rise to the following concept.

Definition 5.2 For A, B F(n, n), we say that A and B are congruent if there

is an invertible element C F(n, n) such that

A = C t BC.

(5.1.4)

bilinear form with respect to different bases are congruent.

For a bilinear form f : U U R, we can set

q(u) = f (u, u),

u U,

(5.1.5)

which is called the quadratic form associated with the bilinear form f . The

quadratic form q is homogeneous of degree 2 since q(tu) = t 2 q(u) for any

t R and u U .

Of course q is uniquely determined by f through (5.1.5). However, the converse is not true, which will become clear after the following discussion.

To proceed, let B = {u1 , . . . , un } be a basis of U and u U any given

vector whose coordinate vector with respect to B is x = (x1 , . . . , xn )t . Then,

from (5.1.1), we have

1

1

1

(A + At ) + (A At ) x = x t (A + At )x,

q(u) = x t Ax = x t

2

2

2

(5.1.6)

149

words, the quadratic form q constructed from (5.1.5) can only capture the

information contained in the symmetric part, 12 (A + At ), but nothing in the

skewsymmetric part, 12 (AAt ), of the matrix A = (f (ui , uj )). Consequently,

the quadratic form q cannot determine the bilinear form f completely, in general situations, unless the skewsymmetric part of A is absent or A At = 0.

In other words, A = (f (ui , uj )) is symmetric. This observation motivates the

following definition.

Definition 5.3 A bilinear form f : U U R is symmetric if it satisfies

f (u, v) = f (v, u),

u, v U.

(5.1.7)

f (u + v, u + v) = f (u, u) + f (v, v) + 2f (u, v),

u, v U.

(5.1.8)

Thus, if q is the quadratic form associated with f , we derive from (5.1.8) the

relation

f (u, v) =

1

(q(u + v) q(u) q(v)),

2

u, v U,

(5.1.9)

also have

f (u, v) =

1

(q(u + v) q(u v)),

4

u, v U.

(5.1.10)

As in the situation of scalar products, the relations of the types (5.1.9) and

(5.1.10) are often referred to as polarization identities for symmetric bilinear

forms.

From now on we will concentrate on symmetric bilinear forms.

Let f : U U R be a symmetric bilinear form. If x, y Rn are the

coordinate vector of u, v U with respect to a basis B, then f (u, v) is given

by (5.1.1) so that matrix A R(n, n) is symmetric. Recall that (x, y) = x t y is

the Euclidean scalar product over Rn . Thus, if we view A as a linear mapping

Rn Rn given by x # Ax, then the right-hand side of (5.1.1) is simply

(x, Ay). Since A = At , the right-hand side of (5.1.1) is also (Ax, y). In other

words, A defines a self-adjoint mapping over Rn with respect to the standard

Euclidean scalar product over Rn .

Conversely, if U is a vector space equipped with a positive definite scalar

product (, ) and T L(U ) is a self-adjoint or symmetric mapping, then

f (u, v) = (u, T (v)),

u, v U,

(5.1.11)

150

is a symmetric bilinear form. Thus, in this way, we see that symmetric bilinear

forms are completely characterized.

In a more precise manner, we have the following theorem, which relates

symmetric bilinear forms and self-adjoint mappings over a vector space with a

positive definite scalar product.

Theorem 5.4 Let U be a finite-dimensional vector space with a positive definite scalar product (, ). For any symmetric bilinear form f : U U R,

there is a unique self-adjoint or symmetric linear mapping, say T L(U ),

such that the relation (5.1.11) holds.

Proof For each v U , the existence of a unique vector, say T (v), so that

(5.1.11) holds, is already shown in Section 4.2. Since f is bilinear, we have

T L(U ). The self-adjointness or symmetry of T follows from the symmetry

of f and the scalar product.

Note that, in view of Section 4.1, a symmetric bilinear form is exactly a kind

of scalar product as well, not necessarily positive definite, though.

Exercises

5.1.1 Let f : U U R be a bilinear form such that f (u, u) > 0 and

f (v, v) < 0 for some u, v U .

(i) Show that u, v are linearly independent.

(ii) Show that there is some w U, w

= 0 such that f (w, w) = 0.

5.1.2 Let A, B F(n, n). Show that if A, B are congruent then A, B must

have the same rank.

5.1.3 Let A, B F(n, n). Show that if A, B are congruent and A is symmetric

then so is B.

5.1.4 Are the matrices

A=

,

B=

(5.1.12)

5.1.5 Consider the identity matrix In F(n, n). Show that In and In are not

congruent if F = R but are congruent if F = C.

151

q(x) = x12 + 2x22 x32 + 2x1 x2 4x1 x3 ,

x1

x = x2 R3 ,

x3

(5.1.13)

(x, y) = x t y for x, y R3 .

(i) Find the unique symmetric bilinear form f : R3 R3 R such

that q(x) = f (x, x) for x R3 .

(ii) Find the unique self-adjoint mapping T L(R3 ) such that

f (x, y) = (x, T (y)) for any x, y R3 .

5.1.7 Let f : U U R be a bilinear form.

(i) Show that f is skewsymmetric, that is, f (u, v) = f (v, u) for any

u, v U , if and only if f (u, u) = 0 for any u U .

(ii) Show that if f is skewsymmetric then the matrix representation of

f with respect to an arbitrary basis of U must be skewsymmetric.

5.1.8 Prove that Theorem 5.4 is still true if the positive definite scalar product

(, ) of U is only assumed to be non-degenerate instead to ensure the

existence and uniqueness of an element T L(U ).

Let U be a finite-dimensional vector space with a positive definite scalar

product, (, ). For self-adjoint mappings, we have the following foundational

theorem.

Theorem 5.5 Assume that T L(U ) is self-adjoint with respect to the scalar

product of U . Then

(1) T must have a real eigenvalue,

(2) there is an orthonormal basis of U consisting of eigenvectors of T .

Proof We use induction on dim(U ).

There is nothing to show at dim(U ) = 1.

Assume that the theorem is true at dim(U ) = n 1 1.

We investigate the situation when dim(U ) = n 2.

152

any vector u U , let x = (x1 , . . . , xn )t Rn be its coordinate vector. Use

A = (aij ) R(n, n) to denote the matrix representation of T with respect to

B so that

T (uj ) =

n

aij ui ,

i = 1, . . . , n.

(5.2.1)

i=1

Then set

n

n

xi ui ,

xj T (uj )

Q(x) = (u, T (u)) =

n

xi ui ,

n

n

j =1

i=1

j =1

i=1

xj

n

n

akj uk =

k=1

xi xj akj ik

i,j,k=1

xi xj aij = x t Ax.

(5.2.2)

i,j =1

n

n

2

2

S= u=

xi ui U u = (u, u) =

xi = 1 ,

i=1

(5.2.3)

i=1

which may also be identified as the unit sphere in Rn centered at the origin

and commonly denoted by S n1 . Since S n1 is compact, the function Q given

in (5.2.2) attains its minimum over S n1 at a certain point on S n1 , say x 0 =

(x10 , . . . , xn0 )t .

We may assume xn0

= 0. Without loss of generality, we may also assume

0

xn > 0 (the case xn0 < 0 can be treated similarly). Hence, near x 0 , we may

represent the points on S n1 by the formulas

0

2

xn = 1 x12 xn1

where (x1 , . . . , xn1 ) is near (x10 , . . . , xn1

).

(5.2.4)

Therefore, with

1 x12

2

xn1

(5.2.5)

0

) is a critical point of P . Thus we have

we see that (x10 , . . . , xn1

0

(P )(x10 , . . . , xn1

) = 0.

(5.2.6)

153

function P as

P (x1 , . . . , xn1 ) =

n1

aij xi xj

i,j =1

+2

n1

2

ain xi 1 x12 xn1

i=1

2

+ ann (1 x12 xn1

).

(5.2.7)

Thus

n1

P

2

=2

aij xj + 2ain 1 x12 xn1

xi

j =1

2

n1

xi

2

1 x12 xn1

aj n xj 2ann xi , i = 1, . . . , n 1.

j =1

(5.2.8)

Using (5.2.8) in (5.2.6), we arrive at

n

j =1

n

1

aij xj0 = 0

aj n xj0 xi0 ,

xn

i = 1, . . . , n 1.

(5.2.9)

j =1

n

j =1

n

1

anj xj0 = 0

aj n xj0 xn0

xn

(5.2.10)

j =1

Ax 0 = 0 x 0 ,

n

1

0 = 0

aj n xj0 ,

xn

(5.2.11)

j =1

with 0 , of A.

Let v1 U be such that its coordinate vector with respect to the basis B

is x 0 . Then we have

154

n

n

0

xj uj =

xj0 T (uj )

T (v1 ) = T

j =1

n

aij ui xj0 =

i,j =1

= 0

n

j =1

n

n

aij xj0 ui

j =1

i=1

xi0 ui = 0 v1 ,

(5.2.12)

i=1

Now set U1 = Span{v1 } and make the decomposition U = U1 U1 .

We claim that U1 is invariant under T . In fact, for any u U1 , we have

(u, v1 ) = 0. Hence (T (u), v1 ) = (u, T (v1 )) = (u, 0 v1 ) = 0 (u, v1 ) = 0

which indicates T (u) U1 .

Of course dim(U1 ) = n 1. Using the inductive assumption, we know that

associated with a real eigenvalue of T , i = 2, . . . , n.

Finally, we may rescale v1 to make it a unit vector. Thus {v1 , v2 , . . . , vn } is

an orthonormal basis of U so that each vector vi is an eigenvector associated

with a real eigenvalue of T , i = 1, 2, . . . , n, as desired.

With respect to the basis {v1 , . . . , vn }, the matrix representation of T is

diagonal, whose diagonal, entries are the eigenvalues of T that are shown to be

all real. In other words, all eigenvalues of T are real.

Let T L(U ) be self-adjoint, use 1 , . . . , k to denote all the distinct eigenvalues of T , and denote by E1 , . . . , Ek the corresponding eigenspaces which

are of course invariant subspaces of T . Using Theorem 5.5, we see that there

holds the direct sum

U = E1 Ek .

(5.2.13)

eigenvalue 0 (if any). Moreover, we may set

)

)

Ei , E =

Ei .

(5.2.14)

E+ =

i >0

i <0

n0 = dim(E0 ),

n+ = dim(E+ ),

n = dim(E ),

(5.2.15)

are exactly what were previously called the indices of nullity, positivity, and

negativity, respectively, in the context of a real scalar product in Section 4.2,

which is simply a symmetric bilinear form here and may always be represented

155

or n0 = n(T ), and n+ + n is the rank of T , n+ + n = r(T ). Furthermore,

for ui Ei , uj Ej , we have

i (ui , uj ) = (T (ui ), uj ) = (ui , T (uj )) = j (ui , uj ),

(5.2.16)

= j , we have (ui , uj ) = 0.

In other words, the eigenspaces associated with distinct eigenvalues of T are

mutually perpendicular. (This observation suggests a practical way to construct an orthogonal basis of U consisting of eigenvectors of T : First find

all eigenspaces of T . Then obtain an orthogonal basis for each of these

eigenspaces by using the GramSchmidt procedure. Finally put all these orthogonal bases together to get an orthogonal basis of the full space.)

A useful matrix version of Theorem 5.5 may be stated as follows.

Theorem 5.6 A matrix A R(n, n) is symmetric if and only if there is an

orthogonal matrix P R(n, n) such that

A = P t DP ,

(5.2.17)

where D R(n, n) is a diagonal matrix and the diagonal entries of D are the

eigenvalues of A.

Proof The right-hand side of (5.2.17) is of course symmetric.

Conversely, let A be symmetric. The linear mapping T L(Rn ) defined by

T (x) = Ax, with x Rn any column vector, is self-adjoint with respect to the

standard scalar product over Rn . Hence there are column vectors u1 , . . . , un

in Rn consisting of eigenvectors of T or A associated with real eigenvalues,

say 1 , . . . , n , which form an orthonormal basis of Rn . The relations Au1 =

1 u1 , . . . , Aun = n un , may collectively be rewritten as AQ = QD where

D = diag{1 , . . . , n } and Q R(n, n) is made of taking u1 , . . . , un as the

respective column vectors. Since uti uj = ij , i, j = 1, . . . , n, we see that Q is

an orthogonal matrix. Setting P = Qt , we arrive at (5.2.17).

Note that the proof of Theorem 5.6 gives us a practical way to construct the

matrices P and D in (5.2.17).

Exercises

5.2.1 The fact that a symmetric matrix A R(n, n) has and can have only

real eigenvalues may also be proved algebraically more traditionally as

follows. Consider A as an element in C(n, n) and let C be any of

its eigenvalue whose existence is ensured by the Fundamental Theorem

156

of Algebra. Let u Cn be an eigenvector of A associated to . Then,

using (, ) to denote the standard Hermitian scalar product over Cn , we

have

(u, u) = u u = (u, Au).

(5.2.18)

show that A has an eigenvector in Rn associated to .

5.2.2 Consider the quadratic form

x1

x3

(i) Find a symmetric matrix A R(3, 3) such that q(x) = x t Ax.

(ii) Find an orthonormal basis of R3 (with respect to the standard Euclidean scalar product of R3 ) consisting of the eigenvectors of A.

(iii) Find an orthogonal matrix P R(3, 3) so that the substitution of

the variable, y = P x, transforms the quadratic form (5.2.19) into

the diagonal form

y1

y3

(5.2.20)

where 1 , 2 , 3 are the eigenvalues of A, counting multiplicities.

5.2.3 Show that any symmetric matrix A R(n, n) must be congruent with

a diagonal matrix of the form D = diag{d1 , . . . , dn }, where di = 1

or 0 for i = 1, . . . , n.

5.2.4 Let A R(n, n) be symmetric and det(A) < 0. Show that there is a

column vector x Rn such that x t Ax < 0.

5.2.5 Show that if A R(n, n) is orthogonal then adj(A) = At .

5.2.6 Show that if A1 , . . . , Ak R(n, n) are orthogonal, so is their product

A = A1 Ak .

5.2.7 Assume that A R(n, n) is symmetric and all of its eigenvalues are

1. Prove that A is orthogonal.

5.2.8 Let A R(n, n) be an upper or lower triangular matrix. If A is orthogonal, show that A must be diagonal and the diagonal entries of A can

only be 1.

5.2.9 Show that if T L(U ) is self-adjoint and T m = 0 for some integer

m 1 then T = 0.

157

then there is a self-adjoint element S L(U ) such that T = S m .

5.2.11 Let A R(n, n) be symmetric and satisfy the equation A3 +A2 +4In =

0. Prove that A = 2In .

5.2.12 Let A R(n, n) be orthogonal.

(i) Show that the real eigenvalues of A can only be 1.

(ii) If n = odd and det(A) = 1, then 1 is an eigenvalue of A.

(iii) If det(A) = 1, then 1 is an eigenvalue of A.

5.2.13 Let x Rn be a nonzero column vector. Show that

2

P = In

xx t

xt x

(5.2.21)

is an orthogonal matrix.

5.2.14 Let A R(n, n) be an idempotent symmetric matrix such that r(A) =

r. Show that the characteristic polynomial of A is

pA () = det(In A) = ( 1)r nr .

(5.2.22)

5.2.15 Let A R(n, n) be a symmetric matrix whose eigenvalues are all nonnegative. Show that det(A + In ) > 1 if A

= 0.

5.2.16 Let u Rn be a nonzero column vector. Show that there is an orthogonal matrix Q R(n, n) such that

Qt (uut )Q = diag{ut u, 0, . . . , 0}.

(5.2.23)

and matrices

Let q be a quadratic form over a finite-dimensional vector space U with a positive definite scalar product (, ) and f a symmetric bilinear form that induces

q: q(u) = f (u, u), u U . Then there is a self-adjoint mapping T L(U )

such that

q(u) = (u, T (u)),

u U.

(5.3.1)

In this section, we apply the results of the previous section to investigate the

situation when q stays positive, which is important in applications.

Definition 5.7 The positive definiteness of various subjects of concern is

defined as follows.

158

q(u) > 0,

u U,

u

= 0.

(5.3.2)

(u, T (u)) > 0,

u U,

u

= 0.

(5.3.3)

nonzero column vector x Rn there holds

x t Ax > 0.

(5.3.4)

through (5.3.1), then the positive definiteness of q and T are equivalent. Therefore it will be sufficient to study the positive definiteness of self-adjoint mappings which will be shown to be equivalent to the positive definiteness of any

matrix representations of the associated bilinear forms of the self-adjoint mappings. Recall that the matrix representation of the symmetric bilinear form

induced from T with respect to an arbitrary basis {u1 , . . . , un } of U is a matrix

A R(n, n) defined by

A = (aij ),

i, j = 1, . . . , n.

(5.3.5)

Theorem 5.8 That a self-adjoint mapping T L(U ) is positive definite is

equivalent to any of the following statements.

(1) All the eigenvalues of T are positive.

(2) There is a positive constant, 0 > 0, such that

(u, T (u)) 0 u2 ,

u U.

(5.3.6)

(4) The matrix A defined in (5.3.5) with respect to an arbitrary basis is positive definite.

(5) The eigenvalues of the matrix A defined in (5.3.5) with respect to an arbitrary basis are all positive.

(6) The matrix A defined in (5.3.5) with respect to an arbitrary basis enjoys

the factorization A = B 2 for some positive definite matrix B R(n, n).

Proof Assume T is positive definite. Let be any eigenvalue of T and u U

an associated eigenvector. Then u2 = (u, T u) > 0. Thus > 0 and (1)

follows.

159

is an eigenvector associated with the eigenvalue i (i = 1, . . . , n). Set 0 =

n

min{1 , . . . , n }. Then 0 > 0. Moreover, for any u U with u =

ai ui

i=1

n

n

n

n

ai ui ,

j aj uj =

i ai2 0

ai2 = 0 u2 ,

(u, T (u)) =

i=1

j =1

i=1

i=1

(5.3.7)

which establishes (2). It is obvious that (2) implies the positive definiteness

of T .

We now show that the positive definiteness of T is equivalent to the statement (3). In fact, let {u1 , . . . , un } be an orthonormal basis of U consisting of

eigenvectors with {1 , . . . , n } the corresponding eigenvalues. In view of (1),

i > 0 for i = 1, . . . , n. Now define S L(U ) to satisfy

(

S(ui ) = i ui , i = 1, . . . , n.

(5.3.8)

It is clear that T = S 2 .

Conversely, if T = S 2 for some positive definite mapping S, then 0 is not

an eigenvalue of S. So S is invertible. Therefore

(u, T (u)) = (u, S 2 (u)) = (S(u), S(u)) = S(u)2 > 0, u U, u

= 0,

(5.3.9)

and the positive definiteness of T follows.

Let {u1 , . . . , un } be an arbitrary basis of U and x = (x1 , . . . , xn )t Rn a

n

nonzero vector. For u =

xi ui , we have

i=1

n

n

n

xi ui ,

xj T (uj ) =

xi (ui , T (uj ))xj = x t Ax,

(u, T (u)) =

i=1

j =1

i,j =1

(5.3.10)

which is positive if T is positive definite, and vice versa. So the equivalence of

the positive definiteness of T and the statement (4) follows.

Suppose that A is positive definite. Let be any eigenvalue of A and x an

associated eigenvector. Then x t Ax = x t x > 0 implies > 0 since x t x > 0.

So (5) follows.

Now assume (5). Using Theorem 5.6, we see that A = P t DP where D is a

diagonal matrix in R(n, n) whose diagonal entries are the positive eigenvalues

160

y = P x = (y1 , . . . , yn )t Rn where x Rn , we have

x t Ax = (P x)t DP x = y t Dy =

n

i yi2 > 0,

(5.3.11)

i=1

whenever x

= 0 since P is nonsingular. Hence (5) implies (4) as well.

Finally, assume (5) holds and set

(

(

D1 = diag{ 1 , . . . , n }.

(5.3.12)

Then D12 = D. Thus

A = P t DP = P t D12 P = (P t D1 P )(P t D1 P ) = B 2 ,

B = P t D1 P .

(5.3.13)

Using (5) we know that B is positive definite. So (6) follows. Conversely, if (6)

holds, we may use the symmetry of B to get x t Ax = x t B 2 x = (Bx)t (Bx) > 0

whenever x Rn with x

= 0 because B is nonsingular, which implies Bx

=

0. Thus (4) holds.

The proof is complete.

We note that, if in (5.3.5) the basis {u1 , . . . , un } is orthonormal, then

T (uj ) =

n

n

(ui , T (uj ))ui =

aij ui ,

i=1

j = 1, . . . , n,

(5.3.14)

i=1

For practical convenience, it is worth stating the following matrix version of

Theorem 5.8.

Theorem 5.9 That a symmetric matrix A R(n, n) is positive definite is

equivalent to any of the following statements.

(1) All the eigenvalues of A are positive.

(2) There is a positive definite matrix B R(n, n) such that A = B 2 .

(3) A is congruent to the identity matrix. That is, there is a nonsingular matrix

B R(n, n) such that A = B t In B = B t B.

Proof That A is positive definite is equivalent to either (1) or (2) has already

been demonstrated in the proof of Theorem 5.8 and it remains to establish (3).

If there is a nonsingular matrix B R(n, n) such that A = B t B, then

x t Ax = (Bx)t (Bx) > 0 for x Rn with x

= 0, which implies that A is

positive definite.

161

Now assume A is positive definite. Then, combining Theorem 5.6 and (1),

we can rewrite A as A = P t DP where P R(n, n) is orthogonal and

D R(n, n) is diagonal whose diagonal entries, say 1 , . . . , n , are all positive. Define the diagonal matrix D1 by (5.3.12) and set D1 P = B. Then B is

nonsingular and A = B t B as asserted.

Similarly, we say that the quadratic form q, self-adjoint mapping T , or

symmetric matrix A R(n, n) is positive semi-definite or non-negative if in

the condition (5.3.2), (5.3.3), or (5.3.4), respectively, the greater than sign

(>) is replaced by greater than or equal to sign (). For positive semidefinite or non-negative mappings and matrices, the corresponding versions

of Theorems 5.8 and 5.9 can similarly be stated, simply with the word positive there being replaced by non-negative and In in Theorem 5.9 (3) by

diag{1, . . . , 1, 0, . . . , 0}.

Besides, we can define a quadratic form q, self-adjoint mapping T , or symmetric matrix A to be negative definite or negative semi-definite (non-positive),

if q, T , or A is positive definite or positive semi-definite (non-negative).

Thus, a quadratic form, self-adjoint mapping, or symmetric matrix is called

indefinite or non-definite if it is neither positive semi-definite nor negative

semi-definite.

As an illustration, we consider the quadratic form

x1

.

4

.. R ,

x4

(5.3.15)

where a is a real number. It is clear that q

a 1

1 a

A=

1 0

1 1

0 0

,

(5.3.16)

a 0

0 a

( a)2 ([ a]2 3) = 0.

(5.3.17)

Consequently

the eigenvalues of

A are 1 = 2 = a, 3 = a + 3, 4 =

162

3, A is negative semi-definite but not

negative definite; when a ( 3, 3), A is indefinite.

Exercises

5.3.1 Let T L(U ) be self-adjoint and S L(U ) be anti-self-adjoint. Show

that if T is positive definite then so is T S 2 .

5.3.2 Prove that if A and B are congruent matrices then A being positive

definite or positive semi-definite is equivalent to B being so.

5.3.3 Let U be a vector space with a positive definite scalar product (, ).

Show that, for any set of linearly independent vectors {u1 , . . . , uk } in

U , the metric matrix

(u1 , u1 ) (u1 , uk )

M =

(5.3.18)

,

(uk , u1 )

(uk , uk )

5.3.4 Show that a necessary and sufficient condition for A R(n, n) to be

symmetric is that there exists an invertible matrix B R(n, n) such

that A = B t B + aIn for some a R.

5.3.5 Let A R(n, n) be positive definite. Show that det(A) > 0.

5.3.6 Show that the inverse of a positive definite matrix is also positive definite.

5.3.7 Show that if A R(n, n) is positive definite then so is adj(A).

5.3.8 Let A R(m, n) where m

= n. Then AAt R(m, m) and At A

R(n, n) are both symmetric. Prove that if A is of full rank, that is,

r(A) = min{m, n}, then one of the matrices AAt , At A is positive definite and the other is positive semi-definite but can never be positive

definite.

5.3.9 Let A = (aij ) R(n, n) be positive definite.

(i) Prove that aii > 0 for i = 1, . . . , n.

(ii) Establish the inequalities

= j, i, j = 1, . . . , n.

(5.3.19)

and k 1 any integer, then there is a unique positive semi-definite

element S L(U ) such that T = S k .

5.3.11 Let A R(n, n) be positive semi-definite and consider the null set

S = {x Rn | x t Ax = 0}.

(5.3.20)

5.3.12

5.3.13

5.3.14

5.3.15

163

the matrix A. What happens when A is indefinite such that there are

nonzero vectors x, y Rn so that x t Ax > 0 and y t Ay < 0?

Let A, B R(n, n) be symmetric and A positive definite. Prove that

there is a nonsingular matrix C R(n, n) so that both C t AC and

C t BC are diagonal matrices (that is, A and B are simultaneously congruent to diagonal matrices).

Assume that A, B R(n, n) are symmetric and the eigenvalues of

A, B are greater than or equal to a, b R, respectively. Show that the

eigenvalues of A + B are greater than or equal to a + b.

Let A, B R(n, n) be positive definite. Prove that the eigenvalues of

AB are all positive.

Consider the quadratic form

x1

2

n

n

.

n

xi2

xi , x =

q(x) = (n + 1)

.. R .

i=1

i=1

xn

(5.3.21)

x Rn .

(ii) Compute the eigenvalues of A to determine whether q or A is positive definite.

5.3.16 Let A, B R(n, n) where A is positive definite and B is positive semidefinite. Establish the inequality

det(A + B) det(A) + det(B)

(5.3.22)

5.3.17 Let A, B R(n, n) be positive definite such that B A is positive

semi-definite. Prove that any solution of the equation

det(A B) = 0,

(5.3.23)

5.3.18 Let U be an n-dimensional vector space with a positive definite scalar

product (, ) and T L(U ) a positive definite self-adjoint mapping.

Given b U , consider the function

f (u) =

1

(u, T (u)) (u, b),

2

u U.

(5.3.24)

164

T (x) = b

(5.3.25)

if and only if it is the minimum point of f , i.e. f (u) f (x) for

any u U .

(ii) Show without using the invertibility of T that the solution to

(5.3.25) is unique.

5.3.19 Use the notation of the previous exercise and consider the quadratic

form q(u) = (u, T (u)) (u U ) where T L(U ) is positive semidefinite. Prove that q is convex. That is, for any , 0 satisfying

+ = 1, there holds

q(u + v) q(u) + q(v),

u, v U.

(5.3.26)

In this section, we present two alternative characterizations of positive definite

matrices, that are useful in applications. The first one involves using determinants and the second one involves an elegant matrix decomposition.

Theorem 5.10 Let A = (aij ) R(n, n) be symmetric. The matrix A is positive

definite if and only if all its leading principal minors are positive, that is, if and

only if

a11 a1n

a

11 a12

a11 > 0,

> 0, . . . , > 0. (5.4.1)

a21 a22

a

a

n1

Proof

nn

a11 a1k

(5.4.2)

Ak = .

ak1

akk

= 0, we take x = (y1 , . . . , yk , 0, . . . , 0)t

Rn . Thus, there holds

y t Ak y =

k

i,j =1

aij yi yj = x t Ax > 0.

(5.4.3)

165

matrix B R(k, k) such that Ak = B t B. Thus, det(Ak ) = det(B t B) =

det(B t ) det(B) = det(B)2 > 0, and (5.4.1) follows.

We next assume that (5.4.1) holds and we show that A is positive definite by

an inductive argument.

If n = 1, there is nothing to show.

Assume that the assertion is valid at n 1 (n 2).

We prove that A is positive definite at n 2.

To proceed, we rewrite A in a blocked form as

An1

,

(5.4.4)

A=

ann

t

where = (a1n , . . . , an1,n )t Rn1 . By the inductive assumption at n 1,

we know that An1 is positive definite. Hence, applying Theorem 5.9, we have

a nonsingular matrix B R(n 1, n 1) such that An1 = B t B. Thus

An1

Bt 0

B 0

In1

A=

=

, (5.4.5)

0 1

ann

ann

t

0 1

t

if we choose = (B t )1 (b1 , . . . , bn1 )t Rn1 . Since det(A) > 0, we

obtain, after making some suitable row operations, the result

In1

det(A)

2

> 0. (5.4.6)

= det

= ann b12 bn1

det(An1 )

ann

t

Now for x Rn so that

y=

we have

x Ax = y

t

In1

ann

x (y1 , . . . , yn1 , yn )t ,

(5.4.7)

2

= y12 + + yn1

+ 2(b1 y1 + + bn1 yn1 )yn + ann yn2

2

+ (ann b12 bn1

)yn2 ,

(5.4.8)

= 0, or equivalently, x

= 0, in view of

the condition (5.4.6).

Therefore the proof is complete.

166

components of x, if any, are given by

xi1 = y1 , . . . , xik = yk ,

i1 , . . . , ik = 1, . . . , n,

i1 < < ik ,

(5.4.9)

xi1 = y1 , if any) then for y = (y1 , . . . , yk )t Rk there holds

k

y t Ai1 ,...,ik y =

ail im yl ym = x t Ax,

(5.4.10)

l,m=1

where

ai1 i1

Ai1 ,...,ik =

aik i1

ai1 ik

aik ik

(5.4.11)

is a submatrix of A obtained from deleting all the ith rows and j th columns

of A for i, j = 1, . . . , n and i, j

= i1 , . . . , ik . The quantity det(Ai1 ,...,ik )

is referred to as a principal minor of A of order k. Such a principal minor

becomes a leading principal minor when i1 = 1, . . . , ik = k with A1,...,k = Ak .

For A = (aij ) the principal minors of order 1 are all its diagonal entries,

a11 , . . . , ann . It is clear that if A is positive definite then so is Ai1 ,...,ik . Hence

det(Ai1 ,...,ik ) > 0. Therefore we arrive at the following slightly strengthened

version of Theorem 5.10.

Theorem 5.11 Let A = (aij ) R(n, n) be symmetric. The matrix A is positive

definite if and only if all its principal minors are positive, that is, if and only if

ai1 i1 ai1 ik

aii > 0, > 0,

(5.4.12)

a

a

ik i1

ik ik

We now pursue the decomposition of a positive definite matrix as another

characterization of this kind of matrices.

Theorem 5.12 Let A R(n, n) be symmetric. The matrix A is positive definite

if and only if there is a nonsingular lower triangular matrix L R(n, n) such

that A = LLt . Moreover, when A is positive definite, there is a unique L with

the property that all the diagonal entries of L are positive.

167

view of Theorem 5.9 that A is of course positive definite.

Now we assume A = (aij ) R(n, n) is positive definite. We look for a

unique lower triangular matrix L = (lij ) R(n, n) such that A = LLt so that

all the diagonal entries of L are positive, l11 > 0, . . . , lnn > 0.

We again use induction.

When n = 1, then A = (a11 ) with a11 > 0. So the unique choice is L =

( a11 ).

Assume that the assertion is valid at n 1 (n 2).

We establish the decomposition at n (n 2).

To proceed, we rewrite A in the form (5.4.4). In view of Theorem 5.10, the

matrix An1 is positive definite. Thus there is a unique lower triangular matrix

L1 R(n 1, n 1), with positive diagonal entries, so that An1 = L1 Lt1 .

Now take L R(n, n) with

L2 0

L=

,

(5.4.13)

t a

where L2 R(n 1, n 1) is a lower triangular matrix, Rn1 is a column

vector, and a R is a suitable number. Then, if we set A = LLt , we obtain

An1

L2 0

Lt2

A=

=

ann

t

t a

0 a

t

L2 L2

L2

=

.

(5.4.14)

t

t

(L2 ) + a 2

Therefore we arrive at the relations

An1 = L2 Lt2 ,

= L2 ,

t + a 2 = ann .

(5.4.15)

If we require that all the diagonal entries of L2 be positive, then the inductive

assumption leads to L2 = L1 . Hence the vector is also uniquely determined,

= L1

1 . So it remains to show that the number a may be uniquely determined as well. For this purpose, we need to show in (5.4.15) that

ann t > 0.

(5.4.16)

that gives us An1 = B t B. Thus, from the relation An1 = L1 Lt here we may

specify L1 = B t in (5.4.5) so that = there. That is,

168

A=

An1

ann

=

L1

In1

ann

Lt1

.

(5.4.17)

Hence (5.4.6), namely, (5.4.16), follows immediately. Therefore the last relation in (5.4.15) leads to the unique determination of the number a:

(

a = ann t > 0.

(5.4.18)

The proof follows.

The above-described characterization that a positive definite n n matrix

can be decomposed as the product of a unique lower triangular matrix L, with

positive diagonal entries, and its transpose, is known as the Cholesky decomposition theorem, which has wide applications in numerous areas. Through

resolving the Cholesky relation A = LLt , it may easily be seen that the lower

triangular matrix L = (lij ) can actually be constructed from A = (aij ) explicitly following the formulas

l11 = a11 ,

li1 =

ai1 , i = 2, . . . , n,

11

i1

+

,aii

l

=

lij2 , i = 2, . . . , n,

ii

j =1

j 1

1

l

aij

=

lik lj k , i = j + 1, . . . , n, j = 2, . . . , n.

ij

ljj

k=1

(5.4.19)

Thus, if A R(n, n) is nonsingular, the product AAt may always be reduced

into the form LLt for a unique lower triangular matrix L with positive diagonal

entries.

Exercises

5.4.1 Let a1 , a2 , a3 be real numbers and set

a1 a2

A = a2 a3

a3

a1

a3

a1 .

(5.4.20)

a2

chosen.

169

x1

x3

(5.4.21)

where a1 , a2 , a3 R. Find a necessary and sufficient condition on

a1 , a2 , a3 for q to be positive definite. Can you extend your finding to

the case over Rn ?

5.4.3 Let A = (aij ) R(n, n) be symmetric. Prove that if the matrix A is positive semi-definite then all its leading principal minors are non-negative,

a11

a

11

0,

a21

a12

a22

a11

a

n1

0, . . . ,

a1n

0. (5.4.22)

ann

5.4.4 Investigate the possibility whether Theorem 5.12 can be extended so

that the lower triangular matrix may be replaced by an upper triangular

matrix so that the theorem may now read: A symmetric matrix A is positive definite if and only if there is a nonsingular upper triangular matrix

U such that A = U U t . Furthermore, if A is positive definite, then there

is a unique upper triangular matrix U with positive diagonal entries such

that A = U U t .

5.4.5 Assume that A = (aij ) R(n, n) is positive definite and rewrite A as

A=

An1

ann

(5.4.23)

column vector in Rn1 .

(i) Show that the matrix equation

In1

=

t

An1

ann

An1

ann + a

In1

(5.4.24)

170

det(A) det(An1 )ann .

(5.4.25)

det(A) a11 ann .

(5.4.26)

5.4.6 Let A R(n, n) and view A as made of n column vectors u1 , . . . , un in

Rn which is equipped with the standard Euclidean scalar product.

(i) Show that At A is the metric or Gram matrix of u1 , . . . , un Rn .

(ii) Use the fact that At A is positive semi-definite and (5.4.26) to prove

that

| det(A)| u1 un .

(5.4.27)

satisfying

|aij | a,

i, j = 1, . . . , n.

(5.4.28)

Hadamard inequality for determinants,

n

| det(A)| a n n 2 .

5.4.8 Consider the matrix

A= 1

1

3

1 1

(5.4.29)

1 .

(5.4.30)

(ii) Find the unique lower triangular matrix L R(3, 3) stated in Theorem 5.12 such that A = LLt .

We continue our discussion about self-adjoint mappings over a real finitedimensional vector space U with a positive definite scalar product (, ).

The main focus of this section is to characterize a situation when two mappings may be simultaneously diagonalized.

171

Theorem 5.13 Let S, T L(U ) be self-adjoint. Then there is an orthonormal basis of U consisting of eigenvectors of both S, T if and only if S and T

commute, or S T = T S.

Proof Let 1 , . . . , k be all the distinct eigenvalues of T and E1 , . . . , Ek

the associated eigenspaces which are known to be mutually perpendicular. Assume that S, T L(U ) commute. Then, for any u Ei (i = 1, . . . , k), we

have

T (S(u)) = S(T (u)) = S(i u) = i S(u).

(5.5.1)

Thus S(u) Ei . In other words, each Ei is invariant under S. Since S is selfadjoint, each Ei has an orthonormal basis, say {ui,1 , . . . , ui,mi }, consisting of

the eigenvectors of S. Therefore, the set of vectors

{u1,1 , . . . , u1,m1 , . . . , uk,1 , . . . , uk,mk }

(5.5.2)

Conversely, if {u1 , . . . , un } is an orthonormal basis of U consisting of the

eigenvectors of both S and T , then

S(ui ) = i ui ,

T (ui ) = i ui ,

i , i R,

i = 1, . . . , n.

(5.5.3)

T = T S as anticipated.

Note that although the theorem says that, when S and T commute and

n = dim(U ), there are n mutually orthogonal vectors that are the eigenvectors of S and T simultaneously, it does not say that an eigenvector of S or T

must also be an eigenvector of T or S. For example, we may take S = I (the

identity mapping). Then S commutes with any mapping and any nonzero vector u is an eigenvector of S but u cannot be an eigenvector of all self-adjoint

mappings.

The matrix version of Theorem 5.13 is easily stated and proved: Two symmetric matrices A, B R(n, n) are commutative, AB = BA, if and only if

there is an orthogonal matrix P R(n, n) such that

A = P t DA P ,

B = P t DB P ,

(5.5.4)

where DA , DB are diagonal matrices in R(n, n) whose diagonal entries are the

eigenvalues of A, B, respectively.

172

Exercises

5.5.1 Let U be a finite-dimensional vector space over a field F. Show that for

T L(U ), if T commutes with any S L(U ), then there is some a F

such that T = aI .

5.5.2 If A, B R(n, n) are positive definite matrices and AB = BA, show

that AB is also positive definite.

5.5.3 Let U be an n-dimensional vector space over a field F, where F = R or

C and T , S L(U ). Show that, if T has n distinct eigenvalues in F and

S commutes with T , then there is a polynomial p(t), of degree at most

(n 1), of the form

p(t) = a0 + a1 t + + an1 t n1 ,

a0 , a1 , . . . , an1 F, (5.5.5)

5.5.4 (Continued from the previous exercise) If T L(U ) has n distinct

eigenvalues in F, show that

CT = {S L(U ) | S T = T S},

(5.5.6)

subspace of L(U ), is exactly n-dimensional.

5.5.5 Let U be a finite-dimensional vector space with a positive definite scalar

product and T L(U ).

(i) Show that if T is normal satisfying T T = T T then T (u) =

T (u) for any u U . In particular, N (T ) = N (T ).

(ii) Show that if T is normal and idempotent, T 2 = T , then T = T .

In this section, we briefly illustrate how to use self-adjoint mappings to study

general mappings between two vector spaces with positive definite scalar products.

Use U and V to denote two real vector spaces of finite dimensions equipped

with positive definite scalar products (, )U and (, )V , respectively. For T

L(U, V ), since

f (u) = (T (u), v)V ,

u U,

(5.6.1)

depending on v, say T (v), such that f (u) = (u, T (v))U . That is,

(T (u), v)V = (u, T (v))U ,

u U, v V .

(5.6.2)

173

It is straightforward to check that T is linear. Thus T L(V , U ). This

construction allows us to consider the composed mappings T T L(U ) and

T T L(V ), which are both seen to be self-adjoint.

Moreover, since

(u, (T T )(u))U = (T (u), T (u))V 0,

u U,

(5.6.3)

v V,

(5.6.4)

Let U and V be the norms induced from the positive definite scalar

products (, )U and (, )V , respectively. Recall that the norm of T with respect

to U and V is given by

T = sup{T (u)V | uU = 1, u U }.

(5.6.5)

On the other hand, since T T L(U ) is self-adjoint and positive semidefinite, there is an orthonormal basis {u1 , . . . , un } of U consisting of eigenvectors of T T , associated with the corresponding non-negative eigenvalues

n

n

1 , . . . , n . Therefore, for u =

ai ui U with u2U =

ai2 = 1, we

i=1

i=1

have

T (u)2V = (T (u), T (u))V = (u, (T T )(u))U =

n

i ai2 0 , (5.6.6)

i=1

where

0 = max {i },

1in

which proves T

Then (5.6.5) leads to

(5.6.7)

(5.6.8)

T =

0 ,

In particular, the above also gives us a practical method to compute the norm

of a linear mapping T from U into itself by using the generated self-adjoint

mapping T T .

174

largest eigenvalue of T 2 , we have the expression

T = max {|i |},

1in

(5.6.10)

An immediate consequence of (5.6.9) and (5.6.10) is the elegant formula

T 2 = T T ,

T L(U, V ).

(5.6.11)

that T may also be assessed according to

T = sup{|(u, T (u))| | u U, u = 1}.

(5.6.12)

In fact, let denote the right-hand side of (5.6.12). Then the Schwarz

inequality (4.3.10) implies that |(u, T (u))| T (u) T for u U satisfying u = 1. So T . On the other hand, let be the eigenvalue of T

such that T = || and u U an associated eigenvector with u = 1. Then

T = || = |(u, T (u))| . Hence (5.6.12) is verified.

We now show how to extend (5.6.12) to evaluate the norm of a general linear

mapping between vector spaces with scalar products.

Theorem 5.14 Let U, V be finite-dimensional vector spaces equipped with

positive definite scalar products (, )U , (, )V , respectively. For T L(U, V ),

we have

T = sup{|(T (u), v)V | | u U, v V , uU = 1, vV = 1}.

Proof

(5.6.13)

T = sup{T (u)V | u U, uU = 1}.

(5.6.14)

T (u )V T .

(5.6.15)

T (u )V = sup{|(T (u ), v)V | | v V , vV = 1}.

(5.6.16)

|(T (u ), v )V | (T (u )V .

(5.6.17)

175

may combine (5.6.15) and (5.6.17) to obtain T 2. Since > 0 is

arbitrary, we have T .

On the other hand, for u U, v V with uU = 1, vV = 1, we may

use the Schwarz inequality (4.3.10) to get |(T (u), v)V | T (u)V T .

Hence T .

Therefore we arrive at T = and the proof follows.

An important consequence of Theorem 5.14 is that the norms of a linear

mapping, over two vector spaces with positive definite scalar products, and its

dual assume the same value.

Theorem 5.15 Let U, V be finite-dimensional vector spaces equipped with

positive definite scalar products (, )U , (, )V , respectively. For T L(U, V )

and its dual T L(V , U ), we have T = T . Thus T T = T T

and the largest eigenvalues of the positive semi-definite mappings T T

L(U ) and T T L(V ) are the same.

Proof The fact that T = T may be deduced from applying (5.6.13) to

T and the relation (T (u), v)V = (u, T (v))U (u U ,v V ). The conclusion

T T = T T follows from (5.6.11) and that the largest eigenvalues

of the positive semi-definite mappings T T and T T are the same is a

consequence of the eigenvalue characterization of the norm of a self-adjoint

mapping stated in (5.6.10) and T T = T T .

As an application, we see that, for any matrix A R(m, n), the largest

eigenvalues of the symmetric matrices At A R(n, n) and AAt R(m, m)

must coincide. In fact, regarding eigenvalues, it is not hard to establish a more

general result as stated as follows.

Theorem 5.16 Let U, V be vector spaces over a field F and T L(U, V ), S

L(V , U ). Then the nonzero eigenvalues of S T and T S are the same.

Proof Let F be a nonzero eigenvalue of S T and u U an associated

eigenvector. We show that is also an eigenvalue of T S. In fact, from

(S T )(u) = u,

(5.6.18)

T S, it suffices to show T (u)

= 0. However, such a property is already seen

in (5.6.18) since

= 0 and u

= 0.

176

then it is also an eigenvalue of S T .

It is interesting that we do not require any additional properties for the vector

spaces U, V in order for Theorem 5.16 to hold.

We next study another important problem in applications known as the least

squares approximation.

Let T L(U, V ) and v V . Consider the optimization problem

.

inf T (u) v2V | u U .

(5.6.19)

For simplicity, we first assume that (5.6.19) has a solution, that is, there is

some x U such that T (x) v2V = , and we look for some appropriate

condition the solution x will fulfill. For this purpose, we set

f () = T (x + y) v2V ,

y U,

R.

(5.6.20)

Then f (0) = f (). Hence we may expand the right-hand side of (5.6.20)

to obtain

df

= (T (x) v, T (y))V + (T (y), T (x) v)V

0=

d =0

= 2((T T )(x) T (v), y)U ,

y U,

(5.6.21)

(T T )(x) = T (v),

v V.

(5.6.22)

to see that the equation (5.6.22) is always consistent for any v V . In fact,

it is clear that R(T T ) R(T ). On the other hand, if u N (T T ),

then T (u)2V = (u, (T T )(u)) = 0. Thus u N (T ). This establishes

N (T ) = N (T T ). So by the rank equation we deduce dim(U ) = n(T ) +

r(T ) = n(T T ) + r(T T ). Therefore, in view of Theorem 2.9, we get

r(T T ) = r(T ) = r(T ). That is, R(T T ) = R(T ), as desired, which

proves the solvability of (5.6.22) for any v V .

Next, let x be a solution to (5.6.22). We show that x solves (5.6.19). In fact,

if y is another solution to (5.6.22), then z = x y N (T T ) = N (T ). Thus

T (y) v2V = T (x + z) v2V = T (x) v2V ,

(5.6.23)

177

which shows that the quantity T (x) v2V is independent of the solution x

of (5.6.22). Besides, for any test element u U , we rewrite u as u = x + w

where x is a solution to (5.6.22) and w U . Then we have

T (u) v2V =T (x) v2V + 2(T (x) v, T (w))V + T (w)2V

=T (x) v2V + 2((T T )(x) T (v), w)U

+ T (w)2V

=T (x) v2V + T (w)2V

T (x) v2V .

(5.6.24)

Using knowledge about self-adjoint mappings, we can express a solution to

the normal equation (5.6.22) explicitly in terms of the eigenvectors and eigenvalues of T T . In fact, let {u1 , . . . , uk , . . . , un } be an orthonormal basis of U

consisting of the eigenvectors of the positive semi-definite mapping T T

L(U ) associated with the non-negative eigenvalues 1 , . . . , k , . . . , n among

which 1 , . . . , k are positive (if any). Then a solution x of (5.6.22) may be

n

written x =

ai ui for some a1 , . . . , an R. Inserting this into (5.6.22), we

i=1

obtain

k

ai i ui = T (v).

(5.6.25)

i=1

ai =

1

(ui , T (v))U ,

i

i = 1, . . . , k,

(5.6.26)

which leads to the following general solution formula for the equation (5.6.22):

x=

k

1

(ui , T (v))U ui + x0 ,

i

x0 N (T ),

(5.6.27)

i=1

since N(T T ) = N (T ).

Exercises

5.6.1 Show that the mapping T : V U defined in (5.6.2) is linear.

5.6.2 Let U, V be finite-dimensional vector spaces with positive definite scalar

products. For any T L(U, V ) show that the mapping T T L(U )

is positive definite if and only if n(T ) = 0.

178

products, (, )U , (, )V , respectively. For any T L(U, V ), define T

L(V , U ) by (5.6.2).

(i) Establish the relations R(T ) = N (T ) and R(T ) = N (T ) (the

latter is also known as the Fredholm alternative).

(ii) Prove directly, without resorting to Theorem 2.9, that r(T ) =

r(T ).

(iii) Prove that r(T T ) = r(T T ) = r(T ).

5.6.4 Apply the previous exercise to verify the validity of the rank relation

r(At A) = r(AAt ) = r(A) = r(At ),

A R(m, n).

(5.6.28)

T

x1

x2

= 1 1

2

x1

x2

,

x1

x2

R2 ,

(5.6.29)

where R2 and R3 are equipped with the standard Euclidean scalar products.

(i) Find the eigenvalues of T T and compute T .

(ii) Find the eigenvalues of T T and verify that T T is positive

semi-definite but not positive definite (cf. part (ii) of (2)).

(iii) Check to see that the largest eigenvalues of T T and T T are

the same.

(iv) Compute all eigenvalues of T T and T T and explain the results

in view of Theorem 5.16.

5.6.6 Let A F(m, n) and B F(n, m) where m < n and consider AB

F(m, m) and BA F(n, n). Show that BA has at most m + 1 distinct

eigenvalues in F.

5.6.7 (A specialization of the previous exercise) Let u, v Fn be column

vectors. Hence ut v F and vut F(n, n).

(i) Show that the matrix vut has a nonzero eigenvalue in F only if

ut v

= 0.

(ii) Show that, when ut v

= 0, the only nonzero eigenvalue of vut in F

is ut v so that v is an associated eigenvector.

(iii) Show that, when ut v

= 0, the eigenspace of vut associated with

the eigenvalue ut v is one-dimensional.

179

5.6.8 Consider the Euclidean space Rk equipped with the standard inner product and let A R(m, n). Formulate a solution of the following optimization problem

.

(5.6.30)

inf Ax b2 | x Rn , b Rm ,

by deriving a matrix version of the normal equation.

5.6.9 Consider a parametrized plane in R3 given by

x1 = y1 + y2 ,

x2 = y1 y2 ,

x3 = y1 + y2 ,

y1 , y2 R.

(5.6.31)

Use the least squares approximation to find a point in the plane that is the

closest to the point in R3 with the coordinates x1 = 2, x2 = 1, x3 = 3.

6

Complex quadratic forms

and self-adjoint mappings

In this chapter we extend our study on real quadratic forms and self-adjoint

mappings to the complex situation. We begin with a discussion on the complex version of bilinear forms and the Hermitian structures. We will relate the

Hermitian structure of a bilinear form with representing it by a unique selfadjoint mapping. Then we establish the main spectrum theorem for self-adjoint

mappings. We next focus again on the positive definiteness of self-adjoint mappings. We explore the commutativity of self-adjoint mappings and apply it to

obtain the main spectrum theorem for normal mappings. We also show how to

use self-adjoint mappings to study a mapping between two spaces.

Let U be a finite-dimensional vector space over C. Extending the standard

Hermitian scalar product over Cn , we may formulate the notion of a complex

bilinear form as follows.

Definition 6.1 A complex-valued function f : U U C is called a

sesquilinear form, which is also sometimes loosely referred to as a bilinear

form, if it satisfies for any u, v, w U and a C the following conditions.

(1) f (u + v, w) = f (u, w) + f (v, w), f (u, v + w) = f (u, v) + f (u, w).

(2) f (au, v) = af (u, v), f (u, av) = af (u, v).

As in the real situation, we may consider how to use a matrix to represent

a sesquilinear form. To this end, let B = {u1 , . . . , un } be a basis of U . For

u, v U , let x = (x1 , . . . , xn )t , y = (y1 , . . . , yn )t Cn denote the coordinate

vectors of u, v with respect to the basis B. Then

180

f (u, v) = f

n

i=1

xi ui ,

n

yj uj =

j =1

n

181

i,j =1

(6.1.1)

where A = (aij ) = (f (ui , uj )) lies in C(n, n) which is the matrix representation of the sesquilinear form f with respect to B.

Let B = {u 1 , . . . , u n } be another basis of U so that the coordinate vectors of

Hence, using A = (a ij ) = (f (u i , u j ))

u, v are x,

y Cn with respect to B.

C(n, n) to denote the matrix representation of the sesquilinear form f with

respect to B and B = (bij ) C(n, n) the basis transition matrix from B into

B so that

u j =

n

bij ui ,

j = 1, . . . , n,

(6.1.2)

i=1

conclude with

f (u, v) = x A y = x Ay = x (B AB)y.

(6.1.3)

definition that two matrices A, B C(n, n) are said to be Hermitian congruent, or simply congruent, if there is an invertible matrix C C(n, n) such that

A = C BC.

(6.1.4)

with respect to different bases of U are Hermitian congruent.

Let f : U U C be a sesquilinear form. Define q : U C by setting

q(u) = f (u, u),

u U.

(6.1.5)

As in the real situation, we may call q the quadratic form associated with f .

We have the following homogeneity property

q(zu) = |z|2 q(u),

u U,

z C.

(6.1.6)

is in sharp contrast with the real situation.

In fact, since

f (u + v, u + v) = f (u, u)+f (v, v)+f (u, v)+f (v, u), u, v U,

(6.1.7)

f (u+iv, u+iv) = f (u, u)+f (v, v)+if (u, v) if (v, u), u, v U,

(6.1.8)

182

induced quadratic form,

1

(q(u + v) q(u) q(v))

2

i

(q(u + iv) q(u) q(v)) , u, v U.

(6.1.9)

2

For U , let B = {u1 , . . . , un } be a basis, u U , and x Cn the coordinate

vector of u with respect to B. In view of (6.1.1), we get

1

(A + A ) + (A A ) x

q(u) =x Ax = x

2

2

1

1

= x (A + A )x + x (A A )x = {q(u)} + i{q(u)}. (6.1.10)

2

2

f (u, v) =

checked to be equivalent to the condition

f (u, v) = f (v, u),

u, v U.

(6.1.11)

we have

i

1

if (v, u) = (q(u + iv) q(u) q(v)) (q(u + v) q(u) q(v)) .

2

2

(6.1.12)

Combining (6.1.9) and (6.1.12), and using the condition that q is real-valued,

we arrive at (6.1.11).

Thus we are led to the following definition.

Definition 6.2 A sesquilinear form f : U U C is said to be Hermitian if

it satisfies the condition (6.1.11).

For any column vectors x, y Cn , the standard Hermitian scalar product is

given by

(x, y) = x y.

(6.1.13)

Hermitian sesquilinear form. More generally, with any n n Hermitian matrix

A, we see that

f (x, y) = x Ay = (x, Ay),

x, y Cn ,

(6.1.14)

is also a Hermitian sesquilinear form. Conversely, the relation (6.1.1) indicates that a Hermitian sesquilinear form over an n-dimensional complex vector

183

matrix, in terms of the standard Hermitian scalar product over Cn .

Consider a finite-dimensional complex vector space U with a positive definite scalar product (, ). Given any sesquilinear form f : U U C, since

g(v) = f (u, v),

v U,

(6.1.15)

T (u) U such that

f (u, v) = (T (u), v),

u, v U,

(6.1.16)

denote the adjoint of T . Then f may also be represented as

f (u, v) = (u, T (v)),

u, v U.

(6.1.17)

by a linear mapping T or T from U into itself, in terms of the positive definite

scalar product over U , alternatively through the expression (6.1.16) or (6.1.17).

Applying (6.1.9) to f (u, v) = (u, T (v)) (u, v U ) for any T L(U ), we

obtain the following useful polarization identity for T :

(u, T (v)) =

1

((u + v, T (u + v)) (u, T (u)) (v, T (v)))

2

i

((u + iv, T (u + iv)) + (u, T (u)) + (v, T (v))) ,

2

u, v U.

(6.1.18)

Theorem 6.3 Let f be a sesquilinear form over a finite-dimensional complex

vector space U with a positive definite scalar product (, ) and represented by

the mapping T L(U ) through (6.1.16) or (6.1.17). Then f is Hermitian if

and only if T is self-adjoint, T = T .

Proof

If f is Hermitian, then

f (u, v) = f (v, u) = (T (v), u) = (u, T (v)),

u, v U.

(6.1.19)

Let B = {u1 , . . . , un } be an orthonormal basis of U , T L(U ) be selfadjoint, and A = (aij ) C(n, n) the matrix representation of T with respect

n

to B. Then T (uj ) =

aij ui (j = 1, . . . , n) so that

i=1

184

aij = (ui , T (uj )) = (T (ui ), uj ) = a j i ,

i, j = 1, . . . , n.

(6.1.20)

if and only if the matrix representation of T with respect to any orthonormal

basis is Hermitian. Consequently, a self-adjoint mapping over a complex vector

space with a positive definite scalar product is interchangeably referred to as

Hermitian as well.

Exercises

6.1.1 Let U be a complex vector space with a positive definite scalar product

(, ) and T L(U ). Show that T = 0 if and only if (u, T (u)) = 0 for

any u U . Give an example to show that the same may not hold for a

real vector space with a positive definite scalar product.

6.1.2 Let U be a complex vector space with a positive definite scalar product

(, ) and T L(U ). Show that T is self-adjoint or Hermitian if and only

if (u, T (u)) = (T (u), u) for any u U . Give an example to show that

the same may not hold for a real vector space with a positive definite

scalar product.

6.1.3 Let U be a complex vector space with a basis B = {u1 , . . . , un } and

A = (f (ui , uj )) C(n, n) the matrix representation of f with respect

to B. Show that f is Hermitian if and only if A is Hermitian.

6.1.4 Let U be a complex vector space with a positive definite scalar product

(, ) and T L(U ). Show that T can be uniquely decomposed into a

sum T = R + iS where R, S L(U ) are both self-adjoint.

6.1.5 Show that the inverse of an invertible self-adjoint mapping is also selfadjoint.

6.1.6 Show that In and In cannot be Hermitian congruent, although they are

congruent as elements in C(n, n).

As in the real situation, we now show that complex self-adjoint or Hermitian

mappings are completely characterized by their spectra as well.

Theorem 6.4 Let U be a complex vector space with a positive definite scalar

product (, ) and T L(U ). If T is self-adjoint, then the following are valid.

(1) The eigenvalues of T are all real.

(2) Let 1 , . . . , k be all the distinct eigenvalues of T . Then T may be reduced

over the direct sum of mutually perpendicular eigenspaces

U = E1 Ek .

185

(6.2.1)

Proof Let T be self-adjoint and C an eigenvalue of T with u U an

associated eigenvector. Then, using T (u) = u, we have

u2 = (u, T (u)) = (T (u), u) = u2 ,

(6.2.2)

which gives us = so that R. This establishes (1). Note that this proof

does not assume that U is finite dimensional.

To establish (2), we use induction on dim(U ).

If dim(U ) = 1, there is nothing to show.

Assume that the statement (2) is valid if dim(U ) n 1 for some n 2.

Consider dim(U ) = n, n 2.

Let 1 be an eigenvalue of T in C. From (1), we know that actually 1 R.

Use E1 to denote the eigenspace of T associated with 1 :

E1 = {u U | T (u) = 1 u} = N (1 I T ).

(6.2.3)

E1

= U .

It is clear that E1 is invariant under T . In fact, T is reducible over the

direct sum U = E1 (E1 ) . To see this, we need to establish the invariance

T ((E1 ) ) (E1 ) . Indeed, for any u E1 and v (E1 ) , we have

(u, T (v)) = (T (u), v) = 1 (u, v) = 0.

(6.2.4)

Using the fact that dim((E1 ) ) = dim(U ) dim(E1 ) n 1 and the

inductive assumption, we see that T is reduced over a direct sum of mutually

perpendicular eigenspaces of T in (E1 ) :

(E1 ) = E2 Ek ,

(6.2.5)

where 2 , . . . , k are all the distinct eigenvalues of T over the invariant subspace (E1 ) , which are real.

Finally, we need to show that 1 , . . . , k obtained above are all possible

eigenvalues of T . For this purpose, let be an eigenvalue of T and u an

associated eigenvector. Then there are u1 E1 , . . . , uk Ek such that

u = u1 + + uk . Hence the relation T (u) = u gives us 1 u1 + + k uk =

(u1 + + uk ). That is,

(1 )u1 + + (k )uk = 0.

(6.2.6)

186

Since u

= 0, there exists some i = 1, . . . , k such that ui

= 0. Thus, taking

scalar product of the above equation with ui , we get (i )ui 2 = 0, which

implies = i . Therefore 1 , . . . , k are all the possible eigenvalues of T .

To establish (3), we simply construct an orthonormal basis over each

eigenspace Ei , obtained in (2), denoted by Bi , i = 1, . . . , k. Then B =

B1 Bk is a desired orthonormal basis of U as stated in (3).

Let A = (aij ) C(n, n) and consider the mapping T L(Cn ) defined by

T (x) = Ax,

x Cn .

(6.2.7)

Using the mapping (6.2.7) and Theorem 6.4, we may obtain the following

characterization of a Hermitian matrix, which may also be regarded as a matrix

version of Theorem 6.4.

Theorem 6.5 A matrix A C(n, n) is Hermitian if and only if there is a

unitary matrix P C(n, n) such that

A = P DP ,

(6.2.8)

where D R(n, n) is a real diagonal matrix whose diagonal entries are the

eigenvalues of A.

Proof If (6.2.8) holds, it is clear that A is Hermitian, A = A .

Conversely, assume A is Hermitian. Using B0 = {e1 , . . . , en } to denote the

standard basis of Cn equipped with the usual Hermitian positive definite scalar

product (x, y) = x y for x, y Cn , we see that B0 is an orthonormal basis of

Cn . With the mapping T defined by (6.2.7), we have (T (x), y) = (Ax) y =

x Ay = (x, T (y)) for any x, y Cn , and

T (ej ) =

n

aij ei ,

j = 1, . . . , n.

(6.2.9)

i=1

orthonormal basis, say B, consisting of eigenvalues of T , say 1 , . . . , n ,

which are all real. With respect to B, the matrix representation of T is diagonal, D = diag{1 , . . . , n }. Now since the basis transition matrix from B0 into

B is unitary, thus (6.2.8) must hold for some unitary matrix P as expected.

In other words, Theorem 6.5 states that a complex square matrix is diagonalizable through a unitary matrix into a real diagonal matrix if and only if it

is Hermitian.

Practically, the decomposition (6.2.8) for a Hermitian matrix A may also,

more preferably, be established as in the proof of Theorem 5.6 as follows.

187

scalar product, consisting of eigenvectors of A: Aui = i ui , i = 1, . . . , n. Let

Q C(n, n) be made of taking u1 , . . . , un as its respective column vectors.

Then Q is unitary and AQ = QD where D = diag{1 , . . . , n }. Thus P = Q

renders (6.2.8).

Exercises

6.2.1 Let A C(n, n) be Hermitian. Show that det(A) must be a real number.

6.2.2 (Extension of the previous exercise) Let U be a complex vector space

with a positive definite scalar product and T L(U ). If T is self-adjoint,

then the coefficients of the characteristic polynomial of T are all real.

6.2.3 Let U be a complex vector space with a positive definite scalar product

and S, T L(U ) self-adjoint and commutative, S T = T S.

(i) Prove the identity

(S iT )(u)2 = S(u)2 + T (u)2 ,

u U.

(6.2.10)

converse is not true.

(This is an extended version of Exercise 4.3.4.)

6.2.4 Let U be a complex vector space with a positive definite scalar product

and V a subspace of U . Show that the mapping P L(U ) that projects

U onto V along V is self-adjoint.

6.2.5 (A strengthened version of the previous exercise) If P L(U ) is idempotent, P 2 = P , show that P projects U onto R(P ) along R(P ) if and

only if P = P .

6.2.6 We rewrite any A C(n, n) into the form A = B + iC where

B, C R(n, n). Show that a necessary and sufficient condition for A

to be unitary is that B t C = C t B and B t B + C t C = In .

6.2.7 Let U = C[0, 1] be the vector space of all complex-valued continuous

functions in the variable t [0, 1] equipped with the positive definite

scalar product

1

u(t)v(t) dt, u, v U.

(6.2.11)

(u, v) =

0

(i) Show that T (u)(t) = tu(t) for t [0, 1] defines a self-adjoint mapping in over U .

(ii) Show that T does not have an eigenvalue whatsoever.

6.2.8 Let T L(U ) be self-adjoint where U is a finite-dimensional complex

vector space with a positive definite scalar product.

188

(i) Show that if T k = 0 for some integer k 1 then T = 0.

(ii) (A sharpened version of (i)) Given u U show that if T k (u) = 0

for some integer k 1 then T (u) = 0.

We now consider the notion of positive definiteness in the complex situation.

Assume that U is a finite-dimensional complex vector space with a positive

definite scalar product (, ).

We start with the definition of the positive definiteness of various subjects of

our interest in the complex situation.

Definition 6.6 The positive definiteness of a quadratic form, a self-adjoint or

Hermitian mapping, or a Hermitian matrix may be defined as follows.

(1) A real-valued quadratic form q over U (hence it is generated from a

sesquilinear Hermitian form) is positive definite if

q(u) > 0,

u U,

u

= 0.

(6.3.1)

(u, T (u)) > 0,

u U,

u

= 0.

(6.3.2)

x Ax > 0,

x Cn ,

x

= 0.

(6.3.3)

form q is obtained from f through (6.1.5). Then there is a unique self-adjoint

or Hermitian mapping T L(U ) such that

q(u) = f (u, u) = (u, T (u)),

u U.

(6.3.4)

Thus we see that the positive definiteness of q and that of T are equivalent.

n

xi ui

Besides, let {u1 , . . . , un } be any basis of U and write u U as u =

i=1

(6.1.1),

q(u) = x Ax,

A = (f (ui , uj )),

(6.3.5)

189

thus, the positive definiteness of A and that of q are equivalent. Therefore the

positive definiteness of a self-adjoint or Hermitian mapping is central, which

is our focus of this section.

Parallel to Theorem 5.8, we have the following.

Theorem 6.7 That a self-adjoint or Hermitian mapping T L(U ) is positive

definite is equivalent to any of the following statements.

(1) All the eigenvalues of T are positive.

(2) There is a positive constant, 0 > 0, such that

(u, T (u)) 0 u2 ,

u U.

(6.3.6)

such that T = S 2 .

(4) The Hermitian matrix A defined by

A = (aij ),

i, j = 1, . . . , n,

(6.3.7)

(5) The eigenvalues of the Hermitian matrix A defined in (6.3.7) with respect

to an arbitrary basis are all positive.

(6) The Hermitian matrix A defined in (6.3.7) with respect to an arbitrary

basis enjoys the factorization A = B 2 for some Hermitian positive definite

matrix B C(n, n).

The proof of Theorem 6.7 is similar to that of Theorem 5.8 and thus left as

an exercise. Here we check only that the matrix A defined in (6.3.7) is indeed

Hermitian. In fact, this may be seen from the self-adjointness of T through

a j i = (uj , T (ui )) = (T (uj ), ui ) = (ui , T (uj )) = aij ,

i, j = 1, . . . , n.

(6.3.8)

Note also that if {u1 , . . . , un } is an orthonormal basis of U , then the quantities aij (i, j = 1, . . . , n) in (6.3.7) simply give rise to the matrix representation

n

of T with respect to this basis. That is, T (uj ) =

aij ui (j = 1, . . . , n).

i=1

Theorem 6.8 That a Hermitian matrix A C(n, n) is positive definite is

equivalent to any of the following statements.

(1) All the eigenvalues of A are positive.

190

whose diagonal entries are all positive such that A = P DP .

(3) There is a positive definite Hermitian matrix B C(n, n) such that A=B 2 .

(4) A is Hermitian congruent to the identity matrix. That is, there is a

nonsingular matrix B C(n, n) such that A = B In B = B B.

The proof is similar to that of Theorem 5.9 and left as an exercise.

Positive semi-definiteness and negative definiteness can be defined and

investigated analogously as in the real situation in Section 5.3 and are

skipped here.

Hermitian matrices

If A C(n, n) is Hermitian, then det(A) R. Moreover, if A is positive

definite, Theorem 6.8 says there is a nonsingular matrix B C(n, n) such that

A = B B. Thus det(A) = det(B ) det(B) = | det(B)|2 > 0. Such a property

suggests that it may be possible to extend Theorem 5.10 to Hermitian matrices

as well, as we now do.

Theorem 6.9 Let A = (aij ) C(n, n) be Hermitian. The matrix A is positive

definite if and only if all its leading principal minors are positive, that is, if and

only if

a11 a1n

a

11 a12

a11 > 0,

> 0, . . . , > 0. (6.3.9)

a21 a22

a

ann

n1

Proof The necessity proof is similar to that in Theorem 5.1.11 and omitted.

The sufficiency proof is also similar but needs some adaptation to meet the

delicacy with handling complex numbers. To see how this is done, we may

assume that (6.3.9) holds and we show that A is positive definite by an inductive argument. Again, if n = 1, there is nothing to show, and assume that the

assertion is valid at n1 (that is, when A C(n1, n1)) (n 2). It remains

to prove that A is positive definite at n 2.

As before, we rewrite the Hermitian matrix A in the blocked form

An1

,

(6.3.10)

A=

ann

where = (a1n , . . . , an1,n )t Cn1 . By the inductive assumption at n 1,

we know that the Hermitian matrix An1 is positive definite. Hence, applying

191

An1 = B B. Thus

An1

In1

B 0

B 0

=

,

A=

0 1

ann

ann

0 1

(6.3.11)

where = (B )1 (b1 , . . . , bn1 )t Cn1 . Since det(A) > 0, we obtain

with some suitable row operations the result

In1

det(A)

= det

= ann |b1 |2 |bn1 |2 > 0,

det(An1 )

ann

(6.3.12)

with det(B B) = det(An1 ). Taking x Cn and setting

B 0

y=

x (y1 , . . . , yn1 , yn )t ,

0 1

we have

x Ax = y

In1

ann

(6.3.13)

= (y 1 + b1 y n , . . . , y n1 + bn1 y n ,

y1

.

b1 y 1 + + bn1 y n1 + ann y n )

..

yn

2

(6.3.14)

= 0 or x

= 0 because of the condition (6.3.12).

Thus the proof is complete.

In this subsection, we show that the Cholesky decomposition theorem is also

valid in the complex situation.

Theorem 6.10 Let A = (aij ) C(n, n) be Hermitian. The matrix A is positive definite if and only if there is a nonsingular lower triangular matrix

L C(n, n) such that A = LL . Moreover, when A is positive definite, there is

a unique such L with the property that all the diagonal entries of L are positive.

192

= 0

whenever x

= 0. Thus we have x Ax = x (LL )x = y y > 0, which proves

that A is positive definite.

Assume now A is positive definite. We show that there is a unique lower

triangular matrix L = (lij ) C(n, n) whose diagonal entries are all positive,

l11 , . . . , lnn > 0, so that A = LL . We again use induction.

since a11 > 0 in view of Theorem 6.9.

Assume the conclusion is established at n 1 (n 2).

We proceed to establish the conclusion at n (n 2).

Rewrite A in the blocked form (6.3.10). Applying Theorem 6.9, we know

that An1 C(n 1, n 1) is positive definite. So there is a unique lower

triangular matrix L1 C(n 1, n 1) with positive diagonal entries such that

An1 = L1 L1 .

Now consider L C(n, n) of the form

L2 0

L=

,

(6.3.15)

a

where L2 C(n 1, n 1) is a lower triangular matrix, Cn1 is a column

vector, and a R is a suitable number. Then, if we set A = LL , we obtain

An1

L2 0

L2

A=

=

a

ann

0 a

L 2 L2

L2

=

.

(6.3.16)

(L2 ) + a 2

Therefore we arrive at the relations

An1 = L2 L2 ,

= L2 ,

+ a 2 = ann .

(6.3.17)

Thus, if we require that all the diagonal entries of L2 are positive, then the

inductive assumption gives us L2 = L1 . So the vector is also uniquely

determined, = L1

1 . Thus it remains only to show that the number a may

be uniquely determined as well. To this end, we need to show in (6.3.17) that

ann > 0.

In fact, inserting L1 = B in (6.3.11), we have = . That is,

An1

In1

L1 0

L1

A=

=

0 1

ann

ann

0

(6.3.18)

0

1

.

(6.3.19)

193

Hence (6.3.18) follows as a consequence of det(A) > 0. Thus the third equation in (6.3.17) leads to the unique determination of the positive number a as

in the real situation:

a = ann > 0.

(6.3.20)

The inductive proof is now complete.

Exercises

6.3.1 Prove Theorem 6.7.

6.3.2 Prove Theorem 6.8.

6.3.3 Let A C(n, n) be nonsingular. Prove that there is a unique lower

triangular matrix L C(n, n) with positive diagonal entries so that

AA = LL .

6.3.4 Show that if A1 , . . . , Ak C(n, n) are unitary matrices, so is their

product A = A1 Ak .

6.3.5 Show that if A C(n, n) is Hermitian and all its eigenvalues are 1

then A is unitary.

6.3.6 Assume that u Cn is a nonzero column vector. Show that there is a

unitary matrix Q C(n, n) such that

Q (uu )Q = diag{u u, 0, . . . , 0}.

(6.3.21)

definite then aii > 0 for i = 1, . . . , n and

(6.3.22)

6.3.8 Is the Hermitian matrix

A = i

2+i

i

4

1+i

2i

1i

(6.3.23)

positive definite?

6.3.9 Let A = (aij ) C(n, n) be a positive semi-definite Hermitian matrix.

Show that aii 0 for i = 1, . . . , n and that

det(A) a11 ann .

(6.3.24)

scalar product, which are the n column vectors of a matrix A C(n, n).

Establish the inequality

| det(A)| u1 un .

(6.3.25)

194

6.3.11 Let A C(n, n) be nonsingular. Show that there exist a unitary matrix

P C(n, n) and a positive definite Hermitian matrix B C(n, n) such

that A = P B. Show also that, if A is real, then the afore-mentioned

matrices P and B may also be chosen to be real so that P is orthogonal

and B positive definite.

6.3.12 Let A C(n, n) be nonsingular. Since A A is positive definite, its

eigenvalues, say 1 , . . . , n , are all positive. Show that there exist unitary matrices P and Q in C(n, n) such that A = P DQ where

-(

( .

1 , . . . , n

(6.3.26)

D = diag

is a diagonal matrix. Show also that the same conclusion is true for

some orthogonal matrices P and Q in R(n, n) when A is real.

6.3.13 Let U be a finite-dimensional complex vector space with a positive

definite scalar product (, ) and T L(U ) a positive definite mapping.

(i) Establish the generalized Schwarz inequality

(

(

|(u, T (v))| (u, T (u)) (v, T (v)), u, v U.

(6.3.27)

(ii) Show that the equality in (6.3.27) occurs if and only if u, v are

linearly dependent.

In this section, we extend our investigation about commutativity of self-adjoint

mappings in the real situation to that in the complex situation. It is not hard to

see that the conclusion in this extended situation is the same as in the real

situation.

Theorem 6.11 Let U be a finite-dimensional complex vector space with a positive definite scalar product (, ) and S, T L(U ) be self-adjoint or Hermitian. Then there is an orthonormal basis of U consisting of eigenvectors of

both S, T if and only if S and T commute, or S T = T S.

The proof of the theorem is the same as that for the real situation and

omitted.

We now use Theorem 6.11 to study a slightly larger class of linear mappings

commonly referred to as normal mappings.

195

T T = T T.

(6.4.1)

T =

1

1

(T + T ) + (T T ) = P + Q,

2

2

(6.4.2)

1

1

(T + T ) L(U ) is self-adjoint or Hermitian and Q = (T

2

2

T ) L(U ) is anti-self-adjoint or anti-Hermitian since it satisfies Q = Q.

From

where P =

u, v U, (6.4.3)

and iQ are commutative. Applying Theorem 6.11, we conclude that U has an

orthonormal basis consisting of eigenvectors of P and iQ simultaneously, say

{u1 , . . . , un }, associated with the corresponding real eigenvalues, {1 , . . . , n }

and {1 , . . . , n }, respectively. Consequently, we have

T (ui ) = (P + Q)(ui ) = (P i[iQ])(ui ) = (i + ii )ui ,

i = 1, . . . , n,

(6.4.4)

theorem.

Theorem 6.13 Let T L(U ). Then T is normal if and only if there is an

orthonormal basis of U consisting of eigenvectors of T .

Proof Suppose that U has an orthonormal basis consisting of eigenvectors

of T , say B = {u1 , . . . , un }, with the corresponding eigenvalues {1 , . . . , n }.

Let T L(U ) be represented by the matrix B = (bij ) C(n, n) with respect

to B. Then

n

bij ui , j = 1, . . . , n.

(6.4.5)

T (uj ) =

i=1

Therefore, we have

(T (uj ), ui ) =

n

bkj uk , ui

= bij ,

i, j = 1, . . . , n,

(6.4.6)

k=1

and

(T (uj ), ui ) = (uj , T (ui )) = (uj , i ui ) = i ij ,

i, j = 1, . . . , n. (6.4.7)

196

In other words, B is diagonal, B = diag{1 , . . . , n }, such that 1 , . . . , n

are the eigenvalues of T with the corresponding eigenvectors u1 , . . . , un .

In particular,

(T T )(ui ) = i i ui = i i ui = (T T )(ui ),

i = 1, . . . , n,

(6.4.8)

Conversely, if T is normal, the existence of an orthonormal basis of U consisting of eigenvectors of T is already shown.

The proof is complete.

To end this section, we consider commutative normal mappings.

Theorem 6.14 Let S, T L(U ) be normal. Then there is an orthonormal

basis of U consisting of eigenvectors of S and T simultaneously if and only if

S and T are commutative, S T = T S.

Proof If {u1 , . . . , un } is an orthonormal basis of U so that S(ui ) = i ui

and T (ui ) = i ui , i , i C, i = 1, . . . , n, then (S T )(ui ) = i i ui =

(T S)(ui ), i = 1, . . . , n, which establishes the commutativity of S and T :

S T = T S.

Conversely, assume S and T are commutative normal mappings. Let

1 , . . . , k be all the distinct eigenvalues of T and E1 , . . . , Ek the associated

eigenspaces that may readily be checked to be mutually perpendicular (left

as an exercise in this section). Fix i = 1, . . . , k. For any u Ei , we have

T (S(u)) = S(T (u)) = S(i u) = i S(u). Thus S(u) Ei . That is, Ei

is invariant under S. Since S is normal, Ei has an orthonormal basis, say

{ui,1 , . . . , ui,mi }, consisting of the eigenvectors of S. Therefore, the set of

vectors

{u1,1 , . . . , u1,m1 , . . . , uk,1 , . . . , uk,mk }

(6.4.9)

and T .

An obvious but pretty general example of a normal mapping that is not necessarily self-adjoint or Hermitian is a unitary mapping T L(U ) since it

satisfies T T = T T = I . Consequently, for a unitary mapping T L(U ),

there is an orthonormal basis of U consisting of eigenvectors of T . Furthermore, since T is isometric, it is clear that all eigenvalues of T are of absolute

value 1, as already observed in Section 4.3.

197

The matrix versions of Definition 6.12 and Theorems 6.13 and 6.14 may be

stated as follows.

Definition 6.15 A matrix A C(n, n) is said to be normal if it satisfies the

property AA = A A.

Theorem 6.16 Normal matrices have the following characteristic properties.

(1) A matrix A C(n, n) is diagonalizable through a unitary matrix, that is,

there is a unitary matrix P C(n, n) and a diagonal matrix D C(n, n)

such that A = P DP , if and only if A is normal.

(2) Two normal matrices A, B C(n, n) are simultaneously diagonalizable

through a unitary matrix, that is, there is a unitary matrix P C(n, n)

and two diagonal matrices D1 , D2 C(n, n) such that A = P D1 P ,

B = P D2 P , if and only if A, B are commutative: AB = BA.

To prove the theorem, we may simply follow the standard way to associate

a matrix A C(n, n) with the mapping it generates over Cn through x # Ax

for x Cn as before and apply Theorems 6.13 and 6.14.

Moreover, if A C(n, n) is unitary, AA = A A = In , then there is a

unitary matrix P C(n, n) and a diagonal matrix D = diag{1 , . . . , n } with

|i | = 1, i = 1, . . . , n, such that A = P DP .

Exercises

6.4.1 Let U be a finite-dimensional complex vector space with a positive

definite scalar product (, ) and T L(U ). Show that T is normal if

and only if T satisfies the identity

T (u) = T (u),

u U.

(6.4.10)

to show directly that if T is normal and is an eigenvalue of T with an

associated eigenvector u U then and u are a pair of eigenvalue and

eigenvector of T .

6.4.3 Let U be a finite-dimensional complex vector space with a positive

definite scalar product (, ) and T L(U ) satisfy the property that, if

C is an eigenvalue of T and u U an associated eigenvector, then

is an eigenvalue and u an associated eigenvector of T . Show that if

, C are two different eigenvalues of T and u, v are the associated

eigenvectors, respectively, then (u, v) = 0.

198

6.4.4 Assume that A C(n, n) is triangular and normal. Show that A must

be diagonal.

6.4.5 Let U be a finite-dimensional complex vector space with a positive

definite scalar product and T L(U ) be normal.

(i) Show that T is self-adjoint if and only if all the eigenvalues of T

are real.

(ii) Show that T is anti-self-adjoint if and only if all the eigenvalues of

T are imaginary.

(iii) Show that T is unitary if and only if all the eigenvalues of T are of

absolute value 1.

6.4.6 Let T L(U ) be a normal mapping where U is a finite-dimensional

complex vector space with a positive definite scalar product.

(i) Show that if T k = 0 for some integer k 1 then T = 0.

(ii) (A sharpened version of (i)) Given u U show that if T k (u) = 0

for some integer k 1 then T (u) = 0.

(This is an extended version of Exercise 6.2.8.)

6.4.7 If R, S L(U ) are Hermitian and commutative, show that T = R iS

is normal.

6.4.8 Consider the matrix

A=

(6.4.11)

(ii) Find an orthonormal basis of C2 , with the standard Hermitian

scalar product, consisting of the eigenvectors of A.

6.4.9 Let U be a finite-dimensional complex vector space with a positive

definite scalar product and T L(U ) be normal. Show that for any

integer k 1 there is a normal element S L(U ) such that T = S k .

Moreover, if T is unitary, then there is a unitary element S L(U )

such that T = S k .

6.4.10 Recall that any c C may be rewritten as c = aei , where a, R

and a 0, known as a polar decomposition of c. Let U be a finitedimensional complex vector space with a positive definite scalar product and T L(U ). Show that T enjoys a similar polar decomposition

property such that there are a positive semi-definite element R and a

unitary element S, both in L(U ), satisfying T = R S = S R, if and

only if T is normal.

199

6.4.11 Let U be a complex vector space with a positive definite scalar product

(, ). A mapping T L(U ) is called hyponormal if it satisfies

(u, (T T T T )(u)) 0,

u U.

(6.4.12)

(i) Show that T is normal if any only if T and T are both hyponormal.

(ii) Show that T being hyponormal is equivalent to

T (u) T (u),

u U.

(6.4.13)

(iv) Show that if C is an eigenvalue and u an associated eigenvector of a hyponormal mapping T L(U ) then and u are an

eigenvalue and an associated eigenvector of T .

6.4.12 Let U be an n-dimensional (n 2) complex vector space with a positive definite scalar product and T L(U ). Show that T is normal if

and only if there is a complex-coefficient polynomial p(t) of degree at

most n 1 such that T = p(T ).

6.4.13 Let U be a finite-dimensional complex vector space with a positive

definite scalar product and T L(U ). Show that T is normal if and

only if T and T have the same invariant subspaces of U .

As in the real situation, we show that self-adjoint or Hermitian mappings may

be used to study mappings between two complex vector spaces.

As in Section 5.6, let U and V denote two complex vector spaces of finite

dimensions, with positive definite Hermitian scalar products (, )U and (, )V ,

respectively. Given T L(U, V ) and v V , it is clear that

f (u) = (v, T (u))V ,

u U,

(6.5.1)

defines an element f in U which depends on v V . So there is a unique element in U depending on v, say T (v), such that f (u) = (T (v), u)U . That is,

(v, T (u))V = (T (v), u)U ,

u U, v V .

(6.5.2)

It may easily be verified that T is linear. Thus T L(V , U ). As in the

real situation, we can consider the composed mappings T T L(U ) and

T T L(V ), which are both self-adjoint or Hermitian.

200

(u, (T T )(u))U = (T (u), T (u))V 0,

u U,

v V,

(6.5.3)

Use U and V to denote the norms induced from (, )U and (, )V ,

respectively. Then

T = sup{T (u)V | uU = 1, u U }.

(6.5.4)

On the other hand, using the fact that T T L(U ) is self-adjoint and positive semi-definite, we know that there is an orthonormal basis {u1 , . . . , un } of

U consisting of eigenvectors of T T , with the corresponding non-negative

n

eigenvalues, 1 , . . . , n , respectively. Hence, for any u =

ai ui U with

u2U

n

i=1

|ai | = 1, we have

2

i=1

n

i |ai |2 0 ,

i=1

(6.5.5)

where

0 = max {i } 0,

1in

which shows T

Then (6.5.4) gives us

(6.5.6)

(6.5.7)

(6.5.8)

Therefore the norm of a linear mapping T L(U, V ) may be obtained

by computing the largest eigenvalue of the induced self-adjoint or Hermitian

mapping T T L(U ).

In particular, if T L(U ) is already self-adjoint, then, since T is simply

the radical root of the largest eigenvalue of T 2 , we arrive at

T = max {|i |},

1in

(6.5.9)

201

Thus, a combination of (6.5.8) and (6.5.9) leads to the formula

T 2 = T T ,

T L(U, V ).

(6.5.10)

quantity T may also be evaluated accordingly by

T = sup{|(u, T (u))| | u U, u = 1},

(6.5.11)

We can similarly show how to extend (6.5.11) to evaluate the norm of an

arbitrary linear mapping between U and V .

Theorem 6.17 Let U, V be finite-dimensional complex vector spaces with

positive definite Hermitian scalar products (, )U , (, )V , respectively. For

T L(U, V ), we have

T = sup{|(v, T (u))V | | u U, v V , uU = 1, vV = 1}.

(6.5.12)

As in the real situation, we may use Theorem 6.17 to establish the fact that

the norms of a linear mapping and its dual, over two complex vector spaces

with positive definite Hermitian scalar products, share the same value.

Theorem 6.18 Let U, V be finite-dimensional complex vector spaces with

positive definite Hermitian scalar products (, )U , (, )V , respectively. For

T L(U, V ) and its dual T L(V , U ), we have T = T . Thus

T T = T T and the largest eigenvalues of the positive semi-definite

self-adjoint or Hermitian mappings T T L(U ) and T T L(V ) are the

same.

Proof The fact that T = T may be deduced from applying (6.5.12) to

T and the relation (v, T (u))V = (T (v), u)U (u U ,v V ). The conclusion

T T = T T follows from (6.5.10) and that the largest eigenvalues of

the positive semi-definite self-adjoint or Hermitian mappings T T and T T

are the same is a consequence of the eigenvalue characterization of the norm

of a self-adjoint mapping stated in (6.5.9) and T T = T T .

However, as in the real situation, Theorem 6.18 is natural and hardly surprising in view of Theorem 5.16.

We continue to study a mapping T L(U, V ), where U and V are two complex vector spaces with positive definite Hermitian product (, )U and (, )V

202

and of dimensions n and m, respectively. Let 1 , . . . , n be all the eigenvalues of the positive semi-definite mapping T T L(U ), among which

1 , . . . , k are positive, say. Use {u1 , . . . , uk , . . . , un } to denote an orthonormal basis of U consisting of eigenvectors of T T associated with the eigenvalues 1 , . . . , k , . . . , n . Then we have

(T (ui ), T (uj ))V = (ui , (T T )(uj ))U

= j (ui , uj )U = j ij ,

i, j = 1, . . . , n.

(6.5.13)

This simple expression indicates that T (ui ) = 0 for i > k (if any) and that

{T (u1 ), . . . , T (uk )} forms an orthogonal basis of R(T ). In particular, k =

r(T ).

Now set

1

T (ui ), i = 1, . . . , k.

(6.5.14)

vi =

T (ui )V

Then {v1 , . . . , vk } is an orthonormal basis for R(T ). Taking i = j = 1, . . . , k

(6.5.14), we arrive at

the singular values of T and the expression (6.5.15) the singular value decomposition for T . This result may conveniently be summarized as a theorem.

Theorem 6.19 Let U, V be finite-dimensional complex vector spaces with

positive definite Hermitian scalar products and T L(U, V ) is of

rank k 1. Then there are orthonormal bases {u1 , . . . , uk , . . . , un }

and {v1 , . . . , vk , . . . , vm } of U and V , respectively, and positive numbers

1 , . . . , k , referred to as the singular values of T , such that

T (ui ) = i vi ,

i = 1, . . . , k,

T (uj ) = 0,

j >k

(if any).

(6.5.16)

the associated eigenvectors of the self-adjoint mapping T T L(U ).

Let A C(m, n). Theorem 6.19 implies that there are unitary matrices P

C(n, n) and Q C(m, m) such that

AP =Q or A = QP ,

(6.5.17)

D 0

=

R(m, n), D = diag{1 , . . . , k },

0 0

203

21 , . . . , 2k are all the positive eigenvalues of the Hermitian matrix A A. The

numbers 1 , . . . , k are called the singular values of the matrix A and the

expression (6.5.17) the singular value decomposition for the matrix A.

Note that Exercise 6.3.12 may be regarded as an early and special version of

the general singular value decomposition procedure here.

Of course the results above may also be established similarly for mappings

between real vector spaces and for real matrices.

Exercises

6.5.1 Verify that for T L(U, V ) the mapping T given in (6.5.2) is a welldefined element in L(V , U ) although the scalar products (, )U and

(, )V of U and V are both sesquilinear.

6.5.2 Consider the matrix

2 1i

3

A=

.

(6.5.18)

3i 1 3 + 2i

(i) Find the eigenvalues of AA and A A and compare.

(ii) Find A.

6.5.3 Apply (6.5.8) and use the fact that the nonzero eigenvalues of T T

and T T are the same to prove directly that T = T as stated in

Theorem 6.18.

6.5.4 If U is a finite-dimensional vector space with a positive definite scalar

product and T L(U ) is normal, show that T 2 = T 2 . Can this

result be extended to T m = T m for any positive integer m?

6.5.5 Consider the matrix

1+i

2

1

A=

.

(6.5.19)

2

1i 1

(i) Find the singular values of A.

(ii) Find a singular value decomposition of A.

6.5.6 Let A C(m, n). Show that A and A have the same singular values.

6.5.7 Let A C(n, n) be invertible. Investigate the relationship between the

singular values of A and those of A1 .

6.5.8 Let A C(n, n). Use the singular value decomposition for A to show

that A may be rewritten as A = P B = CQ, where P , Q are some unitary matrices and B, C some positive semi-definite Hermitian matrices.

204

definite Hermitian scalar product and T L(U ) positive semi-definite.

Show that the singular values of T are simply the positive eigenvalues

of T .

6.5.10 Show that a square complex matrix is unitary if and only if its singular

values are all 1.

Note that most of the exercises in Chapter 5 may be restated in the context

of the complex situation of this chapter and are omitted.

7

Jordan decomposition

which allows us to reduce a linear mapping over C into a canonical form in

terms of its eigenspectrum. As a preparation we first recall some facts regarding factorization of polynomials. Then we show how to reduce a linear mapping over a set of its invariant subspaces determined by a prime factorization of

the characteristic polynomial of the mapping. Next we reduce a linear mapping

over its generalized eigenspaces. Finally we prove the Jordan decomposition

theorem by understanding how a mapping behaves itself over each of its generalized eigenspaces.

Let P be the vector space of all polynomials with coefficients in a given field

F and in the variable t. Various technical computations and concepts involving

elements in P may be simplified considerably with the notion ideal as we

now describe.

Definition 7.1 A non-empty subset I P is called an ideal of P if it satisfies

the following two conditions.

(1) f + g I for any f, g I.

(2) f g I for any f P and g I.

Since F may naturally be viewed as a subset of P, we see that af I for

any a F and f I. Hence an ideal is also a subspace.

Let g1 , . . . , gk P. Construct the subset of P given by

{f1 g1 + + fk gk | f1 , . . . , fk P}.

205

(7.1.1)

206

Jordan decomposition

may say that this deal is generated from g1 , . . . , gk and use the notation

I(g1 , . . . , gk ) to denote it.

There are two trivial ideals: I = {0} and I = P and it is obvious that

{0} = I(0) and P = I(1). That is, both {0} and P are generated from some

single elements in P. The following theorem establishes that any ideal in P

may be generated from a single element in P.

Theorem 7.2 Any ideal in P is singly generated. More precisely, if I

= {0}

is an ideal of P, then there is an element g P such that I = I(g). Moreover, if there is another h P such that I = I(h), then g and h are of the

same degree. Besides, if the coefficients of the highest-degree terms of g and h

coincide, then g = h.

Proof If I = I(g) for some g P, it is clear that g will have the lowest

degree among all elements in I in view of the definition of I(g). Such an

observation indicates what to look for in our proof. Indeed, since I

= {0}, we

may choose g to be an element in I \ {0} that is of the lowest degree. Then it

is clear that I(g) I. For any h I \ {0}, we claim g|h (i.e. g divides h).

Otherwise, we may rewrite h as

h(t) = q(t)g(t) + r(t)

(7.1.2)

we have qg I. Thus r = h qg I, which contradicts the definition of g.

Consequently, g|h. So h I(g) as expected, which proves I I(g).

If there is another h P such that I = I(h), of course g and h must have

the same degree since h|g and g|h. If the coefficients of the highest-degree

terms of g and h coincide, then g = h, otherwise g h I \ {0} would be of

a lower degree, which contradicts the choice of g given earlier.

The theorem is proved.

Let g1 , . . . , gk P \ {0}. Choose g P such that I(g) = I(g1 , . . . , gk ).

Then there are elements f1 , . . . , fk P such that

g = f1 g1 + + fk gk ,

(7.1.3)

words, if h|g1 , . . . , h|gk , then h|g. On the other hand, the definition of

I(g1 , . . . , gk ) already gives us g|g1 , . . . , g|gk . So g itself is a common

divisor of g1 , . . . , gk . In view of Theorem 7.2, we see that the coefficient of the

highest-degree term of g determines g completely. Thus, we may fix g by taking the coefficient of the highest-degree term of g to be 1. Such a polynomial

207

as g = gcd(g1 , . . . , gk ). Therefore, we see that the notion of an ideal and its

generator provides an effective tool for the study of greatest common divisors.

Given g1 , . . . , gk P \ {0}, if there does not exist a common divisor of a

nontrivial degree ( 1) for g1 , . . . , gk , then we say that g1 , . . . , gk are relatively prime or co-prime. Thus g1 , . . . , gk are relatively prime if and only if

gcd(g1 , . . . , gk ) = 1. In this situation, there are elements f1 , . . . , fk P such

that the identity

f1 (t)g1 (t) + + fk (t)gk (t) = 1

(7.1.4)

is valid for arbitrary t. This fact will be our starting point in the subsequent

development.

A polynomial p P of degree at least 1 is called a prime polynomial or an

irreducible polynomial if p cannot be factored into the product of two polynomials in P of degrees at least 1. Two polynomials are said to be equivalent if

one is a scalar multiple of the other.

Exercises

7.1.1 For f, g P show that f |g if and only if I(f ) I(g).

7.1.2 Consider I P over a field F given by

I = {f P | f (a1 ) = = f (ak ) = 0},

(7.1.5)

(i) Show that I is an ideal.

(ii) Find a concrete element g P such that I = I(g).

(iii) To what extent, is the element g obtained in (ii) unique?

7.1.3 Show that if f P and (t 1)|f (t n ), where n 1 is an integer, then

(t n 1)|f (t n ).

7.1.4 Let f, g P, g

= 0, and n 1 be an integer. Prove that f |g if and only

if f n |g n .

7.1.5 Let f, g P be nonzero polynomials and n 1 an integer. Show that

(gcd(f, g))n = gcd(f n , g n ).

(7.1.6)

7.1.6 Let U be a finite-dimensional vector space over a field F and P the vector space of all polynomials over F. Show that I = {p P | p(T ) = 0}

is an ideal of P. Let g P be such that I = I(g) and the coefficient of

the highest-degree term of g is 1. Is g the minimal polynomial of T that

has the minimum degree among all the polynomials that annihilate T ?

208

Jordan decomposition

In this section, we show how to use characteristic polynomials and their prime

factorization to resolve or reduce linear mappings into mappings over invariant subspaces. As a preparation, we first establish a factorization theorem for

characteristic polynomials.

Theorem 7.3 Let U be a finite-dimensional vector space over a field F and

T L(U ). Assume that V , W are nontrivial subspaces of U such that

U = V W and V , W are invariant under T . Use R, S to denote T

restricted to V , W , and pR (), pS (), pT () the characteristic polynomials of

R L(V ), S L(W ), T L(U ), respectively. Then pT () = pR ()pS ().

Proof Let {v1 , . . . , vk } and {w1 , . . . , wl } be bases of V , W , respectively.

Then {v1 , . . . , vk , w1 , . . . , wl } is a basis of U . Assume that B F(k, k) and

C F(l, l) are the matrix representations of R and S, with respect to the bases

{v1 , . . . , vk } and {w1 , . . . , wl }, respectively. Then

B 0

A=

(7.2.1)

0 C

is the matrix representation of T with respect to

{v1 , . . . , vk , w1 , . . . , wl }. Consequently, we have

Ik B

0

pT () = det(I A) = det

0

Il C

the

basis

(7.2.2)

We can now demonstrate how to use the factorization of the characteristic

polynomial of a linear mapping to naturally resolve it over invariant subspaces.

Theorem 7.4 Let U be a finite-dimensional vector space over a field F and

T L(U ) and use pT () to denote the characteristic polynomial of T . Factor

pT () into the form

n

pT = p1n1 pk k ,

(7.2.3)

pT

ni

nk

gi = ni = p1n1 p/

i = 1, . . . , k,

(7.2.4)

i pk ,

pi

where !denotes the factor that is missing. Then we have the following.

209

U = V1 Vk ,

Vi = N (pini (T )),

i = 1, . . . , k.

(7.2.5)

Proof Since pT () is the characteristic polynomial, the CayleyHamilton

theorem gives us pT (T ) = 0. Since g1 , . . . , gk are relatively prime, there are

polynomials f1 , . . . , fk P such that

f1 ()g1 () + + fk ()gk () = 1.

(7.2.6)

f1 (T )g1 (T ) + + fk (T )gk (T ) = I.

(7.2.7)

Therefore, we have

u = u1 + + uk ,

ui = fi (T )gi (T )u,

i = 1, . . . , k.

(7.2.8)

ui N(pini (T )) (i = 1, . . . , k), which proves U = V1 + + Vk .

For any i = 1, . . . , k, we need to show

(7.2.9)

Vj = {0}.

Wi Vi

1j k,j

=i

In fact, since pini and gi are relatively prime, there are polynomials qi and ri

in P such that

qi pini + ri gi = 1.

(7.2.10)

qi (T )pini (T ) + ri (T )gi (T ) = I.

(7.2.11)

Vj N (gi (T )).

(7.2.12)

1j k,j

=i

u = qi (T )pini (T )u + ri (T )gi (T )u = 0.

(7.2.13)

Let u Vi . Then pini (T )(T (ui )) = T pini (T )(u) = 0. Thus T (u) Vi and

the invariance of Vi under T is proved, which establishes (2).

210

Jordan decomposition

Exercises

7.2.1 Let S, T L(U ), where U is a finite-dimensional vector space over a

field F. Show that if the characteristic polynomials pS () and pT () of

S and T are relatively prime then pS (T ) and pT (S) are both invertible.

7.2.2 Let S, T L(U ) where U is a finite-dimensional vector space over

a field F. Use the previous exercise to show that if the characteristic

polynomials of S and T are relatively prime and R L(U ) satisfies

R S = T R then R = 0.

7.2.3 Let U be a finite-dimensional vector space over a field F and T L(U ).

Show that T is idempotent, T 2 = T , if and only if

r(T ) + r(I T ) = dim(U ).

(7.2.14)

Prove the following slightly extended version of Theorem 7.4.

Suppose that the characteristic polynomial of T , say pT (), has the

factorization pT () = g1 ()g2 () over F where g1 and g2 are relatively prime polynomials. Then U = N (g1 (T )) N (g2 (T )) and both

N(g1 (T )) and N (g2 (T )) are invariant under T .

7.2.5 Let u Cn be a nonzero column vector and set A = uu C(n, n).

(i) Show that A2 = aA, where a = u u.

(ii) Find a nonsingular matrix B C(n, n) so that A = B 1 DB where

D C(n, n) is diagonal and determine D.

(iii) Describe

N (A) = {x Cn | Ax = 0},

N (A aIn ) = {x Cn | (A aIn )x = 0},

(7.2.15)

x # Ax (x Cn ) over which T reduces.

(iv) Determine r(T ).

7.2.6 Let U be a finite-dimensional vector space and T1 , . . . , Tk L(U ) satisfy Ti2 = Ti (i = 1, . . . , k) and Ti Tj = 0 (i, j = 1, . . . , k, i

= j , if

any). Show that there holds the space decomposition

U = R(T1 ) R(Tk ) V , V = ki=1 N (Ti ),

which reduces T1 , . . . , Tk simultaneously.

(7.2.16)

211

In the first part of this section, we will carry out a study of nilpotent mappings,

which will be crucial for the understanding of the structure of a general linear

mapping in terms of its eigenvalues, to be seen in the second part of the section.

Let U be a finite-dimensional vector space over a field F and T L(U ) a

nilpotent mapping. Recall that T is said to be of degree m 1 if m is the

smallest integer such that T m = 0. When m = 1, then T = 0 and the situation

is trivial. In the nontrivial case, m 2, we know that if u U is of period

m (that is, T m (u) = 0 but T m1 (u)

= 0), then u, T (u), . . . , T m1 (u) are

linearly independent vectors in U . Thus, in any nontrivial situation, m satisfies

2 m dim(U ).

For a nontrivial nilpotent mapping, we have the following key results.

Theorem 7.5 Let T L(U ) be a nilpotent mapping of degree m 2 where

U is finite-dimensional. Then the following are valid.

(1) There are k vectors u1 , . . . , uk and k integers m1 2, . . . , mk 2 such

that u1 , . . . , uk are of periods m1 , . . . , mk , respectively, such that U has

a basis of the form

u01 , . . . , u0k0 , u1 , T (u1 ), . . . , T m1 1 (u1 ), . . . , uk , T (uk ), . . . , T mk 1 (uk ),

(7.3.1)

where u1 , . . . , uk0 , if any, are some vectors taken from N (T ). Thus, setting

U0 = Span{u01 , . . . , u0k0 },

(7.3.2)

Ui = Span{ui , T (ui ), . . . , T mi 1 (ui )}, i = 1, . . . , k,

we have the decomposition

U = U0 U1 Uk ,

(7.3.3)

(2) The degree m of T is given by

m = max{mi | i = 1, . . . , k}.

(7.3.4)

k0 + k = n(T ).

(7.3.5)

212

Jordan decomposition

(i) m = 2.

Let {v1 , . . . , vk } be a basis of R(T ). Choose u1 , . . . , uk U such that

T (u1 ) = v1 , . . . , T (uk ) = vk . We assert that

(7.3.6)

u1 , T (u1 ), . . . , uk , T (uk )

are linearly independent vectors in U . To see this, consider

a1 u1 + b1 T (u1 ) + + ak uk + bk T (uk ) = 0,

ai , bi F,

i = 1, . . . , k.

(7.3.7)

ak = 0. Inserting this result into (7.3.7), we obtain b1 v1 + + bk vk = 0

which leads to b1 = = bk = 0.

It is clear that v1 , . . . , vk N (T ) since T 2 = 0. Choose u01 , . . . , u0k0

N(T ), if any, so that {u01 , . . . , u0k0 , v1 , . . . , vk } becomes a basis of N (T ).

For any u U , we rewrite T (u) as

T (u) = a1 v1 + + ak vk = T (a1 u1 + + ak uk ),

(7.3.8)

that u (a1 u1 + + ak u) N (T ) and that there are unique scalars

c1 , . . . , ck0 , b1 , . . . , bk such that

u (a1 u1 + + ak u) = c1 u1 + + ck0 uk0 + b1 v1 + + bk vk .

(7.3.9)

Consequently we see that

u01 , . . . , u0k0 , u1 , T (u1 ), . . . , uk , T (uk )

(7.3.10)

(ii) m 3.

We assume the statement in (1) holds when the degree of a nilpotent mapping is up to m 1 2.

Let T L(U ) be of degree m 3 and set V = R(T ). Since V is invariant

under T , we may regard T as an element in L(V ). To avoid confusion, we use

TV to denote the restriction of T over V .

It is clear that the degree of TV is m 1 2. Applying the inductive

assumption to TV , we see that there are vectors v1 , . . . , vl V with respective

periods m1 2, . . . , ml 2 and vectors v10 , . . . , vl00 N (TV ), if any, such

that

m1 1

form a basis of V .

ml 1

(v1 ), . . . , vl , TV (vl ), . . . , TV

(vl )

(7.3.11)

213

T (w1 ) = v10 , . . . , T (wl0 ) = vl00 , T (u1 ) = v1 , . . . , T (ul ) = vl .

(7.3.12)

(7.3.13)

are linearly independent. In fact, consider the relation

mi

l0

l

(ai wi + bi vi0 ) +

bij T j (ui ) = 0,

(7.3.14)

i=1 j =0

i=1

where ai s, bi s, and bij s are scalars. Applying T to (7.3.14) and using (7.3.12)

we arrive at

l0

ai vi0 +

l m

i 1

bij TV (vi ) = 0,

(7.3.15)

i=1 j =0

i=1

ai = 0,

i = 1, . . . , l0 ;

bij = 0,

j = 0, 1, . . . , mi 1,

i = 1, . . . , l.

(7.3.16)

(7.3.17)

m1 1

ml 1

(v1 ) + + blml TV

(vl ) = 0. (7.3.18)

b1 = = bl0 = b1m1 = = blml = 0

(7.3.19)

as well, which establishes that the vectors given in (7.3.13) are linearly independent.

It is clear that

(7.3.20)

form a basis of N (T ).

(7.3.21)

214

Jordan decomposition

T (u) =

l0

bi vi0

bij TV (vi )

i=1 j =0

i=1

l0

l m

i 1

bi T (wi ) +

mi

l

bij T j (ui )

(7.3.22)

i=1 j =1

i=1

for some unique bi s and bij s in F, which gives rise to the result

u

l0

bi wi

i=1

l m

i 1

bij T j (ui ) N (T ).

(7.3.23)

i=1 j =0

u

l0

i=1

bi wi

l m

i 1

bij T j (ui ) =

i=1 j =0

k0

i=1

ai u0i +

l0

ci vi0 +

i=1

l

di T mi (ui )

i=1

(7.3.24)

for some unique ai s, ci s, and di s in F. Thus we conclude that the vectors

u01 , . . . , u0k0 , w1 , T (w1 ), . . . , wl0 , T (wl0 ),

ui , T (ui ), . . . , T mi (ui ), i = 1, . . . , l,

(7.3.25)

period 2, and u1 , . . . , ul of periods m1 + 1, . . . , ml + 1, respectively.

Thus (1) is proved.

Statement (2) is obvious.

In (7.3.25), we have seen that k = l0 + l. Hence, from (7.3.21), we have

n(T ) = k0 + k as anticipated, which proves (3).

We are now prepared to consider general linear mappings over complex vector spaces.

We now assume that the field we work on is C. In this situation, any prime

polynomial must be of degree 1 which allows us to make the statements in

Theorem 7.4 concrete and explicit.

215

For the characteristic polynomial pT () of T , let 1 , . . . , k be the distinct

roots of pT () so that

pT () = ( 1 )n1 ( k )nk ,

n1 , . . . , nk N.

(7.3.26)

(1) The vector space U has the decomposition

U = V1 Vk ,

Vi = N ((T i I )ni ),

i = 1, . . . , k. (7.3.27)

1, . . . , k.

(3) The characteristic polynomial of T restricted to Vi is simply

pVi () = ( i )ni ,

(7.3.28)

Proof In view of Theorem 7.4, it remains only to establish (3).

From Theorem 7.2, we have the factorization

pT () = pV1 () pVk (),

(7.3.29)

1, . . . , k).

For fixed i, we consider T over Vi . Since T i I is nilpotent on Vi , we can

find vectors u01 , . . . , u0m0 N (T i I ), if any, and cyclic vectors u1 , . . . , ul

of respective periods m1 2, . . . , ml 2, if any, so that

u01 , . . . , u0m0 ,

...............

(7.3.30)

di dim(Vi ) = m0 + m1 + + ml .

(7.3.31)

216

Jordan decomposition

boxed diagonal form

0 0

0 S1

0 ...

0

0

0

,

0

Sl

(7.3.32)

where, in the diagonal of the above matrix, 0 is the zero matrix of size m0 m0

and S1 , . . . , Sl are shift matrices of sizes m1 m1 , . . . , ml ml , respectively.

Therefore the matrix of T = (T i I ) + i I with respect to the same basis is

simply

0 0 0

0 S1 0

Ai =

(7.3.33)

+ i Idi .

0 ... 0

0

0 Sl

Consequently, it follows immediately that the characteristic polynomial of T

restricted to Vi may be computed by

pVi () = det(Idi Ai ) = ( i )di .

(7.3.34)

pT () = ( 1 )d1 ( k )dk .

(7.3.35)

asserted.

Note that the factorization expressed in (7.3.26) indicates that each eigenvalue, i , repeats itself ni times as a root in the characteristic polynomial of

T . For this reason, the integer ni is called the algebraic multiplicity of the

eigenvalue i of T .

Given T L(U ), let i be an eigenvalue of T . For any integer m 1, the

nonzero vectors in N ((T i I )m ) are called the generalized eigenvectors and

N((T i I )m ) the generalized eigenspace associated to the eigenvalue i . If

m = 1, generalized eigenvectors and eigenspace are simply eigenvectors and

eigenspace, respectively, associated to the eigenvalue i of T , and

n(T i I ) = dim(N (T i I ))

(7.3.36)

217

i )ni ), we have

n(T i I ) dim(N (T i )ni ) = ni .

(7.3.37)

In other words, the geometric multiplicity is less than or equal to the algebraic

multiplicity, of any eigenvalue, of a linear mapping.

Exercises

7.3.1 Let U be a finite-dimensional vector space over a field F and T L(U ).

Assume that 0 F is an eigenvalue of T and consider the eigenspace

E0 = N(T 0 I ) associated with 0 . Let {u1 , . . . , uk } be a basis of E0

and extend it to obtain a basis of U , say B = {u1 , . . . , uk , v1 , . . . , vl }.

Show that, using the matrix representation of T with respect to the

basis B, the characteristic polynomial of T may be shown to take the

form

pT () = ( 0 )k q(),

(7.3.38)

where q() is a polynomial of degree l with coefficients in F. In particular, use (7.3.38) to infer again, without relying on Theorem 7.6, that the

geometric multiplicity does not exceed the algebraic multiplicity, of the

eigenvalue 0 .

7.3.2 Let U be a finite-dimensional vector space over a field F and T L(U ).

We say that U is a cyclic vector space with respect to T if there is a

vector u U such that the vectors

u, T (u), . . . , T n1 (u),

(7.3.39)

(i) Find the matrix representation of T with respect to the basis

{T n1 (u), . . . , T (u), u}

(7.3.40)

with specifying

T (T n1 (u)) = an1 T n1 (u) + + a1 T (u) + a0 u,

a0 , a1 , . . . , an1 F.

(7.3.41)

an1 = 0.

(ii) Find the characteristic polynomial and the minimal polynomial of

T in terms of a0 , a1 , . . . , an1 .

218

Jordan decomposition

T L(U ). Assume that T has a cyclic vector. Show that S L(U )

commutes with T if and only S = p(T ) for some polynomial p(t) of

degree at most n 1 and with coefficients in F.

7.3.4 Let U be an n-dimensional complex vector space with a positive definite

scalar product and T L(U ) a normal mapping.

(i) Show that if T has a cyclic vector then T has n distinct eigenvalues.

(ii) Assume that T has n distinct eigenvalues and u1 , . . . , un are the

associated eigenvectors. Show that

u = a1 u1 + + an un

(7.3.42)

= 0 for any i = 1, . . . , n.

7.3.5 Let U be an n-dimensional vector space over a field F and T L(U )

a degree n nilpotent mapping, n 2. Show that there is no S L(U )

such that S 2 = T .

Let T L(U ) where U is an n-dimensional vector space over C and

1 , . . . , k all the distinct eigenvalues of T , of respective algebraic multiplicities n1 , . . . , nk , so that the characteristic polynomial of T assumes the form

pT () = ( i )n1 ( k )nk .

(7.4.1)

with the eigenvalue i :

Vi = N ((T i I )ni ).

(7.4.2)

(1) Vi is invariant under T .

(2) There are eigenvectors u0i,1 , . . . , u0i,l 0 of T , if any, associated with the

i

mi,1 2, . . . , mi,li 2, if any, relative to T i I , such that Vi has a

basis, denoted by Bi , consisting of vectors

u0i,1 , . . . , u0i,l 0 ,

(7.4.3)

u , (T I )(u ), . . . , (T )mi,li 1 (u ).

i,li

i,li

i,li

219

mi = max{1, mi,1 , . . . , mi,li }.

(7.4.4)

these vectors, we have

i u0i,s , s = 1, . . . , li0 ,

T (u0i,s ) =

T (ui,1 ) =

(T i I )(ui,1 ) + i ui,1 ,

T ((T i I )(ui,1 )) =

(T i I )2 (ui,1 )

+i (T i I )(ui,1 ),

T ((T )mi,1 1 (u )) =

i (T i )mi,1 1 (ui,1 ),

i

i,1

(7.4.5)

(T i I )(ui,li ) + i ui,li ,

T (ui,li ) =

T ((T i I )(ui,li )) =

(T i I )2 (ui,li )

+i (T i I )(ui,li ),

mi,li 1

(ui,li )) =

i (T i )mi,li 1 (ui,li ).

T ((T i )

From (7.4.5), we see that, as an element in L(Vi ), the matrix representation of

T with respect to the basis (7.4.3) is

0

0

Ji,0

0 Ji,1

0

Ji

(7.4.6)

,

..

0

.

0

0

0 Ji,li

where Ji,0 = i Il 0 and

i

Ji,s

1

..

.

..

.

i

..

.

0

..

..

.

..

..

..

.

i

(7.4.7)

220

Jordan decomposition

u0i,1 , . . . , u0i,l 0 ,

(7.4.8)

(T )mi,li 1 (u ), . . . , (T I )(u ), u .

i

i,li

i,li

i,li

With the choice of these reordered basis vectors, the submatrix Ji,s instead

takes the following updated form,

1

0 0

i

.

..

0

. ..

1

.. . .

.

.

.

.

Ji,s = .

C(mi,s , mi,s ), s = 1, . . . , li .

.

.

. 0

..

..

.

.

. 1

.

0

i

(7.4.9)

The submatrix Ji,s given in either (7.4.7) or (7.4.9) is called a Jordan block.

To simplify this statement, Ji,0 = i Il 0 is customarily said to consist of li0

i

1 1 (degenerate) Jordan blocks.

Consequently, if we choose

B = B1 Bk ,

(7.4.10)

the matrix that represents T with respect to B is

J1 0 0

.

..

..

.

. ..

0

,

J =

(7.4.11)

. .

..

.. ... 0

0

0 Jk

which is called a Jordan canonical form or a Jordan matrix.

We may summarize the above discussion into the following theorem, which

is the celebrated Jordan decomposition theorem.

Theorem 7.7 Let U be an n-dimensional vector space over C and T L(U )

so that its distinct eigenvalues are 1 , . . . , k with respective algebraic multiplicities n1 , . . . , nk . Then the following hold.

221

of the subspaces V1 , . . . , Vk .

(2) For each i = 1, . . . , k, T i I is nilpotent of degree mi over Vi where

mi ni .

(3) For each i = 1, . . . , k, appropriate eigenvectors and generalized eigenvectors may be chosen in Vi as stated in (7.4.3) or (7.4.8) which generate

a basis of Vi , say Bi .

(4) With respect to the basis B = B1 Bk of U , the matrix representation

of T assumes the Jordan canonical form (7.4.11).

The theorem indicates that T nullifies the polynomial

mT () = ( 1 )m1 ( k )mk

(7.4.12)

having T as a root. In fact, to show mT (T ) = 0, we rewrite any u U in the

form

u = u1 + + uk ,

u1 V1 , . . . , uk Vk .

(7.4.13)

Thus, we have

mT (T )(u) = (T 1 I )m1 (T k I )nk (u1 + + uk )

=

k #

i I )mi ]

(T 1 I )m1 [(T

i=1

(T k I )nk (T i I )mi (ui )

= 0,

(7.4.14)

which establishes mT (T ) = 0 as asserted. Here []

missing in the expression. Since mT ()|pT (), we arrive at pT (T ) = 0, as

stated in the CayleyHamilton theorem.

Given T L(U ), use P to denote the vector space of all polynomials with

coefficients in C and consider the subspace of P:

AT = {p P | p(T ) = 0}.

(7.4.15)

polynomials of the mapping T . Let AT be generated by some m(). Then m()

has the property that it is a minimal-degree polynomial among all nonzero

elements in AT . If we normalize the coefficient of the highest-degree term of

m() to 1, then m is uniquely determined and is called the minimal polynomial

of the linear mapping T . It is clear that, given T L(U ), if 1 , . . . , k are all

the distinct eigenvalues of T and m1 , . . . , mk are the corresponding degrees of

222

Jordan decomposition

(7.4.2), then mT () defined in (7.4.12) is the minimal polynomial of T .

For example, if T is nilpotent of degree k, then mT () = k ; if T is idempotent, that is, T 2 = T , and T

= 0, T

= I , then mT () = 2 .

We may use minimal polynomials to characterize a diagonalizable linear

mapping whose matrix representation with respect to a suitable basis is diagonal. In such a situation it is clear that this basis is made of eigenvectors and

the diagonal entries of the diagonal matrix are the corresponding eigenvalues

of the mapping.

Theorem 7.8 Let U be an n-dimensional vector space over C and T L(U ).

Then T is diagonalizable if and only if its minimal polynomial has only simple

roots.

Proof Let mT () be the minimal polynomial of T and 1 , . . . , k the distinct

eigenvalues of T . If all roots of mT () are simple, then for each eigenvalue i ,

the degree of T i I over the generalized eigenspace associated to i is 1, or

T = i I , i = 1, . . . , k. Thus all the Jordan blocks given in (7.4.6) are diagonal,

which makes J stated in (7.4.11) diagonal.

Conversely, assume T is diagonalizable and 1 , . . . , k are all the distinct

eigenvalues of T . Since U = E1 Ek , we see that h() = (

1 ) (k ) AT defined in (7.4.15). We claim that mT () = h(). To see

this, we only need to show that for any element p AT we have p(i ) = 0,

i = 1, . . . , k. Assume otherwise p(1 )

= 0 (say). Then p() and 1 are

co-prime. So there are polynomials f, g such that f ()p() + g()( 1 )

1. Consequently, I = g(T )(T 1 I ), which leads to the contradiction u = 0

for any u E1 . Thus, we see that h is the lowest degree element in AT \ {0}.

In other words, mT = h.

As a matrix version of Theorem 7.7, we may state that any n n complex

matrix is similar to a Jordan matrix of the form (7.4.11). For matrices, minimal polynomials may be defined analogously and, hence, omitted. Besides,

the matrix version of Theorem 7.8 may read as follows: An n n matrix

is diagonalizable if and only if the roots of its minimal polynomial are all

simple.

Exercises

7.4.1 Show that a diagonalizable nilpotent mapping must be trivial, T = 0,

and a nontrivial idempotent mapping T

= I is diagonalizable.

223

T over an n-dimensional vector space U over C. Show that

mT () = ( 1 ) ( k ).

(7.4.16)

the proof of Theorem 7.8, it is shown that if S T , then mS () =

mT (). Give an example to show that the condition mS () = mT () is

not sufficient to ensure S T .

7.4.4 Let A C(n, n). Show that A At .

7.4.5 Let A, B R(n, n). Show that if there is a nonsingular element C

C(n, n) such that A = C 1 BC then there is a nonsingular element

K R(n, n) such that A = K 1 BK.

7.4.6 Let A, B C(n, n) be normal. Show that if the characteristic polynomials of A, B coincide then A B.

7.4.7 Let T L(Cn ) be defined by

x1

xn

.

.

n

(7.4.17)

T (x) =

.. , x = .. C .

x1

xn

(ii) Find the minimal polynomial of T .

(iii) Does Cn have a basis consisting of eigenvectors of T ?

7.4.8 Consider the matrix

A= 1

0

0 0

0 1 ,

(7.4.18)

1 0

where a R.

(i) For what value(s) of a can or cannot the matrix A be diagonalized?

(ii) Find the Jordan forms of A corresponding to various values of a.

7.4.9 Let A C(n, n) and k 2 be an integer such that A Ak .

(i) Show that, if is an eigenvalue of A, so is k .

(ii) Show that, if in addition A is nonsingular, then each eigenvalue of

A is a root of unity. In other words, if C is an eigenvalue of A,

then there is an integer s 1 such that s = 1.

7.4.10 Let A C(n, n) satisfy Am = aIn for some integer m 1 and nonzero

a C. Use the information about the minimal polynomial of A to

prove that A is diagonalizable.

224

Jordan decomposition

7.4.12 Show that the n n matrices

1 1 1

n 0 0

1 1 1

b2 0 0

A= . . .

, (7.4.19)

, B = . . .

.. ..

.. ..

. . ...

. . ...

bn 0 0

1 1 1

where b2 , . . . , bn R, are similar and diagonalizable.

7.4.13 Show that the matrices

1 0 0

2 0 0

A = 0 0 1 , B = 0 1 0 ,

0

(7.4.20)

C 1 BC.

7.4.14 Show that if A C(n, n) has a single eigenvalue then A is not diagonalizable unless A = In . In particular, a triangular matrix with identical diagonal entries can never be diagonalizable unless it is already

diagonal.

7.4.15 Consider A C(n, n) and express its characteristic polynomial pA ()

as

pA () = ( 1 )n1 ( k )nk ,

(7.4.21)

the respective algebraic multiplicities of these eigenvalues such that

k

ni = n. Show that A is diagonalizable if and only if r(i I A) =

i=1

n ni for i = 1, . . . , k.

7.4.16 Show that the matrix

0

A=

0

a4

0

,

1

(7.4.22)

7.4.17 Show that there is no matrix in R(3, 3) whose minimal polynomial is

m() = 2 + 3 + 4.

7.4.18 Let A R(n, n) and satisfy A2 + In = 0.

225

(ii) Show that n must be an even integer, n = 2m.

(iii) Show that there are n linearly independent vectors,

u1 , v1 , . . . , um , vm , in Rn such that

Aui = vi ,

Avi = ui ,

i = 1, . . . , m.

(7.4.23)

matrix B R(n, n) such that

0

Im

(7.4.24)

A=B

B 1 .

Im 0

8

Selected topics

In this chapter we present a few selected subjects that are important in applications as well, but are not usually included in a standard linear algebra course.

These subjects may serve as supplemental or extracurricular materials. The

first subject is the Schur decomposition theorem, the second is about the classification of skewsymmetric bilinear forms, the third is the PerronFrobenius

theorem for positive matrices, and the fourth concerns the Markov or stochastic

matrices.

In this section we establish the Schur decomposition theorem, which serves as

a useful complement to the Jordan decomposition theorem and renders further

insight and fresh angles into various subjects, such as the spectral structures of

normal mappings and self-adjoint mappings, already covered.

Theorem 8.1 Let U be a finite-dimensional complex vector space with a positive definite scalar product and T L(U ). There is an orthonormal basis

B = {u1 , . . . , un } of U such that the matrix representation of T with respect

to B is upper triangular. That is,

T (uj ) =

j

bij ui ,

j = 1, . . . , n,

(8.1.1)

i=1

entries b11 , . . . , bnn of the upper triangular matrix B = (bij ) C(n, n) are

the eigenvalues of T , which are not necessarily distinct.

Proof We prove the theorem by induction on dim(U ).

When dim(U ) = 1, there is nothing to show.

226

227

We proceed to establish the theorem when dim(U ) = n 2.

Let w be an eigenvector of T associated with the eigenvalue and consider

V = (Span{w}) .

(8.1.2)

(w, T (v)) = (T (w), v) = (w, v) = (w, v) = 0,

(8.1.3)

Applying the inductive assumption on T L(V ) since dim(V ) = n 1, we

see that there is an orthonormal basis {u1 , . . . , un1 } of V and scalars bij C,

i = 1, . . . , j, j = 1, . . . , n 1 such that

T (uj ) =

j

bij ui ,

j = 1, . . . , n 1.

(8.1.4)

i=1

orthonormal basis of U with the stated properties.

Using the Schur decomposition theorem, Theorem 8.1, the Cayley

Hamilton theorem (over C) can be readily proved.

In fact, since T (u1 ) = b11 u1 , we have (T b11 I )(u1 ) = 0. For j 1 1,

we assume (T b11 I ) (T bj 1,j 1 I )(uk ) = 0 for k = 1, . . . , j 1.

Using the matrix representation of T with respect to {u1 , . . . , un }, we have

(T bjj I )(uj ) =

j 1

(8.1.5)

bij ui .

i=1

(T b11 I ) (T bjj I )(uj )

j 1

= (T b11 I ) (T bj 1,j 1 I )

bij ui = 0,

(8.1.6)

i=1

j = 1, . . . , n. In particular, since the characteristic polynomial of T takes the

form

pT () = det(I B) = ( b11 ) ( bnn ),

(8.1.7)

we have

pT (T )(ui ) = (T b11 I ) (T bnn I )(ui ) = 0,

i = 1, . . . , n. (8.1.8)

228

Selected topics

Consequently, pT (T ) = 0, as anticipated.

It is clear that, in Theorem 8.1, if the upper triangular matrix B C(n, n)

is diagonal, then the orthonormal basis B is made of the eigenvectors of T .

In this situation T is normal. Likewise, if B is diagonal and real, then T is

self-adjoint.

We remark that Theorem 8.1 may also be proved without resorting to the

adjoint mapping.

In fact, use the notation of Theorem 8.1 and proceed to the nontrivial situation dim(U ) = n 2 directly. Let 1 C be an eigenvalue of T and u1 an

associated unit eigenvector. Then we have the orthogonal decomposition

U = Span{u1 } V ,

V = (Span{u1 }) .

(8.1.9)

P T . Then S may be viewed as an element in L(V ).

Since dim(V ) = n 1, we may apply the inductive assumption to obtain an

orthonormal basis, say {u2 , . . . , un }, of V , such that

S(uj ) =

j

bij ui ,

j = 2, . . . , n,

(8.1.10)

i=2

for some bij s in C. Of course the vectors u1 , u2 , . . . , un now form an orthonormal basis of U . Moreover, in view of (8.1.10) and R(I P ) = Span{u1 },

we have

T (u1 ) = 1 u1 b11 u1 ,

T (uj ) = ((I P ) T )(uj ) + (P T )(uj )

= b1j u1 +

j

bij ui ,

b1j C,

(8.1.11)

j = 2, . . . , n,

i=2

Thus Theorem 8.1 is again established.

The matrix version of Theorem 8.1 may be stated as follows.

Theorem 8.2 For any matrix A C(n, n) there is a unitary matrix P

C(n, n) and an upper triangular matrix B C(n, n) such that

A = P BP .

(8.1.12)

matrix to an upper triangular matrix B whose diagonal entries are all the

eigenvalues of A.

229

The proof of Theorem 8.2 may be obtained by applying Theorem 8.1, where

we take U = Cn with the standard Hermitian scalar product and define T

L(Cn ) by setting T (u) = Au for u Cn . In fact, with B = {u1 , . . . , un } being

the orthonormal basis of Cn stated in Theorem 8.1, the unitary matrix P in

(8.1.12) is such that the ith column vector of P is simply ui , i = 1, . . . , n.

From (8.1.12) we see immediately that A is normal if and only if B is diagonal and that A is Hermitian if and only if B is diagonal and real.

Exercises

8.1.1 Show that the matrix B may be taken to be lower triangular in Theorems 8.1 and 8.2.

8.1.2 Let A = (aij ), B = (bij ) C(n, n) be stated in the relation (8.1.12).

Use the fact Tr(A A) = Tr(B B) to infer the identity

n

i,j =1

|aij |2 =

|bij |2 .

(8.1.13)

1ij n

that A is normal if and only if it satisfies the equation

Tr(A A) = |1 |2 + + |n |2 .

(8.1.14)

8.1.4 For A R(n, n) assume that the roots of the characteristic polynomial

of A are all real. Establish the real version of the Schur decomposition

theorem, which asserts that there is an orthogonal matrix P R(n, n)

and an upper triangular matrix B R(n, n) such that A = P t BP and

that the diagonal entries of B are all the eigenvalues of A. Can you prove

a linear mapping version of the theorem when U is a real vector space

with a positive definite scalar product?

8.1.5 Show that if A R(n, n) is normal and all the roots of its characteristic

polynomial are real then A must be symmetric.

8.1.6 Let U be a finite-dimensional complex vector space with a positive definite scalar product and S, T L(U ). If S and T are commutative, show

that U has an orthonormal basis, say B, such that, with respect to B, the

matrix representations of S and T are both upper triangular.

8.1.7 Let U be an n-dimensional (n 2) complex vector space with a positive

definite scalar product and T L(U ). If C is an eigenvalue of T u

U an associated eigenvector, then the quotient space V = U/Span{u} is

of dimension n 1.

(i) Define a positive definite scalar product over V and show that T

induces an element in L(V ).

230

Selected topics

in (i).

8.1.8 Given A C(n, n) (n 2), follow the steps below to carry out an

inductive but also constructive proof of Theorem 8.2.

(i) Find an eigenvalue, say 1 , of A, and a unit eigenvector u1 Cn ,

taken as a column vector. Let u2 , . . . , un Cn be chosen so that

u1 , u2 , . . . , un form an orthonormal basis of Cn . Use u1 , u2 , . . . , un

as the first, second,. . . , and the nth column vectors of a matrix called

Q1 . Then Q1 is unitary. Check that

1

,

(8.1.15)

Q1 AQ1 =

0 An1

where An1 C(n 1, n 1) and Cn1 is a row vector.

(ii) Apply the inductive assumption to get a unitary element Q C(n1,

n 1) so that Q An1 Q is upper triangular. Show that

1 0

(8.1.16)

Q2 =

0 Q

is a unitary element in C(n, n) such that

(Q1 Q2 ) A(Q1 Q2 ) = Q2 Q1 AQ1 Q2

(8.1.17)

8.1.9 Let T L(U ) where U is a finite-dimensional complex vector space

with a positive definite scalar product. Prove that if is an eigenvalue of

T then is an eigenvalue of T .

Let U be a finite-dimensional vector space over a field F. A bilinear form

f : U U F is called skewsymmetric or anti-symmetric if it satisfies

f (u, v) = f (v, u),

u, v U.

(8.2.1)

x = (x1 , . . . , xn )t , y = (y1 , . . . , yn )t Fn with respect to B, we can rewrite

f (u, v) as

n

n

n

f (u, v) = f

xi ui ,

yj uj =

xi f (ui , uj )yj = x t Ay, (8.2.2)

i=1

j =1

i,j =1

231

with respect to the basis B. Thus, combining (8.2.1) and (8.2.2), we see that A

must be skewsymmetric or anti-symmetric, A = At , because

x t Ay = f (u, v) = f (v, u) = y t Ax = (y t Ax)t = x t At y,

(8.2.3)

Let B = {u 1 , . . . , u n } be another basis of U and A = (a ij ) = (f (u i , u j ))

If the transition matrix

F(n, n) the matrix representation of f with respect to B.

u j =

n

j = 1, . . . , n,

bij ui ,

(8.2.4)

i=1

in Chapter 5.

In this section, we study the canonical forms of skewsymmetric forms

through a classification of skewsymmetric matrices by congruent relations.

Theorem 8.3 Let A F(n, n) be skewsymmetric. Then there is some nonsingular matrix C F(n, n) satisfying det(C) = 1 so that A is congruent

through C to a matrix F(n, n) of the following canonical form

0 0

1

.

..

..

..

..

..

.

.

.

.

.

.

.

.

..

..

..

.

.

k

(8.2.5)

= CAC t ,

..

0

0 .

.

.

.

..

..

..

.

.

.

. .

.

.

.

.

0 0

where

i =

ai

ai

i = 1, . . . , k,

(8.2.6)

scalars a1 , . . . , ak F.

Proof In the trivial case A = 0, there is nothing to show.

We now make induction on n.

232

Selected topics

= 0, then

0 a

A=

, a

= 0.

(8.2.7)

a 0

Hence there is nothing to show either.

Assume the statement of the theorem is valid for any n l for some l 2.

Let n = l + 1. Assume the nontrivial case A = (aij )

= 0. So there is some

aij

= 0 for i, j = 1, . . . , n. Let i be the smallest among {1, . . . , n} such that

aij

= 0 for some j = 1, . . . , n. If i = 1 and j = 2, then a12

= 0. If i = 1 and

j > 2, we let E1 be the elementary matrix obtained from interchanging the

second and j th rows of In . Then det(E1 ) = 1 and the entry at the first row

and second column of E1 AE1t is nonzero. If i > 1, then the first row of A is a

zero row. Let E1 be the elementary matrix obtained from interchanging the first

and ith rows of In . Then det(E1 ) = 1 and the first row of E1 AE1t = (bij )

is nonzero. So there is some 2 j n such that b1j

= 0. Let E2 be the

elementary matrix obtained from interchanging the second and j th rows of

E1 AE1t . Then, in E2 E1 AE1t E2t = (cij ), we have c12 a1

= 0. We also have

det(E2 ) = 1 depending on whether j = 2 or j

= 2.

Thus we may summarize that there is a matrix E with det(E) = 1

such that

1

0

a1

t

EAE =

, 1 =

,

(8.2.8)

a1 0

t An2

where F(2, n 2) and An2 F(n 2, n 2) is skewsymmetric.

Consider P F(n, n) of the form

0

I2

,

(8.2.9)

P =

t In2

where F(2, n 2) is to be determined. Then det(P ) = 1 and

I2

t

P =

.

0 In2

Consequently, we have

I2

t t

P EAE P =

t

=

In2

An2

1 +

1

t

I2

0

In2

1 t + t + An2

t

(8.2.10)

(8.2.11)

233

1 + = 0.

(8.2.12)

led to the unique choice

0 a11

1

1

= 1 , 1 =

.

(8.2.13)

a1

0

Thus, in view of (8.2.12), we see that (8.2.11) becomes

1 0

t t

,

P EAE P =

0 G

(8.2.14)

where

G = t 1 t + t + An2

(8.2.15)

Using the inductive assumption, we can find an element D F(n2, n2)

satisfying det(D) = 1 such that DGD t is of the desired canonical form

stated in the theorem.

Now let

I2 0

Q=

.

(8.2.16)

0 D

Then det(Q) = 1 and

QP EAE P Q =

t

DGD t

= ,

(8.2.17)

Taking C = QP E, we see that the theorem is established.

Furthermore, applying a sequence of elementary matrices to the left and

right of the canonical matrix given in (8.2.5) realizing suitable row and column interchanges, we see that we can use a nonsingular matrix of determinant

1 to congruently reduce into another canonical form,

0

Dk 0

(8.2.18)

= Dk 0 0 F(n, n),

0

234

Selected topics

Dk = diag{a1 , . . . , ak }.

(8.2.19)

skewsymmetric matrix is 2k, an even number.

We now consider the situation when F = R. By row and column operations

if necessary, we may assume without loss of generality that a1 , . . . , ak > 0 in

(8.2.6). With this assumption, define

0

1 0

1

.

0

..

..

..

ai

.

.

. , =

, i = 1, . . . , k.

R= .

i

1

0

0

k

0

ai

0 0 In2k

(8.2.20)

Then

J2

.

.

RR t =
.

0

..

.

0

..

.

J2

where

J2 =

0

..

. ,

(8.2.21)

(8.2.22)

Of course, as before, we may also further congruently reduce in (8.2.21)

into another canonical form,

J2k 0

=

,

(8.2.23)

0 0

where J2k R(2k, 2k) is given by

J2k =

Ik

Ik

(8.2.24)

U U F, define

S = {u U | f (u, v) = 0, v S}

(8.2.25)

235

Set

U0 = U = {u U | f (u, v) = 0, v U }.

(8.2.26)

skewsymmetric bilinear form f is characterized by the fact that when u U

and f (u, v) = 0 for all v U then u = 0. Thus f is called degenerate if

U0

= {0} or dim(U0 ) 1. It is not hard to show that f is degenerate if and

only if the rank of the matrix representation of f with respect to any basis of

U is smaller than dim(U ).

If f is degenerate over U , then the canonical form (8.2.5) indicates that we

may obtain a basis

{u1 , . . . , un0 , v1 , . . . , vk , w1 , . . . , wk }

(8.2.27)

f (vi , wi ) = ai

= 0, i = 1, . . . , k,

(8.2.28)

f (vi , vj ) = f (wi , wj ) = 0, i, j = 1, . . . , k,

f (vi , wj ) = 0, i

= j, i, j = 1, . . . , k.

Of particular interest is when f is non-degenerate over U . In such a situation, dim(U ) must be an even number, 2k.

Definition 8.4 Let U be a vector space of 2k dimensions. A skewsymmetric

bilinear form f : U U F is called symplectic if f is non-degenerate. An

even dimensional vector space U equipped with a symplectic form is called a

symplectic vector space. A basis {v1 , . . . , vk , w1 , . . . , wk } of a 2k-dimensional

symplectic vector space U equipped with the symplectic form f is called symplectic if

f (vi , wi ) = 1, i = 1, . . . , k,

(8.2.29)

f (vi , vj ) = f (wi , wj ) = 0, i, j = 1, . . . , k,

f (vi , wj ) = 0, i

= j, i, j = 1, . . . , k.

A symplectic basis is also called a Darboux basis.

Therefore, we have seen that a real symplectic vector space always has a

symplectic basis.

Definition 8.5 Let U be a symplectic vector space equipped with a symplectic

form f . A subspace V of U is called isotropic if f (u, v) = 0 for any u, v V .

236

Selected topics

Lagrangian subspace.

Let U be a symplectic vector space with a symplectic basis given as in

(8.2.29). Then we see that both

V = Span{v1 , . . . , vk },

W = Span{w1 , . . . , wk }

(8.2.30)

If U is a finite-dimensional complex vector space, we may consider a

skewsymmetric sesquilinear form f from U U into C. Such a form satisfies

f (u, v) = f (v, u),

u, v U,

(8.2.31)

g(u, v) = if (u, v),

u, v U,

(8.2.32)

is Hermitian. It is clear that the matrix representation, say A C(n, n), with

respect to any basis of U of a Hermitian skewsymmetric form is anti-Hermitian

or skew-Hermitian, A = A. Hence iA is Hermitian. Applying the knowledge about Hermitian forms and Hermitian matrices studied in Chapter 6, it is

not hard to come up with a complete understanding of skew-Hermitian forms

and matrices, in the same spirit of Theorem 8.3. We leave this as an exercise.

Exercises

8.2.1 Let f be a bilinear form over a vector space U . Show that f is skewsymmetric if and only if f (u, u) = 0 for any u U .

8.2.2 Let f be a skewsymmetric bilinear form over U and S U a non-empty

subset. Show that S defined in (8.2.25) is a subspace of U .

8.2.3 Let U be a finite-dimensional vector space and f a skewsymmetric bilinear form over U . Define U0 by (8.2.26) and use A to denote the matrix

representation of f with respect to any given basis of U . Show that

dim(U0 ) = dim(U ) r(A).

(8.2.33)

If V is a subspace of U , V is called the symplectic complement of V

in U . Prove the following.

(i) dim(V ) + dim(V ) = dim(U ).

(ii) (V ) = V .

8.2.5 Let U be a symplectic vector space equipped with a symplectic form f .

If V is a space of U , we can consider the restriction of f over V . Prove

that f is symplectic over V if and only if V V = {0}.

237

and only if V V .

8.2.7 Show that a subspace V of a symplectic vector space U is Lagrangian if

and only if V = V .

8.2.8 For A C(n, n), show that A is skew-Hermitian if and only if there is a

nonsingular element C C(n, n) such that CAC = iD where D is an

n n real diagonal matrix.

Let A = (aij ) R(n, n). We say that A is positive (non-negative) if aij > 0

(aij 0) for all i, j = 1, . . . , n. Likewise we say that a vector u = (ai )

Rn is positive (non-negative) if ai > 0 (ai 0) for all i = 1, . . . , n. More

generally, for A, B R(n, n), we write A > B (A B) if (A B) > 0

((A B) 0); for u, v Rn , we write u > v (u v) if (u v) > 0

((u v) 0).

The PerronFrobenius theorem concerns the existence and properties of a

positive eigenvector, associated to a positive eigenvalue, of a positive matrix

and may be stated as follows.

Theorem 8.6 Let A = (aij ) R(n, n) be a positive matrix. Then there is

a positive eigenvalue, r, of A, called the dominant eigenvalue, satisfying the

following properties.

(1) There is a positive eigenvector, say u, associated with r, such that any

other non-negative eigenvectors of A associated with r must be positive

multiples of u.

(2) r is a simple root of the characteristic polynomial of A.

(3) If C is any other eigenvalue of A, then

|| < r.

(8.3.1)

the dominant eigenvalue r.

Proof

x = max{|xi | | i = 1, . . . , n},

x1

.

n

x=

.. R ,

xn

(8.3.2)

238

Selected topics

S = {x Rn | x = 1, x 0}

(8.3.3)

(8.3.4)

of R . Then define

n

We can show that is an interval in R of the form [0, r] for some r > 0.

In fact, take a test vector, say y = (1, . . . , 1)t Rn . Since A is positive, it

is clear that Ay y if satisfies

n

aij i = 1, . . . , n .

0 < min

(8.3.5)

j =1

[0, ] . Thus is connected.

Let . Then there is some x S such that Ax x. Therefore, since

x = 1, we have

n

aij xj i = 1, . . . , n

= x = Ax = max

j =1

n

aij i = 1, . . . , n ,

(8.3.6)

max

j =1

Now set

r = sup { } .

(8.3.7)

Then we have seen that r satisfies 0 < r < . Hence there is a sequence

{k } such that

r = lim k .

k

(8.3.8)

sequence {x (k) } S such that

Ax (k) k x (k) ,

k = 1, 2, . . . .

(8.3.9)

which we still denote by {x (k) } without loss of generality such that it converges

to some element, say u = (a1 , . . . , an )t , in S, as k . Letting k in

(8.3.9), we arrive at

Au ru.

(8.3.10)

239

We next show that equality must hold in (8.3.10).

Suppose otherwise there is some i0 = 1, . . . , n such that

n

n

j =1

aij aj rai

(8.3.11)

j =1

= i0 , we conclude from

the second inequality in (8.3.11) that

n

s > 0.

(8.3.12)

j =1

n

(8.3.13)

j =1

which leads to

n

(8.3.14)

j =1

Thus Az > rz. Set v = z/z. Then v S and Av > rv. Hence we may

choose > 0 small such that Av > (r + )v. So r + , which contradicts

the definition of r made in (8.3.7).

Therefore Au = ru and r is a positive eigenvalue of A.

The positivity of u = (ai ) follows easily since u S and A > 0 so that

ru = Au leads to

ai =

n

aij aj > 0,

i = 1, . . . , n,

(8.3.15)

j =1

because u

= 0. That is, a non-negative eigenvector of A associated to a positive

eigenvalue must be positive.

Let v be any non-negative eigenvector of A associated to r. We show that

there is a positive number a such that v = au. For this purpose, we construct

the vector

us = u sv,

s > 0.

(8.3.16)

small. Set

s0 = sup{s | us 0}.

(8.3.17)

240

Selected topics

> 0. If us0

= 0 then us0 > 0 since us0 is an

eigenvector of A associated to r that contradicts the definition of s0 . Therefore

we must have us0 = 0, which gives us the result v = (s01 )u as desired.

In order to show that r is the simple root of the characteristic polynomial

of A, we need to prove that there is only one Jordan block associated with the

eigenvalue r and that this Jordan block can only be 1 1. To do so, we first

show that the geometric multiplicity of r is 1. Then we show that there is no

generalized eigenvector. That is, the equation

(A rIn )v = u,

v Cn ,

(8.3.18)

has no solution.

Suppose otherwise that the dimension of the eigenspace Er is greater than

one. Then there is some v Er that is not a scalar multiple of u. Write v as

v = v1 + iv2 with v1 , v2 Rn . Then Av1 = rv1 and Av2 = rv2 . We assert

that one of the sets of vectors {u, v1 } and {u, v2 } must be linearly independent

over R. Otherwise there are a1 , a2 R such that v1 = a1 u and v2 = a2 u which

imply v = (a1 + ia2 )u, a contradiction. Use w to denote either v1 or v2 which

is linearly independent from u over R. Then we know that w can never be

non-negative. Thus there are components of w that have different signs. Now

consider the vector

us = u + sw,

s > 0.

(8.3.19)

It is clear that us > 0 when s > 0 is sufficiently small since u > 0. So there

is some s0 > 0 such that us0 0 but a component of us0 is zero. However,

because us0

= 0 owing to the presence of a positive component in w, we arrive

at a contradiction because us0 is seen to be an eigenvector of A associated to r.

We next show that (8.3.18) has no solution. Since u Rn , we need only to

consider v Rn in (8.3.18). We proceed as follows.

Consider

ws = v + su,

s > 0.

(8.3.20)

Since Au = ru, we see that ws also satisfies (8.3.18). Take s > 0 sufficiently

large so that ws > 0. Hence (A rIn )ws = u > 0 or

Aws > rws .

(8.3.21)

if necessary, we may assume ws S. This indicates r + which violates

the definition of r stated in (8.3.7).

So the assertion that r is a simple root of the characteristic polynomial of A

follows.

241

an associated eigenvector. From Av = v we have

|||bi |

n

aij |bj |,

i = 1, . . . , n.

(8.3.22)

j =1

|b1 |

.

w=

.. S.

(8.3.23)

|bn |

Since (8.3.22) implies Aw rw, we see in view of the definition of that

|| . Using (8.3.7), we have || r.

If || < r, there is nothing more to do. If || = r, the discussion just made

in the earlier part of the proof shows that w is a non-negative eigenvector of

A associated to r and thus equality holds in (8.3.22) for all i = 1, . . . , n.

Therefore, since A > 0, the complex numbers b1 , . . . , bn must share the same

phase angle, R, so that

bi = |bi |ei ,

i = 1, . . . , n.

(8.3.24)

there is a number a > 0 such that

w = au.

(8.3.25)

Finally let v be an arbitrary eigenvector of A which is non-negative and

associated to an eigenvalue C. Since At > 0 and the characteristic polynomial of At is the same as that of A, we conclude that r is also the dominant

eigenvalue of At . Now use w to denote a positive eigenvector of At associated

to r. Then At w = rw. Thus

v t w = (Av)t w = v t At w = rv t w.

(8.3.26)

= 0, w > 0, we have v t w > 0. So

it follows from (8.3.26) that = r.

The proof of the theorem is complete.

The dominant eigenvalue r of the distinguished characteristics of a positive

matrix A stated in Theorem 8.6 is also called the Perron or PerronFrobenius

eigenvalue of the matrix A.

For an interesting historical account of the PerronFrobenius theory and a

discussion of its many applications and generalizations and attributions of the

proofs including the one presented here, see Bellman [6].

242

Selected topics

Exercises

8.3.1 Let A = (aij ) R(n, n) be a positive matrix and r > 0 the Perron

Frobenius eigenvalue of A. Show that r satisfies the estimate

n

n

min

aij r max

aij .

(8.3.27)

1in

1in

j =1

A= 2

3

j =1

1 .

(8.3.28)

(ii) Check to see that u = (1, 1, 1)t is a positive eigenvector of A. Use u

and Theorem 8.6 to find the dominant eigenvalue of A and confirm

that this is exactly what was obtained in part (i).

(iii) Compute all the eigenvalues of A directly and confirm the result

obtained in part (i) or (ii).

8.3.3 Let A, B R(n, n) be positive matrices and use rA , rB to denote

the dominant eigenvalues of A, B, respectively. Show that rA rB if

A B.

8.3.4 Let A R(n, n) be a positive matrix and u = (ai ) Rn a non-negative

eigenvector of A. Show that the dominant eigenvalue r of A may be

computed by the formula

n

1

(8.3.29)

r = 0n

aj k ak .

i=1 ai

j,k=1

In this section we discuss a type of nonnegative matrices known as the Markov

or stochastic matrices which are important in many areas of applications.

Definition 8.7 Let A = (aij ) R(n, n) so that aij 0 for i, j = 1, . . . , n. If

A satisfies

n

j =1

aij = 1,

i = 1, . . . , n,

(8.4.1)

243

that is, the components of each row vector in A sum up to 1, then A is called

a Markov or stochastic matrix. If there is an integer m 1 such that Am is a

positive matrix, then A is called a regular Markov or regular stochastic matrix.

A few immediate consequences follow directly from the definition of a

Markov matrix and are stated below.

Theorem 8.8 Let A = (aij ) R(n, n) be a Markov matrix. Then 1 is an

eigenvalue of A which enjoys the following properties.

(1) The vector u = (1, . . . , 1)t Rn is an eigenvector of A associated to the

eigenvalue 1.

(2) Any eigenvalue C of A satisfies

|| 1.

(8.4.2)

Proof Using (8.4.1), the fact that u = (1, . . . , 1)t satisfies Au = u may be

checked directly.

Now let C be any eigenvalue of A and v = (bi ) Cn an associated

eigenvector. Then there is some i0 = 1, . . . , n such that

|bi0 | = max{|bi | | i = 1, . . . , n}.

(8.4.3)

bi0 =

n

(8.4.4)

ai0 j bj ,

j =1

|||bi0 |

n

j =1

n

ai0 j = |bi0 |.

(8.4.5)

j =1

In fact, for a nonnegative element A R(n, n), it is clear that A being a

Markov matrix is equivalent to the vector u = (1, . . . , 1)t being an eigenvector of A associated to the eigenvalue 1. This simple fact establishes that the

product of any number of the Markov matrices in R(n, n) is also a Markov

matrix.

Let A R(n, n) be a Markov matrix. It will be interesting to identify a

certain condition under which the eigenvalue 1 of A becomes dominant as in

the PerronFrobenius theorem. Below is such a result.

244

Selected topics

1 of A is the dominant eigenvalue of A which satisfies the following properties.

(1) The absolute value of any other eigenvalue C of A is less than 1. That

is, || < 1.

(2) 1 is a simple root of the characteristic polynomial of A.

Proof Let m 1 be an integer such that Am > 0. Since Am is a Markov

matrix, we see that 1 is the dominant eigenvalue of Am . On the other hand, if

C is any eigenvalue of A other than 1, since m is an eigenvalue of Am

other than 1, we have in view of Theorem 8.6 that |m | < 1, which proves

|| < 1.

We now show that 1 is a simple root of the characteristic polynomial of A. If

m = 1, the conclusion follows from Theorem 8.6. So we may assume m 2.

Recall that there is an invertible matrix C C(n, n) such that

A = CBC 1 ,

(8.4.6)

B = diag{J1 , , Jk }

(8.4.7)

Jordan block of the form

J =

..

.

0

..

.

..

.

..

.

1

..

..

..

..

0

..

.

,

0

(8.4.8)

rewrite J as the sum of two matrices, in the form

J = I + P ,

(8.4.9)

have P l = 0. Hence, in view of the binomial expansion formula, we have

245

J m = (I + P )m

=

m

s=0

l1

s=0

m!

ms P s

s!(m s)!

m!

ms P s

s!(m s)!

= m I + m1 mP + + ml+1

m(m 1) (m [l 2]) l1

P ,

(l 1)!

(8.4.10)

which is an upper triangular matrix with the diagonal entries all equal to m .

From (8.4.6) and (8.4.7), we have

Am = Cdiag{J1m , . . . , Jkm }C 1 .

(8.4.11)

conclude that there exists exactly one Jordan block among the Jordan blocks

J1 , . . . , Jk of A with = 1 and that such a Jordan block can only be 1 1.

The proof of the theorem is thus complete.

Since in (8.4.10) there are l terms, we see that J m 0 as m when

|| < 1. This observation leads to the following important convergence theorem for the Markov matrices.

Theorem 8.10 If A R(n, n) is a regular Markov matrix, then

lim Am = K,

(8.4.12)

vnt = v t , where v = (b1 , . . . , bn )t Rn is the unique positive vector satisfying

At v = v,

n

bi = 1.

(8.4.13)

i=1

that Am > 0. Thus (At )m = (Am )t > 0. Since At and A have the same

characteristic polynomial, At has 1 as its dominant eigenvalue as A does. Let

v = (bi ) be an eigenvector of At associated to the eigenvalue 1. Then v is also

an eigenvector of (At )m associated to 1. Since (At )m > 0 and its eigenvalue 1

as the dominant eigenvalue of (At )m is simple, we see in view of Theorem 8.6

that we may choose v Rn so that either v > 0 or v < 0. We now choose

v > 0 and normalize it so that its components sum to 1. That is, v satisfies

(8.4.13).

246

Selected topics

A = Cdiag{1, J1 , . . . , Jk }C 1 .

(8.4.14)

vector of C is an eigenvector of A associated to the eigenvalue 1 which is

simple by Theorem 8.9. Hence we may choose this first column vector of C to

be u = a(1, . . . , 1)t for some a C, a

= 0.

On the other hand, rewrite D = C 1 and express (8.4.14) as

DA = diag{1, J1 , . . . , Jk }D.

(8.4.15)

We see that the first row vector of D, say w for some w C , satisfies

w t A = w t . Hence At w = w. Since 1 is a simple root of the characteristic

polynomial of At , we conclude that there is some b C, b

= 0, such that

w = bv where v satisfies (8.4.13).

Since D = C 1 , we have w t u = 1, which leads to

t

n

bi = ab = 1.

(8.4.16)

i=1

Am = Cdiag{1, J1m , . . . , Jkm }D.

(8.4.17)

Since each of the Jordan blocks J1 , . . . , Jk is of the form (8.4.9) for some

C satisfying || < 1, we have seen that

J1m , . . . , Jkm 0

as m .

(8.4.18)

lim Am = Cdiag{1, 0, . . . , 0}D.

(8.4.19)

Inserting the results that the first column vector of C is u = a(1, . . . , 1)t , the

first row vector of D is w t = bv t , where v satisfies (8.4.13), and ab = 1 into

(8.4.19), we obtain

b1 bn

.

..

..

(8.4.20)

lim Am =

. = K,

m

b1

bn

as asserted.

For a Markov matrix A R(n, n), the power of A, Am , may or may not

approach a limiting matrix as m . If the limit of Am as m exists

and is some K R(n, n), then it is not hard to show that K is also a Markov

matrix, which is called the stable matrix of A and A is said to be a stable

247

Markov matrix. Theorem 8.10 says that a regular Markov matrix A is stable

and, at the same time, gives us a constructive method to find the stable matrix

of A.

Exercises

8.4.1 Let A R(n, n) be a stable Markov matrix and K R(n, n) the stable

matrix of A. Show that K is also a Markov matrix and satisfies AK =

KA = K.

8.4.2 Show that the matrix

0 1

A=

(8.4.21)

1 0

is a Markov matrix which is not regular. Is A stable?

8.4.3 Consider the Markov matrix

1 1 0

1 1

1

.

A=

1

2

2

2

0

(8.4.22)

(ii) Find the stable matrix of A.

(iii) Use induction to establish the formula

1

3

m2

m1

m2

1)

2

1)

+

(2

+

(2

2

1

m

m2

m1

m2

,

A = m

2

2

2

1

3

m2

m1

m2

1) 2

1)

+ (2

+ (2

2

2

m = 1, 2, . . . .

(8.4.23)

Take m in (8.4.23) and verify your result obtained in (ii).

8.4.4 Let A R(n, n) be a Markov matrix. If At is also a Markov matrix, A

is said to be a doubly Markov matrix. Show that the stable matrix K of

a regular doubly Markov matrix is simply given by

1 1

1 .

.. ...

K=

(8.4.24)

.

n

1 1

8.4.5 Let A1 , . . . , Ak R(n, n) be k doubly Markov matrices. Show that their

product A = A1 Ak is also a doubly Markov matrix.

9

Excursion: Quantum mechanics in a nutshell

The content of this chapter may serve as yet another supplemental topic to meet

the needs and interests beyond those of a usual course curriculum. Here we

shall present an over-simplified, but hopefully totally transparent, description

of some of the fundamental ideas and concepts of quantum mechanics, using

a pure linear algebra formalism.

Consider the vector space Cn , consisting of column vectors, and use

{e1 , . . . , en } to denote the standard basis of Cn . For u, v Cn with

a1

n

.

=

.

u=

ai ei ,

.

an

i=1

b1

n

.

=

.

v=

bi ei ,

.

bn

(9.1.1)

i=1

(u, v) = u v =

n

a i bi ,

(9.1.2)

i=1

In quantum mechanics, it is customary to rewrite the scalar product (9.1.2)

in a bracket form, u|v. Then it was Dirac who suggested to view u|v as the

scalar pairing of a bra vector u| and a ket vector |v, representing the row

vector u and the column vector v. Thus we may use |e1 , . . . , |en to denote

the standard basis vectors of Cn and represent the vector u in Cn as

248

|u =

n

ai |ei .

249

(9.1.3)

i=1

u| = (|u) =

n

a i ei |.

(9.1.4)

i=1

en } becomes

ei |ej = ij ,

i, j = 1, . . . , n,

(9.1.5)

and the Hermitian scalar product of the vectors |u and |v assumes the form

u|v =

n

a i bi = v|u.

(9.1.6)

i=1

ai = ei |u,

i = 1, . . . , n.

(9.1.7)

|u =

n

|ei ai ,

(9.1.8)

i=1

n

n

|u =

|ei ei |u

|ei ei | |u,

i=1

(9.1.9)

i=1

n

i=1

n

|ei ei | = I,

(9.1.10)

i=1

Similarly, we have

n

n

n

u| =

ei |uei | =

u|ei ei | = u|

|ei ei | .

(9.1.11)

i=1

i=1

i=1

Thus (9.1.10) can be applied to both bra and ket vectors symmetrically and

what it expresses is simply the fact that |e1 , . . . , |en form an orthonormal

basis of Cn .

250

We may reexamine some familiar linear mappings under the new notation.

Let |u be a unit vector in Cn . Use P|u to denote the mapping that projects

n

C onto Span{|u} along (Span{|u}) . Then we have

P|u |v = u|v|u,

|v Cn .

(9.1.12)

Placing the scalar number u|v to the right-hand side of the above expression,

we see that the mapping P|u can be rewritten as

P|u = |uu|.

(9.1.13)

onto Span{|ui } along

Span{|u1 , . . . , |ui , . . . , |un }.

(9.1.14)

It is clear that

I=

n

(9.1.15)

P|ui ,

i=1

If T is an arbitrary Hermitian operator with eigenvalues 1 , . . . , n and the

associated orthonormal eigenvectors u1 , . . . , un (which form a basis of Cn ),

then we have the representation

T =

n

i |ui ui |,

(9.1.16)

i=1

either to the ket vector |v, from the left, or to the bra vector u|, from the right,

which will not cause any ambiguity.

To summarize, we may realize Cn by column vectors as ket vectors

a1

b1

.

.

n

(9.1.17)

|u =

.. , |v = .. C ,

an

bn

u| = (|u) = (a 1 , . . . , a n ),

(9.1.18)

Consequently,

u|v = (|u) |v =

n

i=1

a i bi ,

(9.1.19)

251

a1

.

|uv| = |u(|v) =

.. (b1 , . . . , bn )

an

a1 b1

an b1

a1 bn

= (ai bj ).

n

0

|ei ei | =

0

i=1

(9.1.20)

an bn

1

0

..

.

.

0

(9.1.21)

a self-adjoint mapping over Cn ), it is clear to explain A in u|A|v as to be

applied on the ket vector |v from the left or on the row vector u| from the

right, without ambiguity, since

u|A|v = u|(A|v) = (u|A)|v = (A|u) |v.

(9.1.22)

Exercises

9.1.1 Consider the orthonormal basis of C2 consisting of the ket vectors

1+i

1 1+i

1

|u1 =

, |u2 =

.

(9.1.23)

2 1i

2 1 + i

Verify the identity

|u1 u1 | + |u2 u2 | = I2 .

9.1.2 Consider the Hermitian matrix

A=

3+i

3i

(9.1.24)

(9.1.25)

in C(2, 2).

(i) Find an orthonormal basis of C2 consisting of eigenvectors, say

|u1 , |u2 , associated with the eigenvalues, say 1 , 2 , of A.

(ii) Find the matrices that represent the orthogonal projections

P|u1 = |u1 u1 |,

(9.1.26)

252

A = 1 P|u1 + 2 P|u2 = 1 |u1 u1 | + 2 |u2 u2 |.

(9.1.27)

In physics literature, quantum mechanics may be formulated in terms of two

objects referred to as states and observables. A state, also called a wave function, contains statistical information of a certain physical observable, which

often describes one of some measurable quantities such as energy, momenta,

and position coordinates of a mechanical system, such as those encountered

in describing the motion of a hypothetical particle. Mathematically, states are

vectors in a complex vector space with a positive definite Hermitian scalar

product, called the state space, and observables are Hermitian mappings over

the state space given. In this section, we present an over-simplified formalism

of quantum mechanics using the space Cn as the state space and Hermitian

matrices in C(n, n) as observables.

To proceed, we state a number of axioms, in our context, called the quantum

mechanical postulates, based on which quantum mechanics is built.

State postulate. The state of a mechanical system, hereby formally referred

to as a particle, is described by a unit ket vector | in Cn .

Observable postulate. A physically measurable quantity of the particle such

as energy, momenta, etc., called an observable, is represented by a Hermitian

matrix A C(n, n) so that its expected value for the particle in the state |,

denoted by A, is given by

A = |A|.

(9.2.1)

1 , . . . , n , which are known to be all real, and the corresponding eigenvectors of A be denoted by |u1 , . . . , |un , which form an orthonormal basis

of Cn . As a random variable, the measurement XA of the observable A,

when the particle lies in the state |, can result only in a reading among the

eigenvalues of A, and obeys the probability distribution

|i |2 ,

if is an eigenvalue,

i =

(9.2.2)

P ({XA = }) =

0,

if is not an eigenvalue,

253

{u1 , . . . , un }, that is,

| =

n

i |ui .

(9.2.3)

i=1

from the expected (or expectation) value formula

A =

n

i |i |2 ,

(9.2.4)

i=1

In the context of the measurement postulate, we have the following quantum

mechanical interpretation of an eigenstate.

Theorem 9.1 The particle lies in an eigenstate of the observable A if and

only if all measurements of A render the same value which must be some eigenvalue of A.

Proof Suppose that is an eigenvalue of A. Use E to denote the corresponding eigenspace. If the particle lies in a state | E , then i = ui | = 0

when i

= , where 1 , . . . , n are all the possible eigenvalues of A and

|u1 , . . . , |un are the corresponding eigenvectors that form an orthonormal

basis of Cn . Therefore

P ({XA = i }) = 0

when i

= ,

(9.2.5)

Conversely, if P ({XA = }) = 1 holds for some when the particle lies

in the state |, then according to the measurement postulate (9.2.2), we have

= i for some i = 1, . . . , n. Hence

|j |2 ,

(9.2.6)

0 = 1 P ({XA = i }) =

j

=i

= i. Thus | Ei as claimed.

We are now at a position to consider how a state evolves itself with respect

to time t.

Time evolution postulate. A state vector | follows time evolution according to the law

ih

d

| = H |,

dt

(9.2.7)

254

and h > 0 a universal constant called the Planck constant. Equation (9.2.7) is

the matrix version of the celebrated Schrdinger equation.

In classical mechanics, the Hamiltonian H of a system measures the total

energy, which may be written as the sum of the kinetic energy K and potential

energy V ,

H = K + V.

(9.2.8)

expressed in terms of the momentum P of the particle through the relation

K=

1 2

P .

2m

(9.2.9)

In quantum mechanics in our context here, both P and V are taken to be Hermitian matrices.

Let |(0) = |0 be the initial state of the time-dependent state vector

|(t). Solving (9.2.7), we obtain

i

U (t) = e h tH .

|(t) = U (t)|0 ,

(9.2.10)

i

Since H is Hermitian, H is anti-Hermitian. Hence U (t) is unitary,

h

U (t)U (t) = I,

(9.2.11)

which ensures the conservation of the normality of the state vector |(t).

That is,

(t)|(t) = 0 |0 = 1.

(9.2.12)

u1 , . . . , un be the associated eigenstates of H that form an orthonormal

basis of Cn . With the expansion

|0 =

n

0,i |ui ,

(9.2.13)

i=1

|(t) =

n

i=1

0,i e h tH |ui =

n

i=1

e h i t 0,i |ui =

n

i=1

(9.2.14)

255

where

i =

i

,

h

i = 1, . . . , n,

(9.2.15)

eii t 0,i |ui ,

i = 1, . . . , n.

(9.2.16)

eigenmodes with the associated angular frequencies determined by the eigenvalues of the Hamiltonian through (9.2.15).

As an observable, the Hamiltonian H measures the total energy. Thus the

eigenvalues 1 , . . . , n of H are the possible energy values of the system. For

this reason, we may use E to denote a generic energy value, among 1 , . . . , n .

Correspondingly, we use to denote a generic angular frequency, among

1 , . . . , n . Therefore we arrive at the generic relation

E = h,

(9.2.17)

towards an understanding of the photoelectric effect, which later became one of

the two basic equations in the waveparticle duality hypothesis of de Broglie

and laid the very foundation of quantum mechanics. Roughly speaking, the

formula (9.2.17) indicates that a particle of energy E behaves like a wave of

angular frequency and that a wave of angular frequency also behaves like

a particle of energy E.

Let A be an observable and assume that the system lies in the state

|(t) which is governed by the Schrdinger equation (9.2.7). We investigate

whether the expectation value A(t) = (t)|A|(t) is conserved or timeindependent. For this purpose, we compute

d

d

A(t) = (t)|A|(t)

dt

dt

d

d

(t)| A|(t) + (t)|A

|(t)

=

dt

dt

d

d

=

|(t)| A|(t) + (t)|A

|(t)

dt

dt

i

i

= H |(t) A|(t) + (t)|A H |(t)

h

h

1

2

i

= (t) [H, A](t) ,

(9.2.18)

h

256

where

[H, A] = H A AH

(9.2.19)

and A. Hence we see that A is time-independent if A commutes with H :

[H, A] = 0.

(9.2.20)

related through the momentum P and potential V through (9.2.8) and (9.2.9)

so that P commutes with V , then [H, P ] = 0 and the average momentum

P is also conserved. These are the quantum mechanical extensions of laws

of conservation for energy and momentum.

Exercises

9.2.1 Consider the Hermitian matrix

A = i

0

i

3

1i

1+i

(9.2.21)

as an observable of a system.

(i) Find the eigenvalues of A as possible readings when measuring the

observable A.

(ii) Find the corresponding unit eigenvectors of A, say |u1 , |u2 , |u3 ,

which form an orthonormal basis of the state space C3 .

(iii) Assume that the state the system occupies is given by the vector

i

1

(9.2.22)

| = 2 + i

15

3

and resolve | in terms of |u1 , |u2 , |u3 .

(iv) If the system stays in the state |, determine the probability distribution function of the random variable XA , which is the random

value read each time when a measurement about A is made.

(v) If the system stays in the state |, evaluate the expected value, A,

of XA .

9.2.2 Consider the Schrdinger equation

d

ih | = H |,

dt

H =

(9.2.23)

257

the solution |(t) of (9.2.23) satisfying the initial condition

1

1

|(0) = |0 =

.

(9.2.24)

2 1

(ii) Consider a perturbation of the Hamiltonian H given as

0 1

, R,

H = H + 1 , 1 =

1 0

(9.2.25)

where 1 is known as one of the Pauli matrices. Show that the commutator of H and H is

1 0

[H, H ] = 2i3 , 3 =

,

(9.2.26)

0 1

where 3 is another Pauli matrix, and use it and (9.2.18) to evaluate

the rate of change of the time-dependent expected value H (t) =

(t)|H |(t).

(iii) Establish the formula

(t)|H |(t) = 0 |H |0 + (t)|1 |(t)

(9.2.27)

d

and use it to verify the result regarding H (t) obtained in (ii)

dt

through the commutator identity (9.2.18).

Let {|ui } be an orthonormal set of eigenstates of a Hermitian matrix A

C(n, n) with the associated real eigenvalues {i } and use XA to denote the

random variable of measurement of A. If the system lies in the state |

with i = ui | (i = 1, 2, ), the distribution function of XA is as

given in (9.2.2). Thus the variance of XA can be calculated according to the

formula

A2 =

n

(i A)2 |i |2

i=1

= |(A AI )|(A AI )|,

(9.3.1)

258

which is given in a form free of the choice of the basis {|ui }. Thus,

with (9.3.1), if A and B are two observables, then the Schwarz inequality

implies that

A2 B2 ||(A AI )|(B BI )||2 |c|2 ,

(9.3.2)

c = |(A AI )|(B BI )|

= |(A AI )(B BI )|

= |AB| B|A| A|B| + AB|

= AB AB.

(9.3.3)

c = |(B BI )|(A AI )|

= BA AB.

(9.3.4)

Therefore, we obtain

1

1

(c c) = [A, B].

2i

2i

Inserting (9.3.5) into (9.3.2), we arrive at the inequality

2

1

[A, B] ,

A2 B2

2i

(c) =

(9.3.5)

(9.3.6)

achieve simultaneous high-precision measurements for them. To put the statement in another way, if we know one observable with high precision, we do

not know the other observable at the same time with high precision. This fact,

in particular, the inequality (9.3.6), in quantum mechanics, is known as the

Heisenberg uncertainty principle.

On the other hand, when A and B commute, we know that A and B share

the same eigenstates which may form an orthonormal basis of Cn . Let be a

commonly shared eigenstate so that

A| = A |,

B| = B |,

A , B R.

(9.3.7)

If the system lies in the state |, then, with simultaneous full certainty, the

measured values of the observables A and B are A and B , respectively.

Let k 1 be an integer and define the kth moment of an observable A in the

state | by

Ak = |Ak |.

(9.3.8)

259

state | may be computed using the formula

A2 = A2 A2 .

(9.3.9)

In probability theory, the radical root of the variance, A = A2 , is called

standard deviation. In quantum mechanics, A is also called uncertainty,

which measures the randomness of the observed values of the observable A.

It will be instructive to identify those states which will render the maximum

uncertainty. To simplify our discussion, we shall assume that A C(n, n) has

n distinct eigenvalues 1 , . . . , n . As before, we use |u1 , . . . , |un to denote

the corresponding eigenstates of A which form an orthonormal basis of the

state space Cn . In order to emphasize the dependence of the uncertainty on the

2

to denote the associated variance. We are

underlying state |, we use A,|

to solve the problem

2

max{A,|

| | = 1}.

(9.3.10)

| =

n

i |ui .

(9.3.11)

i=1

A2 =

n

2i |i |2 ,

A =

i=1

n

i |i |2 .

(9.3.12)

i=1

Hence, we have

2

A,|

n

i=1

2i |i |2

n

2

i |i |

(9.3.13)

i=1

instead the constrained maximization problem

n

2

n

max

2i xi2

i xi2

,

i=1

i=1

(9.3.14)

n

xi = 1.

i=1

260

Thus, using calculus, the maximum points are to be sought among the solutions

of the equations

n

j xj2 = 0, i = 1, . . . , n,

xi 2i 2i

(9.3.15)

j =1

summing over i = 1, . . . , n, we find

= A2 2A2 .

Consequently (9.3.15) takes the form

$

#

xi 2i 2Ai [A2 2A2 ] = 0,

(9.3.16)

i = 1, . . . , n.

(9.3.17)

2 2A (A2 2A2 ) = 0

has two real roots

= A

(

A2 A2 = A A

(9.3.18)

(9.3.19)

in the nontrivial situation A > 0, we see that there are at least n 2 values of

i = 1, . . . , n such that

2i 2Ai (A2 2A2 )

= 0,

(9.3.20)

of i. For definiteness, we assume xi = 0 when i

= 1, 2. Hence (9.3.14) is

reduced into

#

$2

max 2 x 2 + 2 x 2 1 x 2 + 2 x 2

,

1 1

2 2

1

2

(9.3.21)

x 2 + x 2 = 1.

1

2

Using the constraint in (9.3.21), we may simplify the objective function of the

problem (9.3.21) into the form

(1 2 )2 (x12 x14 ),

(9.3.22)

1

|x1 | = |x2 | = .

2

(9.3.23)

261

In this case, it is straightforward to check that 1 and 2 are indeed the two

roots of equation (9.3.19). In particular,

2

=

A,|

1

(1 2 )2 .

4

(9.3.24)

Consequently, if we use min and max to denote the smallest and largest eigenvalues of A, then we see in view of (9.3.24) that the maximum uncertainty is

given by

A,max =

1

(max min ),

2

(9.3.25)

which is achieved when the system occupies the maximum uncertainty state

|max = a|umax + b|umin ,

a, b C,

1

|a| = |b| = .

2

(9.3.26)

A,|max = A,max .

(9.3.27)

Exercises

9.3.1 Let A, B C(2, 2) be two observables given by

1 i

2 1 i

A=

, B=

.

i 2

1+i

3

Evaluate the quantities A2 , B2 , and [A, B] in the state

1

1

,

| =

2

5

(9.3.28)

(9.3.29)

9.3.2 Consider an observable A C(n, n), which has n distinct eigenvalues 1 , . . . , n . Let |u1 , . . . , |un be the corresponding eigenstates of

A which form an orthonormal basis of Cn . Consider the uniform state

given by

1

|ui .

| =

n

n

(9.3.30)

i=1

(ii) Compare your result with the maximum uncertainty given in the

expression (9.3.25).

262

1

2+i 0

A = 2 i 3 0 .

0

(9.3.31)

(ii) Find all maximum uncertainty states of A.

(iii) Let | be the uniform state defined in Exercise 9.3.2. Find the

uncertainty of A in the state | and compare it with the maximum

uncertainty of A found in (i).

So far our description of quantum mechanics has been based on the

Schrdinger equation, which governs the evolution of a state vector. Such a

description of quantum mechanics is also called a Schrdinger picture. Here

we study another important description of quantum mechanics called the

Heisenberg picture within which the state vector is time-independent but

observables evolve with respect to time following a dynamical equation similar

to that seen in classical mechanics.

We start from the Schrdinger equation (9.2.7) defined by the Hamiltonian H . Let |(t) be a state vector and |(0) = |0 . Then

i

|(t) = e h tH |0 .

(9.4.1)

Thus, for any observable A, its expected value in the state |(t) is given by

i

i

A(t) = (t)|A|(t) = 0 |e h tH Ae h tH |0 .

(9.4.2)

with a correctly formulated time-dependent version, A(t), we are to have the

same mechanical conclusion. In particular, the expected value of A at time t in

the state |(t) must be equal to the expected value of A(t) in the state |0 .

That is,

(t)|A|(t) = 0 |A(t)|0 .

(9.4.3)

i

i

0 |e h tH Ae h tH A(t)|0 = 0.

Therefore, using the arbitrariness of |0 , we arrive at the relation

(9.4.4)

i

i

A(t) = e h tH Ae h tH ,

263

(9.4.5)

which indicates how an observable should evolve itself with respect to time.

Differentiating (9.4.5), we are led to the equation

i

d

A(t) = [H, A(t)],

(9.4.6)

dt

h

which spells out the dynamical law that a time-dependent observable must

follow and is known as the Heisenberg equation.

We next show that (9.4.6) implies (9.2.7) as well.

As preparation, we establish the following Gronwall inequality which is

useful in the study of differential equations: If f (t) and g(t) are continuous

non-negative functions in t 0 and satisfy

t

f ( )g( ) d, t 0,

(9.4.7)

f (t) a +

0

f (t) a exp

g( ) d ,

t 0.

(9.4.8)

t

f (t) < a + +

f ( )g( ) d,

t 0,

(9.4.9)

t 0,

(9.4.10)

and set

h(t) = a + +

f ( )g( ) d,

0

f (t)g(t) for t > 0 and h(0) = a + . Multiplying (9.4.9) by g(t), we have

h (t)

g(t),

h(t)

Integrating (9.4.11), we obtain

t 0.

(9.4.11)

g( ) d ,

t 0.

(9.4.12)

However, (9.4.10) and (9.4.11) indicate that f (t) < h(t) (t 0). Hence we

may use (9.4.12) to get

t

f (t) < (a + ) exp

g( ) d , t 0.

(9.4.13)

0

264

We now turn our attention back to the Heisenberg equation (9.4.6). Suppose

that B(t) is another solution of the equation. Then C(t) = A(t)B(t) satisfies

d

i

C(t) = [H, C(t)],

dt

h

Therefore, we have

2

H C(t).

h

On the other hand, we may use the triangle inequality to get

C (t)

h > 0,

(9.4.14)

(9.4.15)

(9.4.16)

d

(9.4.17)

C(t) C (t).

dt

Inserting (9.4.17) into (9.4.15) and integrating, we find

2 t

C(t) C(0) +

H C( ) d, t 0.

(9.4.18)

h 0

Consequently, it follows from applying the Gronwall inequality that

2

C(t) C(0)e h H t ,

t 0.

(9.4.19)

The same argument may be carried out in the domain t 0 with the time

flipping t # t. Thus, in summary, we obtain the collective conclusion

2

t R.

(9.4.20)

the initial value problem of the Heisenberg equation (9.4.6) is unique. Hence,

if A(0) = A, then the solution is uniquely given by (9.4.5). As a consequence,

if A commutes with H , A(t) = A for all time t. In other words, if an observable commutes with the Hamiltonian initially, it stays commutative with the

Hamiltonian and remains in fact constant for all time.

We can now derive the Schrdinger equation (9.2.7) from the Heisenberg

equation (9.4.6).

In fact, let A(t) be the unique solution of (9.4.6) evolving from its initial

state A. Then A(t) is given by (9.4.5). Let |(t) denote the state vector that

evolves with respect to t from its initial state vector |0 so that it gives rise

to the same expected value as that evaluated using the Heisenberg equation

through A(t). Then (9.4.3) holds. We hope to examine, to what extent, the

relation (9.4.1) must be valid. To this end, we assume that

u|A|u = v|A|v

(9.4.21)

265

holds for any Hermitian matrix A C(n, n) for some |u, |v Cn and we

investigate how |u and |v are related.

If |v = 0, then u|A|u = 0 for any Hermitian matrix A, which implies

|u = 0 as well. Thus, in the following, we only consider the nontrivial situation |u

= 0, |v

= 0.

If v

= 0, we set V = Span{|v} and W = V . Choose a Hermitian matrix

A so that |x # A|x (|x Cn ) defines the projection of Cn onto V along

W . Write |u = a|v + |w, where a C and |w W . Then A|u = a|v.

Thus

u|A|u = |a|2 v|v = |a|2 v|A|v,

(9.4.22)

which leads to

|a| = 1

or

a = ei ,

R.

(9.4.23)

defines the projection of Cn onto W along V . Then A|v = 0 and A|w = |w.

Inserting these into (9.4.21), we find

0 = v|A|v = u|A|u = w|w,

(9.4.24)

which gives us the result w = 0. In other words, that (9.4.21) holds for any

Hermitian matrix A C(n, n) implies that |u and |v differ from each other

by a phase factor a satisfying (9.4.23). That is,

|u = ei |v,

R.

(9.4.25)

(t)|A|(t) = (t)|A|(t),

(9.4.26)

where

i

|(t) = e h tH |0 .

(9.4.27)

i

|(t) = ei(t) e h tH |0 ,

(9.4.28)

(9.4.26) and may well taken to be zero. In other words, we are prompted to

conclude that the state vector should follow the law of evolution given simply

by (9.4.1) or by (9.4.28) with setting (t) 0, which is the unique solution of

the Schrdinger equation (9.2.7) subject to the initial condition |(0) = |0 .

Thus the Schrdinger equation inevitably comes into being as anticipated.

266

Exercises

9.4.1 Consider the Heisenberg equation (9.4.6) subject to the initial condition

A(0) = A0 . An integration of (9.4.6) gives

i t

A(t) = A0 +

[H, A( )] d.

(9.4.29)

h 0

(i) From (9.4.29) derive the result

[H, A(t)] = [H, A0 ] +

i

h

(9.4.30)

(ii) Use (9.4.30) and the Gronwall inequality to come up with an alternative proof that H and A(t) commute for all time if and only if

they do so initially.

9.4.2 Consider the Hamiltonian H and an observable A given by

2 i

1

1i

H =

, A=

.

i 2

1+i

1

(9.4.31)

(i) Solve the Schrdinger equation (9.2.7) to obtain the timedependent state |(t) evolving from the initial state

1

1

.

(9.4.32)

|0 =

i

2

(ii) Evaluate the expected value of the observable A assuming that the

system lies in the state |(t).

(iii) Solve the Heisenberg equation (9.4.6) with the initial condition

A(0) = A and use it to evaluate the expected value of the same

observable within the Heisenberg picture. That is, compute the

quantity 0 |A(t)|0 . Compare the result with that obtained in (ii)

and explain.

Section 1.1

1.1.2 Let n satisfy 1 n < p and write 2n = kp + l for some integers k and

l where 0 l < p. Then n + (n l) = kp. Thus [n l] = [n].

If [n]

= [0], then n is not a multiple of p. Since p is a prime, so the

greatest common divisor of n and p is 1. Thus there are integers k, l such

that

kp + ln = 1.

(S1)

[l] = [n]1 .

We may also prove the existence of [n]1 without using (S1) but by

using an interesting statement called Fermats Little Theorem: For any

nonnegative integer m and positive integer p, the integer mp m is

divisible by p. We prove this theorem by induction. When m = 0, there

is nothing to show. Assume the statement is true at m 0. At m + 1 we

have

p1

p

p

(S2)

mk + (mp m),

(m + 1) (m + 1) =

k

k=1

which is clearly divisible by p in view of the inductive assumption and

the definition of binomial coefficients. So the theorem follows.

Now we come back to the construction of [n]1 . Since p is a prime

and divides np n = n(np1 1) but not n, p divides np1 1. So

there is an integer k such that np1 1 = kp. Therefore nnp2 = 1

modulo p. That is, [np2 ] = [n]1 .

1.1.7 Multiplying the relation AB = I by C from the left we have C(AB) =

C which gives us B = I B = (CA)B = C(AB) = C.

267

268

Section 1.2

1.2.7 We have

a1

.

ut v =

.. (b1 , . . . , bn ) =

an

a1 b1

a2 b1

an b1

a1 b2

a2 b2

an b2

a1 bn

a2 bn

,

an bn

ai (b1 , b2 , . . . , bn ) and

aj (b1 , b2 , . . . , bn ),

For column vectors we proceed similarly.

1.2.8 If either U1 U2 or U2 U1 , the statement is trivially true. Suppose

otherwise and pick u1 U1 but u1

U2 and u2 U2 but u2

U1 . We

assert that u = u1 +u2

U1 U2 . If u U1 U2 then u U1 or u U2 .

So, respectively, u2 = u + (u1 ) U1 or u1 = u + (u2 ) U2 , which

is false.

1.2.9 Without loss of generality we may assume that there are no such i, j ,

i, j = 1, . . . , k, i

= j , that Ui Uj . Thus, in view of the previous

exercise, we know that for k = 2 we can find u1 U1 and u2 U2

such that u = u1 + u2

U1 U2 . Assume that the statement of the

problem is true at k = m 2. We proceed to prove the statement at

k = m + 1.

Assume otherwise that U = U1 Um+1 . By the inductive assumption there is some u U such that

u

U1 Um .

(S3)

Um+1 and consider the following m + 1

vectors

w1 = u + v,

w2 = 2u + v,

...,

wm+1 = (m + 1)u + v.

U1 Um .

Otherwise, if wi U1 Um for all i = 1, . . . , m + 1, then there are

i, j = 1, . . . , m + 1, i

= j , such that wi , wj lie in one of the subspaces

U1 , . . . , Um , say Ul (1 l m), which leads to wi wj = (i j )u

Ul or u Ul , a contradiction to (S3).

Now assume wi

U1 Um for some i = 1, . . . , m + 1. Thus

wi Um+1 . So v = wi + (iu) Um+1 since u Um+1 , which is

again false.

269

Section 1.4

1.4.3 Without loss of generality assume f

= 0. Then there is some u U

such that f (u)

= 0. For any v U consider

w=v

f (v)

u.

f (u)

g(v) =

g(u)

f (v) af (v),

f (u)

v U.

That is, g = af .

1.4.4 Let {v1 , . . . , vn1 } be a basis of V . Extend it to get a basis of Fn , say

{v1 , . . . , vn1 , vn }. Let {f1 , . . . , fn1 , fn } be a basis of (Fn ) dual to

{v1 , . . . , vn1 , vn }. It is clear that V 0 = Span{fn }. On the other hand,

for any (x1 , . . . , xn ) Fn we have

v = (x1 , x2 , . . . , xn ) (x1 + x2 + + xn , 0, . . . , 0) V .

So

0 = fn (v) = fn (x1 , . . . , xn ) (x1 + + xn )fn (e1 ),

(x1 , . . . , xn ) Fn .

(S4)

of (S4) we obtain

0

n

xi ,

(x1 , . . . , xn ) Fn .

i=1

Section 1.7

1.7.2 Assume the nontrivial situation u

= 0. It is clear that up u for

any p 1. Thus lim sup up u .

p

(0, u ) we can find an interval around t0 , say I [a, b], such that

|u(t)| > u when t I . Thus

1

p

1

up

|u(t)|p dt

(u ) |I | p ,

(S5)

I

we find lim inf up u . Since > 0 may be chosen arbitrarily

p

p

270

1.7.3

(i) Positivity: Of course u 0 for any u U . If u = 0,

then |u (u)| = 0 for any u U satisfying u = 1. Thus, for any

1

v) = 0, which shows

v U, v

= 0, we have u (v) = vu (

v

u (v) = 0 for any v U . So u = 0.

(ii) Homogeneity: For any a F and u U , we have

au = sup{|au (u)| | u U, u = 1}

= |a| sup{|u (u)| | u U, u = 1} = |a|u .

(iii) Triangle inequality: For any u , v U , we have

u + v = sup{|u (u) + v (u)| | u U, u = 1}

sup{|u (u)| + |v (u)| | u U, u = 1}

sup{|u (u)| | u U, u = 1}

+ sup{|v (v)| | v U, v = 1}

= u + v .

1.7.4

(i) Positivity: Of course u 0. If u = 0, then |u (u)| = 0 for all

u =

1. Then for any v U , v

= 0, we

u U satisfying

1

have v (u) = v

v (u) = 0, which shows v (u) = 0 for

v

any v U . Hence u = 0.

(ii) Homogeneity: Let a F and u U . We have

au = sup{|u (au)| | u U , u = 1}

= |a| sup{|u (u)| | u U , u = 1} = |a|u.

(iii) Triangle inequality: For u, v U , we have

u + v = sup{|u (u + v)| | u U , u = 1}

sup{|u (u)| + |u (v)| | u U , u = 1}

sup{|u (u)| | u U , u = 1}

+ sup{|v (v)| | v U , v = 1}

= u + v.

Section 2.1

2.1.7 (i) Since f, g

= 0 we have dim(f 0 ) = dim(g 0 ) = n 1. Let

{u1 , . . . , un1 } be a basis of f 0 and take v g 0 but v

f 0 .

Then u1 , . . . , un1 , v are linearly independent, and hence, form a

basis of U . In particular, U = f 0 + g 0 .

271

n = dim(f 0 ) + dim(g 0 ) dim(f 0 g 0 )

the answer follows.

2.1.8 We have N (T ) N (T 2 ) and R(T 2 ) R(T ). On the other hand, if

n = dim(U ), the rank equation gives us

n(T ) + r(T ) = n = n(T 2 ) + r(T 2 ).

Thus n(T ) = n(T 2 ) if and only if r(T ) = r(T 2 ). So N (T 2 ) = N (T )

if and only if R(T 2 ) = R(T ).

2.1.10 Let w1 , . . . , wk W be a basis of R(S T ). Then there are

u1 , . . . , uk U such that S(T (ui )) = wi , i = 1, . . . , k.

Let vi = T (ui ), i = 1, . . . , k. Then Span{v1 , . . . , vk } R(T ).

Hence k r(T ). Of course, R(S T ) R(S). So k r(S) as well.

Next, let u1 , . . . , uk U form a basis of N (T ) N (S T ). Let

z1 , . . . , zl N (S T ) so that u1 , . . . , uk , z1 , . . . , zl form a basis of

N(S T ). Of course T (z1 ), . . . , T (zl ) N (S). We assert that these

are linearly independent. In fact, if there are scalars a1 , . . . , al such that

a1 T (z1 ) + + al T (zl ) = 0, then T (a1 z1 + + al zl ) = 0, which

indicates that a1 z1 + + al zl N (T ). Hence a1 z1 + + al zl =

b1 u1 + + bk uk for some scalars b1 , . . . , bk . However, we know that

u1 , . . . , uk , z1 , . . . , zl are linearly independent. So as and bs are all

zero. In particular, T (z1 ), . . . , T (zl ) are linearly independent. Hence

l n(S), which proves k + l n(T ) + n(S).

2.1.12 Define T L(Fm , Fn ) by T (x) = Bx, x Fm . Then, since R(T )

Rn , we have r(T ) n. By the rank equation n(T ) + r(T ) = m and the

condition m > n, we see that n(T ) > 0. Hence there is a nonzero vector

x Fm such that T (x) = 0 or Bx = 0. Thus (AB)x = A(Bx) = 0

which proves that the m m matrix AB cannot be invertible.

2.1.14 Use N and R to denote the null-space and range of a mapping. Let

N (R) R(S T ) = Span{w1 , ..., wk } W,

where w1 , ..., wk are independent vectors.

Expand {w1 , ..., wk } to get a basis for R(S T ) so that

R(S T ) = Span{w1 , ..., wk , y1 , ..., yl }.

Then {R(y1 ), ..., R(yl )} is a basis for R(R S T ) since

R(w1 ) = ... = R(wk ) = 0

272

and R(y1 ), ..., R(yl ) are independent. In particular, r(R S T ) = l.

Since R(S T ) R(S), we can expand {w1 , ..., wk , y1 , ..., yl } to

get a basis for R(S) so that

R(S) = Span{w1 , ..., wk , y1 , ..., yl , z1 , ..., zm }.

Now we can count the numbers as follows:

r(R S) = dim (Span{R(w1 ), ..., R(wk ),

R(y1 ), ..., R(yl ), R(z1 ), ..., R(zm )})

= dim(Span{R(y1 ), ...,R(yl ), R(z1 ), ..., R(zm )})

l + m,

r(S T ) = k + l.

So

r(R S) + r(S T ) l + m + k + l

= (k + l + m) + l

= r(S) + r(R S T ).

T (y1 , . . . , yk , z1 , . . . , zl ) =

k

yi vi +

l

zj w j ,

j =1

i=1

(y1 , . . . , yk , z1 , . . . , zl ) Fk+l .

Then N(T ) = S. From the rank equation we have n(T ) + r(T ) = k + l

or dim(S) + r(T ) = k + l. On the other hand, it is clear that R(T ) =

V + W . So r(T ) = dim(V + W ). From the dimensionality equation

(1.5.6) we have dim(V + W ) = dim(V ) + dim(W ) dim(V W ), or

r(T ) + dim(V W ) = k + l. Therefore dim(S) = dim(V W ).

Section 2.2

2.2.4 Define T L(R2 ) by setting

a b

T (x) =

x,

c d

x=

Then we have

T (e1 ) = ae1 + ce2 ,

T (e2 ) = be1 + de2 ,

x1

x2

R2 .

273

and

1

1

(a + b + c + d)u1 + (a + b c d)u2 ,

2

2

1

1

T (u2 ) = (a b + c d)u1 + (a b c + d)u2 ,

2

2

so that the problem follows.

2.2.5 Let A = (aij ) and define T L(Fn ) by setting T (x) = Ax where

x Fn is taken to be a column vector. Then

n

aij ei , j = 1, . . . , n.

T (ej ) =

T (u1 ) =

i=1

n

f1 = en ,

f2 = en1 ,

...,

fn = e1 ,

T (fj ) =

n

ani+1,nj +1 fi

i=1

n

bij fi ,

j = 1, . . . , n.

i=1

under the bases {e1 , . . . , en } and {f1 , . . . , fn }, respectively, we conclude

that A B.

Section 2.3

2.3.3 That the equation T (u) = v has a solution for some u U is equivalent

to v R(T ), which is equivalent to v N (T )0 (by Theorem 2.8),

which is equivalent to v, v = 0 for all v N (T ) or T (v ) = 0.

Section 2.4

2.4.3 Let k = r(T ) = dim(R(T )). Let {[v1 ]Y , . . . , [vk ]Y } be a basis of R(T )

in V /Y . Then there are [u1 ]X , . . . , [uk ]X U/X such that

T ([u1 ]X ) = [v1 ]Y ,

...,

T ([uk ]) = [vk ]Y .

[T (ui )]Y for i = 1, . . . , k. We claim that T (u1 ), . . . , T (uk ) are linearly

independent. In fact, let a1 , . . . , ak be scalars such that

a1 T (u1 ) + + ak T (uk ) = 0.

Taking cosets, we get a1 [T (u1 )]Y + + ak [T (uk )]Y = [0]Y , which

leads to a1 [v1 ]Y + + ak [vk ]Y = [0]Y . Thus a1 = = ak = 0. This

proves r(T ) k.

274

Section 2.5

2.5.12 (i) Assume R(S) = R(T ). Since S projects U onto R(S) = R(T )

along N (S), we have for any u U the result S(T (u)) = T (u).

Thus S T = T . Similarly, T S = S.

Furthermore, from S T = T , then T (U ) = (S T )(U ) implies

R(T ) R(S). Likewise, T S = S implies R(S) R(T ). So

R(S) = R(T ).

(ii) Assume N (S) = N (T ). For any u U , rewrite u as u = v + w

with v R(T ) and w N (T ). Then T (v) = v, T (w) = 0, and

S(w) = 0 give us

(S T )(u) = S(T (v) + T (w)) = S(v) = S(v + w) = S(u).

Hence S T = S. Similarly, T S = T .

Assume S T = S, T S = T . Let u N (T ). Then S(u) =

S(T (u)) = 0. So u N (S). So N (T ) N (S). Interchange S

and T . We get N (S) N (T ).

2.5.14 (i) We have

T 2 (u) = T (u, u1 u1 + u, u2 u2 )

= u, u1 T (u1 ) + u, u2 T (u2 )

= u, u1 (u1 , u1 u1 + u1 , u2 u2 )

+ u, u2 (u2 , u1 u1 + u2 , u2 u2 )

= (u, u1 u1 , u1 + u, u2 u2 , u1 )u1

+ (u, u1 u1 , u2 + u, u2 u2 , u2 )u2

= T (u) = u, u1 u1 + u, u2 u2 .

(S6)

u, u1 u1 , u1 + u, u2 u2 , u1 = u, u1 ,

u, u1 u1 , u2 + u, u2 u2 , u2 = u, u2 .

Namely,

u, u1 , u1 u1 + u2 , u1 u2 = u, u1 ,

u, u1 , u2 u1 + u2 , u2 u2 = u, u2 .

Since u U is arbitrary, we have

u1 , u1 u1 + u2 , u1 u2 = u1 ,

275

u1 , u1 = 1,

u2 , u1 = 0,

u1 , u2 = 0,

u2 , u2 = 1.

(S7)

T 2 (u) = u, u1 u1 + u, u2 u2 = T (u),

u U.

that T is a projection.

(ii) Assume T is a projection. Then (S7) holds. From V = {u

U | T (u) = u} and the definition of T , we see that V

Span{u1 , u2 }. Using (S7), we also have u1 , u2 V . So V =

Span{u1 , u2 }.

If T (u) = 0, then u, u1 u1 + u, u2 u2 = 0. Since u1 , u2

are independent, we have u, u1 = 0, u, u2 = 0. So W =

(Span{u1 , u2 })0 .

2.5.15 We have T (T aI ) = 0 or (aI T )T = T (aI T ) = 0. Besides,

1

1

1

1

I = I T + T = (aI T ) + T .

a

a

a

a

Therefore, we may rewrite any u U as

u = I (u) =

Thus

1

1

(aI T )(u) + T (u) v + w.

a

a

(S8)

1

(aI T )(u) = 0,

a

1

T (u) = 0.

(aI T )(w) = (aI T )

a

T (v) = T

Let u N (aI T ) N (T ). Inserting this into (S8), we get u = 0.

So N(aI T ) N (T ) = {0} and U = N (aI T ) N (T ).

Since T commutes with aI T and T , it is clear that N (aI T )

and N(T ) are invariant under T .

2.5.25 (i) Since T is nilpotent of degree k, there is an element u U such

that T k1 (u)

= 0 but T k (u) = 0. It is clear that

V = Span{u, T (u), . . . , T k1 (u)}

is a k-dimensional subspace of U invariant under T . Let W be the

complement of Span{T k1 (u)} in N (T ). Then dim(W ) = l 1.

276

Consider the subspace X = V + W of U . Pick v V W and

write v as

v = a0 u + a1 T (u) + + ak1 T k1 (u),

a0 , a1 , . . . , ak1 F.

leads to a0 = a1 = = ak2 = 0. So v Span{T k1 (u)} and

v W which indicates v = 0. Hence X = V W . However, since

dim(X) = dim(V ) + dim(W ) = k + (l 1) = n, we have X = U

so that T is reducible over V , W .

(ii) It is clear that R(T ) = Span{T (u), . . . , T k1 (u)} and r(T ) =

k 1.

2.5.26 (i) Write S T as S (I S 1 T ) and set P = S 1 T . Then

P is nilpotent as well. Let the degree of P be k. Assume P

= 0.

Then k 2. It may be checked that I P is invertible since

I + P + + P k1 is the inverse of I P .

(ii) For A R(2, 2) define TA L(R2 ) by TA (x) = Ax where x R2

is a column vector. Now set

0 1

0 1

A=

, B=

.

1 0

0 0

Then TA is invertible, TB is nilpotent, but TA TB is not invertible.

It is direct to check that TA , TB do not commute.

2.5.27 Let v V . Then T (v) R(T ) V since V is invariant under T . Since

R(T ) V = {0} we have T (v) = 0. So v N (T ). That is, V N (T ).

On the other hand, the rank equation indicates that dim(V ) = n(T ) =

dim(N(T )). So V = N (T ).

Section 2.6

2.6.1 Write Sn = Tn T . Then

T = Tn Sn = Tn (I Rn ),

(S9)

as n , we may assume

that {Tn1 } is bounded without loss of generality. Since Sn 0

as n we have Rn Tn1 Sn 0 as n as well.

So in view of Theorem 2.25 we see that I Rn is invertible when n is

sufficiently large, which leads to the false conclusion that T is invertible

in view of (S9).

277

that T is invertible for all

= 0 and P T 0 as 0. Since

T is invertible so it can never be nilpotent.

Section 3.1

3.1.3

(i) Let C be a closed curve around but away from the origin of R2 and

consider

u : C S1,

u(x, y) =

f (x, y)

,

f (x, y)

(x, y) C.

R

R

x =

cos , y =

sin , 0 2. Then u = (u1 , u2 ) =

a

b

(cos , sin ) (on C). So ind(f |C ) = deg(u) = 1. On the other

hand, on C, we have

g(x, y)2 = (a 2 x 2 b2 y 2 )2 + 4a 2 b2 x 2 y 2 = (a 2 x 2 + b2 y 2 )2 = R 4 .

Now we can construct v : C S 1 by setting

v(x, y)

g(x, y)

1

= 2 (a 2 x 2 b2 y 2 , 2abxy)

g(x, y)

R

= (cos2 sin2 , 2 cos sin ).

(ii) The origin of R2 is the only zero of f and g that is a simple zero of

f and a double zero of g.

(iii) Since f and g are small perturbations of f and g, by stability we

deduce ind(f |C ) = 1, ind(g |C ) = 2. However, f still has exactly

one simple zero, x = , y = 0, but g has two simple zeros, x =

of the latter splits into two simple zeros. Note that, algebraically,

a double zero and two simple zeros are both counted as two zeros,

which is, loosely speaking, indicative of the result ind(g |C ) = 2.

3.1.4 Consider the vector field

$

#

2 2

v = x 3 3xy 2 5 cos2 (x + y), 3x 2 y y 3 + 2ex y .

We are to show that v has a zero somewhere. We use the deformation

method again to simplify the problem. For this purpose, we set

#

$

2 2

v t = x 3 3xy 2 5t cos2 (x + y), 3x 2 y y 3 +2tex y , 0 t 1.

278

for x 2 + y 2 = r R, t [0, 1]. Thus, we only need to show

that over a suitably large circle given by CR = {r = R}, we have

ind(v 0 |CR )

= 0. Over CR we have x = R cos , y = R sin ,

[0, 2]. On the other hand, for z = x + iy, we have z3 = (x 3 3xy 2 ) +

i(y 3 + 3x 2 y). Hence, with z = Rei , we get v 0 = R 3 (cos 3, sin 3 ),

1

so that u= 0 v 0 =(cos 3, sin 3 ). Thus deg(u) = 3. So ind(v|CR ) =

v

ind(v 1 |CR ) = ind(v 0 |CR ) = 3

= 0 as expected and v must vanish at

some point inside CR .

3.1.5 Inserting the expression for the stereographic projection given and going

through a tedious calculation, we obtain

1

1

dxdy

deg(u) =

R2 (1 + x 2 + y 2 )2

r

1 2

=

d

dr = 1.

0

(1

+

r 2 )2

0

3.1.6 Inserting the hedgehog expression and integrating, we obtain

deg(u) =

=

2

u u

1

u

d dr

4 0

r

0

n 0

sin f df = n,

2

which indicates that the map covers the 2-sphere, while preserving the

orientation, n times.

Section 3.2

3.2.6 Consider the matrix C = aA + bB. The entries of C not in the j th

column are the corresponding entries of A multiplied by (a + b) but the

j th column of C is equal to the sum of the a-multiple of the j th column

of A and b-multiple of the j th column of B. So by the properties of

determinants, we have

.

..

.

..

an1

= (a + b)

n1

aanj + bbnj

(a det(A) + b det(B)).

ann

279

3.2.7 Let the column vectors in A(t) be denoted by A1 (t), . . . , An (t). Then

det(A(t + h)) det(A(t))

= |A1 (t + h), A2 (t + h), . . . , An (t + h)|

|A1 (t), A2 (t), . . . , An (t)|

= |A1 (t + h), A2 (t + h), . . . , An (t + h)|

|A1 (t), A2 (t + h), . . . , An (t + h)|

+ |A1 (t), A2 (t + h), A3 (t + h) . . . , An (t + h)|

|A1 (t), A2 (t), A3 (t + h), . . . , An (t + h)|

+ |A1 (t), A2 (t), A3 (t + h), . . . , An (t + h)|

+ + |A1 (t), A2 (t), . . . , An (t + h)|

|A1 (t), A2 (t), . . . , An (t)|

= |A1 (t + h) A1 (t), A2 (t + h), . . . , An (t + h)|

+ |A1 (t), A2 (t + h) A2 (t), A3 (t + h), . . . , An (t + h)|

+ + |A1 (t), A2 (t), . . . , An (t + h) An (t)|.

Dividing the above by h

= 0, we get

1

(det(A(t + h)) det(A(t)))

h

1

= (A1 (t + h) A1 (t)), A2 (t + h), . . . , An (t + h)

h

1

+ A1 (t), (A2 (t + h) A2 (t)), . . . , An (t + h)

h

1

+ + A1 (t), A2 (t), . . . , (An (t + h) An (t)) .

h

Now taking the h 0 limits on both sides of the above, we arrive at

d

det(A(t)) = |A1 (t), A2 (t), . . . , An (t)|

dt

+ |A1 (t), A2 (t), . . . , An (t)|

+ + |A1 (t), A2 (t), . . . , An (t)|.

Finally, expanding the j th determinant along the j th column on the

right-hand side of the above by the cofactors, j = 1, 2, . . . , n,

we have

280

d

det(A(t)) =

dt

n

ai1

(t)Ci1 (t) +

i=1

+ +

n

ai2

(t)Ci2 (t)

i=1

n

ain

(t)Cin (t).

i=1

3.2.8 Adding all columns to the first column of the matrix, we get

n

x+

ai a1 a2 an

i=1

n

x+

ai x a2 an

i=1

n

x + a a

x an

i

2

i=1

..

..

.. . .

..

. .

.

.

.

n

x+

ai a2 a3 x

i=1

1 a1 a2 an

1 x a a

2

n

n

ai 1 a2 x an .

= x+

. .

.. . .

..

i=1

. .

.

. .

.

.

1 a a x

2

3

Consider the (n + 1) (n + 1) determinant on the right-hand side of

the above. We now subtract row n from row n + 1, row n 1 from

row n,..., row 2 from row 3, row 1 from row 2. Then that determinant

becomes

1

a2

an

a1

0 xa

0

0

1

0

a

2

a

.

.

..

..

..

..

.

.

.

.

.

.

0

x a

n

Since the minor of the entry at the position (1, 1) is the determinant of

n

'

(x ai ).

a lower triangular matrix, we get a =

i=1

281

3.2.9 If c1 , . . . , cn+2 are not all distinct then it is clear that the determinant is

zero since there are two identical columns. Assume now c1 , . . . , cn+2

are distinct and consider the function of x given by

p(x) = det

p1 (x)

p1 (c2 )

p1 (cn+2 )

p2 (x)

..

.

p2 (c2 )

..

.

..

.

p2 (cn+2 )

..

.

pn+2 (c2 )

pn+2 (x)

pn+2 (cn+2 )

By the cofactor expansion along the first column, we see that p(x)

is a polynomial of degree at most n which vanishes at n + 1 points:

c2 , . . . , cn+2 . Hence p(x) = 0 for all x. In particular, p(c1 ) = 0 as well.

3.2.10 We use Dn+1 to denote the determinant and implement induction to do

the computation. When n = 1, we have D2 = a1 x + a0 . At n 1,

we assume Dn = an1 x n1 + + a1 x + a0 . At n, by the cofactor

expansion according to the first column, we get

Dn+1 = xDn + (1)n+2 a0 (1)n

= x(an x n1 + + a2 x + a1 ) + a0 .

3.2.11 Use D() to denote the determinant on the left-hand side of (3.2.46). It

is clear that D(0) = 0. So (3.2.46) is true at = 0.

Now assume

= 0 and rewrite D() as

D() = det

a1

a2

an

1

..

.

a1

..

.

a2

..

.

..

.

an

..

.

a1

a2

an

Adding the (a1 ) multiple of column 1 to column 2, the (a2 ) multiple of column 1 to column 3,..., and the (an ) multiple of column 1

to column n + 1, we see that D() becomes

282

D() = det

1 a1

a2

an

1

..

.

0

..

.

..

.

..

.

0

..

.

a1

1

1 1

0

= n det

1

.

..

.

.

.

1

a2

an

0

1

..

.

..

.

0

..

.

column n + 1 to column 1, we get

1

ai

1

i=1

n

D() = det

..

=

1

1

n

i=1

a2

0

..

.

1

..

.

..

.

an

..

a1

ai (1) = (1)

n

n n1

n

ai .

i=1

det(A)2 = 1. Using det(A) < 0, we get det(A) = 1. Hence

det(A + In ) = det(A + AAt ) = det(A) det(In + At ) = det(In + A),

which results in det(In + A) = 0.

3.2.13 Since a11 = 1 or 1, we may add or subtract the first row from the

other rows of A so that we reduce A into a matrix B = (bij ) satisfying

b11 = 1 or 1, bi1 = 0 for i 2, and all entries in the submatrix of

B with the first row and column of B deleted are even numbers. By the

cofactor expansion along the first column of B we see that det(B) =

an even integer. Since det(A) = det(B), the proof follows.

283

= 0, it suffices to show that A is invertible, or equivalently, N (A) = {x Rn | Ax = 0} = {0}. In fact, take x N (A) and

assume otherwise x = (x1 , . . . , xn )t

= 0. Let i = 1, . . . , n be such

that

|xi | = max {|xj |}.

(S10)

1j n

Then |xi | > 0. On the other hand, the ith component of the equation

Ax = 0 reads

ai1 x1 + + aii xi + + ain xn = 0.

(S11)

0 = |ai1 x1 + + aii xi + + ain xn |

|aii ||xi |

|aij ||xj |

1j n,j

=i

|xi | |aii |

|aij | > 0,

1j n,j

=i

which is a contradiction.

3.2.15 Consider the modified matrix

A(t) = D + t (A D),

0 t 1,

in the previous exercise. So det(A(t))

= 0. Furthermore, det(A(0)) =

det(D) > 0. So det(A(1)) > 0 as well otherwise there is a point t0

(0, 1) such that det(A(t0 )) = 0, which is false.

3.2.17 Let = (a1 , . . . , bn ), = (b1 , . . . , bn ).

To compute

1 a1 b1

a1 b2

a1 bn

a2 b1

1 a2 b2

a2 bn

t

det(In ) =

,

..

..

..

..

.

.

.

.

an b1

an b2

1 an bn

we artificially enlarge it into an (n + 1) (n + 1) determinant of

the form

284

t

det(In ) =

b1

b2

1 a1 b1

a1 b2

0

..

.

a2 b1

..

.

1 a2 b2

..

.

..

.

an b1

an b2

a1 bn

a2 bn .

..

.

1 an bn

bn

1 to row 3,..., and an multiple of row 1 to the last row, of the above

determinant, we get

1 b1 b2 bn

a

1 1 0 0

det(In t ) = a2 0 1 0 .

.

..

.. . .

.

.

. ..

.

.

.

a

0 0 1

n

Next, subtracting the b1 multiple of row 2 from row 1, b2 multiple of

row 3 from row 1,..., and bn multiple of the last row from row 1, we

obtain

n

1

a

b

0

0

0

i i

i=1

n

1 0 0

a1

=1

det(In t ) =

ai bi .

0 1 0

a2

i=1

..

.. .. . .

.

. ..

.

. .

a

0 0 1

n

adding the (x1 ) multiple of row 2 to row 1, (x2 ) multiple of row 3

to row 1,..., and (xn ) multiple of the last row to row 1, we get

n

2

x

0

0

0

100

i=1

1 0 0

x1

f (x1 , . . . , xn ) = det

0 1 0

x2

..

.. .. . .

..

. .

.

. .

xn

= 100

n

i=1

xi2 .

0 0

285

and of radius 10.

Section 3.3

3.3.3 First we observe that for A F(m, n) and B F(n, l) written as B =

(B1 , . . . , Bl ) where B1 , . . . , Bl Fn are l column vectors we have

AB = (AB1 , . . . , ABl ),

(S12)

Now for A F(n, n) we recall the relation (3.3.5). If det(A)

= 0, we

can evaluate determinants on both sides of (3.3.5) to arrive at (3.3.15). If

det(A) = 0 then in view of (3.3.5) and (S12) we have

adj(A)A1 = 0,

...,

adj(A)An = 0,

(S13)

= 0, then at

least one of A1 , . . . , An is nonzero. In other words, n(adj(A)) 1 so

that r(adj(A)) n 1. Hence det(adj(A)) = 0 and (3.3.15) is valid. If

A = 0 then adj(A) = 0 and (3.3.15) trivially holds.

3.3.4 If r(A) = n then both A and adj(A) are invertible by (3.3.5). So we

obtain r(adj(A)) = n. If r(A) = n1 then A contains an (n1)(n1)

submatrix whose determinant is nonzero. So adj(A)

= 0 which leads to

r(adj(A)) 1. On the other hand, from (S13), we see that n(adj(A))

n 1. So r(adj(A)) 1. This proves r(adj(A)) = 1. If r(A) n 2

then any n 1 row vectors of A are linearly dependent, which indicates

that all row vectors of any (n 1) (n 1) submatrix of A are linearly

dependent. In particular all cofactors of A are zero. Hence adj(A) = 0

and r(adj(A)) = 0.

3.3.6 Let n 3 and assume A is invertible. Using (3.3.5) and (3.3.15) we have

adj(A)adj(adj(A)) = det(adj(A))In = (det(A))n1 In .

Comparing this with (3.3.5) again we obtain adj(adj(A)) =

(det(A))n2 A. If A is not invertible, then det(A) = 0 and r(adj(A)) 1.

Since n 3, we have r(adj(A)) 1 n 2. So in view of

Exercise 3.3.4 we have r(adj(adj(A))) = 0 or adj(adj(A)) = 0. So

adj(adj(A)) = (det(A))n2 A is trivially true.

If n = 2 it is direct to verify the relation adj(adj(A)) = A.

286

3.3.7 For A = (aij ), we use MijA and CijA to denote the minor and cofactor of

the entry aij , respectively. Then we have

t

MijA = MjAi ,

(S14)

t

(If A is invertible, we may use (3.3.5) to write adj(A) = det(A)A1 .

Therefore (adj(A))t = det(A)(A1 )t = det(At )(At )1 = adj(At ).)

3.3.8 From (3.3.5) we have AAt = det(A)In . For B = (bij ) = AAt , we get

bii =

n

aij2 ,

i = 1, . . . , n.

j =1

3.3.9 First assume that A, B are both invertible. Then, in view of (3.3.5), we

have

adj(AB) = det(AB)(AB)1 = det(A) det(B)B 1 A1

= (det(B)B 1 )(det(A)A1 ) = adj(B)adj(A).

Next we prove the conclusion without assuming that A, B are invertible.

Since det(A I ) and det(B I ) are polynomials of the variable ,

of degree n, they have at most n roots. Let {k } be a sequence in R so

that k 0 as k and det(A k I )

= 0, det(B k I )

= 0 for

k = 1, 2, . . . . Hence we have

adj((A k I )(B k I )) = adj(B k I )adj(A k I ), k = 1, 2, . . . .

Letting k in the above we arrive at adj(AB) = adj(B)adj(A)

again.

Section 3.4

3.4.1 Using CijA to denote the cofactor of the entry aij of the matrix A = (aij )

and applying (3.2.42), we have

d

d

=

pA ()

(det(In A))

a1 =

d

d

=0

=0

n

n

(A)

=

Cii

= (1)n1

CiiA .

i=1

i=1

287

3.4.2 It suffices to show that M = C(n, n) \ D is closed in C(n, n). Let {Ak }

be a sequence in M which converges to some A C(n, n). We need to

prove that A M. To this end, consider pAk () and let {k } be multiple

roots of pAk (). Then we have

pAk (k ) = 0,

pA

(k ) = 0,

k

k = 1, 2, . . . .

(S15)

On the other hand, we know that the coefficients of pA () are continuously dependent on the entries of A. So the coefficients of pAk ()

converge to those of pA (), respectively. In particular, the coefficients

k

}, . . . , {a1k }, {a0k }, are bounded sequences. Thus

of pAk (), say {an1

|k |

n

|nk

pAk (k )| + |pAk (k )|

n1

|aik ||k |i ,

i=0

which indicates that {k } is a bounded sequence in C. Passing to a subsequence if necessary, we may assume without loss of generality that

k some 0 C as k . Letting k in (S15), we obtain

(0 ) = 0. Hence 0 is a multiple root of pA () which

pA (0 ) = 0, pA

proves A M.

3.4.4 Using (3.4.31), we have

adj(In A) = An1 n1 + + A1 + A0 ,

where the matrices An1 , An2 , . . . , A1 , A0 are determined through

(3.4.32). Hence, setting = 0 in the above, we get

adj(A) = A0 = a1 In + a2 A + + an1 An2 + An1 .

That is,

adj(A) = (1)n1 (a1 In + a2 A+ +an1 An2 + An1 ).

(S16)

Note that (S16) may be used to prove some known facts more

easily. For example, the relation adj(At ) = (adj(A))t (see Exercise

3.3.7) follows immediately.

3.4.6 First assume that A is invertible. Then

pAB () = det(In AB) = det(A[A1 B])

= det(A) det(A1 B) = det([A1 B]A)

= det(In BA) = pBA ().

288

Next assume A is arbitrary. If F = R or C, we may proceed as follows.

Let {ak } be a sequence in F such that A ak In is invertible and ak 0

as k . Then we have

p(Aak In )B () = pB(Aak In ) (),

k = 1, 2, . . . .

If A is not invertible or F is not R or C, the above methods fail and

a different method needs to be used. To this end, we recognize the following matrix relations in F(2n, 2n):

In

In

In

In

In

In

In

In

=

=

In AB

In

B

A

In

In BA

,

.

The determinants of the left-hand sides of the above are the same. The

determinants of the right-hand sides of the above are

(1)n n det(In AB),

3.4.7 We consider the nontrivial case n 2. Using the notation and method

in the solution to Exercise 3.2.17, we have

det(In t ) =

b1

b2

a1

a2

..

.

0

..

.

..

.

..

.

an

bn

0

0 .

..

.

Assume

= 0. Subtracting the b1 / multiple of row 2 from row 1,

b2 / multiple of row 3 from row 1,..., and bn / multiple of the last row

from row 1, we obtain

1

t

det(In ) =

1

ai bi

289

a1

a2

..

.

0

..

.

..

.

..

.

0

..

.

i=1

0 0

n

1

ai bi n = n1 ( t ).

= 1

an

i=1

Since both sides of the above also agree at = 0, they are identical for

all .

3.4.8 (i) We have pA () = 3 22 + 2. From pA (A) = 0 we have

A(A2 2A In ) = 2In . So

1 1 0

1 2

1

0

0

.

A = (A 2A In ) =

2

2

3

1

2

2

(ii) We divide 10 by pA () to find

10 =

pA ()(7 + 27 + 55 + 104 + 213 + 422 + 85 + 170)

+ 3412 340.

Consequently, inserting A in the above, we have

1 2046

10

2

A = 341A 340In = 0 1024

0

1705

0 .

1

Section 4.1

4.1.8 Let u1 and u2 be linearly independent vectors in U and assume

(u1 , u1 )

= 0 and (u2 , u2 )

= 0 otherwise there is nothing to show. Set

u3 = a1 u1 + u2 where a1 = (u1 , u2 )/(u1 , u1 ). Then u3

= 0 and is

perpendicular to u1 . So u1 , u3 are linearly independent as well. Consider

u = cu1 +u3 (c C). Of course u

= 0 for any c. Since (u1 , u3 ) = 0, we

290

have (u, u) = c2 (u1 , u1 ) + (u3 , u3 ). Thus, in order to have (u, u) = 0,

we may choose

7

(u3 , u3 )

.

c=

(u1 , u1 )

Section 4.2

4.2.6 We first show that the definition has no ambiguity. For this purpose,

assume [u1 ] = [u] and [v1 ] = [v]. We need to show that (u1 , v1 ) =

(u, v). In fact, since u1 [u] and v1 [v], we know that u1 u U0

and v1 v U0 . Hence there are x, y U0 such that u1 u = x, v1

v = y. So (u1 , v1 ) = (u + x, v + y) = (u, v) + (u, y) + (x, v + y) =

(u, v).

It is obvious that ([u], [v]) is bilinear and symmetric.

To show that the scalar product is non-degenerate, we assume [u]

U/U0 satisfies ([u], [v]) = 0 for any [v] U/U0 . Thus (u, v) = 0 for

all v U which implies u U0 . In other words, [u] = [0].

4.2.8 Let w V . Then (v, w) = 0 for any v V . Hence 0 = (v, w) =

v, (w), v V , which implies (w) V 0 . Hence (V ) V 0 .

On the other hand, take w V 0 . Then v, w = 0 for all v V .

Since : U U is an isomorphism, there is a unique w U such

that (w) = w . Thus (v, w) = v, (w) = v, w = 0, v V . That

is, w V . Thus w (V ), which proves V 0 (V ).

4.2.10 Recall (4.1.26). Replace V , W in (4.1.26) by V , W and use

(V ) = V , (W ) = W . We get (V + W ) = V W. Taking

4.2.11 From Exercise 4.1.8 we know that there is a nonzero vector u U such

that (u, u) = 0. Since (, ) is non-degenerate, there is some w U

such that (u, w)

= 0. It is clear that u, w are linearly independent. If

w satisfies (w, w) = 0, then we take v = w/(u, w). Hence (v, v) = 0

and (u, v) = 1. Assume now (w, w)

= 0 and consider x = u+cw (c

C). Then x can never be zero for any c C. Set (x, x) = 0. We get

2c(u, w) + c2 (w, w) = 0. So we may choose c = 2(u, w)/(w, w).

Thus, with v = x/(u, x) where

(u, x) =

2(u, w)2

= 0,

(w, w)

291

Section 4.3

4.3.1 We consider the real case for simplicity. The complex case is similar.

Suppose M is nonsingular and consider a1 u1 + + ak uk = 0, where

a1 , . . . , ak are scalars. Taking scalar products of this equation with the

vectors u1 , . . . , uk consecutively, we get

a1 (u1 , ui ) + + ak (uk , ui ) = 0,

i = 1, . . . , k.

Suppose M is singular. Then so is M t and the above system of equations will have a solution (a1 , . . . , ak )

= (0, . . . , 0). We rewrite the

above system as

k

k

ai ui , u1 = 0, . . . ,

ai ui , uk = 0.

i=1

i=1

k

ai ui , u1 = 0,

a1

...,

ak

i=1

k

ai ui , uk

=0

i=1

k

k

ai ui ,

aj uj = 0.

i=1

j =1

k

i=1

dependent.

4.3.2 Suppose u Span{u1 , . . . , uk }. Then there are scalars a1 , . . . , ak such

that u = a1 u1 + + ak uk . Taking scalar products of this equation with

u1 , . . . , uk , we have

(ui , u) = a1 (ui , u1 ) + + ak (ui , uk ),

which establishes

(u1 , u1 )

(u1 , u)

.

..

Span

..

.

(uk , u)

(uk , u1 )

i = 1, . . . , k,

(u1 , uk )

..

,...,

.

.

(uk , uk )

292

On the other hand, let V = Span{u1 , . . . , uk }. If k < dim(U ), then

V

= {0}. Take u V , u

= 0. Then

0

(u1 , u)

.

..

= .

.

.

0

(uk , u)

(u1 , u1 )

..

Span

.

(uk , u1 )

(u1 , uk )

..

,...,

,

.

(uk , uk )

but u

Span{u1 , . . . , uk }.

4.3.4 For any u U , we have

(I S)u2 = ((I S)u, (I S)u)

= u2 + Su2 (u, Su) (Su, u)

= u2 + Su2 (S u, u) (Su, u)

= u2 + Su2 (Su, u) (Su, u)

= u2 + Su2 .

So (I S)u

= 0 whenever u

= 0. Thus n(I S) = 0 and I S must

be invertible.

4.3.6 Set R(A) = {u Cm | u = Ax for some x Cn }. Rewrite Cm as Cm =

R(A) (R(A)) and assert (R(A)) = {y Cm | A y = 0} = N (A ).

In fact, if A y = 0, then (y, Ax) = y Ax = (x A y) = 0 for any

x Cn . That is, N (A ) R(A) . On the other hand, take z R(A) .

Then for any x Cn we have 0 = (z, Ax) = z Ax = (A z) x =

(A z, x), which indicates A z = 0. So z N (A ). Thus R(A)

N(A ).

Therefore the equation Ax = b has a solution if and only if b R(A)

or, if and only if b is perpendicular to R(A) = N (A ).

Section 4.4

4.4.2 It is clear that P L(U ). Let {w1 , . . . , wl } be an orthogonal basis of

W = V . Then {v1 , . . . , vk , w1 , . . . , wl } is an orthogonal basis of U .

Now it is direct to check that P (vj ) = vj , j = 1, . . . , k, P (ws ) =

0, s = 1, . . . , l. So P 2 (vj ) = P (vj ), j = 1, . . . , k, P 2 (ws ) =

P (ws ), s = 1, . . . , l. Hence P 2 = P , R(P ) = V , and N (P ) = W

as claimed. In other words, P : U U is the projection of U onto V

along W .

293

4.4.4 (i) We know that we have the direct sum U = V V . For any x, y [u]

we have x = v1 + w1 , y = v2 + w2 , vi V , wi V , i = 1, 2. So

x y = (v1 v2 ) + (w1 w2 ). However, since x y V , we have

w1 = w2 . In other words, for any x [u], there is a unique w V

so that x = v + w for some v V . Of course, [x] = [w] = [u] and

x2 = v2 + w2 whose minimum is attained at v = 0. This proves

[u] = w.

(ii) So we need to find the unique w V such that [w] = [u]. First

find an orthogonal basis {v1 , . . . , vk } of V . Then for u we know that the

projection of u onto V along V is given by the Fourier expansion (see

Exercise 4.4.2)

v=

k

(vi , u)

vi .

(vi , vi )

i=1

w =u

k

(vi , u)

vi .

(vi , vi )

i=1

Section 4.5

4.5.5 (i) It is direct to check that T (x) = x for x Rn .

(ii) Since T (T (x)) = x for x Rn , we have T 2 = I . Consequently,

T = T . So if is an eigenvalue of T then 2 = 1. Thus the eigenvalues of T may only be 1. In fact, for u = (1, 0, . . . , 0, 1)t and

v = (1, 0, . . . , 0, 1)t , we have T (u) = u and T (v) = v. So 1

are both eigenvalues of T .

(iii) Let x, y Rn satisfy T (x) = x and T (y) = y. Then (x, y) =

(T (x), T (y)) = (x, y) = (x, y). So (x, y) = 0.

(iv) Since T 2 I = 0 and T

= I , we see that the minimal polynomial of T is mT () = 2 1.

Section 5.2

5.2.7 We can write A as A = P t DP where P is orthogonal and

D = diag{d1 , . . . , dn },

di = 1,

i = 1, . . . , n.

and of length 1. So D is orthogonal as well. Hence A must be orthogonal as a product of three orthogonal matrices.

294

1

u. Expand

u

n

{u1 } to get an orthonormal basis {u1 , u2 , . . . , un } of R (with the usual

scalar product) where u1 , . . . , un are all column vectors. Let Q be the

nn matrix whose 1st, 2nd,. . . , nth column vectors are u1 , u2 , . . . , un .

Then

ut Q = ut Q = ut (u1 , u2 , . . . , un )

= (ut u1 , ut u2 , . . . , ut un ) = (u, 0, . . . , 0).

Therefore

Qt (uut )Q = (ut Q)t (ut Q)

u

..

0

Section 5.3

5.3.8 For definiteness, assume m < n. So r(A) = m. Use to denote the

standard norm of Rl . Now check

q(x) = x t (At A)x = (Ax)t (Ax) = Ax2 0,

x Rn ,

y Rm .

definite.

If q(x) = 0, we have Ax = 0. Since r(A) = m < n, we know that

n(A) = n r(A) > 0. Thus there is some x

= 0 such that Ax = 0.

Hence q cannot be positive definite.

If Q(y) = 0, we have At y = 0. Since r(At ) = r(A) = m, we know

that n(At ) = m r(At ) = 0. Thus y = 0. Hence Q(y) > 0 whenever

y

= 0 and Q is positive definite.

5.3.11 Assume A is semi-positive definite. Then there is a semi-positive definite matrix B such that A = B 2 . If x S, then 0 = x t Ax = x t B 2 x =

(Bx)t (Bx) = Bx2 . That is, Bx = 0, which implies Ax = B 2 x = 0.

Conversely, Ax = 0 clearly implies x S.

If A is indefinite, N (A) may not be equal to S. For example, take

A = diag{1, 1}. Then N (A) = {0} and

S= x=

x1

x2

295

R2 x12

x22

=0 ,

cannot even be a subspace of R2 .

5.3.12 Since A is positive definite, there is an invertible matrix P R(n, n)

such that P t AP = In . Since P t BP is symmetric, there is an orthogonal matrix Q R(n, n) such that Qt (P t BP )Q is a real diagonal

matrix. Set C = P Q. Then both C t AC and C t BC are diagonal.

5.3.13 First we observe that if C is symmetric then the eigenvalues of C are

greater than or equal to c if and only if x t Cx cx t x = cx2 , x Rn .

In fact, if 1 , . . . , n are eigenvalues of C, then there is an orthogonal

matrix P such that C = P t DP , where D = diag{1 , . . . , n }. So with

y = P x we have

x t Cx = (P x)t D(P x) = y t Dy =

n

i yi2 0 y2 ,

i=1

us x t Cx cy2 = cx2 . Conversely, we see that

x t Cx cx2 0 implies

n

i yi cy2 =

i=1

n

(i c)yi2 0.

i=1

Now from x t Ax ax2 , x t Bx bx2 , we infer x t (A + B)x

(a + b)x2 . Thus the eigenvalues of A + B are greater than or equal

to a + b.

5.3.14 Since A is positive definite, there is another positive definite matrix

P R(n, n) such that A = P 2 . Let C = AB. Then we have C = P 2 B.

Thus P 1 CP = P BP . Since the right-hand side of this relation is positive definite, its eigenvalues are all positive. Hence all the eigenvalues

of C are positive.

5.3.16 Let P be a nonsingular matrix such that A = P t P . Then det(A) =

det(P t P ) = det(P )2 . Now

det(A + B) = det(P t P + B)

= det(P t ) det(I + [P 1 ]t B[P 1 ]) det(P ).

Consider C = [P 1 ]t B[P 1 ] which is clearly semi-positive definite.

Let the eigenvalues of C be 1 , . . . , n 0. Then there is an orthogonal

matrix Q such that Qt CQ = D where D = diag{1 , . . . , n }. Thus

296

det(C) = det(Qt DQ) = det(D) = 1 n . Combining the above

results and using det(Q)2 = 1, we obtain

det(A + B) = det(Qt ) det(P t ) det(I + C) det(P ) det(Q)

= det(A) det(I + Qt CQ) = det(A) det(I + D)

= det(A)

n

'

(1 + i )

i=1

det(A)(1 + 1 n )

= det(A) + det(A) det(C).

However, recall the relation between B and C we get B = P t CP

which gives us det(B) = det(P )2 det(C) = det(A) det(C) so that the

desired inequality is established.

If the inequality becomes an equality, then all 1 , . . . , n vanish. So

C = 0 which indicates that B = 0.

5.3.17 Since A is positive definite, there is an invertible matrix P R(n, n)

such that P t AP = In . Hence we have

(det(P ))2 det(A B) = det(P t (( 1)A (B A))P )

= det(( 1)In P t (B A)P ).

Since P t (B A)P is positive semi-definite whose eigenvalues are all

non-negative, we see that the roots of the equation det(A B) = 0

must satisfy 1 0.

5.3.18 (i) Let x U be a minimum point of f . Then for any y U we have

g() g(0), where g() = f (x + y) ( R). Thus

1

1

dg

= (y, T (x)) + (x, T (y)) (y, b)

0=

d =0

2

2

= (y, T (x) b).

Since y U is arbitrary, we get T (x)b = 0 and (5.3.25) follows.

Conversely, if x U satisfies (5.3.25), then

1

1

(u, T (u)) (u, T (x)) (x, T (x)) + (x, T (x))

2

2

1

= (u x, T (u x)) 0 u x2 , u U,

2

(S17)

f (u) f (x) =

for some constant 0 > 0. That is, f (u) f (x) for all u U .

297

(ii) If x, y U solve (5.3.25), then (i) indicates that x, y are the minimum points of f . Hence f (x) = f (y). Replacing u by y in (S17)

we arrive at x = y.

5.3.19 Let S L(U ) be positive semi-definite so that S 2 = T . Then q(u) =

(S(u), S(u)) = S(u)2 for any u U . Hence the Schwarz inequality

(4.3.10) gives us

q(u + v) = (S(u + v), S(u + v))

= 2 S(u)2 + 2(S(u), S(v)) + 2 S(v)2

2 S(u)2 + 2S(u)S(v) + 2 S(v)2

2 S(u)2 + (S(u)2 + S(v)2 )

+ 2 S(v)2

= ( + )S(u)2 + ( + )S(v)2

= q(u) + q(v),

u, v U,

Section 5.4

5.4.1 Suppose otherwise that A is positive definite. Using Theorem 5.11 we

have

a a a a a a

2 1

3 3

1

1

a1 , a2 , a3 > 0,

,

,

> 0,

a2 a3 a3 a2 a1 a2

which contradicts det(A) > 0.

5.4.2 In the general case, n 3, we have

q(x) =

n1

i=1

x1

.

n

x=

.. R .

xn

n

yi2 with

i=1

yi = xi + ai xi+1 ,

i = 1, . . . , n 1,

yn = xn + an x1 .

(S18)

It is seen that q(x) is positive definite if and only if the change of variables given in (S18) is invertible, which is equivalent to the condition

1 + (1)n+1 a1 a2 an

= 0.

5.4.3 If A is positive semi-definite, then A + In is positive definite for all

> 0. Hence all the leading principal minors of A + In are positive

298

the conclusion. The converse is not true, however. For example, take

A = diag{1, 0, 1}. Then all the leading principal minors of A are nonnegative but A is indefinite.

5.4.5 (i) The left-hand side of equation (5.4.24) reads

An1

An1 +

.

t An1 + t t An1 + t + t + ann

Comparing the above with the right-hand side of (5.4.24), we arrive

at the equation An1 = , which has a unique solution since

An1 is invertible. Inserting this result into the entry at the position

(n, n) in the above matrix and comparing with the right-hand side

1

of (5.4.24) again, we get a = t (A1

n1 ) 0 since An1 is

positive definite.

(ii) Taking determinants on both sides of (5.4.24) and using the facts

det(An1 ) > 0 and a 0, we obtain

det(A) = det(An1 )(ann + a) det(An1 )ann .

(iii) This follows directly.

(iv) If A is positive semi-definite, then A + In is positive definite for

> 0. Hence det(A + In ) (a11 + ) (ann + ), > 0.

Letting 0+ in the above inequality we see that (5.4.26) holds

again.

5.4.6 (i) It is clear that the entry at the position (i, j ) of the matrix At A is

(ui , uj ).

(ii) Thus, applying (5.4.26), we have det(At A) (u1 , u1 ) (un , un )

or (det(A))2 u1 2 un 2 .

5.4.7 Using the definition of the standard Euclidean scalar product we have

1

ui an 2 , i = 1, . . . , n. Thus (5.4.29) follows.

Section 5.5

5.5.1 Assume the nontrivial case dim(U ) 2. First suppose that T satisfies

T (u) Span{u} for any u U . If T

= aI for some a F, then there are

some linearly independent vectors u, v U such that T (u) = au and

T (v) = bv for some a, b F with a

= b. Since T (u+v) Span{u+v},

we have T (u + v) = c(u + v) for some c F. Thus c(u + v) = au + bv,

which implies a = c and b = c because u, v are linearly independent.

This is false. Next we assume that there is some u U such that v =

T (u)

Span{u}. Thus u, v are linearly independent. Let S L(U ) be

such that S(u) = v and S(v) = u. Then T (v) = T (S(u)) = S(T (u)) =

299

u = T (v) = T (R(u)) = R(T (u)) = R(v) = 0, which is again false.

5.5.3 Let 1 , . . . , n F be the n distinct eigenvalues of T and u1 , . . . , un

the associated eigenvectors, respectively, which form a basis of U . We

have T (S(ui )) = S(T (ui )) = i S(ui ). So S(ui ) Ei . In other words,

there is some bi F such that S(ui ) = bi ui for i = 1, . . . , n. It is clear

that the scalars b1 , . . . , bn determine S uniquely. On the other hand,

consider a mapping R L(U ) defined by

R = a0 I + a1 T + + an1 T n1 ,

a0 , a1 , . . . , an1 F. (S19)

Then

)ui ,

R(ui ) = (a0 + a1 i + + an1 n1

i

i = 1, . . . , n.

= bi ,

a0 + a1 i + + an1 n1

i

i = 1, . . . , n,

(S20)

that the non-homogeneous system of equations (S20) has a unique solution in a0 , a1 , . . . , an1 . Thus R may be constructed by (S19) to yield

R = S.

5.5.4 From the previous exercise we have CT = Span{I, T , . . . , T n1 }. It

remains to show that, as vectors in L(U ), the mappings I, T , . . . , T n1

are linearly independent. To see this, we consider

c0 I + c1 T + + cn1 T n1 = 0,

c0 , c1 , . . . , cn1 F.

= 0,

c0 + c1 i + + cn1 n1

i

i = 1, . . . , n,

determinant. So dim(CT ) = n.

Section 5.6

5.6.3

U . So v R(T ) . Conversely, if v R(T ) , then for any u U

we have 0 = (T (u), v)V = (u, T (v))U . So T (v) = 0 or v

N(T ). Thus N (T ) = R(T ) . Taking on this relation we obtain

R(T ) = N (T ) .

300

(ii) From R(T ) = N (T ) and V = R(T ) R(T ) we have

dim(V ) = r(T ) + n(T ). Applying this result in the rank equation

r(T ) + n(T ) = dim(V ) we get r(T ) = r(T ).

(iii) It is clear that N (T ) N (T T ). If u N (T T ) then 0 =

(u, (T T )(u))U = (T (u), T (u))V . Thus u N (T ). This proves

N(T ) = N (T T ). Hence n(T ) = n(T T ). Using the rank

equations dim(U ) = r(T ) + n(T ) and dim(U ) = r(T T ) +

n(T T ) we obtain r(T ) = r(T T ). Replacing T by T , we get

r(T ) = r(T T ).

Section 6.2

6.2.5 Since P 2 = P , we know that P projects U onto R(P ) along N (P ). If

N(P ) = R(P ) , we show that P = P . To this end, for any u1 , u2

U we rewrite them as u1 = v1 + w1 , u2 = v2 + w2 where

v1 , v2 R(P ), w1 , w2 R(P ) . Then we have P (v1 ) = v1 , P (v2 ) =

v2 , P (w1 ) = P (w2 ) = 0. Hence (u1 , P (u2 )) = (v1 + w1 , v2 ) =

(v1 , v2 ), (P (u1 ), u2 ) = (v1 , v2 + w2 ) = (v1 , v2 ). This proves

(u1 , P (u2 )) = (P (u1 ), u2 ), which indicates P = P . Conversely, we

note that for any T L(U ) there holds R(T ) = N (T ) (cf. Exercise

5.6.3). If P = P , we have R(P ) = N (P ).

6.2.8 (i) If T k = 0 for some k 1 then eigenvalues of T vanish. Let

{u1 , . . . , un } be a basis of U consisting of eigenvectors of T . Then

T (ui ) = 0 for all i = 1, . . . , n. Hence T = 0.

(ii) If k = 1, then there is nothing to show. Assume that the statement

is true up to k = l 1. We show that the statement is true at k = l +1. If

l = odd, then k = 2m for some integer m 1. Hence T 2m (u) = 0 gives

us (u, T 2m (u)) = (T m (u), T m u) = 0. So T m (u) = 0. Since m l, we

deduce T (u) = 0. If l = even, then l = 2m for some integer m 1 and

T 2m+1 (u) = 0 gives us T 2m (v) = 0 where v = T (u). Hence T (v) = 0.

That is, T 2 (u) = 0. So it follows again that T (u) = 0.

Section 6.3

6.3.11 Since A A is positive definite, there is a positive definite Hermitian

matrix B C(n, n) such that A A = B 2 . Thus we can rewrite A as

A = P B with P = (A )1 B which may be checked to satisfy

P P = (A )1 BB A1 = (A )1 B 2 A1 = (A )1 A AA1 = In .

301

are in C(n, n) such that C is unitary and B positive definite. Hence

A

( A = B(. So if 1 , . . . , n > 0 are the eigenvalues of A A then

1 , . . . , n are those of B. Let Q C(n, n) be unitary such that

B = Q DQ, where D is given in (6.3.26). Then A = CQ DQ. Let

P = CQ . We see that P is unitary and the problem follows.

6.3.13 Since T is positive definite, (u, v)T (u, T (v)), u, v U , also defines a positive definite scalar product over U . So the problem follows

from the Schwarz inequality stated in terms of this new scalar product.

Section 6.4

6.4.4 Assume that A = (aij ) is upper triangular. Then we see that the diagonal entries of A A and AA are

|a11 |2 , . . . ,

i

|aj i |2 , . . . ,

j =1

n

|a1j |2 , . . . ,

n

|aj n |2 ,

j =1

n

|aij |2 , . . . , |ann |2 ,

j =i

j =1

get aij = 0 for i < j . So A is diagonal.

6.4.6 (i) The conclusion T = 0 follows from the fact that all eigenvalues of

T vanish and U has a basis consisting of eigenvectors of T .

(ii) We have (T T )k (u) = 0 since T , T commute. Using Exercise 6.2.8 we have (T T )(u) = 0. Hence 0 = (u, (T T )(u)) =

(T (u), T (u)) which gives us T (u) = 0.

6.4.10 Assume that there are positive semi-definite R and unitary S, in L(U ),

such that

T = R S = S R.

(S21)

Then T = S R = R S . So T T = T T and T is normal. Now assume that T is normal. Then U has an orthonormal basis,

say {u1 , . . . , un }, consisting of eigenvectors of T . Set T (ui ) = i ui

(i = 1, . . . , n). Make polar decompositions of these eigenvalues,

i = |i |eii , i = 1, . . . , n, with the convention that i = 0 if i = 0.

Define R, S L(U ) by setting

R(ui ) = |i |ui ,

S(ui ) = eii ui ,

i = 1, . . . , n.

302

holds.

6.4.12 Assume that T is normal. Let 1 , . . . , k C be all the distinct eigenvalues of T and E1 , . . . , Ek the corresponding eigenspaces. Then

U = E1 Ek . Let {ui,1 , . . . , ui,mi } be an orthonormal basis of

Ei (i = 1, . . . , k). Then T (u) = i u for any u Ei (i = 1, . . . , k).

On the other hand, using the Vandermonde determinant, we know that

we can find a0 , a1 , . . . , ak1 C such that

a0 + a1 i + + ak1 k1

= i ,

i

i = 1, . . . , k.

any u Ei (i = 1, . . . , k). Thus T = p(T ).

6.4.13 Assume that T is normal. If dim(U ) = n = 1, there is nothing to do.

Assume n 2. Then the previous exercise establishes that there is

a polynomial p(t) such that T = p(T ). Hence any subspace invariant under T is also invariant under T . However, T = p(T ) implies

T = q(T ) where q is the polynomial obtained from p by replacing

the coefficients of p by the complex conjugates of the corresponding

coefficients of p. Thus any invariant subspace of T is also invariant

under T .

Now assume that T and T have the same invariant subspaces. We

show that T is normal. We proceed inductively on dim(U ) = n. If

n = 1, there is nothing to do. Assume that the conclusion is true at

n = k 1. We consider n = k + 1. Let C be an eigenvalue of

T and u U an associated eigenvector of T . Then V = Span{u} is

invariant under T and T . We claim that V is invariant under T and T

as well. In fact, take w V . Since T (u) = au for some a C, we

have (u, T (w)) = (T (u), w) = a(u, w) = 0. So T (w) V . Hence

V is invariant under T and T as well. Since dim(V ) = n 1 = k,

we see that T T = T T on V . Besides, we have T (T (u)) =

T (au) = au and T (T (u)) = T (u) = au. So T T = T T on

V . This proves that T is normal on U since U = V V .

Section 6.5

6.5.4 Since T is normal, we have (T 2 ) T 2 = (T T )2 so that the eigenvalues

of (T 2 ) T 2 are those of T T squared. This proves T 2 = T 2 in

particular. Likewise, (T m ) T m = (T T )m so that the eigenvalues

of (T m ) T m are those of T T raised to the mth power. So T m =

T m is true in general.

303

6.5.8 Note that some special forms of this problem have appeared as Exercises 6.3.11 and 6.3.12. By the singular value decomposition for A we

may rewrite A as A = P Q where Q, P C(n, n) are unitary and

R(n, n) is diagonal whose diagonal entries are all nonnegative.

Alternatively, we also have A = (P Q)(Q Q) = (P P )(P Q), as

products of a unitary and positive semi-definite matrices, expressed in

two different orders.

Section 7.1

7.1.4 Since g

= 0, we also have f

= 0. Assume f n |g n . We show that f |g. If

n = 1, there is nothing to do. Assume n 2. Let h = gcd(f, g). Then

we have f = hp, g = hq, p, q P, and gcd(p, q) = 1. If p is a scalar,

then f |g. Suppose otherwise that p is not a scalar. We rewrite f n and g n

as f n = hn pn and g n = hn q n . Since f n |g n , we have g n = f n r, where

r P. Hence hn q n = hn pn r. Therefore q n = pn r. In particular, p|q n .

However, since gcd(p, q) = 1, we have p|q n1 . Arguing repeatedly we

arrive at p|q, which is a contradiction.

7.1.5 Let h = gcd(f, g). Then f = hp, g = hq, and gcd(p, q) = 1.

Thus f n = hn pn , g n = hn q n , and gcd(p n , q n ) = 1, which implies

gcd(f n , g n ) = hn .

Section 7.2

7.2.1 If pS () are pT () are relatively prime, then there are polynomials f, g

such that f ()pS () + g()pT () = 1. Thus, I = f (T )pS (T ) +

g(T )pT (T ) = f (T )pS (T ) which implies pS (T ) is invertible and

pS (T )1 = f (T ). Similarly we see that pT (S) is also invertible.

7.2.2 Using the notation of the previous exercise, we have I = f (T )pS (T ).

Thus, applying R S = T R, we get

R = f (T )pS (T ) R = R f (S)pS (S) = 0.

7.2.3 It is clear that N (T ) R(I T ). It is also clear that R(I T )

N(T ) when T 2 = T . So N (T ) = R(I T ) if T 2 = T . Thus (7.2.14)

follows from the rank equation r(T ) + n(T ) = dim(U ). Conversely,

assume (7.2.14) holds. From this and the rank equation again we get

r(I T ) = n(T ). So N (T ) = R(I T ) which establishes T 2 = T .

7.2.4 We have f1 g1 + f2 g2 = 1 for some polynomials f1 , f2 . So

I = f1 (T )g1 (T ) + f2 (T )g2 (T ).

(S22)

304

v = f1 (T )g1 (T )u and w = f2 (T )g2 (T )u. Hence g2 (T )v =

f1 (T )pT (T )u = 0 and g1 (T )w = f2 (T )pT (T )u = 0. This shows v

N(g2 (T )) and w N (g1 (T )). Therefore U = N (g1 (T )) + N (g2 (T )).

Pick u N (g1 (T ))N (g2 (T )). Applying (S22) to u we see that u = 0.

Hence the problem follows.

7.2.6 We proceed by induction on k. If k = 1, (7.2.16) follows from Exercise 7.2.3. Assume that at k 1 1 the relation (7.2.16) holds. That is,

U = R(T1 ) R(Tk1 ) W,

W = N (T1 ) N (Tk1 ).

(S23)

R(Tk1 ) V . In fact, pick any u N (Tk ). Then (S23)

indicates that u = u1 + + uk1 + w for u1 R(T1 ), . . . , uk1

R(Tk1 ), w W . Since Tk Ti = 0 for i = 1, . . . , k 1, we see that

u1 , . . . , uk1 N (Tk ). Hence w N (Tk ). So w V . This establishes

the assertion and the problem follows.

Section 7.3

7.3.2 (ii) From (7.3.41) we see that, if we set

p() = n an1 n1 a1 a0 ,

then p(T )(T k (u)) = T k (p(T )(u)) = 0 for k = 0, 1, . . . , n 1, where

T 0 = I . This establishes p(T ) = 0 since {T n1 (u), . . . , T (u), u} is a

basis of U . So pT () = p(). It is clear that mT () = pT ().

7.3.3 Let u be a cyclic vector of T . Assume that S and T commute. Let

a0 , a1 , . . . , an1 F be such that S(u) = an1 T n1 (u) + +

a1 T (u)+a0 u. Set p(t) = an1 t n1 + +a1 t +a0 . Hence S(T k (u)) =

T k (S(u)) = (T k p(T ))(u) = p(T )(T k (u)) for k = 1, . . . , n 1. This

proves S = p(T ) since u, T (u), . . . , T n1 (u) form a basis of U .

7.3.4 (i) Let u be a cyclic vector of T . Since T is normal, U has a basis

consisting of eigenvectors, say u1 , . . . , un , of T , associated with the

corresponding eigenvalues, 1 , . . . , n . Express u as u = a1 u1 + +

an un for some a1 , . . . , an C. Hence

T (u) = a1 1 u1 + + an n un ,

T

n1

n1

(u) = a1 n1

1 u1 + + an n un .

305

x1 u + x2 T (u) + + xn T n1 (u) = 0,

we obtain

a1 (x1 + 1 x2 + + n1

1 xn ) = 0,

a (x + x + + n1 x ) = 0,

2 1

2 2

n

2

n1

an (x1 + n x2 + + n xn ) = 0.

(S24)

(S25)

(x1 , . . . , xn ) Cn , which is not the zero vector for any given

a1 , . . . , an , contradicting with the linear independence of the vectors

u, T (u), . . . , T n1 (u).

(ii) With (7.3.42), we consider the linear dependence of the vectors

u, T (u), . . . , T n1 (u) as in part (i) and come up with (S24) and (S25).

Since 1 , . . . , n are distinct, in view of the Vandermonde determinant,

we see that the system (S25) has only zero solution x1 = 0, . . . , xn = 0

if and only if ai

= 0 for any i = 1, . . . , n.

7.3.5 Assume there is such an S. Then S is nilpotent of degree m where m

satisfies 2(n 1) < m 2n since S 2(n1) = T n1

= 0 and S 2n =

T n = 0. By Theorem 2.22, we arrive at 2(n 1) < m n, which is

false.

Section 7.4

7.4.5 Let the invertible matrix C C(n, n) be decomposed as C = P + iQ,

where P , Q R(n, n). If Q = 0, there is nothing to show. Assume

Q

= 0. Then from CA = BC we have P A = BP and QA = BQ.

Thus, for any real number , we have (P + Q)A = B(P + Q). It

is clear that there is some R such that det(P + Q)

= 0. For such

, set K = P + Q. Then A = K 1 BK.

7.4.9 (i) Let u Cn \ {0} be an eigenvector associated to the eigenvalue

. Then Au = u. Thus Ak u = k u. Since there is an invertible matrix B C(n, n) such that Ak = B 1 AB, we obtain

(B 1 ABu) = k u or A(Bu) = k (Bu), so that the problem

follows.

2

l

(ii) From (i) we see that if C is an eigenvalue then k , k , . . . , k

are all eigenvalues of A which cannot be all distinct when l is

large enough. So there are some integers 1 l < m such that

306

l

7.4.11

7.4.12

7.4.17

7.4.18

= 0. Hence satisfies

ml

k

= 1 as asserted.

Use (3.4.37) without assuming det(A)

= 0 or (S16).

Take u = (1, . . . , 1)t Rn . It is clear that Au = nu. It is also

clear that n(A) = n 1 since r(A) = 1. So n(A nIn ) = 1

b2

bn

and A diag{n, 0, . . . , 0}. Take v = (1, , . . . , )t Rn . Then

n

n

Bv = nv. Since r(B) = 1, we get n(B) = n 1. So n(B nIn ) = 1.

Consequently, B diag{n, 0, . . . , 0}. Thus A B.

Suppose otherwise that there is an A R(3, 3) such that m() = 2 +

3 + 4 is the minimal polynomial of A. Let pA () be the characteristic

polynomial of A. Since pA () P3 and the coefficients of pA () are

all real, so pA () has a real root. On the other hand, recall that the

roots of m() are all the roots of pA () but the former has no real root.

So we arrive at a contradiction.

(i) Let mA () be the minimal polynomial of A. Then mA ()|2 + 1.

However, 2 + 1 is prime over R, so mA () = 2 + 1.

(ii) Let pA () be the characteristic polynomial of A. Then the degree

of pA () is n. Since mA () = 2 + 1 contains all the roots of

pA () in C, which are i, which must appear in conjugate pairs

because pA () is of real coefficients, so pA () = (2 + 1)m for

some integer m 1. Hence n = 2m.

(iii) Since pA () = ( i)m ( + i)m and mA () = ( i)( + i) (i.e.,

i are single roots of mA ()), we know that

N (A iIn ) = {x Cn | Ax = ix},

N (A + iIn ) = {y Cn | Ay = iy},

are both of dimension m in Cn and Cn = N (A iIn ) N (A +

iIn ). Since A is real, we see that if x N (A iIn ) then x

N(A + iIn ), and vice versa. Moreover, if {w1 , . . . , wm } is a basis

of N (A iIn ), then {w1 , . . . , w m } is a basis of N (A + iIn ), and

vice versa. Thus {w1 , . . . , wm , w 1 , . . . , w m } is a basis of Cn . We

now make the decomposition

wi = ui + ivi ,

ui , vi Rn ,

i = 1, . . . , m.

(S26)

that these vectors are linearly independent in Rn . In fact, consider

m

ai ui +

i=1

m

bi vi = 0,

307

a1 , . . . , am , b1 , . . . , bm R. (S27)

i=1

ui =

1

(wi + w i ),

2

vi =

1

(wi wi ),

2i

i = 1, . . . , m.

(S28)

m

m

(ai ibi )wi +

(ai + ibi )w i = 0,

i=1

i=1

(iv) Take ordered basis B = {u1 , . . . , um , v1 , . . . , vm }. Then it is seen

that, with respect to B, the matrix representation of the mapping

TA L(Rn ) defined by TA (u) = Au, u Rn is simply

0

Im

C=

.

Im 0

More precisely, if a matrix called B is formed by using the vectors

in the ordered basis B as its first, second, . . . , and the nth column

vectors, then AB = BC, which establishes (7.4.24).

Section 8.1

8.1.5 If A is normal, there is a unitary matrix P C(n, n) such that A =

P DP , where D is a diagonal matrix of the form diag{1 , . . . , n } with

1 , . . . , n the eigenvalues of A which are assumed to be real. Thus

A = A. However, because A is real, we have A = At .

8.1.6 We proceed inductively on dim(U ). If dim(U ) = 1, there is nothing to

show. Assume that the problem is true at dim(U ) = n 1 1. We

prove the conclusion at dim(U ) = n 2. Let C be an eigenvalue

of T and E the associated eigenspace of T . Then for u E we have

T (S(u)) = S(T (u)) = S(u). Hence S(u) E . So E is invariant under S. As an element in L(E ), S has an eigenvalue C. Let u E

be an eigenvector of S associated with . Then u is a common eigenvector of S and T . Applying this observation to S and T since S , T

commute as well, we know that S and T also have a common eigenvector, say w, satisfying S (w) = w, T (w) = w, for some , C.

308

Let V = (Span{w}) . Then V is invariant under S and T as can be seen

from

(w, S(v)) = (S (w), v) = ( w, v) = (w, v) = 0,

(w, T (v)) = (T (w), v) = ( w, v) = (w, v) = 0,

of V , say {u1 , . . . , un1 }, under which the matrix representations of S

and T are upper triangular. Let un = w/w. Then {u1 , . . . , un1 , un }

is an orthonormal basis of U under which the matrix representations of

S and T are upper triangular.

8.1.9 Let C be any eigenvalue of T and v U an associated eigenvector.

Then we have

((T I )(u), v) = (u, (T I )(v)) = 0,

u U,

(S29)

n 1. Thus, in view of the rank equation, we obtain n(T I ) 1. In

other words, this shows that must be an eigenvalue of T .

Section 8.2

8.2.3 Let B = {u1 , . . . , un } be a basis of U and x, y Fn the coordinate

vectors of u, v U with respect to B. With A = (aij ) = (f (ui , uj ))

and (8.2.2), we see that u U0 if and only if x t Ay = 0 for all y Fn

or Ax = 0. In other words, u U0 if and only if x N (A). So

dim(U0 ) = n(A) = n r(A) = dim(U ) r(A).

8.2.4 (i) As in the previous exercise, we use B = {u1 , . . . , un } to denote

a basis of U and x, y Fn the coordinate vectors of any vectors

u, v U with respect to B. With A = (aij ) = (f (ui , uj )) and

(8.2.2), we see that u V if and only if

(Ax)t y = 0,

v V.

(S30)

y (1) , . . . , y (m) in Fn to replace the condition (S30) by

(y (1) )t (Ax) = 0, . . . , (y (m) )t (Ax) = 0.

These equations indicate that, if we use B to denote the matrix

formed by taking (y (1) )t , . . . , (y (m) )t as its first,. . . , and mth row

vectors, then Ax N (B) = {z Fn | Bz = 0}. In other words, the

subspace of Fn consisting of the coordinate vectors of the vectors in

V is given by X = {x Fn | Ax N (B)}. Since A is invertible,

309

establishes dim(V ) = dim(X) = dim(U ) dim(V ).

(ii) For v V , we have f (u, v) = 0 for any u V . So V (V ) .

On the other hand, from (i), we get

dim(V ) + dim(V ) = dim(V ) + dim((V ) ).

So dim(V ) = dim((V ) ) which implies V = (V ) .

1

8.2.7 If V = V , then V is isotropic and dim(V ) = dim(U ) in view of

2

Exercise 8.2.4. Thus V is Lagrangian. Conversely, if V is Lagrangian,

1

then V is isotropic such that V V and dim(V ) = dim(U ). From

2

1

2

Section 8.3

8.3.1 Let u = (ai ) Rn be a positive eigenvector associated to r. Then the

ith component of the relation Au = ru reads

rai =

n

aij aj ,

i = 1, . . . , n.

(S31)

j =1

ak = min{ai | i = 1, . . . , n},

al = max{ai | i = 1, . . . , n}.

rak ak

n

akj ,

ral al

j =1

n

alj .

j =1

8.3.3 Use the notation A = { R | 0, Ax x for some x S},

where S is defined by (8.3.3). Recall the construction (8.3.7). We see

that rA = sup{ A }. Since A B implies A B , we deduce

rA rB .

Section 8.4

8.4.4 From lim Am = K, we obtain lim (At )m = lim (Am )t = K t .

m

Since all the row vectors of K and K t are identical, we see that all

entries of K are identical. By the condition (8.4.13), we deduce (8.4.24).

310

8.4.5 It is clear that all the entries of A and At are non-negative. It remains

to show that 1 and u = (1, . . . , 1)t Rn are a pair of eigenvalues and

eigenvectors of both A and At . In fact, applying Ai u = u and Ati u = u

(i = 1, . . . , k) consecutively, we obtain Au = A1 Ak u = u and

At u = Atk At1 u = u, respectively.

Section 9.3

9.3.2 (i) In the uniform state (9.3.30), we have

1

i ,

n

n

A =

1 2

i .

n

n

A2 =

i=1

i=1

is given by the formula

n

2

n

1 2

1

2

2

2

i

i .

(S32)

A = A A =

n

n

i=1

i=1

n

2

n

1

1

1 2

i

i (max min )2 .

n

n

4

i=1

i=1

Bibliographic notes

We end the book by mentioning a few important but more specialized subjects

that are not touched in this book. We point out only some relevant references

for the interested.

Convex sets. In Lang [23] basic properties and characterizations of convex

sets in Rn are presented. For a deeper study of convex sets using advanced

tools such as the HahnBanach theorem see Lax [25].

Tensor products and alternating forms. These topics are covered elegantly

by Halmos [18]. In particular, there, the determinant is seen to arise as

the unique scalar, associated with each linear mapping, defined by the onedimensional space of top-degree alternating forms, over a finite-dimensional

vector space.

Minmax principle for computing the eigenvalues of self-adjoint mappings.

This is a classical variational resolution of the eigenvalue problem known

as the method of the RayleighRitz quotients. For a thorough treatment see

Bellman [6], Lancaster and Tismenetsky [22], and Lax [25].

Calculus of matrix-valued functions. These techniques are useful and powerful in applications. For an introduction see Lax [25].

Irreducible matrices. Such a notion is crucial for extending the Perron

Frobenius theorem and for exploring the Markov matrices further under more

relaxed conditions. See Berman and Plemmons [7], Horn and Johnson [21],

Lancaster and Tismenetsky [22], Meyer [29], and Xu [38] for related studies.

Transformation groups and bilinear forms. Given a non-degenerate bilinear form over a finite-dimensional vector space, the set of all linear mappings

on the space which preserve the bilinear form is a group under the operation

of composition. With a specific choice of the bilinear form, a particular such

transformation group may thus be constructed and investigated. For a concise

introduction to this subject in the context of linear algebra see Hoffman and

Kunze [19].
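As a toy instance of this construction (not from the text; it takes the standard Euclidean bilinear form on R^n, whose form-preserving matrices are exactly the orthogonal matrices, i.e. those with M^t M = I), one can verify that two form-preserving mappings compose to a form-preserving mapping.

```python
import numpy as np

n = 4
G = np.eye(n)  # Gram matrix of the standard Euclidean bilinear form on R^n

# Two orthogonal matrices obtained from QR factorizations of random matrices.
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))

def preserves(M):
    # M preserves the form exactly when M^t G M = G.
    return np.allclose(M.T @ G @ M, G)

# Each factor preserves the form, and so does the composition: a group.
print(preserves(Q1), preserves(Q2), preserves(Q1 @ Q2))   # True True True
```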


Numerical methods for solving linear systems. Direct and iterative methods for solving systems of linear equations are well investigated and documented. See Lax [25], Golub and Ortega [13], and Stoer and Bulirsch [32] for a description of some of the methods.

Computing the eigenvalues of symmetric matrices. This is a highly developed subject and many effective methods are available. See Lax [25] for methods based on the QR factorization and differential flows. See also Stoer and Bulirsch [32].
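For orientation, here is a minimal sketch of the basic unshifted QR iteration (my own illustration, not the treatment in the references; practical codes add shifts and deflation), applied to a real symmetric matrix.

```python
import numpy as np

def qr_eigenvalues(A, iterations=200):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k where A_k = Q_k R_k.

    For a real symmetric matrix the iterates approach a (nearly) diagonal
    matrix whose diagonal entries are the eigenvalues.
    """
    Ak = np.array(A, dtype=float)
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.sort(np.diag(Ak))

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(qr_eigenvalues(S))
print(np.linalg.eigvalsh(S))   # reference values for comparison
```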

Computing the eigenvalues of general matrices. Some nicely formulated

iterative methods may be employed to approximate the eigenvalues of a general

matrix under certain conditions. These methods include the QR convergence

algorithm and the power method. See Golub and Ortega [13] and Stoer and

Bulirsch [32].
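Likewise, a minimal sketch of the power method (again my own illustration; it assumes the dominant eigenvalue is simple and strictly largest in modulus, and starts from a random vector):

```python
import numpy as np

def power_method(A, iterations=500, seed=0):
    """Approximate the dominant eigenvalue and eigenvector of A by repeated
    multiplication and normalization; the eigenvalue is the Rayleigh quotient."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(iterations):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_method(A)
print(lam)                         # approximates the dominant eigenvalue
print(np.linalg.eigvalsh(A)[-1])   # exact value for comparison
```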

Random matrices. The study of random matrices, matrices whose entries are random variables, was pioneered by Wigner [36, 37] to model the spectra of large atoms and has recently become the focus of active mathematical research. See Akemann, Baik, and Di Francesco [1], Anderson, Guionnet, and Zeitouni [2], Mehta [28], and Tao [33] for textbooks, and Beenakker [5], Diaconis [12], and Guhr, Müller-Groeling, and Weidenmüller [17] for survey articles.

In addition, for a rich variety of applications of linear algebra and related studies, see Bai, Fang, and Liang [3], Bapat [4], Bellman [6], Berman and Plemmons [7], Berry, Dumais, and O'Brien [8], Brualdi and Ryser [9], Datta [10], Davis [11], Gomide et al. [14], Graham [15], Graybill [16], Horadam [20], Latouche and Vaidyanathan [24], Leontief [26], Lyubich, Akin, Vulis, and Karpov [27], Meyn and Tweedie [30], Stinson [31], Taubes [34], Van Dooren and Wyman [35], and references therein.

References

[1] G. Akemann, J. Baik, and P. Di Francesco (eds.), The Oxford Handbook of Random Matrix Theory, Oxford University Press, Oxford, 2011.

[2] G. W. Anderson, A. Guionnet, and O. Zeitouni, An Introduction to Random

Matrices, Cambridge University Press, Cambridge, 2010.

[3] Z. Bai, Z. Fang, and Y.-C. Liang, Spectral Theory of Large Dimensional Random Matrices and Its Applications to Wireless Communications and Finance

Statistics, World Scientific, Singapore, 2014.

[4] R. B. Bapat, Linear Algebra and Linear Models, 3rd edn, Universitext, SpringerVerlag and Hindustan Book Agency, New Delhi, 2012.

[5] C. W. J. Beenakker, Random-matrix theory of quantum transport, Reviews of Modern Physics 69 (1997) 731-808.

[6] R. Bellman, Introduction to Matrix Analysis, 2nd edn, Society of Industrial and

Applied Mathematics, Philadelphia, 1997.

[7] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Society of Industrial and Applied Mathematics, Philadelphia, 1994.

[8] M. Berry, S. Dumais, and G. O'Brien, Using linear algebra for intelligent information retrieval, SIAM Review 37 (1995) 573-595.

[9] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory, Encyclopedia of

Mathematics and its Applications 39, Cambridge University Press, Cambridge,

1991.

[10] B. N. Datta, Numerical Linear Algebra and Applications, 2nd edn, Society of

Industrial and Applied Mathematics, Philadelphia, 2010.

[11] E. Davis, Linear Algebra and Probability for Computer Science Applications,

A. K. Peters/CRC Press, Boca Raton, FL, 2012.

[12] P. Diaconis, Patterns in eigenvalues: the 70th Josiah Willard Gibbs lecture, Bulletin of the American Mathematical Society (New Series) 40 (2003) 155-178.

[13] G. H. Golub and J. M. Ortega, Scientific Computing and Differential Equations,

Academic Press, Boston and New York, 1992.

[14] J. Gomide, R. Melo-Minardi, M. A. dos Santos, G. Neshich, W. Meira, Jr., J. C.

Lopes, and M. Santoro, Using linear algebra for protein structural comparison

and classification, Genetics and Molecular Biology 32 (2009) 645-651.

[15] A. Graham, Nonnegative Matrices and Applicable Topics in Linear Algebra,

John Wiley & Sons, New York, 1987.


[16] F. A. Graybill, Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company, Belmont, CA, 1969.

[17] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller, Random-matrix theories in quantum physics: common concepts, Physics Reports 299 (1998) 189-425.

[18] P. R. Halmos, Finite-Dimensional Vector Spaces, 2nd edn, Springer-Verlag, New

York, 1987.

[19] K. Hoffman and R. Kunze, Linear Algebra, Prentice-Hall, Englewood Cliffs,

NJ, 1965.

[20] K. J. Horadam, Hadamard Matrices and Their Applications, Princeton University Press, Princeton, NJ, 2007.

[21] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press,

Cambridge, New York, and Melbourne, 1985.

[22] P. Lancaster and M. Tismenetsky, The Theory of Matrices, 2nd edn, Academic

Press, San Diego, New York, London, Sydney, and Tokyo, 1985.

[23] S. Lang, Linear Algebra, 3rd edn, Springer-Verlag, New York, 1987.

[24] G. Latouche and R. Vaidyanathan, Introduction to Matrix Analytic Methods in

Stochastic Modeling, Society of Industrial and Applied Mathematics, Philadelphia, 1999.

[25] P. D. Lax, Linear Algebra and Its Applications, John Wiley & Sons, Hoboken,

NJ, 2007.

[26] W. Leontief, Input-Output Economics, Oxford University Press, New York,

1986.

[27] Y. I. Lyubich, E. Akin, D. Vulis, and A. Karpov, Mathematical Structures in

Population Genetics, Springer-Verlag, New York, 2011.

[28] M. L. Mehta, Random Matrices, Elsevier Academic Press, Amsterdam, 2004.

[29] C. Meyer, Matrix Analysis and Applied Linear Algebra, Society of Industrial

and Applied Mathematics, Philadelphia, 2000.

[30] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993; 2nd edn, Cambridge University Press,

Cambridge, 2009.

[31] D. R. Stinson, Cryptography, Discrete Mathematics and its Applications, Chapman & Hall/CRC Press, Boca Raton, FL, 2005.

[32] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag,

New York, Heidelberg, and Berlin, 1980.

[33] T. Tao, Topics in Random Matrix Theory, American Mathematical Society,

Providence, RI, 2012.

[34] C. H. Taubes, Lecture Notes on Probability, Statistics, and Linear Algebra,

Department of Mathematics, Harvard University, Cambridge, MA, 2010.

[35] P. Van Dooren and B. Wyman, Linear Algebra for Control Theory, IMA Volumes in Mathematics and its Applications, Springer-Verlag, New York, 2011.

[36] E. Wigner, Characteristic vectors of bordered matrices with infinite dimensions,

Annals of Mathematics 62 (1955) 548-564.

[37] E. Wigner, On the distribution of the roots of certain symmetric matrices, Annals

of Mathematics 67 (1958) 325-327.

[38] Y. Xu, Linear Algebra and Matrix Theory (in Chinese), 2nd edn, Higher Education Press, Beijing, 2008.

Index

1-form, 16

characteristic roots, 107

addition, 3

adjoint mapping, 50, 122

adjoint matrix, 103

adjugate matrix, 103

adjunct matrix, 103

algebraic multiplicity, 216

algebraic number, 13

angular frequencies, 255

annihilating polynomials, 221

annihilator, 19

anti-Hermitian mappings, 195

anti-Hermitian matrices, 7

anti-lower triangular matrix, 99

anti-self-adjoint, 126

anti-self-adjoint mappings, 195

anti-self-dual, 126

anti-symmetric, 4

anti-symmetric forms, 230

anti-upper triangular matrix, 99

basis, 13

basis change matrix, 15

basis orthogonalization, 117

basis transition matrix, 15, 45

Bessel inequality, 140

bilinear form, 147, 180

bilinear forms, 147

boxed diagonal matrix, 57

boxed lower triangular form, 99

boxed upper triangular form, 98

boxed upper triangular matrix, 56

bracket, 248

characteristic of a field, 2

characteristic polynomial, 107

characteristic polynomial of linear mapping,

109

Cholesky decomposition theorem, 168

Cholesky relation, 168

co-prime, 207

codimension, 27

cofactor, 89

cofactor expansion, 89

cokernel, 39

column rank of a matrix, 52

commutator, 256

complement, 22

complete set of orthogonal vectors, 139

component, 4

composition of mappings, 37

congruent matrices, 148

contravariant vectors, 18

convergent sequence, 28

convex, 164

coordinate vector, 14

coordinates, 14

covariant vectors, 18

Cramer's formulas, 80, 104

Cramer's rule, 80, 104

cyclic vector, 62, 217

cyclic vector space, 217

Darboux basis, 235

degree, 86

degree of a nilpotent mapping, 62

dense, 72

determinant, 79

determinant of a linear mapping, 105


diagonal matrix, 6

diagonalizable, 222

diagonally dominant condition, 101

dimension, 13

direct product of vector spaces, 22

direct sum, 21

direct sum of mappings, 58

dominant, 237

dot product, 5

doubly Markov matrices, 247

dual basis, 17

dual mapping, 122

dual space, 16

Heisenberg uncertainty principle, 258

Hermitian congruent, 181

Hermitian conjugate, 7, 134

Hermitian matrices, 7

Hermitian matrix, 134

Hermitian scalar product, 127

Hermitian sesquilinear form, 182

Hermitian skew-symmetric forms, 236

Hermitian symmetric matrices, 7

Hilbert-Schmidt norm of a matrix, 137

homogeneous, 148

hyponormal mappings, 199

eigenmodes, 255

eigenspace, 57

eigenvalue, 57

eigenvector, 57

Einstein formula, 255

entry, 4

equivalence of norms, 30

equivalence relation, 48

equivalent polynomials, 207

Euclidean scalar product, 124, 127

ideal, 205

idempotent linear mappings, 60

identity matrix, 6

image, 38

indefinite, 161

index, 83

index of negativity, 119

index of nilpotence, 62

index of nullity, 119

index of positivity, 119

infinite dimensional, 13

injective, 38

invariant subspace, 55

inverse mapping, 42

invertible linear mapping, 41

invertible matrix, 7

irreducible linear mapping, 56

irreducible polynomial, 207

isometric, 142

isometry, 142

isomorphic, 42

isomorphism, 42

isotropic subspace, 235

field, 1

field of characteristic 0, 2

field of characteristic p, 2

finite dimensional, 13

finite dimensionality, 13

finitely generated, 13

form, 16

Fourier coefficients, 139

Fourier expansion, 139

Fredholm alternative, 53, 137, 178

Frobenius inequality, 44

functional, 16

Fundamental Theorem of Algebra, 85

generalized eigenvectors, 216

generalized Schwarz inequality, 194

generated, 8

generic property, 72

geometric multiplicity, 57

Gram matrix, 136

Gram-Schmidt procedure, 117

greatest common divisor, 207

Gronwall inequality, 263

Hamiltonian, 254

hedgehog map, 88

Heisenberg equation, 263

Jordan canonical form, 220

Jordan decomposition theorem, 220

Jordan matrix, 220

kernel, 38

least squares approximation, 176

left inverse, 7, 42

Legendre polynomials, 141

Levy-Desplanques theorem, 101

limit, 28

linear complement, 22

linear function, 16

linear span, 8


linear transformation, 55

linearly dependent, 8, 9

linearly independent, 10

linearly spanned, 8

locally nilpotent mappings, 62

lower triangular matrix, 6

mapping addition, 35

Markov matrices, 243

matrix, 4

matrix exponential, 76

matrix multiplication, 5

matrix-valued initial value problem, 76

maximum uncertainty, 261

maximum uncertainty state, 261

Measurement postulate, 252

metric matrix, 136

minimal polynomial, 67, 221

Minkowski metric, 120

Minkowski scalar product, 120

Minkowski theorem, 101

minor, 89

mutually complementary, 22

negative definite, 161

negative semi-definite, 161

nilpotent mappings, 62

non-definite, 161

non-degenerate, 120

non-negative, 161

non-negative matrices, 237

non-negative vectors, 237

non-positive, 161

nonsingular matrix, 7

norm, 28

norm that is stronger, 29

normal equation, 176

normal mappings, 172, 194, 195

normal matrices, 197

normed space, 28

null vector, 115

null-space, 38

nullity, 39

nullity-rank equation, 40

observable postulate, 252

observables, 252

one-parameter group, 74

one-to-one, 38

onto, 38

open, 73

orthogonal, 115

orthogonal mapping, 123, 132


orthogonal matrices, 7

orthogonal matrix, 134

orthonormal basis, 119, 130

Parseval identity, 140

Pauli matrices, 257

period, 62

permissible column operations, 95

permissible row operations, 92

perpendicular, 115

Perron-Frobenius theorem, 237

photoelectric effect, 255

Planck constant, 254

polar decomposition, 198

polarization identities, 149

polarization identity, 132

positive definite Hermitian mapping, 188

positive definite Hermitian matrix, 188

positive definite quadratic form, 188

positive definite quadratic forms, 158

positive definite scalar product over a complex

vector space, 128

positive definite scalar product over a real

vector space, 128

positive definite self-adjoint mapping, 188

positive definite self-adjoint mappings, 158

positive definite symmetric matrices, 158

positive diagonally dominant condition, 101

positive matrices, 237

positive semi-definite, 161

positive vectors, 237

preimages, 38

prime polynomial, 207

principal minors, 166

product of matrices, 5

projection, 60

proper subspace, 8

Pythagoras theorem, 129

QR factorization, 134

quadratic form, 148, 181

quotient space, 26

random variable, 252

range, 38

rank, 39

rank equation, 40

rank of a matrix, 52

reducibility, 56

reducible linear mapping, 56

reflective, 19

reflectivity, 19

regular Markov matrices, 243


relatively prime, 207

Riesz isomorphism, 122

Riesz representation theorem, 121

right inverse, 7, 42

row rank of a matrix, 52

scalar multiple of a functional, 16

scalar multiplication, 3

scalar product, 115

scalar-mapping multiplication, 35

scalars, 1, 3

Schrödinger equation, 254

Schrödinger picture, 262

Schur decomposition theorem, 226

Schwarz inequality, 129

self-adjoint mapping, 122

self-dual mapping, 122

self-dual space, 122

sesquilinear form, 180

shift matrix, 65

signed area, 79

signed volume, 80

similar matrices, 48

singular value decomposition for a mapping,

202

singular value decomposition for a matrix, 203

singular values of a mapping, 202

singular values of a matrix, 203

skew-symmetric, 4

skew-Hermitian forms, 236

skew-Hermitian matrices, 7

skew-symmetric forms, 230

special relativity, 120

square matrix, 4

stable Markov matrices, 247

stable matrix, 246

standard deviation, 259

state space, 252

states, 252

stochastic matrices, 243

subspace, 8

sum of functionals, 16

sum of subspaces, 20

surjective, 38

Sylvester inequality, 44

Sylvester theorem, 118

symmetric, 4

symmetric bilinear forms, 149

symplectic basis, 235

symplectic complement, 236

symplectic forms, 235

symplectic vector space, 235

time evolution postulate, 253

topological invariant, 82

transcendental number, 13

transpose, 4

uncertainty, 259

uniform state, 261

unit matrix, 6

unitary mapping, 132

unitary matrices, 7

unitary matrix, 134

upper triangular matrix, 6

Vandermonde determinant, 105

variance, 257

variational principle, 164

vector space, 1

vector space over a field, 3

vectors, 1, 3

wave function, 252

wave-particle duality hypothesis, 255
