
Matrix Analysis for Scientists & Engineers


Matrix Analysis for Scientists & Engineers

Alan J. Laub
University of California
Davis, California

SIAM

Copyright 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, info@mathworks.com, www.mathworks.com
Mathematica is a registered trademark of Wolfram Research, Inc.

Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.
Library of Congress Cataloging-in-Publication Data

Laub, Alan J., 1948-
    Matrix analysis for scientists and engineers / Alan J. Laub.
        p. cm.
    Includes bibliographical references and index.
    ISBN 0-89871-576-8 (pbk.)
    1. Matrices. 2. Mathematical analysis. I. Title.

QA188.L38 2005
512.9'434-dc22                                          2004059962

About the cover: The original artwork featured on the cover was created by freelance artist Aaron Tallon of Philadelphia, PA. Used by permission.

SIAM is a registered trademark.

To my wife, Beverley
(who captivated me in the UBC math library nearly forty years ago)


Contents

Preface

1  Introduction and Review
   1.1  Some Notation and Terminology
   1.2  Matrix Arithmetic
   1.3  Inner Products and Orthogonality
   1.4  Determinants

2  Vector Spaces
   2.1  Definitions and Examples
   2.2  Subspaces
   2.3  Linear Independence
   2.4  Sums and Intersections of Subspaces

3  Linear Transformations
   3.1  Definition and Examples
   3.2  Matrix Representation of Linear Transformations
   3.3  Composition of Transformations
   3.4  Structure of Linear Transformations
   3.5  Four Fundamental Subspaces

4  Introduction to the Moore-Penrose Pseudoinverse
   4.1  Definitions and Characterizations
   4.2  Examples
   4.3  Properties and Applications

5  Introduction to the Singular Value Decomposition
   5.1  The Fundamental Theorem
   5.2  Some Basic Properties
   5.3  Row and Column Compressions

6  Linear Equations
   6.1  Vector Linear Equations
   6.2  Matrix Linear Equations
   6.3  A More General Matrix Linear Equation
   6.4  Some Useful and Interesting Inverses

7  Projections, Inner Product Spaces, and Norms
   7.1  Projections
        7.1.1  The four fundamental orthogonal projections
   7.2  Inner Product Spaces
   7.3  Vector Norms
   7.4  Matrix Norms

8  Linear Least Squares Problems
   8.1  The Linear Least Squares Problem
   8.2  Geometric Solution
   8.3  Linear Regression and Other Linear Least Squares Problems
        8.3.1  Example: Linear regression
        8.3.2  Other least squares problems
   8.4  Least Squares and Singular Value Decomposition
   8.5  Least Squares and QR Factorization

9  Eigenvalues and Eigenvectors
   9.1  Fundamental Definitions and Properties
   9.2  Jordan Canonical Form
   9.3  Determination of the JCF
        9.3.1  Theoretical computation
        9.3.2  On the +1's in JCF blocks
   9.4  Geometric Aspects of the JCF
   9.5  The Matrix Sign Function

10 Canonical Forms
   10.1  Some Basic Canonical Forms
   10.2  Definite Matrices
   10.3  Equivalence Transformations and Congruence
         10.3.1  Block matrices and definiteness
   10.4  Rational Canonical Form

11 Linear Differential and Difference Equations
   11.1  Differential Equations
         11.1.1  Properties of the matrix exponential
         11.1.2  Homogeneous linear differential equations
         11.1.3  Inhomogeneous linear differential equations
         11.1.4  Linear matrix differential equations
         11.1.5  Modal decompositions
         11.1.6  Computation of the matrix exponential
   11.2  Difference Equations
         11.2.1  Homogeneous linear difference equations
         11.2.2  Inhomogeneous linear difference equations
         11.2.3  Computation of matrix powers
   11.3  Higher-Order Equations

12 Generalized Eigenvalue Problems
   12.1  The Generalized Eigenvalue/Eigenvector Problem
   12.2  Canonical Forms
   12.3  Application to the Computation of System Zeros
   12.4  Symmetric Generalized Eigenvalue Problems
   12.5  Simultaneous Diagonalization
         12.5.1  Simultaneous diagonalization via SVD
   12.6  Higher-Order Eigenvalue Problems
         12.6.1  Conversion to first-order form

13 Kronecker Products
   13.1  Definition and Examples
   13.2  Properties of the Kronecker Product
   13.3  Application to Sylvester and Lyapunov Equations

Bibliography

Index


Preface

This book is intended to be used as a text for beginning graduate-level (or even senior-level) students in engineering, the sciences, mathematics, computer science, or computational science who wish to be familiar with enough matrix analysis that they are prepared to use its tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear algebra and matrix theory together with their intrinsic interaction with and application to linear dynamical systems (systems of linear differential or difference equations). The text can be used in a one-quarter or one-semester course to provide a compact overview of much of the important and useful mathematics that, in many cases, students meant to learn thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics that may have been treated cursorily in undergraduate courses are treated in more depth and more advanced material is introduced. I have tried throughout to emphasize only the more important and "useful" tools, methods, and mathematical structures. Instructors are encouraged to supplement the book with specific application examples from their own particular subject area.

The choice of topics covered in linear algebra and matrix theory is motivated both by applications and by computational utility and relevance. The concept of matrix factorization is emphasized throughout to provide a foundation for a later course in numerical linear algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3 do cover some geometric (i.e., basis-free or subspace) aspects of many of the fundamental notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24] are excellent companion texts for this book. Upon completion of a course based on this text, the student is then well-equipped to pursue, either via formal courses or through self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for example) or on the theoretical side (at the level of [12], [13], or [16], for example).

Prerequisites for using this text are quite modest: essentially just an understanding of calculus and definitely some previous exposure to matrices and linear algebra. Basic concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and positive definite matrices should have been covered at least once, even though their recollection may occasionally be "hazy." However, requiring such material as prerequisite permits the early (but "out-of-order" by conventional standards) introduction of topics such as pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools can then be exploited to provide a unifying foundation upon which to base subsequent topics. Because tools such as the SVD are not generally amenable to "hand computation," this approach necessarily presupposes the availability of appropriate mathematical software on a digital computer. For this, I highly recommend MATLAB although other software such as Mathematica or Mathcad is also excellent. Since this text is not intended for a course in numerical linear algebra per se, the details of most of the numerical aspects of linear algebra are deferred to such a course.

The presentation of the material in this book is strongly influenced by computational issues for two principal reasons. First, "real-life" problems seldom yield to simple closed-form formulas or solutions. They must generally be solved computationally and it is important to know which types of algorithms can be relied upon and which cannot. Some of the key algorithms of numerical linear algebra, in particular, form the foundation upon which rests virtually all of modern scientific and engineering computation. A second motivation for a computational emphasis is that it provides many of the essential tools for what I call "qualitative mathematics." For example, in an elementary linear algebra course, a set of vectors is either linearly independent or it is not. This is an absolutely fundamental concept. But in most engineering or scientific contexts we want to know more than that. If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they are linearly dependent, are there "best" linearly independent subsets? These turn out to be much more difficult problems and frequently involve research-level questions when set in the context of the finite-precision, finite-range floating-point arithmetic environment of most modern computing platforms.

Some of the applications of matrix analysis mentioned briefly in this book derive from the modern state-space approach to dynamical systems. State-space methods are now standard in much of modern engineering where, for example, control systems with large numbers of interacting inputs, outputs, and states often give rise to models of very high order that must be analyzed, simulated, and evaluated. The "language" in which such models are conveniently described involves vectors and matrices. It is thus crucial to acquire a working knowledge of the vocabulary and grammar of this language. The tools of matrix analysis are also applied on a daily basis to problems in biology, chemistry, econometrics, physics, statistics, and a wide variety of other fields, and thus the text can serve a rather diverse audience. Mastery of the material in this text should enable the student to read and understand the modern language of matrices used throughout mathematics, science, and engineering.

While prerequisites for this text are modest, and while most material is developed from basic ideas in the book, the student does require a certain amount of what is conventionally referred to as "mathematical maturity." Proofs are given for many theorems. When they are not given explicitly, they are either obvious or easily found in the literature. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby. It is my firm conviction that such maturity is neither encouraged nor nurtured by relegating the mathematical aspects of applications (for example, linear algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when necessary. Rather, one must lay a firm foundation upon which subsequent applications and perspectives can be built in a logical, consistent, and coherent fashion.

I have taught this material for many years, many times at UCSB and twice at UC Davis, and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who completed the course, especially the first few times it was offered, remarked afterward that if only they had had this course before they took linear systems, or signal processing, or estimation theory, etc., they would have been able to concentrate on the new ideas they wanted to learn, rather than having to spend time making up for deficiencies in their background in matrices and linear algebra. My fellow instructors, too, realized that by requiring this course as a prerequisite, they no longer had to provide as much time for "review" and could focus instead on the subject at hand. The concept seems to work.

-AJL, June 2004


Chapter 1

Introduction and Review

1.1  Some Notation and Terminology

We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

The following sets appear frequently throughout subsequent chapters:

1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n means

       x = [x_1; ...; x_n],

   where x_i ∈ R for i ∈ n. Henceforth, the notation n denotes the set {1, ..., n}.

   Note: Vectors are always column vectors. A row vector is denoted by y^T, where y ∈ R^n and the superscript T is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while x y^T is an n × n matrix.

2. C^n = the set of n-tuples of complex numbers represented as column vectors.

3. R^{m×n} = the set of real (or real-valued) m × n matrices.

4. R_r^{m×n} = the set of real m × n matrices of rank r. Thus, R_n^{n×n} denotes the set of real nonsingular n × n matrices.

5. C^{m×n} = the set of complex (or complex-valued) m × n matrices.

6. C_r^{m×n} = the set of complex m × n matrices of rank r.

We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n×n} (or A ∈ C^{n×n}) is

    diagonal if a_ij = 0 for i ≠ j.
    upper triangular if a_ij = 0 for i > j.
    lower triangular if a_ij = 0 for i < j.
    tridiagonal if a_ij = 0 for |i - j| > 1.
    pentadiagonal if a_ij = 0 for |i - j| > 2.
    upper Hessenberg if a_ij = 0 for i - j > 1.
    lower Hessenberg if a_ij = 0 for j - i > 1.

Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{m×m}, then the (m + n) × (m + n) matrix [A B; 0 C] is block upper triangular.

The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry is the (j, i)th entry of A, that is, (A^T)_ij = a_ji. Note that if A ∈ R^{m×n}, then A^T ∈ R^{n×m}. If A ∈ C^{m×n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or sometimes A*) and its (i, j)th entry is (A^H)_ij = ā_ji, where the bar indicates complex conjugation; i.e., if z = α + jβ (j = i = √-1), then z̄ = α - jβ. A matrix A is symmetric if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless otherwise noted, an equation like A = A^T implies that A is real-valued while a statement like A = A^H implies that A is complex-valued.
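These entrywise conditions translate directly into code. The sketch below (plain Python; the predicate names are ours, not the book's) checks a square matrix, stored as a list of rows, against a few of the definitions above:

```python
def is_upper_triangular(A):
    # a_ij = 0 for i > j
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i > j)

def is_tridiagonal(A):
    # a_ij = 0 for |i - j| > 1
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if abs(i - j) > 1)

def is_upper_hessenberg(A):
    # a_ij = 0 for i - j > 1
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i - j > 1)

T = [[1, 2, 3],
     [0, 4, 5],
     [0, 0, 6]]   # upper triangular (hence also upper Hessenberg)
H = [[1, 2, 3],
     [4, 5, 6],
     [0, 7, 8]]   # upper Hessenberg but not upper triangular

print(is_upper_triangular(T), is_upper_hessenberg(T))  # True True
print(is_upper_triangular(H), is_upper_hessenberg(H))  # False True
print(is_tridiagonal(T))                               # False (a_13 = 3 != 0)
```

Note that every upper triangular matrix satisfies the upper Hessenberg condition as well, since i - j > 1 implies i > j.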

Remark 1.1. While √-1 is most commonly denoted by i in mathematics texts, j is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation j is used throughout the text but reminders are placed at strategic locations.
Example 1.2.

1. A = [5 3; 3 5] is symmetric (and Hermitian).

2. A = [3 7+j; 7+j 2j] is complex-valued symmetric but not Hermitian.

3. A = [5 7-j; 7+j 2] is Hermitian (but not symmetric).

Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if A_ij are appropriately dimensioned subblocks, then

    [A_11 A_12; A_21 A_22]^T = [A_11^T A_21^T; A_12^T A_22^T].
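As a quick numerical check of this block-transpose identity (plain Python; the helper names are ours), assemble a 2 × 2 block matrix, transpose it, and compare with the block matrix built from transposed, swapped-off-diagonal subblocks:

```python
def transpose(M):
    # transpose of a matrix stored as a list of rows
    return [list(row) for row in zip(*M)]

def blocks2x2(A11, A12, A21, A22):
    # assemble [[A11, A12], [A21, A22]] into one matrix
    top = [r1 + r2 for r1, r2 in zip(A11, A12)]
    bot = [r1 + r2 for r1, r2 in zip(A21, A22)]
    return top + bot

A11 = [[1, 2], [3, 4]]; A12 = [[5], [6]]   # 2x2 and 2x1
A21 = [[7, 8]];         A22 = [[9]]        # 1x2 and 1x1

lhs = transpose(blocks2x2(A11, A12, A21, A22))
rhs = blocks2x2(transpose(A11), transpose(A21),
                transpose(A12), transpose(A22))
print(lhs == rhs)  # True
```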

1.2  Matrix Arithmetic

It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column vector x, i.e., the matrix-vector product Ax. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose

    A = [a_1, ..., a_n] ∈ R^{m×n} with a_i ∈ R^m  and  x = [x_1; ...; x_n] ∈ R^n.

Then

    Ax = x_1 a_1 + ... + x_n a_n ∈ R^m.

The importance of this interpretation cannot be overemphasized. As a numerical example, take

    A = [9 8 7; 6 5 4],  x = [3; 2; 1].

Then we can quickly calculate dot products of the rows of A with the column x to find Ax = [50; 32], but this matrix-vector product can also be computed via

    3·[9; 6] + 2·[8; 5] + 1·[7; 4].

For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.

For matrix multiplication, suppose A ∈ R^{m×n} and B = [b_1, ..., b_p] ∈ R^{n×p} with b_i ∈ R^n. Then the matrix product AB can be thought of as above, applied p times:

    AB = [Ab_1, ..., Ab_p].
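The two views of the product Ax in the example above can be sketched as follows (plain Python for illustration; a real computation would of course use an optimized library):

```python
A = [[9, 8, 7],
     [6, 5, 4]]
x = [3, 2, 1]

# Row-oriented: dot product of each row of A with x.
by_rows = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Column-oriented: weighted sum of the columns, Ax = x1*a1 + x2*a2 + x3*a3.
m = len(A)
by_cols = [0] * m
for j, x_j in enumerate(x):
    for i in range(m):
        by_cols[i] += x_j * A[i][j]

print(by_rows)  # [50, 32]
print(by_cols)  # [50, 32]
```

The column-oriented loop touches the data in a different order than the row-oriented one, which is exactly the source of the architecture-related advantages mentioned above.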

There is also an alternative, but equivalent, formulation of matrix multiplication that appears frequently in the text and is presented below as a theorem. Again, its importance cannot be overemphasized. It is deceptively simple and its full understanding is well rewarded.

Theorem 1.3. Let U = [u_1, ..., u_n] ∈ R^{m×n} with u_i ∈ R^m and V = [v_1, ..., v_n] ∈ R^{p×n} with v_i ∈ R^p. Then

    U V^T = Σ_{i=1}^{n} u_i v_i^T ∈ R^{m×p}.
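Theorem 1.3 can be verified numerically on a small example. The sketch below (plain Python; the helper names are ours) literally forms the sum of outer products u_i v_i^T and compares it with U V^T computed entrywise:

```python
def outer(u, v):
    # u v^T for column vectors u (length m) and v (length p): an m x p matrix
    return [[ui * vj for vj in v] for ui in u]

def madd(A, B):
    # entrywise sum of two matrices of the same shape
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def columns(M):
    # columns of M as a list of vectors
    return [list(c) for c in zip(*M)]

U = [[1, 2], [3, 4], [5, 6]]   # columns u1, u2 in R^3
V = [[7, 8], [9, 10]]          # columns v1, v2 in R^2

u, v = columns(U), columns(V)
S = outer(u[0], v[0])
for i in range(1, len(u)):     # accumulate sum of outer products u_i v_i^T
    S = madd(S, outer(u[i], v[i]))

# (U V^T)_{rc} = sum_k U[r][k] * V[c][k], computed directly.
direct = [[sum(U[r][k] * V[c][k] for k in range(len(u)))
           for c in range(len(V))] for r in range(len(U))]
print(S == direct)  # True
```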

If matrices C and D are compatible for multiplication, recall that (CD)^T = D^T C^T (or (CD)^H = D^H C^H). This gives a dual to the matrix-vector result above. Namely, if C ∈ R^{m×n} has row vectors c_j^T ∈ R^{1×n}, and is premultiplied by a row vector y^T ∈ R^{1×m}, then the product can be written as a weighted linear sum of the rows of C as follows:

    y^T C = y_1 c_1^T + ... + y_m c_m^T ∈ R^{1×n}.

Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.
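The row version can be illustrated the same way: y^T C computed directly agrees with the weighted sum of the rows of C (plain Python; variable names are ours):

```python
C = [[1, 2, 3],
     [4, 5, 6]]   # rows c1^T, c2^T
y = [10, 20]

# Direct product y^T C (1 x m times m x n -> 1 x n).
direct = [sum(y[i] * C[i][j] for i in range(len(C)))
          for j in range(len(C[0]))]

# Weighted sum of the rows: y^T C = y1*c1^T + y2*c2^T.
row_sum = [0] * len(C[0])
for y_i, row in zip(y, C):
    row_sum = [s + y_i * c for s, c in zip(row_sum, row)]

print(direct)   # [90, 120, 150]
print(row_sum)  # [90, 120, 150]
```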

1.3  Inner Products and Orthogonality

For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and y is given by

    (x, y) := x^T y = Σ_{i=1}^{n} x_i y_i.

Note that the inner product is a scalar.
If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product, for short) by

    (x, y)_c := x^H y = Σ_{i=1}^{n} x̄_i y_i.

Note that (x, y)_c equals the complex conjugate of (y, x)_c, i.e., the order in which x and y appear in the complex inner product is important. The more conventional definition of the complex inner product is (x, y)_c = y^H x = Σ_{i=1}^{n} x_i ȳ_i but throughout the text we prefer the symmetry with the real case.

Example 1.4. Let x = [1; j] and y = [1; 2]. Then

    (x, y)_c = [1 j]^H [1; 2] = [1 -j] [1; 2] = 1 - 2j

while

    (y, x)_c = [1 2]^H [1; j] = [1 2] [1; j] = 1 + 2j

and we see that, indeed, (x, y)_c is the complex conjugate of (y, x)_c.
Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n. What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider the nonzero vector x above. Then x^T x = 0 but x^H x = 2.
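These observations are easy to reproduce with Python's built-in complex numbers (the helper inner_c is ours, mirroring the definition (x, y)_c = x^H y):

```python
def inner_c(x, y):
    # (x, y)_c := x^H y = sum of conj(x_i) * y_i
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

x = [1 + 0j, 1j]       # the vector x = [1; j] of Example 1.4
y = [1 + 0j, 2 + 0j]   # the vector y = [1; 2]

print(inner_c(x, y))                               # (1-2j)
print(inner_c(y, x))                               # (1+2j)
print(inner_c(x, y) == inner_c(y, x).conjugate())  # True

# x^T x can vanish for a nonzero complex vector, but x^H x cannot:
print(sum(xi * xi for xi in x))  # 0j      (x^T x)
print(inner_c(x, x))             # (2+0j)  (x^H x)
```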
Two nonzero vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal if their inner product is zero, i.e., $x^T y = 0$. Nonzero complex vectors are orthogonal if $x^H y = 0$. If $x$ and $y$ are orthogonal and $x^T x = 1$ and $y^T y = 1$, then we say that $x$ and $y$ are orthonormal. A matrix $A \in \mathbb{R}^{n \times n}$ is an orthogonal matrix if $A^T A = A A^T = I$, where $I$ is the $n \times n$ identity matrix. The notation $I_n$ is sometimes used to denote the identity matrix in $\mathbb{R}^{n \times n}$ (or $\mathbb{C}^{n \times n}$). Similarly, a matrix $A \in \mathbb{C}^{n \times n}$ is said to be unitary if $A^H A = A A^H = I$. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix $A \in \mathbb{R}^{m \times n}$ (or $\mathbb{C}^{m \times n}$) with orthonormal rows or columns.
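A plane rotation is the standard concrete example of an orthogonal matrix. The sketch below (with ad hoc `matmul` and `transpose` helpers over plain lists, not anything from the text) checks $A^T A \approx I$:

```python
import math

# A rotation matrix is the classic orthogonal matrix: its rows and
# columns are orthonormal, so A^T A = I.
t = 0.3
A = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

AtA = matmul(transpose(A), A)
print(AtA)  # approximately the 2 x 2 identity
```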

1.4 Determinants

It is assumed that the reader is familiar with the basic theory of determinants. For $A \in \mathbb{R}^{n \times n}$ (or $A \in \mathbb{C}^{n \times n}$) we use the notation $\det A$ for the determinant of $A$. We list below some of the more useful properties of determinants. Note that this is not a minimal set, i.e., several properties are consequences of one or more of the others.
1. If $A$ has a zero row or if any two rows of $A$ are equal, then $\det A = 0$.

2. If $A$ has a zero column or if any two columns of $A$ are equal, then $\det A = 0$.

3. Interchanging two rows of $A$ changes only the sign of the determinant.

4. Interchanging two columns of $A$ changes only the sign of the determinant.

5. Multiplying a row of $A$ by a scalar $\alpha$ results in a new matrix whose determinant is $\alpha \det A$.

6. Multiplying a column of $A$ by a scalar $\alpha$ results in a new matrix whose determinant is $\alpha \det A$.

7. Multiplying a row of $A$ by a scalar and then adding it to another row does not change the determinant.

8. Multiplying a column of $A$ by a scalar and then adding it to another column does not change the determinant.

9. $\det A^T = \det A$ ($\det A^H = \overline{\det A}$ if $A \in \mathbb{C}^{n \times n}$).

10. If $A$ is diagonal, then $\det A = a_{11} a_{22} \cdots a_{nn}$, i.e., $\det A$ is the product of its diagonal elements.

11. If $A$ is upper triangular, then $\det A = a_{11} a_{22} \cdots a_{nn}$.

12. If $A$ is lower triangular, then $\det A = a_{11} a_{22} \cdots a_{nn}$.

13. If $A$ is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks $A_{11}, A_{22}, \ldots, A_{nn}$ (of possibly different sizes), then $\det A = \det A_{11} \det A_{22} \cdots \det A_{nn}$.

14. If $A, B \in \mathbb{R}^{n \times n}$, then $\det(AB) = \det A \det B$.

15. If $A \in \mathbb{R}^{n \times n}_n$ (i.e., $A$ is nonsingular), then $\det(A^{-1}) = \frac{1}{\det A}$.

16. If $A \in \mathbb{R}^{n \times n}_n$ and $D \in \mathbb{R}^{m \times m}$, then $\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det A \, \det(D - C A^{-1} B)$.

    Proof: This follows easily from the block LU factorization

    $$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ C A^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ 0 & D - C A^{-1} B \end{bmatrix}.$$

17. If $A \in \mathbb{R}^{n \times n}$ and $D \in \mathbb{R}^{m \times m}_m$, then $\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det D \, \det(A - B D^{-1} C)$.

    Proof: This follows easily from the block UL factorization

    $$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & B D^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} A - B D^{-1} C & 0 \\ C & D \end{bmatrix}.$$

Remark 1.5. The factorization of a matrix $A$ into the product of a unit lower triangular matrix $L$ (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix $U$ is called an LU factorization; see, for example, [24]. Another such factorization is UL where $U$ is unit upper triangular and $L$ is lower triangular. The factorizations used above are block analogues of these.

Remark 1.6. The matrix $D - C A^{-1} B$ is called the Schur complement of $A$ in $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$. Similarly, $A - B D^{-1} C$ is the Schur complement of $D$ in $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.
EXERCISES

1. If $A \in \mathbb{R}^{n \times n}$ and $\alpha$ is a scalar, what is $\det(\alpha A)$? What is $\det(-A)$?

2. If $A$ is orthogonal, what is $\det A$? If $A$ is unitary, what is $\det A$?

3. Let $x, y \in \mathbb{R}^n$. Show that $\det(I - x y^T) = 1 - y^T x$.

4. Let $U_1, U_2, \ldots, U_k \in \mathbb{R}^{n \times n}$ be orthogonal matrices. Show that the product $U = U_1 U_2 \cdots U_k$ is an orthogonal matrix.

5. Let $A \in \mathbb{R}^{n \times n}$. The trace of $A$, denoted $\mathrm{Tr}\, A$, is defined as the sum of its diagonal elements, i.e., $\mathrm{Tr}\, A = \sum_{i=1}^{n} a_{ii}$.

   (a) Show that the trace is a linear function; i.e., if $A, B \in \mathbb{R}^{n \times n}$ and $\alpha, \beta \in \mathbb{R}$, then $\mathrm{Tr}(\alpha A + \beta B) = \alpha \, \mathrm{Tr}\, A + \beta \, \mathrm{Tr}\, B$.

   (b) Show that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, even though in general $AB \neq BA$.

   (c) Let $S \in \mathbb{R}^{n \times n}$ be skew-symmetric, i.e., $S^T = -S$. Show that $\mathrm{Tr}\, S = 0$. Then either prove the converse or provide a counterexample.

6. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be idempotent if $A^2 = A$.

   (a) Show that the matrix $A = \frac{1}{2} \begin{bmatrix} 2 \cos^2 \theta & \sin 2\theta \\ \sin 2\theta & 2 \sin^2 \theta \end{bmatrix}$ is idempotent for all $\theta$.

   (b) Suppose $A \in \mathbb{R}^{n \times n}$ is idempotent and $A \neq I$. Show that $A$ must be singular.

Chapter 2

Vector Spaces

In this chapter we give a brief review of some of the basic concepts of vector spaces. The emphasis is on finite-dimensional vector spaces, including spaces formed by special classes of matrices, but some infinite-dimensional examples are also cited. An excellent reference for this and the next chapter is [10], where some of the proofs that are not given here may be found.

2.1 Definitions and Examples

Definition 2.1. A field is a set $\mathbb{F}$ together with two operations $+, \cdot : \mathbb{F} \times \mathbb{F} \to \mathbb{F}$ such that

(A1) $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$ for all $\alpha, \beta, \gamma \in \mathbb{F}$.

(A2) there exists an element $0 \in \mathbb{F}$ such that $\alpha + 0 = \alpha$ for all $\alpha \in \mathbb{F}$.

(A3) for all $\alpha \in \mathbb{F}$, there exists an element $(-\alpha) \in \mathbb{F}$ such that $\alpha + (-\alpha) = 0$.

(A4) $\alpha + \beta = \beta + \alpha$ for all $\alpha, \beta \in \mathbb{F}$.

(M1) $\alpha \cdot (\beta \cdot \gamma) = (\alpha \cdot \beta) \cdot \gamma$ for all $\alpha, \beta, \gamma \in \mathbb{F}$.

(M2) there exists an element $1 \in \mathbb{F}$ such that $\alpha \cdot 1 = \alpha$ for all $\alpha \in \mathbb{F}$.

(M3) for all $\alpha \in \mathbb{F}$, $\alpha \neq 0$, there exists an element $\alpha^{-1} \in \mathbb{F}$ such that $\alpha \cdot \alpha^{-1} = 1$.

(M4) $\alpha \cdot \beta = \beta \cdot \alpha$ for all $\alpha, \beta \in \mathbb{F}$.

(D) $\alpha \cdot (\beta + \gamma) = \alpha \cdot \beta + \alpha \cdot \gamma$ for all $\alpha, \beta, \gamma \in \mathbb{F}$.

Axioms (A1)-(A3) state that $(\mathbb{F}, +)$ is a group and an abelian group if (A4) also holds. Axioms (M1)-(M4) state that $(\mathbb{F} \setminus \{0\}, \cdot)$ is an abelian group.

Generally speaking, when no confusion can arise, the multiplication operator "$\cdot$" is not written explicitly.
Example 2.2.

1. $\mathbb{R}$ with ordinary addition and multiplication is a field.

2. $\mathbb{C}$ with ordinary complex addition and multiplication is a field.

3. Ra[x] = the field of rational functions in the indeterminate $x$

$$= \left\{ \frac{\alpha_0 + \alpha_1 x + \cdots + \alpha_p x^p}{\beta_0 + \beta_1 x + \cdots + \beta_q x^q} : \alpha_i, \beta_i \in \mathbb{R};\ p, q \in \mathbb{Z}^+ \right\},$$

where $\mathbb{Z}^+ = \{0, 1, 2, \ldots\}$, is a field.

4. $\mathbb{R}^{m \times n}_r = \{m \times n$ matrices of rank $r$ with real coefficients$\}$ is clearly not a field since, for example, (M1) does not hold unless $m = n$. Moreover, $\mathbb{R}^{n \times n}_n$ is not a field either since (M4) does not hold in general (although the other 8 axioms hold).

Definition 2.3. A vector space over a field $\mathbb{F}$ is a set $V$ together with two operations $+ : V \times V \to V$ and $\cdot : \mathbb{F} \times V \to V$ such that

(V1) $(V, +)$ is an abelian group.

(V2) $(\alpha \cdot \beta) \cdot v = \alpha \cdot (\beta \cdot v)$ for all $\alpha, \beta \in \mathbb{F}$ and for all $v \in V$.

(V3) $(\alpha + \beta) \cdot v = \alpha \cdot v + \beta \cdot v$ for all $\alpha, \beta \in \mathbb{F}$ and for all $v \in V$.

(V4) $\alpha \cdot (v + w) = \alpha \cdot v + \alpha \cdot w$ for all $\alpha \in \mathbb{F}$ and for all $v, w \in V$.

(V5) $1 \cdot v = v$ for all $v \in V$ ($1 \in \mathbb{F}$).

A vector space is denoted by $(V, \mathbb{F})$ or, when there is no possibility of confusion as to the underlying field, simply by $V$.

Remark 2.4. Note that $+$ and $\cdot$ in Definition 2.3 are different from the $+$ and $\cdot$ in Definition 2.1 in the sense of operating on different objects in different sets. In practice, this causes no confusion and the $\cdot$ operator is usually not even written explicitly.
and the
Example 2.5.

1. $(\mathbb{R}^n, \mathbb{R})$ with addition defined by

$$v + w = \begin{bmatrix} v_1 + w_1 \\ \vdots \\ v_n + w_n \end{bmatrix}$$

and scalar multiplication defined by

$$\alpha v = \begin{bmatrix} \alpha v_1 \\ \vdots \\ \alpha v_n \end{bmatrix}$$

is a vector space. Similar definitions hold for $(\mathbb{C}^n, \mathbb{C})$.

2. $(\mathbb{R}^{m \times n}, \mathbb{R})$ is a vector space with addition defined by

$$A + B = \begin{bmatrix} \alpha_{11} + \beta_{11} & \alpha_{12} + \beta_{12} & \cdots & \alpha_{1n} + \beta_{1n} \\ \alpha_{21} + \beta_{21} & \alpha_{22} + \beta_{22} & \cdots & \alpha_{2n} + \beta_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{m1} + \beta_{m1} & \alpha_{m2} + \beta_{m2} & \cdots & \alpha_{mn} + \beta_{mn} \end{bmatrix}$$

and scalar multiplication defined by

$$\gamma A = \begin{bmatrix} \gamma \alpha_{11} & \gamma \alpha_{12} & \cdots & \gamma \alpha_{1n} \\ \gamma \alpha_{21} & \gamma \alpha_{22} & \cdots & \gamma \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \gamma \alpha_{m1} & \gamma \alpha_{m2} & \cdots & \gamma \alpha_{mn} \end{bmatrix}.$$

3. Let $(V, \mathbb{F})$ be an arbitrary vector space and $\mathcal{D}$ be an arbitrary set. Let $\Phi(\mathcal{D}, V)$ be the set of functions $f$ mapping $\mathcal{D}$ to $V$. Then $\Phi(\mathcal{D}, V)$ is a vector space with addition defined by

$$(f + g)(d) = f(d) + g(d) \quad \text{for all } d \in \mathcal{D} \text{ and for all } f, g \in \Phi$$

and scalar multiplication defined by

$$(\alpha f)(d) = \alpha f(d) \quad \text{for all } \alpha \in \mathbb{F}, \text{ for all } d \in \mathcal{D}, \text{ and for all } f \in \Phi.$$

Special Cases:

(a) $\mathcal{D} = [t_0, t_1]$, $(V, \mathbb{F}) = (\mathbb{R}^n, \mathbb{R})$, and the functions are piecewise continuous $=: (PC[t_0, t_1])^n$ or continuous $=: (C[t_0, t_1])^n$.

(b) $\mathcal{D} = [t_0, +\infty)$, $(V, \mathbb{F}) = (\mathbb{R}^n, \mathbb{R})$, etc.

4. Let $A \in \mathbb{R}^{n \times n}$. Then $\{x(t) : \dot{x}(t) = A x(t)\}$ is a vector space (of dimension $n$).

2.2 Subspaces

Definition 2.6. Let $(V, \mathbb{F})$ be a vector space and let $W \subseteq V$, $W \neq \emptyset$. Then $(W, \mathbb{F})$ is a subspace of $(V, \mathbb{F})$ if and only if $(W, \mathbb{F})$ is itself a vector space or, equivalently, if and only if $(\alpha w_1 + \beta w_2) \in W$ for all $\alpha, \beta \in \mathbb{F}$ and for all $w_1, w_2 \in W$.

Remark 2.7. The latter characterization of a subspace is often the easiest way to check or prove that something is indeed a subspace (or vector space); i.e., verify that the set in question is closed under addition and scalar multiplication. Note, too, that since $0 \in \mathbb{F}$, this implies that the zero vector must be in any subspace.

Notation: When the underlying field is understood, we write $W \subseteq V$, and the symbol $\subseteq$, when used with vector spaces, is henceforth understood to mean "is a subspace of." The less restrictive meaning "is a subset of" is specifically flagged as such.

Example 2.8.

1. Consider $(V, \mathbb{F}) = (\mathbb{R}^{n \times n}, \mathbb{R})$ and let $W = \{A \in \mathbb{R}^{n \times n} : A \text{ is symmetric}\}$. Then $W \subseteq V$.

   Proof: Suppose $A_1, A_2$ are symmetric. Then it is easily shown that $\alpha A_1 + \beta A_2$ is symmetric for all $\alpha, \beta \in \mathbb{R}$.

2. Let $W = \{A \in \mathbb{R}^{n \times n} : A \text{ is orthogonal}\}$. Then $W$ is *not* a subspace of $\mathbb{R}^{n \times n}$.

3. Consider $(V, \mathbb{F}) = (\mathbb{R}^2, \mathbb{R})$ and for each $v \in \mathbb{R}^2$ of the form $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ identify $v_1$ with the $x$-coordinate in the plane and $v_2$ with the $y$-coordinate. For $\alpha, \beta \in \mathbb{R}$, define

$$W_{\alpha,\beta} = \left\{ v : v = \begin{bmatrix} c \\ \alpha c + \beta \end{bmatrix},\ c \in \mathbb{R} \right\}.$$

   Then $W_{\alpha,\beta}$ is a subspace of $V$ if and only if $\beta = 0$. As an interesting exercise, sketch $W_{2,1}$, $W_{2,0}$, $W_{\frac{1}{2},1}$, and $W_{\frac{1}{2},0}$. Note, too, that the vertical line through the origin (i.e., $\alpha = \infty$) is also a subspace.

   All lines through the origin are subspaces. Shifted subspaces $W_{\alpha,\beta}$ with $\beta \neq 0$ are called linear varieties.
varieties.
Henceforth, we drop the explicit dependence of a vector space on an underlying field. Thus, $V$ usually denotes a vector space with the underlying field generally being $\mathbb{R}$ unless explicitly stated otherwise.
Definition 2.9. If $\mathcal{R}$ and $\mathcal{S}$ are vector spaces (or subspaces), then $\mathcal{R} = \mathcal{S}$ if and only if $\mathcal{R} \subseteq \mathcal{S}$ and $\mathcal{S} \subseteq \mathcal{R}$.

Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: An arbitrary $r \in \mathcal{R}$ is shown to be an element of $\mathcal{S}$ and then an arbitrary $s \in \mathcal{S}$ is shown to be an element of $\mathcal{R}$.

2.3 Linear Independence

Let $X = \{v_1, v_2, \ldots\}$ be a nonempty collection of vectors $v_i$ in some vector space $V$.

Definition 2.10. $X$ is a linearly dependent set of vectors if and only if there exist $k$ distinct elements $v_1, \ldots, v_k \in X$ and scalars $\alpha_1, \ldots, \alpha_k$ not all zero such that

$$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0.$$

$X$ is a linearly independent set of vectors if and only if for any collection of $k$ distinct elements $v_1, \ldots, v_k$ of $X$ and for any scalars $\alpha_1, \ldots, \alpha_k$,

$$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0 \quad \text{implies} \quad \alpha_1 = 0, \ldots, \alpha_k = 0.$$

Example 2.11.

1. Let $V = \mathbb{R}^3$. Then $\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}$ is a linearly independent set. Why? However, $\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} \right\}$ is a linearly dependent set (since $2 v_1 - v_2 + v_3 = 0$).

2. Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$. Then consider the rows of $e^{tA} B$ as vectors in $C^m[t_0, t_1]$ (recall that $e^{tA}$ denotes the matrix exponential, which is discussed in more detail in Chapter 11). Independence of these vectors turns out to be equivalent to a concept called controllability, to be studied further in what follows.

Let $v_i \in \mathbb{R}^n$, $i \in \underline{k}$, and consider the matrix $V = [v_1, \ldots, v_k] \in \mathbb{R}^{n \times k}$. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector $a \in \mathbb{R}^k$ such that $Va = 0$. An equivalent condition for linear dependence is that the $k \times k$ matrix $V^T V$ is singular. If the set of vectors is independent, and there exists $a \in \mathbb{R}^k$ such that $Va = 0$, then $a = 0$. An equivalent condition for linear independence is that the matrix $V^T V$ is nonsingular.

Definition 2.12. Let $X = \{v_1, v_2, \ldots\}$ be a collection of vectors $v_i \in V$. Then the span of $X$ is defined as

$$\mathrm{Sp}(X) = \mathrm{Sp}\{v_1, v_2, \ldots\} = \{v : v = \alpha_1 v_1 + \cdots + \alpha_k v_k\ ;\ \alpha_i \in \mathbb{F},\ v_i \in X,\ k \in \mathbb{N}\},$$

where $\mathbb{N} = \{1, 2, \ldots\}$.
Example 2.13. Let $V = \mathbb{R}^n$ and define

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$

Then $\mathrm{Sp}\{e_1, e_2, \ldots, e_n\} = \mathbb{R}^n$.

Definition 2.14. A set of vectors $X$ is a basis for $V$ if and only if

1. $X$ is a linearly independent set (of basis vectors), and

2. $\mathrm{Sp}(X) = V$.


Example 2.15. $\{e_1, \ldots, e_n\}$ is a basis for $\mathbb{R}^n$ (sometimes called the natural basis).

Now let $b_1, \ldots, b_n$ be a basis (with a specific order associated with the basis vectors) for $V$. Then for all $v \in V$ there exists a unique $n$-tuple $\{\xi_1, \ldots, \xi_n\}$ such that

$$v = \xi_1 b_1 + \cdots + \xi_n b_n = B x,$$

where

$$B = [b_1, \ldots, b_n], \quad x = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_n \end{bmatrix}.$$

Definition 2.16. The scalars $\{\xi_i\}$ are called the components (or sometimes the coordinates) of $v$ with respect to the basis $\{b_1, \ldots, b_n\}$ and are unique. We say that the vector $x$ of components represents the vector $v$ with respect to the basis $B$.
Example 2.17. In $\mathbb{R}^n$,

$$v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = v_1 e_1 + v_2 e_2 + \cdots + v_n e_n.$$

We can also determine components of $v$ with respect to another basis. For example, while

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 \cdot e_1 + 2 \cdot e_2,$$

with respect to the basis

$$\left\{ \begin{bmatrix} -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$$

we have

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 3 \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} + 4 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

To see this, write

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = x_1 \begin{bmatrix} -1 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

Then

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & -1 \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}.$$
Theorem 2.18. The number of elements in a basis of a vector space is independent of the particular basis considered.

Definition 2.19. If a basis $X$ for a vector space $V$ ($\neq 0$) has $n$ elements, $V$ is said to be $n$-dimensional or have dimension $n$ and we write $\dim(V) = n$ or $\dim V = n$. For


consistency, and because the $0$ vector is in any vector space, we define $\dim(0) = 0$. A vector space $V$ is finite-dimensional if there exists a basis $X$ with $n < +\infty$ elements; otherwise, $V$ is infinite-dimensional.

Thus, Theorem 2.18 says that $\dim(V)$ = the number of elements in a basis.
Example 2.20.

1. $\dim(\mathbb{R}^n) = n$.

2. $\dim(\mathbb{R}^{m \times n}) = mn$.

   Note: Check that a basis for $\mathbb{R}^{m \times n}$ is given by the $mn$ matrices $E_{ij}$; $i \in \underline{m}$, $j \in \underline{n}$, where $E_{ij}$ is a matrix all of whose elements are $0$ except for a $1$ in the $(i, j)$th location. The collection of $E_{ij}$ matrices can be called the "natural basis matrices."

3. $\dim(C[t_0, t_1]) = +\infty$.

4. $\dim\{A \in \mathbb{R}^{n \times n} : A = A^T\} = \frac{1}{2} n(n + 1)$. (To see why, determine $\frac{1}{2} n(n + 1)$ symmetric basis matrices.)

5. $\dim\{A \in \mathbb{R}^{n \times n} : A \text{ is upper (lower) triangular}\} = \frac{1}{2} n(n + 1)$.

2.4 Sums and Intersections of Subspaces

Definition 2.21. Let $(V, \mathbb{F})$ be a vector space and let $\mathcal{R}, \mathcal{S} \subseteq V$. The sum and intersection of $\mathcal{R}$ and $\mathcal{S}$ are defined respectively by:

1. $\mathcal{R} + \mathcal{S} = \{r + s : r \in \mathcal{R},\ s \in \mathcal{S}\}$.

2. $\mathcal{R} \cap \mathcal{S} = \{v : v \in \mathcal{R} \text{ and } v \in \mathcal{S}\}$.

Theorem 2.22.

1. $\mathcal{R} + \mathcal{S} \subseteq V$ (in general, $\mathcal{R}_1 + \cdots + \mathcal{R}_k =: \sum_{i=1}^{k} \mathcal{R}_i \subseteq V$, for finite $k$).

2. $\mathcal{R} \cap \mathcal{S} \subseteq V$ (in general, $\bigcap_{\alpha \in A} \mathcal{R}_\alpha \subseteq V$ for an arbitrary index set $A$).

Remark 2.23. The union of two subspaces, $\mathcal{R} \cup \mathcal{S}$, is not necessarily a subspace.

Definition 2.24. $\mathcal{T} = \mathcal{R} \oplus \mathcal{S}$ is the direct sum of $\mathcal{R}$ and $\mathcal{S}$ if

1. $\mathcal{R} \cap \mathcal{S} = 0$, and

2. $\mathcal{R} + \mathcal{S} = \mathcal{T}$ (in general, $\mathcal{R}_i \cap \left(\sum_{j \neq i} \mathcal{R}_j\right) = 0$ and $\sum_i \mathcal{R}_i = \mathcal{T}$).

The subspaces $\mathcal{R}$ and $\mathcal{S}$ are said to be complements of each other in $\mathcal{T}$.


Remark 2.25. The complement of $\mathcal{R}$ (or $\mathcal{S}$) is not unique. For example, consider $V = \mathbb{R}^2$ and let $\mathcal{R}$ be any line through the origin. Then any other distinct line through the origin is a complement of $\mathcal{R}$. Among all the complements there is a unique one orthogonal to $\mathcal{R}$. We discuss more about orthogonal complements elsewhere in the text.

Theorem 2.26. Suppose $\mathcal{T} = \mathcal{R} \oplus \mathcal{S}$. Then

1. every $t \in \mathcal{T}$ can be written uniquely in the form $t = r + s$ with $r \in \mathcal{R}$ and $s \in \mathcal{S}$.

2. $\dim(\mathcal{T}) = \dim(\mathcal{R}) + \dim(\mathcal{S})$.

Proof: To prove the first part, suppose an arbitrary vector $t \in \mathcal{T}$ can be written in two ways as $t = r_1 + s_1 = r_2 + s_2$, where $r_1, r_2 \in \mathcal{R}$ and $s_1, s_2 \in \mathcal{S}$. Then $r_1 - r_2 = s_2 - s_1$. But $r_1 - r_2 \in \mathcal{R}$ and $s_2 - s_1 \in \mathcal{S}$. Since $\mathcal{R} \cap \mathcal{S} = 0$, we must have $r_1 = r_2$ and $s_1 = s_2$, from which uniqueness follows.

The statement of the second part is a special case of the next theorem. $\Box$

Theorem 2.27. For arbitrary subspaces $\mathcal{R}, \mathcal{S}$ of a vector space $V$,

$$\dim(\mathcal{R} + \mathcal{S}) = \dim(\mathcal{R}) + \dim(\mathcal{S}) - \dim(\mathcal{R} \cap \mathcal{S}).$$

Example 2.28. Let $\mathcal{U}$ be the subspace of upper triangular matrices in $\mathbb{R}^{n \times n}$ and let $\mathcal{L}$ be the subspace of lower triangular matrices in $\mathbb{R}^{n \times n}$. Then it may be checked that $\mathcal{U} + \mathcal{L} = \mathbb{R}^{n \times n}$ while $\mathcal{U} \cap \mathcal{L}$ is the set of diagonal matrices in $\mathbb{R}^{n \times n}$. Using the fact that $\dim\{\text{diagonal matrices}\} = n$, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity of the formula given in Theorem 2.27.
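The dimension count in Example 2.28 can be verified by counting positions in the natural basis matrices $E_{ij}$ of Example 2.20, here for the illustrative choice $n = 4$:

```python
# Counting dimensions in Example 2.28 by counting positions (i, j)
# occupied by the basis matrices E_ij, for n = 4.
n = 4
dim_upper = sum(1 for i in range(n) for j in range(n) if i <= j)  # n(n+1)/2
dim_lower = sum(1 for i in range(n) for j in range(n) if i >= j)  # n(n+1)/2
dim_diag  = n       # U intersect L = diagonal matrices
dim_sum   = n * n   # U + L = R^{n x n}

# Theorem 2.27: dim(U + L) = dim U + dim L - dim(U intersect L)
print(dim_upper, dim_lower, dim_upper + dim_lower - dim_diag, dim_sum)
```

For $n = 4$ this prints `10 10 16 16`, consistent with Theorem 2.27.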

Example 2.29. Let $(V, \mathbb{F}) = (\mathbb{R}^{n \times n}, \mathbb{R})$, let $\mathcal{R}$ be the set of skew-symmetric matrices in $\mathbb{R}^{n \times n}$, and let $\mathcal{S}$ be the set of symmetric matrices in $\mathbb{R}^{n \times n}$. Then $V = \mathcal{R} \oplus \mathcal{S}$.

Proof: This follows easily from the fact that any $A \in \mathbb{R}^{n \times n}$ can be written in the form

$$A = \frac{1}{2}\left(A + A^T\right) + \frac{1}{2}\left(A - A^T\right).$$

The first matrix on the right-hand side above is in $\mathcal{S}$ while the second is in $\mathcal{R}$.
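The symmetric/skew-symmetric splitting in the proof of Example 2.29 is easy to carry out explicitly. A minimal sketch on a concrete $3 \times 3$ matrix (plain lists; the variable names are illustrative):

```python
# Split an arbitrary A into symmetric part S = (A + A^T)/2 and
# skew-symmetric part R = (A - A^T)/2, as in Example 2.29.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
n = len(A)

S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
R = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]

# S is symmetric, R is skew-symmetric, and S + R recovers A
print(all(S[i][j] ==  S[j][i] for i in range(n) for j in range(n)))
print(all(R[i][j] == -R[j][i] for i in range(n) for j in range(n)))
print(all(S[i][j] + R[i][j] == A[i][j] for i in range(n) for j in range(n)))
```

All three checks print `True`, confirming the decomposition.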

EXERCISES

1. Suppose $\{v_1, \ldots, v_k\}$ is a linearly dependent set. Then show that one of the vectors must be a linear combination of the others.

2. Let $x_1, x_2, \ldots, x_k \in \mathbb{R}^n$ be nonzero mutually orthogonal vectors. Show that $\{x_1, \ldots, x_k\}$ must be a linearly independent set.

3. Let $v_1, \ldots, v_n$ be orthonormal vectors in $\mathbb{R}^n$. Show that $A v_1, \ldots, A v_n$ are also orthonormal if and only if $A \in \mathbb{R}^{n \times n}$ is orthogonal.

4. Consider the vectors $v_1 = [2\ \ 1]^T$ and $v_2 = [3\ \ 1]^T$. Prove that $v_1$ and $v_2$ form a basis for $\mathbb{R}^2$. Find the components of the vector $v = [4\ \ 1]^T$ with respect to this basis.


5. Let $P$ denote the set of polynomials of degree less than or equal to two of the form $p_0 + p_1 x + p_2 x^2$, where $p_0, p_1, p_2 \in \mathbb{R}$. Show that $P$ is a vector space over $\mathbb{R}$. Show that the polynomials $1$, $x$, and $2x^2 - 1$ are a basis for $P$. Find the components of the polynomial $2 + 3x + 4x^2$ with respect to this basis.

6. Prove Theorem 2.22 (for the case of two subspaces $\mathcal{R}$ and $\mathcal{S}$ only).

7. Let $P^n$ denote the vector space of polynomials of degree less than or equal to $n$, and of the form $p(x) = p_0 + p_1 x + \cdots + p_n x^n$, where the coefficients $p_i$ are all real. Let $P_E$ denote the subspace of all even polynomials in $P^n$, i.e., those that satisfy the property $p(-x) = p(x)$. Similarly, let $P_O$ denote the subspace of all odd polynomials, i.e., those satisfying $p(-x) = -p(x)$. Show that $P^n = P_E \oplus P_O$.

8. Repeat Example 2.28 using instead the two subspaces $\mathcal{T}$ of tridiagonal matrices and $\mathcal{U}$ of upper triangular matrices.


Chapter 3

Linear Transformations

3.1 Definition and Examples

We begin with the basic definition of a linear transformation (or linear map, linear function, or linear operator) between two vector spaces.

Definition 3.1. Let $(V, \mathbb{F})$ and $(W, \mathbb{F})$ be vector spaces. Then $\mathcal{L} : V \to W$ is a linear transformation if and only if $\mathcal{L}(\alpha v_1 + \beta v_2) = \alpha \mathcal{L} v_1 + \beta \mathcal{L} v_2$ for all $\alpha, \beta \in \mathbb{F}$ and for all $v_1, v_2 \in V$.

The vector space $V$ is called the domain of the transformation $\mathcal{L}$ while $W$, the space into which it maps, is called the co-domain.

Example 3.2.

1. Let $\mathbb{F} = \mathbb{R}$ and take $V = W = PC[t_0, +\infty)$. Define $\mathcal{L} : PC[t_0, +\infty) \to PC[t_0, +\infty)$ by

$$v(t) \mapsto w(t) = (\mathcal{L} v)(t) = \int_{t_0}^{t} e^{-(t - \tau)} v(\tau)\, d\tau .$$

2. Let $\mathbb{F} = \mathbb{R}$ and take $V = W = \mathbb{R}^{m \times n}$. Fix $M \in \mathbb{R}^{m \times m}$. Define $\mathcal{L} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ by

$$X \mapsto Y = \mathcal{L} X = M X.$$

3. Let $\mathbb{F} = \mathbb{R}$ and take $V = P^n = \{p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n : \alpha_i \in \mathbb{R}\}$ and $W = P^{n-1}$. Define $\mathcal{L} : V \to W$ by $\mathcal{L} p = p'$, where $'$ denotes differentiation with respect to $x$.


3.2 Matrix Representation of Linear Transformations

Linear transformations between vector spaces with specific bases can be represented conveniently in matrix form. Specifically, suppose L : (V, F) → (W, F) is linear and further suppose that {vi, i ∈ n} and {wj, j ∈ m} are bases for V and W, respectively. Then the ith column of A = Mat L (the matrix representation of L with respect to the given bases for V and W) is the representation of Lvi with respect to {wj, j ∈ m}. In other words,

A = [ a11 ··· a1n
       ⋮        ⋮
      am1 ··· amn ] ∈ R^{m×n}

represents L since

Lvi = a1i w1 + ··· + ami wm = W ai,

where W = [w1, ..., wm] and ai = [a1i, ..., ami]^T is the ith column of A. Note that A = Mat L depends on the particular bases for V and W. This could be reflected by subscripts, say, in the notation, but this is usually not done.
The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity) by its action on a basis. Thus, if v = ξ1v1 + ··· + ξnvn = Vx (where v, and hence x, is arbitrary), then

Lv = ξ1Lv1 + ··· + ξnLvn
   = ξ1Wa1 + ··· + ξnWan
   = WAx.

Thus, LV = WA since x was arbitrary.

When V = R^n, W = R^m and {vi, i ∈ n}, {wj, j ∈ m} are the usual (natural) bases, the equation LV = WA becomes simply L = A. We thus commonly identify A as a linear transformation with its matrix representation, i.e.,

R^n ∋ x ↦ Ax ∈ R^m.

Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.
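The construction above can be made concrete for the differentiation operator of Example 3.2.3. The sketch below (Python/NumPy, with the monomial bases {1, x, ..., x^n} chosen here as an assumption since the text leaves the bases unspecified) builds Mat L and applies it to a polynomial's coefficient vector:

```python
# Matrix representation of L p = p' with respect to the monomial bases
# {1, x, ..., x^n} of P^n and {1, x, ..., x^(n-1)} of P^(n-1).
import numpy as np

n = 3
A = np.zeros((n, n + 1))                 # Mat L is n x (n+1)
for i in range(1, n + 1):
    A[i - 1, i] = i                      # column i represents L(x^i) = i x^(i-1)

p = np.array([1.0, 2.0, 3.0, 4.0])       # p(x) = 1 + 2x + 3x^2 + 4x^3
print(A @ p)                             # [2. 6. 12.], i.e., p'(x) = 2 + 6x + 12x^2
```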

3.3 Composition of Transformations

Consider three vector spaces U, V, and W and transformations B from U to V and A from V to W. Then we can define a new transformation C as follows:

[Diagram: U —B→ V —A→ W, with C = AB mapping U directly to W.]

The above diagram illustrates the composition of transformations C = AB. Note that in most texts, the arrows above are reversed as follows:

[Diagram: W ←A— V ←B— U, with C = AB.]

However, it might be useful to prefer the former since the transformations A and B appear in the same order in both the diagram and the equation. If dim U = p, dim V = n, and dim W = m, and if we associate matrices with the transformations in the usual way, then composition of transformations corresponds to standard matrix multiplication. That is, we have C_{m×p} = A_{m×n} B_{n×p}. The above is sometimes expressed componentwise by the formula

cij = Σ_{k=1}^{n} aik bkj.
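The correspondence between composition and matrix multiplication can be checked directly; in this NumPy sketch (the random dimensions are arbitrary assumptions) applying B and then A agrees with applying C = AB, and the componentwise formula holds:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 4, 3, 2
A = rng.standard_normal((m, n))          # transformation from V to W
B = rng.standard_normal((n, p))          # transformation from U to V
C = A @ B                                # composition C = AB from U to W

u = rng.standard_normal(p)
print(np.allclose(C @ u, A @ (B @ u)))   # True: B then A equals C

# Componentwise formula: c_ij = sum_k a_ik b_kj
cij = sum(A[0, k] * B[k, 0] for k in range(n))
print(np.isclose(C[0, 0], cij))          # True
```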

Two Special Cases:

Inner Product: Let x, y ∈ R^n. Then their inner product is the scalar

x^T y = Σ_{i=1}^{n} xi yi.

Outer Product: Let x ∈ R^m, y ∈ R^n. Then their outer product is the m × n matrix

xy^T = [xi yj].

Note that any rank-one matrix A ∈ R^{m×n} can be written in the form A = xy^T above (or xy^H if A ∈ C^{m×n}). A rank-one symmetric matrix can be written in the form xx^T (or xx^H).
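Both special cases are a single NumPy call; the small example below (the particular vectors are illustrative assumptions) also confirms the rank-one property of the outer product:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])            # x in R^3
y = np.array([4.0, 5.0])                 # y in R^2
A = np.outer(x, y)                       # outer product xy^T, a 3 x 2 matrix

print(np.linalg.matrix_rank(A))          # 1: every outer product has rank one
print(x @ x)                             # 14.0: inner product x^T x
```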


3.4 Structure of Linear Transformations

Let A : V → W be a linear transformation.

Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}. Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and denoted Im(A).

The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of A is also known as the kernel of A and denoted Ker(A).
Theorem 3.4. Let A : V → W be a linear transformation. Then

1. R(A) ⊆ W.
2. N(A) ⊆ V.

Note that N(A) and R(A) are, in general, subspaces of different spaces.
Theorem 3.5. Let A ∈ R^{m×n}. If A is written in terms of its columns as A = [a1, ..., an], then

R(A) = Sp{a1, ..., an}.

Proof: The proof of this theorem is easy, essentially following immediately from the definition. □

Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is used to denote both a linear transformation and its matrix representation with respect to the usual (natural) bases. See also the last paragraph of Section 3.2.
Definition 3.7. Let {v1, ..., vk} be a set of nonzero vectors vi ∈ R^n. The set is said to be orthogonal if vi^T vj = 0 for i ≠ j and orthonormal if vi^T vj = δij, where δij is the Kronecker delta defined by

δij = { 1 if i = j,
        0 if i ≠ j.

Example 3.8.

1. {[1 1]^T, [1 −1]^T} is an orthogonal set.

2. {[1/√2 1/√2]^T, [1/√2 −1/√2]^T} is an orthonormal set.

3. If {v1, ..., vk} with vi ∈ R^n is an orthogonal set, then {v1/√(v1^T v1), ..., vk/√(vk^T vk)} is an orthonormal set.
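The normalization in Example 3.8.3 is easy to verify numerically; in the NumPy sketch below, dividing each vector of an orthogonal set by its length yields a set satisfying the Kronecker-delta condition of Definition 3.7:

```python
import numpy as np

# Columns v1 = [1, 1]^T and v2 = [1, -1]^T form an orthogonal set
V = np.array([[1.0, 1.0],
              [1.0, -1.0]])
U = V / np.linalg.norm(V, axis=0)        # divide each vi by sqrt(vi^T vi)

# Orthonormality: ui^T uj = delta_ij, i.e., U^T U = I
print(np.allclose(U.T @ U, np.eye(2)))   # True
```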


Definition 3.9. Let S ⊆ R^n. Then the orthogonal complement of S is defined as the set

S⊥ = {v ∈ R^n : v^T s = 0 for all s ∈ S}.

Example 3.10. Let

S = Sp{[3 5 7]^T, [−4 1 1]^T}.

Then it can be shown that

S⊥ = Sp{[−2 −31 23]^T}.

Working from the definition, the computation involved is simply to find all nontrivial (i.e., nonzero) solutions of the system of equations

3x1 + 5x2 + 7x3 = 0,
−4x1 + x2 + x3 = 0.

Note that there is nothing special about the two vectors in the basis defining S being orthogonal. Any set of vectors will do, including dependent spanning vectors (which would, of course, then give rise to redundant equations).
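The same computation can be done mechanically: stack the spanning vectors of S as the rows of a matrix A, so that S⊥ is exactly the solution set of Av = 0. A NumPy sketch (an addition to the text; the SVD is one standard way to extract a null space basis):

```python
import numpy as np

# Rows of A span S, so S-perp is the set of solutions of A v = 0
A = np.array([[3.0, 5.0, 7.0],
              [-4.0, 1.0, 1.0]])

_, _, Vt = np.linalg.svd(A)
v = Vt[-1]                               # right singular vector for the zero singular value

print(np.allclose(A @ v, 0))             # True: v is orthogonal to both rows
```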

Theorem 3.11. Let R, S ⊆ R^n. Then

1. S⊥ is a subspace of R^n.
2. S ⊕ S⊥ = R^n.
3. (S⊥)⊥ = S.
4. R ⊆ S if and only if S⊥ ⊆ R⊥.
5. (R + S)⊥ = R⊥ ∩ S⊥.
6. (R ∩ S)⊥ = R⊥ + S⊥.

Proof: We prove and discuss only item 2 here. The proofs of the other results are left as exercises. Let {v1, ..., vk} be an orthonormal basis for S and let x ∈ R^n be an arbitrary vector. Set

x1 = Σ_{i=1}^{k} (x^T vi) vi,
x2 = x − x1.

Then x1 ∈ S and, since

x2^T vj = x^T vj − x1^T vj = x^T vj − x^T vj = 0,

we see that x2 is orthogonal to v1, ..., vk and hence to any linear combination of these vectors. In other words, x2 is orthogonal to any vector in S. We have thus shown that S + S⊥ = R^n. We also have that S ∩ S⊥ = 0 since the only vector s ∈ S orthogonal to everything in S (i.e., including itself) is 0.
It is also easy to see directly that, when we have such direct sum decompositions, we can write vectors in a unique way with respect to the corresponding subspaces. Suppose, for example, that x = x1 + x2 = x1′ + x2′, where x1, x1′ ∈ S and x2, x2′ ∈ S⊥. Then (x1′ − x1)^T (x2′ − x2) = 0 by definition of S⊥. But then (x1′ − x1)^T (x1′ − x1) = 0 since x2′ − x2 = −(x1′ − x1) (which follows by rearranging the equation x1 + x2 = x1′ + x2′). Thus, x1 = x1′ and x2 = x2′. □
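The construction in the proof above can be run directly: project x onto an orthonormal basis of S to get x1, set x2 = x − x1, and check the two claimed properties. A NumPy sketch (the subspace is a random assumption, generated via a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(2)
# Orthonormal basis (columns of Q) of a 2-dimensional subspace S of R^4
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))

x = rng.standard_normal(4)
x1 = Q @ (Q.T @ x)                       # x1 = sum_i (x^T vi) vi lies in S
x2 = x - x1                              # the S-perp component

print(np.allclose(x1 + x2, x))           # True: x = x1 + x2
print(np.allclose(Q.T @ x2, 0))          # True: x2 is orthogonal to S
```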
Theorem 3.12. Let A : R^n → R^m. Then

1. N(A)⊥ = R(A^T). (Note: This holds only for finite-dimensional vector spaces.)

2. R(A)⊥ = N(A^T). (Note: This also holds for infinite-dimensional vector spaces.)

Proof: To prove the first part, take an arbitrary x ∈ N(A). Then Ax = 0 and this is equivalent to y^T Ax = 0 for all y. But y^T Ax = (A^T y)^T x. Thus, Ax = 0 if and only if x is orthogonal to all vectors of the form A^T y, i.e., x ∈ R(A^T)⊥. Since x was arbitrary, we have established that N(A)⊥ = R(A^T). The proof of the second part is similar and is left as an exercise. □
Definition 3.13. Let A : R^n → R^m. Then {v ∈ R^n : Av = 0} is sometimes called the right nullspace of A. Similarly, {w ∈ R^m : w^T A = 0} is called the left nullspace of A. Clearly, the right nullspace is N(A) while the left nullspace is N(A^T).

Theorem 3.12 and part 2 of Theorem 3.11 can be combined to give two very fundamental and useful decompositions of vectors in the domain and co-domain of a linear transformation A. See also Theorem 2.26.
Theorem 3.14 (Decomposition Theorem). Let A : R^n → R^m. Then

1. every vector v in the domain space R^n can be written in a unique way as v = x + y, where x ∈ N(A) and y ∈ N(A)⊥ = R(A^T) (i.e., R^n = N(A) ⊕ R(A^T)).

2. every vector w in the co-domain space R^m can be written in a unique way as w = x + y, where x ∈ R(A) and y ∈ R(A)⊥ = N(A^T) (i.e., R^m = R(A) ⊕ N(A^T)).
This key theorem becomes very easy to remember by carefully studying and understanding Figure 3.1 in the next section.
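Part 1 of Theorem 3.14 can be illustrated numerically. The sketch below (NumPy; it borrows the pseudoinverse, which is only introduced in Chapter 4, to form the orthogonal projector onto R(A^T)) splits a domain vector into its N(A) and R(A^T) components:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 4))          # A : R^4 -> R^2, generically of rank 2

v = rng.standard_normal(4)
y = np.linalg.pinv(A) @ (A @ v)          # component of v in R(A^T)
x = v - y                                # component of v in N(A)

print(np.allclose(A @ x, 0))             # True: x is in N(A)
print(np.allclose(x @ y, 0))             # True: the two pieces are orthogonal
```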

3.5 Four Fundamental Subspaces

Consider a general matrix A ∈ R_r^{m×n}. When thought of as a linear transformation from R^n to R^m, many properties of A can be developed in terms of the four fundamental subspaces


[Figure 3.1. Four fundamental subspaces. The figure depicts A mapping N(A)⊥ (of dimension r) one-to-one onto R(A) (also of dimension r), while N(A) (of dimension n − r) is mapped to {0}; the remaining piece of the co-domain, R(A)⊥, has dimension m − r.]
R(A), R(A)⊥, N(A), and N(A)⊥. Figure 3.1 makes many key properties seem almost obvious and we return to this figure frequently both in the context of linear transformations and in illustrating concepts such as controllability and observability.

Definition 3.15. Let V and W be vector spaces and let A : V → W be a linear transformation.

1. A is onto (also called epic or surjective) if R(A) = W.

2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent characterizations of A being 1-1 that are often easier to verify in practice are the following:

(a) Av1 = Av2 ⟹ v1 = v2.
(b) v1 ≠ v2 ⟹ Av1 ≠ Av2.

Definition 3.16. Let A : R^n → R^m. Then rank(A) = dim R(A). This is sometimes called the column rank of A (maximum number of independent columns). The row rank of A is


dim R(A^T) (maximum number of independent rows). The dual notion to rank is the nullity of A, sometimes denoted nullity(A) or corank(A), and is defined as dim N(A).
Theorem 3.17. Let A : R^n → R^m. Then dim R(A) = dim N(A)⊥. (Note: Since N(A)⊥ = R(A^T), this theorem is sometimes colloquially stated "row rank of A = column rank of A.")

Proof: Define a linear transformation T : N(A)⊥ → R(A) by

Tv = Av for all v ∈ N(A)⊥.

Clearly T is 1-1 (since N(T) = 0). To see that T is also onto, take any w ∈ R(A). Then by definition there is a vector x ∈ R^n such that Ax = w. Write x = x1 + x2, where x1 ∈ N(A)⊥ and x2 ∈ N(A). Then Ax1 = w = Tx1 since x1 ∈ N(A)⊥. The last equality shows that T is onto. We thus have that dim R(A) = dim N(A)⊥ since it is easily shown that if {v1, ..., vr} is a basis for N(A)⊥, then {Tv1, ..., Tvr} is a basis for R(A). Finally, if we apply this and several previous results, the following string of equalities follows easily:

"column rank of A" = rank(A) = dim R(A) = dim N(A)⊥ = dim R(A^T) = rank(A^T) = "row rank of A." □
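The colloquial form of Theorem 3.17 is a one-line numerical check (NumPy; the random matrix is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

# Row rank equals column rank
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))  # True
```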
The following corollary is immediate. Like the theorem, it is a statement about equality of dimensions; the subspaces themselves are not necessarily in the same vector space.
Corollary 3.18. Let A : R^n → R^m. Then dim N(A) + dim R(A) = n, where n is the dimension of the domain of A.

Proof: From Theorems 3.11 and 3.17 we see immediately that

n = dim N(A) + dim N(A)⊥
  = dim N(A) + dim R(A). □

For completeness, we include here a few miscellaneous results about ranks of sums and products of matrices.
Theorem 3.19. Let A, B ∈ R^{n×n}. Then

1. 0 ≤ rank(A + B) ≤ rank(A) + rank(B).

2. rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.

3. nullity(B) ≤ nullity(AB) ≤ nullity(A) + nullity(B).

4. if B is nonsingular, rank(AB) = rank(BA) = rank(A) and N(BA) = N(A).
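The bounds in parts 1 and 2 are easy to spot-check; a NumPy sketch (random matrices are an assumption, so this is a sanity check rather than a proof):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)

print(np.linalg.matrix_rank(A + B) <= rA + rB)                     # part 1
print(rA + rB - n <= np.linalg.matrix_rank(A @ B) <= min(rA, rB))  # part 2
```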

Part 4 of Theorem 3.19 suggests looking at the general problem of the four fundamental subspaces of matrix products. The basic results are contained in the following easily proved theorem.


Theorem 3.20. Let A ∈ R^{m×n}, B ∈ R^{n×p}. Then

1. R(AB) ⊆ R(A).
2. N(AB) ⊇ N(B).
3. R((AB)^T) ⊆ R(B^T).
4. N((AB)^T) ⊇ N(A^T).

The next theorem is closely related to Theorem 3.20 and is also easily proved. It is extremely useful in the text that follows, especially when dealing with pseudoinverses and linear least squares problems.
Theorem 3.21. Let A ∈ R^{m×n}. Then

1. R(A) = R(AA^T).
2. R(A^T) = R(A^T A).
3. N(A) = N(A^T A).
4. N(A^T) = N(AA^T).
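Since equal subspaces have equal dimensions, parts 1 and 2 imply equalities of ranks that can be checked directly (a NumPy sketch, with a random matrix as the assumed test case):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 2))

print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A @ A.T))    # True
print(np.linalg.matrix_rank(A.T) == np.linalg.matrix_rank(A.T @ A))  # True
```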

We now characterize 1-1 and onto transformations and provide characterizations in terms of rank and invertibility.

Theorem 3.22. Let A : R^n → R^m. Then
1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to have full row rank; equivalently, AA^T is nonsingular).

2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said to have full column rank; equivalently, A^T A is nonsingular).

Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ R^m be arbitrary. Let x = A^T (AA^T)^{-1} y ∈ R^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.

Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)⊥ = n = dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose Ax1 = Ax2. Then A^T Ax1 = A^T Ax2, which implies x1 = x2 since A^T A is invertible. Thus, A is 1-1. □

Definition 3.23. A : V → W is invertible (or bijective) if and only if it is 1-1 and onto.

Note that if A is invertible, then dim V = dim W. Also, A : R^n → R^n is invertible or nonsingular if and only if rank(A) = n.
Note that in the special case when A ∈ R_n^{n×n}, the transformations A, A^T, and A^{-1} are all 1-1 and onto between the two spaces N(A)⊥ and R(A). The transformations A^T and A^{-1} have the same domain and range but are in general different maps unless A is orthogonal. Similar remarks apply to A and A^{-T}.


If a linear transformation is not invertible, it may still be right or left invertible. Definitions of these concepts are followed by a theorem characterizing left and right invertible transformations.

Definition 3.24. Let A : V → W. Then

1. A is said to be right invertible if there exists a right inverse transformation A^{-R} : W → V such that AA^{-R} = I_W, where I_W denotes the identity transformation on W.

2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W → V such that A^{-L}A = I_V, where I_V denotes the identity transformation on V.

Theorem 3.25. Let A : V → W. Then

1. A is right invertible if and only if it is onto.
2. A is left invertible if and only if it is 1-1.

Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and onto, in which case A^{-1} = A^{-R} = A^{-L}.
Note: From Theorem 3.22 we see that if A : R^n → R^m is onto, then a right inverse is given by A^{-R} = A^T (AA^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by A^{-L} = (A^T A)^{-1} A^T.
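Both formulas can be evaluated directly; this NumPy sketch reuses the two matrices of Example 3.27 below and confirms the defining identities:

```python
import numpy as np

A = np.array([[1.0, 2.0]])               # onto: full row rank (= 1)
A_R = A.T @ np.linalg.inv(A @ A.T)       # right inverse A^T (A A^T)^(-1)
print(np.allclose(A @ A_R, np.eye(1)))   # True

B = np.array([[1.0], [2.0]])             # 1-1: full column rank (= 1)
B_L = np.linalg.inv(B.T @ B) @ B.T       # left inverse (B^T B)^(-1) B^T
print(np.allclose(B_L @ B, np.eye(1)))   # True
```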

Theorem 3.26. Let A : V → V.

1. If there exists a unique right inverse A^{-R} such that AA^{-R} = I, then A is invertible.

2. If there exists a unique left inverse A^{-L} such that A^{-L}A = I, then A is invertible.

Proof: We prove the first part and leave the proof of the second to the reader. Notice the following:

A(A^{-R} + A^{-R}A − I) = AA^{-R} + AA^{-R}A − A
                        = I + IA − A    since AA^{-R} = I
                        = I.

Thus, (A^{-R} + A^{-R}A − I) must be a right inverse and, therefore, by uniqueness it must be the case that A^{-R} + A^{-R}A − I = A^{-R}. But this implies that A^{-R}A = I, i.e., that A^{-R} is a left inverse. It then follows from Theorem 3.25 that A is invertible. □

Example 3.27.

1. Let A = [1 2] : R^2 → R^1. Then A is onto. (Proof: Take any α ∈ R^1; then one can always find v ∈ R^2 such that [1 2][v1 v2]^T = α.) Obviously A has full row rank (= 1) and A^{-R} = [−1 1]^T is a right inverse. Also, it is clear that there are infinitely many right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by characterizing all solutions of the linear matrix equation AR = I.


2. Let A = [1 2]^T : R^1 → R^2. Then A is 1-1. (Proof: The only solution to 0 = Av = [1 2]^T v is v = 0, whence N(A) = 0 so A is 1-1.) It is now obvious that A has full column rank (= 1) and A^{-L} = [3 −1] is a left inverse. Again, it is clear that there are infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a matrix by characterizing all solutions of the linear matrix equation LA = I.

3. The matrix

A = [ 1  1
      2  1
      3  1 ]

when considered as a linear transformation on R^3, is neither 1-1 nor onto. We give below bases for its four fundamental subspaces.

EXERCISES
1. Let A = [...] and consider A as a linear transformation mapping R^3 to R^2. Find the matrix representation of A with respect to the bases {...} of R^3 and {...} of R^2.
2. Consider the vector space R^{n×n} over R, let S denote the subspace of symmetric matrices, and let R denote the subspace of skew-symmetric matrices. For matrices X, Y ∈ R^{n×n} define their inner product by (X, Y) = Tr(X^T Y). Show that, with respect to this inner product, R = S⊥.

3. Consider the differentiation operator L defined in Example 3.2.3. Is L 1-1? Is L onto?

4. Prove Theorem 3.4.

5. Prove Theorem 3.11.4.

6. Prove Theorem 3.12.2.

7. Determine bases for the four fundamental subspaces of the matrix

A = [...].
8. Suppose A ∈ R^{m×n} has a left inverse. Show that A^T has a right inverse.

9. Let A = [0 1; 0 0]. Determine N(A) and R(A). Are they equal? Is this true in general? If this is true in general, prove it; if not, provide a counterexample.
10. Suppose A ∈ R_9^{9×48}. How many linearly independent solutions can be found to the homogeneous linear system Ax = 0?
11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈ R^{n×m} thought of as a transformation from R^m to R^n.

Chapter 4

Introduction to the Moore-Penrose Pseudoinverse
In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is shown in the following text, brings great notational and conceptual clarity to the study of solutions to arbitrary systems of linear equations and linear least squares problems.

4.1 Definitions and Characterizations

Consider a linear transformation A : X → Y, where X and Y are arbitrary finite-dimensional vector spaces. Define a transformation T : N(A)⊥ → R(A) by

Tx = Ax for all x ∈ N(A)⊥.

Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we can define a unique inverse transformation T^{-1} : R(A) → N(A)⊥. This transformation can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A. Unfortunately, the definition neither provides nor suggests a good computational strategy for determining A^+.
Definition 4.1. With A and T as defined above, define a transformation A^+ : Y → X by

A^+ y = T^{-1} y1,

where y = y1 + y2 with y1 ∈ R(A) and y2 ∈ R(A)⊥. Then A^+ is the Moore-Penrose pseudoinverse of A.

Although X and Y were arbitrary vector spaces above, let us henceforth consider the case X = R^n and Y = R^m. We have thus defined A^+ for all A ∈ R^{m×n}. A purely algebraic characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955; see [22].


Theorem 4.2. Let A ∈ R^{m×n}. Then G = A^+ if and only if

(P1) AGA = A.
(P2) GAG = G.
(P3) (AG)^T = AG.
(P4) (GA)^T = GA.

Furthermore, A^+ always exists and is unique.
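The checkable nature of the Penrose conditions is easy to demonstrate: in the NumPy sketch below, the candidate G produced by `numpy.linalg.pinv` (an assumed tool, not part of the text) is verified against (P1)-(P4), so by uniqueness it must be A^+:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3))
G = np.linalg.pinv(A)                    # candidate for A^+

print(np.allclose(A @ G @ A, A))         # (P1)
print(np.allclose(G @ A @ G, G))         # (P2)
print(np.allclose((A @ G).T, A @ G))     # (P3)
print(np.allclose((G @ A).T, G @ A))     # (P4)
```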

Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also, a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational algorithm. However, the Penrose properties do offer the great virtue of providing a checkable criterion in the following sense. Given a matrix G that is a candidate for being the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively straightforward.

Example 4.3. Consider A = [1 2]^T. Verify directly that A^+ = [1/5 2/5] satisfies (P1)-(P4). Note that other left inverses (for example, A^{-L} = [3 −1]) satisfy properties (P1), (P2), and (P4) but not (P3).

Still another characterization of A^+ is given in the following theorem, whose proof can be found in [1, p. 19]. While not generally suitable for computer implementation, this characterization can be useful for hand calculation of small examples.
xn
Theorem
Let A
A Ee lR;"
Theorem 4.4.
4.4. Let
Rxn.
. Then
Then

A+

= lim (AT A + 82 1) -I AT

(4.1)

= limAT(AAT +8 2 1)-1.

(4.2)

6--+0
6--+0
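Although Theorem 4.4 is not recommended as a numerical method, the limits (4.1)-(4.2) are easy to observe empirically. A small NumPy sketch (the matrix and the values of δ are arbitrary choices for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # rank deficient, so no ordinary inverse exists

# (4.1): (A^T A + delta^2 I)^{-1} A^T  ->  A+  as  delta -> 0
for delta in (1e-1, 1e-3, 1e-6):
    approx = np.linalg.inv(A.T @ A + delta**2 * np.eye(2)) @ A.T
    print(delta, np.linalg.norm(approx - np.linalg.pinv(A)))
# the printed error shrinks as delta decreases
```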

4.2  Examples

Each of the following can be derived or verified by using the above definitions or characterizations.

Example 4.5. A^+ = A^T (A A^T)^{-1} if A is onto (independent rows) (A is right invertible).

Example 4.6. A^+ = (A^T A)^{-1} A^T if A is 1-1 (independent columns) (A is left invertible).

Example 4.7. For any scalar a,

    a^+ = 1/a  if a ≠ 0,
    a^+ = 0    if a = 0.
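Examples 4.5 and 4.6 can be spot-checked numerically; in this sketch the test matrices are random (a matrix drawn this way with fewer rows than columns has full row rank with probability 1, and vice versa):

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.standard_normal((2, 4))   # full row rank: A is onto
B = rng.standard_normal((4, 2))   # full column rank: B is 1-1

# Example 4.5: A+ = A^T (A A^T)^{-1} when the rows of A are independent
assert np.allclose(A.T @ np.linalg.inv(A @ A.T), np.linalg.pinv(A))

# Example 4.6: B+ = (B^T B)^{-1} B^T when the columns of B are independent
assert np.allclose(np.linalg.inv(B.T @ B) @ B.T, np.linalg.pinv(B))

print("both closed-form expressions agree with numpy's pinv")
```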

Example 4.8. For any vector v ∈ R^n,

    v^+ = v^T / (v^T v)  if v ≠ 0,
    v^+ = 0^T            if v = 0.

Example 4.9. [1 0; 0 0]^+ = [1 0; 0 0].

Example 4.10. [1 1; 1 1]^+ = [1/4 1/4; 1/4 1/4].

4.3  Properties and Applications

This section presents some miscellaneous useful results on pseudoinverses. Many of these are used in the text that follows.
Theorem 4.11. Let A ∈ R^{m×n} and suppose U ∈ R^{m×m}, V ∈ R^{n×n} are orthogonal (M is orthogonal if M^T = M^{-1}). Then

    (U A V)^+ = V^T A^+ U^T.

Proof: For the proof, simply verify that the expression above does indeed satisfy each of the four Penrose conditions. □
Theorem 4.12. Let S ∈ R^{n×n} be symmetric with U^T S U = D, where U is orthogonal and D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal elements are determined according to Example 4.7.
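Theorem 4.12 translates directly into a computation: diagonalize S, pseudoinvert the eigenvalues entrywise per Example 4.7, and reassemble. A NumPy sketch (the tolerance 1e-12 used to decide which eigenvalues count as zero is an arbitrary illustrative choice):

```python
import numpy as np

S = np.array([[2.0, 0.0,  0.0],
              [0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0]])      # symmetric and singular

# np.linalg.eigh returns S = U diag(d) U^T with U orthogonal
d, U = np.linalg.eigh(S)

# Pseudoinvert D entrywise per Example 4.7: 1/d_i if d_i != 0, else 0
d_plus = np.array([1.0 / x if abs(x) > 1e-12 else 0.0 for x in d])

S_plus = U @ np.diag(d_plus) @ U.T   # Theorem 4.12
print(np.allclose(S_plus, np.linalg.pinv(S)))  # True
```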

Theorem 4.13. For all A ∈ R^{m×n},

    1. A^+ = (A^T A)^+ A^T = A^T (A A^T)^+.
    2. (A^T)^+ = (A^+)^T.

Proof: Both results can be proved using the limit characterization of Theorem 4.4. The proof of the first result is not particularly easy and does not even have the virtue of being especially illuminating. The interested reader can consult the proof in [1, p. 27]. The proof of the second result (which can also be proved easily by verifying the four Penrose conditions) is as follows:

    (A^T)^+ = lim_{δ→0} (A A^T + δ^2 I)^{-1} A
            = lim_{δ→0} [A^T (A A^T + δ^2 I)^{-1}]^T
            = [lim_{δ→0} A^T (A A^T + δ^2 I)^{-1}]^T
            = (A^+)^T.  □

Note that by combining Theorems 4.12 and 4.13 we can, in theory at least, compute the Moore-Penrose pseudoinverse of any matrix (since A A^T and A^T A are symmetric). This turns out to be a poor approach in finite-precision arithmetic, however (see, e.g., [7], [11], [23]), and better methods are suggested in text that follows.

Theorem 4.11 is suggestive of a "reverse-order" property for pseudoinverses of products of matrices such as exists for inverses of products. Unfortunately, in general,

    (A B)^+ ≠ B^+ A^+.

As an example consider A = [0 1] and B = [1; 1]. Then

    (A B)^+ = 1^+ = 1

while

    B^+ A^+ = [1/2  1/2] [0; 1] = 1/2.
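The counterexample above is easy to reproduce numerically:

```python
import numpy as np

A = np.array([[0.0, 1.0]])      # 1 x 2
B = np.array([[1.0], [1.0]])    # 2 x 1

AB_plus = np.linalg.pinv(A @ B)              # (AB)+ = pinv([1]) = [1]
BpAp = np.linalg.pinv(B) @ np.linalg.pinv(A) # B+ A+ = [1/2 1/2][0; 1]

print(AB_plus)  # [[1.]]
print(BpAp)     # [[0.5]]
```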

However, necessary and sufficient conditions under which the reverse-order property does hold are known and we quote a couple of moderately useful results for reference.

Theorem 4.14. (A B)^+ = B^+ A^+ if and only if

    1. R(B B^T A^T) ⊆ R(A^T)

and

    2. R(A^T A B) ⊆ R(B).
Proof: For the proof, see [9]. □

Theorem 4.15. (A B)^+ = B_1^+ A_1^+, where B_1 = A^+ A B and A_1 = A B_1 B_1^+.

Proof: For the proof, see [5]. □

Theorem 4.16. If A ∈ R_r^{n×r}, B ∈ R_r^{r×m}, then (A B)^+ = B^+ A^+.

Proof: Since A ∈ R_r^{n×r}, then A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since B ∈ R_r^{r×m}, we have B^+ = B^T (B B^T)^{-1}, whence B B^+ = I_r. The result then follows by taking B_1 = B, A_1 = A in Theorem 4.15. □

The following theorem gives some additional useful properties of pseudoinverses.

Theorem 4.17. For all A ∈ R^{m×n},

    1. (A^+)^+ = A.
    2. (A^T A)^+ = A^+ (A^T)^+,  (A A^T)^+ = (A^T)^+ A^+.
    3. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A).
    4. N(A^+) = N(A A^+) = N((A A^T)^+) = N(A A^T) = N(A^T).
    5. If A is normal, then A^k A^+ = A^+ A^k and (A^k)^+ = (A^+)^k for all integers k > 0.
Note: Recall that A ∈ R^{n×n} is normal if A A^T = A^T A. For example, if A is symmetric, skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the preceding but still be normal, such as

    A = [a b; -b a]

for scalars a, b ∈ R.
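For a normal matrix of the form just displayed, property 5 of Theorem 4.17 can be confirmed numerically (the values of a, b, and k below are arbitrary illustrative choices):

```python
import numpy as np

a, b = 2.0, 3.0
A = np.array([[a,  b],
              [-b, a]])            # normal (A A^T = A^T A) but not symmetric

assert np.allclose(A @ A.T, A.T @ A)

A_plus = np.linalg.pinv(A)
k = 3
Ak = np.linalg.matrix_power(A, k)

# Property 5: A^k A+ = A+ A^k  and  (A^k)+ = (A+)^k
print(np.allclose(Ak @ A_plus, A_plus @ Ak))                               # True
print(np.allclose(np.linalg.pinv(Ak), np.linalg.matrix_power(A_plus, k)))  # True
```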
The next theorem is fundamental to facilitating a compact and unifying approach to studying the existence of solutions of (matrix) linear equations and linear least squares problems.

Theorem 4.18. Suppose A ∈ R^{n×p}, B ∈ R^{n×m}. Then R(B) ⊆ R(A) if and only if A A^+ B = B.

Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ R^m. Then B x ∈ R(B) ⊆ R(A), so there exists a vector y ∈ R^p such that A y = B x. Then we have

    B x = A y = A A^+ A y = A A^+ B x,

where one of the Penrose properties is used above. Since x was arbitrary, we have shown that B = A A^+ B.

To prove the converse, assume that A A^+ B = B and take arbitrary y ∈ R(B). Then there exists a vector x ∈ R^m such that B x = y, whereupon

    y = B x = A A^+ B x ∈ R(A).  □
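Theorem 4.18 gives an easily computable test for the subspace inclusion R(B) ⊆ R(A). A NumPy sketch (the helper name `range_contains` and the tolerance are ours, for illustration only):

```python
import numpy as np

def range_contains(A, B, tol=1e-10):
    """Test R(B) subset of R(A) via the criterion A A+ B = B (Theorem 4.18)."""
    return np.allclose(A @ np.linalg.pinv(A) @ B, B, atol=tol)

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
b_in  = np.array([[2.0], [0.0]])   # lies in R(A)
b_out = np.array([[0.0], [1.0]])   # does not lie in R(A)

print(range_contains(A, b_in))    # True
print(range_contains(A, b_out))   # False
```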

EXERCISES

1. Use Theorem 4.4 to compute the pseudoinverse of [1 2; 2 4].

2. If x, y ∈ R^n, show that (x y^T)^+ = (x^T x)^+ (y^T y)^+ y x^T.

3. For A ∈ R^{m×n}, prove that R(A) = R(A A^T) using only definitions and elementary properties of the Moore-Penrose pseudoinverse.

4. For A ∈ R^{m×n}, prove that R(A^+) = R(A^T).

5. For A ∈ R^{p×n} and B ∈ R^{m×n}, show that N(A) ⊆ N(B) if and only if B A^+ A = B.

6. Let A ∈ R^{n×n}, B ∈ R^{n×m}, and suppose further that D ∈ R^{m×m} is nonsingular.

    (a) Prove or disprove that

        [A  A B; 0  D]^+ = [A^+  -A^+ A B D^{-1}; 0  D^{-1}].

    (b) Prove or disprove that

        [A  B; 0  D]^+ = [A^+  -A^+ B D^{-1}; 0  D^{-1}].


Chapter 5

Introduction to the Singular Value Decomposition

In this chapter we give a brief introduction to the singular value decomposition (SVD). We show that every matrix has an SVD and describe some useful properties and applications of this important matrix factorization. The SVD plays a key conceptual and computational role throughout (numerical) linear algebra and its applications.

5.1  The Fundamental Theorem

Theorem 5.1. Let A ∈ R_r^{m×n}. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

    A = U Σ V^T,                                    (5.1)

where Σ = [S 0; 0 0], S = diag(σ_1, ..., σ_r) ∈ R^{r×r}, and σ_1 ≥ ... ≥ σ_r > 0. More specifically, we have

    A = [U_1  U_2] [S 0; 0 0] [V_1^T; V_2^T]        (5.2)
      = U_1 S V_1^T.                                (5.3)

The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U_1 ∈ R^{m×r}, U_2 ∈ R^{m×(m-r)}, V_1 ∈ R^{n×r}, V_2 ∈ R^{n×(n-r)}, and the 0-subblocks in Σ are compatibly dimensioned.
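Theorem 5.1 corresponds directly to what numerical libraries compute. The following NumPy sketch builds both the full SVD (5.2) and the compact SVD (5.3) of a rank-1 matrix (the cutoff 1e-12 for counting nonzero singular values is an arbitrary illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])          # rank 1, so r = 1

# Full SVD (5.2): U is 3x3, Vt is 2x2, Sigma is 3x2
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)

# Compact SVD (5.3): keep only the r columns with nonzero singular values
r = int(np.sum(s > 1e-12))
U1, S, V1t = U[:, :r], np.diag(s[:r]), Vt[:r, :]
assert np.allclose(U1 @ S @ V1t, A)

print(r, s)   # r = 1; the nonzero singular value is 3*sqrt(2)
```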

Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example, [24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows analogously if we start with the observation that A A^T ≥ 0 and the details are left to the reader as an exercise.) Denote the set of eigenvalues of A^T A by {σ_i^2, i ∈ n} with σ_1 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_n. Let {v_i, i ∈ n} be a set of corresponding orthonormal eigenvectors and let V_1 = [v_1, ..., v_r], V_2 = [v_{r+1}, ..., v_n]. Letting S = diag(σ_1, ..., σ_r), we can write A^T A V_1 = V_1 S^2. Premultiplying by V_1^T gives V_1^T A^T A V_1 = V_1^T V_1 S^2 = S^2, the latter equality following from the orthonormality of the v_i vectors. Pre- and postmultiplying by S^{-1} gives the equation

    (A V_1 S^{-1})^T (A V_1 S^{-1}) = I.            (5.4)
Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σ_n we have that A^T A V_2 = V_2 0 = 0, whence V_2^T A^T A V_2 = 0. Thus, A V_2 = 0. Now define the matrix U_1 ∈ R^{m×r} by U_1 = A V_1 S^{-1}. Then from (5.4) we see that U_1^T U_1 = I; i.e., the columns of U_1 are orthonormal. Choose any matrix U_2 ∈ R^{m×(m-r)} such that [U_1 U_2] is orthogonal. Then

    U^T A V = [U_1^T A V_1   U_1^T A V_2; U_2^T A V_1   U_2^T A V_2]
            = [U_1^T A V_1   0; U_2^T A V_1   0]

since A V_2 = 0. Referring to the equation U_1 = A V_1 S^{-1} defining U_1, we see that U_1^T A V_1 = S and U_2^T A V_1 = U_2^T U_1 S = 0. The latter equality follows from the orthogonality of the columns of U_1 and U_2. Thus, we see that, in fact, U^T A V = [S 0; 0 0], and defining this matrix to be Σ completes the proof. □
Definition 5.2. Let A = U Σ V^T be an SVD of A as in Theorem 5.1.

1. The set {σ_1, ..., σ_r} is called the set of (nonzero) singular values of the matrix A and is denoted Σ(A). From the proof of Theorem 5.1 we see that σ_i(A) = λ_i^{1/2}(A^T A) = λ_i^{1/2}(A A^T). Note that there are also min{m, n} − r zero singular values.

2. The columns of U are called the left singular vectors of A (and are the orthonormal eigenvectors of A A^T).

3. The columns of V are called the right singular vectors of A (and are the orthonormal eigenvectors of A^T A).
Remark 5.3. The analogous complex case in which A ∈ C_r^{m×n} is quite straightforward. The decomposition is A = U Σ V^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.

Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and co-domain spaces with respect to which A then has a diagonal matrix representation. Specifically, let C denote A thought of as a linear transformation mapping R^n to R^m. Then rewriting A = U Σ V^T as A V = U Σ we see that Mat C is Σ with respect to the bases {v_1, ..., v_n} for R^n and {u_1, ..., u_m} for R^m (see the discussion in Section 3.2). See also Remark 5.16.
Remark
Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

- any orthonormal basis for N(A) can be used for V_2.
- there may be nonuniqueness associated with the columns of V_1 (and hence U_1) corresponding to multiple σ_i's.
- any U_2 can be used so long as [U_1 U_2] is orthogonal.
- columns of U and V can be changed (in tandem) by sign (or multiplier of the form e^{jθ} in the complex case).

What is unique, however, is the matrix Σ and the span of the columns of U_1, U_2, V_1, and V_2 (see Theorem 5.11). Note, too, that a "full SVD" (5.2) can always be constructed from a "compact SVD" (5.3).

Remark 5.6. Computing an SVD by working directly with the eigenproblem for A^T A or A A^T is numerically poor in finite-precision arithmetic. Better algorithms exist that work directly on A via a sequence of orthogonal transformations; see, e.g., [7], [11], [25].
Example 5.7.

    A = [1 0; 0 1] = U I U^T,

where U is an arbitrary 2 × 2 orthogonal matrix, is an SVD.

Example 5.8.

    A = [1 0; 0 -1] = [cos θ  sin θ; -sin θ  cos θ] [1 0; 0 1] [cos θ  sin θ; sin θ  -cos θ],

where θ is arbitrary, is an SVD.
Example 5.9.

    A = [1 1; 2 2; 2 2]
      = [1/3  -2√5/5  -2√5/15; 2/3  √5/5  -4√5/15; 2/3  0  √5/3] [3√2 0; 0 0; 0 0] [√2/2  √2/2; √2/2  -√2/2]

is an SVD.
Example 5.10. Let A ∈ R^{n×n} be symmetric and positive definite. Let V be an orthogonal matrix of eigenvectors that diagonalizes A, i.e., V^T A V = Λ > 0. Then A = V Λ V^T is an SVD of A.

A factorization U Σ V^T of an m × n matrix A qualifies as an SVD if U and V are orthogonal and Σ is an m × n "diagonal" matrix whose diagonal elements in the upper left corner are positive (and ordered). For example, if A = U Σ V^T is an SVD of A, then V Σ^T U^T is an SVD of A^T.

5.2  Some Basic Properties

Theorem 5.11. Let A ∈ R^{m×n} have a singular value decomposition A = U Σ V^T. Using the notation of Theorem 5.1, the following properties hold:

1. rank(A) = r = the number of nonzero singular values of A.
2. Let U = [u_1, ..., u_m] and V = [v_1, ..., v_n]. Then A has the dyadic (or outer product) expansion

    A = Σ_{i=1}^{r} σ_i u_i v_i^T.                  (5.5)

3. The singular vectors satisfy the relations

    A v_i = σ_i u_i,                                (5.6)
    A^T u_i = σ_i v_i                               (5.7)

for i ∈ r.

4. Let U_1 = [u_1, ..., u_r], U_2 = [u_{r+1}, ..., u_m], V_1 = [v_1, ..., v_r], and V_2 = [v_{r+1}, ..., v_n]. Then

    (a) R(U_1) = R(A) = N(A^T)^⊥.
    (b) R(U_2) = R(A)^⊥ = N(A^T).
    (c) R(V_1) = N(A)^⊥ = R(A^T).
    (d) R(V_2) = N(A) = R(A^T)^⊥.
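The dyadic expansion (5.5), the vector relations (5.6)-(5.7), and the subspace characterizations of part 4 can all be verified numerically. A NumPy sketch on a random rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# Dyadic expansion (5.5): A = sum_i sigma_i u_i v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
assert np.allclose(A_sum, A)

# Relations (5.6)-(5.7): A v_i = sigma_i u_i and A^T u_i = sigma_i v_i
for i in range(r):
    assert np.allclose(A @ Vt[i, :], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * Vt[i, :])

# Property 4(d): the columns of V2 span N(A)
V2 = Vt[r:, :].T
assert np.allclose(A @ V2, 0)
print("dyadic expansion and singular vector relations verified, r =", r)
```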

Remark 5.12. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.
Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = U Σ V^T rather than, say, A = U Σ V.

Theorem 5.14. Let A ∈ R^{m×n} have a singular value decomposition A = U Σ V^T as in Theorem 5.1. Then

    A^+ = V Σ^+ U^T,                                (5.8)

where

    Σ^+ = [S^{-1} 0; 0 0] ∈ R^{n×m},                (5.9)
Figure 5.1. SVD and the four fundamental subspaces.
with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then

    A^+ = Σ_{i=1}^{r} (1/σ_i) v_i u_i^T.            (5.10)

Proof: The proof follows easily by verifying the four Penrose conditions. □
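Formula (5.10) is, in essence, how pseudoinverses are computed in practice. A NumPy sketch (the cutoff 1e-12 for discarding zero singular values is an arbitrary illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

# (5.10): A+ = sum_{i=1}^{r} (1/sigma_i) v_i u_i^T
A_plus = sum((1.0 / s[i]) * np.outer(Vt[i, :], U[:, i]) for i in range(r))

print(np.allclose(A_plus, np.linalg.pinv(A)))  # True
```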

Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A^+ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:

    A^+ = Σ_{i=1}^{r} (1/σ_{r+1-i}) v_{r+1-i} u_{r+1-i}^T.        (5.11)

This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [e_r, e_{r-1}, ..., e_2, e_1], which is clearly orthogonal and symmetric.

Then

    A^+ = (V_1 P)(P S^{-1} P)(P U_1^T)

is the matrix version of (5.11). A "full SVD" can be similarly constructed.
Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since {v_1, ..., v_r} is a basis for N(A)^⊥, then T can be defined by T v_i = σ_i u_i, i ∈ r. Similarly, since {u_1, ..., u_r} is a basis for R(A), then T^{-1} can be defined by T^{-1} u_i = (1/σ_i) v_i, i ∈ r. From Section 3.2, the matrix representation for T with respect to the bases {v_1, ..., v_r} and {u_1, ..., u_r} is clearly S, while the matrix representation for the inverse linear transformation T^{-1} with respect to the same bases is S^{-1}.

5.3  Row and Column Compressions

Row compression

Let A ∈ R^{m×n} have an SVD given by (5.1). Then

    U^T A = Σ V^T = [S 0; 0 0] [V_1^T; V_2^T] = [S V_1^T; 0] ∈ R^{m×n}.

Notice that N(A) = N(U^T A) = N(S V_1^T) and the matrix S V_1^T ∈ R^{r×n} has full row rank. In other words, premultiplication of A by U^T is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form [R; 0], where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.

Column compression

Again, let A ∈ R^{m×n} have an SVD given by (5.1). Then

    A V = U Σ = [U_1  U_2] [S 0; 0 0] = [U_1 S   0] ∈ R^{m×n}.

This time, notice that R(A) = R(A V) = R(U_1 S) and the matrix U_1 S ∈ R^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].
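Both compressions can be observed directly from a computed SVD. In the following NumPy sketch, U^T A has exactly r nonzero rows (to machine precision) and A V has exactly r nonzero columns (the random test matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # 5x4, rank 3

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# Row compression: U^T A = [S V1^T; 0], the last m - r rows vanish
UTA = U.T @ A
assert np.allclose(UTA[r:, :], 0)

# Column compression: A V = [U1 S, 0], the last n - r columns vanish
AV = A @ Vt.T
assert np.allclose(AV[:, r:], 0)

print("compressed to", r, "nonzero rows and", r, "nonzero columns")
```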

EXERCISES

1. Let X ∈ R^{m×n}. If X X^T = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that A A^T ≥ 0.

3. Let A ∈ R^{n×n} be symmetric but indefinite. Determine an SVD of A.

4. Let x ∈ R^m, y ∈ R^n be nonzero vectors. Determine an SVD of the matrix A ∈ R_1^{m×n} defined by A = x y^T.

5. Determine SVDs of the matrices

    (a) [-1 0; 0 -1]
    (b) [0; -1]

6. Let A ∈ R^{m×n} and suppose W ∈ R^{m×m} and Y ∈ R^{n×n} are orthogonal.

    (a) Show that A and W A Y have the same singular values (and hence the same rank).
    (b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and W A Y have the same singular values? Do they have the same rank?

7. Let A ∈ R_n^{n×n}. Use the SVD to determine a polar factorization of A, i.e., A = Q P, where Q is orthogonal and P = P^T > 0. Note: this is analogous to the polar form z = r e^{jθ} of a complex scalar z (where i = j = √-1).


Chapter 6

Linear Equations

In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form

    A X = B;  A ∈ R^{m×n}, B ∈ R^{m×k},             (6.1)

are studied and include, as a special case, the familiar vector system

    A x = b;  A ∈ R^{n×n}, b ∈ R^n.                 (6.2)

6.1  Vector Linear Equations

We begin with a review of some of the principal results associated with vector linear systems.

Theorem 6.1. Consider the system of linear equations

    A x = b;  A ∈ R^{m×n}, b ∈ R^m.                 (6.3)

1. There exists a solution to (6.3) if and only if b ∈ R(A).

2. There exists a solution to (6.3) for all b ∈ R^m if and only if R(A) = R^m, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all b ∈ R^m if and only if A is nonsingular; equivalently, A ∈ R^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all b ∈ R^m if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

6. There exists a nontrivial solution to the homogeneous system A x = 0 if and only if rank(A) < n.

Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3. □
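Parts 1-2 and 6 of Theorem 6.1 are straightforward to check numerically with a rank computation. A NumPy sketch using a singular 2 × 2 matrix (the particular A and right-hand sides are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # rank 1, hence singular
b_in  = np.array([1.0, 2.0])       # lies in R(A)
b_out = np.array([1.0, 0.0])       # does not lie in R(A)

rank = np.linalg.matrix_rank
# A solution exists iff rank([A, b]) = rank(A)
print(rank(np.column_stack([A, b_in]))  == rank(A))   # True: solvable
print(rank(np.column_stack([A, b_out])) == rank(A))   # False: inconsistent

# Part 6: rank(A) < n, so Ax = 0 has a nontrivial solution, e.g. x = (2, -1)
print(A @ np.array([2.0, -1.0]))   # [0. 0.]
```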

6.2  Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions to the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.
Theorem 6.2 (Existence). The matrix linear equation

    A X = B;  A ∈ R^{m×n}, B ∈ R^{m×k},             (6.4)

has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if A A^+ B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18. □
Theorem 6.3. Let A ∈ R^{m×n}, B ∈ R^{m×k} and suppose that A A^+ B = B. Then any matrix of the form

    X = A^+ B + (I - A^+ A) Y,  where Y ∈ R^{n×k} is arbitrary,    (6.5)

is a solution of

    A X = B.                                        (6.6)

Furthermore, all solutions of (6.6) are of this form.
Proof: To verify that (6.5) is a solution, premultiply by $A$:

$$\begin{aligned}
AX &= AA^+B + A(I - A^+A)Y \\
   &= B + (A - AA^+A)Y \quad \text{by hypothesis} \\
   &= B \quad \text{since } AA^+A = A \text{ by the first Penrose condition.}
\end{aligned}$$

That all solutions are of this form can be seen as follows. Let $Z$ be an arbitrary solution of (6.6), i.e., $AZ = B$. Then we can write

$$\begin{aligned}
Z &= A^+AZ + (I - A^+A)Z \\
  &= A^+B + (I - A^+A)Z
\end{aligned}$$

and this is clearly of the form (6.5). □
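The existence test of Theorem 6.2 and the solution family (6.5) are easy to exercise numerically. A minimal NumPy sketch (the matrices below are arbitrary illustrations, not from the text; `numpy.linalg.pinv` computes $A^+$):

```python
import numpy as np

# A rank-deficient A (rank 2, n = 3) so that the (I - A+A)Y term matters
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
B = A @ np.array([[1.0], [0.0], [2.0]])   # guarantees R(B) <= R(A)

Ap = np.linalg.pinv(A)                    # Moore-Penrose pseudoinverse A+

# Existence criterion of Theorem 6.2: A A+ B = B
solvable = np.allclose(A @ Ap @ B, B)

# A member of the solution family (6.5) for an arbitrary Y
Y = np.array([[1.0], [2.0], [3.0]])
X = Ap @ B + (np.eye(3) - Ap @ A) @ Y
residual = np.linalg.norm(A @ X - B)      # ~0 for every choice of Y
```

Varying `Y` sweeps out all solutions; the residual stays at roundoff level because $A(I - A^+A) = 0$.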


Remark 6.4. When $A$ is square and nonsingular, $A^+ = A^{-1}$ and so $(I - A^+A) = 0$. Thus, there is no "arbitrary" component, leaving only the unique solution $X = A^{-1}B$.

Remark 6.5. It can be shown that the particular solution $X = A^+B$ is the solution of (6.6) that minimizes $\operatorname{Tr} X^TX$. ($\operatorname{Tr}(\cdot)$ denotes the trace of a matrix; recall that $\operatorname{Tr} X^TX = \sum_{i,j} x_{ij}^2$.)

Theorem 6.6 (Uniqueness). A solution of the matrix linear equation

$$AX = B; \quad A \in \mathbb{R}^{m\times n},\ B \in \mathbb{R}^{m\times k} \tag{6.7}$$

is unique if and only if $A^+A = I$; equivalently, (6.7) has a unique solution if and only if $\mathcal{N}(A) = 0$.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that $A^+A = I$ can occur only if $r = n$, where $r = \operatorname{rank}(A)$ (recall $r \le n$). But $\operatorname{rank}(A) = n$ if and only if $A$ is 1-1 or $\mathcal{N}(A) = 0$. □

Example 6.7. Suppose $A \in \mathbb{R}^{n\times n}$. Find all solutions of the homogeneous system $Ax = 0$.

Solution:

$$x = A^+0 + (I - A^+A)y = (I - A^+A)y,$$

where $y \in \mathbb{R}^n$ is arbitrary. Hence, there exists a nonzero solution if and only if $A^+A \ne I$. This is equivalent to either $\operatorname{rank}(A) = r < n$ or $A$ being singular. Clearly, if there exists a nonzero solution, it is not unique.

Computation: Since $y$ is arbitrary, it is easy to see that all solutions are generated from a basis for $\mathcal{R}(I - A^+A)$. But if $A$ has an SVD given by $A = U\Sigma V^T$, then it is easily checked that $I - A^+A = V_2V_2^T$ and $\mathcal{R}(V_2V_2^T) = \mathcal{R}(V_2) = \mathcal{N}(A)$.
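Both routes in Example 6.7 — the projector $I - A^+A$ and the trailing right singular vectors $V_2$ — can be checked with NumPy (the singular matrix below is an arbitrary illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # singular: rank 1

# All solutions of Ax = 0 are x = (I - A+A)y with y arbitrary
Ap = np.linalg.pinv(A)
P_null = np.eye(2) - Ap @ A             # orthogonal projection onto N(A)
y = np.array([3.0, -1.0])
x = P_null @ y
check = np.linalg.norm(A @ x)           # ~0

# Equivalently, the columns of V2 from A = U Sigma V^T span N(A)
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))              # numerical rank
V2 = Vt[r:].T                           # trailing right singular vectors
check2 = np.linalg.norm(A @ V2)         # ~0
```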

Example 6.8. Characterize all right inverses of a matrix $A \in \mathbb{R}^{m\times n}$; equivalently, find all solutions $R$ of the equation $AR = I_m$. Here, we write $I_m$ to emphasize the $m \times m$ identity matrix.

Solution: There exists a right inverse if and only if $\mathcal{R}(I_m) \subseteq \mathcal{R}(A)$ and this is equivalent to $AA^+I_m = I_m$. Clearly, this can occur if and only if $\operatorname{rank}(A) = r = m$ (since $r \le m$) and this is equivalent to $A$ being onto ($A^+$ is then a right inverse). All right inverses of $A$ are then of the form

$$R = A^+I_m + (I_n - A^+A)Y = A^+ + (I - A^+A)Y,$$

where $Y \in \mathbb{R}^{n\times m}$ is arbitrary. There is a unique right inverse if and only if $A^+A = I$ ($\mathcal{N}(A) = 0$), in which case $A$ must be invertible and $R = A^{-1}$.
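The parametrization of right inverses in Example 6.8 can be verified directly; a small sketch with an arbitrarily chosen full-row-rank $A$:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])          # m = 2, n = 3, rank 2 = m (onto)

Ap = np.linalg.pinv(A)                   # A+ is one particular right inverse
Y = np.arange(6.0).reshape(3, 2)         # arbitrary Y in R^{3x2}
R = Ap + (np.eye(3) - Ap @ A) @ Y        # another member of the family

err_pinv = np.linalg.norm(A @ Ap - np.eye(2))
err_R = np.linalg.norm(A @ R - np.eye(2))
```

Every choice of `Y` gives a valid right inverse because $A(I_n - A^+A) = 0$.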

Example 6.9. Consider the system of linear first-order difference equations

$$x_{k+1} = Ax_k + Bu_k \tag{6.8}$$

with $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times m}$ ($n \ge 1$, $m \ge 1$). The vector $x_k$ in linear system theory is known as the state vector at time $k$ while $u_k$ is the input (control) vector. The general solution of (6.8) is given by

$$x_k = A^kx_0 + \sum_{j=0}^{k-1} A^{k-1-j}Bu_j \tag{6.9}$$

$$= A^kx_0 + [B, AB, \ldots, A^{k-1}B]\begin{bmatrix} u_{k-1} \\ u_{k-2} \\ \vdots \\ u_0 \end{bmatrix} \tag{6.10}$$

for $k \ge 1$. We might now ask the question: Given $x_0 = 0$, does there exist an input sequence $\{u_j\}_{j=0}^{k-1}$ such that $x_k$ takes an arbitrary value in $\mathbb{R}^n$? In linear system theory, this is a question of reachability. Since $m \ge 1$, from the fundamental Existence Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if

$$\mathcal{R}([B, AB, \ldots, A^{n-1}B]) = \mathbb{R}^n$$

or, equivalently, if and only if

$$\operatorname{rank}[B, AB, \ldots, A^{n-1}B] = n.$$

A related question is the following: Given an arbitrary initial vector $x_0$, does there exist an input sequence $\{u_j\}_{j=0}^{n-1}$ such that $x_n = 0$? In linear system theory, this is called controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if

$$\mathcal{R}([B, AB, \ldots, A^{n-1}B]) \supseteq \mathcal{R}(A^n).$$

Clearly, reachability always implies controllability and, if $A$ is nonsingular, controllability and reachability are equivalent. The matrices $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ provide an example of a system that is controllable but not reachable.

The above are standard conditions with analogues for continuous-time models (i.e., linear differential equations). There are many other algebraically equivalent conditions.

Example 6.10. We now introduce an output vector $y_k$ to the system (6.8) of Example 6.9 by appending the equation

$$y_k = Cx_k + Du_k \tag{6.11}$$

with $C \in \mathbb{R}^{p\times n}$ and $D \in \mathbb{R}^{p\times m}$ ($p \ge 1$). We can then pose some new questions about the overall system that are dual in the system-theoretic sense to reachability and controllability. The answers are cast in terms that are dual in the linear algebra sense as well. The condition dual to reachability is called observability: When does knowledge of $\{u_j\}_{j=0}^{n-1}$ and $\{y_j\}_{j=0}^{n-1}$ suffice to determine (uniquely) $x_0$? As a dual to controllability, we have the notion of reconstructibility: When does knowledge of $\{u_j\}_{j=0}^{n-1}$ and $\{y_j\}_{j=0}^{n-1}$ suffice to determine (uniquely) $x_n$? The fundamental duality result from linear system theory is the following:

$(A, B)$ is reachable [controllable] if and only if $(A^T, B^T)$ is observable [reconstructible].


To derive a condition for observability, notice that

$$y_k = CA^kx_0 + \sum_{j=0}^{k-1} CA^{k-1-j}Bu_j + Du_k. \tag{6.12}$$

Thus,

$$\begin{bmatrix} y_0 - Du_0 \\ y_1 - CBu_0 - Du_1 \\ \vdots \\ y_{n-1} - \sum_{j=0}^{n-2} CA^{n-2-j}Bu_j - Du_{n-1} \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} x_0. \tag{6.13}$$

Let $v$ denote the (known) vector on the left-hand side of (6.13) and let $R$ denote the matrix on the right-hand side. Then, by definition, $v \in \mathcal{R}(R)$, so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if $\mathcal{N}(R) = 0$, or, equivalently, if and only if

$$\operatorname{rank}\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = n.$$
6.3 A More General Matrix Linear Equation

Theorem 6.11. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{m\times q}$, and $C \in \mathbb{R}^{p\times q}$. Then the equation

$$AXC = B \tag{6.14}$$

has a solution if and only if $AA^+BC^+C = B$, in which case the general solution is of the form

$$X = A^+BC^+ + Y - A^+AYCC^+, \tag{6.15}$$

where $Y \in \mathbb{R}^{n\times p}$ is arbitrary.

A compact matrix criterion for uniqueness of solutions to (6.14) requires the notion of the Kronecker product of matrices for its statement. Such a criterion ($CC^+ \otimes A^+A = I$) is stated and proved in Theorem 13.27.
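The consistency test and solution family of Theorem 6.11 can be checked numerically; a sketch with randomly generated (hence arbitrary) matrices, where $B$ is built to be consistent by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
C = rng.standard_normal((2, 5))
X0 = rng.standard_normal((3, 2))
B = A @ X0 @ C                          # consistent by construction

Ap, Cp = np.linalg.pinv(A), np.linalg.pinv(C)

# Consistency test: A A+ B C+ C = B
consistent = np.allclose(A @ Ap @ B @ Cp @ C, B)

# General solution (6.15): X = A+ B C+ + Y - A+ A Y C C+
Y = rng.standard_normal((3, 2))
X = Ap @ B @ Cp + Y - Ap @ A @ Y @ C @ Cp
residual = np.linalg.norm(A @ X @ C - B)
```

Substituting (6.15) into $AXC$ makes the $Y$-dependent terms cancel, since $AA^+A = A$ and $CC^+C = C$.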

6.4 Some Useful and Interesting Inverses

In many applications, the coefficient matrices of interest are square and nonsingular. Listed below is a small collection of useful matrix identities, particularly for block matrices, associated with matrix inverses. In these identities, $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{m\times n}$, and $D \in \mathbb{R}^{m\times m}$. Invertibility is assumed for any component or subblock whose inverse is indicated. Verification of each identity is recommended as an exercise for the reader.

Chapter
1. $(A + BDC)^{-1} = A^{-1} - A^{-1}B(D^{-1} + CA^{-1}B)^{-1}CA^{-1}$.

This result is known as the Sherman–Morrison–Woodbury formula. It has many applications (and is frequently "rediscovered") including, for example, formulas for the inverse of a sum of matrices such as $(A + D)^{-1}$ or $(A^{-1} + D^{-1})^{-1}$. It also yields very efficient "updating" or "downdating" formulas in expressions such as $(A + xx^T)^{-1}$ (with symmetric $A \in \mathbb{R}^{n\times n}$ and $x \in \mathbb{R}^n$) that arise in optimization theory.
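The Sherman–Morrison–Woodbury formula is easy to sanity-check numerically; the sketch below uses arbitrary well-conditioned test matrices (scaled small so that all indicated inverses exist):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # well-conditioned A
B = 0.1 * rng.standard_normal((n, m))
C = 0.1 * rng.standard_normal((m, n))
D = np.eye(m)

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ D @ C)
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(D) + C @ Ai @ B) @ C @ Ai
err = np.linalg.norm(lhs - rhs)
```

The practical point: when $m \ll n$, the right-hand side inverts only an $m \times m$ matrix (given $A^{-1}$), which is what makes the "updating" formulas efficient.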

2. $\begin{bmatrix} I & B \\ 0 & -I \end{bmatrix}^{-1} = \begin{bmatrix} I & B \\ 0 & -I \end{bmatrix}$.

3. $\begin{bmatrix} I & 0 \\ C & -I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ C & -I \end{bmatrix}$.

Both of these matrices satisfy the matrix equation $X^2 = I$, from which it is obvious that $X^{-1} = X$. Note that the positions of the $I$ and $-I$ blocks may be exchanged.

4. $\begin{bmatrix} A & 0 \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & 0 \\ -D^{-1}CA^{-1} & D^{-1} \end{bmatrix}$.

5. $\begin{bmatrix} A & B \\ 0 & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}BD^{-1} \\ 0 & D^{-1} \end{bmatrix}$.

6. $\begin{bmatrix} I + BC & B \\ C & I \end{bmatrix}^{-1} = \begin{bmatrix} I & -B \\ -C & I + CB \end{bmatrix}$.

7. $\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BECA^{-1} & -A^{-1}BE \\ -ECA^{-1} & E \end{bmatrix}$,

where $E = (D - CA^{-1}B)^{-1}$ ($E$ is the inverse of the Schur complement of $A$). This result follows easily from the block LU factorization in property 16 of Section 1.4.

8. $\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} F & -FBD^{-1} \\ -D^{-1}CF & D^{-1} + D^{-1}CFBD^{-1} \end{bmatrix}$,

where $F = (A - BD^{-1}C)^{-1}$. This result follows easily from the block UL factorization in property 17 of Section 1.4.
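The Schur-complement formula (identity 7) is worth checking numerically; a sketch with arbitrary small test blocks chosen so that $A$, $D$, and the Schur complement are all invertible:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
B = 0.1 * rng.standard_normal((3, 2))
C = 0.1 * rng.standard_normal((2, 3))
D = np.eye(2) + 0.1 * rng.standard_normal((2, 2))

Ai = np.linalg.inv(A)
E = np.linalg.inv(D - C @ Ai @ B)    # inverse of the Schur complement of A

# Assemble the block inverse from identity 7
top = np.hstack([Ai + Ai @ B @ E @ C @ Ai, -Ai @ B @ E])
bot = np.hstack([-E @ C @ Ai, E])
block_inv = np.vstack([top, bot])

M = np.block([[A, B], [C, D]])
err = np.linalg.norm(M @ block_inv - np.eye(5))
```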

EXERCISES

1. As in Example 6.8, characterize all left inverses of a matrix $A \in \mathbb{R}^{m\times n}$.

2. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{m\times k}$ and suppose $A$ has an SVD as in Theorem 5.1. Assuming $\mathcal{R}(B) \subseteq \mathcal{R}(A)$, characterize all solutions of the matrix linear equation

$$AX = B$$

in terms of the SVD of $A$.


3. Let $x, y \in \mathbb{R}^n$ and suppose further that $x^Ty \ne 1$. Show that

$$(I - xy^T)^{-1} = I - \frac{xy^T}{x^Ty - 1}.$$

4. Let $x, y \in \mathbb{R}^n$ and suppose further that $x^Ty \ne 1$. Show that

$$\begin{bmatrix} I & x \\ y^T & 1 \end{bmatrix}^{-1} = \begin{bmatrix} I + cxy^T & -cx \\ -cy^T & c \end{bmatrix},$$

where $c = 1/(1 - x^Ty)$.
5. Let $A \in \mathbb{R}^{n\times n}_n$ and let $A^{-1}$ have columns $c_1, \ldots, c_n$ and individual elements $\gamma_{ij}$. Assume that $\gamma_{ji} \ne 0$ for some $i$ and $j$. Show that the matrix $B = A - \frac{1}{\gamma_{ji}}e_ie_j^T$ (i.e., $A$ with $\frac{1}{\gamma_{ji}}$ subtracted from its $(ij)$th element) is singular.
Hint: Show that $c_i \in \mathcal{N}(B)$.

6. As in Example 6.10, check directly that the condition for reconstructibility takes the form

$$\mathcal{N}\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \supseteq \mathcal{N}(A^n).$$


Chapter 7

Projections, Inner Product Spaces, and Norms

7.1 Projections

Definition 7.1. Let $\mathcal{V}$ be a vector space with $\mathcal{V} = \mathcal{X} \oplus \mathcal{Y}$. By Theorem 2.26, every $v \in \mathcal{V}$ has a unique decomposition $v = x + y$ with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Define $P_{\mathcal{X},\mathcal{Y}} : \mathcal{V} \to \mathcal{X} \subseteq \mathcal{V}$ by

$$P_{\mathcal{X},\mathcal{Y}}\,v = x \text{ for all } v \in \mathcal{V}.$$

$P_{\mathcal{X},\mathcal{Y}}$ is called the (oblique) projection on $\mathcal{X}$ along $\mathcal{Y}$.

Figure 7.1 displays the projection of $v$ on both $\mathcal{X}$ and $\mathcal{Y}$ in the case $\mathcal{V} = \mathbb{R}^2$.

Figure 7.1. Oblique projections.

Theorem 7.2. $P_{\mathcal{X},\mathcal{Y}}$ is linear and $P_{\mathcal{X},\mathcal{Y}}^2 = P_{\mathcal{X},\mathcal{Y}}$.

Theorem 7.3. A linear transformation $P$ is a projection if and only if it is idempotent, i.e., $P^2 = P$. Also, $P$ is a projection if and only if $I - P$ is a projection. In fact, $P_{\mathcal{Y},\mathcal{X}} = I - P_{\mathcal{X},\mathcal{Y}}$.
Proof: Suppose $P$ is a projection, say on $\mathcal{X}$ along $\mathcal{Y}$ (using the notation of Definition 7.1). Let $v \in \mathcal{V}$ be arbitrary. Then $Pv = P(x + y) = Px = x$. Moreover, $P^2v = PPv = Px = x = Pv$. Thus, $P^2 = P$. Conversely, suppose $P^2 = P$. Let $\mathcal{X} = \{v \in \mathcal{V} : Pv = v\}$ and $\mathcal{Y} = \{v \in \mathcal{V} : Pv = 0\}$. It is easy to check that $\mathcal{X}$ and $\mathcal{Y}$ are subspaces. We now prove that $\mathcal{V} = \mathcal{X} \oplus \mathcal{Y}$. First note that if $v \in \mathcal{X}$, then $Pv = v$. If $v \in \mathcal{Y}$, then $Pv = 0$. Hence if $v \in \mathcal{X} \cap \mathcal{Y}$, then $v = 0$. Now let $v \in \mathcal{V}$ be arbitrary. Then $v = Pv + (I - P)v$. Let $x = Pv$, $y = (I - P)v$. Then $Px = P^2v = Pv = x$ so $x \in \mathcal{X}$, while $Py = P(I - P)v = Pv - P^2v = 0$ so $y \in \mathcal{Y}$. Thus, $\mathcal{V} = \mathcal{X} \oplus \mathcal{Y}$ and the projection on $\mathcal{X}$ along $\mathcal{Y}$ is $P$. Essentially the same argument shows that $I - P$ is the projection on $\mathcal{Y}$ along $\mathcal{X}$. □
Definition 7.4. In the special case where $\mathcal{Y} = \mathcal{X}^\perp$, $P_{\mathcal{X},\mathcal{X}^\perp}$ is called an orthogonal projection and we then use the notation $P_{\mathcal{X}} = P_{\mathcal{X},\mathcal{X}^\perp}$.
Theorem 7.5. $P \in \mathbb{R}^{n\times n}$ is the matrix of an orthogonal projection (onto $\mathcal{R}(P)$) if and only if $P^2 = P = P^T$.

Proof: Let $P$ be an orthogonal projection (on $\mathcal{X}$, say, along $\mathcal{X}^\perp$) and let $x, y \in \mathbb{R}^n$ be arbitrary. Note that $(I - P)x = (I - P_{\mathcal{X},\mathcal{X}^\perp})x = P_{\mathcal{X}^\perp,\mathcal{X}}x$ by Theorem 7.3. Thus, $(I - P)x \in \mathcal{X}^\perp$. Since $Py \in \mathcal{X}$, we have $(Py)^T(I - P)x = y^TP^T(I - P)x = 0$. Since $x$ and $y$ were arbitrary, we must have $P^T(I - P) = 0$. Hence $P^T = P^TP = P$, with the second equality following since $P^TP$ is symmetric. Conversely, suppose $P$ is a symmetric projection matrix and let $x$ be arbitrary. Write $x = Px + (I - P)x$. Then $x^TP^T(I - P)x = x^TP(I - P)x = 0$. Thus, since $Px \in \mathcal{R}(P)$, then $(I - P)x \in \mathcal{R}(P)^\perp$ and $P$ must be an orthogonal projection. □

7.1.1 The four fundamental orthogonal projections

Using the notation of Theorems 5.1 and 5.11, let $A \in \mathbb{R}^{m\times n}$ with SVD $A = U\Sigma V^T = U_1SV_1^T$. Then

$$P_{\mathcal{R}(A)} = AA^+ = U_1U_1^T = \sum_{i=1}^{r} u_iu_i^T,$$

$$P_{\mathcal{R}(A)^\perp} = I - AA^+ = U_2U_2^T = \sum_{i=r+1}^{m} u_iu_i^T,$$

$$P_{\mathcal{N}(A)} = I - A^+A = V_2V_2^T = \sum_{i=r+1}^{n} v_iv_i^T,$$

$$P_{\mathcal{N}(A)^\perp} = A^+A = V_1V_1^T = \sum_{i=1}^{r} v_iv_i^T$$

are easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces.
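The four fundamental projections can be formed directly from the pseudoinverse and checked against the characterization of Theorem 7.5 (symmetric and idempotent); the rank-deficient matrix below is an arbitrary illustration:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])         # rank 1, m = 2, n = 3
Ap = np.linalg.pinv(A)

P_range = A @ Ap                        # P_R(A)
P_range_perp = np.eye(2) - A @ Ap       # P_R(A)-perp
P_null = np.eye(3) - Ap @ A             # P_N(A)
P_null_perp = Ap @ A                    # P_N(A)-perp

# Each must be symmetric and idempotent (Theorem 7.5)
ok = all(np.allclose(P, P.T) and np.allclose(P @ P, P)
         for P in (P_range, P_range_perp, P_null, P_null_perp))
```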


Example 7.6. Determine the orthogonal projection of a vector $v \in \mathbb{R}^n$ on another nonzero vector $w \in \mathbb{R}^n$.

Solution: Think of the vector $w$ as an element of the one-dimensional subspace $\mathcal{R}(w)$. Then the desired projection is simply

$$P_{\mathcal{R}(w)}v = ww^+v = \frac{ww^Tv}{w^Tw} \quad \text{(using Example 4.8)} = \left(\frac{w^Tv}{w^Tw}\right)w.$$

Moreover, the vector $z$ that is orthogonal to $w$ and such that $v = Pv + z$ is given by

$$z = P_{\mathcal{R}(w)^\perp}v = (I - P_{\mathcal{R}(w)})v = v - \left(\frac{w^Tv}{w^Tw}\right)w.$$

See Figure 7.2. A direct calculation shows that $z$ and $w$ are, in fact, orthogonal:

$$w^Tz = w^Tv - \frac{w^Tv}{w^Tw}\,w^Tw = 0.$$

Figure 7.2. Orthogonal projection on a "line."
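The formulas of Example 7.6 take two lines of NumPy (the vectors are arbitrary illustrations):

```python
import numpy as np

v = np.array([3.0, 4.0])
w = np.array([1.0, 0.0])

Pv = (w @ v) / (w @ w) * w     # orthogonal projection of v on the line R(w)
z = v - Pv                     # component of v orthogonal to w

orth = w @ z                   # w^T z, zero by construction
```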

Example 7.7. Recall the proof of Theorem 3.11. There, $\{v_1, \ldots, v_k\}$ was an orthonormal basis for a subspace $\mathcal{S}$ of $\mathbb{R}^n$. An arbitrary vector $x \in \mathbb{R}^n$ was chosen and a formula for $x_1$ appeared rather mysteriously. The expression for $x_1$ is simply the orthogonal projection of $x$ on $\mathcal{S}$. Specifically,

$$P_{\mathcal{S}}x = x_1 = \sum_{i=1}^{k} (v_i^Tx)v_i.$$

Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain $\mathbb{R}^n$ and co-domain $\mathbb{R}^m$ are given easily as follows. Let $x \in \mathbb{R}^n$ be an arbitrary vector. Then

$$x = P_{\mathcal{N}(A)^\perp}x + P_{\mathcal{N}(A)}x = A^+Ax + (I - A^+A)x = V_1V_1^Tx + V_2V_2^Tx \quad (\text{recall } VV^T = I).$$

Similarly, let $y \in \mathbb{R}^m$ be an arbitrary vector. Then

$$y = P_{\mathcal{R}(A)}y + P_{\mathcal{R}(A)^\perp}y = AA^+y + (I - AA^+)y = U_1U_1^Ty + U_2U_2^Ty \quad (\text{recall } UU^T = I).$$

Example 7.9. Let

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}.$$

Then

$$A^+ = \begin{bmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \\ 0 & 0 \end{bmatrix}$$

and we can decompose the vector $[2\ 3\ 4]^T$ uniquely into the sum of a vector in $\mathcal{N}(A)^\perp$ and a vector in $\mathcal{N}(A)$, respectively, as follows:

$$\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = A^+Ax + (I - A^+A)x = \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} + \begin{bmatrix} 1/2 & -1/2 & 0 \\ -1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 5/2 \\ 5/2 \\ 0 \end{bmatrix} + \begin{bmatrix} -1/2 \\ 1/2 \\ 4 \end{bmatrix}.$$
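A decomposition of this kind is a one-liner with `numpy.linalg.pinv`; the sketch below uses the rank-1 matrix and vector from Example 7.9 as reconstructed here:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
x = np.array([2.0, 3.0, 4.0])

Ap = np.linalg.pinv(A)
x1 = Ap @ A @ x                   # component in N(A)-perp
x2 = (np.eye(3) - Ap @ A) @ x     # component in N(A)

in_null = np.linalg.norm(A @ x2)  # x2 really lies in N(A)
```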

7.2 Inner Product Spaces

Definition 7.10. Let $\mathcal{V}$ be a vector space over $\mathbb{R}$. Then $\langle \cdot, \cdot \rangle : \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ is a real inner product if

1. $\langle x, x \rangle \ge 0$ for all $x \in \mathcal{V}$ and $\langle x, x \rangle = 0$ if and only if $x = 0$.

2. $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathcal{V}$.

3. $\langle x, \alpha y_1 + \beta y_2 \rangle = \alpha\langle x, y_1 \rangle + \beta\langle x, y_2 \rangle$ for all $x, y_1, y_2 \in \mathcal{V}$ and for all $\alpha, \beta \in \mathbb{R}$.
Example 7.11. Let $\mathcal{V} = \mathbb{R}^n$. Then $\langle x, y \rangle = x^Ty$ is the "usual" Euclidean inner product or dot product.
Example 7.12. Let $\mathcal{V} = \mathbb{R}^n$. Then $\langle x, y \rangle_Q = x^TQy$, where $Q = Q^T > 0$ is an arbitrary $n \times n$ positive definite matrix, defines a "weighted" inner product.
Definition 7.13. If $A \in \mathbb{R}^{m\times n}$, then $A^T \in \mathbb{R}^{n\times m}$ is the unique linear transformation or map such that $\langle x, Ay \rangle = \langle A^Tx, y \rangle$ for all $x \in \mathbb{R}^m$ and for all $y \in \mathbb{R}^n$.


It is easy to check that, with this more "abstract" definition of transpose, and if the $(i,j)$th element of $A$ is $a_{ij}$, then the $(i,j)$th element of $A^T$ is $a_{ji}$. It can also be checked that all the usual properties of the transpose hold, such as $(AB)^T = B^TA^T$. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose $A \in \mathbb{R}^{m\times n}$ and let $\langle \cdot, \cdot \rangle_Q$ and $\langle \cdot, \cdot \rangle_R$, with $Q$ and $R$ positive definite, be weighted inner products on $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Then we can define the "weighted transpose" $A^\#$ as the unique map that satisfies

$$\langle x, Ay \rangle_Q = \langle A^\#x, y \rangle_R \text{ for all } x \in \mathbb{R}^m \text{ and for all } y \in \mathbb{R}^n.$$

By Example 7.12 above, we must then have $x^TQAy = x^T(A^\#)^TRy$ for all $x, y$. Hence we must have $QA = (A^\#)^TR$. Taking transposes (of the usual variety) gives $A^TQ = RA^\#$. Since $R$ is nonsingular, we find

$$A^\# = R^{-1}A^TQ.$$

We can also generalize the notion of orthogonality ($x^Ty = 0$) to $Q$-orthogonality ($Q$ is a positive definite matrix). Two vectors $x, y \in \mathbb{R}^n$ are $Q$-orthogonal (or conjugate with respect to $Q$) if $\langle x, y \rangle_Q = x^TQy = 0$. $Q$-orthogonality is an important tool used in studying conjugate direction methods in optimization theory.
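The defining adjoint property of the weighted transpose $A^\# = R^{-1}A^TQ$ can be spot-checked numerically; the sketch below builds arbitrary positive definite weights as $MM^T + cI$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 2
A = rng.standard_normal((m, n))

# Arbitrary positive definite weights Q (on R^m) and R (on R^n)
Mq = rng.standard_normal((m, m)); Q = Mq @ Mq.T + m * np.eye(m)
Mr = rng.standard_normal((n, n)); R = Mr @ Mr.T + n * np.eye(n)

A_sharp = np.linalg.inv(R) @ A.T @ Q      # A# = R^{-1} A^T Q

# Check <x, Ay>_Q = <A# x, y>_R for random x, y
x = rng.standard_normal(m)
y = rng.standard_normal(n)
lhs = x @ Q @ (A @ y)
rhs = (A_sharp @ x) @ R @ y
err = abs(lhs - rhs)
```

The check works because $(A^\#)^TR = QA$ when $Q$ and $R$ are symmetric, which is exactly the derivation in the text.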
Definition 7.14. Let $\mathcal{V}$ be a vector space over $\mathbb{C}$. Then $\langle \cdot, \cdot \rangle : \mathcal{V} \times \mathcal{V} \to \mathbb{C}$ is a complex inner product if

1. $\langle x, x \rangle \ge 0$ for all $x \in \mathcal{V}$ and $\langle x, x \rangle = 0$ if and only if $x = 0$.

2. $\langle x, y \rangle = \overline{\langle y, x \rangle}$ for all $x, y \in \mathcal{V}$.

3. $\langle x, \alpha y_1 + \beta y_2 \rangle = \alpha\langle x, y_1 \rangle + \beta\langle x, y_2 \rangle$ for all $x, y_1, y_2 \in \mathcal{V}$ and for all $\alpha, \beta \in \mathbb{C}$.
Remark 7.15. We could use the notation $\langle \cdot, \cdot \rangle_{\mathbb{C}}$ to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that $\langle x, x \rangle$ must be real for all $x$.
Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have

$$\langle \alpha x_1 + \beta x_2, y \rangle = \bar{\alpha}\langle x_1, y \rangle + \bar{\beta}\langle x_2, y \rangle.$$


Remark 7.17. The Euclidean inner product of $x, y \in \mathbb{C}^n$ is given by

$$\langle x, y \rangle = \sum_{i=1}^{n} \bar{x}_iy_i = x^Hy.$$

The conventional definition of the complex Euclidean inner product is $\langle x, y \rangle = y^Hx$ but we use its complex conjugate $x^Hy$ here for symmetry with the real case.

Remark 7.18. A weighted inner product can be defined as in the real case by $\langle x, y \rangle_Q = x^HQy$, for arbitrary $Q = Q^H > 0$. The notion of $Q$-orthogonality can be similarly generalized to the complex case.


Definition 7.19. A vector space $(\mathcal{V}, \mathbb{F})$ endowed with a specific inner product is called an inner product space. If $\mathbb{F} = \mathbb{C}$, we call $\mathcal{V}$ a complex inner product space. If $\mathbb{F} = \mathbb{R}$, we call $\mathcal{V}$ a real inner product space.
Example 7.20.

1. Check that $\mathcal{V} = \mathbb{R}^{n\times n}$ with the inner product $\langle A, B \rangle = \operatorname{Tr} A^TB$ is a real inner product space. Note that other choices are possible since by properties of the trace function, $\operatorname{Tr} A^TB = \operatorname{Tr} B^TA = \operatorname{Tr} AB^T = \operatorname{Tr} BA^T$.

2. Check that $\mathcal{V} = \mathbb{C}^{n\times n}$ with the inner product $\langle A, B \rangle = \operatorname{Tr} A^HB$ is a complex inner product space. Again, other choices are possible.

Definition 7.21. Let $\mathcal{V}$ be an inner product space. For $v \in \mathcal{V}$, we define the norm (or length) of $v$ by $\|v\| = \sqrt{\langle v, v \rangle}$. This is called the norm induced by $\langle \cdot, \cdot \rangle$.
Example 7.22.

1. If $\mathcal{V} = \mathbb{R}^n$ with the usual inner product, the induced norm is given by $\|v\| = \left(\sum_{i=1}^{n} v_i^2\right)^{1/2}$.

2. If $\mathcal{V} = \mathbb{C}^n$ with the usual inner product, the induced norm is given by $\|v\| = \left(\sum_{i=1}^{n} |v_i|^2\right)^{1/2}$.
Theorem 7.23. Let $P$ be an orthogonal projection on an inner product space $\mathcal{V}$. Then $\|Pv\| \le \|v\|$ for all $v \in \mathcal{V}$.
Proof: Since $P$ is an orthogonal projection, $P^2 = P = P^\#$. (Here, the notation $P^\#$ denotes the unique linear transformation that satisfies $\langle Pu, v \rangle = \langle u, P^\#v \rangle$ for all $u, v \in \mathcal{V}$. If this seems a little too abstract, consider $\mathcal{V} = \mathbb{R}^n$ (or $\mathbb{C}^n$), where $P^\#$ is simply the usual $P^T$ (or $P^H$).) Hence $\langle Pv, v \rangle = \langle P^2v, v \rangle = \langle Pv, P^\#v \rangle = \langle Pv, Pv \rangle = \|Pv\|^2 \ge 0$. Now $I - P$ is also a projection, so the above result applies and we get

$$0 \le \langle (I - P)v, v \rangle = \langle v, v \rangle - \langle Pv, v \rangle = \|v\|^2 - \|Pv\|^2,$$

from which the theorem follows. □

Definition 7.24. The norm induced on an inner product space by the "usual" inner product is called the natural norm.
In case $\mathcal{V} = \mathbb{C}^n$ or $\mathcal{V} = \mathbb{R}^n$, the natural norm is also called the Euclidean norm. In the next section, other norms on these vector spaces are defined. A converse to the above procedure is also available. That is, given a norm defined by $\|x\| = \sqrt{\langle x, x \rangle}$, an inner product can be defined via the following.


Theorem 7.25 (Polarization Identity).

1. For $x, y \in \mathbb{R}^n$, an inner product is defined by

$$\langle x, y \rangle = x^Ty = \frac{\|x + y\|^2 - \|x - y\|^2}{4} = \frac{\|x + y\|^2 - \|x\|^2 - \|y\|^2}{2}.$$

2. For $x, y \in \mathbb{C}^n$, an inner product is defined by

$$\langle x, y \rangle = x^Hy = \frac{\|x + y\|^2 - \|x - y\|^2}{4} + j\,\frac{\|x - jy\|^2 - \|x + jy\|^2}{4},$$

where $j = i = \sqrt{-1}$.
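The real polarization identities are easy to confirm numerically; a sketch with arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

inner = x @ y                                  # x^T y
polar4 = (np.linalg.norm(x + y)**2 - np.linalg.norm(x - y)**2) / 4
polar2 = (np.linalg.norm(x + y)**2 - np.linalg.norm(x)**2
          - np.linalg.norm(y)**2) / 2

err = max(abs(inner - polar4), abs(inner - polar2))
```

Both recover the inner product from norms alone, which is the whole point of the theorem: the norm determines the inner product that induces it.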

7.3 Vector Norms

Definition 7.26. Let (V, F) be a vector space. Then || . || : V -> R is a vector norm if it satisfies the following three properties:

1. ||x|| >= 0 for all x in V, and ||x|| = 0 if and only if x = 0.

2. ||ax|| = |a| ||x|| for all x in V and for all a in F.

3. ||x + y|| <= ||x|| + ||y|| for all x, y in V.


(This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in R^2.)
Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.
Definition 7.28. A vector space (V, F) is said to be a normed linear space if and only if there exists a vector norm || . || : V -> R satisfying the three conditions of Definition 7.26.

Example 7.29.

1. For x in C^n, the Holder norms, or p-norms, are defined by

    ||x||_p = (sum_{i=1}^n |x_i|^p)^{1/p},  1 <= p < +infinity.

Special cases:

(a) ||x||_1 = sum_{i=1}^n |x_i| (the "Manhattan" norm).

(b) ||x||_2 = (sum_{i=1}^n |x_i|^2)^{1/2} = (x^H x)^{1/2} (the Euclidean norm).

(c) ||x||_inf = max_{i in n} |x_i| = lim_{p -> +infinity} ||x||_p.

(The second equality in (c) is a theorem that requires proof.)
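The special cases above are easy to check numerically. The following is a minimal sketch using NumPy; the choice of NumPy and the sample vector are assumptions of this illustration, not part of the text:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])  # sample vector (illustrative choice)

def p_norm(x, p):
    # Holder p-norm computed directly from the definition
    return (np.abs(x) ** p).sum() ** (1.0 / p)

one_norm = p_norm(x, 1)     # "Manhattan" norm: 3 + 4 + 12 = 19
two_norm = p_norm(x, 2)     # Euclidean norm: sqrt(9 + 16 + 144) = 13
inf_norm = np.abs(x).max()  # limiting case as p -> +infinity

# for large p the p-norm approaches the infinity-norm, as in special case (c)
print(one_norm, two_norm, inf_norm, p_norm(x, 200))
```

For large p the computed p-norm approaches max_i |x_i|, in line with the limit stated in special case (c).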

2. Some weighted p-norms:

(a) ||x||_{1,D} = sum_{i=1}^n d_i |x_i|, where d_i > 0.

(b) ||x||_{2,Q} = (x^H Q x)^{1/2}, where Q = Q^H > 0 (this norm is more commonly denoted || . ||_Q).

3. On the vector space (C[t_0, t_1], R), define the vector norm

    ||f|| = max_{t_0 <= t <= t_1} |f(t)|.

On the vector space ((C[t_0, t_1])^n, R), define the vector norm

    ||f||_inf = max_{t_0 <= t <= t_1} ||f(t)||_inf.
Theorem 7.30 (Holder Inequality). Let x, y in C^n. Then

    |x^H y| <= ||x||_p ||y||_q,   where 1/p + 1/q = 1.

A particular case of the Holder inequality is of special interest.

Theorem 7.31 (Cauchy-Bunyakovsky-Schwarz Inequality). Let x, y in C^n. Then

    |x^H y| <= ||x||_2 ||y||_2

with equality if and only if x and y are linearly dependent.

Proof: Consider the matrix [x y] in C^{n x 2}. Since

    [x y]^H [x y] = [ x^H x   x^H y
                      y^H x   y^H y ]

is a nonnegative definite matrix, its determinant must be nonnegative. In other words, 0 <= (x^H x)(y^H y) - (x^H y)(y^H x). Since y^H x is the complex conjugate of x^H y, we see immediately that |x^H y| <= ||x||_2 ||y||_2.
Note: This is not the classical algebraic proof of the Cauchy-Bunyakovsky-Schwarz (C-B-S) inequality (see, e.g., [20, p. 217]). However, it is particularly easy to remember.
Remark 7.32. The angle theta between two nonzero vectors x, y in C^n may be defined by

    cos theta = |x^H y| / (||x||_2 ||y||_2),   0 <= theta <= pi/2.

The C-B-S inequality is thus equivalent to the statement |cos theta| <= 1.
Remark 7.33. Theorem 7.31 and Remark 7.32 are true for general inner product spaces.
Remark 7.34. The norm || . ||_2 is unitarily invariant; i.e., if U in C^{n x n} is unitary, then ||Ux||_2 = ||x||_2 (Proof: ||Ux||_2^2 = x^H U^H U x = x^H x = ||x||_2^2). However, || . ||_1 and || . ||_inf are not unitarily invariant. Similar remarks apply to the unitary invariance of norms of real vectors under orthogonal transformation.
Remark 7.35. If x, y in C^n are orthogonal, then we have the Pythagorean Identity

    ||x +/- y||_2^2 = ||x||_2^2 + ||y||_2^2,

the proof of which follows easily from ||z||_2^2 = z^H z.

Theorem 7.36. All norms on C^n are equivalent; i.e., for any two norms || . ||_a and || . ||_b there exist constants c_1, c_2 (possibly depending on n) such that

    c_1 ||x||_a <= ||x||_b <= c_2 ||x||_a   for all x in C^n.

Example 7.37. For x in C^n, the following inequalities are all tight bounds; i.e., there exist vectors x for which equality holds:

    ||x||_1 <= sqrt(n) ||x||_2,    ||x||_1 <= n ||x||_inf;
    ||x||_2 <= ||x||_1,            ||x||_2 <= sqrt(n) ||x||_inf;
    ||x||_inf <= ||x||_1,          ||x||_inf <= ||x||_2.
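These bounds are easy to confirm numerically. A minimal NumPy sketch follows; the random test vectors and the all-ones equality case are illustrative assumptions:

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    # the six tight bounds from Example 7.37
    assert n1 <= np.sqrt(n) * n2 + 1e-12 and n1 <= n * ninf + 1e-12
    assert n2 <= n1 + 1e-12 and n2 <= np.sqrt(n) * ninf + 1e-12
    assert ninf <= n1 + 1e-12 and ninf <= n2 + 1e-12

# equality in ||x||_1 <= sqrt(n) ||x||_2 is attained by the all-ones vector
e = np.ones(n)
print(np.linalg.norm(e, 1), np.sqrt(n) * np.linalg.norm(e, 2))
```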

Finally, we conclude this section with a theorem about convergence of vectors. Convergence of a sequence of vectors to some limit vector can be converted into a statement about convergence of real numbers, i.e., convergence in terms of vector norms.

Theorem 7.38. Let || . || be a vector norm and suppose v, v^(1), v^(2), ... in C^n. Then

    lim_{k -> +infinity} v^(k) = v   if and only if   lim_{k -> +infinity} ||v^(k) - v|| = 0.

7.4  Matrix Norms

In this section we introduce the concept of matrix norm. As with vectors, the motivation for using matrix norms is to have a notion of either the size of or the nearness of matrices. The former notion is useful for perturbation analysis, while the latter is needed to make sense of "convergence" of matrices. Attention is confined to the vector space (R^{m x n}, R) since that is what arises in the majority of applications. Extension to the complex case is straightforward and essentially obvious.
mx
Definition 7.39. || . || : R^{m x n} -> R is a matrix norm if it satisfies the following three properties:

1. ||A|| >= 0 for all A in R^{m x n}, and ||A|| = 0 if and only if A = 0.

2. ||aA|| = |a| ||A|| for all A in R^{m x n} and for all a in R.

3. ||A + B|| <= ||A|| + ||B|| for all A, B in R^{m x n}.

(As with vectors, this is called the triangle inequality.)


Example 7.40. Let A in R^{m x n}. Then the Frobenius norm (or matrix Euclidean norm) is defined by

    ||A||_F = (sum_{i,j} a_ij^2)^{1/2} = (sum_{i=1}^r sigma_i^2(A))^{1/2} = (Tr(A^T A))^{1/2} = (Tr(A A^T))^{1/2}

(where r = rank(A)).
Example 7.41. Let A in R^{m x n}. Then the matrix p-norms are defined by

    ||A||_p = max_{x != 0} ||Ax||_p / ||x||_p = max_{||x||_p = 1} ||Ax||_p.

The following three special cases are important because they are "computable." Each is a theorem and requires a proof.
1. The "maximum column sum" norm is

    ||A||_1 = max_{j in n} sum_{i=1}^m |a_ij|.

2. The "maximum row sum" norm is

    ||A||_inf = max_{i in m} sum_{j=1}^n |a_ij|.

3. The spectral norm is

    ||A||_2 = lambda_max^{1/2}(A^T A) = lambda_max^{1/2}(A A^T) = sigma_1(A).

Note: ||A^+||_2 = 1/sigma_r(A), where r = rank(A).
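The three "computable" norms can be evaluated directly from these characterizations and compared against a library implementation. A minimal NumPy sketch follows; the sample matrix is an illustrative assumption:

```python
import numpy as np

A = np.array([[1.0, -2.0,  3.0],
              [4.0,  0.0, -1.0]])

col_sum = np.abs(A).sum(axis=0).max()  # maximum column sum -> ||A||_1
row_sum = np.abs(A).sum(axis=1).max()  # maximum row sum    -> ||A||_inf
# spectral norm: square root of the largest eigenvalue of A^T A
spectral = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())

assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))
assert np.isclose(spectral, np.linalg.norm(A, 2))
```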

Example 7.42. Let A in R^{m x n}. The Schatten p-norms are defined by

    ||A||_{S,p} = (sigma_1^p + ... + sigma_r^p)^{1/p}.

Some special cases of Schatten p-norms are equal to norms defined previously. For example, || . ||_{S,2} = || . ||_F and || . ||_{S,inf} = || . ||_2. The norm || . ||_{S,1} is often called the trace norm.
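A Schatten p-norm is just a vector p-norm applied to the singular values, so the special cases stated above are easy to verify. A minimal NumPy sketch, with an illustrative matrix assumed:

```python
import numpy as np

def schatten(A, p):
    # p-norm of the vector of singular values of A
    s = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(s, p)

A = np.arange(6.0).reshape(2, 3)
assert np.isclose(schatten(A, 2), np.linalg.norm(A, 'fro'))   # ||.||_{S,2} = ||.||_F
assert np.isclose(schatten(A, np.inf), np.linalg.norm(A, 2))  # ||.||_{S,inf} = ||.||_2
trace_norm = schatten(A, 1)                                   # the trace (nuclear) norm
```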
Example 7.43. Let A in R^{m x n}. Then "mixed" norms can also be defined by

    ||A||_{p,q} = max_{x != 0} ||Ax||_p / ||x||_q.

Example 7.44. The "matrix analogue of the vector 1-norm," ||A||_S = sum_{i,j} |a_ij|, is a norm.

The concept of a matrix norm alone is not altogether useful since it does not allow us to estimate the size of a matrix product AB in terms of the sizes of A and B individually.


Notice that this difficulty did not arise for vectors, although there are analogues for, e.g., inner products or outer products of vectors. We thus need the following definition.
Definition 7.45. Let A in R^{m x n}, B in R^{n x k}. Then the norms || . ||_a, || . ||_b, and || . ||_c are mutually consistent if ||AB||_a <= ||A||_b ||B||_c. A matrix norm || . || is said to be consistent if ||AB|| <= ||A|| ||B|| whenever the matrix product is defined.

Example 7.46.

1. || . ||_F and || . ||_p for all p are consistent matrix norms.

2. The "mixed" norm

    ||A||_{inf,1} = max_{x != 0} ||Ax||_inf / ||x||_1 = max_{i,j} |a_ij|

is a matrix norm, but it is not consistent. For example, take A = B = [1 1; 1 1]. Then ||AB||_{inf,1} = 2, while ||A||_{inf,1} ||B||_{inf,1} = 1.
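The failure of consistency for the max-element norm is easy to reproduce numerically; a minimal NumPy sketch:

```python
import numpy as np

def max_elem_norm(A):
    # the largest entry of A in absolute value
    return np.abs(A).max()

A = np.ones((2, 2))
B = np.ones((2, 2))
lhs = max_elem_norm(A @ B)                 # ||AB|| = 2
rhs = max_elem_norm(A) * max_elem_norm(B)  # ||A|| ||B|| = 1
print(lhs, rhs)  # prints 2.0 1.0 -- consistency ||AB|| <= ||A|| ||B|| fails
```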

The p-norms are examples of matrix norms that are subordinate to (or induced by) a vector norm, i.e.,

    ||A|| = max_{x != 0} ||Ax|| / ||x|| = max_{||x|| = 1} ||Ax||

(or, more generally, ||A||_{p,q} = max_{x != 0} ||Ax||_p / ||x||_q). For such subordinate norms, also called operator norms, we clearly have ||Ax|| <= ||A|| ||x||. Since ||ABx|| <= ||A|| ||Bx|| <= ||A|| ||B|| ||x||, it follows that all subordinate norms are consistent.
Theorem 7.47. There exists a vector x* such that ||Ax*|| = ||A|| ||x*|| if the matrix norm is subordinate to the vector norm.
Theorem 7.48. If || . ||_m is a consistent matrix norm, there exists a vector norm || . ||_v consistent with it, i.e., ||Ax||_v <= ||A||_m ||x||_v.

Not every consistent matrix norm is subordinate to a vector norm. For example, consider || . ||_F. Then ||Ax||_2 <= ||A||_F ||x||_2, so || . ||_2 is consistent with || . ||_F, but there does not exist a vector norm || . || such that ||A||_F is given by max_{x != 0} ||Ax|| / ||x||.
Useful Results

The following miscellaneous results about matrix norms are collected for future reference. The interested reader is invited to prove each of them as an exercise.

1. ||I_n||_p = 1 for all p, while ||I_n||_F = sqrt(n).

2. For A in R^{n x n}, the following inequalities are all tight, i.e., there exist matrices A for which equality holds:

    ||A||_1 <= sqrt(n) ||A||_2,    ||A||_1 <= n ||A||_inf,        ||A||_1 <= sqrt(n) ||A||_F;
    ||A||_2 <= sqrt(n) ||A||_1,    ||A||_2 <= sqrt(n) ||A||_inf,  ||A||_2 <= ||A||_F;
    ||A||_inf <= n ||A||_1,        ||A||_inf <= sqrt(n) ||A||_2,  ||A||_inf <= sqrt(n) ||A||_F;
    ||A||_F <= sqrt(n) ||A||_1,    ||A||_F <= sqrt(n) ||A||_2,    ||A||_F <= sqrt(n) ||A||_inf.

3. For A in R^{m x n},

    max_{i,j} |a_ij| <= ||A||_2 <= sqrt(mn) max_{i,j} |a_ij|.

4. The norms || . ||_F and || . ||_2 (as well as all the Schatten p-norms, but not necessarily other p-norms) are unitarily invariant; i.e., for all A in R^{m x n} and for all orthogonal matrices Q in R^{m x m} and Z in R^{n x n}, ||QAZ||_a = ||A||_a for a = 2 or F.

Convergence

The following theorem uses matrix norms to convert a statement about convergence of a sequence of matrices into a statement about the convergence of an associated sequence of scalars.

Theorem 7.49. Let || . || be a matrix norm and suppose A, A^(1), A^(2), ... in R^{m x n}. Then

    lim_{k -> +infinity} A^(k) = A   if and only if   lim_{k -> +infinity} ||A^(k) - A|| = 0.

EXERCISES

1. If P is an orthogonal projection, prove that P^+ = P.

2. Suppose P and Q are orthogonal projections and P + Q = I. Prove that P - Q must be an orthogonal matrix.

3. Prove that I - A^+ A is an orthogonal projection. Also, prove directly that V_2 V_2^T is an orthogonal projection, where V_2 is defined as in Theorem 5.1.

4. Suppose that a matrix A in R^{m x n} has linearly independent columns. Prove that the orthogonal projection onto the space spanned by these column vectors is given by the matrix P = A(A^T A)^{-1} A^T.

5. Find the (orthogonal) projection of the vector [2 3 4]^T onto the subspace of R^3 spanned by the plane 3x - y + 2z = 0.

6. Prove that R^{n x n} with the inner product (A, B) = Tr(A^T B) is a real inner product space.

7. Show that the matrix norms || . ||_2 and || . ||_F are unitarily invariant.

8. Definition: Let A in R^{n x n} and denote its set of eigenvalues (not necessarily distinct) by {lambda_1, ..., lambda_n}. The spectral radius of A is the scalar

    rho(A) = max_i |lambda_i|.


Let

A=[~14

0
12

~].

Determine ||A||_F, ||A||_1, ||A||_2, ||A||_inf, and rho(A).
9. Let

A=[~4 9~ 2~].

Determine ||A||_F, ||A||_1, ||A||_2, ||A||_inf, and rho(A). (An n x n matrix, all of whose columns and rows as well as main diagonal and antidiagonal sum to s = n(n^2 + 1)/2, is called a "magic square" matrix. If M is a magic square matrix, it can be proved that ||M||_p = s for all p.)
10. Let A = x y^T, where both x, y in R^n are nonzero. Determine ||A||_F, ||A||_1, ||A||_2, and ||A||_inf in terms of ||x||_a and/or ||y||_b, where a and b take the value 1, 2, or infinity as appropriate.


Chapter 8

Linear Least Squares Problems

8.1  The Linear Least Squares Problem

Problem: Suppose A in R^{m x n} with m >= n and b in R^m is a given vector. The linear least squares problem consists of finding an element of the set

    X = {x in R^n : rho(x) = ||Ax - b||_2 is minimized}.

Solution: The set X has a number of easily verified properties:

1. A vector x in X if and only if A^T r = 0, where r = b - Ax is the residual associated with x. The equations A^T r = 0 can be rewritten in the form A^T A x = A^T b, and the latter form is commonly known as the normal equations; i.e., x in X if and only if x is a solution of the normal equations. For further details, see Section 8.2.
2. A vector x in X if and only if x is of the form

    x = A^+ b + (I - A^+ A) y, where y in R^n is arbitrary.    (8.1)

To see why this must be so, write the residual r in the form

    r = (b - P_{R(A)} b) + (P_{R(A)} b - Ax).

Now, (P_{R(A)} b - Ax) is clearly in R(A), while

    (b - P_{R(A)} b) = (I - P_{R(A)}) b = P_{R(A)^perp} b in R(A)^perp,

so these two vectors are orthogonal. Hence, from the Pythagorean identity (Remark 7.35),

    ||r||_2^2 = ||b - Ax||_2^2 = ||b - P_{R(A)} b||_2^2 + ||P_{R(A)} b - Ax||_2^2.

Thus, ||Ax - b||_2^2 (and hence rho(x) = ||Ax - b||_2) assumes its minimum value if and only if

    Ax = P_{R(A)} b = A A^+ b,    (8.2)

and this equation always has a solution since A A^+ b in R(A). By Theorem 6.3, all solutions of (8.2) are of the form

    x = A^+ A A^+ b + (I - A^+ A) y
      = A^+ b + (I - A^+ A) y,

where y in R^n is arbitrary. The minimum value of rho(x) is then clearly equal to

    ||b - P_{R(A)} b||_2 = ||(I - A A^+) b||_2 <= ||b||_2,

the last inequality following by Theorem 7.23.
3. X is convex. To see why, consider two arbitrary vectors x_1 = A^+ b + (I - A^+ A) y and x_2 = A^+ b + (I - A^+ A) z in X. Let theta in [0, 1]. Then the convex combination theta x_1 + (1 - theta) x_2 = A^+ b + (I - A^+ A)(theta y + (1 - theta) z) is clearly in X.

4. X has a unique element x* of minimal 2-norm. In fact, x* = A^+ b is the unique vector that solves this "double minimization" problem; i.e., x* minimizes the residual rho(x) and is the vector of minimum 2-norm that does so. This follows immediately from convexity, or directly from the fact that all x in X are of the form (8.1) and

    ||x||_2^2 = ||A^+ b||_2^2 + ||(I - A^+ A) y||_2^2,

which follows since the two vectors are orthogonal.
5.
is aa unique
i.e., X
= {x"}
{x*} =
{A+b}, if
5. There
There is
unique solution
solution to
to the
the least
least squares
squares problem,
problem, i.e.,
X =
= {A+b},
if
+
and only
or, equivalently,
if and
and only
if rank
(A) = n.
n.
and
only if
if A
A +AA = Ilor,
equivalently, if
only if
rank(A)
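The properties of the solution set can be exercised numerically with the pseudoinverse. A minimal NumPy sketch using a deliberately rank-deficient A (all data here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2, so A^+ A != I
b = rng.standard_normal(6)

Apinv = np.linalg.pinv(A)
x_star = Apinv @ b              # x* = A^+ b, the minimum-2-norm solution
r = b - A @ x_star
assert np.allclose(A.T @ r, 0)  # property 1: A^T r = 0 (normal equations)

# property 2: x = A^+ b + (I - A^+ A) y is also a solution for any y
y = rng.standard_normal(4)
x_other = x_star + (np.eye(4) - Apinv @ A) @ y
assert np.allclose(b - A @ x_other, r)                    # same (minimal) residual
assert np.linalg.norm(x_star) <= np.linalg.norm(x_other)  # property 4
```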
Just as for the solution of linear equations, we can generalize the linear least squares problem to the matrix case.
Theorem 8.1. Let A in R^{m x n} and B in R^{m x k}. The general solution to

    min_{X in R^{n x k}} ||AX - B||_2

is of the form

    X = A^+ B + (I - A^+ A) Y,

where Y in R^{n x k} is arbitrary. The unique solution of minimum 2-norm or F-norm is X = A^+ B.

Remark 8.2. Notice that solutions of the linear least squares problem look exactly the same as solutions of the linear system AX = B. The only difference is that in the case of linear least squares solutions, there is no "existence condition" such as R(B) contained in R(A). If the existence condition happens to be satisfied, then equality holds and the least squares residual is 0. Of all solutions that give a residual of 0, the unique solution X = A^+ B has minimum 2-norm or F-norm.
Remark 8.3. If we take B = I_m in Theorem 8.1, then X = A^+ can be interpreted as saying that the Moore-Penrose pseudoinverse of A is the best (in the matrix 2-norm sense) matrix such that AX approximates the identity.

Remark 8.4. Many other interesting and useful approximation results are available for the matrix 2-norm (and F-norm). One such is the following. Let A in R_r^{m x n} with SVD

    A = U Sigma V^T = sum_{i=1}^r sigma_i u_i v_i^T.

Then a best rank k approximation to A for 1 <= k <= r, i.e., a solution to

    min_{M in R_k^{m x n}} ||A - M||_2,

is given by

    M_k = sum_{i=1}^k sigma_i u_i v_i^T.

The special case in which m = n and k = n - 1 gives a nearest singular matrix to A in R_n^{n x n}.
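The truncated-SVD construction of M_k is a few lines in practice. A minimal NumPy sketch (the random test matrix is an illustrative assumption) confirming that the 2-norm error of the best rank-k approximation is sigma_{k+1}:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # M_k = sum_{i<=k} sigma_i u_i v_i^T

assert np.linalg.matrix_rank(Mk) == k
# the minimal 2-norm error over all rank-k matrices equals sigma_{k+1}
assert np.isclose(np.linalg.norm(A - Mk, 2), s[k])
```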

8.2  Geometric Solution

Looking at the schematic provided in Figure 8.1, it is apparent that minimizing ||Ax - b||_2 is equivalent to finding the vector x in R^n for which p = Ax is closest to b (in the Euclidean norm sense). Clearly, r = b - Ax must be orthogonal to R(A). Thus, if Ay is an arbitrary vector in R(A) (i.e., y is arbitrary), we must have

    0 = (Ay)^T (b - Ax)
      = y^T A^T (b - Ax)
      = y^T (A^T b - A^T A x).

Since y is arbitrary, we must have A^T b - A^T A x = 0 or A^T A x = A^T b.

Special case: If A is full (column) rank, then x = (A^T A)^{-1} A^T b.
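The orthogonality condition and the full-rank formula can be checked directly. A minimal NumPy sketch, with random full-column-rank data assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 3))  # full column rank with probability 1
b = rng.standard_normal(7)

x = np.linalg.solve(A.T @ A, A.T @ b)  # x = (A^T A)^{-1} A^T b

r = b - A @ x
assert np.allclose(A.T @ r, 0)  # the residual is orthogonal to R(A)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```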

8.3  Linear Regression and Other Linear Least Squares Problems

8.3.1  Example: Linear regression

Suppose we have m measurements (t_1, y_1), ..., (t_m, y_m) for which we hypothesize a linear (affine) relationship

    y = alpha t + beta    (8.3)


Figure 8.1. Projection of b on R(A).
for certain constants alpha and beta. One way to solve this problem is to find the line that best fits the data in the least squares sense; i.e., with the model (8.3), we have

    y_1 = alpha t_1 + beta + delta_1,
    y_2 = alpha t_2 + beta + delta_2,
    ...
    y_m = alpha t_m + beta + delta_m,

where delta_1, ..., delta_m are "errors" and we wish to minimize delta_1^2 + ... + delta_m^2. Geometrically, we are trying to find the best line that minimizes the (sum of squares of the) distances from the given data points. See, for example, Figure 8.2.
Figure 8.2. Simple linear regression.

Note that distances are measured in the vertical sense from the points to the line (as indicated, for example, for the point (t_1, y_1)). However, other criteria are possible. For example, one could measure the distances in the horizontal sense, or the perpendicular distance from the points to the line could be used. The latter is called total least squares. Instead of 2-norms, one could also use 1-norms or infinity-norms. The latter two are computationally


much more difficult to handle, and thus we present only the more tractable 2-norm case in the text that follows.

The m "error equations" can be written in matrix form as

    y = Ax + delta,

where y = [y_1, ..., y_m]^T, x = [alpha, beta]^T, delta = [delta_1, ..., delta_m]^T, and the ith row of A is [t_i  1]. We then want to solve the problem

    min_x delta^T delta = min_x (Ax - y)^T (Ax - y)

or, equivalently,

    min_x ||delta||_2^2 = min_x ||Ax - y||_2^2.    (8.4)

Solution: x = [alpha, beta]^T is a solution of the normal equations A^T A x = A^T y where, for the special form of the matrices above, we have

    A^T A = [ sum_i t_i^2   sum_i t_i
              sum_i t_i     m         ]

and

    A^T y = [ sum_i t_i y_i
              sum_i y_i     ].

The solution for the parameters alpha and beta can then be written

    [alpha; beta] = (A^T A)^{-1} A^T y.
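The normal-equations formulas above translate directly into code. A minimal NumPy sketch; the sample data, generated from alpha = 2, beta = 1 with small noise, are assumptions of this illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * t + 1.0 + 0.01 * rng.standard_normal(t.size)  # noisy line, alpha=2, beta=1

A = np.column_stack([t, np.ones_like(t)])  # ith row is [t_i, 1]
# solve the normal equations A^T A [alpha, beta]^T = A^T y
alpha, beta = np.linalg.solve(A.T @ A, A.T @ y)
print(alpha, beta)  # close to 2 and 1
```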

8.3.2  Other least squares problems

Suppose the hypothesized model is not the linear equation (8.3) but rather is of the form

    y = f(t) = c_1 phi_1(t) + ... + c_n phi_n(t).    (8.5)

In (8.5) the phi_i(t) are given (basis) functions and the c_i are constants to be determined to minimize the least squares error. The matrix problem is still (8.4), where we now have x = [c_1, ..., c_n]^T and the (i, j) element of A is phi_j(t_i).
An important special case of (8.5) is least squares polynomial approximation, which corresponds to choosing phi_i(t) = t^{i-1}, i in n, although this choice can lead to computational


difficulties because of numerical ill conditioning for large n. Numerically better approaches are based on orthogonal polynomials, piecewise polynomial functions, splines, etc.

The key feature in (8.5) is that the coefficients c_i appear linearly. The basis functions phi_i can be arbitrarily nonlinear. Sometimes a problem in which the c_i's appear nonlinearly can be converted into a linear problem. For example, if the fitting function is of the form y = f(t) = c_1 e^{c_2 t}, then taking logarithms yields the equation log y = log c_1 + c_2 t. Then defining y' = log y, c_1' = log c_1, and c_2' = c_2 results in a standard linear least squares problem.
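The logarithmic change of variables can be sketched as follows; the exact data generated from c_1 = 3, c_2 = 0.5 are illustrative assumptions:

```python
import numpy as np

t = np.linspace(0.0, 2.0, 15)
y = 3.0 * np.exp(0.5 * t)  # data from y = c1 * exp(c2 * t)

# log y = log c1 + c2 t is linear in the unknowns (log c1, c2)
A = np.column_stack([np.ones_like(t), t])
logc1, c2 = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
c1 = np.exp(logc1)
print(c1, c2)  # recovers 3 and 0.5
```

Note that the linearized problem minimizes the residual in log y rather than in y itself, so for noisy data the two formulations generally give different fits.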

8.4  Least Squares and Singular Value Decomposition

In the numerical linear algebra literature (e.g., [4], [7], [11], [23]), it is shown that solution of linear least squares problems via the normal equations can be a very poor numerical method in finite-precision arithmetic. Since the standard Kalman filter essentially amounts to sequential updating of normal equations, it can be expected to exhibit such poor numerical behavior in practice (and it does). Better numerical methods are based on algorithms that work directly and solely on A itself rather than A^T A. Two basic classes of algorithms are based on SVD and QR (orthogonal-upper triangular) factorization, respectively. The former is much more expensive but is generally more reliable and offers considerable theoretical insight.

In this section we investigate solution of the linear least squares problem

    min_x ||Ax - b||_2,   A in R^{m x n}, b in R^m,    (8.6)

via the SVD. Specifically, we assume that A has an SVD given by A = U Sigma V^T = U_1 S V_1^T as in Theorem 5.1. We now note that

    ||Ax - b||_2^2 = ||U Sigma V^T x - b||_2^2
                   = ||Sigma V^T x - U^T b||_2^2    since || . ||_2 is unitarily invariant
                   = ||Sigma z - c||_2^2            where z = V^T x, c = U^T b
                   = || [S 0; 0 0][z_1; z_2] - [c_1; c_2] ||_2^2
                   = || [S z_1 - c_1; -c_2] ||_2^2
                   = ||S z_1 - c_1||_2^2 + ||c_2||_2^2.

The last equality follows from the fact that if v = [v_1; v_2], then ||v||_2^2 = ||v_1||_2^2 + ||v_2||_2^2 (note that orthogonality is not what is used here; the subvectors can have different lengths). This explains why it is convenient to work above with the square of the norm rather than the norm. As far as the minimization is concerned, the two are equivalent. In fact, the last quantity above is clearly minimized by taking z_1 = S^{-1} c_1. The subvector z_2 is arbitrary, while the minimum value of ||Ax - b||_2^2 is ||c_2||_2^2.


Now transform back to the original coordinates:

$$\begin{aligned}
x &= Vz \\
&= [V_1 \; V_2] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \\
&= V_1 z_1 + V_2 z_2 \\
&= V_1 S^{-1} c_1 + V_2 z_2 \\
&= V_1 S^{-1} U_1^T b + V_2 z_2.
\end{aligned}$$

The last equality follows from

$$c = U^T b = \begin{bmatrix} U_1^T b \\ U_2^T b \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$

Note that since $z_2$ is arbitrary, $V_2 z_2$ is an arbitrary vector in $\mathcal{R}(V_2) = \mathcal{N}(A)$. Thus, $x$ has been written in the form $x = A^+ b + (I - A^+ A)y$, where $y \in \mathbb{R}^n$ is arbitrary. This agrees, of course, with (8.1).
The minimum value of the least squares residual is

$$\min_x \|Ax - b\|_2 = \|c_2\|_2 = \|U_2^T b\|_2$$

and we clearly have that

minimum least squares residual is 0 $\iff$ $b$ is orthogonal to all vectors in $U_2$ $\iff$ $b$ is orthogonal to all vectors in $\mathcal{R}(A)^\perp$ $\iff$ $b \in \mathcal{R}(A)$.

Another expression for the minimum residual is $\|(I - AA^+)b\|_2$. This follows easily since

$$\|(I - AA^+)b\|_2^2 = \|U_2 U_2^T b\|_2^2 = b^T U_2 U_2^T U_2 U_2^T b = b^T U_2 U_2^T b = \|U_2^T b\|_2^2.$$
Finally, an important special case of the linear least squares problem is the so-called full-rank problem, i.e., $A \in \mathbb{R}_n^{m \times n}$. In this case the SVD of $A$ is given by $A = U \Sigma V^T = [U_1 \; U_2]\begin{bmatrix} S \\ 0 \end{bmatrix} V_1^T$, and there is thus "no $V_2$ part" to the solution.
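The SVD-based solution above translates directly into a few lines of NumPy. The following sketch (not from the text; the random test data are an illustrative assumption) forms the minimum-norm solution $x = V_1 S^{-1} U_1^T b$ and checks the residual characterization $\|(I - AA^+)b\|_2 = \|U_2^T b\|_2$.

```python
import numpy as np

# Solve min ||Ax - b||_2 via the SVD, following the z = V^T x, c = U^T b
# change of coordinates: z1 = S^{-1} c1 and z2 = 0 gives the minimum-norm
# solution x = V1 S^{-1} U1^T b.  (Illustrative data, not from the text.)
rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))  # rank 3 < n
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)                 # full SVD: A = U diag(s) V^T
k = int(np.sum(s > 1e-10 * s[0]))           # numerical rank
U1, U2 = U[:, :k], U[:, k:]
V1 = Vt[:k, :].T

x = V1 @ ((U1.T @ b) / s[:k])               # V1 S^{-1} U1^T b

residual = np.linalg.norm(A @ x - b)        # should equal ||U2^T b||_2
```

Because $z_2 = 0$ was chosen, `x` coincides with the pseudoinverse solution $A^+ b$; any other choice of $z_2$ adds a component from $\mathcal{N}(A)$ without changing the residual.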

8.5 Least Squares and QR Factorization

In this section, we again look at the solution of the linear least squares problem (8.6) but this time in terms of the QR factorization. This matrix factorization is much cheaper to compute than an SVD and, with appropriate numerical enhancements, can be quite reliable.

To simplify the exposition, we add the simplifying assumption that $A$ has full column rank, i.e., $A \in \mathbb{R}_n^{m \times n}$. It is then possible, via a sequence of so-called Householder or Givens transformations, to reduce $A$ in the following way. A finite sequence of simple orthogonal row transformations (of Householder or Givens type) can be performed on $A$ to reduce it to triangular form. If we label the product of such orthogonal row transformations as the orthogonal matrix $Q^T \in \mathbb{R}^{m \times m}$, we have

$$Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \tag{8.7}$$


where $R \in \mathbb{R}_n^{n \times n}$ is upper triangular. Now write $Q = [Q_1 \; Q_2]$, where $Q_1 \in \mathbb{R}^{m \times n}$ and $Q_2 \in \mathbb{R}^{m \times (m-n)}$. Both $Q_1$ and $Q_2$ have orthonormal columns. Multiplying through by $Q$ in (8.7), we see that

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} = [Q_1 \; Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix} \tag{8.8}$$

$$= Q_1 R. \tag{8.9}$$

Any of (8.7), (8.8), or (8.9) are variously referred to as QR factorizations of $A$. Note that (8.9) is essentially what is accomplished by the Gram–Schmidt process, i.e., by writing $AR^{-1} = Q_1$ we see that a "triangular" linear combination (given by the coefficients of $R^{-1}$) of the columns of $A$ yields the orthonormal columns of $Q_1$.
Now note that

$$\|Ax - b\|_2^2 = \|Q^T A x - Q^T b\|_2^2 \quad \text{since } \|\cdot\|_2 \text{ is unitarily invariant}$$

$$= \left\| \begin{bmatrix} R \\ 0 \end{bmatrix} x - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_2^2.$$

The last quantity above is clearly minimized by taking $x = R^{-1} c_1$ and the minimum residual is $\|c_2\|_2 = \|Q_2^T b\|_2$. Equivalently, we have $x = R^{-1} Q_1^T b = A^+ b$ and the minimum residual is $\|Q_2^T b\|_2$.
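In NumPy the same computation is the "economy" QR followed by a single triangular solve. A minimal sketch (the random test data are our assumption, not the text's):

```python
import numpy as np

# Full-column-rank least squares via QR: A = Q1 R, x = R^{-1} Q1^T b.
rng = np.random.default_rng(1)
m, n = 7, 3
A = rng.standard_normal((m, n))        # full column rank with probability 1
b = rng.standard_normal(m)

Q1, R = np.linalg.qr(A)                # reduced QR: Q1 is m-by-n, R is n-by-n
x = np.linalg.solve(R, Q1.T @ b)       # solve R x = c1 = Q1^T b

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference solution
```

`np.linalg.qr` defaults to the reduced factorization $A = Q_1 R$ of (8.9), so $Q_2$ is never formed, which is exactly why the QR route is cheaper than the SVD.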

EXERCISES
1. For $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and any $y \in \mathbb{R}^n$, check directly that $(I - A^+ A)y$ and $A^+ b$ are orthogonal vectors.

2. Consider the following set of measurements $(x_i, y_i)$:

(1, 2), (2, 1), (3, 3).

(a) Find the best (in the 2-norm sense) line of the form $y = \alpha x + \beta$ that fits this data.

(b) Find the best (in the 2-norm sense) line of the form $x = \alpha y + \beta$ that fits this data.
3. Suppose $q_1$ and $q_2$ are two orthonormal vectors and $b$ is a fixed vector, all in $\mathbb{R}^n$.

(a) Find the optimal linear combination $\alpha q_1 + \beta q_2$ that is closest to $b$ (in the 2-norm sense).

(b) Let $r$ denote the "error vector" $b - \alpha q_1 - \beta q_2$. Show that $r$ is orthogonal to both $q_1$ and $q_2$.


4. Find all solutions of the linear least squares problem

$$\min_x \|Ax - b\|_2$$

when $A = [\,\cdots\,]$.

5. Consider the problem of finding the minimum 2-norm solution of the linear least squares problem

$$\min_x \|Ax - b\|_2$$

when $A = [\,\cdots\,]$ and $b = [\,\cdots\,]$. The solution is $x^* = [\,\cdots\,]$.

(a) Consider a perturbation $E_1 = [\,\cdots\,]$ of $A$, where $\delta$ is a small positive number. Solve the perturbed version of the above problem,

$$\min_y \|A_1 y - b\|_2,$$

where $A_1 = A + E_1$. What happens to $\|x^* - y\|_2$ as $\delta$ approaches 0?

(b) Now consider the perturbation $E_2 = [\,\cdots\,]$ of $A$, where again $\delta$ is a small positive number. Solve the perturbed problem

$$\min_z \|A_2 z - b\|_2,$$

where $A_2 = A + E_2$. What happens to $\|x^* - z\|_2$ as $\delta$ approaches 0?
6. Use the four Penrose conditions and the fact that $Q_1$ has orthonormal columns to verify that if $A \in \mathbb{R}_n^{m \times n}$ can be factored in the form (8.9), then $A^+ = R^{-1} Q_1^T$.
7. Let $A \in \mathbb{R}^{n \times n}$, not necessarily nonsingular, and suppose $A = QR$, where $Q$ is orthogonal. Prove that $A^+ = R^+ Q^T$.


Chapter 9

Eigenvalues and Eigenvectors

9.1 Fundamental Definitions and Properties

Definition 9.1. A nonzero vector $x \in \mathbb{C}^n$ is a right eigenvector of $A \in \mathbb{C}^{n \times n}$ if there exists a scalar $\lambda \in \mathbb{C}$, called an eigenvalue, such that

$$Ax = \lambda x. \tag{9.1}$$

Similarly, a nonzero vector $y \in \mathbb{C}^n$ is a left eigenvector corresponding to an eigenvalue $\mu$ if

$$y^H A = \mu y^H. \tag{9.2}$$
By taking Hermitian transposes in (9.1), we see immediately that $x^H$ is a left eigenvector of $A^H$ associated with $\bar\lambda$. Note that if $x$ [$y$] is a right [left] eigenvector of $A$, then so is $\alpha x$ [$\alpha y$] for any nonzero scalar $\alpha \in \mathbb{C}$. One often-used scaling for an eigenvector is $\alpha = 1/\|x\|$ so that the scaled eigenvector has norm 1. The 2-norm is the most common norm used for such scaling.
Definition 9.2. The polynomial $\pi(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of $A$. (Note that the characteristic polynomial can also be defined as $\det(\lambda I - A)$. This results in at most a change of sign and, as a matter of convenience, we use both forms throughout the text.)

The following classical theorem can be very useful in hand calculation. It can be proved easily from the Jordan canonical form to be discussed in the text to follow (see, for example, [21]) or directly using elementary properties of inverses and determinants (see, for example, [3]).
Theorem 9.3 (Cayley–Hamilton). For any $A \in \mathbb{C}^{n \times n}$, $\pi(A) = 0$.
Example 9.4. Let $A = [\,\cdots\,]$ (a $2 \times 2$ matrix). Then $\pi(\lambda) = \lambda^2 + 2\lambda - 3$. It is an easy exercise to verify that $\pi(A) = A^2 + 2A - 3I = 0$.
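The Cayley–Hamilton check is easy to run numerically. Since the matrix in Example 9.4 is not legible in this copy, the matrix below is an illustrative choice of ours with the same characteristic polynomial $\pi(\lambda) = \lambda^2 + 2\lambda - 3$ (trace $-2$, determinant $-3$), not the text's example.

```python
import numpy as np

# An illustrative 2x2 matrix with pi(lambda) = lambda^2 + 2*lambda - 3
# (trace -2, det -3); by Cayley-Hamilton, A^2 + 2A - 3I must vanish.
A = np.array([[0.0, 3.0],
              [1.0, -2.0]])

char_coeffs = np.poly(A)                # coefficients of det(lambda*I - A)
pi_of_A = A @ A + 2 * A - 3 * np.eye(2)
```

Any other matrix with the same trace and determinant would satisfy the same quadratic, which is exactly what the theorem asserts.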

It can be proved from elementary properties of determinants that if $A \in \mathbb{C}^{n \times n}$, then $\pi(\lambda)$ is a polynomial of degree $n$. Thus, the Fundamental Theorem of Algebra says that


$\pi(\lambda)$ has $n$ roots, possibly repeated. These roots, as solutions of the determinant equation

$$\pi(\lambda) = \det(A - \lambda I) = 0, \tag{9.3}$$

are the eigenvalues of $A$ and imply the singularity of the matrix $A - \lambda I$, and hence further guarantee the existence of corresponding nonzero eigenvectors.

Definition 9.5. The spectrum of $A \in \mathbb{C}^{n \times n}$ is the set of all eigenvalues of $A$, i.e., the set of all roots of its characteristic polynomial $\pi(\lambda)$. The spectrum of $A$ is denoted $\Lambda(A)$.

Let the eigenvalues of $A \in \mathbb{C}^{n \times n}$ be denoted $\lambda_1, \ldots, \lambda_n$. Then if we write (9.3) in the form

$$\pi(\lambda) = \det(A - \lambda I) = (\lambda_1 - \lambda) \cdots (\lambda_n - \lambda) \tag{9.4}$$

and set $\lambda = 0$ in this identity, we get the interesting fact that $\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n$ (see also Theorem 9.25).

If $A \in \mathbb{R}^{n \times n}$, then $\pi(\lambda)$ has real coefficients. Hence the roots of $\pi(\lambda)$, i.e., the eigenvalues of $A$, must occur in complex conjugate pairs.

Example 9.6. Let $\alpha, \beta \in \mathbb{R}$ and let $A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}$. Then $\pi(\lambda) = \lambda^2 - 2\alpha\lambda + \alpha^2 + \beta^2$ and $A$ has eigenvalues $\alpha \pm \beta j$ (where $j = i = \sqrt{-1}$).
If $A \in \mathbb{R}^{n \times n}$, then there is an easily checked relationship between the left and right eigenvectors of $A$ and $A^T$ (take Hermitian transposes of both sides of (9.2)). Specifically, if $y$ is a left eigenvector of $A$ corresponding to $\lambda \in \Lambda(A)$, then $\bar y$ is a right eigenvector of $A^T$ corresponding to $\bar\lambda \in \Lambda(A)$. Note, too, that by elementary properties of the determinant, we always have $\Lambda(A) = \Lambda(A^T)$, but that $\Lambda(A) = \Lambda(A^H)$ only if $A \in \mathbb{R}^{n \times n}$.

Definition 9.7. If $\lambda$ is a root of multiplicity $m$ of $\pi(\lambda)$, we say that $\lambda$ is an eigenvalue of $A$ of algebraic multiplicity $m$. The geometric multiplicity of $\lambda$ is the number of associated independent eigenvectors $= n - \operatorname{rank}(A - \lambda I) = \dim \mathcal{N}(A - \lambda I)$.

If $\lambda \in \Lambda(A)$ has algebraic multiplicity $m$, then $1 \le \dim \mathcal{N}(A - \lambda I) \le m$. Thus, if we denote the geometric multiplicity of $\lambda$ by $g$, then we must have $1 \le g \le m$.
Definition 9.8. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be defective if it has an eigenvalue whose geometric multiplicity is not equal to (i.e., less than) its algebraic multiplicity. Equivalently, $A$ is said to be defective if it does not have $n$ linearly independent (right or left) eigenvectors.

From the Cayley–Hamilton Theorem, we know that $\pi(A) = 0$. However, it is possible for $A$ to satisfy a lower-order polynomial. For example, if $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, then $A$ satisfies $(\lambda - 1)^2 = 0$. But it also clearly satisfies the smaller degree polynomial equation $(\lambda - 1) = 0$.

Definition 9.9. The minimal polynomial of $A \in \mathbb{R}^{n \times n}$ is the polynomial $\alpha(\lambda)$ of least degree such that $\alpha(A) = 0$.

It can be shown that $\alpha(\lambda)$ is essentially unique (unique if we force the coefficient of the highest power of $\lambda$ to be $+1$, say; such a polynomial is said to be monic and we generally write $\alpha(\lambda)$ as a monic polynomial throughout the text). Moreover, it can also be


shown that $\alpha(\lambda)$ divides every nonzero polynomial $\beta(\lambda)$ for which $\beta(A) = 0$. In particular, $\alpha(\lambda)$ divides $\pi(\lambda)$.

There is an algorithm to determine $\alpha(\lambda)$ directly (without knowing eigenvalues and associated eigenvector structure). Unfortunately, this algorithm, called the Bezout algorithm, is numerically unstable.
Example 9.10. The above definitions are illustrated below for a series of matrices, each of which has an eigenvalue 2 of algebraic multiplicity 4, i.e., $\pi(\lambda) = (\lambda - 2)^4$. We denote the geometric multiplicity by $g$.

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^4 \text{ and } g = 1.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^3 \text{ and } g = 2.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^2 \text{ and } g = 3.$$

$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2) \text{ and } g = 4.$$

At this point, one might speculate that $g$ plus the degree of $\alpha$ must always be five. Unfortunately, such is not the case. The matrix

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$

has $\alpha(\lambda) = (\lambda - 2)^2$ and $g = 2$.
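The multiplicities in such examples can be checked mechanically: $g = n - \operatorname{rank}(A - 2I)$, while the degree of $\alpha(\lambda)$ equals the size of the largest Jordan block. A small sketch (the helper functions below are ours, not the text's):

```python
import numpy as np

# Build block-diagonal matrices out of Jordan blocks J_k(2) and compute the
# geometric multiplicity g = n - rank(A - 2I) of the eigenvalue 2.
def jordan_block(lam, k):
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

def block_diag(*blocks):
    n = sum(B.shape[0] for B in blocks)
    M = np.zeros((n, n))
    i = 0
    for B in blocks:
        k = B.shape[0]
        M[i:i + k, i:i + k] = B
        i += k
    return M

def geo_mult(A, lam):
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

A_one_block = jordan_block(2, 4)      # alpha = (lam-2)^4, expect g = 1
A_diagonal = 2 * np.eye(4)            # alpha = (lam-2),   expect g = 4
A_two_2x2 = block_diag(jordan_block(2, 2), jordan_block(2, 2))  # expect g = 2
```

The last matrix is the counterexample above: two $2 \times 2$ blocks give $g = 2$ and $\deg \alpha = 2$, so $g + \deg \alpha = 4$, not 5.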

Theorem 9.11. Let $A \in \mathbb{C}^{n \times n}$ and let $\lambda_i$ be an eigenvalue of $A$ with corresponding right eigenvector $x_i$. Furthermore, let $y_j$ be a left eigenvector corresponding to any $\lambda_j \in \Lambda(A)$ such that $\lambda_j \neq \lambda_i$. Then $y_j^H x_i = 0$.

Proof: Since $Ax_i = \lambda_i x_i$,

$$y_j^H A x_i = \lambda_i y_j^H x_i. \tag{9.5}$$


Similarly, since $y_j^H A = \lambda_j y_j^H$,

$$y_j^H A x_i = \lambda_j y_j^H x_i. \tag{9.6}$$

Subtracting (9.6) from (9.5), we find $0 = (\lambda_i - \lambda_j) y_j^H x_i$. Since $\lambda_i - \lambda_j \neq 0$, we must have $y_j^H x_i = 0$. $\square$

The proof of Theorem 9.11 is very similar to two other fundamental and important results.

Theorem 9.12. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian, i.e., $A = A^H$. Then all eigenvalues of $A$ must be real.

Proof: Suppose $(\lambda, x)$ is an arbitrary eigenvalue/eigenvector pair such that $Ax = \lambda x$. Then

$$x^H A x = \lambda x^H x. \tag{9.7}$$

Taking Hermitian transposes in (9.7) yields

$$x^H A^H x = \bar\lambda x^H x.$$

Using the fact that $A$ is Hermitian, we have that $\bar\lambda x^H x = \lambda x^H x$. However, since $x$ is an eigenvector, we have $x^H x \neq 0$, from which we conclude $\bar\lambda = \lambda$, i.e., $\lambda$ is real. $\square$

Theorem 9.13. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian and suppose $\lambda$ and $\mu$ are distinct eigenvalues of $A$ with corresponding right eigenvectors $x$ and $z$, respectively. Then $x$ and $z$ must be orthogonal.

Proof: Premultiply the equation $Ax = \lambda x$ by $z^H$ to get $z^H A x = \lambda z^H x$. Take the Hermitian transpose of this equation and use the facts that $A$ is Hermitian and $\lambda$ is real to get $x^H A z = \lambda x^H z$. Premultiply the equation $Az = \mu z$ by $x^H$ to get $x^H A z = \mu x^H z = \lambda x^H z$. Since $\lambda \neq \mu$, we must have that $x^H z = 0$, i.e., the two vectors must be orthogonal. $\square$
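Theorems 9.12 and 9.13 are easy to see numerically. The sketch below (random Hermitian test matrix, our assumption) checks that the eigenvalues come out real and the eigenvectors orthonormal.

```python
import numpy as np

# Build a Hermitian matrix A = B + B^H; numpy's eigh exploits the structure
# and returns real eigenvalues w and a unitary eigenvector matrix V.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                     # A = A^H by construction

w, V = np.linalg.eigh(A)
gram = V.conj().T @ V                  # should be the 4x4 identity
```

Note that `eigh` returns a *real* eigenvalue array by design, consistent with Theorem 9.12, and orthonormal eigenvectors even when eigenvalues coincide.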

Let us now return to the general case.

Theorem 9.14. Let $A \in \mathbb{C}^{n \times n}$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with corresponding right eigenvectors $x_1, \ldots, x_n$. Then $\{x_1, \ldots, x_n\}$ is a linearly independent set. The same result holds for the corresponding left eigenvectors.

Proof: For the proof see, for example, [21, p. 118]. $\square$

If $A \in \mathbb{C}^{n \times n}$ has distinct eigenvalues, and if $\lambda_i \in \Lambda(A)$, then by Theorem 9.11, $x_i$ is orthogonal to all $y_j$'s for which $j \neq i$. However, it cannot be the case that $y_i^H x_i = 0$ as well, or else $x_i$ would be orthogonal to $n$ linearly independent vectors (by Theorem 9.14) and would thus have to be 0, contradicting the fact that it is an eigenvector. Since $y_i^H x_i \neq 0$ for each $i$, we can choose the normalization of the $x_i$'s, or the $y_i$'s, or both, so that $y_i^H x_i = 1$ for $i = 1, \ldots, n$.

Theorem 9.15. Let $A \in \mathbb{C}^{n \times n}$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and let the corresponding right eigenvectors form a matrix $X = [x_1, \ldots, x_n]$. Similarly, let $Y = [y_1, \ldots, y_n]$ be the matrix of corresponding left eigenvectors. Furthermore, suppose that the left and right eigenvectors have been normalized so that $y_i^H x_i = 1$, $i = 1, \ldots, n$. Finally, let $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^{n \times n}$. Then $Ax_i = \lambda_i x_i$, $i = 1, \ldots, n$, can be written in matrix form as

$$AX = X\Lambda \tag{9.8}$$

while $y_i^H x_j = \delta_{ij}$, $i, j = 1, \ldots, n$, is expressed by the equation

$$Y^H X = I. \tag{9.9}$$

These matrix equations can be combined to yield the following matrix factorizations:

$$X^{-1} A X = \Lambda = Y^H A X \tag{9.10}$$

and

$$A = X \Lambda X^{-1} = X \Lambda Y^H = \sum_{i=1}^{n} \lambda_i x_i y_i^H. \tag{9.11}$$

Example 9.16. Let

$$A = \begin{bmatrix} 2 & -3 & 1 \\ 5 & -2 & 5 \\ -3 & 1 & -4 \end{bmatrix}.$$

Then $\pi(\lambda) = \det(A - \lambda I) = -(\lambda^3 + 4\lambda^2 + 9\lambda + 10) = -(\lambda + 2)(\lambda^2 + 2\lambda + 5)$, from which we find $\Lambda(A) = \{-2, -1 \pm 2j\}$. We can now find the right and left eigenvectors corresponding to these eigenvalues.

For $\lambda_1 = -2$, solve the $3 \times 3$ linear system $(A - (-2)I)x_1 = 0$ to get

$$x_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}.$$

Note that one component of $x_1$ can be set arbitrarily, and this then determines the other two (since $\dim \mathcal{N}(A - (-2)I) = 1$). To get the corresponding left eigenvector $y_1$, solve the linear system $y_1^H (A + 2I) = 0$ to get

$$y_1 = \begin{bmatrix} -1 \\ -1 \\ -3 \end{bmatrix}.$$

This time we have chosen the arbitrary scale factor for $y_1$ so that $y_1^H x_1 = 1$.

For $\lambda_2 = -1 + 2j$, solve the linear system $(A - (-1 + 2j)I)x_2 = 0$ to get

$$x_2 = \begin{bmatrix} 3 + j \\ 3 - j \\ -2 \end{bmatrix}.$$

Solve the linear system $y_2^H (A - (-1 + 2j)I) = 0$ and normalize $y_2$ so that $y_2^H x_2 = 1$ to get

$$y_2 = \frac{1}{4} \begin{bmatrix} 1 + j \\ 1 - j \\ 2 \end{bmatrix}.$$

For $\lambda_3 = -1 - 2j$, we could proceed to solve linear systems as for $\lambda_2$. However, we can also note that $x_3 = \bar{x}_2$ and $y_3 = \bar{y}_2$. To see this, use the fact that $\lambda_3 = \bar\lambda_2$ and simply conjugate the equation $Ax_2 = \lambda_2 x_2$ to get $A\bar{x}_2 = \bar\lambda_2 \bar{x}_2$. A similar argument yields the result for left eigenvectors.

Now define the matrix $X$ of right eigenvectors:

$$X = \begin{bmatrix} 1 & 3+j & 3-j \\ 1 & 3-j & 3+j \\ -1 & -2 & -2 \end{bmatrix}.$$

It is then easy to verify that

$$X^{-1} = \begin{bmatrix} -1 & -1 & -3 \\ \frac{1-j}{4} & \frac{1+j}{4} & \frac{1}{2} \\ \frac{1+j}{4} & \frac{1-j}{4} & \frac{1}{2} \end{bmatrix} = Y^H.$$

Other results in Theorem 9.15 can also be verified. For example,

$$X^{-1} A X = \Lambda = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1+2j & 0 \\ 0 & 0 & -1-2j \end{bmatrix}.$$

Finally, note that we could have solved directly only for $x_1$ and $x_2$ (and $x_3 = \bar{x}_2$). Then, instead of determining the $y_i$'s directly, we could have found them instead by computing $X^{-1}$ and reading off its rows.
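The factorizations (9.10)–(9.11) are easy to exercise in NumPy for any matrix with distinct eigenvalues. In this sketch (random test matrix, our assumption) the rows of $X^{-1}$ serve as the normalized left eigenvectors $y_i^H$, exactly as noted above.

```python
import numpy as np

# For a matrix with distinct eigenvalues: X^{-1} A X = Lambda and the
# dyadic expansion A = sum_i lambda_i x_i y_i^H, where Y^H = X^{-1}.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))        # distinct eigenvalues with probability 1

lam, X = np.linalg.eig(A)              # columns of X are right eigenvectors
YH = np.linalg.inv(X)                  # row i of X^{-1} is y_i^H

Lam = YH @ A @ X                       # should be diag(lam)
A_dyadic = sum(lam[i] * np.outer(X[:, i], YH[i, :]) for i in range(4))
```

`np.linalg.eig` normalizes each column of `X` to unit 2-norm; taking `YH = inv(X)` then automatically supplies left eigenvectors scaled so that $y_i^H x_i = 1$.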
Example 9.17. Let $A$ be a $3 \times 3$ matrix with $\pi(\lambda) = \det(A - \lambda I) = -(\lambda^3 + 8\lambda^2 + 19\lambda + 12) = -(\lambda + 1)(\lambda + 3)(\lambda + 4)$, from which we find $\Lambda(A) = \{-1, -3, -4\}$. Proceeding as in the previous example, it is straightforward to compute the matrix $X$ of right eigenvectors and its inverse $X^{-1} = Y^H$.

We also have $X^{-1} A X = \Lambda = \operatorname{diag}(-1, -3, -4)$, which is equivalent to the dyadic expansion

$$A = \sum_{i=1}^{3} \lambda_i x_i y_i^H = (-1)\, x_1 y_1^H + (-3)\, x_2 y_2^H + (-4)\, x_3 y_3^H.$$

Theorem 9.18. Eigenvalues (but not eigenvectors) are invariant under a similarity transformation $T$.

Proof: Suppose $(\lambda, x)$ is an eigenvalue/eigenvector pair such that $Ax = \lambda x$. Then, since $T$ is nonsingular, we have the equivalent statement $(T^{-1}AT)(T^{-1}x) = \lambda (T^{-1}x)$, from which the theorem statement follows. For left eigenvectors we have a similar statement, namely $y^H A = \lambda y^H$ if and only if $(T^H y)^H (T^{-1} A T) = \lambda (T^H y)^H$. $\square$
Remark 9.19. If $f$ is an analytic function (e.g., $f(x)$ is a polynomial, or $e^x$, or $\sin x$, or, in general, representable by a power series $\sum_{n=0}^{\infty} a_n x^n$), then it is easy to show that the eigenvalues of $f(A)$ (defined as $\sum_{n=0}^{\infty} a_n A^n$) are $f(\lambda)$, but $f(A)$ does not necessarily have all the same eigenvectors (unless, say, $A$ is diagonalizable). For example, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has only one right eigenvector corresponding to the eigenvalue 0, but $A^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ has two independent right eigenvectors associated with the eigenvalue 0. What is true is that the eigenvalue/eigenvector pair $(\lambda, x)$ maps to $(f(\lambda), x)$ but not conversely.

The following theorem is useful when solving systems of linear differential equations. Details of how the matrix exponential $e^{tA}$ is used to solve the system $\dot{x} = Ax$ are the subject of Chapter 11.
Theorem 9.20. Let $A \in \mathbb{R}^{n \times n}$ and suppose $X^{-1} A X = \Lambda$, where $\Lambda$ is diagonal. Then

$$e^{tA} = X e^{t\Lambda} X^{-1} = \sum_{i=1}^{n} e^{\lambda_i t} x_i y_i^H.$$


Proof: Starting from the definition, we have

$$e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = \sum_{k=0}^{\infty} \frac{t^k (X \Lambda X^{-1})^k}{k!} = X \left( \sum_{k=0}^{\infty} \frac{t^k \Lambda^k}{k!} \right) X^{-1} = X e^{t\Lambda} X^{-1} = \sum_{i=1}^{n} e^{\lambda_i t} x_i y_i^H. \quad \square$$

The following corollary is immediate from the theorem upon setting $t = 1$.

Corollary 9.21. If $A \in \mathbb{R}^{n \times n}$ is diagonalizable with eigenvalues $\lambda_i$, $i = 1, \ldots, n$, and right eigenvectors $x_i$, $i = 1, \ldots, n$, then $e^A$ has eigenvalues $e^{\lambda_i}$, $i = 1, \ldots, n$, and the same eigenvectors.

There are extensions to Theorem 9.20 and Corollary 9.21 for any function that is analytic on the spectrum of $A$, i.e., $f(A) = X f(\Lambda) X^{-1} = X \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n)) X^{-1}$.
It is desirable, of course, to have a version of Theorem 9.20 and its corollary in which $A$ is not necessarily diagonalizable. It is necessary first to consider the notion of Jordan canonical form, from which such a result is then available and presented later in this chapter.
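Theorem 9.20 can be exercised numerically by comparing $X e^{t\Lambda} X^{-1}$ against a truncated power series for $e^{tA}$. The sketch below uses a random diagonalizable test matrix (an illustrative assumption, not the text's data).

```python
import numpy as np

# e^{tA} = X diag(e^{lambda_i t}) X^{-1} for diagonalizable A, compared
# against partial sums of the defining series sum_k t^k A^k / k!.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
t = 0.7

lam, X = np.linalg.eig(A)
E_eig = (X * np.exp(lam * t)) @ np.linalg.inv(X)   # X e^{t Lambda} X^{-1}

E_series = np.zeros((3, 3))
term = np.eye(3)
for k in range(30):                    # 30 terms of the exponential series
    E_series = E_series + term
    term = term @ (t * A) / (k + 1)
```

Even when `lam` is complex (conjugate pairs), the product $X e^{t\Lambda} X^{-1}$ is real up to roundoff, since $A$ itself is real.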

9.2 Jordan Canonical Form

Theorem 9.22.

1. Jordan Canonical Form (JCF): For all $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$ (not necessarily distinct), there exists $X \in \mathbb{C}_n^{n \times n}$ such that

$$X^{-1} A X = J = \operatorname{diag}(J_1, \ldots, J_q), \tag{9.12}$$

where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form

$$J_i = \begin{bmatrix} \lambda_i & 1 & & & \\ & \lambda_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_i & 1 \\ & & & & \lambda_i \end{bmatrix} \in \mathbb{C}^{k_i \times k_i} \tag{9.13}$$


and $\sum_{i=1}^{q} k_i = n$.

2. Real Jordan Canonical Form: For all $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$ (not necessarily distinct), there exists $X \in \mathbb{R}_n^{n \times n}$ such that

$$X^{-1} A X = J = \operatorname{diag}(J_1, \ldots, J_q), \tag{9.14}$$

where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form

$$J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \lambda_i & 1 \\ & & & \lambda_i \end{bmatrix}$$

in the case of real eigenvalues $\lambda_i \in \Lambda(A)$, and

$$J_i = \begin{bmatrix} M_i & I_2 & & \\ & \ddots & \ddots & \\ & & M_i & I_2 \\ & & & M_i \end{bmatrix},$$

where $M_i = \begin{bmatrix} \alpha_i & \beta_i \\ -\beta_i & \alpha_i \end{bmatrix}$ and $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, in the case of complex conjugate eigenvalues $\alpha_i \pm \beta_i j \in \Lambda(A)$.
Proof:
Proof: For the proof
proof see, for example, [21, pp. 120-124].

D
0

Transformations
T == [I"__,~ -"{"]
allowus
usto
togo
goback
back and
andforth
forthbetween
between aareal
realJCF
JCF
Transformations like
like T
{ ] allow
and its complex counterpart:
T-I [ (X

+ jfJ
o

O. ] T
(X - JfJ

=[

(X
-fJ

fJ ]
(X

= M.

For nontrivial Jordan blocks, the situation is only a bit more complicated. With

       T = [ 1  −j   0   0 ]
           [ 0   0   1  −j ]
           [ 1   j   0   0 ]
           [ 0   0   1   j ]

it is easily checked that

       T^{-1} [ α + jβ    1       0       0    ] T  =  [ M  I_2 ]
              [    0   α + jβ     0       0    ]       [ 0   M  ]
              [    0      0    α − jβ     1    ]
              [    0      0       0    α − jβ  ]

Definition 9.23. The characteristic polynomials of the Jordan blocks defined in Theorem 9.22 are called the elementary divisors or invariant factors of A.

Theorem 9.24. The characteristic polynomial of a matrix is the product of its elementary divisors. The minimal polynomial of a matrix is the product of the elementary divisors of highest degree corresponding to distinct eigenvalues.

Theorem 9.25. Let A ∈ C^{n×n} with eigenvalues λ_1, ..., λ_n. Then

1. det(A) = Π_{i=1}^n λ_i.

2. Tr(A) = Σ_{i=1}^n λ_i.

Proof:

1. From Theorem 9.22 we have that A = XJX^{-1}. Thus,

       det(A) = det(XJX^{-1}) = det(J) = Π_{i=1}^n λ_i.

2. Again, from Theorem 9.22 we have that A = XJX^{-1}. Thus,

       Tr(A) = Tr(XJX^{-1}) = Tr(JX^{-1}X) = Tr(J) = Σ_{i=1}^n λ_i.  D
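The two identities of Theorem 9.25 are easy to check numerically. The following sketch (not from the text; the test matrix is arbitrary) compares det(A) and Tr(A) with the product and sum of the eigenvalues computed by NumPy:

```python
import numpy as np

# Arbitrary (non-symmetric) test matrix; its eigenvalues may be complex,
# but their product and sum are real for a real matrix.
A = np.array([[3.0, 2.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, -1.0, 2.0]])

evals = np.linalg.eigvals(A)

# Theorem 9.25: det(A) = product of eigenvalues, Tr(A) = sum of eigenvalues.
det_from_evals = np.prod(evals)
trace_from_evals = np.sum(evals)

print(np.allclose(np.linalg.det(A), det_from_evals))   # True
print(np.allclose(np.trace(A), trace_from_evals))      # True
```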

Example 9.26. Suppose A ∈ R^{7×7} is known to have π(λ) = (λ − 1)^4 (λ − 2)^3 and α(λ) = (λ − 1)^2 (λ − 2)^2. Then A has two possible JCFs (not counting reorderings of the diagonal blocks):

                [ 1 1 0 0 0 0 0 ]                    [ 1 1 0 0 0 0 0 ]
                [ 0 1 0 0 0 0 0 ]                    [ 0 1 0 0 0 0 0 ]
                [ 0 0 1 1 0 0 0 ]                    [ 0 0 1 0 0 0 0 ]
       J^(1) =  [ 0 0 0 1 0 0 0 ]    and    J^(2) =  [ 0 0 0 1 0 0 0 ]
                [ 0 0 0 0 2 1 0 ]                    [ 0 0 0 0 2 1 0 ]
                [ 0 0 0 0 0 2 0 ]                    [ 0 0 0 0 0 2 0 ]
                [ 0 0 0 0 0 0 2 ]                    [ 0 0 0 0 0 0 2 ]

Note that J^(1) has elementary divisors (λ − 1)^2, (λ − 1)^2, (λ − 2)^2, and (λ − 2), while J^(2) has elementary divisors (λ − 1)^2, (λ − 1), (λ − 1), (λ − 2)^2, and (λ − 2).

Example 9.27. Knowing π(λ), α(λ), and rank(A − λ_i I) for distinct λ_i is not sufficient to determine the JCF of A uniquely. The matrices

              [ a 1 0 0 0 0 0 ]                   [ a 1 0 0 0 0 0 ]
              [ 0 a 1 0 0 0 0 ]                   [ 0 a 1 0 0 0 0 ]
              [ 0 0 a 0 0 0 0 ]                   [ 0 0 a 0 0 0 0 ]
       A_1 =  [ 0 0 0 a 1 0 0 ]    and    A_2 =   [ 0 0 0 a 1 0 0 ]
              [ 0 0 0 0 a 0 0 ]                   [ 0 0 0 0 a 1 0 ]
              [ 0 0 0 0 0 a 1 ]                   [ 0 0 0 0 0 a 0 ]
              [ 0 0 0 0 0 0 a ]                   [ 0 0 0 0 0 0 a ]

both have π(λ) = (λ − a)^7, α(λ) = (λ − a)^3, and rank(A − aI) = 4, i.e., three eigenvectors.

9.3  Determination of the JCF

The first critical item of information in determining the JCF of a matrix A ∈ R^{n×n} is its number of eigenvectors. For each distinct eigenvalue λ_i, the associated number of linearly independent right (or left) eigenvectors is given by dim N(A − λ_i I) = n − rank(A − λ_i I). The straightforward case is, of course, when λ_i is simple, i.e., of algebraic multiplicity 1; it then has precisely one eigenvector. The more interesting (and difficult) case occurs when λ_i is of algebraic multiplicity greater than one. For example, suppose

       A = [ 3  2  1 ]
           [ 0  3  0 ]
           [ 0  0  3 ]

Then

       A − 3I = [ 0  2  1 ]
                [ 0  0  0 ]
                [ 0  0  0 ]

has rank 1, so the eigenvalue 3 has two eigenvectors associated with it. If we let [ξ_1  ξ_2  ξ_3]^T denote a solution to the linear system (A − 3I)ξ = 0, we find that 2ξ_2 + ξ_3 = 0. Thus, both

       [ 1 ]          [  0 ]
       [ 0 ]   and    [  1 ]
       [ 0 ]          [ −2 ]

are eigenvectors (and are independent). To get a third vector x_3 such that X = [x_1  x_2  x_3] reduces A to JCF, we need the notion of principal vector.
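The eigenvector count in this example follows the formula dim N(A − λI) = n − rank(A − λI), which is easy to confirm numerically (a sketch, not from the text):

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])

n = A.shape[0]
lam = 3.0
rank = np.linalg.matrix_rank(A - lam * np.eye(n))
num_eigenvectors = n - rank  # dim N(A - 3I)
print(rank, num_eigenvectors)  # 1 2

# Any solution of (A - 3I)x = 0 satisfies 2*x[1] + x[2] = 0; for example:
x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, -2.0])
print(np.allclose((A - lam*np.eye(n)) @ x1, 0),
      np.allclose((A - lam*np.eye(n)) @ x2, 0))  # True True
```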

Definition 9.28. Let A ∈ C^{n×n} (or R^{n×n}). Then x is a right principal vector of degree k associated with λ ∈ Λ(A) if and only if (A − λI)^k x = 0 and (A − λI)^{k−1} x ≠ 0.

Remark 9.29.

1. An analogous definition holds for a left principal vector of degree k.

2. The phrase "of grade k" is often used synonymously with "of degree k."

3. Principal vectors are sometimes also called generalized eigenvectors, but the latter term will be assigned a much different meaning in Chapter 12.

4. The case k = 1 corresponds to the "usual" eigenvector.

5. A right (or left) principal vector of degree k is associated with a Jordan block J_i of dimension k or larger.

9.3.1  Theoretical computation

To motivate the development of a procedure for determining principal vectors, consider a 2 × 2 Jordan block [ λ  1 ; 0  λ ]. Denote by x^(1) and x^(2) the two columns of a matrix X ∈ R^{2×2} that reduces a matrix A to this JCF. Then the equation AX = XJ can be written

       A [x^(1)  x^(2)] = [x^(1)  x^(2)] [ λ  1 ]
                                         [ 0  λ ]

The first column yields the equation Ax^(1) = λx^(1), which simply says that x^(1) is a right eigenvector. The second column yields the following equation for x^(2), the principal vector of degree 2:

       (A − λI)x^(2) = x^(1).                                           (9.17)

If we premultiply (9.17) by (A − λI), we find (A − λI)^2 x^(2) = (A − λI)x^(1) = 0. Thus, the definition of principal vector is satisfied.
This suggests a "general" procedure. First, determine all eigenvalues of A ∈ R^{n×n} (or C^{n×n}). Then for each distinct λ ∈ Λ(A) perform the following:

1. Solve

       (A − λI)x^(1) = 0.

   This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with λ. The number of eigenvectors depends on the rank of A − λI. For example, if rank(A − λI) = n − 1, there is only one eigenvector. If the algebraic multiplicity of λ is greater than its geometric multiplicity, principal vectors still need to be computed from succeeding steps.

2. For each independent x^(1), solve

       (A − λI)x^(2) = x^(1).

   The number of linearly independent solutions at this step depends on the rank of (A − λI)^2. If, for example, this rank is n − 2, there are two linearly independent solutions to the homogeneous equation (A − λI)^2 x^(2) = 0. One of these solutions is, of course, x^(1) (≠ 0), since (A − λI)^2 x^(1) = (A − λI)0 = 0. The other solution is the desired principal vector of degree 2. (It may be necessary to take a linear combination of x^(1) vectors to get a right-hand side that is in R(A − λI). See, for example, Exercise 7.)

3. For each independent x^(2) from step 2, solve

       (A − λI)x^(3) = x^(2).

4. Continue in this way until the total number of independent eigenvectors and principal vectors is equal to the algebraic multiplicity of λ.

Unfortunately, this natural-looking procedure can fail to find all Jordan vectors. For more extensive treatments, see, for example, [20] and [21]. Determination of eigenvectors and principal vectors is obviously very tedious for anything beyond simple problems (n = 2 or 3, say). Attempts to do such calculations in finite-precision floating-point arithmetic generally prove unreliable. There are significant numerical difficulties inherent in attempting to compute a JCF, and the interested student is strongly urged to consult the classical and very readable [8] to learn why. Notice that high-quality mathematical software such as MATLAB does not offer a jcf command, although a jordan command is available in MATLAB's Symbolic Toolbox.
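Steps 1 and 2 of the procedure can be sketched in floating point for the small example that opens this section. This is an illustration only (not the text's method, and subject to exactly the numerical caveats just described): a least-squares solve is used for (A − λI)x^(2) = x^(1), and the residual is checked to decide whether the system is actually consistent.

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])
lam = 3.0
B = A - lam * np.eye(3)

# Step 1: an eigenvector for lambda = 3 (solutions satisfy 2*x2 + x3 = 0).
x1 = np.array([1.0, 0.0, 0.0])

# Step 2: try to solve (A - lam*I) x = x1 for a principal vector of degree 2.
# lstsq always returns something; the residual check tells us whether the
# system is consistent, i.e., whether x1 lies in R(A - lam*I).
x, *_ = np.linalg.lstsq(B, x1, rcond=None)
consistent = np.allclose(B @ x, x1)
print(consistent)                    # True: x is a genuine degree-2 principal vector
print(np.allclose(B @ (B @ x), 0))   # True: (A - lam*I)^2 x = 0
```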
kxk
Theorem 9.30.
9.30. Suppose
Suppose A
Ckxk
has an
an eigenvalue
eigenvalue A
A,ofofalgebraic
algebraicmultiplicity
multiplicitykkand
and
Theorem
A Ee C
has
suppose further
further that
X =
of
suppose
that rank(A
rank(A - AI)
AI) =
= kk - 1.
1. Let
Let X
= [[x(l),
x ( l ) , ...
. . . ,, X(k)],
x(k)], where
where the
the chain
chain of
vectors
Then
vectors x(i)
x(i) is
is constructed
constructed as
as above.
above. Then

Theorem
Theorem 9.31.
9.31. {x(l),
(x (1) , ...
. . . ,, X(k)}
x (k) } is
is aa linearly
linearly independent
independent set.
set.
Theorem
Principal vectors
Jordan blocks
indeTheorem 9.32.
9.32. Principal
vectors associated
associated with
with different
different Jordan
blocks are
are linearly
linearly independent.
pendent.
Example 9.33. Let

       A = [ 1  1  2 ]
           [ 0  1  3 ]
           [ 0  0  2 ]

The eigenvalues of A are λ_1 = 1, λ_2 = 1, and λ_3 = 2. First, find the eigenvectors associated with the distinct eigenvalues 1 and 2.

(A − 2I)x_3^(1) = 0 yields

       x_3^(1) = [ 5 ]
                 [ 3 ]
                 [ 1 ]

(A − 1I)x_1^(1) = 0 yields

       x_1^(1) = [ 1 ]
                 [ 0 ]
                 [ 0 ]

To find a principal vector of degree 2 associated with the multiple eigenvalue 1, solve (A − 1I)x_1^(2) = x_1^(1) to get

       x_1^(2) = [ 0 ]
                 [ 1 ]
                 [ 0 ]

Now let

       X = [x_1^(1)  x_1^(2)  x_3^(1)] = [ 1  0  5 ]
                                         [ 0  1  3 ]
                                         [ 0  0  1 ]

Then it is easy to check that

       X^{-1} = [ 1  0  −5 ]                         [ 1  1  0 ]
                [ 0  1  −3 ]    and    X^{-1}AX  =   [ 0  1  0 ]
                [ 0  0   1 ]                         [ 0  0  2 ]
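The computation in Example 9.33 (as reconstructed here) can be verified directly with NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 2.0]])

# Columns: eigenvector for 1, principal vector of degree 2, eigenvector for 2.
X = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])

J = np.linalg.solve(X, A @ X)   # X^{-1} A X without forming an explicit inverse
J_expected = np.array([[1.0, 1.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 2.0]])
print(np.allclose(J, J_expected))  # True
```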

9.3.2  On the +1's in JCF blocks

In this subsection we show that the nonzero superdiagonal elements of a JCF need not be 1's but can be arbitrary, so long as they are nonzero. For the sake of definiteness, we consider below the case of a single Jordan block, but the result clearly holds for any JCF. Suppose A ∈ R^{n×n} and

       X^{-1}AX = J = [ λ  1            ]
                      [    λ  1         ]
                      [       .    .    ]
                      [          .   1  ]
                      [ 0            λ  ]

Let D = diag(d_1, ..., d_n) be a nonsingular "scaling" matrix. Then

       D^{-1}(X^{-1}AX)D = D^{-1}JD = Ĵ = [ λ  d_2/d_1                         ]
                                          [     λ     d_3/d_2                  ]
                                          [             .        .             ]
                                          [                λ    d_n/d_{n−1}    ]
                                          [ 0                       λ          ]

Appropriate choice of the d_i's then yields any desired nonzero superdiagonal elements.
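The scaling just described is a one-line computation to check (a sketch, not from the text; λ = 4 and d = (1, 2, 6) are arbitrary choices):

```python
import numpy as np

lam = 4.0
J = lam * np.eye(3) + np.diag([1.0, 1.0], 1)   # single 3x3 Jordan block

# Diagonal scaling: the superdiagonal of D^{-1} J D becomes d2/d1, d3/d2.
d = np.array([1.0, 2.0, 6.0])
D = np.diag(d)
Jhat = np.linalg.solve(D, J @ D)               # D^{-1} J D
print(np.round(Jhat, 12))
print(np.allclose(np.diag(Jhat, 1), [d[1]/d[0], d[2]/d[1]]))  # True: [2., 3.]
```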
This result can also be interpreted in terms of the matrix X = [x_1, ..., x_n] of eigenvectors and principal vectors that reduces A to its JCF. Specifically, Ĵ is obtained from A via the similarity transformation XD = [d_1 x_1, ..., d_n x_n].

In a similar fashion, the reverse-order identity matrix (or exchange matrix)

                          [           1 ]
       P = P^T = P^{-1} = [       1     ]                               (9.18)
                          [    .        ]
                          [ 1           ]

can be used to put the superdiagonal elements in the subdiagonal instead if that is desired:

       P^{-1} [ λ  1           ] P  =  [ λ           0 ]
              [    λ  1        ]       [ 1  λ          ]
              [       .   .    ]       [    .    .     ]
              [          λ  1  ]       [      .    .   ]
              [ 0           λ  ]       [ 0     1    λ  ]

9.4  Geometric Aspects of the JCF

The matrix X that reduces a matrix A ∈ R^{n×n} (or C^{n×n}) to a JCF provides a change of basis with respect to which the matrix is diagonal or block diagonal. It is thus natural to expect an associated direct sum decomposition of R^n. Such a decomposition is given in the following theorem.

Theorem 9.34. Suppose A ∈ R^{n×n} has characteristic polynomial

       π(λ) = (λ − λ_1)^{n_1} ··· (λ − λ_m)^{n_m}

and minimal polynomial

       α(λ) = (λ − λ_1)^{ν_1} ··· (λ − λ_m)^{ν_m}

with λ_1, ..., λ_m distinct. Then

       R^n = N(A − λ_1 I)^{n_1} ⊕ ··· ⊕ N(A − λ_m I)^{n_m}
           = N(A − λ_1 I)^{ν_1} ⊕ ··· ⊕ N(A − λ_m I)^{ν_m}.

Note that dim N(A − λ_i I)^{ν_i} = n_i.
Definition 9.35. Let V be a vector space over F and suppose A : V → V is a linear transformation. A subspace S ⊆ V is A-invariant if AS ⊆ S, where AS is defined as the set {As : s ∈ S}.

If V is taken to be R^n over R, and S ∈ R^{n×k} is a matrix whose columns s_1, ..., s_k span a k-dimensional subspace S, i.e., R(S) = S, then S is A-invariant if and only if there exists M ∈ R^{k×k} such that

       AS = SM.                                                         (9.19)

This follows easily by comparing the ith columns of each side of (9.19):

       As_i = Sm_i,

where m_i denotes the ith column of M.
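Condition (9.19) is simple to test numerically: solve AS = SM for M by least squares and check the residual. The following sketch (not from the text; the matrices are arbitrary) shows one subspace that passes the test and one that fails.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

# The span of e1, e2 is A-invariant for this block triangular A.
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Solve AS = SM in the least-squares sense, then test the residual.
M, *_ = np.linalg.lstsq(S, A @ S, rcond=None)
print(np.allclose(A @ S, S @ M))   # True: R(S) is A-invariant
print(M)                           # A restricted to R(S), in the basis S

# A subspace that is not A-invariant fails the residual test.
S2 = np.array([[0.0], [1.0], [1.0]])
M2, *_ = np.linalg.lstsq(S2, A @ S2, rcond=None)
print(np.allclose(A @ S2, S2 @ M2))  # False
```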

Example 9.36. The equation Ax = λx = xλ defining a right eigenvector x of an eigenvalue λ says that x spans an A-invariant subspace (of dimension one).

Example 9.37. Suppose X block diagonalizes A, i.e.,

       X^{-1}AX = [ J_1   0  ]
                  [  0   J_2 ]

Rewriting in the form

       A [X_1  X_2] = [X_1  X_2] [ J_1   0  ]
                                 [  0   J_2 ]

we have that AX_i = X_i J_i, i = 1, 2, so the columns of X_i span an A-invariant subspace.
AX
Theorem 9.38. Suppose A ∈ R^{n×n}.

1. Let p(A) = α_0 I + α_1 A + ··· + α_q A^q be a polynomial in A. Then N(p(A)) and R(p(A)) are A-invariant.

2. S is A-invariant if and only if S^⊥ is A^T-invariant.

Theorem 9.39. If V is a vector space over F such that V = N_1 ⊕ ··· ⊕ N_m, where each N_i is A-invariant, then a basis for V can be chosen with respect to which A has a block diagonal representation.

The Jordan canonical form is a special case of the above theorem. If A has distinct eigenvalues λ_i as in Theorem 9.34, we could choose bases for N(A − λ_i I)^{n_i} by SVD, for example (note that the power n_i could be replaced by ν_i). We would then get a block diagonal representation for A with full blocks rather than the highly structured Jordan blocks. Other such "canonical" forms are discussed in text that follows.

Suppose X = [X_1, ..., X_m] ∈ R^{n×n} is nonsingular and such that X^{-1}AX = diag(J_1, ..., J_m), where each J_i = diag(J_{i1}, ..., J_{ik_i}) and each J_{ik} is a Jordan block corresponding to λ_i ∈ Λ(A). We could also use other block diagonal decompositions (e.g., via SVD), but we restrict our attention here to only the Jordan block case. Note that AX_i = X_i J_i, so by (9.19) the columns of X_i (i.e., the eigenvectors and principal vectors associated with λ_i) span an A-invariant subspace of R^n.
Finally, we return to the problem of developing a formula for e^{tA} in the case that A is not necessarily diagonalizable. Let Y_i ∈ C^{n×n_i} be a Jordan basis for N((A^T − λ_i I)^{n_i}). Equivalently, partition

       X^{-H} = Y = [Y_1, ..., Y_m]

compatibly. Then

       A = XJX^{-1} = XJY^H
         = [X_1, ..., X_m] diag(J_1, ..., J_m) [Y_1, ..., Y_m]^H
         = Σ_{i=1}^m X_i J_i Y_i^H.

In a similar fashion we can compute

       e^{tA} = Σ_{i=1}^m X_i e^{tJ_i} Y_i^H,

which is a useful formula when used in conjunction with the result

       exp(tJ_i) = [ e^{λt}  t e^{λt}  (t^2/2!) e^{λt}  ···  (t^{k−1}/(k−1)!) e^{λt} ]
                   [         e^{λt}    t e^{λt}         ···                          ]
                   [                     .                .                          ]
                   [                          e^{λt}         t e^{λt}               ]
                   [   0                                     e^{λt}                 ]

for a k × k Jordan block J_i associated with an eigenvalue λ = λ_i.
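The closed form for e^{tJ} can be checked against a truncated Taylor series for the matrix exponential (a NumPy-only sketch, not from the text; λ = 0.5, t = 0.7, and k = 4 are arbitrary):

```python
import numpy as np
from math import factorial

lam, t, k = 0.5, 0.7, 4
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)   # k x k Jordan block

# Closed form: (e^{tJ})_{ij} = t^{j-i}/(j-i)! * e^{lam*t} for j >= i.
E = np.zeros((k, k))
for i in range(k):
    for j in range(i, k):
        E[i, j] = t**(j - i) / factorial(j - i) * np.exp(lam * t)

# Compare with a (well-converged) truncated Taylor series for exp(tJ).
S = np.zeros((k, k))
term = np.eye(k)
for m in range(1, 60):
    S += term
    term = term @ (t * J) / m
print(np.allclose(E, S))  # True
```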

9.5  The Matrix Sign Function

In this section we give a very brief introduction to an interesting and useful matrix function called the matrix sign function. It is a generalization of the sign (or signum) of a scalar. A survey of the matrix sign function and some of its applications can be found in [15].

Definition 9.40. Let z ∈ C with Re(z) ≠ 0. Then the sign of z is defined by

       sgn(z) = Re(z)/|Re(z)| = { +1  if Re(z) > 0,
                                { −1  if Re(z) < 0.

Definition 9.41. Suppose A ∈ C^{n×n} has no eigenvalues on the imaginary axis, and let

       A = X [ N  0 ] X^{-1}
             [ 0  P ]

be a Jordan canonical form for A, with N containing all Jordan blocks corresponding to the eigenvalues of A in the left half-plane and P containing all Jordan blocks corresponding to eigenvalues in the right half-plane. Then the sign of A, denoted sgn(A), is given by

       sgn(A) = X [ −I  0 ] X^{-1},
                  [  0  I ]

where the negative and positive identity matrices are of the same dimensions as N and P, respectively.

There are other equivalent definitions of the matrix sign function, but the one given here is especially useful in deriving many of its key properties. The JCF definition of the matrix sign function does not generally lend itself to reliable computation on a finite-wordlength digital computer. In fact, its reliable numerical calculation is an interesting topic in its own right.

We state some of the more useful properties of the matrix sign function as theorems. Their straightforward proofs are left to the exercises.

Theorem 9.42. Suppose A ∈ C^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:

1. S is diagonalizable with eigenvalues equal to ±1.

2. S^2 = I.

3. AS = SA.

4. sgn(A^H) = (sgn(A))^H.

5. sgn(T^{-1}AT) = T^{-1} sgn(A) T for all nonsingular T ∈ C^{n×n}.

6. sgn(cA) = sgn(c) sgn(A) for all nonzero real scalars c.
Theorem 9.43. Suppose A ∈ C^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:

1. R(S − I) is an A-invariant subspace corresponding to the left half-plane eigenvalues of A (the negative invariant subspace).

2. R(S + I) is an A-invariant subspace corresponding to the right half-plane eigenvalues of A (the positive invariant subspace).

3. negA = (I − S)/2 is a projection onto the negative invariant subspace of A.

4. posA = (I + S)/2 is a projection onto the positive invariant subspace of A.
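For a diagonalizable A with no purely imaginary eigenvalues, sgn(A) can be built directly from Definition 9.41 by applying the scalar sign to each eigenvalue, and the properties of Theorems 9.42 and 9.43 can then be checked numerically. This is an illustrative sketch only (the text notes that the JCF route is not a reliable general-purpose computation); the 2 × 2 matrix is arbitrary.

```python
import numpy as np

A = np.array([[-3.0, 1.0],
              [ 0.0, 2.0]])   # eigenvalues -3 and 2, none on the imaginary axis

evals, X = np.linalg.eig(A)
S = (X @ np.diag(np.sign(evals.real)) @ np.linalg.inv(X)).real

# Theorem 9.42: S^2 = I and AS = SA.
print(np.allclose(S @ S, np.eye(2)))   # True
print(np.allclose(A @ S, S @ A))       # True

# Theorem 9.43: (I - S)/2 and (I + S)/2 are projections onto the
# negative and positive invariant subspaces, respectively.
negA = (np.eye(2) - S) / 2
posA = (np.eye(2) + S) / 2
print(np.allclose(negA @ negA, negA), np.allclose(posA @ posA, posA))  # True True
print(np.linalg.matrix_rank(posA) == 1)  # True: one right half-plane eigenvalue
```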

EXERCISES

1. Let A ∈ C^{n×n} have distinct eigenvalues λ_1, ..., λ_n with corresponding right eigenvectors x_1, ..., x_n and left eigenvectors y_1, ..., y_n, respectively. Let v ∈ C^n be an arbitrary vector. Show that v can be expressed (uniquely) as a linear combination of the right eigenvectors. Find the appropriate expression for v as a linear combination of the left eigenvectors as well.
2. Suppose A ∈ C^{n×n} is skew-Hermitian, i.e., A^H = −A. Prove that all eigenvalues of a skew-Hermitian matrix must be pure imaginary.

3. Suppose A ∈ C^{n×n} is Hermitian. Let λ be an eigenvalue of A with corresponding right eigenvector x. Show that x is also a left eigenvector for λ. Prove the same result if A is skew-Hermitian.

4. Suppose a matrix A ∈ R^{5×5} has eigenvalues {2, 2, 2, 2, 3}. Determine all possible JCFs for A.

5. Determine the eigenvalues, right eigenvectors and right principal vectors if necessary, and (real) JCFs of the following matrices:

       (a) [ 2  −1 ]
           [ 1   0 ]

6. Determine the JCFs of the following matrices:

7. Let

       A = [ 1   2   2 ]
           [ 0  −1  −2 ]
           [ 0   2   3 ]

Find a nonsingular matrix X such that X^{-1}AX = J, where J is the JCF

       J = [ 1  1  0 ]
           [ 0  1  0 ]
           [ 0  0  1 ]

Hint: Use [−1  1  −1]^T as an eigenvector. The vectors [0  1  −1]^T and [1  0  0]^T are both eigenvectors, but then the equation (A − I)x^(2) = x^(1) can't be solved.

8. Show that all right eigenvectors of the Jordan block matrix in Theorem 9.30 must be multiples of e_1 ∈ R^k. Characterize all left eigenvectors.

9. Let A ∈ R^{n×n} be of the form A = xy^T, where x, y ∈ R^n are nonzero vectors with x^T y = 0. Determine the JCF of A.

10. Let A ∈ R^{n×n} be of the form A = I + xy^T, where x, y ∈ R^n are nonzero vectors with x^T y = 0. Determine the JCF of A.

11. Suppose a matrix A ∈ R^{16×16} has 16 eigenvalues at 0 and its JCF consists of a single Jordan block of the form specified in Theorem 9.22. Suppose the small number 10^{−16} is added to the (16,1) element of J. What are the eigenvalues of this slightly perturbed matrix?

12. Show that every matrix A ∈ R^{n×n} can be factored in the form A = S_1 S_2, where S_1 and S_2 are real symmetric matrices and one of them, say S_1, is nonsingular.

Hint: Suppose A = XJX^{-1} is a reduction of A to JCF and suppose we can construct the "symmetric factorization" of J. Then A = (XS_1X^T)(X^{-T}S_2X^{-1}) would be the required symmetric factorization of A. Thus, it suffices to prove the result for the JCF. The transformation P in (9.18) is useful.

13. Prove that every matrix A ∈ R^{n×n} is similar to its transpose and determine a similarity transformation explicitly.

Hint: Use the factorization in the previous exercise.

14. Consider the block upper triangular matrix

       A = [ A_11  A_12 ]
           [  0    A_22 ]

where A ∈ R^{n×n} and A_11 ∈ R^{k×k} with 1 ≤ k < n. Suppose A_12 ≠ 0 and that we want to block diagonalize A via the similarity transformation

       T = [ I  X ]
           [ 0  I ]

where X ∈ R^{k×(n−k)}, i.e.,

       T^{-1}AT = [ A_11   0   ]
                  [  0    A_22 ]

Find a matrix equation that X must satisfy for this to be possible. If n = 2 and k = 1, what can you say further, in terms of A_11 and A_22, about when the equation for X is solvable?
15. Prove Theorem 9.42.

16. Prove Theorem 9.43.

17. Suppose A ∈ C^{n×n} has all its eigenvalues in the left half-plane. Prove that sgn(A) = −I.

Chapter 10

Canonical Forms

10.1  Some Basic Canonical Forms

Problem: Let V and W be vector spaces and suppose A : V → W is a linear transformation. Find bases in V and W with respect to which Mat A has a "simple form" or "canonical form." In matrix terms, if A ∈ R^{m×n}, find nonsingular P ∈ R^{m×m} and Q ∈ R^{n×n} such that PAQ has a "canonical form." The transformation A ↦ PAQ is called an equivalence; it is called an orthogonal equivalence if P and Q are orthogonal matrices.

Remark 10.1. We can also consider the case A ∈ C^{m×n} and unitary equivalence if P and Q are unitary.

Two special cases are of interest:

1. If W = V and Q = P^{-1}, the transformation A ↦ PAP^{-1} is called a similarity.

2. If W = V and if Q = P^T is orthogonal, the transformation A ↦ PAP^T is called an orthogonal similarity (or unitary similarity in the complex case).

The following results are typical of what can be achieved under a unitary similarity. If A = A^H ∈ C^{n×n} has eigenvalues λ_1, ..., λ_n, then there exists a unitary matrix U such that U^H AU = D, where D = diag(λ_1, ..., λ_n). This is proved in Theorem 10.2. What other matrices are "diagonalizable" under unitary similarity? The answer is given in Theorem 10.9, where it is proved that a general matrix A ∈ C^{n×n} is unitarily similar to a diagonal matrix if and only if it is normal (i.e., AA^H = A^H A). Normal matrices include Hermitian, skew-Hermitian, and unitary matrices (and their "real" counterparts: symmetric, skew-symmetric, and orthogonal, respectively), as well as other matrices that merely satisfy the definition, such as A = [ a  b ; −b  a ] for real scalars a and b. If a matrix A is not normal, the most "diagonal" we can get is the JCF described in Chapter 9.

Theorem 10.2. Let A = A^H ∈ C^{n×n} have (real) eigenvalues λ_1, ..., λ_n. Then there exists a unitary matrix X such that X^H AX = D = diag(λ_1, ..., λ_n) (the columns of X are orthonormal eigenvectors for A).


Proof: Let x_1 be a right eigenvector corresponding to λ_1, and normalize it such that x_1^H x_1 = 1. Then there exist n − 1 additional vectors x_2, ..., x_n such that X = [x_1, ..., x_n] = [x_1 \; X_2] is unitary. Now

X^H AX = \begin{bmatrix} x_1^H \\ X_2^H \end{bmatrix} A \, [x_1 \;\; X_2]
       = \begin{bmatrix} x_1^H A x_1 & x_1^H A X_2 \\ X_2^H A x_1 & X_2^H A X_2 \end{bmatrix}
       = \begin{bmatrix} λ_1 & x_1^H A X_2 \\ 0 & X_2^H A X_2 \end{bmatrix}   (10.1)
       = \begin{bmatrix} λ_1 & 0 \\ 0 & X_2^H A X_2 \end{bmatrix}.   (10.2)

In (10.1) we have used the fact that Ax_1 = λ_1 x_1. When combined with the fact that x_1^H x_1 = 1, we get λ_1 remaining in the (1,1)-block. We also get 0 in the (2,1)-block by noting that x_1 is orthogonal to all vectors in X_2. In (10.2), we get 0 in the (1,2)-block by noting that X^H AX is Hermitian. The proof is completed easily by induction upon noting that the (2,2)-block must have eigenvalues λ_2, ..., λ_n. □

Given a unit vector x_1 ∈ R^n, the construction of X_2 ∈ R^{n×(n−1)} such that X = [x_1 \; X_2] is orthogonal is frequently required. The construction can actually be performed quite easily by means of Householder (or Givens) transformations as in the proof of the following general result.
Theorem 10.3. Let X_1 ∈ C^{n×k} have orthonormal columns and suppose U is a unitary matrix such that UX_1 = \begin{bmatrix} R \\ 0 \end{bmatrix}, where R ∈ C^{k×k} is upper triangular. Write U^H = [U_1 \; U_2] with U_1 ∈ C^{n×k}. Then [X_1 \; U_2] is unitary.

Proof: Let X_1 = [x_1, ..., x_k]. Construct a sequence of Householder matrices (also known as elementary reflectors) H_1, ..., H_k in the usual way (see below) such that

H_k \cdots H_1 [x_1, ..., x_k] = \begin{bmatrix} R \\ 0 \end{bmatrix},

where R is upper triangular (and nonsingular since x_1, ..., x_k are orthonormal). Let U = H_k \cdots H_1. Then U^H = H_1 \cdots H_k and

X_1 = U^H \begin{bmatrix} R \\ 0 \end{bmatrix} = U_1 R.

Then x_i^H U_2 = 0 (i = 1, ..., k) means that x_i is orthogonal to each of the n − k columns of U_2. But the latter are orthonormal since they are the last n − k rows of the unitary matrix U. Thus, [X_1 \; U_2] is unitary. □

The construction called for in Theorem 10.2 is then a special case of Theorem 10.3 for k = 1. We illustrate the construction of the necessary Householder matrix for k = 1. For simplicity, we consider the real case. Let the unit vector x_1 be denoted by [ξ_1, ..., ξ_n]^T.


Then the necessary Householder matrix needed for the construction of X_2 is given by

U = I − 2uu^{+} = I − \frac{2}{u^T u} uu^T, \quad \text{where} \quad u = [ξ_1 + 1, ξ_2, ..., ξ_n]^T.

It can easily be checked that U is symmetric and U^T U = U^2 = I, so U is orthogonal. To see that U effects the necessary compression of x_1, it is easily verified that u^T u = 2 + 2ξ_1 and u^T x_1 = 1 + ξ_1. Thus,

U x_1 = x_1 − \frac{2 u^T x_1}{u^T u} u = x_1 − u = [−1, 0, ..., 0]^T.

Further details on Householder matrices, including the choice of sign and the complex case, can be consulted in standard numerical linear algebra texts such as [7], [11], [23], [25].
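As a numerical check on the construction above, the following sketch (NumPy is assumed here; it is not part of the text) builds U = I − (2/uᵀu)uuᵀ from a unit vector x_1 and verifies both the compression Ux_1 = [−1, 0, ..., 0]ᵀ and the orthogonality of U.

```python
import numpy as np

def householder_from_unit_vector(x1):
    """Given a unit vector x1, return U = I - (2/u'u) u u' with
    u = [xi_1 + 1, xi_2, ..., xi_n]', so that U x1 = [-1, 0, ..., 0]'."""
    n = x1.shape[0]
    u = x1.copy()
    u[0] += 1.0                                   # u = x1 + e1
    return np.eye(n) - (2.0 / (u @ u)) * np.outer(u, u)

x1 = np.array([3.0, 0.0, 4.0]) / 5.0              # a unit vector
U = householder_from_unit_vector(x1)

print(np.allclose(U @ x1, [-1.0, 0.0, 0.0]))      # compression of x1
print(np.allclose(U.T @ U, np.eye(3)))            # U is orthogonal
```

The last n − 1 columns of U then serve as the matrix X_2 completing x_1 to an orthogonal X (up to the sign convention noted above).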
The real version of Theorem 10.2 is worth stating separately since it is applied frequently in applications.

Theorem 10.4. Let A = A^T ∈ R^{n×n} have eigenvalues λ_1, ..., λ_n. Then there exists an orthogonal matrix X ∈ R^{n×n} (whose columns are orthonormal eigenvectors of A) such that X^T AX = D = diag(λ_1, ..., λ_n).

Note that Theorem 10.4 implies that a symmetric matrix A (with the obvious analogue from Theorem 10.2 for Hermitian matrices) can be written

A = XDX^T = \sum_{i=1}^{n} λ_i x_i x_i^T,   (10.3)

which is often called the spectral representation of A. In fact, A in (10.3) is actually a weighted sum of orthogonal projections P_i (onto the one-dimensional eigenspaces corresponding to the λ_i's), i.e.,

A = \sum_{i=1}^{n} λ_i P_i,

where P_i = P_{R(x_i)} = x_i x_i^{+} = x_i x_i^T since x_i^T x_i = 1.
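Theorem 10.4 and the representation (10.3) are easy to verify numerically; the sketch below assumes NumPy, whose eigh routine returns orthonormal eigenvectors of a symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2.0                      # a symmetric test matrix

lam, X = np.linalg.eigh(A)               # columns of X: orthonormal eigenvectors

# X^T A X = D = diag(lambda_1, ..., lambda_n)  (Theorem 10.4)
print(np.allclose(X.T @ A @ X, np.diag(lam)))

# Spectral representation (10.3): A = sum_i lambda_i x_i x_i^T
A_rebuilt = sum(lam[i] * np.outer(X[:, i], X[:, i]) for i in range(4))
print(np.allclose(A, A_rebuilt))
```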

The following pair of theorems form the theoretical


theoretical foundation of the double-Francisdouble-FrancisQR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.

98

Chapter
Canonical Forms
Chapter 10.
10. Canonical
Forms

Theorem 10.5 (Schur). Let A ∈ C^{n×n}. Then there exists a unitary matrix U such that U^H AU = T, where T is upper triangular.

Proof: The proof of this theorem is essentially the same as that of Theorem 10.2 except that in this case (using the notation U rather than X) the (1,2)-block u_1^H AU_2 is not 0. □

In the case of A ∈ R^{n×n}, it is thus unitarily similar to an upper triangular matrix, but if A has a complex conjugate pair of eigenvalues, then complex arithmetic is clearly needed to place such eigenvalues on the diagonal of T. However, the next theorem shows that every A ∈ R^{n×n} is also orthogonally similar (i.e., real arithmetic) to a quasi-upper-triangular matrix. A quasi-upper-triangular matrix is block upper triangular with 1 × 1 diagonal blocks corresponding to its real eigenvalues and 2 × 2 diagonal blocks corresponding to its complex conjugate pairs of eigenvalues.

Theorem 10.6 (Murnaghan-Wintner). Let A ∈ R^{n×n}. Then there exists an orthogonal matrix U such that U^T AU = S, where S is quasi-upper-triangular.

Definition 10.7. The triangular matrix T in Theorem 10.5 is called a Schur canonical form or Schur form. The quasi-upper-triangular matrix S in Theorem 10.6 is called a real Schur canonical form or real Schur form (RSF). The columns of a unitary [orthogonal] matrix U that reduces a matrix to [real] Schur form are called Schur vectors.

Example 10.8. The matrix

S = \begin{bmatrix} 1 & 1 & 5 \\ -1 & 1 & 4 \\ 0 & 0 & -2 \end{bmatrix}

is in RSF. Its real JCF is

J = \begin{bmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}.
Note that only the first Schur vector (and then only if the corresponding first eigenvalue is real if U is orthogonal) is an eigenvector. However, what is true, and sufficient for virtually all applications (see, for example, [17]), is that the first k Schur vectors span the same A-invariant subspace as the eigenvectors corresponding to the first k eigenvalues along the diagonal of T (or S).

While every matrix can be reduced to Schur form (or RSF), it is of interest to know when we can go further and reduce a matrix via unitary similarity to diagonal form. The following theorem answers this question.
Theorem 10.9. A matrix A ∈ C^{n×n} is unitarily similar to a diagonal matrix if and only if A is normal (i.e., A^H A = AA^H).

Proof: Suppose U is a unitary matrix such that U^H AU = D, where D is diagonal. Then

AA^H = UDU^H UD^H U^H = UDD^H U^H = UD^H DU^H = A^H A,

so A is normal.


Conversely, suppose A is normal and let U be a unitary matrix such that U^H AU = T, where T is an upper triangular matrix (Theorem 10.5). Then

T^H T = U^H A^H AU = U^H AA^H U = TT^H.

It is then a routine exercise to show that T must, in fact, be diagonal. □
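Theorem 10.9 can be illustrated with the normal (but non-Hermitian) example A = [a b; −b a] mentioned at the start of the section; NumPy is assumed in this sketch.

```python
import numpy as np

a, b = 2.0, 3.0
A = np.array([[a, b], [-b, a]])          # normal: A A^T = A^T A = (a^2 + b^2) I

print(np.allclose(A @ A.T, A.T @ A))     # normality

lam, U = np.linalg.eig(A)                # eigenvalues a +/- bi
# For a normal matrix with distinct eigenvalues, the unit-norm
# eigenvectors are orthonormal, so U is unitary and U^H A U is diagonal.
print(np.allclose(U.conj().T @ U, np.eye(2)))
print(np.allclose(U.conj().T @ A @ U, np.diag(lam)))
```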

10.2 Definite Matrices

Definition 10.10. A symmetric matrix A ∈ R^{n×n} is

1. positive definite if and only if x^T Ax > 0 for all nonzero x ∈ R^n. We write A > 0.

2. nonnegative definite (or positive semidefinite) if and only if x^T Ax ≥ 0 for all nonzero x ∈ R^n. We write A ≥ 0.
3. negative definite if −A is positive definite. We write A < 0.

4. nonpositive definite (or negative semidefinite) if −A is nonnegative definite. We write A ≤ 0.

Also, if A and B are symmetric matrices, we write A > B if and only if A − B > 0 or B − A < 0. Similarly, we write A ≥ B if and only if A − B ≥ 0 or B − A ≤ 0.

Remark 10.11. If A ∈ C^{n×n} is Hermitian, all the above definitions hold except that superscript H's replace T's. Indeed, this is generally true for all results in the remainder of this section that may be stated in the real case for simplicity.

Remark 10.12. If a matrix is neither definite nor semidefinite, it is said to be indefinite.
Theorem 10.13. Let A = A^H ∈ C^{n×n} with eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n. Then for all x ∈ C^n,

λ_n x^H x ≤ x^H Ax ≤ λ_1 x^H x.

Proof: Let U be a unitary matrix that diagonalizes A as in Theorem 10.2. Furthermore, let y = U^H x, where x is an arbitrary vector in C^n, and denote the components of y by η_i, i = 1, ..., n. Then

x^H Ax = (U^H x)^H U^H AU (U^H x) = y^H Dy = \sum_{i=1}^{n} λ_i |η_i|^2.

But clearly

\sum_{i=1}^{n} λ_i |η_i|^2 ≤ λ_1 y^H y = λ_1 x^H x

and

\sum_{i=1}^{n} λ_i |η_i|^2 ≥ λ_n y^H y = λ_n x^H x,

from which the theorem follows. □

Remark 10.14. The ratio \frac{x^H Ax}{x^H x} for A = A^H ∈ C^{n×n} and nonzero x ∈ C^n is called the Rayleigh quotient of x. Theorem 10.13 provides upper (λ_1) and lower (λ_n) bounds for the Rayleigh quotient. If A = A^H ∈ C^{n×n} is positive definite, x^H Ax > 0 for all nonzero x ∈ C^n, so 0 < λ_n ≤ ⋯ ≤ λ_1.
Corollary 10.15. Let A ∈ C^{n×n}. Then ‖A‖_2 = λ_{max}^{1/2}(A^H A).

Proof: For all x ∈ C^n we have

‖Ax‖_2^2 = x^H A^H Ax ≤ λ_{max}(A^H A) ‖x‖_2^2.

Let x be an eigenvector corresponding to λ_{max}(A^H A). Then ‖Ax‖_2^2 / ‖x‖_2^2 = λ_{max}(A^H A), whence

‖A‖_2 = \max_{x ≠ 0} \frac{‖Ax‖_2}{‖x‖_2} = λ_{max}^{1/2}(A^H A). □
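Corollary 10.15 can be checked directly (real case, NumPy assumed); the identity also holds for rectangular A, where ‖A‖_2 is the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

norm2 = np.linalg.norm(A, 2)                     # spectral norm
lam_max = np.linalg.eigvalsh(A.T @ A).max()      # lambda_max(A^T A)

print(np.allclose(norm2, np.sqrt(lam_max)))
```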

Definition 10.16. A principal submatrix of an n × n matrix A is the (n − k) × (n − k) matrix that remains by deleting k rows and the corresponding k columns. A leading principal submatrix of order n − k is obtained by deleting the last k rows and columns.
Theorem 10.17. A symmetric matrix A ∈ R^{n×n} is positive definite if and only if any of the following three equivalent conditions hold:

1. The determinants of all leading principal submatrices of A are positive.

2. All eigenvalues of A are positive.

3. A can be written in the form M^T M, where M ∈ R^{n×n} is nonsingular.
Theorem 10.18. A symmetric matrix A ∈ R^{n×n} is nonnegative definite if and only if any of the following three equivalent conditions hold:

1. The determinants of all principal submatrices of A are nonnegative.

2. All eigenvalues of A are nonnegative.

3. A can be written in the form M^T M, where M ∈ R^{k×n} and k ≥ rank(A) = rank(M).
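The equivalent conditions of Theorem 10.17 can be exercised numerically; in the sketch below (NumPy assumed) A is built as MᵀM with M nonsingular, so conditions 1 and 2 must both hold:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the leading principal submatrices of A."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

M = np.array([[2.0, 0.0], [1.0, 1.0]])   # nonsingular
A = M.T @ M                              # positive definite by Theorem 10.17.3

print(all(d > 0 for d in leading_principal_minors(A)))   # condition 1
print(all(np.linalg.eigvalsh(A) > 0))                    # condition 2
```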

Remark 10.19. Note that the determinants of all principal submatrices must be nonnegative in Theorem 10.18.1, not just those of the leading principal submatrices. For example, consider the matrix

A = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}.

The determinant of the 1 × 1 leading submatrix is 0 and the determinant of the 2 × 2 leading submatrix is also 0 (cf. Theorem 10.17). However, the principal submatrix consisting of the (2,2) element is, in fact, negative and A is nonpositive definite.
Remark 10.20. The factor M in Theorem 10.18.3 is not unique. For example, if

A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},

then M can be

[1 \;\; 0], \quad \begin{bmatrix} \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & 0 \end{bmatrix}, \quad \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ \frac{1}{\sqrt{3}} & 0 \\ \frac{1}{\sqrt{3}} & 0 \end{bmatrix}, \; \ldots.

Recall that A ≥ B if the matrix A − B is nonnegative definite. The following theorem is useful in "comparing" symmetric matrices. Its proof is straightforward from basic definitions.
Theorem 10.21. Let A, B ∈ R^{n×n} be symmetric.

1. If A ≥ B and M ∈ R^{n×m}, then M^T AM ≥ M^T BM.

2. If A > B and M ∈ R_m^{n×m}, then M^T AM > M^T BM.

The following standard theorem is stated without proof (see, for example, [16, p. 181]). It concerns the notion of the "square root" of a matrix. That is, if A ∈ R^{n×n}, we say that S ∈ R^{n×n} is a square root of A if S^2 = A. In general, matrices (both symmetric and nonsymmetric) have infinitely many square roots. For example, if A = I_2, any matrix S of the form

S = \begin{bmatrix} \cos θ & \sin θ \\ \sin θ & -\cos θ \end{bmatrix}

is a square root.
Theorem 10.22. Let A ∈ R^{n×n} be nonnegative definite. Then A has a unique nonnegative definite square root S. Moreover, SA = AS and rank S = rank A (and hence S is positive definite if A is positive definite).
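The unique nonnegative definite square root of Theorem 10.22 can be computed from the spectral decomposition of Theorem 10.4: if A = X diag(λ_i) Xᵀ, then S = X diag(√λ_i) Xᵀ. A sketch (NumPy assumed):

```python
import numpy as np

def sqrtm_psd(A):
    """Unique nonnegative definite square root of a symmetric
    nonnegative definite A (Theorem 10.22), via its spectral
    decomposition: A = X diag(lam) X^T  =>  S = X diag(sqrt(lam)) X^T."""
    lam, X = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)        # guard against tiny negative round-off
    return (X * np.sqrt(lam)) @ X.T      # scales columns of X by sqrt(lam)

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive definite
S = sqrtm_psd(A)

print(np.allclose(S @ S, A))                                  # S^2 = A
print(np.allclose(S, S.T) and all(np.linalg.eigvalsh(S) >= 0))
print(np.allclose(S @ A, A @ S))                              # SA = AS
```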

A stronger form of the third characterization in Theorem 10.17 is available and is known as the Cholesky factorization. It is stated and proved below for the more general Hermitian case.
Theorem 10.23. Let A ∈ C^{n×n} be Hermitian and positive definite. Then there exists a unique nonsingular lower triangular matrix L with positive diagonal elements such that A = LL^H.

Proof: The proof is by induction. The case n = 1 is trivially true. Write the matrix A in the form

A = \begin{bmatrix} B & b \\ b^H & a_{nn} \end{bmatrix}.

By our induction hypothesis, assume the result is true for matrices of order n − 1 so that B may be written as B = L_1 L_1^H, where L_1 ∈ C^{(n−1)×(n−1)} is nonsingular and lower triangular

with positive diagonal elements. It remains to prove that we can write the n × n matrix A in the form

\begin{bmatrix} B & b \\ b^H & a_{nn} \end{bmatrix} = \begin{bmatrix} L_1 & 0 \\ c^H & α \end{bmatrix} \begin{bmatrix} L_1^H & c \\ 0 & α \end{bmatrix},

where α is positive. Performing the indicated matrix multiplication and equating the corresponding submatrices, we see that we must have L_1 c = b and a_{nn} = c^H c + α^2. Clearly c is given simply by c = L_1^{−1} b. Substituting in the expression involving α, we find α^2 = a_{nn} − b^H L_1^{−H} L_1^{−1} b = a_{nn} − b^H B^{−1} b (= the Schur complement of B in A). But we know that

0 < \det(A) = \det \begin{bmatrix} B & b \\ b^H & a_{nn} \end{bmatrix} = \det(B) \det(a_{nn} − b^H B^{−1} b).

Since det(B) > 0, we must have a_{nn} − b^H B^{−1} b > 0. Choosing α to be the positive square root of a_{nn} − b^H B^{−1} b completes the proof. □
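NumPy exposes the factorization of Theorem 10.23 directly (real case shown; numpy.linalg.cholesky returns the lower triangular factor L with positive diagonal):

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])          # symmetric positive definite

L = np.linalg.cholesky(A)                # lower triangular, A = L L^T

print(np.allclose(L @ L.T, A))
print(np.allclose(L, np.tril(L)) and all(np.diag(L) > 0))
```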

10.3 Equivalence Transformations and Congruence

Theorem 10.24. Let A ∈ C_r^{m×n}. Then there exist matrices P ∈ C_m^{m×m} and Q ∈ C_n^{n×n} such that

PAQ = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}.   (10.4)

Proof: A classical proof can be consulted in, for example, [21, p. 131]. Alternatively, suppose A has an SVD of the form (5.2) in its complex version. Then

\begin{bmatrix} S^{−1} & 0 \\ 0 & I \end{bmatrix} U^H A V = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}.

Take P = \begin{bmatrix} S^{−1} & 0 \\ 0 & I \end{bmatrix} U^H and Q = V to complete the proof. □
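The construction of P and Q in the proof of Theorem 10.24 can be carried out numerically from an SVD; the sketch below (NumPy assumed) builds a rank-2 matrix and checks that PAQ has the form (10.4):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))   # rank 2, 4 x 5

U, s, Vh = np.linalg.svd(A)
r = 2                                    # rank of A

# P = diag(S^{-1}, I) U^H,  Q = V  (as in the proof of Theorem 10.24)
P = np.diag(np.concatenate([1.0 / s[:r], np.ones(4 - r)])) @ U.T
Q = Vh.T

expected = np.zeros((4, 5))
expected[:r, :r] = np.eye(r)
print(np.allclose(P @ A @ Q, expected))
```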

Note that the greater freedom afforded by the equivalence transformation of Theorem 10.24, as opposed to the more restrictive situation of a similarity transformation, yields a far "simpler" canonical form (10.4). However, numerical procedures for computing such an equivalence directly via, say, Gaussian or elementary row and column operations, are generally unreliable. The numerically preferred equivalence is, of course, the unitary equivalence known as the SVD. However, the SVD is relatively expensive to compute and other canonical forms exist that are intermediate between (10.4) and the SVD; see, for example, [7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable than (10.4) and more efficiently computable than a full SVD. Many similar results are also available.


Theorem 10.25 (Complete Orthogonal Decomposition). Let A ∈ C_r^{m×n}. Then there exist unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that

UAV = \begin{bmatrix} R & 0 \\ 0 & 0 \end{bmatrix},   (10.5)

where R ∈ C_r^{r×r} is upper (or lower) triangular with positive diagonal elements.

Proof: For the proof, see [4]. □

Theorem 10.26. Let A ∈ C_r^{m×n}. Then there exists a unitary matrix Q ∈ C^{m×m} and a permutation matrix Π ∈ C^{n×n} such that

QAΠ = \begin{bmatrix} R & S \\ 0 & 0 \end{bmatrix},   (10.6)

where R ∈ C_r^{r×r} is upper triangular and S ∈ C^{r×(n−r)} is arbitrary but in general nonzero.

Proof: For the proof, see [4]. □
0

Remark 10.27. When A has full column rank but is "near" a rank deficient matrix, various rank revealing QR decompositions are available that can sometimes detect such phenomena at a cost considerably less than a full SVD. Again, see [4] for details.
Definition 10.28. Let A ∈ C^{n×n} and X ∈ C_n^{n×n}. The transformation A ↦ X^H AX is called a congruence. Note that a congruence is a similarity if and only if X is unitary.

Note that congruence preserves the property of being Hermitian; i.e., if A is Hermitian, then X^H AX is also Hermitian. It is of interest to ask what other properties of a matrix are preserved under congruence. It turns out that the principal property so preserved is the sign of each eigenvalue.
Definition 10.29. Let A = A^H ∈ C^{n×n} and let π, ν, and ζ denote the numbers of positive, negative, and zero eigenvalues, respectively, of A. Then the inertia of A is the triple of numbers In(A) = (π, ν, ζ). The signature of A is given by sig(A) = π − ν.

Example 10.30.

1. In \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = (2, 1, 1).

2. If A = A^H ∈ C^{n×n}, then A > 0 if and only if In(A) = (n, 0, 0).

3. If In(A) = (π, ν, ζ), then rank(A) = π + ν.
Theorem 10.31 (Sylvester's Law of Inertia). Let A = A^H ∈ C^{n×n} and X ∈ C_n^{n×n}. Then In(A) = In(X^H AX).

Proof: For the proof, see, for example, [21, p. 134]. □
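Sylvester's Law of Inertia is easy to test numerically; the sketch below (NumPy assumed, with a tolerance for classifying zero eigenvalues) uses the matrix of Example 10.30:

```python
import numpy as np

def inertia(A, tol=1e-10):
    """In(A) = (pi, nu, zeta) for symmetric A."""
    lam = np.linalg.eigvalsh(A)
    return (int((lam > tol).sum()), int((lam < -tol).sum()),
            int((np.abs(lam) <= tol).sum()))

A = np.diag([1.0, 1.0, -1.0, 0.0])       # In(A) = (2, 1, 1)
rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4))          # nonsingular with probability one

print(inertia(A))                        # (2, 1, 1)
print(inertia(X.T @ A @ X) == inertia(A))   # inertia preserved under congruence
```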
Theorem 10.31 guarantees that rank and signature of a matrix are preserved under congruence. We then have the following.

Theorem 10.32. Let A = A^H ∈ C^{n×n} with In(A) = (π, ν, ζ). Then there exists a matrix X ∈ C_n^{n×n} such that X^H AX = diag(1, ..., 1, −1, ..., −1, 0, ..., 0), where the number of 1's is π, the number of −1's is ν, and the number of 0's is ζ.

Proof: Let λ_1, ..., λ_n denote the eigenvalues of A and order them such that the first π are positive, the next ν are negative, and the final ζ are 0. By Theorem 10.2 there exists a unitary matrix U such that U^H AU = diag(λ_1, ..., λ_n). Define the n × n matrix

W = diag\left(\frac{1}{\sqrt{λ_1}}, ..., \frac{1}{\sqrt{λ_π}}, \frac{1}{\sqrt{−λ_{π+1}}}, ..., \frac{1}{\sqrt{−λ_{π+ν}}}, 1, ..., 1\right).

Then it is easy to check that X = UW yields the desired result. □

10.3.1 Block matrices and definiteness

Theorem 10.33. Suppose A = A^T and D = D^T. Then

\begin{bmatrix} A & B \\ B^T & D \end{bmatrix} > 0

if and only if either A > 0 and D − B^T A^{−1} B > 0, or D > 0 and A − B D^{−1} B^T > 0.

Proof: The proof follows by considering, for example, the congruence

\begin{bmatrix} A & B \\ B^T & D \end{bmatrix} ↦ \begin{bmatrix} I & −A^{−1}B \\ 0 & I \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & D \end{bmatrix} \begin{bmatrix} I & −A^{−1}B \\ 0 & I \end{bmatrix}.

The details are straightforward and are left to the reader. □

Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem.

Theorem 10.35. Suppose A = A^T and D = D^T. Then

\begin{bmatrix} A & B \\ B^T & D \end{bmatrix} ≥ 0

if and only if A ≥ 0, AA^{+}B = B, and D − B^T A^{+} B ≥ 0.

Proof: Consider the congruence with

\begin{bmatrix} I & −A^{+}B \\ 0 & I \end{bmatrix}

and proceed as in the proof of Theorem 10.33. □
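Theorem 10.33 can be checked on a small example (NumPy assumed); the block matrix is positive definite exactly when A > 0 and the Schur complement D − BᵀA⁻¹B is positive definite:

```python
import numpy as np

def is_pd(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

A = np.array([[4.0, 0.0], [0.0, 4.0]])
B = np.array([[1.0], [2.0]])
D = np.array([[3.0]])

block = np.block([[A, B], [B.T, D]])

# Theorem 10.33: block > 0  iff  A > 0 and D - B^T A^{-1} B > 0
schur = D - B.T @ np.linalg.solve(A, B)
print(is_pd(block) == (is_pd(A) and is_pd(schur)))
```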

10.4 Rational Canonical Form

One final canonical form to be mentioned is the rational canonical form.
One final canonical form to be mentioned is the rational


Definition 10.36. A matrix A ∈ R^{n×n} is said to be nonderogatory if its minimal polynomial and characteristic polynomial are the same or, equivalently, if its Jordan canonical form has only one block associated with each distinct eigenvalue.

Suppose A ∈ R^{n×n} is a nonderogatory matrix and suppose its characteristic polynomial is π(λ) = λ^n − (a_0 + a_1 λ + ⋯ + a_{n−1} λ^{n−1}). Then it can be shown (see [12]) that A is similar to a matrix of the form

\begin{bmatrix} 0 & 0 & \cdots & 0 & a_0 \\ 1 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & \cdots & 0 & a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & a_{n−1} \end{bmatrix}.   (10.7)
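A small sketch (NumPy assumed) builds the companion matrix (10.7) from the coefficients a_0, ..., a_{n−1} and confirms its characteristic polynomial; note numpy.poly returns monic coefficients with the highest power first:

```python
import numpy as np

def companion(a):
    """Companion matrix (10.7) for pi(lambda) = lambda^n - (a0 + a1*lambda
    + ... + a_{n-1}*lambda^{n-1}), with the a_i in the last column."""
    n = len(a)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)           # ones on the subdiagonal
    C[:, -1] = a                         # last column [a0, ..., a_{n-1}]^T
    return C

a = [2.0, -5.0, 4.0]                     # pi(lambda) = lambda^3 - (2 - 5*lambda + 4*lambda^2)
C = companion(a)

# Characteristic polynomial: lambda^3 - 4*lambda^2 + 5*lambda - 2
print(np.allclose(np.poly(C), [1.0, -4.0, 5.0, -2.0]))
```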

Definition 10.37. A matrix A ∈ R^{n×n} of the form (10.7) is called a companion matrix or is said to be in companion form.

Companion matrices also appear in the literature in several equivalent forms. To illustrate, consider the companion matrix

\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ a_0 & a_1 & a_2 & a_3 \end{bmatrix}.   (10.8)

This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order identity similarity P given by (9.18), A is easily seen to be similar to the following matrix in upper Hessenberg form:

\begin{bmatrix} a_3 & a_2 & a_1 & a_0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.   (10.9)

Moreover, since a matrix is similar to its transpose (see exercise 13 in Chapter 9), the following are also companion matrices similar to the above:

\begin{bmatrix} 0 & 0 & 0 & a_0 \\ 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{bmatrix}, \quad \begin{bmatrix} a_3 & 1 & 0 & 0 \\ a_2 & 0 & 1 & 0 \\ a_1 & 0 & 0 & 1 \\ a_0 & 0 & 0 & 0 \end{bmatrix}.   (10.10)

Notice that in all cases a companion matrix is nonsingular if and only if a_0 ≠ 0. In fact, the inverse of a nonsingular companion matrix is again in companion form. For example,

\begin{bmatrix} 0 & 0 & 0 & a_0 \\ 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{bmatrix}^{−1} = \begin{bmatrix} −a_1/a_0 & 1 & 0 & 0 \\ −a_2/a_0 & 0 & 1 & 0 \\ −a_3/a_0 & 0 & 0 & 1 \\ 1/a_0 & 0 & 0 & 0 \end{bmatrix}.   (10.11)


with a similar result for companion matrices of the form (10.10).
If a companion matrix of the form (10.7) is singular, i.e., if a_0 = 0, then its pseudoinverse can still be computed. Let a ∈ R^{n−1} denote the vector [a_1, a_2, ..., a_{n−1}]^T and let c = \frac{1}{1 + a^T a}. Then it is easily verified that

\begin{bmatrix} 0 & 0 \\ I_{n−1} & a \end{bmatrix}^{+} = \begin{bmatrix} 0 & I_{n−1} − caa^T \\ 0 & ca^T \end{bmatrix}.

Note that I − caa^T = (I + aa^T)^{−1}, and hence the pseudoinverse of a singular companion matrix is not a companion matrix unless a = 0.
Companion matrices have many other interesting properties, among which, and perhaps surprisingly, is the fact that their singular values can be found in closed form; see [14].

Theorem 10.38. Let σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n be the singular values of the companion matrix A in (10.7). Let α = a_1^2 + a_2^2 + ⋯ + a_{n−1}^2 and γ = 1 + a_0^2 + α. Then

σ_1^2 = \frac{1}{2}\left(γ + \sqrt{γ^2 − 4a_0^2}\right),

σ_i^2 = 1 \quad \text{for } i = 2, 3, ..., n − 1,

σ_n^2 = \frac{1}{2}\left(γ − \sqrt{γ^2 − 4a_0^2}\right).

If a_0 ≠ 0, the largest and smallest singular values can also be written in the equivalent form

σ_1^2 = \frac{2a_0^2}{γ − \sqrt{γ^2 − 4a_0^2}}, \qquad σ_n^2 = \frac{2a_0^2}{γ + \sqrt{γ^2 − 4a_0^2}}.
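Theorem 10.38 can be verified numerically against a dense SVD; NumPy is assumed in this sketch:

```python
import numpy as np

# Companion matrix (10.7) for chosen coefficients [a0, a1, a2, a3].
a = np.array([0.5, -1.0, 2.0, 1.5])
n = len(a)
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = a

# Closed-form singular values per Theorem 10.38.
alpha = np.sum(a[1:] ** 2)
gamma = 1.0 + a[0] ** 2 + alpha
disc = np.sqrt(gamma ** 2 - 4.0 * a[0] ** 2)

closed_form = np.sort([np.sqrt((gamma + disc) / 2.0),
                       *([1.0] * (n - 2)),
                       np.sqrt((gamma - disc) / 2.0)])[::-1]

print(np.allclose(np.linalg.svd(C, compute_uv=False), closed_form))
```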

Remark 10.39. Explicit formulas for all the associated right and left singular vectors can also be derived easily.
If A ∈ R^{n×n} is derogatory, i.e., has more than one Jordan block associated with at least one eigenvalue, then it is not similar to a companion matrix of the form (10.7). However, it can be shown that a derogatory matrix is similar to a block diagonal matrix, each of whose diagonal blocks is a companion matrix. Such matrices are said to be in rational canonical form (or Frobenius canonical form). For details, see, for example, [12].
Companion matrices appear frequently in the control and signal processing literature but unfortunately they are often very difficult to work with numerically. Algorithms to reduce an arbitrary matrix to companion form are numerically unstable. Moreover, companion matrices are known to possess many undesirable numerical properties. For example, in general and especially as n increases, their eigenstructure is extremely ill conditioned, nonsingular ones are nearly singular, stable ones are nearly unstable, and so forth [14].


Companion matrices and rational canonical forms are generally to be avoided in floating-point computation.

Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical behavior might be expected for companion matrices. For example, when solving linear systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) = ‖A‖_p ‖A^{−1}‖_p, the so-called condition number of A with respect to inversion and with respect to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of precision. In the 2-norm, this condition number is the ratio of largest to smallest singular values which, by the theorem, can be determined explicitly as

    κ_2(A) = (γ + √(γ^2 − 4a_0^2)) / (2|a_0|).

It is easy to show that γ/(2|a_0|) ≤ κ_2(A) ≤ γ/|a_0|, and when a_0 is small or γ is large (or both), then κ_2(A) ≈ γ/|a_0|. It is not unusual for γ to be large for large n. Note that explicit formulas for κ_1(A) and κ_∞(A) can also be determined easily by using (10.11).

EXERCISES

1. Show that if a triangular matrix is normal, then it must be diagonal.

2. Prove that if A ∈ ℝ^{n×n} is normal, then N(A) = N(A^T).

3. Let A ∈ ℂ^{n×n} and define ρ(A) = max_{λ∈Λ(A)} |λ|. Then ρ(A) is called the spectral radius of A. Show that if A is normal, then ρ(A) = ‖A‖_2. Show that the converse is true if n = 2.

4. Let A ∈ ℂ^{n×n} be normal with eigenvalues λ_1, ..., λ_n and singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. Show that σ_i(A) = |λ_i(A)| for i = 1, ..., n.

5. Use the reverse-order identity matrix P introduced in (9.18) and the matrix U in Theorem 10.5 to find a unitary matrix Q that reduces A ∈ ℂ^{n×n} to lower triangular form.

6. Let A = [ 1  j ; −j  1 ] ∈ ℂ^{2×2}. Find a unitary matrix U such that U^H AU is diagonal.

7. If A ∈ ℝ^{n×n} is positive definite, show that A^{−1} must also be positive definite.

8. Suppose A ∈ ℝ^{n×n} is positive definite. Is [ A  I ; I  A^{−1} ] ≥ 0?

9. Let R, S ∈ ℝ^{n×n} be symmetric. Show that [ R  I ; I  S ] > 0 if and only if S > 0 and R > S^{−1}.

Chapter 10. Canonical Forms

10. Find the inertia of the following matrices:

    (b) [ −2    1+j ]        (d) [ −1    1+j ]
        [ 1−j   −2  ],           [ 1−j   −1  ].
Chapter 11

Linear Differential and Difference Equations

11.1 Differential Equations

In this section we study solutions of the linear homogeneous system of differential equations

    ẋ(t) = Ax(t);   x(t_0) = x_0 ∈ ℝ^n                    (11.1)

for t ≥ t_0. This is known as an initial-value problem. We restrict our attention in this chapter only to the so-called time-invariant case, where the matrix A ∈ ℝ^{n×n} is constant and does not depend on t. The solution of (11.1) is then known always to exist and be unique. It can be described conveniently in terms of the matrix exponential.
Definition 11.1. For all A ∈ ℝ^{n×n}, the matrix exponential e^A ∈ ℝ^{n×n} is defined by the power series

    e^A = Σ_{k=0}^{+∞} (1/k!) A^k.                        (11.2)

The series (11.2) can be shown to converge for all A (has radius of convergence equal to +∞). The solution of (11.1) involves the matrix

    e^{tA} = Σ_{k=0}^{+∞} (t^k/k!) A^k,                   (11.3)

which thus also converges for all A and uniformly in t.

11.1.1 Properties of the matrix exponential

1. e^0 = I.
Proof: This follows immediately from Definition 11.1 by setting A = 0.

2. For all A ∈ ℝ^{n×n}, (e^A)^T = e^{A^T}.
Proof: This follows immediately from Definition 11.1 and linearity of the transpose.


3. For all A ∈ ℝ^{n×n} and for all t, τ ∈ ℝ, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.
Proof: Note that

    e^{(t+τ)A} = I + (t + τ)A + ((t + τ)^2/2!) A^2 + ...

and

    e^{tA} e^{τA} = (I + tA + (t^2/2!) A^2 + ...)(I + τA + (τ^2/2!) A^2 + ...).

Compare like powers of A in the above two equations and use the binomial theorem on (t + τ)^k.
4. For all A, B ∈ ℝ^{n×n} and for all t ∈ ℝ, e^{t(A+B)} = e^{tA} e^{tB} = e^{tB} e^{tA} if and only if A and B commute, i.e., AB = BA.
Proof: Note that

    e^{t(A+B)} = I + t(A + B) + (t^2/2!)(A + B)^2 + ...

and

    e^{tA} e^{tB} = (I + tA + (t^2/2!) A^2 + ...)(I + tB + (t^2/2!) B^2 + ...),

while

    e^{tB} e^{tA} = (I + tB + (t^2/2!) B^2 + ...)(I + tA + (t^2/2!) A^2 + ...).

Compare like powers of t in the first equation and the second or third and use the binomial theorem on (A + B)^k and the commutativity of A and B.
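Property 4 is easy to illustrate numerically. The sketch below (assuming NumPy and SciPy are available; the matrices are arbitrary illustrative choices) shows the factorization holding for a commuting pair and failing for a non-commuting one:

```python
import numpy as np
from scipy.linalg import expm

t = 0.3

# Commuting pair: any two polynomials in the same matrix commute.
M = np.array([[1.0, 2.0], [0.0, -1.0]])
A, B = M, M @ M + 3.0 * M          # AB = BA by construction
lhs = expm(t * (A + B))
rhs = expm(t * A) @ expm(t * B)
print(np.allclose(lhs, rhs))       # True: the factorization holds

# Non-commuting pair: the factorization generally fails.
C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(expm(t * (C + D)), expm(t * C) @ expm(t * D)))  # False
```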
5. For all A ∈ ℝ^{n×n} and for all t ∈ ℝ, (e^{tA})^{−1} = e^{−tA}.
Proof: Simply take τ = −t in property 3.

6. Let 𝓛 denote the Laplace transform and 𝓛^{−1} the inverse Laplace transform. Then for all A ∈ ℝ^{n×n} and for all t ∈ ℝ,

(a) 𝓛{e^{tA}} = (sI − A)^{−1}.
(b) 𝓛^{−1}{(sI − A)^{−1}} = e^{tA}.

Proof: We prove only (a). Part (b) follows similarly.

    𝓛{e^{tA}} = ∫_0^{+∞} e^{−st} e^{tA} dt
              = ∫_0^{+∞} e^{t(−sI)} e^{tA} dt
              = ∫_0^{+∞} e^{t(A−sI)} dt                        since A and (−sI) commute
              = ∫_0^{+∞} Σ_{i=1}^{n} e^{(λ_i−s)t} x_i y_i^H dt    assuming A is diagonalizable
              = Σ_{i=1}^{n} [∫_0^{+∞} e^{(λ_i−s)t} dt] x_i y_i^H
              = Σ_{i=1}^{n} (1/(s − λ_i)) x_i y_i^H             assuming Re s > Re λ_i for i = 1, ..., n
              = (sI − A)^{−1}.

The matrix (sI − A)^{−1} is called the resolvent of A and is defined for all s not in Λ(A). Notice in the proof that we have assumed, for convenience, that A is diagonalizable. If this is not the case, the scalar dyadic decomposition can be replaced by

    e^{t(A−sI)} = Σ_{i=1}^{m} X_i e^{t(J_i−sI)} Y_i^H

using the JCF. All succeeding steps in the proof then follow in a straightforward way.
7. For all A ∈ ℝ^{n×n} and for all t ∈ ℝ, (d/dt)(e^{tA}) = A e^{tA} = e^{tA} A.
Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term-by-term, from which the result follows immediately. Alternatively, the formal definition

    (d/dt)(e^{tA}) = lim_{Δt→0} (e^{(t+Δt)A} − e^{tA}) / Δt

can be employed as follows. For any consistent matrix norm,

    ‖(e^{(t+Δt)A} − e^{tA})/Δt − A e^{tA}‖
        = ‖(1/Δt)(e^{tA} e^{ΔtA} − e^{tA}) − A e^{tA}‖
        = ‖(1/Δt)(e^{ΔtA} − I) e^{tA} − A e^{tA}‖
        = ‖((Δt/2!) A^2 + ((Δt)^2/3!) A^3 + ...) e^{tA}‖
        ≤ Δt ‖A^2‖ ‖e^{tA}‖ (1/2! + (Δt/3!)‖A‖ + ((Δt)^2/4!)‖A‖^2 + ...)
        ≤ Δt ‖A^2‖ ‖e^{tA}‖ e^{Δt‖A‖}.

For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the limit exists and equals A e^{tA}. A similar proof yields the limit e^{tA} A, or one can use the fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.

11.1.2 Homogeneous linear differential equations

Theorem 11.2. Let A ∈ ℝ^{n×n}. The solution of the linear homogeneous initial-value problem

    ẋ(t) = Ax(t);   x(t_0) = x_0 ∈ ℝ^n                    (11.4)

for t ≥ t_0 is given by

    x(t) = e^{(t−t_0)A} x_0.                              (11.5)

Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) = A e^{(t−t_0)A} x_0 = Ax(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4). □
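A minimal numerical illustration of the solution formula (assuming NumPy and SciPy are available; the system and initial data are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary illustrative data.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t0 = 0.5

def x(t):
    # Solution (11.5) of the initial-value problem (11.4).
    return expm((t - t0) * A) @ x0

# The initial condition is satisfied ...
print(x(t0))                                    # -> x0

# ... and a central difference approximates x'(t) = A x(t).
t, h = 1.2, 1e-6
xdot = (x(t + h) - x(t - h)) / (2 * h)
print(np.allclose(xdot, A @ x(t), atol=1e-6))   # True
```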

11.1.3 Inhomogeneous linear differential equations

Theorem 11.3. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m} and let the vector-valued function u be given and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

    ẋ(t) = Ax(t) + Bu(t);   x(t_0) = x_0 ∈ ℝ^n            (11.6)

for t ≥ t_0 is given by the variation of parameters formula

    x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^{t} e^{(t−s)A} B u(s) ds.    (11.7)

Proof: Differentiate (11.7) and again use property 7 of the matrix exponential. The general formula

    d/dt ∫_{p(t)}^{q(t)} f(x, t) dx = ∫_{p(t)}^{q(t)} ∂f(x, t)/∂t dx + f(q(t), t) dq(t)/dt − f(p(t), t) dp(t)/dt

is used to get ẋ(t) = A e^{(t−t_0)A} x_0 + ∫_{t_0}^{t} A e^{(t−s)A} B u(s) ds + Bu(t) = Ax(t) + Bu(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 + 0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.7) is the solution of (11.6). □

Remark 11.4. The proof above simply verifies the variation of parameters formula by direct differentiation. The formula can be derived by means of an integrating factor "trick" as follows. Premultiply the equation ẋ − Ax = Bu by e^{−tA} to get

    (d/dt)(e^{−tA} x(t)) = e^{−tA} B u(t).                (11.8)

Now integrate (11.8) over the interval [t_0, t]:

    ∫_{t_0}^{t} (d/ds)(e^{−sA} x(s)) ds = ∫_{t_0}^{t} e^{−sA} B u(s) ds.

Thus,

    e^{−tA} x(t) − e^{−t_0 A} x(t_0) = ∫_{t_0}^{t} e^{−sA} B u(s) ds

and hence

    x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^{t} e^{(t−s)A} B u(s) ds.
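The variation of parameters formula can be checked numerically against a direct ODE integration. The sketch below assumes NumPy and SciPy are available; the system and input are arbitrary illustrative choices, and the convolution integral is approximated by the composite trapezoidal rule:

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary illustrative system with scalar input u(t) = sin t.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
u = lambda s: np.array([np.sin(s)])
x0 = np.array([1.0, -1.0])
t0, t1 = 0.0, 2.0

# Variation of parameters (11.7), trapezoidal rule for the integral.
N = 2000
s = np.linspace(t0, t1, N + 1)
vals = np.stack([expm((t1 - si) * A) @ B @ u(si) for si in s])
w = np.full(N + 1, (t1 - t0) / N)
w[0] *= 0.5
w[-1] *= 0.5
x_vp = expm((t1 - t0) * A) @ x0 + (w[:, None] * vals).sum(axis=0)

# Reference: integrate x' = Ax + Bu with classical Runge-Kutta (RK4).
f = lambda s_, x_: A @ x_ + B @ u(s_)
x, h = x0.copy(), (t1 - t0) / N
for k in range(N):
    s_k = t0 + k * h
    k1 = f(s_k, x)
    k2 = f(s_k + h / 2, x + h / 2 * k1)
    k3 = f(s_k + h / 2, x + h / 2 * k2)
    k4 = f(s_k + h, x + h * k3)
    x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

print(np.allclose(x_vp, x, atol=1e-5))   # True
```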

11.1.4 Linear matrix differential equations

Matrix-valued initial-value problems also occur frequently. The first is an obvious generalization of Theorem 11.2, and the proof is essentially the same.

Theorem 11.5. Let A ∈ ℝ^{n×n}. The solution of the matrix linear homogeneous initial-value problem

    Ẋ(t) = AX(t);   X(t_0) = C ∈ ℝ^{n×n}                  (11.9)

for t ≥ t_0 is given by

    X(t) = e^{(t−t_0)A} C.                                (11.10)

In the matrix case, we can have coefficient matrices on both the right and left. For convenience, the following theorem is stated with initial time t_0 = 0.

Theorem 11.6. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. Then the matrix initial-value problem

    Ẋ(t) = AX(t) + X(t)B;   X(0) = C                      (11.11)

has the solution X(t) = e^{tA} C e^{tB}.

Proof: Differentiate e^{tA} C e^{tB} with respect to t and use property 7 of the matrix exponential. The fact that X(t) satisfies the initial condition is trivial. □

Corollary 11.7. Let A, C ∈ ℝ^{n×n}. Then the matrix initial-value problem

    Ẋ(t) = AX(t) + X(t)A^T;   X(0) = C                    (11.12)

has the solution X(t) = e^{tA} C e^{tA^T}.

When C is symmetric in (11.12), X(t) is symmetric and (11.12) is known as a Lyapunov differential equation. The initial-value problem (11.11) is known as a Sylvester differential equation.
differential


11.1.5 Modal decompositions

Let A ∈ ℝ^{n×n} and suppose, for convenience, that it is diagonalizable (if A is not diagonalizable, the rest of this subsection is easily generalized by using the JCF and the decomposition A = Σ X_i J_i Y_i^H as discussed in Chapter 9). Then the solution x(t) of (11.4) can be written

    x(t) = e^{(t−t_0)A} x_0
         = (Σ_{i=1}^{n} e^{λ_i(t−t_0)} x_i y_i^H) x_0
         = Σ_{i=1}^{n} (y_i^H x_0) e^{λ_i(t−t_0)} x_i.

The λ_i s are called the modal velocities and the right eigenvectors x_i are called the modal directions. The decomposition above expresses the solution x(t) as a weighted sum of its modal velocities and directions.

This modal decomposition can be expressed in a different looking but identical form if we write the initial condition x_0 as a weighted sum of the right eigenvectors x_0 = Σ_{i=1}^{n} α_i x_i. Then

    x(t) = Σ_{i=1}^{n} α_i e^{λ_i(t−t_0)} x_i.

In the last equality we have used the fact that y_i^H x_j = δ_ij.

Similarly, in the inhomogeneous case we can write

    ∫_{t_0}^{t} e^{(t−s)A} B u(s) ds = Σ_{i=1}^{n} (∫_{t_0}^{t} e^{λ_i(t−s)} y_i^H B u(s) ds) x_i.

11.1.6 Computation of the matrix exponential

JCF method
Let A ∈ ℝ^{n×n} and suppose X ∈ ℝ^{n×n} is nonsingular and such that X^{−1}AX = J, where J is a JCF for A. Then

    e^{tA} = e^{t X J X^{−1}}
           = X e^{tJ} X^{−1}
           = Σ_{i=1}^{n} e^{λ_i t} x_i y_i^H     if A is diagonalizable,
           = Σ_{i=1}^{m} X_i e^{t J_i} Y_i^H     in general.
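For a diagonalizable A the formula above is easy to exercise numerically (a sketch assuming NumPy is available; the matrix is an arbitrary illustrative choice):

```python
import numpy as np

# Arbitrary diagonalizable example (eigenvalues 3 and -2).
A = np.array([[1.0, 2.0], [3.0, 0.0]])
t = 0.4

# e^{tA} = X e^{tJ} X^{-1}; for a diagonalizable A the JCF J is
# diagonal, so e^{tJ} = diag(e^{t*lambda_i}).
lam, X = np.linalg.eig(A)
E_jcf = (X * np.exp(t * lam)) @ np.linalg.inv(X)

# Brute-force check against the defining power series (11.3).
E_series, term = np.eye(2), np.eye(2)
for k in range(1, 30):
    term = term @ (t * A) / k
    E_series = E_series + term

print(np.allclose(E_jcf.real, E_series))   # True
```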


If A is diagonalizable, it is then easy to compute e^{tA} via the formula e^{tA} = X e^{tJ} X^{−1} since e^{tJ} is simply a diagonal matrix.

In the more general case, the problem clearly reduces simply to the computation of the exponential of a Jordan block. To be specific, let J_i ∈ ℂ^{k×k} be a Jordan block of the form

    J_i = [ λ  1        0 ]
          [    λ  ⋱       ]
          [       ⋱    1  ]
          [ 0          λ  ]  = λI + N.

Clearly λI and N commute. Thus, e^{tJ_i} = e^{tλI} e^{tN} by property 4 of the matrix exponential. The diagonal part is easy: e^{tλI} = diag(e^{λt}, ..., e^{λt}). But e^{tN} is almost as easy since N is nilpotent of degree k.

Definition 11.8. A matrix M ∈ ℝ^{n×n} is nilpotent of degree (or index, or grade) p if M^p = 0, while M^{p−1} ≠ 0.

For the matrix N defined above, it is easy to check that while N has 1's along only its first superdiagonal (and 0's elsewhere), N^2 has 1's along only its second superdiagonal, and so forth. Finally, N^{k−1} has a 1 in its (1, k) element and has 0's everywhere else, and N^k = 0. Thus, the series expansion of e^{tN} is finite, i.e.,

    e^{tN} = I + tN + (t^2/2!) N^2 + ... + (t^{k−1}/(k−1)!) N^{k−1}.

Thus, e^{tJ_i} = e^{λt} e^{tN} is the upper triangular matrix

    e^{tJ_i} = [ e^{λt}   t e^{λt}   (t^2/2!) e^{λt}   ...   (t^{k−1}/(k−1)!) e^{λt} ]
               [ 0        e^{λt}     t e^{λt}                ⋮                       ]
               [ ⋮                  ⋱          ⋱                                     ]
               [ 0        ...        e^{λt}               t e^{λt}                   ]
               [ 0        ...        0                    e^{λt}                     ].

In the case when λ is complex, a real version of the above can be worked out.


Example 11.9. Let A = [ −4  4 ; −1  0 ]. Then Λ(A) = {−2, −2} and

    e^{tA} = X e^{tJ} X^{−1}
           = [ 2  1 ] [ e^{−2t}   t e^{−2t} ] [  1  −1 ]
             [ 1  1 ] [ 0         e^{−2t}   ] [ −1   2 ]
           = [ (1 − 2t) e^{−2t}    4t e^{−2t}        ]
             [ −t e^{−2t}          (1 + 2t) e^{−2t}  ].

Interpolation method

This method is numerically unstable in finite-precision arithmetic but is quite effective for hand calculation in small-order problems. The method is stated and illustrated for the exponential function but applies equally well to other functions.

Given A ∈ ℝ^{n×n} and f(λ) = e^{tλ}, compute f(A) = e^{tA}, where t is a fixed scalar. Suppose the characteristic polynomial of A can be written as π(λ) = Π_{i=1}^{m} (λ − λ_i)^{n_i}, where the λ_i s are distinct. Define

    g(λ) = α_0 + α_1 λ + ... + α_{n−1} λ^{n−1},

where α_0, ..., α_{n−1} are n constants that are to be determined. They are, in fact, the unique solution of the n equations:

    g^{(k)}(λ_i) = f^{(k)}(λ_i);   k = 0, 1, ..., n_i − 1,   i = 1, ..., m.

Here, the superscript (k) denotes the kth derivative with respect to λ. With the α_i s then known, the function g is known and f(A) = g(A). The motivation for this method is the Cayley–Hamilton Theorem, Theorem 9.3, which says that all powers of A greater than n − 1 can be expressed as linear combinations of A^k for k = 0, 1, ..., n − 1. Thus, all the terms of order greater than n − 1 in the power series for e^{tA} can be written in terms of these lower-order powers as well. The polynomial g gives the appropriate linear combination.
Example 11.10. Let A ∈ ℝ^{3×3} with π(λ) = −(λ + 1)^3, so m = 1 and n_1 = 3, and let f(λ) = e^{tλ}. Let g(λ) = α_0 + α_1 λ + α_2 λ^2. Then the three equations for the α_i s are given by

    g(−1) = f(−1)   ⟹   α_0 − α_1 + α_2 = e^{−t},
    g'(−1) = f'(−1)  ⟹   α_1 − 2α_2 = t e^{−t},
    g''(−1) = f''(−1) ⟹   2α_2 = t^2 e^{−t}.

Solving for the α_i s, we find

    α_0 = e^{−t} + t e^{−t} + (t^2/2) e^{−t},
    α_1 = t e^{−t} + t^2 e^{−t},
    α_2 = (t^2/2) e^{−t}.

Thus, f(A) = e^{tA} = g(A) = α_0 I + α_1 A + α_2 A^2.
Example 11.11. Let A = [ −4  4 ; −1  0 ] and f(λ) = e^{tλ}. Then π(λ) = (λ + 2)^2, so m = 1 and n_1 = 2.

Let g(λ) = α_0 + α_1 λ. Then the defining equations for the α_i s are given by

    g(−2) = f(−2)   ⟹   α_0 − 2α_1 = e^{−2t},
    g'(−2) = f'(−2)  ⟹   α_1 = t e^{−2t}.

Solving for the α_i s, we find

    α_0 = e^{−2t} + 2t e^{−2t},
    α_1 = t e^{−2t}.

Thus,

    f(A) = e^{tA} = g(A) = α_0 I + α_1 A
         = (e^{−2t} + 2t e^{−2t}) [ 1  0 ] + t e^{−2t} [ −4  4 ]
                                  [ 0  1 ]             [ −1  0 ]
         = [ e^{−2t} − 2t e^{−2t}    4t e^{−2t}           ]
           [ −t e^{−2t}              e^{−2t} + 2t e^{−2t} ].
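The result of Example 11.11 can be verified numerically (a sketch assuming NumPy is available; a truncated power series serves as the reference):

```python
import numpy as np

# Matrix from Example 11.11 and a sample value of t.
A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
t = 0.9

# Hermite-interpolation coefficients for f(lambda) = e^{t*lambda}
# at the double eigenvalue lambda = -2.
a0 = np.exp(-2 * t) + 2 * t * np.exp(-2 * t)
a1 = t * np.exp(-2 * t)
G = a0 * np.eye(2) + a1 * A          # g(A) = alpha_0 I + alpha_1 A

# Compare with the power series (11.3) for e^{tA}.
E, term = np.eye(2), np.eye(2)
for k in range(1, 40):
    term = term @ (t * A) / k
    E = E + term

print(np.allclose(G, E))             # True
```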

Other methods

1. Use e^{tA} = 𝓛^{−1}{(sI − A)^{−1}} and techniques for inverse Laplace transforms. This is quite effective for small-order problems, but general nonsymbolic computational techniques are numerically unstable since the problem is theoretically equivalent to knowing precisely a JCF.

2. Use Padé approximation. There is an extensive literature on approximating certain nonlinear functions by rational functions. The matrix analogue yields e^A ≈ D^{−1}(A) N(A), where D(A) = δ_0 I + δ_1 A + ... + δ_p A^p and N(A) = ν_0 I + ν_1 A + ... + ν_q A^q. Explicit formulas are known for the coefficients of the numerator and denominator polynomials of various orders. Unfortunately, a Padé approximation for the exponential is accurate only in a neighborhood of the origin; in the matrix case this means when ‖A‖ is sufficiently small. This can be arranged by scaling A, say, by multiplying it by 1/2^k for sufficiently large k and using the fact that e^A = (e^{(1/2^k)A})^{2^k}. Numerical loss of accuracy can occur in this procedure from the successive squarings.
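The scaling-and-squaring idea can be sketched as follows (assuming NumPy is available; a plain truncated Taylor series stands in here for the Padé approximant, so this illustrates the scaling idea rather than a production algorithm):

```python
import numpy as np

def expm_ss(A, k=None, terms=12):
    """Taylor series plus scaling and squaring: e^A = (e^{A/2^k})^{2^k}.

    A simplified sketch; production codes pair the scaling with a
    Pade approximant rather than a plain truncated series.
    """
    if k is None:
        # Scale so that ||A / 2^k|| is comfortably below 1.
        k = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1e-16)))) + 1)
    B = A / 2.0**k
    E, term = np.eye(len(A)), np.eye(len(A))
    for j in range(1, terms):
        term = term @ B / j
        E = E + term
    for _ in range(k):                # square back up k times
        E = E @ E
    return E

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
print(expm_ss(A))                    # agrees with e^A to high accuracy
```

For this particular A, the result matches the closed form of Example 11.11 evaluated at t = 1.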

3. Reduce A to (real) Schur form S via the unitary similarity U and use e^A = U e^S U^H together with successive recursions up the superdiagonals of the (quasi) upper triangular matrix e^S.

4. Many methods are outlined in, for example, [19]. Reliable and efficient computation of matrix functions such as e^A and log(A) remains a fertile area for research.

11.2 Difference Equations

In this section we outline solutions of discrete-time analogues of the linear differential equations of the previous section. Linear discrete-time systems, modeled by systems of difference equations, exhibit many parallels to the continuous-time differential equation case, and this observation is exploited frequently.

11.2.1 Homogeneous linear difference equations

Theorem 11.12. Let A ∈ ℝ^{n×n}. The solution of the linear homogeneous system of difference equations

    x_{k+1} = A x_k;   x_0 ∈ ℝ^n                          (11.13)

for k ≥ 0 is given by

    x_k = A^k x_0.                                        (11.14)

Proof: The proof is almost immediate upon substitution of (11.14) into (11.13). □

Remark 11.13. Again, we restrict our attention only to the so-called time-invariant case, where the matrix A in (11.13) is constant and does not depend on k. We could also consider an arbitrary "initial time" k_0, but since the system is time-invariant, and since we want to keep the formulas "clean" (i.e., no double subscripts), we have chosen k_0 = 0 for convenience.

11.2.2 Inhomogeneous linear difference equations

Theorem 11.14. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m} and suppose {u_k}_{k=0}^{+∞} is a given sequence of m-vectors. Then the solution of the inhomogeneous initial-value problem

    x_{k+1} = A x_k + B u_k;   x_0 ∈ ℝ^n                  (11.15)

is given by

    x_k = A^k x_0 + Σ_{j=0}^{k−1} A^{k−j−1} B u_j,   k ≥ 0.    (11.16)

Proof: The proof is again almost immediate upon substitution of (11.16) into (11.15). □
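Formula (11.16) is easy to confirm against the recursion itself (a sketch assuming NumPy is available; the data are arbitrary illustrative choices):

```python
import numpy as np

# Arbitrary illustrative data.
A = np.array([[0.5, 1.0], [0.0, -0.3]])
B = np.array([[1.0], [1.0]])
u = [np.array([np.cos(j)]) for j in range(10)]   # inputs u_0, ..., u_9
x0 = np.array([1.0, 2.0])

# Direct recursion x_{k+1} = A x_k + B u_k.
x = x0.copy()
for j in range(10):
    x = A @ x + B @ u[j]

# Closed form (11.16): x_k = A^k x_0 + sum_{j=0}^{k-1} A^{k-j-1} B u_j.
k = 10
mp = np.linalg.matrix_power
x_cf = mp(A, k) @ x0 + sum(mp(A, k - j - 1) @ B @ u[j] for j in range(k))

print(np.allclose(x, x_cf))   # True
```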

11.2.3 Computation of matrix powers

It is clear that solution of linear systems of difference equations involves computation of A^k. One solution method, which is numerically unstable but sometimes useful for hand calculation, is to use z-transforms, by analogy with the use of Laplace transforms to compute a matrix exponential. One definition of the z-transform of a sequence {g_k} is

    Z({g_k}_{k=0}^{+∞}) = Σ_{k=0}^{+∞} g_k z^{−k}.

Assuming |z| > max_{λ∈Λ(A)} |λ|, the z-transform of the sequence {A^k} is then given by

    Z({A^k}) = Σ_{k=0}^{+∞} z^{−k} A^k = I + (1/z) A + (1/z^2) A^2 + ...
             = (I − z^{−1} A)^{−1}
             = z (zI − A)^{−1}.

Methods based on the JCF are sometimes useful, again mostly for small-order problems. Assume that A ∈ ℝ^{n×n} and let X be nonsingular and such that X^{−1} A X = J, where J is a JCF for A. Then

    A^k = (X J X^{−1})^k
        = X J^k X^{−1}
        = Σ_{i=1}^{n} λ_i^k x_i y_i^H     if A is diagonalizable,
        = Σ_{i=1}^{m} X_i J_i^k Y_i^H     in general.

If A is diagonalizable, it is then easy to compute A^k via the formula A^k = X J^k X^{−1} since J^k is simply a diagonal matrix.


In the general case, the problem again reduces to the computation of the power of a Jordan block. To be specific, let J_i ∈ ℂ^{p×p} be a Jordan block of the form

    J_i = [ λ  1        0 ]
          [    λ  ⋱       ]
          [       ⋱    1  ]
          [ 0          λ  ].

Writing J_i = λI + N and noting that λI and the nilpotent matrix N commute, it is then straightforward to apply the binomial theorem to (λI + N)^k and verify that

    J_i^k = [ λ^k   C(k,1) λ^{k−1}   C(k,2) λ^{k−2}   ...   C(k,p−1) λ^{k−p+1} ]
            [ 0     λ^k              C(k,1) λ^{k−1}   ...        ⋮             ]
            [ ⋮                    ⋱                ⋱                          ]
            [ 0     ...              λ^k              C(k,1) λ^{k−1}           ]
            [ 0     ...              0                λ^k                      ].

The symbol C(k, q) has the usual definition of k!/(q!(k − q)!) and is to be interpreted as 0 if k < q. In the case when λ is complex, a real version of the above can be worked out.

Example 11.15. Let A = [ −4  4 ; −1  0 ]. Then

    A^k = X J^k X^{−1}
        = [ 2  1 ] [ (−2)^k   k(−2)^{k−1} ] [  1  −1 ]
          [ 1  1 ] [ 0        (−2)^k      ] [ −1   2 ]
        = [ (−2)^{k−1}(−2 − 2k)    k(−2)^{k+1}        ]
          [ −k(−2)^{k−1}           (−2)^{k−1}(2k − 2) ].
Basic analogues of other methods such as those mentioned in Section 11.1.6 can also be derived for the computation of matrix powers, but again no universally "best" method exists. For an erudite discussion of the state of the art, see [11, Ch. 18].

11.3
11.3

Higher-Order Equations
Higher-Order
Equations

differential equation can be converted to


It is well known that a higher-order
higher-order (scalar) linear differential
a first-order linear system. Consider, for example, the initial-value
initial-value problem
(11.17)

with J(t)
4>(t} a given function and n initial conditions
y(O)

= Co,

y(O)

= CI,

... , in-I)(O)

= Cn-I'

(1l.l8)

121
121

Exercises

Here,
the mth
with
Here, y(m)
v (m) denotes
denotes the
mth derivative
derivative of
of yy with
with respect
respect to
to t.t. Define
Define aa vector
vector xx (t)
(?) Ee ]Rn
R" with
components
Xl (t) = yyet),
components *i(0
( t ) , xX2(t)
( t ) , ...
. . . ,, xXn(t)
y { n ~ l ) ( t ) . Then
Then
2(t) = yyet),
n(t) = In-l)(t).
Xl (I)

= X2(t) = y(t),

X2(t)

= X3(t) = yet),

Xn-l (t)
Xn(t)

= Xn(t) =

y(n-l)(t),

= y(n)(t) = -aoy(t) -

aly(t) - ... - an_lln-l)(t)

= -aOx\ (t) - a\X2(t) - ... - an-lXn(t)

+ (t)

+ (t).

These equations can then be rewritten as the first-order linear system

$$\dot{x}(t) = \begin{bmatrix} 0 & 1 & & \\ \vdots & & \ddots & \\ 0 & & & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-1} \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \phi(t). \quad (11.19)$$

The initial conditions take the form $x(0) = c = [c_0, c_1, \ldots, c_{n-1}]^T$. Note that $\det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$. However, the companion matrix $A$ in (11.19) possesses many nasty numerical properties for even moderately sized $n$ and, as mentioned before, is often well worth avoiding, at least for computational purposes.

A similar procedure holds for the conversion of a higher-order difference equation with $n$ initial conditions, into a linear first-order difference equation with (vector) initial condition.

EXERCISES
1. Let $P \in \mathbb{R}^{n\times n}$ be a projection. Show that $e^P \approx I + 1.718P$.
2. Suppose $x, y \in \mathbb{R}^n$ and let $A = xy^T$. Further, let $\alpha = x^Ty$. Show that $e^{tA} = I + g(t, \alpha)xy^T$, where

$$g(t, \alpha) = \begin{cases} \frac{1}{\alpha}(e^{\alpha t} - 1) & \text{if } \alpha \neq 0, \\ t & \text{if } \alpha = 0. \end{cases}$$

3. Let

$$A = \begin{bmatrix} I & X \\ 0 & -I \end{bmatrix},$$

where $X \in \mathbb{R}^{m\times n}$ is arbitrary. Show that

$$e^A = \begin{bmatrix} eI & \sinh(1)\,X \\ 0 & e^{-1}I \end{bmatrix}.$$

4. Let $K$ denote the skew-symmetric matrix

$$K = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix},$$

where $I_n$ denotes the $n \times n$ identity matrix. A matrix $A \in \mathbb{R}^{2n\times 2n}$ is said to be Hamiltonian if $K^{-1}A^TK = -A$ and to be symplectic if $K^{-1}A^TK = A^{-1}$.

(a) Suppose $H$ is Hamiltonian and let $\lambda$ be an eigenvalue of $H$. Show that $-\lambda$ must also be an eigenvalue of $H$.

(b) Suppose $S$ is symplectic and let $\lambda$ be an eigenvalue of $S$. Show that $1/\lambda$ must also be an eigenvalue of $S$.

(c) Suppose that $H$ is Hamiltonian and $S$ is symplectic. Show that $S^{-1}HS$ must be Hamiltonian.

(d) Suppose $H$ is Hamiltonian. Show that $e^H$ must be symplectic.

5. Let $\alpha, \beta \in \mathbb{R}$ and

$$A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}.$$

Then show that

$$e^{tA} = \begin{bmatrix} e^{\alpha t}\cos\beta t & e^{\alpha t}\sin\beta t \\ -e^{\alpha t}\sin\beta t & e^{\alpha t}\cos\beta t \end{bmatrix}.$$
6. Find a general expression for

7. Find $e^{tA}$ when $A =$

8. Let

(a) Solve the differential equation $\dot{x} = Ax$; $x(0) =$


(b) Solve the differential equation $\dot{x} = Ax + b$; $x(0) =$

9. Consider the initial-value problem

$$\dot{x}(t) = Ax(t); \quad x(0) = x_0$$

for $t \geq 0$. Suppose that $A \in \mathbb{R}^{n\times n}$ is skew-symmetric and let $\alpha = \|x_0\|_2$. Show that $\|x(t)\|_2 = \alpha$ for all $t > 0$.

10. Consider the $n \times n$ matrix initial-value problem

$$\dot{X}(t) = AX(t) - X(t)A; \quad X(0) = C.$$

Show that the eigenvalues of the solution $X(t)$ of this problem are the same as those of $C$ for all $t$.
11. The year is 2004 and there are three large "free trade zones" in the world: Asia (A), Europe (E), and the Americas (R). Suppose certain multinational companies have total assets of $40 trillion of which $20 trillion is in E and $20 trillion is in R. Each year half of the Americas' money stays home, a quarter goes to Europe, and a quarter goes to Asia. For Europe and Asia, half stays home and half goes to the Americas.

(a) Find the matrix $M$ that gives

$$\begin{bmatrix} A \\ E \\ R \end{bmatrix}_{\text{year } k+1} = M \begin{bmatrix} A \\ E \\ R \end{bmatrix}_{\text{year } k}.$$

(b) Find the eigenvalues and right eigenvectors of $M$.

(c) Find the distribution of the companies' assets at year $k$.

(d) Find the limiting distribution of the $40 trillion as the universe ends, i.e., as $k \to +\infty$ (i.e., around the time the Cubs win a World Series).

(Exercise adapted from Problem 5.3.11 in [24].)

12. (a) Find the solution of the initial-value problem

$$\ddot{y}(t) + 2\dot{y}(t) + y(t) = 0; \quad y(0) = 1, \ \dot{y}(0) = 0.$$

(b) Consider the difference equation

$$z_{k+2} + 2z_{k+1} + z_k = 0.$$

If $z_0 = 1$ and $z_1 = 2$, what is the value of $z_{1000}$? What is the value of $z_k$ in general?


Chapter 12

Generalized Eigenvalue Problems

12.1  The Generalized Eigenvalue/Eigenvector Problem

In this chapter we consider the generalized eigenvalue problem

$$Ax = \lambda Bx,$$

where $A, B \in \mathbb{C}^{n\times n}$. The standard eigenvalue problem considered in Chapter 9 obviously corresponds to the special case that $B = I$.

Definition 12.1. A nonzero vector $x \in \mathbb{C}^n$ is a right generalized eigenvector of the pair $(A, B)$ with $A, B \in \mathbb{C}^{n\times n}$ if there exists a scalar $\lambda \in \mathbb{C}$, called a generalized eigenvalue, such that

$$Ax = \lambda Bx. \quad (12.1)$$

Similarly, a nonzero vector $y \in \mathbb{C}^n$ is a left generalized eigenvector corresponding to an eigenvalue $\lambda$ if

$$y^HA = \lambda y^HB. \quad (12.2)$$

When the context is such that no confusion can arise, the adjective "generalized" is usually dropped. As with the standard eigenvalue problem, if $x$ [$y$] is a right [left] eigenvector, then so is $\alpha x$ [$\alpha y$] for any nonzero scalar $\alpha \in \mathbb{C}$.
Definition 12.2. The matrix $A - \lambda B$ is called a matrix pencil (or pencil of the matrices $A$ and $B$).
As with the standard eigenvalue problem, eigenvalues for the generalized eigenvalue problem occur where the matrix pencil $A - \lambda B$ is singular.
Definition 12.3. The polynomial $\pi(\lambda) = \det(A - \lambda B)$ is called the characteristic polynomial of the matrix pair $(A, B)$. The roots of $\pi(\lambda)$ are the eigenvalues of the associated generalized eigenvalue problem.

Remark 12.4. When $A, B \in \mathbb{R}^{n\times n}$, the characteristic polynomial is obviously real, and hence nonreal eigenvalues must occur in complex conjugate pairs.


Remark 12.5. If $B = I$ (or in general when $B$ is nonsingular), then $\pi(\lambda)$ is a polynomial of degree $n$, and hence there are $n$ eigenvalues associated with the pencil $A - \lambda B$. However, when $B \neq I$, in particular, when $B$ is singular, there may be $0$, $k \in \underline{n}$, or infinitely many eigenvalues associated with the pencil $A - \lambda B$. For example, suppose

$$A = \begin{bmatrix} 1 & 0 \\ 0 & \alpha \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & \beta \end{bmatrix}, \quad (12.3)$$

where $\alpha$ and $\beta$ are scalars. Then the characteristic polynomial is

$$\det(A - \lambda B) = (1 - \lambda)(\alpha - \beta\lambda)$$

and there are several cases to consider.


Case 1: $\alpha \neq 0$, $\beta \neq 0$. There are two eigenvalues, $1$ and $\frac{\alpha}{\beta}$.

Case 2: $\alpha = 0$, $\beta \neq 0$. There are two eigenvalues, $1$ and $0$.

Case 3: $\alpha \neq 0$, $\beta = 0$. There is only one eigenvalue, $1$ (of multiplicity 1).

Case 4: $\alpha = 0$, $\beta = 0$. All $\lambda \in \mathbb{C}$ are eigenvalues since $\det(A - \lambda B) \equiv 0$.
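These cases can be explored numerically. The sketch below applies SciPy's generalized eigensolver to diagonal matrices of the form in (12.3); with `homogeneous_eigvals=True` the solver returns $(\alpha_i, \beta_i)$ pairs rather than ratios, so a zero $\beta_i$ flags an eigenvalue "at infinity" (Case 3) without dividing by zero. The specific values of $\alpha$ and $\beta$ are illustrative.

```python
import numpy as np
from scipy.linalg import eig

# Case 3 of Remark 12.5: alpha != 0, beta = 0 gives one finite eigenvalue 1
# and one infinite eigenvalue.  The 2 x 2 matrices mirror (12.3).
alpha, beta = 2.0, 0.0
A = np.diag([1.0, alpha])
B = np.diag([1.0, beta])

# Returned pairs (a_i, b_i) represent eigenvalues a_i / b_i.
ab = eig(A, B, right=False, homogeneous_eigvals=True)
finite = [a / b for a, b in zip(ab[0], ab[1]) if abs(b) > 1e-12]
assert len(finite) == 1 and abs(finite[0] - 1.0) < 1e-12
```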
Definition 12.6. If $\det(A - \lambda B)$ is not identically zero, the pencil $A - \lambda B$ is said to be regular; otherwise, it is said to be singular.

Note that if $\mathcal{N}(A) \cap \mathcal{N}(B) \neq 0$, the associated matrix pencil is singular (as in Case 4 above).
Associated with any matrix pencil $A - \lambda B$ is a reciprocal pencil $B - \mu A$ and corresponding generalized eigenvalue problem. Clearly the reciprocal pencil has eigenvalues $\mu = \frac{1}{\lambda}$. It is instructive to consider the reciprocal pencil associated with the example in Remark 12.5. With $A$ and $B$ as in (12.3), the characteristic polynomial is

$$\det(B - \mu A) = (1 - \mu)(\beta - \alpha\mu)$$

and there are again four cases to consider.

Case 1: $\alpha \neq 0$, $\beta \neq 0$. There are two eigenvalues, $1$ and $\frac{\beta}{\alpha}$.

Case 2: $\alpha = 0$, $\beta \neq 0$. There is only one eigenvalue, $1$ (of multiplicity 1).

Case 3: $\alpha \neq 0$, $\beta = 0$. There are two eigenvalues, $1$ and $0$.

Case 4: $\alpha = 0$, $\beta = 0$. All $\mu \in \mathbb{C}$ are eigenvalues since $\det(B - \mu A) \equiv 0$.

At least for the case of regular pencils, it is apparent where the "missing" eigenvalues have gone in Cases 2 and 3. That is to say, there is a second eigenvalue "at infinity" for Case 3 of $A - \lambda B$, with its reciprocal eigenvalue being $0$ in Case 3 of the reciprocal pencil $B - \mu A$. A similar reciprocal symmetry holds for Case 2.
While there are applications in system theory and control where singular pencils appear, only the case of regular pencils is considered in the remainder of this chapter. Note that $A$ and/or $B$ may still be singular. If $B$ is singular, the pencil $A - \lambda B$ always has fewer than $n$ eigenvalues. If $B$ is nonsingular, the pencil $A - \lambda B$ always has precisely $n$ eigenvalues, since the generalized eigenvalue problem is then easily seen to be equivalent to the standard eigenvalue problem $B^{-1}Ax = \lambda x$ (or $AB^{-1}w = \lambda w$). However, this turns out to be a very poor numerical procedure for handling the generalized eigenvalue problem if $B$ is even moderately ill conditioned with respect to inversion. Numerical methods that work directly on $A$ and $B$ are discussed in standard textbooks on numerical linear algebra; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].

12.2  Canonical Forms

Just as for the standard eigenvalue problem, canonical forms are available for the generalized eigenvalue problem. Since the latter involves a pair of matrices, we now deal with equivalences rather than similarities, and the first theorem deals with what happens to eigenvalues and eigenvectors under equivalence.

Theorem 12.7. Let $A, B, Q, Z \in \mathbb{C}^{n\times n}$ with $Q$ and $Z$ nonsingular. Then

1. the eigenvalues of the problems $A - \lambda B$ and $QAZ - \lambda QBZ$ are the same (the two problems are said to be equivalent).

2. if $x$ is a right eigenvector of $A - \lambda B$, then $Z^{-1}x$ is a right eigenvector of $QAZ - \lambda QBZ$.

3. if $y$ is a left eigenvector of $A - \lambda B$, then $Q^{-H}y$ is a left eigenvector of $QAZ - \lambda QBZ$.
Proof:

1. $\det(QAZ - \lambda QBZ) = \det[Q(A - \lambda B)Z] = \det Q \det Z \det(A - \lambda B)$. Since $\det Q$ and $\det Z$ are nonzero, the result follows.

2. The result follows by noting that $(A - \lambda B)x = 0$ if and only if $Q(A - \lambda B)Z(Z^{-1}x) = 0$.

3. Again, the result follows easily by noting that $y^H(A - \lambda B) = 0$ if and only if $(Q^{-H}y)^HQ(A - \lambda B)Z = 0$. $\square$

The first canonical form is an analogue of Schur's Theorem and forms, in fact, the theoretical foundation for the QZ algorithm, which is the generally preferred method for solving the generalized eigenvalue problem; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].
Theorem 12.8. Let $A, B \in \mathbb{C}^{n\times n}$. Then there exist unitary matrices $Q, Z \in \mathbb{C}^{n\times n}$ such that

$$QAZ = T_\alpha, \quad QBZ = T_\beta,$$

where $T_\alpha$ and $T_\beta$ are upper triangular.
By Theorem 12.7, the eigenvalues of the pencil $A - \lambda B$ are then the ratios of the diagonal elements of $T_\alpha$ to the corresponding diagonal elements of $T_\beta$, with the understanding that a zero diagonal element of $T_\beta$ corresponds to an infinite generalized eigenvalue.

There is also an analogue of the Murnaghan–Wintner Theorem for real matrices.
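The decomposition of Theorem 12.8 is exactly what standard QZ routines compute. A minimal sketch, using arbitrary test matrices rather than data from the text:

```python
import numpy as np
from scipy.linalg import qz

# Generalized Schur form: A = Q Ta Z^H, B = Q Tb Z^H with Ta, Tb upper
# triangular (in the complex form); eigenvalues are ratios of diagonals.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

Ta, Tb, Q, Z = qz(A, B, output="complex")   # SciPy convention: A = Q Ta Z^H
assert np.allclose(Q @ Ta @ Z.conj().T, A)
assert np.allclose(Q @ Tb @ Z.conj().T, B)

# B is generically nonsingular here, so all eigenvalues are finite and can
# be cross-checked against the (numerically less advisable) B^{-1}A route.
evals = np.diag(Ta) / np.diag(Tb)
ref = np.linalg.eigvals(np.linalg.solve(B, A))
assert all(np.min(np.abs(ref - ev)) < 1e-8 for ev in evals)
```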


Theorem 12.9. Let $A, B \in \mathbb{R}^{n\times n}$. Then there exist orthogonal matrices $Q, Z \in \mathbb{R}^{n\times n}$ such that

$$QAZ = S, \quad QBZ = T,$$

where $T$ is upper triangular and $S$ is quasi-upper-triangular.

When $S$ has a $2\times 2$ diagonal block, the $2\times 2$ subpencil formed with the corresponding $2\times 2$ diagonal subblock of $T$ has a pair of complex conjugate eigenvalues. Otherwise, real eigenvalues are given as above by the ratios of diagonal elements of $S$ to corresponding elements of $T$.
There is also an analogue of the Jordan canonical form called the Kronecker canonical form (KCF). A full description of the KCF, including analogues of principal vectors and so forth, is beyond the scope of this book. In this chapter, we present only statements of the basic theorems and some examples. The first theorem pertains only to "square" regular pencils, while the full KCF in all its generality applies also to "rectangular" and singular pencils.
Theorem 12.10. Let $A, B \in \mathbb{C}^{n\times n}$ and suppose the pencil $A - \lambda B$ is regular. Then there exist nonsingular matrices $P, Q \in \mathbb{C}^{n\times n}$ such that

$$P(A - \lambda B)Q = \begin{bmatrix} J & 0 \\ 0 & I \end{bmatrix} - \lambda \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix},$$

where $J$ is a Jordan canonical form corresponding to the finite eigenvalues of $A - \lambda B$ and $N$ is a nilpotent matrix of Jordan blocks associated with $0$ and corresponding to the infinite eigenvalues of $A - \lambda B$.

Example 12.11. The matrix pencil

$$\begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

with characteristic polynomial $(\lambda - 2)^2$ has a finite eigenvalue $2$ of multiplicity $2$ and three infinite eigenvalues.
Theorem 12.12 (Kronecker Canonical Form). Let $A, B \in \mathbb{C}^{m\times n}$. Then there exist nonsingular matrices $P \in \mathbb{C}^{m\times m}$ and $Q \in \mathbb{C}^{n\times n}$ such that

$$P(A - \lambda B)Q = \operatorname{diag}(L_{l_1}, \ldots, L_{l_s}, L_{r_1}^T, \ldots, L_{r_t}^T, J - \lambda I, I - \lambda N),$$

where $N$ is nilpotent, both $N$ and $J$ are in Jordan canonical form, and $L_k$ is the $(k+1) \times k$ bidiagonal pencil

$$L_k = \begin{bmatrix} -\lambda & & \\ 1 & \ddots & \\ & \ddots & -\lambda \\ & & 1 \end{bmatrix}.$$

The $l_i$ are called the left minimal indices while the $r_i$ are called the right minimal indices. Left or right minimal indices can take the value $0$.
Example 12.13. Consider a $13 \times 12$ block diagonal matrix whose diagonal blocks are

$$0_{5\times 4}, \quad \begin{bmatrix} -\lambda \\ 1 \end{bmatrix}, \quad \begin{bmatrix} -\lambda & 1 \end{bmatrix}, \quad \begin{bmatrix} 2-\lambda & 1 \\ 0 & 2-\lambda \end{bmatrix}, \quad \begin{bmatrix} 1 & -\lambda & 0 \\ 0 & 1 & -\lambda \\ 0 & 0 & 1 \end{bmatrix}.$$

Such a matrix is in KCF. The first block of zeros actually corresponds to $L_0, L_0, L_0, L_0, L_0, L_0^T, L_0^T, L_0^T, L_0^T$, where each $L_0$ has "zero columns" and one row, while each $L_0^T$ has "zero rows" and one column. The second block is $L_1$ while the third block is $L_1^T$. The next two blocks correspond to

$$J = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$$

while the nilpotent matrix $N$ in this example is

$$N = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

Just as sets of eigenvectors span $A$-invariant subspaces in the case of the standard eigenproblem (recall Definition 9.35), there is an analogous geometric concept for the generalized eigenproblem.
Definition 12.14. Let $A, B \in \mathbb{R}^{n\times n}$ and suppose the pencil $A - \lambda B$ is regular. Then $\mathcal{V}$ is a deflating subspace if

$$\dim(A\mathcal{V} + B\mathcal{V}) = \dim \mathcal{V}. \quad (12.4)$$

Just as in the standard eigenvalue case, there is a matrix characterization of deflating subspace. Specifically, suppose $S \in \mathbb{R}^{n\times k}$ is a matrix whose columns span a $k$-dimensional subspace $\mathcal{S}$ of $\mathbb{R}^n$, i.e., $\mathcal{R}(S) = \mathcal{S}$. Then $\mathcal{S}$ is a deflating subspace for the pencil $A - \lambda B$ if and only if there exists $M \in \mathbb{R}^{k\times k}$ such that

$$AS = BSM. \quad (12.5)$$

If $B = I$, then (12.4) becomes $\dim(A\mathcal{V} + \mathcal{V}) = \dim \mathcal{V}$, which is clearly equivalent to $A\mathcal{V} \subseteq \mathcal{V}$. Similarly, (12.5) becomes $AS = SM$ as before. If the pencil is not regular, there is a concept analogous to deflating subspace called a reducing subspace.
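The characterization (12.5) can be verified numerically using an ordered QZ decomposition: with $A = QSZ^H$ and $B = QTZ^H$, the leading $k$ columns of $Z$ span a deflating subspace. The sketch below, with arbitrary test matrices, checks (12.5) with $M$ built from the leading $k \times k$ triangular blocks.

```python
import numpy as np
from scipy.linalg import ordqz

# Ordered QZ: A = Q S Z^H, B = Q T Z^H, with eigenvalues sorted so that a
# chosen group occupies the leading diagonal positions ('lhp' = left half
# plane).  Then span(Z[:, :k]) is a deflating subspace.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

S_, T_, alpha, beta, Q, Z = ordqz(A, B, sort="lhp", output="complex")
k = 2
Sk = Z[:, :k]                                  # basis for the subspace
M = np.linalg.solve(T_[:k, :k], S_[:k, :k])    # M = T_k^{-1} S_k
assert np.allclose(A @ Sk, B @ Sk @ M)         # i.e., AS = BSM as in (12.5)
```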

12.3  Application to the Computation of System Zeros

Consider the linear system

$$\dot{x} = Ax + Bu,$$
$$y = Cx + Du$$

with $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{p\times n}$, and $D \in \mathbb{R}^{p\times m}$. This linear time-invariant state-space model is often used in multivariable control theory, where $x (= x(t))$ is the state vector, $u$ is the vector of inputs or controls, and $y$ is the vector of outputs or observables. For details, see, for example, [26].
In general, the (finite) zeros of this system are given by the (finite) complex numbers $z$, where the "system pencil"

$$\begin{bmatrix} A - zI & B \\ C & D \end{bmatrix} \quad (12.6)$$

drops rank. In the special case $p = m$, these values are the generalized eigenvalues of the $(n + m) \times (n + m)$ pencil.

Example 12.15. Let

$$A = \begin{bmatrix} -4 & -3 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 \end{bmatrix}, \quad D = 0.$$

Then the transfer matrix (see [26]) of this system is

$$g(s) = C(sI - A)^{-1}B + D = \frac{5s + 14}{s^2 + 3s + 2},$$

which clearly has a zero at $-2.8$. Checking the finite eigenvalues of the pencil (12.6), we find the characteristic polynomial to be

$$\det \begin{bmatrix} A - \lambda I & B \\ C & D \end{bmatrix} = 5\lambda + 14,$$

which has a root at $-2.8$.
The method of finding system zeros via a generalized eigenvalue problem also works well for general multi-input, multi-output systems. Numerically, however, one must be careful first to "deflate out" the infinite zeros (infinite eigenvalues of (12.6)). This is accomplished by computing a certain unitary equivalence on the system pencil that then yields a smaller generalized eigenvalue problem with only finite generalized eigenvalues (the finite zeros).
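For the square case $p = m$, the computation is easy to sketch as a generalized eigenvalue problem $M - zN$ with $M$ the system matrix and $N = \operatorname{diag}(I, 0)$. The state-space data below is one illustrative realization consistent with the transfer function $g(s) = (5s+14)/(s^2+3s+2)$ of Example 12.15 (the particular entries of $A$ and $B$ are an assumption); the finite zero $-2.8$ emerges as the single finite generalized eigenvalue.

```python
import numpy as np
from scipy.linalg import eig

# An assumed single-input, single-output realization of
# g(s) = (5s + 14)/(s^2 + 3s + 2).
A = np.array([[-4.0, -3.0], [2.0, 1.0]])
B = np.array([[3.0], [1.0]])
C = np.array([[1.0, 2.0]])
D = np.array([[0.0]])

M = np.block([[A, B], [C, D]])     # the pencil M - z*N is (12.6)
N = np.zeros((3, 3))
N[:2, :2] = np.eye(2)              # N = diag(I_n, 0); singular, so some
                                   # generalized eigenvalues are infinite

ab = eig(M, N, right=False, homogeneous_eigvals=True)
zeros = [a / b for a, b in zip(ab[0], ab[1]) if abs(b) > 1e-9]
# The finite zero should be -2.8; the other two eigenvalues are infinite.
assert min(abs(z + 2.8) for z in zeros) < 1e-6
```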
The connection between system zeros and the corresponding system pencil is nontrivial. However, we offer some insight below into the special case of a single-input, single-output system. Specifically, let $B = b \in \mathbb{R}^n$, $C = c^T \in \mathbb{R}^{1\times n}$, and $D = d \in \mathbb{R}$. Furthermore, let $g(s) = c^T(sI - A)^{-1}b + d$ denote the system transfer function (matrix), and assume that $g(s)$ can be written in the form

$$g(s) = \frac{\nu(s)}{\pi(s)},$$

where $\pi(s)$ is the characteristic polynomial of $A$, and $\nu(s)$ and $\pi(s)$ are relatively prime (i.e., there are no "pole/zero cancellations").
Suppose $z \in \mathbb{C}$ is such that

$$\begin{bmatrix} A - zI & b \\ c^T & d \end{bmatrix}$$

is singular. Then there exists a nonzero solution to

$$\begin{bmatrix} A - zI & b \\ c^T & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

or

$$(A - zI)x + by = 0, \quad (12.7)$$
$$c^Tx + dy = 0. \quad (12.8)$$

Assuming $z$ is not an eigenvalue of $A$ (i.e., no pole/zero cancellations), then from (12.7) we get

$$x = -(A - zI)^{-1}by. \quad (12.9)$$

Substituting this in (12.8), we have

$$-c^T(A - zI)^{-1}by + dy = 0,$$

or $g(z)y = 0$ by the definition of $g$. Now $y \neq 0$ (else $x = 0$ from (12.9)). Hence $g(z) = 0$, i.e., $z$ is a zero of $g$.

12.4  Symmetric Generalized Eigenvalue Problems

A very important special case of the generalized eigenvalue problem

$$Ax = \lambda Bx \quad (12.10)$$

for $A, B \in \mathbb{R}^{n\times n}$ arises when $A = A^T$ and $B = B^T > 0$. For example, the second-order system of differential equations

$$M\ddot{x} + Kx = 0,$$

where $M$ is a symmetric positive definite "mass matrix" and $K$ is a symmetric "stiffness matrix," is a frequently employed model of structures or vibrating systems and yields a generalized eigenvalue problem of the form (12.10).

Since $B$ is positive definite it is nonsingular. Thus, the problem (12.10) is equivalent to the standard eigenvalue problem $B^{-1}Ax = \lambda x$. However, $B^{-1}A$ is not necessarily symmetric.

Example 12.16. Let $A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$. Then $B^{-1}A = \begin{bmatrix} -2 & 1 \\ 5 & 1 \end{bmatrix}$.
Nevertheless, the eigenvalues of $B^{-1}A$ are always real (and are approximately $2.1926$ and $-3.1926$ in Example 12.16).
Theorem 12.17. Let $A, B \in \mathbb{R}^{n\times n}$ with $A = A^T$ and $B = B^T > 0$. Then the generalized eigenvalue problem

$$Ax = \lambda Bx$$

has $n$ real eigenvalues, and the $n$ corresponding right eigenvectors can be chosen to be orthogonal with respect to the inner product $\langle x, y \rangle_B = x^TBy$. Moreover, if $A > 0$, then the eigenvalues are also all positive.
Proof: Since $B > 0$, it has a Cholesky factorization $B = LL^T$, where $L$ is nonsingular (Theorem 10.23). Then the eigenvalue problem

$$Ax = \lambda Bx = \lambda LL^Tx$$

can be rewritten as the equivalent problem

$$(L^{-1}AL^{-T})(L^Tx) = \lambda L^Tx. \quad (12.11)$$

Letting $C = L^{-1}AL^{-T}$ and $z = L^Tx$, (12.11) can then be rewritten as

$$Cz = \lambda z. \quad (12.12)$$

Since $C = C^T$, the eigenproblem (12.12) has $n$ real eigenvalues, with corresponding eigenvectors $z_1, \ldots, z_n$ satisfying

$$z_i^Tz_j = \delta_{ij}.$$

Then $x_i = L^{-T}z_i$, $i \in \underline{n}$, are eigenvectors of the original generalized eigenvalue problem and satisfy

$$\langle x_i, x_j \rangle_B = x_i^TBx_j = (z_i^TL^{-1})(LL^T)(L^{-T}z_j) = \delta_{ij}.$$

Finally, if $A = A^T > 0$, then $C = C^T > 0$, so the eigenvalues are positive. $\square$
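The proof's reduction is easy to carry out numerically. The sketch below uses a symmetric pair consistent with the eigenvalues quoted around Example 12.16 and checks the reduced matrix $C$ against SciPy's symmetric-definite generalized solver.

```python
import numpy as np
from scipy.linalg import cholesky, eigh

# Theorem 12.17 reduction: B = L L^T, C = L^{-1} A L^{-T}; then Cz = lambda z
# has the same (guaranteed real) eigenvalues as Ax = lambda Bx.
A = np.array([[1.0, 3.0], [3.0, 2.0]])   # symmetric
B = np.array([[2.0, 1.0], [1.0, 1.0]])   # symmetric positive definite

L = cholesky(B, lower=True)              # B = L @ L.T
Linv = np.linalg.inv(L)                  # explicit inverse, fine at 2 x 2
C = Linv @ A @ Linv.T                    # symmetric reduced matrix

evals = np.linalg.eigvalsh(C)            # real, ascending
# Same spectrum obtained directly from the generalized problem:
assert np.allclose(evals, eigh(A, B, eigvals_only=True))
# Approximately -3.1926 and 2.1926:
assert np.allclose(evals, [-3.1926, 2.1926], atol=1e-3)
```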

Example 12.18. The Cholesky factor for the matrix $B$ in Example 12.16 is

$$L = \begin{bmatrix} \sqrt{2} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$

Then it is easily checked that

$$C = L^{-1}AL^{-T} = \begin{bmatrix} 0.5 & 2.5 \\ 2.5 & -1.5 \end{bmatrix},$$

whose eigenvalues are approximately $2.1926$ and $-3.1926$ as expected.
The material of this section can, of course, be generalized easily to the case where $A$ and $B$ are Hermitian, but since real-valued matrices are commonly used in most applications, we have restricted our attention to that case only.

12.5  Simultaneous Diagonalization

Recall that many matrices can be diagonalized by a similarity. In particular, normal matrices can be diagonalized by a unitary similarity. It turns out that in some cases a pair of matrices $(A, B)$ can be simultaneously diagonalized by the same matrix. There are many such results and we present only a representative (but important and useful) theorem here. Again, we restrict our attention only to the real case, with the complex case following in a straightforward way.
Theorem 12.19 (Simultaneous Reduction to Diagonal Form). Let $A, B \in \mathbb{R}^{n\times n}$ with $A = A^T$ and $B = B^T > 0$. Then there exists a nonsingular matrix $Q$ such that

$$Q^TAQ = D, \quad Q^TBQ = I,$$

where $D$ is diagonal. In fact, the diagonal elements of $D$ are the eigenvalues of $B^{-1}A$.
Proof: Let $B = LL^T$ be the Cholesky factorization of $B$ and set $C = L^{-1}AL^{-T}$. Since $C$ is symmetric, there exists an orthogonal matrix $P$ such that $P^TCP = D$, where $D$ is diagonal. Let $Q = L^{-T}P$. Then

$$Q^TAQ = P^TL^{-1}AL^{-T}P = P^TCP = D$$

and

$$Q^TBQ = P^TL^{-1}(LL^T)L^{-T}P = P^TP = I.$$

Finally, since $QDQ^{-1} = QQ^TAQQ^{-1} = L^{-T}PP^TL^{-1}A = L^{-T}L^{-1}A = B^{-1}A$, we have $\Lambda(D) = \Lambda(B^{-1}A)$. $\square$

Note that $Q$ is not in general orthogonal, so it does not preserve eigenvalues of $A$ and $B$ individually. However, it does preserve the eigenvalues of $A - \lambda B$. This can be seen directly. Let $\tilde{A} = Q^TAQ$ and $\tilde{B} = Q^TBQ$. Then $\tilde{B}^{-1}\tilde{A} = Q^{-1}B^{-1}Q^{-T}Q^TAQ = Q^{-1}(B^{-1}A)Q$.
Theorem 12.19 is very useful for reducing many statements about pairs of symmetric matrices to "the diagonal case." The following is typical.
Theorem 12.20. Let $A, B \in \mathbb{R}^{n\times n}$ be positive definite. Then $A \geq B$ if and only if $B^{-1} \geq A^{-1}$.

Proof: By Theorem 12.19, there exists $Q \in \mathbb{R}_n^{n\times n}$ such that $Q^TAQ = D$ and $Q^TBQ = I$, where $D$ is diagonal. Now $D > 0$ by Theorem 10.31. Also, since $A \geq B$, by Theorem 10.21 we have that $Q^TAQ \geq Q^TBQ$, i.e., $D \geq I$. But then $D^{-1} \leq I$ (this is trivially true since the two matrices are diagonal). Thus, $QD^{-1}Q^T \leq QQ^T$, i.e., $A^{-1} \leq B^{-1}$. $\square$

12.5.1 Simultaneous diagonalization via SVD

There are situations in which forming $C = L^{-1}AL^{-T}$ as in the proof of Theorem 12.19 is numerically problematic, e.g., when $L$ is highly ill conditioned with respect to inversion. In such cases, simultaneous reduction can also be accomplished via an SVD. To illustrate, let


us assume that both $A$ and $B$ are positive definite. Further, let $A = L_AL_A^T$ and $B = L_BL_B^T$ be Cholesky factorizations of $A$ and $B$, respectively. Compute the SVD

$$L_B^{-1}L_A = U\Sigma V^T, \qquad (12.13)$$

where $\Sigma \in \mathbb{R}^{n\times n}$ is diagonal. Then the matrix $Q = L_B^{-T}U$ performs the simultaneous diagonalization. To check this, note that

$$Q^TAQ = U^TL_B^{-1}(L_AL_A^T)L_B^{-T}U = U^T(U\Sigma V^T)(V\Sigma U^T)U = \Sigma^2$$

while

$$Q^TBQ = U^TL_B^{-1}(L_BL_B^T)L_B^{-T}U = U^TU = I.$$
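As a numerical sanity check of this SVD-based reduction (NumPy sketch; the positive definite test matrices are arbitrary examples, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
def spd(n):  # random symmetric positive definite test matrix
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

A, B = spd(n), spd(n)
LA = np.linalg.cholesky(A)
LB = np.linalg.cholesky(B)

# SVD (12.13): L_B^{-1} L_A = U Sigma V^T.
U, sigma, Vt = np.linalg.svd(np.linalg.solve(LB, LA))

# Q = L_B^{-T} U performs the simultaneous diagonalization.
Q = np.linalg.solve(LB.T, U)
assert np.allclose(Q.T @ A @ Q, np.diag(sigma**2))  # Q^T A Q = Sigma^2
assert np.allclose(Q.T @ B @ Q, np.eye(n))          # Q^T B Q = I
```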

Remark 12.21. The SVD in (12.13) can be computed without explicitly forming the indicated matrix product or the inverse by using the so-called generalized singular value decomposition (GSVD). Note that the singular values of $L_B^{-1}L_A$ can be found from the eigenvalue problem

$$L_B^{-1}L_AL_A^TL_B^{-T}z = \lambda z. \qquad (12.14)$$

Letting $x = L_B^{-T}z$ we see that (12.14) can be rewritten in the form $L_AL_A^Tx = \lambda L_Bz = \lambda L_BL_B^TL_B^{-T}z$, which is thus equivalent to the generalized eigenvalue problem

$$L_AL_A^Tx = \lambda L_BL_B^Tx. \qquad (12.15)$$

The problem (12.15) is called a generalized singular value problem and algorithms exist to solve it (and hence equivalently (12.13)) via arithmetic operations performed only on $L_A$ and $L_B$ separately, i.e., without forming the products $L_AL_A^T$ or $L_BL_B^T$ explicitly; see, for example, [7, Sec. 8.7.3]. This is analogous to finding the singular values of a matrix $M$ by operations performed directly on $M$ rather than by forming the matrix $M^TM$ and solving the eigenproblem $M^TMx = \lambda x$.

Remark 12.22. Various generalizations of the results in Remark 12.21 are possible, for example, when $A = A^T \geq 0$. The case when $A$ is symmetric but indefinite is not so straightforward, at least in real arithmetic. For example, $A$ can be written as $A = PDP^T$, where $D$ is diagonal and $P$ is orthogonal, but in writing $A = P\tilde D\tilde DP^T = P\tilde D(P\tilde D)^T$ with $\tilde D$ diagonal, $\tilde D$ may have pure imaginary elements.

12.6 Higher-Order Eigenvalue Problems

Consider the second-order system of differential equations

$$M\ddot q + C\dot q + Kq = 0, \qquad (12.16)$$

where $q(t) \in \mathbb{R}^n$ and $M, C, K \in \mathbb{R}^{n\times n}$. Assume for simplicity that $M$ is nonsingular. Suppose, by analogy with the first-order case, that we try to find a solution of (12.16) of the form $q(t) = e^{\lambda t}p$, where the $n$-vector $p$ and scalar $\lambda$ are to be determined. Substituting in (12.16) we get

$$\lambda^2e^{\lambda t}Mp + \lambda e^{\lambda t}Cp + e^{\lambda t}Kp = 0$$

or, since $e^{\lambda t} \neq 0$,

$$(\lambda^2M + \lambda C + K)p = 0.$$

To get a nonzero solution $p$, we thus seek values of $\lambda$ for which the matrix $\lambda^2M + \lambda C + K$ is singular. Since the determinantal equation

$$0 = \det(\lambda^2M + \lambda C + K) = \lambda^{2n} + \cdots$$

yields a polynomial of degree $2n$, there are $2n$ eigenvalues for the second-order (or quadratic) eigenvalue problem $\lambda^2M + \lambda C + K$.
A special case of (12.16) arises frequently in applications: $M = I$, $C = 0$, and $K = K^T$. Suppose $K$ has eigenvalues

$$\mu_1 \geq \cdots \geq \mu_r \geq 0 > \mu_{r+1} \geq \cdots \geq \mu_n.$$

Let $\omega_k = |\mu_k|^{1/2}$. Then the $2n$ eigenvalues of the second-order eigenvalue problem $\lambda^2I + K$ are

$$\pm j\omega_k;\quad k = 1, \ldots, r,$$
$$\pm\omega_k;\quad k = r+1, \ldots, n.$$

If $r = n$ (i.e., $K = K^T \geq 0$), then all solutions of $\ddot q + Kq = 0$ are oscillatory.

12.6.1 Conversion to first-order form

Let $x_1 = q$ and $x_2 = \dot q$. Then (12.16) can be written as a first-order system (with block companion matrix)

$$\dot x = \begin{bmatrix} 0 & I \\ -M^{-1}K & -M^{-1}C \end{bmatrix}x,$$

where $x(t) \in \mathbb{R}^{2n}$. If $M$ is singular, or if it is desired to avoid the calculation of $M^{-1}$ because $M$ is too ill conditioned with respect to inversion, the second-order problem (12.16) can still be converted to the first-order generalized linear system

$$\begin{bmatrix} I & 0 \\ 0 & M \end{bmatrix}\dot x = \begin{bmatrix} 0 & I \\ -K & -C \end{bmatrix}x.$$
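A quick numerical check of this conversion (NumPy sketch; $M$, $C$, $K$ are arbitrary random test matrices, with $M$ shifted to be safely nonsingular) confirms that the first-order form has $2n$ eigenvalues and that each one makes $\lambda^2M + \lambda C + K$ singular:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n)) + n * np.eye(n)  # keep M safely nonsingular
C = rng.standard_normal((n, n))
K = rng.standard_normal((n, n))

# Block companion form: xdot = [[0, I], [-M^{-1}K, -M^{-1}C]] x.
Minv = np.linalg.inv(M)
companion = np.block([[np.zeros((n, n)), np.eye(n)],
                      [-Minv @ K, -Minv @ C]])
evals = np.linalg.eigvals(companion)
assert evals.size == 2 * n  # 2n eigenvalues, as predicted

# Each eigenvalue makes lambda^2 M + lambda C + K (numerically) singular.
for lam in evals:
    smin = np.linalg.svd(lam**2 * M + lam * C + K, compute_uv=False)[-1]
    assert smin < 1e-8
```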


Many other first-order realizations are possible. Some can be useful when $M$, $C$, and/or $K$ have special symmetry or skew-symmetry properties that can be exploited.

Higher-order analogues of (12.16) involving, say, the $k$th derivative of $q$, lead naturally to higher-order eigenvalue problems that can be converted to first-order form using a $kn \times kn$ block companion matrix analogue of (11.19). Similar procedures hold for the general $k$th-order difference equation, which can be converted to various first-order systems of dimension $kn$.

EXERCISES
1. Suppose $A \in \mathbb{R}^{n\times n}$ and $D \in \mathbb{R}^{m\times m}$ is nonsingular. Show that the finite generalized eigenvalues of the pencil

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} - \lambda\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$

are the eigenvalues of the matrix $A - BD^{-1}C$.
2. Let $F, G \in \mathbb{C}^{n\times n}$. Show that the nonzero eigenvalues of $FG$ and $GF$ are the same.
Hint: An easy "trick proof" is to verify that the matrices

$$\begin{bmatrix} FG & 0 \\ G & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 0 \\ G & GF \end{bmatrix}$$

are similar via the similarity transformation $\begin{bmatrix} I & F \\ 0 & I \end{bmatrix}$.

3. Let $F \in \mathbb{C}^{n\times m}$, $G \in \mathbb{C}^{m\times n}$. Are the nonzero singular values of $FG$ and $GF$ the same?
4. Suppose $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, and $C \in \mathbb{R}^{m\times n}$. Show that the generalized eigenvalues of the pencils

$$\begin{bmatrix} A & B \\ C & 0 \end{bmatrix} - \lambda\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$

and

$$\begin{bmatrix} A + BF + GC & B \\ C & 0 \end{bmatrix} - \lambda\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$

are identical for all $F \in \mathbb{R}^{m\times n}$ and all $G \in \mathbb{R}^{n\times m}$.
Hint: Consider the equivalence

$$\begin{bmatrix} I & G \\ 0 & I \end{bmatrix}\begin{bmatrix} A - \lambda I & B \\ C & 0 \end{bmatrix}\begin{bmatrix} I & 0 \\ F & I \end{bmatrix}.$$

(A similar result is also true for "nonsquare" pencils. In the parlance of control theory, such results show that zeros are invariant under state feedback or output injection.)


5. Another family of simultaneous diagonalization problems arises when it is desired that the simultaneous diagonalizing transformation $Q$ operates on matrices $A, B \in \mathbb{R}^{n\times n}$ in such a way that $Q^{-1}AQ^{-T}$ and $Q^TBQ$ are simultaneously diagonal. Such a transformation is called contragredient. Consider the case where both $A$ and $B$ are positive definite with Cholesky factorizations $A = L_AL_A^T$ and $B = L_BL_B^T$, respectively, and let $U\Sigma V^T$ be an SVD of $L_B^TL_A$.

(a) Show that $Q = L_AV\Sigma^{-1/2}$ is a contragredient transformation that reduces both $A$ and $B$ to the same diagonal matrix.

(b) Show that $Q^{-1} = \Sigma^{-1/2}U^TL_B^T$.

(c) Show that the eigenvalues of $AB$ are the same as those of $\Sigma^2$ and hence are positive.


Chapter 13

Kronecker Products

13.1 Definition and Examples

Definition 13.1. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{p\times q}$. Then the Kronecker product (or tensor product) of $A$ and $B$ is defined as the matrix

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix} \in \mathbb{R}^{mp\times nq}. \qquad (13.1)$$

Obviously, the same definition holds if $A$ and $B$ are complex-valued matrices. We restrict our attention in this chapter primarily to real-valued matrices, pointing out the extension to the complex case only where it is not obvious.
Example 13.2.

1. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}$. Then

$$A \otimes B = \begin{bmatrix} B & 2B & 3B \\ 3B & 2B & B \end{bmatrix} = \begin{bmatrix} 2 & 1 & 4 & 2 & 6 & 3 \\ 2 & 3 & 4 & 6 & 6 & 9 \\ 6 & 3 & 4 & 2 & 2 & 1 \\ 6 & 9 & 4 & 6 & 2 & 3 \end{bmatrix}.$$

Note that $B \otimes A \neq A \otimes B$.

2. For any $B \in \mathbb{R}^{p\times q}$, $I_2 \otimes B = \begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix}$. Replacing $I_2$ by $I_n$ yields a block diagonal matrix with $n$ copies of $B$ along the diagonal.

3. Let $B$ be an arbitrary $2\times 2$ matrix. Then

$$B \otimes I_2 = \begin{bmatrix} b_{11} & 0 & b_{12} & 0 \\ 0 & b_{11} & 0 & b_{12} \\ b_{21} & 0 & b_{22} & 0 \\ 0 & b_{21} & 0 & b_{22} \end{bmatrix}.$$

The extension to arbitrary $B$ and $I_n$ is obvious.

4. Let $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$. Then

$$x \otimes y = [x_1y^T, \ldots, x_my^T]^T = [x_1y_1, \ldots, x_1y_n, x_2y_1, \ldots, x_my_n]^T \in \mathbb{R}^{mn}.$$

5. Let $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$. Then $x \otimes y^T = xy^T = y^T \otimes x$.
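Definition 13.1 corresponds directly to `numpy.kron`, so the examples above can be spot-checked numerically. In the sketch below the small matrices are illustrative test values (not necessarily those of the text):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

# Block (i, j) of A ⊗ B is a_ij * B.
AkB = np.kron(A, B)
assert np.array_equal(AkB[:2, :2], 1 * B)
assert np.array_equal(AkB[:2, 2:], 2 * B)

# The Kronecker product does not commute in general.
assert not np.array_equal(np.kron(A, B), np.kron(B, A))

# I_2 ⊗ B is block diagonal with two copies of B (item 2 above).
assert np.array_equal(np.kron(np.eye(2, dtype=int), B),
                      np.block([[B, 0 * B], [0 * B, B]]))

# x ⊗ y stacks x_i * y (item 4 above).
x, y = np.array([1, 2]), np.array([3, 4, 5])
assert np.array_equal(np.kron(x, y), np.array([3, 4, 5, 6, 8, 10]))
```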

13.2 Properties of the Kronecker Product

Theorem 13.3. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{r\times s}$, $C \in \mathbb{R}^{n\times p}$, and $D \in \mathbb{R}^{s\times t}$. Then

$$(A \otimes B)(C \otimes D) = AC \otimes BD \quad (\in \mathbb{R}^{mr\times pt}). \qquad (13.2)$$

Proof: Simply verify that the $(i, j)$th block of the product $(A \otimes B)(C \otimes D)$ is

$$\sum_{k=1}^{n} a_{ik}c_{kj}\,BD = (AC)_{ij}BD,$$

so the product equals $AC \otimes BD$. $\Box$

Theorem 13.4. For all $A$ and $B$, $(A \otimes B)^T = A^T \otimes B^T$.

Proof: For the proof, simply verify using the definitions of transpose and Kronecker product. $\Box$
Corollary 13.5. If $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ are symmetric, then $A \otimes B$ is symmetric.
Theorem 13.6. If $A$ and $B$ are nonsingular, $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$.

Proof: Using Theorem 13.3, simply note that $(A \otimes B)(A^{-1} \otimes B^{-1}) = I \otimes I = I$. $\Box$
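Theorems 13.3, 13.4, and 13.6 are easy to spot-check numerically (NumPy sketch with random test matrices of compatible dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 5))
D = rng.standard_normal((2, 3))

# Theorem 13.3 (mixed product): (A ⊗ B)(C ⊗ D) = AC ⊗ BD.
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Theorem 13.4: (A ⊗ B)^T = A^T ⊗ B^T.
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# Theorem 13.6: (F ⊗ G)^{-1} = F^{-1} ⊗ G^{-1} for nonsingular F, G.
F = rng.standard_normal((3, 3)) + 3 * np.eye(3)
G = rng.standard_normal((2, 2)) + 2 * np.eye(2)
assert np.allclose(np.linalg.inv(np.kron(F, G)),
                   np.kron(np.linalg.inv(F), np.linalg.inv(G)))
```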


Theorem 13.7. If $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ are normal, then $A \otimes B$ is normal.

Proof:

$$(A \otimes B)^T(A \otimes B) = (A^T \otimes B^T)(A \otimes B) \quad \text{by Theorem 13.4}$$
$$= A^TA \otimes B^TB \quad \text{by Theorem 13.3}$$
$$= AA^T \otimes BB^T \quad \text{since $A$ and $B$ are normal}$$
$$= (A \otimes B)(A \otimes B)^T \quad \text{by Theorem 13.3.} \qquad \Box$$

Corollary 13.8. If $A \in \mathbb{R}^{n\times n}$ is orthogonal and $B \in \mathbb{R}^{m\times m}$ is orthogonal, then $A \otimes B$ is orthogonal.
Example 13.9. Let $A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$ and $B = \begin{bmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{bmatrix}$. Then it is easily seen that $A$ is orthogonal with eigenvalues $e^{\pm j\theta}$ and $B$ is orthogonal with eigenvalues $e^{\pm j\phi}$. The $4\times 4$ matrix $A \otimes B$ is then also orthogonal with eigenvalues $e^{\pm j(\theta+\phi)}$ and $e^{\pm j(\theta-\phi)}$.

Theorem 13.10. Let $A \in \mathbb{R}^{m\times n}$ have a singular value decomposition $U_A\Sigma_AV_A^T$ and let $B \in \mathbb{R}^{p\times q}$ have a singular value decomposition $U_B\Sigma_BV_B^T$. Then

$$(U_A \otimes U_B)(\Sigma_A \otimes \Sigma_B)(V_A \otimes V_B)^T$$

yields a singular value decomposition of $A \otimes B$ (after a simple reordering of the diagonal elements of $\Sigma_A \otimes \Sigma_B$ and the corresponding right and left singular vectors).
Corollary 13.11. Let $A \in \mathbb{R}^{m\times n}_r$ have singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$ and let $B \in \mathbb{R}^{p\times q}_s$ have singular values $\tau_1 \geq \cdots \geq \tau_s > 0$. Then $A \otimes B$ (or $B \otimes A$) has $rs$ singular values $\sigma_1\tau_1 \geq \cdots \geq \sigma_r\tau_s > 0$ and

$$\mathrm{rank}(A \otimes B) = (\mathrm{rank}\,A)(\mathrm{rank}\,B) = \mathrm{rank}(B \otimes A).$$

Theorem 13.12. Let $A \in \mathbb{R}^{n\times n}$ have eigenvalues $\lambda_i$, $i \in \underline{n}$, and let $B \in \mathbb{R}^{m\times m}$ have eigenvalues $\mu_j$, $j \in \underline{m}$. Then the $mn$ eigenvalues of $A \otimes B$ are

$$\lambda_1\mu_1, \ldots, \lambda_1\mu_m, \lambda_2\mu_1, \ldots, \lambda_2\mu_m, \ldots, \lambda_n\mu_m.$$

Moreover, if $x_1, \ldots, x_p$ are linearly independent right eigenvectors of $A$ corresponding to $\lambda_1, \ldots, \lambda_p$ ($p \leq n$), and $z_1, \ldots, z_q$ are linearly independent right eigenvectors of $B$ corresponding to $\mu_1, \ldots, \mu_q$ ($q \leq m$), then $x_i \otimes z_j \in \mathbb{R}^{mn}$ are linearly independent right eigenvectors of $A \otimes B$ corresponding to $\lambda_i\mu_j$, $i \in \underline{p}$, $j \in \underline{q}$.

Proof: The basic idea of the proof is as follows:

$$(A \otimes B)(x \otimes z) = Ax \otimes Bz = \lambda x \otimes \mu z = \lambda\mu(x \otimes z). \qquad \Box$$
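Theorem 13.12 can be verified numerically by comparing the spectrum of $A \otimes B$ with all pairwise products $\lambda_i\mu_j$ (NumPy sketch; random test matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)

# Eigenvalues of A ⊗ B should be all products lambda_i * mu_j.
products = np.array([l * m for l in lam for m in mu])
kron_evals = np.linalg.eigvals(np.kron(A, B))

assert kron_evals.size == products.size
# Each predicted product matches some computed eigenvalue (multiset equality
# is checked loosely via nearest-neighbor distance).
for p in products:
    assert np.min(np.abs(kron_evals - p)) < 1e-8
```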

If $A$ and $B$ are diagonalizable in Theorem 13.12, we can take $p = n$ and $q = m$ and thus get the complete eigenstructure of $A \otimes B$. In general, if $A$ and $B$ have Jordan form
decompositions given by $P^{-1}AP = J_A$ and $Q^{-1}BQ = J_B$, respectively, then we get the following Jordan-like structure:

$$(P \otimes Q)^{-1}(A \otimes B)(P \otimes Q) = (P^{-1} \otimes Q^{-1})(A \otimes B)(P \otimes Q) = (P^{-1}AP) \otimes (Q^{-1}BQ) = J_A \otimes J_B.$$

Note that $J_A \otimes J_B$, while upper triangular, is generally not quite in Jordan form and needs further reduction (to an ultimate Jordan form that also depends on whether or not certain eigenvalues are zero or nonzero).
nonzero).
A Schur
Schur form
form for
for A
B
B can
can be
derived similarly.
suppose P
A
be derived
similarly. For
For example,
example, suppose
P and
and
Q are
i.e.,
are unitary
unitary matrices
matrices that
that reduce
reduce A
A and
and B,
5, respectively,
respectively, to
to Schur
Schur (triangular)
(triangular) form,
form, i.e.,
H
H
pH
AP =
= T
TAA and
and Q
QH
BQ =
= T
TBB (and
(and similarly
similarly if
if P and
and Q are
are orthogonal
orthogonal similarities
similarities
P
AP
BQ
reducing
Schur form).
Then
reducing A
A and
and B
B to
to real
real Schur
form). Then
(P Q)H (A B)(P Q) = (pH QH)(A B)(P Q)

= (pH AP) (QH BQ)


= TA TR .
IRnnxn
xn and B e
E R
IR rnmxm
xm.. Then
Corollary 13.13.
13.13. Let A eE R
1. Tr(A B) = (TrA)(TrB) = Tr(B A).
2. det(A B) = (det A)m(det Bt = det(B A).
mxm
Definition 13.14.
IR nnxn
Xn and B e
E R
IRm
xrn.. Then the Kronecker
Kronecker sum (or tensor sum)
Definition
13.14. Let A eE R
of A
and B,
B, denoted
is the
(Im <g>
A)++ (B
(B In).
/).Note
Note that,
that,inin
of
A and
denoted A
A
EEl B,
B, is
the mn
mn x mn
mn matrix
matrix Urn
A)
general,
^ B
B
general, A
A
EEl B
B i=
EEl A.
A.

Example 13.15.

1. Let

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 1 & 1 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}.$$

Then

$$A \oplus B = (I_2 \otimes A) + (B \otimes I_3) = \begin{bmatrix} 3 & 2 & 3 & 1 & 0 & 0 \\ 3 & 4 & 1 & 0 & 1 & 0 \\ 1 & 1 & 6 & 0 & 0 & 1 \\ 2 & 0 & 0 & 4 & 2 & 3 \\ 0 & 2 & 0 & 3 & 5 & 1 \\ 0 & 0 & 2 & 1 & 1 & 7 \end{bmatrix}.$$

The reader is invited to compute $B \oplus A = (I_3 \otimes B) + (A \otimes I_2)$ and note the difference with $A \oplus B$.

2. Recall the real JCF

$$J = \begin{bmatrix} M & I_2 & & & \\ & M & I_2 & & \\ & & \ddots & \ddots & \\ & & & M & I_2 \\ & & & & M \end{bmatrix} \in \mathbb{R}^{2k\times 2k}, \quad\text{where } M = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}.$$

Define $E_k$ to be the $k\times k$ matrix with 1's on the first superdiagonal and 0's elsewhere. Then $J$ can be written in the very compact form $J = (I_k \otimes M) + (E_k \otimes I_2) = M \oplus E_k$.
Theorem 13.16. Let $A \in \mathbb{R}^{n\times n}$ have eigenvalues $\lambda_i$, $i \in \underline{n}$, and let $B \in \mathbb{R}^{m\times m}$ have eigenvalues $\mu_j$, $j \in \underline{m}$. Then the Kronecker sum $A \oplus B = (I_m \otimes A) + (B \otimes I_n)$ has $mn$ eigenvalues

$$\lambda_1 + \mu_1, \ldots, \lambda_1 + \mu_m, \lambda_2 + \mu_1, \ldots, \lambda_2 + \mu_m, \ldots, \lambda_n + \mu_m.$$

Moreover, if $x_1, \ldots, x_p$ are linearly independent right eigenvectors of $A$ corresponding to $\lambda_1, \ldots, \lambda_p$ ($p \leq n$), and $z_1, \ldots, z_q$ are linearly independent right eigenvectors of $B$ corresponding to $\mu_1, \ldots, \mu_q$ ($q \leq m$), then $z_j \otimes x_i \in \mathbb{R}^{mn}$ are linearly independent right eigenvectors of $A \oplus B$ corresponding to $\lambda_i + \mu_j$, $i \in \underline{p}$, $j \in \underline{q}$.

Proof: The basic idea of the proof is as follows:

$$[(I_m \otimes A) + (B \otimes I_n)](z \otimes x) = (z \otimes Ax) + (Bz \otimes x) = (z \otimes \lambda x) + (\mu z \otimes x) = (\lambda + \mu)(z \otimes x). \qquad \Box$$
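Theorem 13.16 admits the same kind of numerical spot-check as Theorem 13.12 (NumPy sketch; random test matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

# Kronecker sum A ⊕ B = (I_m ⊗ A) + (B ⊗ I_n).
ksum = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)
sums = np.array([l + u for l in lam for u in mu])

ev = np.linalg.eigvals(ksum)
assert ev.size == m * n
# Each predicted sum lambda_i + mu_j matches a computed eigenvalue.
for s in sums:
    assert np.min(np.abs(ev - s)) < 1e-8
```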

If $A$ and $B$ are diagonalizable in Theorem 13.16, we can take $p = n$ and $q = m$ and thus get the complete eigenstructure of $A \oplus B$. In general, if $A$ and $B$ have Jordan form decompositions given by $P^{-1}AP = J_A$ and $Q^{-1}BQ = J_B$, respectively, then

$$[(Q \otimes I_n)(I_m \otimes P)]^{-1}[(I_m \otimes A) + (B \otimes I_n)][(Q \otimes I_n)(I_m \otimes P)]$$
$$= [(I_m \otimes P)^{-1}(Q \otimes I_n)^{-1}][(I_m \otimes A) + (B \otimes I_n)][(Q \otimes I_n)(I_m \otimes P)]$$
$$= [(I_m \otimes P^{-1})(Q^{-1} \otimes I_n)][(I_m \otimes A) + (B \otimes I_n)][(Q \otimes I_n)(I_m \otimes P)]$$
$$= (I_m \otimes J_A) + (J_B \otimes I_n)$$

is a Jordan-like structure for $A \oplus B$.

A Schur form for $A \oplus B$ can be derived similarly. Again, suppose $P$ and $Q$ are unitary matrices that reduce $A$ and $B$, respectively, to Schur (triangular) form, i.e., $P^HAP = T_A$ and $Q^HBQ = T_B$ (and similarly if $P$ and $Q$ are orthogonal similarities reducing $A$ and $B$ to real Schur form). Then

$$[(Q \otimes I_n)(I_m \otimes P)]^H[(I_m \otimes A) + (B \otimes I_n)][(Q \otimes I_n)(I_m \otimes P)] = (I_m \otimes T_A) + (T_B \otimes I_n),$$

where $(Q \otimes I_n)(I_m \otimes P) = Q \otimes P$ is unitary by Theorem 13.3 and Corollary 13.8.

13.3 Application to Sylvester and Lyapunov Equations

In this section we study the linear matrix equation

$$AX + XB = C, \qquad (13.3)$$

where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{m\times m}$, and $C \in \mathbb{R}^{n\times m}$. This equation is now often called a Sylvester equation in honor of J.J. Sylvester who studied general linear matrix equations of the form

$$\sum_{i=1}^{k} A_iXB_i = C.$$

A special case of (13.3) is the symmetric equation

$$AX + XA^T = C \qquad (13.4)$$

obtained by taking $B = A^T$. When $C$ is symmetric, the solution $X \in \mathbb{R}^{n\times n}$ is easily shown also to be symmetric and (13.4) is known as a Lyapunov equation. Lyapunov equations arise naturally in stability theory.
The first important question to ask regarding (13.3) is, When does a solution exist? By writing the matrices in (13.3) in terms of their columns, it is easily seen by equating the $i$th columns that

$$Ax_i + Xb_i = c_i = Ax_i + \sum_{j=1}^{m} b_{ji}x_j.$$

These equations can then be rewritten as the $mn \times mn$ linear system

$$\begin{bmatrix} A + b_{11}I & b_{21}I & \cdots & b_{m1}I \\ b_{12}I & A + b_{22}I & \cdots & b_{m2}I \\ \vdots & & \ddots & \vdots \\ b_{1m}I & b_{2m}I & \cdots & A + b_{mm}I \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}. \qquad (13.5)$$

The coefficient matrix in (13.5) clearly can be written as the Kronecker sum $(I_m \otimes A) + (B^T \otimes I_n)$. The following definition is very helpful in completing the writing of (13.5) as an "ordinary" linear system.


Definition 13.17. Let $c_i \in \mathbb{R}^n$ denote the columns of $C \in \mathbb{R}^{n\times m}$ so that $C = [c_1, \ldots, c_m]$. Then $\mathrm{vec}(C)$ is defined to be the $mn$-vector formed by stacking the columns of $C$ on top of one another, i.e.,

$$\mathrm{vec}(C) = \begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix} \in \mathbb{R}^{mn}.$$


Using Definition 13.17, the linear system (13.5) can be rewritten in the form

$$[(I_m \otimes A) + (B^T \otimes I_n)]\mathrm{vec}(X) = \mathrm{vec}(C). \qquad (13.6)$$

There exists a unique solution to (13.6) if and only if $[(I_m \otimes A) + (B^T \otimes I_n)]$ is nonsingular.
But $[(I_m \otimes A) + (B^T \otimes I_n)]$ is nonsingular if and only if it has no zero eigenvalues. From Theorem 13.16, the eigenvalues of $[(I_m \otimes A) + (B^T \otimes I_n)]$ are $\lambda_i + \mu_j$, where $\lambda_i \in \Lambda(A)$, $i \in \underline{n}$, and $\mu_j \in \Lambda(B)$, $j \in \underline{m}$. We thus have the following theorem.
Theorem 13.18. Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{m\times m}$, and $C \in \mathbb{R}^{n\times m}$. Then the Sylvester equation

$$AX + XB = C \qquad (13.7)$$

has a unique solution if and only if $A$ and $-B$ have no eigenvalues in common.
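The vec formulation (13.6) gives a direct, if expensive, way to solve a Sylvester equation. The sketch below (NumPy; random test matrices, which almost surely satisfy the eigenvalue condition of Theorem 13.18) solves (13.7) this way:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# Coefficient matrix of (13.6): (I_m ⊗ A) + (B^T ⊗ I_n).
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))

# vec stacks columns, which is Fortran-order flattening in NumPy.
x = np.linalg.solve(K, C.flatten(order="F"))
X = x.reshape((n, m), order="F")

assert np.allclose(A @ X + X @ B, C)
```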
Sylvester equations of the form (13.3) (or symmetric Lyapunov equations of the form (13.4)) are generally not solved using the $mn \times mn$ "vec" formulation (13.6). The most commonly preferred numerical algorithm is described in [2]. First $A$ and $B$ are reduced to (real) Schur form. An equivalent linear system is then solved in which the triangular form of the reduced $A$ and $B$ can be exploited to solve successively for the columns of a suitably transformed solution matrix $X$. Assuming that, say, $n \geq m$, this algorithm takes only $O(n^3)$ operations rather than the $O(n^6)$ that would be required by solving (13.6) directly with Gaussian elimination. A further enhancement to this algorithm is available in [6] whereby the larger of $A$ or $B$ is initially reduced only to upper Hessenberg rather than triangular Schur form.

The next few theorems are classical. They culminate in Theorem 13.24, one of many elegant connections between matrix theory and stability theory for differential equations.

Theorem 13.19. Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{m\times m}$, and $C \in \mathbb{R}^{n\times m}$. Suppose further that $A$ and $B$ are asymptotically stable (a matrix is asymptotically stable if all its eigenvalues have real parts in the open left half-plane). Then the (unique) solution of the Sylvester equation

$$AX + XB = C \qquad (13.8)$$

can be written as

$$X = -\int_0^{+\infty} e^{tA}Ce^{tB}\,dt. \qquad (13.9)$$

Proof: Since $A$ and $B$ are stable, $\lambda_i(A) + \lambda_j(B) \neq 0$ for all $i, j$, so there exists a unique solution to (13.8) by Theorem 13.18. Now integrate the differential equation $\dot X = AX + XB$ (with $X(0) = C$) on $[0, +\infty)$:

$$\lim_{t\to+\infty} X(t) - X(0) = A\int_0^{+\infty} X(t)\,dt + \left(\int_0^{+\infty} X(t)\,dt\right)B. \qquad (13.10)$$


Using the results of Section 11.1.6, it can be shown easily that $\lim_{t\to+\infty} e^{tA} = \lim_{t\to+\infty} e^{tB} = 0$. Hence, using the solution $X(t) = e^{tA}Ce^{tB}$ from Theorem 11.6, we have that $\lim_{t\to+\infty} X(t) = 0$. Substituting in (13.10) we have

$$-C = A\left(\int_0^{+\infty} e^{tA}Ce^{tB}\,dt\right) + \left(\int_0^{+\infty} e^{tA}Ce^{tB}\,dt\right)B$$

and so $X = -\int_0^{+\infty} e^{tA}Ce^{tB}\,dt$ satisfies (13.8). $\Box$
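The integral representation (13.9) can be checked against a direct solver. The sketch below (NumPy/SciPy; the test matrices are shifted left so they are clearly asymptotically stable, an assumption made here for illustration) approximates the integral by the trapezoid rule and compares with `scipy.linalg.solve_sylvester`:

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

rng = np.random.default_rng(7)
n, m = 3, 2
# Test matrices shifted left so they are clearly asymptotically stable.
A = 0.3 * rng.standard_normal((n, n)) - 2 * np.eye(n)
B = 0.3 * rng.standard_normal((m, m)) - 2 * np.eye(m)
C = rng.standard_normal((n, m))

# Direct solution of AX + XB = C.
X = solve_sylvester(A, B, C)

# Trapezoid-rule approximation of the integral in (13.9) on [0, 20];
# the integrand decays roughly like e^{-4t} here, so the tail is negligible.
dt, steps = 0.005, 4000
EA, EB = expm(A * dt), expm(B * dt)
PA, PB = np.eye(n), np.eye(m)
integral = 0.5 * C  # half weight at t = 0, where e^{0A} C e^{0B} = C
for _ in range(steps):
    PA, PB = PA @ EA, PB @ EB   # running products give e^{tA}, e^{tB}
    integral += PA @ C @ PB
integral *= dt

assert np.allclose(X, -integral, atol=1e-3)
```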

Remark 13.20. An equivalent condition for the existence of a unique solution to $AX + XB = C$ is that $\begin{bmatrix} A & C \\ 0 & -B \end{bmatrix}$ be similar to $\begin{bmatrix} A & 0 \\ 0 & -B \end{bmatrix}$ (via the similarity $\begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}$).
Theorem 13.21. Let $A, C \in \mathbb{R}^{n\times n}$. Then the Lyapunov equation

$$AX + XA^T = C \qquad (13.11)$$

has a unique solution if and only if $A$ and $-A^T$ have no eigenvalues in common. If $C$ is symmetric and (13.11) has a unique solution, then that solution is symmetric.
Remark 13.22. If the matrix $A \in \mathbb{R}^{n\times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$, then $-A^T$ has eigenvalues $-\lambda_1, \ldots, -\lambda_n$. Thus, a sufficient condition that guarantees that $A$ and $-A^T$ have no common eigenvalues is that $A$ be asymptotically stable. Many useful results exist concerning the relationship between stability and Lyapunov equations. Two basic results due to Lyapunov are the following, the first of which follows immediately from Theorem 13.19.
Theorem 13.23. Let $A, C \in \mathbb{R}^{n\times n}$ and suppose further that $A$ is asymptotically stable. Then the (unique) solution of the Lyapunov equation

$$AX + XA^T = C$$

can be written as

$$X = -\int_0^{+\infty} e^{tA}Ce^{tA^T}\,dt. \qquad (13.12)$$
Theorem 13.24. A matrix $A \in \mathbb{R}^{n\times n}$ is asymptotically stable if and only if there exists a positive definite solution to the Lyapunov equation

$$AX + XA^T = C, \qquad (13.13)$$

where $C = C^T < 0$.
Proof: Suppose $A$ is asymptotically stable. By Theorems 13.21 and 13.23 a solution to (13.13) exists and takes the form (13.12). Now let $v$ be an arbitrary nonzero vector in $\mathbb{R}^n$. Then

$$v^TXv = \int_0^{+\infty} v^Te^{tA}(-C)e^{tA^T}v\,dt.$$


Since $-C > 0$ and $e^{tA}$ is nonsingular for all $t$, the integrand above is positive. Hence $v^TXv > 0$ and thus $X$ is positive definite.

Conversely, suppose $X = X^T > 0$ and let $\lambda \in \Lambda(A)$ with corresponding left eigenvector $y$. Then

$$0 > y^HCy = y^HAXy + y^HXA^Ty = (\lambda + \bar\lambda)y^HXy.$$

Since $y^HXy > 0$, we must have $\lambda + \bar\lambda = 2\,\mathrm{Re}\,\lambda < 0$. Since $\lambda$ was arbitrary, $A$ must be asymptotically stable. $\Box$
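Both directions of Theorem 13.24 can be exercised numerically. The sketch below (SciPy; a randomly generated test matrix is shifted so that it is asymptotically stable) solves (13.13) with $C = -I$ and confirms that the solution is symmetric positive definite:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(8)
n = 4
# Make a test matrix asymptotically stable by shifting its spectrum left.
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)

C = -np.eye(n)                         # C = C^T < 0
X = solve_continuous_lyapunov(A, C)    # solves A X + X A^T = C

assert np.allclose(X, X.T)             # symmetric
assert np.all(np.linalg.eigvalsh(X) > 0)  # positive definite (Theorem 13.24)
assert np.allclose(A @ X + X @ A.T, C)
```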

Remark 13.25. The Lyapunov equation $AX + XA^T = C$ can also be written using the vec notation in the equivalent form

$$[(I \otimes A) + (A \otimes I)]\mathrm{vec}(X) = \mathrm{vec}(C).$$

A subtle point arises when dealing with the "dual" Lyapunov equation $A^TX + XA = C$. The equivalent "vec form" of this equation is

$$[(I \otimes A^T) + (A^T \otimes I)]\mathrm{vec}(X) = \mathrm{vec}(C).$$

However, the complex-valued equation $A^HX + XA = C$ is equivalent to

$$[(I \otimes A^H) + (A^T \otimes I)]\mathrm{vec}(X) = \mathrm{vec}(C).$$

The vec operator has many useful properties, most of which derive from one key result.
Theorem 13.26. For any three matrices $A$, $B$, and $C$ for which the matrix product $ABC$ is defined,

$$\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B).$$

Proof: The proof follows in a fairly straightforward fashion either directly from the definitions or from the fact that $\mathrm{vec}(xy^T) = y \otimes x$. $\Box$
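Theorem 13.26 is easy to confirm numerically (NumPy sketch; random test matrices, with vec implemented as column stacking):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

vec = lambda M: M.flatten(order="F")  # column-stacking vec operator

# Theorem 13.26: vec(ABC) = (C^T ⊗ A) vec(B).
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```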
An immediate application is to the derivation of existence and uniqueness conditions for the solution of the simple Sylvester-like equation introduced in Theorem 6.11.
Theorem 13.27. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}, and C ∈ ℝ^{m×q}. Then the equation

    AXB = C                                                        (13.14)

has a solution X ∈ ℝ^{n×p} if and only if AA^+ C B^+ B = C, in which case the general solution is of the form

    X = A^+ C B^+ + Y − A^+ A Y B B^+,                             (13.15)

where Y ∈ ℝ^{n×p} is arbitrary. The solution of (13.14) is unique if BB^+ ⊗ A^+ A = I.

Proof: Write (13.14) as

    (B^T ⊗ A) vec(X) = vec(C)                                      (13.16)

Chapter 13. Kronecker Products

by Theorem 13.26. This "vector equation" has a solution if and only if

    (B^T ⊗ A)(B^T ⊗ A)^+ vec(C) = vec(C).

It is a straightforward exercise to show that (M ⊗ N)^+ = M^+ ⊗ N^+. Thus, (13.16) has a solution if and only if

    vec(C) = (B^T ⊗ A)((B^+)^T ⊗ A^+) vec(C)
           = [(B^+ B)^T ⊗ AA^+] vec(C)
           = vec(AA^+ C B^+ B)
and hence if and only if AA^+ C B^+ B = C. The general solution of (13.16) is then given by

    vec(X) = (B^T ⊗ A)^+ vec(C) + [I − (B^T ⊗ A)^+ (B^T ⊗ A)] vec(Y),

where Y is arbitrary. This equation can then be rewritten in the form

    vec(X) = ((B^+)^T ⊗ A^+) vec(C) + [I − (BB^+)^T ⊗ A^+ A] vec(Y)

or, using Theorem 13.26,

    X = A^+ C B^+ + Y − A^+ A Y B B^+.

The solution is clearly unique if BB^+ ⊗ A^+ A = I. □
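Theorem 13.27 translates directly into a short computation. The following sketch (illustrative, assuming NumPy; the rank-deficient A is chosen so that the solution is not unique) checks the consistency condition and the general solution (13.15):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank-deficient
B = rng.standard_normal((2, 5))
X_true = rng.standard_normal((3, 2))
C = A @ X_true @ B  # constructed so a solution is guaranteed to exist

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)

# Consistency condition of Theorem 13.27: A A^+ C B^+ B = C.
assert np.allclose(A @ Ap @ C @ Bp @ B, C)

# Particular solution (Y = 0 in (13.15)):
X = Ap @ C @ Bp
assert np.allclose(A @ X @ B, C)

# Any Y gives another solution via (13.15):
Y = rng.standard_normal((3, 2))
X_gen = Ap @ C @ Bp + Y - Ap @ A @ Y @ B @ Bp
assert np.allclose(A @ X_gen @ B, C)
```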

EXERCISES
1. For any two matrices A and B for which the indicated matrix product is defined, show that (vec(A))^T vec(B) = Tr(A^T B). In particular, if B ∈ ℝ^{n×n}, then Tr(B) = (vec(I_n))^T vec(B).
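As a numerical companion to Exercise 1 (an illustrative sketch assuming NumPy, not a substitute for the proof):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

vec = lambda M: M.flatten(order="F")  # column-stacking vec

# (vec A)^T vec B = Tr(A^T B)
assert np.isclose(vec(A) @ vec(B), np.trace(A.T @ B))

# Square special case: Tr(B) = (vec I)^T vec(B).
Bsq = rng.standard_normal((4, 4))
assert np.isclose(vec(np.eye(4)) @ vec(Bsq), np.trace(Bsq))
```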
2. Prove that for all matrices A and B, (A ⊗ B)^+ = A^+ ⊗ B^+.

3. Show that the equation AXB = C has a solution for all C if A has full row rank and B has full column rank. Also, show that a solution, if it exists, is unique if A has full column rank and B has full row rank. What is the solution in this case?
4. Show that the general linear equation

       Σ_{i=1}^{k} A_i X B_i = C

   can be written in the form

       [B_1^T ⊗ A_1 + · · · + B_k^T ⊗ A_k] vec(X) = vec(C).
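The identity in Exercise 4 can be checked numerically as follows (illustrative sketch assuming NumPy; lstsq is used so the check works even if the coefficient matrix happens to be singular):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 3, 2
As = [rng.standard_normal((n, n)) for _ in range(k)]
Bs = [rng.standard_normal((n, n)) for _ in range(k)]
X_true = rng.standard_normal((n, n))
C = sum(Ai @ X_true @ Bi for Ai, Bi in zip(As, Bs))

vec = lambda M: M.flatten(order="F")  # column-stacking vec

# Coefficient matrix  B_1^T (x) A_1 + ... + B_k^T (x) A_k:
K = sum(np.kron(Bi.T, Ai) for Ai, Bi in zip(As, Bs))
x = np.linalg.lstsq(K, vec(C), rcond=None)[0]
X = x.reshape((n, n), order="F")

# X solves the original matrix equation.
assert np.allclose(sum(Ai @ X @ Bi for Ai, Bi in zip(As, Bs)), C)
```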


5. Let x ∈ ℝ^m and y ∈ ℝ^n. Show that x^T ⊗ y = y x^T.

6. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}.

   (a) Show that ‖A ⊗ B‖_2 = ‖A‖_2 ‖B‖_2.

   (b) What is ‖A ⊗ B‖_F in terms of the Frobenius norms of A and B? Justify your answer carefully.

   (c) What is the spectral radius of A ⊗ B in terms of the spectral radii of A and B? Justify your answer carefully.
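Numerical checks of the three identities in Exercise 6 (illustrative sketch assuming NumPy; parts (b) and (c) presuppose the answers ‖A‖_F ‖B‖_F and ρ(A)ρ(B), which the exercise asks you to justify):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))
K = np.kron(A, B)

# (a) The 2-norm is multiplicative over Kronecker products.
assert np.isclose(np.linalg.norm(K, 2),
                  np.linalg.norm(A, 2) * np.linalg.norm(B, 2))

# (b) So is the Frobenius norm.
assert np.isclose(np.linalg.norm(K, "fro"),
                  np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro"))

# (c) And the spectral radius.
rho = lambda M: np.max(np.abs(np.linalg.eigvals(M)))
assert np.isclose(rho(K), rho(A) * rho(B))
```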
7. Let A, B ∈ ℝ^{n×n}.

   (a) Show that (I ⊗ A)^k = I ⊗ A^k and (B ⊗ I)^k = B^k ⊗ I for all integers k.

   (b) Show that e^{I⊗A} = I ⊗ e^A and e^{B⊗I} = e^B ⊗ I.

   (c) Show that the matrices I ⊗ A and B ⊗ I commute.

   (d) Show that

           e^{A⊕B} = e^{(I⊗A)+(B⊗I)} = e^B ⊗ e^A.

       (Note: This result would look a little "nicer" had we defined our Kronecker sum the other way around. However, Definition 13.14 is conventional in the literature.)
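Exercise 7(d) can be confirmed with SciPy's matrix exponential (illustrative sketch assuming NumPy and SciPy's expm; the Kronecker sum of A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m} is taken as (I_m ⊗ A) + (B ⊗ I_n)):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
Im, In = np.eye(m), np.eye(n)

L, R = np.kron(Im, A), np.kron(B, In)   # I (x) A  and  B (x) I
assert np.allclose(L @ R, R @ L)        # part (c): they commute

ksum = L + R                            # Kronecker sum A (+) B
# part (d): the exponential factors as e^B (x) e^A
assert np.allclose(expm(ksum), np.kron(expm(B), expm(A)))
```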
8. Consider the Lyapunov matrix equation (13.11) with

       A = [ 1   0 ]
           [ 0  -1 ]

   and C the symmetric matrix

       C = [ 2   0 ]
           [ 0  -2 ].

   Clearly

       X_s = [ 1  0 ]
             [ 0  1 ]

   is a symmetric solution of the equation. Verify that

       X_ns = [  1  1 ]
              [ -1  1 ]

   is also a solution and is nonsymmetric. Explain in light of Theorem 13.21.
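A quick numerical check of Exercise 8 (illustrative sketch assuming NumPy, taking the Lyapunov equation in the form AX + XA^T = C with A = diag(1, -1) and C = diag(2, -2); these particular values are an assumption of this sketch):

```python
import numpy as np

A = np.diag([1.0, -1.0])
C = np.diag([2.0, -2.0])

X_s = np.eye(2)                          # symmetric solution
X_ns = np.array([[1.0, 1.0],
                 [-1.0, 1.0]])           # nonsymmetric solution

lyap = lambda X: A @ X + X @ A.T
assert np.allclose(lyap(X_s), C)
assert np.allclose(lyap(X_ns), C)
assert not np.allclose(X_ns, X_ns.T)

# Nonuniqueness is no surprise: 1 + (-1) = 0 is a zero eigenvalue
# of the Kronecker-sum coefficient matrix I (x) A + A (x) I.
K = np.kron(np.eye(2), A) + np.kron(A, np.eye(2))
assert np.isclose(np.linalg.det(K), 0.0)
```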
9. Block Triangularization: Let

       S = [ A  B ]
           [ C  D ],

   where A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m}. It is desired to find a similarity transformation of the form

       T = [ I  0 ]
           [ X  I ]

   such that T^{-1}ST is block upper triangular.

   (a) Show that S is similar to

           [ A + BX      B    ]
           [    0     D - XB  ]

       if X satisfies the so-called matrix Riccati equation

           C - XA + DX - XBX = 0.

   (b) Formulate a similar result for block lower triangularization of S.
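Exercise 9(a) can be explored numerically by constructing C so that a given X satisfies the Riccati equation (illustrative sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
D = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))

# Choose C so that X satisfies C - XA + DX - XBX = 0 by construction.
C = X @ A - D @ X + X @ B @ X

S = np.block([[A, B], [C, D]])
T = np.block([[np.eye(n), np.zeros((n, m))], [X, np.eye(m)]])
M = np.linalg.solve(T, S @ T)            # T^{-1} S T

assert np.allclose(M[n:, :n], 0.0)       # block upper triangular
assert np.allclose(M[:n, :n], A + B @ X)
assert np.allclose(M[n:, n:], D - X @ B)
```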

10. Block Diagonalization: Let

        S = [ A  B ]
            [ 0  D ],

    where A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m}. It is desired to find a similarity transformation of the form

        T = [ I  Y ]
            [ 0  I ]

    such that T^{-1}ST is block diagonal.

    (a) Show that S is similar to

            [ A  0 ]
            [ 0  D ]

        if Y satisfies the Sylvester equation

            AY - YD = -B.

    (b) Formulate a similar result for block diagonalization of the block lower triangular matrix

            [ A  0 ]
            [ C  D ].
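Exercise 10(a) pairs naturally with SciPy's Sylvester solver (illustrative sketch assuming NumPy and SciPy; solve_sylvester solves AY + YB = Q, so the equation AY - YD = -B is passed as (A, -D, -B)):

```python
import numpy as np
from scipy.linalg import solve_sylvester

n, m = 2, 2
A = np.array([[1.0, 2.0], [0.0, 3.0]])
D = np.array([[-1.0, 0.0], [1.0, -2.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])

# Solve the Sylvester equation A Y - Y D = -B.  A unique solution
# exists because A and D have disjoint spectra ({1,3} vs. {-1,-2}).
Y = solve_sylvester(A, -D, -B)
assert np.allclose(A @ Y - Y @ D, -B)

S = np.block([[A, B], [np.zeros((m, n)), D]])
T = np.block([[np.eye(n), Y], [np.zeros((m, n)), np.eye(m)]])
M = np.linalg.solve(T, S @ T)            # T^{-1} S T

assert np.allclose(M, np.block([[A, np.zeros((n, m))],
                                [np.zeros((m, n)), D]]))
```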


Index

A-invariant subspace, 89
    matrix characterization of, 90
algebraic multiplicity, 76
angle between vectors, 58

basis, 11
    natural, 12
block matrix, 2
    definiteness of, 104
    diagonalization, 150
    inverse of, 48
    LU factorization, 5
    triangularization, 149

C^n, 1
C^{m×n}, 1
C_r^{m×n}, 1
Cauchy-Bunyakovsky-Schwarz Inequality, 58
Cayley-Hamilton Theorem, 75
chain
    of eigenvectors, 87
characteristic polynomial
    of a matrix, 75
    of a matrix pencil, 125
Cholesky factorization, 101
co-domain, 17
column
    rank, 23
    vector, 1
companion matrix
    inverse of, 105
    pseudoinverse of, 106
    singular values of, 106
    singular vectors of, 106
complement
    of a subspace, 13
    orthogonal, 21
congruence, 103
conjugate transpose, 2
contragredient transformation, 137
controllability, 46

defective, 76
degree
    of a principal vector, 85
determinant, 4
    of a block matrix, 5
    properties of, 4-6
dimension, 12
direct sum
    of subspaces, 13
domain, 17

eigenvalue, 75
    invariance under similarity transformation, 81
elementary divisors, 84
equivalence transformation, 95
    orthogonal, 95
    unitary, 95
equivalent generalized eigenvalue problems, 127
equivalent matrix pencils, 127
exchange matrix, 39, 89
exponential of a Jordan block, 91, 115
exponential of a matrix, 81, 109
    computation of, 114-118
    inverse of, 110
    properties of, 109-112

field, 7
four fundamental subspaces, 23
function of a matrix, 81

generalized eigenvalue, 125
generalized real Schur form, 128
generalized Schur form, 127
generalized singular value decomposition, 134
geometric multiplicity, 76

Hölder Inequality, 58
Hermitian transpose, 2
higher-order difference equations
    conversion to first-order form, 121
higher-order differential equations
    conversion to first-order form, 120
higher-order eigenvalue problems
    conversion to first-order form, 136

i, 2
idempotent, 6, 51
identity matrix, 4
inertia, 103
initial-value problem, 109
    for higher-order equations, 120
    for homogeneous linear difference equations, 118
    for homogeneous linear differential equations, 112
    for inhomogeneous linear difference equations, 119
    for inhomogeneous linear differential equations, 112
inner product
    complex, 55
    complex Euclidean, 4
    Euclidean, 4, 54
    real, 54
    usual, 54
    weighted, 54
invariant factors, 84
inverses
    of block matrices, 47

j, 2
Jordan block, 82
Jordan canonical form (JCF), 82

Kronecker canonical form (KCF), 129
Kronecker delta, 20
Kronecker product, 139
    determinant of, 142
    eigenvalues of, 141
    eigenvectors of, 141
    products of, 140
    pseudoinverse of, 148
    singular values of, 141
    trace of, 142
    transpose of, 140
Kronecker sum, 142
    eigenvalues of, 143
    eigenvectors of, 143
    exponential of, 149

leading principal submatrix, 100
left eigenvector, 75
left generalized eigenvector, 125
left invertible, 26
left nullspace, 22
left principal vector, 85
linear dependence, 10
linear equations
    characterization of all solutions, 44
    existence of solutions, 44
    uniqueness of solutions, 45
linear independence, 10
linear least squares problem, 65
    general solution of, 66
    geometric solution of, 67
    residual of, 65
    solution via QR factorization, 71
    solution via singular value decomposition, 70
    statement of, 65
    uniqueness of solution, 66
linear regression, 67
linear transformation, 17
    co-domain of, 17
    composition of, 19
    domain of, 17
    invertible, 25
    left invertible, 26
    matrix representation of, 18
    nonsingular, 25
    nullspace of, 20
    range of, 20
    right invertible, 26
LU factorization, 6
    block, 5
Lyapunov differential equation, 113
Lyapunov equation, 144
    and asymptotic stability, 146
    integral form of solution, 146
    symmetry of solution, 146
    uniqueness of solution, 146

matrix
    asymptotically stable, 145
    best rank k approximation to, 67
    companion, 105
    defective, 76
    definite, 99
    derogatory, 106
    diagonal, 2
    exponential, 109
    Hamiltonian, 122
    Hermitian, 2
    Householder, 97
    indefinite, 99
    lower Hessenberg, 2
    lower triangular, 2
    nearest singular matrix to, 67
    nilpotent, 115
    nonderogatory, 105
    normal, 33, 95
    orthogonal, 4
    pentadiagonal, 2
    quasi-upper-triangular, 98
    sign of a, 91
    square root of a, 101
    symmetric, 2
    symplectic, 122
    tridiagonal, 2
    unitary, 4
    upper Hessenberg, 2
    upper triangular, 2
matrix exponential, 81, 91, 109
matrix norm, 59
    1-, 60
    2-, 60
    ∞-, 60
    p-, 60
    consistent, 61
    Frobenius, 60
    induced by a vector norm, 61
    mixed, 60
    mutually consistent, 61
    relations among, 61
    Schatten, 60
    spectral, 60
    subordinate to a vector norm, 61
    unitarily invariant, 62
matrix pencil, 125
    equivalent, 127
    reciprocal, 126
    regular, 126
    singular, 126
matrix sign function, 91
minimal polynomial, 76
monic polynomial, 76
Moore-Penrose pseudoinverse, 29
multiplication
    matrix-matrix, 3
    matrix-vector, 3
Murnaghan-Wintner Theorem, 98

negative definite, 99
negative invariant subspace, 92
nonnegative definite, 99
    criteria for, 100
nonpositive definite, 99
norm
    induced, 56
    natural, 56
normal equations, 65
normed linear space, 57
nullity, 24
nullspace, 20
    left, 22
    right, 22

observability, 46
one-to-one (1-1), 23
    conditions for, 25
onto, 23
    conditions for, 25
orthogonal
    complement, 21
    matrix, 4
    projection, 52
    subspaces, 14
    vectors, 4, 20
orthonormal
    vectors, 4, 20
outer product, 19
    and Kronecker product, 140
    exponential of, 121
    pseudoinverse of, 33
    singular value decomposition of, 41
    various matrix norms of, 63

pencil
    equivalent, 127
    of matrices, 125
    reciprocal, 126
    regular, 126
    singular, 126
Penrose theorem, 30
polar factorization, 41
polarization identity, 57
positive definite, 99
    criteria for, 100
positive invariant subspace, 92
power (kth) of a Jordan block, 120
powers of a matrix
    computation of, 119-120
principal submatrix, 100
projection
    oblique, 51
    on four fundamental subspaces, 52
    orthogonal, 52
pseudoinverse, 29
    four Penrose conditions for, 30
    of a full-column-rank matrix, 30
    of a full-row-rank matrix, 30
    of a matrix product, 32
    of a scalar, 31
    of a vector, 31
    uniqueness, 30
    via singular value decomposition, 38
Pythagorean Identity, 59

Q-orthogonality, 55
QR factorization, 72

R^n, 1
R^{m×n}, 1
R_r^{m×n}, 1
R_n^{n×n}, 1
range, 20
range inclusion
    characterized by pseudoinverses, 33
rank, 23
    column, 23
    row, 23
rank-one matrix, 19
rational canonical form, 104
Rayleigh quotient, 100
reachability, 46
real Schur canonical form, 98
real Schur form, 98
reciprocal matrix pencil, 126
reconstructibility, 46
regular matrix pencil, 126
residual, 65
resolvent, 111
reverse-order identity matrix, 39, 89
right eigenvector, 75
right generalized eigenvector, 125
right invertible, 26
right nullspace, 22
right principal vector, 85
row
    rank, 23
    vector, 1

Schur canonical form, 98
    generalized, 127
Schur complement, 6, 48, 102, 104
Schur Theorem, 98
Schur vectors, 98
second-order eigenvalue problem, 135
    conversion to first-order form, 135
Sherman-Morrison-Woodbury formula, 48
signature, 103
similarity transformation, 95
    and invariance of eigenvalues, 81
    orthogonal, 95
    unitary, 95
simple eigenvalue, 85
simultaneous diagonalization, 133
    via singular value decomposition, 134
singular matrix pencil, 126
singular value decomposition (SVD), 35
    and bases for four fundamental subspaces, 38
    and pseudoinverse, 38
    and rank, 38
    characterization of a matrix factorization as, 37
    dyadic expansion, 38
    examples, 37
    full vs. compact, 37
    fundamental theorem, 35
    nonuniqueness, 36
singular values, 36
singular vectors
    left, 36
    right, 36
span, 11
spectral radius, 62, 107
spectral representation, 97
spectrum, 76
subordinate norm, 61
subspace, 9
    A-invariant, 89
    deflating, 129
    reducing, 130
subspaces
    complements of, 13
    direct sum of, 13
    equality of, 10
    four fundamental, 23
    intersection of, 13
    orthogonal, 14
    sum of, 13
Sylvester differential equation, 113
Sylvester equation, 144
    integral form of solution, 145
    uniqueness of solution, 145
Sylvester's Law of Inertia, 103
symmetric generalized eigenvalue problem, 131

total least squares, 68
trace, 6
transpose, 2
    characterization by inner product, 54
    of a block matrix, 2
triangle inequality
    for matrix norms, 59
    for vector norms, 57

unitarily invariant
    matrix norm, 62
    vector norm, 58

variation of parameters, 112
vec
    of a matrix, 145
    of a matrix product, 147
vector norm, 57
    1-, 57
    2-, 57
    ∞-, 57
    p-, 57
    equivalent, 59
    Euclidean, 57
    Manhattan, 57
    relations among, 59
    unitarily invariant, 58
    weighted, 58
    weighted p-, 58
vector space, 8
    dimension of, 12
vectors, 1
    column, 1
    linearly dependent, 10
    linearly independent, 10
    orthogonal, 4, 20
    orthonormal, 4, 20
    row, 1
    span of a set of, 11

zeros
    of a linear dynamical system, 130