
Linear Algebra and Applications

Benjamin Recht
February 28, 2005
The Gameplan
Basics of linear algebra
matrices as operators
matrices as data
matrices as costs/constraints
1
Basics of linear algebra
A vector is a collection of real numbers arranged in an array.
Vectors can be multiplied by real numbers and added to one
another.

Lowercase letters like x, y, z will denote arrays of size n × 1. The
set of all n × 1 vectors is denoted R^n. Capital letters like A, B, C, D
will denote m × n dimensional arrays and are called matrices. The
set of all m × n matrices is denoted R^{m×n}.

The entries of vectors and matrices are given by non-boldfaced
letters. For example, the element in the ith row and jth column
of the matrix A is A_ij.
2
Bases
If x_1, . . . , x_N are vectors, a linear combination is a sum

    ∑_{k=1}^N a_k x_k

which is also a vector.

A set of vectors x_1, . . . , x_N is linearly independent if
∑_{k=1}^N a_k x_k = 0 only when a_k = 0 for all k.

A basis is a linearly independent set of n-vectors e_1, . . . , e_n
such that any n-vector v can be written as a linear combination
of the e_k. That is, v = ∑_{k=1}^n a_k e_k for some a_k.
3
Matrices as Operators (1)
linearity: If x, y are vectors then ax + by is a vector for any
scalars a and b.

linearity (2): f : R^m → R^n is linear if f(ax + by) = af(x) +
bf(y) for all x, y ∈ R^m and scalars a, b.

fact: If f is linear, then there is an n × m matrix A such that
f(x) = Ax.

If A is n × m and B is m × l, then AB is n × l.
4
Matrices as Operators (2)
Let 𝟙 denote the identity map: 𝟙x = x for all x.

For an n × n matrix A, if there exists a matrix A⁻¹ such that
A⁻¹A = 𝟙, then A⁻¹ is called the inverse of A.

FACT: A is invertible (i.e., A has an inverse) if and only if
the columns of A are linearly independent (and hence form
a basis).

Aᵀ is the transpose of A. If A_ij is the entry in the ith row
and jth column of A, then A_ji is the entry in the ith row and
jth column of Aᵀ. A matrix is symmetric if Aᵀ = A.
5
Matrices as Operators (3)
If x and y are vectors, yᵀx is 1 × 1, a scalar. This is the inner
product of x and y.

If yᵀx = 0 then x and y are orthogonal. If furthermore
xᵀx = yᵀy = 1 then the vectors are orthonormal.

xyᵀ is n × n. This is the outer product of x and y.
6
Matrices and Systems
An n × n matrix A can map n-vectors over time.

Continuous system:

    dx/dt = Ax(t)

Discrete time system:

    x[n] = Ax[n − 1]
7
Continuous Time Solution
    dx/dt = Ax(t)        ANSATZ: x(t) = exp(At) x(0)

Define:

    exp(At) = ∑_{k=0}^∞ (1/k!)(At)^k = 𝟙 + At + (1/2)A²t² + (1/6)A³t³ + . . .

Taking d/dt gives

    (d/dt) exp(At) = ∑_{k=1}^∞ (k/k!) A^k t^(k−1)
                   = A ∑_{k=1}^∞ (1/(k−1)!) A^(k−1) t^(k−1)
                   = A exp(At)

proving

    (d/dt) exp(At) x(0) = A exp(At) x(0)
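
As a quick illustration (an editorial sketch, not from the original
slides), the series above is what MATLAB's built-in expm computes,
so trajectories can be evaluated directly; the matrix A, initial
condition, and time grid below are arbitrary choices:

    A = [0 1; -2 -0.1];              % arbitrary example system matrix
    x0 = [1; 0];                     % initial condition x(0)
    ts = 0:0.1:10;
    X = zeros(2, numel(ts));
    for i = 1:numel(ts)
        X(:,i) = expm(A*ts(i)) * x0; % x(t) = exp(At) x(0)
    end
    plot(ts, X)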
8
Properties of the exponential map

If S is invertible, exp(SAS⁻¹) = S exp(A) S⁻¹.

If D is diagonal, E = exp(D) is diagonal and E_jj = exp(D_jj).
9
Discrete Time Solution
    x[n] = A^n x[0]
Here the proof is immediate.
10
Analysis
What can we say about linear systems without simulation?
Does the system oscillate?
Does the system converge to zero?
Does the system diverge to infinity?
11
Eigenvalues
If A is an n × n matrix, λ is an eigenvalue of A if Av = λv for
some v ≠ 0. v is an eigenvector.

FACT: If v_1 and v_2 are both eigenvectors of A with eigenvalues
λ_1 ≠ λ_2, then v_1 and v_2 are linearly independent.

Proof By contradiction, assume there exist nonzero a and b
such that

    av_1 + bv_2 = 0  ⟹  A(av_1 + bv_2) = 0  ⟹  aλ_1v_1 + bλ_2v_2 = 0
12
Multiply the first equation by λ_1 and subtract to find

    b(λ_2 − λ_1)v_2 = 0

which is a contradiction.

FACT: If A has n distinct eigenvalues then its eigenvectors are
linearly independent.

FACT: If S = [x_1, . . . , x_n] collects the eigenvectors, with
D = diag(λ_1, . . . , λ_n), then S⁻¹AS is a diagonal matrix.

Proof

    S⁻¹AS = S⁻¹[λ_1x_1, . . . , λ_nx_n]
          = S⁻¹[x_1, . . . , x_n]D
          = S⁻¹SD = D
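
A minimal numerical check of this fact (editor's sketch; the test
matrix is an arbitrary one with distinct eigenvalues):

    A = [2 1 0; 0 3 1; 0 0 5];   % arbitrary, eigenvalues 2, 3, 5
    [S, D] = eig(A);             % columns of S are eigenvectors
    norm(S\A*S - D)              % S^{-1} A S = D, up to roundoff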
13
FACT: If A is symmetric, Ax = λ_1x and Ay = λ_2y, then
λ_1 ≠ λ_2 ⟹ yᵀx = 0.

Proof Using A = Aᵀ,

    yᵀAx = yᵀ(Ax) = λ_1 yᵀx
         = (Ay)ᵀx = λ_2 yᵀx

Since λ_1 ≠ λ_2, we have yᵀx = 0.

Corollary: The eigenvectors of a symmetric matrix A may be
chosen to be orthonormal. If S = [x_1, . . . , x_n] then SᵀS = 𝟙 and
SᵀAS is a diagonal matrix.
14
Stability: Continuous Time
Suppose A has n distinct eigenvalues or is symmetric. Then

    x(t) = exp(At)x(0) = S exp(Dt) S⁻¹ x(0)

If Re(λ_n) < 0 for all n, lim_{t→∞} x(t) = 0. The system is stable.

If Re(λ_n) ≤ 0 for all n, ||x(t)|| < ∞ for all t. The system is
oscillating.

If Re(λ_n) > 0 for any n, then lim_{t→∞} x(t) = ∞. The system
is unstable.
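
In code the test needs only the real parts of the eigenvalues
(sketch; the example matrix is arbitrary):

    A = [0 1; -1 -0.5];              % arbitrary example system
    re = real(eig(A));
    if all(re < 0), disp('stable')
    elseif all(re <= 0), disp('oscillating')
    else, disp('unstable')
    end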
15
Stability: Discrete Time
Suppose A has n distinct eigenvalues or is symmetric. Then

    x[k] = A^k x[0] = S D^k S⁻¹ x[0]

If |λ_n| < 1 for all n, lim_{k→∞} x[k] = 0. The system is stable.

If |λ_n| ≤ 1 for all n, ||x[k]|| < ∞ for all k. The system is
oscillating.

If |λ_n| > 1 for any n, then lim_{k→∞} x[k] = ∞. The system is
unstable.
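
The discrete-time test is the same with magnitudes in place of
real parts (sketch, arbitrary example):

    A = [0.5 0.2; -0.2 0.5];   % arbitrary example system
    mags = abs(eig(A));
    all(mags < 1)              % true here: the system is stable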
16
Random Vectors
Suppose we are observing a process described by a list of d
numbers

    x = (x_1, . . . , x_d)ᵀ

If each x_i is a random variable, x is a random vector.

The joint probability distribution is given by p(x). We have

    ∫ p(x) dx_1 · · · dx_d = 1
17
Moments and Expectations
The expected value of a matrix valued function A(x) is a
matrix E[A] with entries ∫ A_jk(x) p(x) dx.

The mean: x̄ = E[x].

The correlation: R_x = E[xxᵀ].

The covariance: Σ_x = R_x − x̄x̄ᵀ.
18
Positive Semidefinite Matrices

A matrix Q is positive semidefinite (psd) if Q = Qᵀ and for all
x, xᵀQx ≥ 0. This is denoted Q ⪰ 0.

If P ⪰ 0 and Q ⪰ 0 and a > 0, then aQ ⪰ 0 and Q + P ⪰ 0.

If Q ⪰ 0, then Q is diagonalizable and has only nonnegative
eigenvalues. Moreover, the eigenvectors can be chosen to be
orthonormal.

The outer product xxᵀ is psd. The correlation and covariance
matrices of a random vector are psd.
19
Matrices as Data
N data points x_1, . . . , x_N, each consisting of a list of d numbers

    x_n = (x_1n, . . . , x_dn)

Images (e.g., 640×480 pixels)
Audio (e.g., samples)
Diagnostics (e.g., lab results)

The data matrix is defined to be X where X_ij = x_ij. It is d × N.
20
Empirical Statistics
mean: x̄_N = (1/N) ∑_{i=1}^N x_i

zero-mean data matrix: X̃_{N,ij} = x_ij − x̄_{N,i}

covariance: Σ_N = (1/N) X̃_N X̃_Nᵀ.  Σ_N ⪰ 0

gram matrix: K_N = X̃_Nᵀ X̃_N.  K_N ⪰ 0
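
These quantities are one-liners in MATLAB (editor's sketch;
random data stands in for real measurements):

    d = 5; N = 100;
    X = randn(d, N);                 % stand-in d x N data matrix
    xbar = mean(X, 2);               % empirical mean
    Xt = X - repmat(xbar, 1, N);     % zero-mean data matrix
    Sigma = (1/N) * (Xt * Xt');      % d x d covariance, psd
    K = Xt' * Xt;                    % N x N gram matrix, psd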
21
Multivariate Gaussians
    p(x) = (1/√|2πΣ|) exp( −(1/2)(x − x̄)ᵀ Σ⁻¹ (x − x̄) )

The mean of this random vector is x̄. The covariance is Σ.

Since Σ ⪰ 0, there is a matrix C such that Σ = CᵀΛC with
CᵀC = 𝟙 and Λ a diagonal matrix. The random vector y =
C(x − x̄) has zero mean and covariance Λ. That means the
components of y are independent.
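
To see the decorrelation numerically (sketch; note that MATLAB's
eig returns Sigma = C*Lambda*C', the transpose of the slide's
convention, so the rotation is C' here; the covariance is arbitrary):

    Sigma = [2 1; 1 2];                       % arbitrary covariance
    [C, Lambda] = eig(Sigma);
    xs = C * sqrt(Lambda) * randn(2, 10000);  % samples with covariance Sigma
    ys = C' * xs;                             % rotated samples
    cov(ys')                                  % approximately diag(Lambda)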
22
Marginal and Conditional Moments
Let y = Ax + v with v constant. Then ȳ = Ax̄ + v and
Σ_y = AΣ_xAᵀ.

Let z = [x; y] be gaussian. Then

    z̄ = [x̄; ȳ]   and   Σ_z = [ Σ_x   Σ_xy ]
                              [ Σ_yx  Σ_y  ]

with Σ_xy = Σ_yxᵀ.

p(x|y) is a gaussian with mean x̄ + Σ_xy Σ_y⁻¹ (y − ȳ) and variance
Σ_x − Σ_xy Σ_y⁻¹ Σ_yx.
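
A scalar-block sketch of the conditional-moment formulas (all
numbers below are arbitrary):

    Sx = 4; Sxy = 1; Sy = 2;             % blocks of Sigma_z
    xbar = 0; ybar = 0; y = 1.5;         % observed value of y
    condmean = xbar + Sxy * (Sy \ (y - ybar))
    condvar  = Sx - Sxy * (Sy \ Sxy)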
23
PCA
Given a zero-mean random vector x, let us suppose that we want
to represent x as ∑_i a_i y_i with uncorrelated coefficients a_i.
Then the best solution is to take y_i = C_iᵀ, the ith row of C (an
eigenvector of Σ); the coefficient a_i = C_i x has variance Λ_ii.

When we only have finitely many examples, Σ_N is the best
estimate of the actual covariance. Given a matrix of data X,
Σ_N = C_Nᵀ Λ_N C_N. So we can use y_i = C_{N,i}ᵀ with coefficient
variances Λ_{N,ii}.

If we only have a small number of data points as compared to
the dimension, diagonalizing Σ_N can be very computationally
intensive. The Singular Value Decomposition makes this tractable.
24
Singular Value Decomposition
If A is a matrix of size m × n then there exist orthogonal matrices
V (m × m) and W (n × n) such that

    VᵀAW = diag(σ_1, . . . , σ_p)

with σ_1 ≥ . . . ≥ σ_p ≥ 0, p = min(m, n).

Proof Without loss of generality, assume m ≥ n. Since AᵀA ⪰ 0,
the eigenvalues of AᵀA are equal to σ_1² ≥ σ_2² ≥ . . . ≥ σ_n² ≥ 0
for some σ_k ≥ 0. Let r be the largest number for which σ_r > 0.
25
Let x_k be norm 1 eigenvectors of AᵀA corresponding to σ_k² for
k = 1, . . . , r. Let y_k = Ax_k/σ_k. Since ||Ax_k||² = x_kᵀAᵀAx_k = σ_k²,
the y_k are norm 1. Furthermore, y_jᵀy_k = (1/(σ_j σ_k)) x_jᵀAᵀAx_k = 0
when k ≠ j, so the y_k are orthonormal.

Completing the y_k to an orthonormal basis for R^m gives matrices
W = [x_1, . . . , x_n] and V = [y_1, . . . , y_r, V_2]. It is easy algebra to
check that VᵀAW has the desired form (see Problem 2).
26
SVD(A,0)
When A is a matrix of size m × n with m > n, we would rather
compute the eigenvalues of AᵀA than of AAᵀ. Furthermore,
we need only compute the first n columns of V for A = VSWᵀ
to hold.

The matlab command:

    [V,S,W] = svd(A,0)

performs this computation efficiently.
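
Putting slides 24 and 27 together, a PCA-via-SVD sketch (editorial;
random data stands in for a real data matrix, with d >> N being the
regime where the economy SVD pays off):

    d = 1000; N = 50;
    X = randn(d, N);                   % stand-in data matrix
    Xt = X - repmat(mean(X,2), 1, N);  % zero-mean data matrix
    [V, S, W] = svd(Xt, 0);            % economy SVD: V is d x N
    % columns of V are the principal directions; the eigenvalues of
    % Sigma_N are diag(S).^2 / N, without ever forming the d x d Sigma_N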
27
Matrices as cost/constraints
We will frequently encounter cost functions and constraints
defined by matrices:

linear equalities: Ax = b

linear inequalities: Ax ≤ b

linear cost: c(x) = cᵀx

quadratic cost: c(x) = xᵀAx + bᵀx

least squares: c(x) = ||Ax − b||²
28
Linear Constraints
Note that linear equalities and linear inequalities are
interchangeable by adding constraints or variables:

    Ax = b  ⟺  Ax ≤ b and Ax ≥ b

    Ax ≤ b  ⟺  Ax + s = b and s ≥ 0

Such s are called slack variables.
29
Unconstrained Quadratic Programming
    min_x xᵀAx = { 0    if A ⪰ 0
                 { −∞   otherwise

    min_x xᵀAx − 2bᵀx + c

Differentiate with respect to x to find that at the optimum

    Ax = b

If A is invertible then the minimum is −bᵀA⁻¹b + c.
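
Numerically (editor's sketch; A is a random positive definite
matrix so that the optimality condition applies):

    n = 4;
    R = randn(n); A = R'*R + eye(n);   % random positive definite A
    b = randn(n,1); c = 1;
    xstar = A \ b;                     % optimality condition A x = b
    fmin = -b' * xstar + c;            % equals -b' A^{-1} b + c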
30
Schur Complements
Let

    M = [ A   B ]
        [ Bᵀ  C ]

The Schur complement of A in M is given by

    (M|A) = C − BᵀA⁻¹B

Similarly

    (M|C) = A − BC⁻¹Bᵀ
31
Facts about Schur complements:

    M ≻ 0  ⟺  C ≻ 0 and (M|C) ≻ 0

    M⁻¹ = [ (M|C)⁻¹          −A⁻¹B(M|A)⁻¹ ]
          [ −C⁻¹Bᵀ(M|C)⁻¹    (M|A)⁻¹      ]
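
The block-inverse formula is easy to sanity-check on a random
symmetric positive definite M (editor's sketch):

    n = 3; m = 2;
    R = randn(n+m); M = R'*R + eye(n+m);
    A = M(1:n,1:n); B = M(1:n,n+1:end); C = M(n+1:end,n+1:end);
    MA = C - B'*(A\B);                    % (M|A)
    MC = A - B*(C\B');                    % (M|C)
    Minv = [inv(MC), -(A\B)/MA; -(C\B')/MC, inv(MA)];
    norm(Minv - inv(M))                   % ~0 up to roundoff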
32
More Quadratic Programming
For the quadratic minimization

    min_{x_2}  [x_1; x_2]ᵀ M [x_1; x_2] − 2 [b_1; b_2]ᵀ [x_1; x_2],
    where M = [ A   Bᵀ ]
              [ B   C  ]

the optimum is

    x_2* = C⁻¹(b_2 − Bx_1)

Plug that back into the cost function:

    x_1ᵀ(M|C)x_1 − 2(b_1 − BᵀC⁻¹b_2)ᵀx_1 − b_2ᵀC⁻¹b_2
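
A numerical check of this partial minimization (sketch; a random
positive definite M guarantees C is invertible):

    n1 = 2; n2 = 3;
    R = randn(n1+n2); M = R'*R + eye(n1+n2);
    A = M(1:n1,1:n1); Bt = M(1:n1,n1+1:end); B = Bt';
    C = M(n1+1:end,n1+1:end);
    b1 = randn(n1,1); b2 = randn(n2,1); x1 = randn(n1,1);
    x2 = C \ (b2 - B*x1);                 % minimizer over x_2
    f = [x1;x2]'*M*[x1;x2] - 2*[b1;b2]'*[x1;x2];
    red = x1'*(A - Bt*(C\B))*x1 - 2*(b1 - Bt*(C\b2))'*x1 - b2'*(C\b2);
    f - red                               % ~0 up to roundoff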
33
Matrix Inversion Lemma
    (A − BC⁻¹Bᵀ)⁻¹ = A⁻¹ + A⁻¹B(C − BᵀA⁻¹B)⁻¹BᵀA⁻¹

To check this, apply the partitioned matrix formula twice and
set the first blocks equal to each other.

Standard form (substitute C → −C⁻¹):

    (A + BCBᵀ)⁻¹ = A⁻¹ − A⁻¹B(C⁻¹ + BᵀA⁻¹B)⁻¹BᵀA⁻¹
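
The standard form can be verified the same way (sketch; random
well-conditioned matrices):

    n = 4; k = 2;
    R = randn(n); A = R'*R + eye(n);
    Q = randn(k); C = Q'*Q + eye(k);
    B = randn(n, k);
    lhs = inv(A + B*C*B');
    rhs = inv(A) - (A\B) * inv(inv(C) + B'*(A\B)) * (B'/A);
    norm(lhs - rhs)                       % ~0 up to roundoff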
34
Problem 1: Emergence of thermodynamics

The following ODE describes the time evolution of a set of
coupled masses and springs called the Caldeira-Leggett (CL)
model:

    dx_k/dt = p_k/m_k                                  for k = 0, . . . , N

    dp_0/dt = −m_0 Ω² x_0 + ∑_{k=1}^N g_k (ω_k x_k − g_k x_0/m_k)

    dp_k/dt = −m_k ω_k² x_k + g_k ω_k x_0              for k = 1, . . . , N

x_k and p_k respectively denote the position and momentum of
the kth spring (see figure).
35
Let N = 200, Ω = 1, γ = 1, m_k = 1, ω_k = 10k/N, and
g_k = √(40/(πN)). Is the system stable, oscillatory, or unstable?
How many oscillatory modes are there?
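
A starting-point sketch for assembling the state matrix (editorial:
it assumes the sign reconstruction of the equations above, with
state z = [x_0, . . . , x_N, p_0, . . . , p_N] and dz/dt = Az):

    N = 200; Omega = 1; m = ones(N+1,1);
    omega = 10*(1:N)'/N; g = sqrt(40/(pi*N)) * ones(N,1);
    A = zeros(2*(N+1));
    A(1:N+1, N+2:end) = diag(1./m);                  % dx_k/dt = p_k/m_k
    A(N+2, 1) = -m(1)*Omega^2 - sum(g.^2./m(2:end)); % dp_0/dt, x_0 terms
    A(N+2, 2:N+1) = (g .* omega)';                   % dp_0/dt, coupling
    A(N+3:end, 1) = g .* omega;                      % dp_k/dt, coupling
    A(N+3:end, 2:N+1) = diag(-m(2:end).*omega.^2);   % dp_k/dt, spring terms
    max(real(eig(A)))                                % stability check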
36
Write a program to compute x_0(t) with the initial condition
x_0(0) = 1, x_k(0) = 0 for all k = 1, . . . , N. Plot x_0(t) from
t = 0 to t = 100.
Consider the system

    dQ/dt = P/m_0

    dP/dt = −m_0(Ω² + γ²)Q − 2γP

with the same parameter settings as above. Is this system
stable, oscillatory, or unstable? Analytically compute Q(t)
as a function of time. Plot Q(t) for t = 0 to t = 100 with
Q(0) = 1 and compare to the output of the CL model.
37
Problem 2: Eigenfaces

Download the database of faces off the class website.

Finish the proof of the singular value decomposition. That
is, verify that

    VᵀAW = diag(σ_1, . . . , σ_p)

Compute the SVD of the data matrix. What do the principal
components look like as images?
38
Generate 4 numbers a_1, a_2, a_3, a_4 drawn independently from
a gaussian and compute the image

    a_1 σ_1 V_1 + a_2 σ_2 V_2 + a_3 σ_3 V_3 + a_4 σ_4 V_4

where V_k is the kth column of V. This is an eigenface.

Compute an eigensomething for a something of your choice.
39
Problem 3: Networks of Resistors
[Figure: a ladder network of resistors R_1 and R_2 connecting the
input voltages V_1, . . . , V_8 through internal nodes V_a, . . . , V_h
to the output V_out.]
Recall from electronics that the voltage drop across a resistor
(i.e., the difference of the voltages at either end) is equal to
the current through the resistor times the resistance. Furthermore,
remember that the sum of all currents into a node must equal
zero. In equations that is:

    g ΔV = I,        ∑_{I_i ∈ A_a} I_i = 0

(here A_a denotes the set of currents meeting at node a)
40
where g = 1/R is the conductance of a resistor.

Write down these two conditions as matrix constraints on
the resistor network. That is, find a 17 × 18 matrix G and
an 8 × 17 matrix K such that

    GV = I  and  KI = 0
If n is a number between 0 and 255, let b_8b_7b_6b_5b_4b_3b_2b_1 be
its binary expansion. If R_1 = 20kΩ and R_2 = 10kΩ, what is
V_out when V_i = b_i for i = 1, . . . , 8?
41
