
Chapter 5

Inner products and orthogonality
Reading
The list below gives examples of relevant reading. (For full publication details, see
Chapter 1.)
Leon, S.J., Linear Algebra with Applications. Chapter 5, Sections 5.1, 5.3, 5.5.
Ostaszewski, A., Advanced Mathematical Methods. Sections 2.3, 2.4, 2.7 and 2.8.
Simon, C.P. and Blume, L., Mathematics for Economists. Chapter 10, Section 10.4.

Introduction
In this short chapter we examine more generally the concept of orthogonality, which
has already been encountered in our work on orthogonal diagonalisation.

The inner product of two real n-vectors


For $x, y \in \mathbb{R}^n$, the inner product (sometimes called the dot product or scalar
product) is defined to be the number $\langle x, y \rangle$ given by
$$\langle x, y \rangle = x^T y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$
Example: If $x = (1, 2, 3)^T$ and $y = (2, -1, 1)^T$ then
$$\langle x, y \rangle = 1(2) + 2(-1) + 3(1) = 3.$$



It is important to realise that the inner product is just a number, not another vector
or a matrix.
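If you want to experiment on a computer, here is a small sketch in Python (using NumPy; the choice of package is ours and purely illustrative) computing the standard inner product of the vectors in the example above. Note that the result is a single number.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, -1.0, 1.0])

# The standard inner product <x, y> = x^T y is a single number.
inner = float(x @ y)          # equivalently np.dot(x, y)
print(inner)                  # 3.0 -- a scalar, not a vector or a matrix
```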

Inner products more generally


Suppose that $V$ is a vector space (over the real numbers). An inner product on $V$ is
a mapping from (or operation on) pairs of vectors $x, y$ to the real numbers, the result
of which is a real number denoted $\langle x, y \rangle$, which satisfies the following properties:
(i) $\langle x, x \rangle \ge 0$ for all $x \in V$, and $\langle x, x \rangle = 0$ if and only if $x = 0$, the zero vector
of the vector space
(ii) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in V$
(iii) $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$ for all $x, y, z \in V$ and all $\alpha, \beta \in \mathbb{R}$.
Some other basic facts follow immediately from this definition: for example,
$$\langle z, \alpha x + \beta y \rangle = \alpha \langle z, x \rangle + \beta \langle z, y \rangle.$$

Activity 5.1 Prove this.

It is a simple matter to check that the inner product defined above for real vectors
is indeed an inner product according to this more abstract definition, and we shall
call it the standard inner product on $\mathbb{R}^n$. The abstract definition, though, applies
to more than just the vector space $\mathbb{R}^n$, and there is some advantage in developing
results in terms of the general notion of inner product. If a vector space has an inner
product defined on it, we refer to it as an inner product space.
Example: Suppose that $V$ is the vector space consisting of all real polynomial functions
of degree at most $n$; that is, $V$ consists of all functions of the form
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \quad \text{where } a_0, a_1, \ldots, a_n \in \mathbb{R}.$$
The addition and scalar multiplication are, as usual, defined pointwise. Let $x_1, x_2, \ldots, x_{n+1}$
be $n + 1$ fixed, different, real numbers, and define, for $p, q \in V$,
$$\langle p, q \rangle = \sum_{i=1}^{n+1} p(x_i) q(x_i).$$

Then this is an inner product. To see this, we check the properties in the definition
of an inner product. Property (ii) is clear. For (i), we have
$$\langle p, p \rangle = \sum_{i=1}^{n+1} p(x_i)^2 \ge 0.$$

Clearly, if $p$ is the zero vector of the vector space (which is the identically-zero function),
then $\langle p, p \rangle = 0$. To finish verifying (i) we need to check that if $\langle p, p \rangle = 0$ then $p$ must
be the zero function. Now, $\langle p, p \rangle = 0$ must mean that $p(x_i) = 0$ for $i = 1, 2, \ldots, n + 1$.
So $p$ has $n + 1$ different roots. But $p$ has degree no more than $n$, so $p$ must be the
identically-zero function. (A non-zero polynomial of degree at most $n$ has no more
than $n$ distinct roots.) Part (iii) is left to you:

Activity 5.2 Prove that, for any $\alpha, \beta \in \mathbb{R}$ and any $p, q, r \in V$,
$$\langle \alpha p + \beta q, r \rangle = \alpha \langle p, r \rangle + \beta \langle q, r \rangle.$$
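For those who like to experiment, here is a small Python/NumPy sketch of this polynomial inner product; the sample points and the particular polynomials chosen are purely illustrative, not taken from the text.

```python
import numpy as np

# Fixed, distinct sample points x_1, ..., x_{n+1} (here n = 2, so 3 points).
sample_points = np.array([-1.0, 0.0, 1.0])

def poly_inner(p_coeffs, q_coeffs, points=sample_points):
    """<p, q> = sum_i p(x_i) q(x_i), with coefficients listed from a_0 up to a_n."""
    p_vals = np.polyval(p_coeffs[::-1], points)   # np.polyval expects highest degree first
    q_vals = np.polyval(q_coeffs[::-1], points)
    return float(np.sum(p_vals * q_vals))

p = [1.0, 0.0, 1.0]   # p(x) = 1 + x^2
q = [0.0, 2.0, 0.0]   # q(x) = 2x
print(poly_inner(p, q))   # 0.0: p and q are orthogonal with respect to these points
print(poly_inner(p, p))   # 9.0: positive, as property (i) requires
```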

Norms in a vector space


For any $x$ in an inner product space $V$, the inner product $\langle x, x \rangle$ is non-negative (by
definition). Now, because $\langle x, x \rangle \ge 0$, we may take its square root (obtaining a real
number). We define the norm or length $\|x\|$ of a vector $x$ to be
$$\|x\| = \sqrt{\langle x, x \rangle}.$$
For example, for the standard inner product on $\mathbb{R}^n$,
$$\langle x, x \rangle = x_1^2 + x_2^2 + \cdots + x_n^2$$
(which is clearly non-negative since it is a sum of squares), and we obtain the standard
Euclidean length of a vector:
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$
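A quick computational check (Python/NumPy, purely illustrative) that the norm obtained from the standard inner product agrees with the usual Euclidean length:

```python
import numpy as np

x = np.array([3.0, 4.0])
norm_via_inner_product = np.sqrt(x @ x)   # sqrt(<x, x>)
print(norm_via_inner_product)             # 5.0
print(np.linalg.norm(x))                  # 5.0, the built-in Euclidean norm agrees
```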

Orthogonality
Orthogonal vectors
We have already said (in the discussion of orthogonal diagonalisation) what it means
for two vectors $x, y$ in $\mathbb{R}^n$ to be orthogonal: it means that $x^T y = 0$. In other words,
$x, y$ are orthogonal if $\langle x, y \rangle = 0$. We take this as the general definition of orthogonality
in an inner product space:

Definition 5.1 Suppose that $V$ is an inner product space. Then $x, y \in V$ are said to
be orthogonal if $\langle x, y \rangle = 0$. We write $x \perp y$ to mean that $x, y$ are orthogonal.

Example: With the usual inner product on $\mathbb{R}^3$, the vectors $x = (1, 1, 0)^T$ and
$y = (2, -2, 3)^T$ are orthogonal.

Activity 5.3 Check this!

Geometrical interpretation
A geometrical interpretation can be given to the notion of orthogonality in $\mathbb{R}^n$. Consider
a very simple example with $n = 2$. Suppose that $x = (1, 1)^T$ and $y = (-1, 1)^T$.
Then $x, y$ are orthogonal, as is easily seen. We can represent $x, y$ geometrically on the
standard two-dimensional $(x, y)$-plane: $x$ is represented as an arrow from the origin
$(0, 0)$ to the point $(1, 1)$; and $y$ is represented as an arrow from the origin to the point
$(-1, 1)$. This is shown in the figure. It is clear that these arrows (the geometrical
interpretations of $x, y$) are at right angles to each other: they are perpendicular.

[Figure: the vectors $x = (1, 1)^T$ and $y = (-1, 1)^T$ drawn as arrows from the origin $(0, 0)$; the two arrows meet at a right angle.]

In fact, this geometrical interpretation is valid in $\mathbb{R}^n$, for any $n$. This is because it
turns out that if $x, y \in \mathbb{R}^n$ then the inner product $\langle x, y \rangle$ equals $\|x\| \|y\| \cos\theta$, where
$\theta$ is the angle between the geometrical representations of the two vectors. If neither
$x$ nor $y$ is the zero-vector, then the inner product is $0$ if and only if $\cos\theta = 0$, which
means that $\theta$ is $\pi/2$ or $3\pi/2$ radians, in which case the angle between the vectors is
a right angle.
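The formula $\langle x, y \rangle = \|x\| \|y\| \cos\theta$ also gives a way of computing the angle between two non-zero vectors. Here is a short illustrative Python/NumPy sketch using the vectors of the figure above.

```python
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([-1.0, 1.0])

# cos(theta) = <x, y> / (||x|| ||y||); for these two vectors it is 0.
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)
print(cos_theta)            # 0.0
print(np.degrees(theta))    # 90.0 -- the vectors are perpendicular
```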

Orthogonality and linear independence


If a set of (non-zero) vectors is pairwise orthogonal (that is, any two of them are orthogonal),
then it turns out that the vectors are linearly independent:

Theorem 5.1 Suppose that $V$ is an inner product space and that vectors $v_1, v_2, \ldots, v_k \in V$
are pairwise orthogonal ($v_i \perp v_j$ for $i \ne j$), and none is the zero-vector. Then
$\{v_1, v_2, \ldots, v_k\}$ is a linearly independent set of vectors.

Proof We need to show that if
$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k = 0$$
(the zero-vector), then $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. Let $i$ be any integer between $1$ and
$k$. Then
$$\langle v_i, \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k \rangle = \langle v_i, 0 \rangle = 0.$$
But, since $\langle v_i, v_j \rangle = 0$ for $j \ne i$,
$$\langle v_i, \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k \rangle = \alpha_1 \langle v_i, v_1 \rangle + \alpha_2 \langle v_i, v_2 \rangle + \cdots + \alpha_k \langle v_i, v_k \rangle = \alpha_i \langle v_i, v_i \rangle = \alpha_i \|v_i\|^2.$$
So we have $\alpha_i \|v_i\|^2 = 0$. Since $v_i \ne 0$, $\|v_i\|^2 \ne 0$ and hence $\alpha_i = 0$. But $i$ was any
integer in the range $1$ to $k$, so we deduce that
$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0,$$
as required.


Orthogonal matrices and orthonormal sets


We have already met the word orthogonal in a different context: we spoke of orthogonal
matrices when considering orthogonal diagonalisation. Recall that a matrix
$P$ is orthogonal if $P^T = P^{-1}$. Now, this means that $P^T P = I$, the identity matrix.
Suppose that the columns of $P$ are $x_1, x_2, \ldots, x_n$. Then the fact that $P^T P = I$ means
that $x_i^T x_j = 0$ if $i \ne j$ and $x_i^T x_i = 1$. To help see this, consider the case $n = 3$. Then
$P = (x_1 \; x_2 \; x_3)$ and since $P^T P = I$ we have
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I = P^T P = \begin{pmatrix} x_1^T \\ x_2^T \\ x_3^T \end{pmatrix} (x_1 \; x_2 \; x_3) = \begin{pmatrix} x_1^T x_1 & x_1^T x_2 & x_1^T x_3 \\ x_2^T x_1 & x_2^T x_2 & x_2^T x_3 \\ x_3^T x_1 & x_3^T x_2 & x_3^T x_3 \end{pmatrix}.$$
But, if $i \ne j$, $x_i^T x_j = 0$ means precisely that the columns $x_i, x_j$ are orthogonal. The
second statement, $x_i^T x_i = \|x_i\|^2 = 1$, means (since $\|x_i\| \ge 0$) that $\|x_i\| = 1$;
that is, $x_i$ is of length $1$. This indicates the following characterisation: a matrix $P$
is orthogonal if and only if, as vectors, its columns are pairwise orthogonal, and each
has length $1$.
When a set of vectors $\{x_1, x_2, \ldots, x_k\}$ is such that any two are orthogonal and,
furthermore, each has length $1$, we say that the vectors form an orthonormal set
(ONS) of vectors. So we can restate our previous observation as follows.

Theorem 5.2 A matrix $P$ is orthogonal if and only if the columns of $P$ form an
orthonormal set of vectors.
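This characterisation is easy to test numerically. The following Python/NumPy sketch (the rotation matrix used is just an illustrative choice) checks both that $P^T P = I$ and that the columns of $P$ form an orthonormal set.

```python
import numpy as np

theta = np.pi / 6
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation matrix, which is orthogonal

# Check P^T P = I (up to floating-point error).
print(np.allclose(P.T @ P, np.eye(2)))            # True

# Equivalently: columns pairwise orthogonal and each of length 1.
cols = [P[:, j] for j in range(P.shape[1])]
print(np.isclose(cols[0] @ cols[1], 0.0))                   # True: orthogonal columns
print([np.isclose(np.linalg.norm(c), 1.0) for c in cols])   # [True, True]
```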

The Cauchy-Schwarz inequality


This important inequality is as follows.

Theorem 5.3 (Cauchy-Schwarz inequality) Suppose that $V$ is an inner product
space. Then
$$|\langle x, y \rangle| \le \|x\| \|y\|$$
for all $x, y \in V$.

Proof Let $x, y$ be any two vectors of $V$. For any real number $\alpha$, we consider the
vector $\alpha x + y$. Certainly, $\|\alpha x + y\|^2 \ge 0$ for all $\alpha$. But
$$\|\alpha x + y\|^2 = \langle \alpha x + y, \alpha x + y \rangle = \alpha^2 \langle x, x \rangle + \alpha \langle x, y \rangle + \alpha \langle y, x \rangle + \langle y, y \rangle = \alpha^2 \|x\|^2 + 2\alpha \langle x, y \rangle + \|y\|^2.$$
Now, this quadratic expression in $\alpha$ is non-negative for all $\alpha$. Generally, we know
that if a quadratic expression $a z^2 + b z + c$ is non-negative for all $z$ then $b^2 - 4ac \le 0$.
Applying this observation, we see that
$$(2\langle x, y \rangle)^2 - 4\|x\|^2 \|y\|^2 \le 0,$$
or
$$\langle x, y \rangle^2 \le \|x\|^2 \|y\|^2.$$

Taking the square root of each side we obtain
$$|\langle x, y \rangle| \le \|x\| \|y\|,$$
which is what we need.

(Recall that $|\langle x, y \rangle|$ denotes the absolute value of the inner product.)
For example, if we take $V$ to be $\mathbb{R}^n$ and consider the standard inner product on $\mathbb{R}^n$,
then for all $x, y \in \mathbb{R}^n$, the Cauchy-Schwarz inequality tells us that
$$\left| \sum_{i=1}^{n} x_i y_i \right| \le \sqrt{\sum_{i=1}^{n} x_i^2} \; \sqrt{\sum_{i=1}^{n} y_i^2}.$$
Generalised Pythagoras theorem


A version of Pythagoras' theorem will no doubt be familiar to almost all of you:
namely, that if $a$ is the length of the longest side of a right-angled triangle, and $b$ and
$c$ the lengths of the other two sides, then $a^2 = b^2 + c^2$. The generalised Pythagoras
theorem is:

Theorem 5.4 (Generalised Pythagoras Theorem) In an inner product space $V$,
if $x, y \in V$ are orthogonal, then
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$
Proof This is easy to prove. We know that for any $z$, $\|z\|^2 = \langle z, z \rangle$, simply from the
definition of the norm. So,
$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y \rangle \\
&= \langle x, x + y \rangle + \langle y, x + y \rangle \\
&= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle \\
&= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \\
&= \|x\|^2 + \|y\|^2,
\end{aligned}$$
where the last line follows from the fact that, $x, y$ being orthogonal, $\langle x, y \rangle = 0$.

We also have the triangle inequality for norms.

Theorem 5.5 (Triangle inequality for norms) In an inner product space $V$, if
$x, y \in V$, then
$$\|x + y\| \le \|x\| + \|y\|.$$
Proof We have
$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y \rangle \\
&= \langle x, x + y \rangle + \langle y, x + y \rangle \\
&= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle \\
&= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \\
&\le \|x\|^2 + \|y\|^2 + 2|\langle x, y \rangle| \\
&\le \|x\|^2 + \|y\|^2 + 2\|x\| \|y\| \\
&= (\|x\| + \|y\|)^2,
\end{aligned}$$
where the last inequality used is the Cauchy-Schwarz inequality. Thus $\|x + y\| \le \|x\| + \|y\|$, as required.

Gram-Schmidt orthonormalisation process


The orthonormalisation procedure

Given a set of linearly independent vectors $\{v_1, v_2, \ldots, v_k\}$, the Gram-Schmidt
orthonormalisation process is a way of producing $k$ vectors that span the same space as
is spanned by $\{v_1, v_2, \ldots, v_k\}$, and that form an orthonormal set. That is, the process
produces a set $\{e_1, e_2, \ldots, e_k\}$ such that:
$\mathrm{Lin}\{e_1, e_2, \ldots, e_k\} = \mathrm{Lin}\{v_1, v_2, \ldots, v_k\}$
$\{e_1, e_2, \ldots, e_k\}$ is an orthonormal set.
It works as follows. First, we set
$$e_1 = \frac{v_1}{\|v_1\|}.$$
Then we define
$$u_2 = v_2 - \langle v_2, e_1 \rangle e_1,$$
and set
$$e_2 = \frac{u_2}{\|u_2\|}.$$
Next, we define
$$u_3 = v_3 - \langle v_3, e_1 \rangle e_1 - \langle v_3, e_2 \rangle e_2$$
and set
$$e_3 = \frac{u_3}{\|u_3\|}.$$
Generally, when we have $e_1, e_2, \ldots, e_i$, we let
$$u_{i+1} = v_{i+1} - \sum_{j=1}^{i} \langle v_{i+1}, e_j \rangle e_j, \qquad e_{i+1} = \frac{u_{i+1}}{\|u_{i+1}\|}.$$
It turns out that the resulting set $\{e_1, e_2, \ldots, e_k\}$ has the required properties.
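A direct translation of the procedure into code may make the steps clearer. The sketch below (Python/NumPy; it assumes the standard inner product on $\mathbb{R}^n$ and linearly independent inputs) implements the general step just described. Subtracting the projections one at a time, as done here, gives the same result as the formula above because the $e_j$ are orthonormal; this "modified" form is also the numerically more stable way to compute it.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors in R^n.

    Returns e_1, ..., e_k with Lin{e_1, ..., e_k} = Lin{v_1, ..., v_k}
    and <e_i, e_j> = 1 if i = j, 0 otherwise.
    """
    basis = []
    for v in vectors:
        u = np.array(v, dtype=float)
        # Subtract the component of v along each e_j already found.
        for e in basis:
            u = u - (u @ e) * e
        basis.append(u / np.linalg.norm(u))   # normalise to length 1
    return basis

# The vectors of the worked example that follows:
v1, v2, v3 = [1, 1, 1, 1], [-1, 4, 4, -1], [4, -2, 2, 0]
for e in gram_schmidt([v1, v2, v3]):
    print(e)
```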
Example: In $\mathbb{R}^4$, let us find an orthonormal basis for the linear span of the three
vectors
$$v_1 = (1, 1, 1, 1)^T, \quad v_2 = (-1, 4, 4, -1)^T, \quad v_3 = (4, -2, 2, 0)^T.$$


First, we have
$$e_1 = \frac{v_1}{\|v_1\|} = \frac{v_1}{\sqrt{1^2 + 1^2 + 1^2 + 1^2}} = \frac{1}{2} v_1 = (1/2, 1/2, 1/2, 1/2)^T.$$

Next, we have
$$u_2 = v_2 - \langle v_2, e_1 \rangle e_1 = (-1, 4, 4, -1)^T - (3)(1/2, 1/2, 1/2, 1/2)^T = (-5/2, 5/2, 5/2, -5/2)^T,$$
and we set
$$e_2 = \frac{u_2}{\|u_2\|} = (-1/2, 1/2, 1/2, -1/2)^T.$$
(Note: to do this last step, I merely noted that a normalised vector in the same
direction as $u_2$ is also a normalised vector in the same direction as $(-1, 1, 1, -1)^T$,
and this second vector is easier to work with.) Continuing, we have
$$\begin{aligned}
u_3 &= v_3 - \langle v_3, e_1 \rangle e_1 - \langle v_3, e_2 \rangle e_2 \\
&= (4, -2, 2, 0)^T - (2)(1/2, 1/2, 1/2, 1/2)^T - (-2)(-1/2, 1/2, 1/2, -1/2)^T \\
&= (2, -2, 2, -2)^T.
\end{aligned}$$
Then,
$$e_3 = \frac{u_3}{\|u_3\|} = (1/2, -1/2, 1/2, -1/2)^T.$$
So
$$\{e_1, e_2, e_3\} = \left\{ \begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix}, \begin{pmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{pmatrix}, \begin{pmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{pmatrix} \right\}.$$

Activity 5.4 Verify that the set {e1 , e2 , e3 } of this example is an orthonormal set.
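One mechanical way to carry out this check (though you should also do it by hand) is to stack the vectors as the columns of a matrix $E$ and confirm that $E^T E = I$; a Python/NumPy sketch:

```python
import numpy as np

E = np.array([[ 0.5, -0.5,  0.5],
              [ 0.5,  0.5, -0.5],
              [ 0.5,  0.5,  0.5],
              [ 0.5, -0.5, -0.5]])   # columns are e1, e2, e3 from the example

# E^T E = I_3 exactly when the columns form an orthonormal set.
print(E.T @ E)
print(np.allclose(E.T @ E, np.eye(3)))   # True
```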

Orthogonal diagonalisation when eigenvalues are not distinct

We have seen in an earlier chapter that if a symmetric matrix has distinct eigenvalues,
then (since eigenvectors corresponding to different eigenvalues are orthogonal) it
is orthogonally diagonalisable. But, in fact, $n \times n$ symmetric matrices are always
orthogonally diagonalisable, even if they do not have $n$ distinct eigenvalues.
What we need for orthogonal diagonalisation is an orthonormal set of $n$ eigenvectors.
If it so happens that there are $n$ different eigenvalues, then any set of $n$ corresponding
eigenvectors forms a pairwise orthogonal set of vectors, and all we need do to transform
the set into an orthonormal set is normalise each vector. However, if we have repeated
eigenvalues, more care is required. Suppose that $\lambda_0$ is a repeated eigenvalue of $A$, by
which we mean that, for some $k \ge 2$, $(\lambda - \lambda_0)^k$ is a factor of the characteristic
polynomial of $A$. The multiplicity of $\lambda_0$ is the largest $k$ for which this is the case.
The eigenspace corresponding to $\lambda_0$ is
$$E(\lambda_0) = \{x : (A - \lambda_0 I)x = 0\},$$
the subspace consisting of all eigenvectors corresponding to $\lambda_0$, together with the
zero-vector $0$. An important fact, which we shall not prove here, is that, if $A$ is
symmetric, the dimension of $E(\lambda_0)$ is exactly the multiplicity $k$ of $\lambda_0$. This means
that there is some basis $\{x_1, x_2, \ldots, x_k\}$ of size $k$ of the eigenspace $E(\lambda_0)$. We can

use the Gram-Schmidt orthonormalisation process to produce an orthonormal basis
of $E(\lambda_0)$. Eigenvectors from different eigenspaces are orthogonal (and hence linearly
independent). So if we compose a set of $n$ vectors by taking orthonormal bases for
each of the eigenspaces, the resulting set is orthonormal, and we can orthogonally
diagonalise the matrix $A$ by means of the matrix $P$ with these vectors as its columns.
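The end result of this construction can be illustrated numerically. In the Python/NumPy sketch below, the symmetric matrix $A$ is an illustrative example (not one from the text) with a repeated eigenvalue; numpy.linalg.eigh is used to supply orthonormal eigenvectors, playing the role that the by-hand Gram-Schmidt step within each eigenspace plays above.

```python
import numpy as np

# A symmetric matrix with eigenvalues 1, 1, 4 (eigenvalue 1 has multiplicity 2).
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

eigenvalues, P = np.linalg.eigh(A)   # eigh returns orthonormal eigenvectors for symmetric A

print(eigenvalues)                        # [1. 1. 4.]
print(np.allclose(P.T @ P, np.eye(3)))    # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))          # diagonal matrix of the eigenvalues
```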

Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:
explain what is meant by an inner product on a vector space
verify that a given inner product is indeed an inner product
compute norms in inner product spaces
explain why orthogonality of a set of vectors implies linear independence
explain what is meant by an orthonormal set of vectors
explain why an $n \times n$ matrix is orthogonally diagonalisable if and only if it possesses an orthonormal set of $n$ eigenvectors
know and apply the Cauchy-Schwarz inequality, the Generalised Pythagoras
Theorem, and the triangle inequality for norms
use the Gram-Schmidt orthonormalisation process

Sample examination questions


The following are typical exam questions, or parts of questions.
Question 5.1 Let $V$ be the vector space of all $m \times n$ real matrices (with matrix
addition and scalar multiplication). Define, for $A = (a_{ij})$ and $B = (b_{ij}) \in V$,
$$\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij}.$$

Prove that this is an inner product on V .


Question 5.2 Prove that in any inner product space $V$,
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2,$$
for all $x, y \in V$.
Question 5.3 Suppose that $v \in \mathbb{R}^n$. Prove that the set of vectors orthogonal to $v$,
$W = \{x \in \mathbb{R}^n : x \perp v\}$, is a subspace of $\mathbb{R}^n$. How would you describe this subspace
geometrically? More generally, suppose that $S$ is any (not necessarily finite) set of
vectors in $\mathbb{R}^n$ and let $S^{\perp}$ denote the set
$$S^{\perp} = \{x \in \mathbb{R}^n : x \perp v \text{ for all } v \in S\}.$$
Prove that $S^{\perp}$ is a subspace of $\mathbb{R}^n$.

Question 5.4 Use the Gram-Schmidt process to find an orthonormal basis for the
subspace of $\mathbb{R}^4$ spanned by the vectors
$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 1 \end{pmatrix}.$$
Sketch answers or comments on selected questions


Question 5.1 Property (i) of the definition of inner product is easy to check:
$$\langle A, A \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \ge 0,$$
and this equals zero if and only if for every $i$ and every $j$, $a_{ij} = 0$, which means that
$A$ is the zero matrix, which in this vector space is the zero vector. Property (ii) is
easy to verify, as also is (iii).

Question 5.2 We have:
$$\begin{aligned}
\|x + y\|^2 + \|x - y\|^2 &= \langle x + y, x + y \rangle + \langle x - y, x - y \rangle \\
&= \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle + \langle x, x \rangle - 2\langle x, y \rangle + \langle y, y \rangle \\
&= 2\langle x, x \rangle + 2\langle y, y \rangle \\
&= 2\|x\|^2 + 2\|y\|^2.
\end{aligned}$$

Question 5.3 Suppose $x, y \in W$ and $\alpha, \beta \in \mathbb{R}$. Because $x \perp v$ and $y \perp v$, we have
(by definition) $\langle x, v \rangle = \langle y, v \rangle = 0$. Therefore,
$$\langle \alpha x + \beta y, v \rangle = \alpha \langle x, v \rangle + \beta \langle y, v \rangle = \alpha(0) + \beta(0) = 0,$$
and hence $\alpha x + \beta y \perp v$; that is, $\alpha x + \beta y \in W$. Therefore $W$ is a subspace. In fact,
$W$ is the set $\{x : \langle x, v \rangle = 0\}$, which is the hyperplane through the origin with normal
vector $v$. (Hyperplanes are discussed again in a later chapter.) We omit the proof
that $S^{\perp}$ is a subspace. This is a standard result, which can be found in the texts: $S^{\perp}$
is known as the orthogonal complement of $S$.

Question 5.4 To start with,
$$e_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{2}}(1, 0, 1, 0)^T.$$
Then we let
$$u_2 = v_2 - \langle v_2, e_1 \rangle e_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 1 \end{pmatrix} - \frac{2}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 0 \\ 1 \end{pmatrix}.$$
Then
$$e_2 = \frac{u_2}{\|u_2\|} = \frac{1}{\sqrt{5}} \begin{pmatrix} 0 \\ 2 \\ 0 \\ 1 \end{pmatrix}.$$
Next,
$$u_3 = v_3 - \langle v_3, e_2 \rangle e_2 - \langle v_3, e_1 \rangle e_1 = \begin{pmatrix} -1 \\ -1/5 \\ 1 \\ 2/5 \end{pmatrix}.$$
Normalising $u_3$ we obtain
$$e_3 = \frac{1}{\sqrt{55}}(-5, -1, 5, 2)^T.$$
The required basis is $\{e_1, e_2, e_3\}$.
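As a check of this sketch answer, the following Python/NumPy snippet (not part of the answer itself) runs the same computation numerically and confirms that the resulting vectors form an orthonormal set.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0, 0.0])
v2 = np.array([1.0, 2.0, 1.0, 1.0])
v3 = np.array([0.0, 1.0, 2.0, 1.0])

e1 = v1 / np.linalg.norm(v1)
u2 = v2 - (v2 @ e1) * e1
e2 = u2 / np.linalg.norm(u2)
u3 = v3 - (v3 @ e1) * e1 - (v3 @ e2) * e2
e3 = u3 / np.linalg.norm(u3)

E = np.column_stack([e1, e2, e3])
print(np.round(E, 4))                     # matches the hand computation above
print(np.allclose(E.T @ E, np.eye(3)))    # True: orthonormal
```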

