
SESA2021 Engineering Analysis: Vector Spaces and Linear Algebra Lecture Notes 2009/2010.

Lecturer: Dr A. A. Shah School of Engineering Sciences, University of Southampton Room 1027, Building 25 e-mail: A.Shah@soton.ac.uk

Copyright © 2010 University of Southampton

Contents
1 Introduction and example applications
2 Basic definitions and examples
3 Subspaces of vector spaces
4 Linear Transformations
5 Span
6 Linear independence
7 Basis and dimension
8 Changing the basis
9 Fundamental subspaces
10 Square matrices and systems of linear equations
11 Inner product spaces and orthogonality
12 Orthogonal and orthonormal bases
13 Orthogonal projections
14 The Gram-Schmidt process
15 Least squares approximations

Engineering Analysis SESA 2021



Figure 1: A mechanical system with 2 masses and 3 springs (vibration example 1.1).

Introduction and example applications

In this course we will be concerned primarily with solving systems of linear equations (including eigenvalue problems), which are difficult to avoid in any aspect of engineering. These systems, which can be very large, can be written as equations involving matrices and vectors. Let's look at some examples in which vectors, matrices and eigenvalues arise. Example 1.1 Consider the system with 2 masses and 3 springs shown in Figure 1. We can use Newton's second law along with Hooke's law to write down a system of equations for the displacements, x1 and x2, of the two masses

$$m_1 \ddot{x}_1 + k_1 x_1 + k_2 (x_1 - x_2) = 0, \qquad m_2 \ddot{x}_2 + k_1 x_2 + k_2 (x_2 - x_1) = 0 \tag{1}$$

where k1 and k2 are the spring constants. Taking m1 = m2 = m, these equations can be written as



$$\ddot{x}_1 = -\frac{k_1 + k_2}{m}\, x_1 + \frac{k_2}{m}\, x_2, \qquad \ddot{x}_2 = \frac{k_2}{m}\, x_1 - \frac{k_1 + k_2}{m}\, x_2 \tag{2}$$

We could now write the system in matrix form by first introducing a vector form of the solution: x = (x1, x2). Then

$$\underbrace{\begin{pmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{pmatrix}}_{\ddot{x}} = \underbrace{\begin{pmatrix} -\alpha & \beta \\ \beta & -\alpha \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}}_{x} \tag{3}$$

where α = (k1 + k2)/m and β = k2/m. The obvious thing to do is look for oscillatory solutions that are of the form

$$x = v e^{i\omega t} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} e^{i\omega t} \tag{4}$$

where ω is the vibration frequency. The new vector v contains just constants, v1 and v2, which we would want to find. The variable part is in the e^{iωt}. Substituting (4) into (3) and cancelling the e^{iωt} terms on both sides gives us a new system of equations for v

$$-\omega^2 \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = A \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$$

or

$$A v = -\omega^2 v \tag{5}$$

This is an example of an eigenvalue problem, i.e., something of the form: a


Figure 2: Output from an experiment in which temperature is measured with time (temperature on the vertical axis, time on the horizontal axis). The objective is to get the best straight-line fit to the data (data fitting example 1.2).

transformation (in this case a matrix A) acting on an object (the vector v) and giving us a constant (in this case −ω²) times the object. We may now ask how many solutions there are and what they look like. In the present case we would be most interested in the frequencies of vibration and the corresponding solutions (normal modes). It turns out that there are 2 frequencies because there are two degrees of freedom.
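The eigenvalue problem (5) is easy to explore numerically. The sketch below uses made-up values for k1, k2 and m (the notes do not fix any), finds the two eigenvalues of the 2 × 2 matrix A from its characteristic polynomial, and recovers the two vibration frequencies.

```python
import math

# Illustrative (hypothetical) parameter values; not from the notes.
k1, k2, m = 2.0, 1.0, 1.0
alpha, beta = (k1 + k2) / m, k2 / m
A = [[-alpha, beta],
     [beta, -alpha]]

# For a 2x2 matrix the eigenvalues solve the characteristic polynomial
# lam^2 - tr(A)*lam + det(A) = 0.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0

# Each eigenvalue equals -omega^2, giving the two normal-mode frequencies.
omega1, omega2 = math.sqrt(-lam1), math.sqrt(-lam2)
print(omega1, omega2)
```

With these sample values the two frequencies correspond to the in-phase and out-of-phase motions of the masses.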

Example 1.2 Suppose you have run an experiment and collected some data that you would like to fit to a line or curve. Let's say you've taken measurements of temperature against time and expect a linear rise in temperature, but due to experimental error not all points will fall nicely onto a straight line, as seen in Figure 2. Let's say there are 4 temperature measurements T1 to T4 taken at times t1 to t4, and we want to represent temperature as T = a + bt. We need to


find a and b. If we take the two points T1 and T2 at times t1 and t2 we have

T1 = a + bt1, T2 = a + bt2

which we can rearrange to find a and b. The problem is that if we use another two points we will get different values of a and b. Let's define the vector (T1, T2, T3, T4). We need to find one value for a and one for b. The matrix equation we need to solve is

$$\begin{pmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{pmatrix} = \begin{pmatrix} a + bt_1 \\ a + bt_2 \\ a + bt_3 \\ a + bt_4 \end{pmatrix} = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \tag{6}$$

Notice, however, that we have more equations than variables (a and b)! This is an example of an overdetermined system. How do we solve the system? Well, we can't solve it exactly, but what we can do is find the best fit in some sense. One method we will look at to do this is called least squares. Example 1.3 Partial differential equations (conservation laws) from aerodynamics are usually solved on a computer. The numerical solutions are constructed first by discretising the equations using the finite difference, finite volume or finite element methods in space, together with a time-stepping procedure if the equations are unsteady. This means that you approximate the solutions at discrete points in time and space and try to find the solutions at those points. By discretising the equations this way you will end up with large matrix


systems. The more points (finer mesh) you choose and the higher the dimension, the larger the systems become. There are a great many ways of solving these systems depending on the accuracy and speed required and the stability of the methods. To understand these methods and to choose the most appropriate (in, for example, a CFD code) for a given problem you need to understand some linear algebra (the theory of matrix systems). We first need to develop the ideas of vectors and transformations (for our purposes these are matrices) so that you are familiar with the language used to describe matrix systems.
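Least squares (example 1.2) will be treated properly later in the course; as a preview, here is a minimal sketch that fits T = a + bt to four made-up measurements via the 2 × 2 normal equations. The data values are purely illustrative, not from the notes.

```python
# Hypothetical measurements (t_i, T_i) roughly following T = 1 + 2t.
t = [0.0, 1.0, 2.0, 3.0]
T = [1.0, 2.9, 5.1, 7.0]

n = len(t)
# Normal equations for T = a + b t:
#   [ n       sum(t)   ] [a]   [ sum(T)   ]
#   [ sum(t)  sum(t^2) ] [b] = [ sum(t*T) ]
st = sum(t)
stt = sum(x * x for x in t)
sT = sum(T)
stT = sum(x * y for x, y in zip(t, T))

det = n * stt - st * st
a = (sT * stt - st * stT) / det   # intercept of the best-fit line
b = (n * stT - st * sT) / det     # slope of the best-fit line
print(a, b)
```

The fitted a and b come out close to the underlying 1 and 2, even though no single pair of points gives exactly those values.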



Figure 3: Vectors in the plane R2 (left) and in space R3 (right). For both these spaces we can represent vectors graphically, but in higher dimensions this is obviously not possible.

Basic definitions and examples

A vector is a quite general object. It doesn't have to be a directed line segment in space or in the plane, as shown in Figure 3. In fact we can have vectors in higher dimensions, as shown in example 1.2. More on this below. When we look at a particular set of vectors we will call it a vector space and give it a symbol like V. The individual vectors will be given symbols like u or v. So what is a vector space and what isn't? Let's look at a familiar example. Example 2.1 Euclidean n-spaces, denoted Rn, are vector spaces (we will see why in a second). In this course we will deal almost exclusively with these vector spaces. There are two that you are familiar with: R2 (two-dimensional space) and R3 (three-dimensional space). An example of a vector in R2 is u = (2, 3), which is


sometimes written as 2e1 + 3e2. The numbers 2 and 3 are called the coordinates of the vector u = (2, 3) in the standard basis vectors e1 = (1, 0) and e2 = (0, 1). We will look at these concepts in detail later on. Vectors in R2 and R3 can be represented graphically as shown in Figure 3. More generally, we can define vectors that have n coordinates. These are vectors in the vector space Rn. For example, the vector (T1, T2, T3, T4) in example 1.2 is in R4 and (1, 0, 1, 2, 4, 0, 1) is in R7. We are not able to visualise these in a graph. To construct a vector space we basically take a bunch (set) of vectors and define ways of adding them together and multiplying them by numbers (scalars). Let's recall some basic facts about the familiar vectors in the Euclidean 2- and 3-spaces R2 and R3. These will help us to understand what a vector space is precisely. (1) On R2 (the plane) we can add two vectors as follows:

$$(1, -1) + (2, 5) = (1 + 2, -1 + 5) = (3, 4) \tag{7}$$

i.e., we just add the individual coordinates. We can add vectors in any Rn space in this way. Note that we have chosen to define addition in this way. We could instead choose another way. By doing it as above, we have made sure that the sum of two vectors in R2 is another vector in R2.

(2) On R2 we can multiply a vector by a scalar (number) as follows:


$$2(1, 3) = (2 \times 1, 2 \times 3) = (2, 6) \tag{8}$$

where we just multiply each coordinate by the scalar (number) 2. We can multiply vectors in any Rn space by a scalar in this way. Again, we have defined multiplication by a scalar in a certain way. We could instead choose another way. By doing it as above, we have made sure that multiplying a vector in R2 by a scalar gives another vector in R2.

(3) Now that we have defined a way of adding vectors in Rn (add individual components) and of multiplying them by scalars (multiply each component by the scalar), it doesn't matter which way round we add vectors in Rn or which way round we multiply them by scalars. There are some obvious rules, such as

(i) u + v = v + u, e.g. (2, 1) + (1, 0) = (1, 0) + (2, 1)
(ii) (c + k)u = cu + ku, e.g. (3 + 2)(2, 1) = 3(2, 1) + 2(2, 1)   (9)

for any vectors u, v, w in Rn and any scalars c and k.

(4) In R2 we have a zero vector 0, i.e. (0, 0). When we add 0 to any vector, e.g.

(2, 1) + (0, 0) = (2, 1)

the vector doesn't change.


In a general vector space V we have to define the way we add vectors and multiply them by scalars. When constructing these definitions, we have to make sure that the rules above for the familiar way of doing things in Rn are preserved. For V to be a vector space:

- The way we add vectors in V has to lead to other vectors in V. We say that V is closed with respect to addition if this is true.
- When a vector in V is multiplied by a scalar, the answer must be another vector in V. We say that V is closed with respect to scalar multiplication if this is true.
- The way we define addition and scalar multiplication of the vectors in V has to preserve rules (9) and other similar rules.
- V has to have a zero vector, and adding it to any vector should not change that vector.

If just one of these requirements is not satisfied, V will NOT be a vector space. Example 2.2 Let's define vector addition in R2 in the usual way (add individual components), but instead of the usual scalar multiplication we will use

cu = c(u1 , u2 ) = (u1 , cu2 )

(10)

i.e., we only multiply the second coordinate. Let's try to satisfy the last rule in

equations (9) with any vector in R2 and any two scalars:

2(1, 1) + 3(1, 1) = (1, 2) + (1, 3) = (2, 5)

but

5(1, 1) = (1, 5) ≠ 2(1, 1) + 3(1, 1)

So, defining scalar multiplication this way does NOT lead to a vector space. We can also treat functions and even more abstract objects as vectors in vector spaces. In this course, however, we will not consider these types of spaces, which are usually referred to as function spaces. One last bit of notation. If V consists of vectors v1, v2, v3, ..., vn, we use curly brackets as follows

V = {v1, v2, v3, ..., vn}

to represent this set of vectors. For example, if we have a set of vectors consisting of v1 = (1, 0) and v2 = (0, 1), we write V = {(1, 0), (0, 1)}
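The failure in example 2.2 can be checked mechanically. A minimal sketch (the helper names smul and vadd are ours, purely for illustration):

```python
# Example 2.2's modified scalar multiplication on R^2:
# c * (u1, u2) = (u1, c*u2), which breaks the rule (c + k)u = cu + ku.
def smul(c, u):
    # only the second coordinate is scaled
    return (u[0], c * u[1])

def vadd(u, v):
    # usual component-wise addition
    return (u[0] + v[0], u[1] + v[1])

u = (1, 1)
lhs = smul(2 + 3, u)                # (1, 5)
rhs = vadd(smul(2, u), smul(3, u))  # (2, 5)
print(lhs, rhs, lhs == rhs)
```

The two sides disagree in the first coordinate, so this operation does not define a vector space.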

Subspaces of vector spaces

For some vector spaces it is possible to take a subset W (i.e. some of the vectors) of the original space V and obtain a new vector space using the same rules for addition and scalar multiplication. We call W a subspace of V . There are some very important subspaces we will meet later on.


It turns out that to be a subspace, we only need to make sure that the subspace is closed with respect to addition and scalar multiplication, i.e., when we add vectors in W or multiply a vector in W by a scalar we get another vector in W. Example 3.1 Let W be the set of all vectors in R3 that are of the form (0, u2, u3), i.e., the first coordinate is zero. Is this a subspace of R3 with the usual rules for addition and scalar multiplication? To find out we need to verify that addition and scalar multiplication of vectors (0, u2, u3) lead to vectors of the form (0, u2, u3). This is the case (Exercise: check that it is), so the space consisting of such vectors is a subspace of R3. Example 3.2 Let W be the set of all vectors (u1, u2) in R2 such that u1 ≥ 0, i.e., the first coordinate is non-negative. Is this a subspace of R2 with the usual rules for addition and scalar multiplication? Multiply (u1, u2) by c < 0. For u1 > 0 we get (cu1, cu2), where the first coordinate is negative. Therefore, this space is not closed with respect to scalar multiplication (multiplication by a negative scalar will give a vector that is not in W). Therefore, W is NOT a subspace of R2.
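A quick numeric illustration of example 3.2; the membership function in_W below is our own shorthand, not notation from the notes:

```python
def in_W(u):
    # membership test for W = {(u1, u2) with u1 >= 0}
    return u[0] >= 0

u = (2.0, 5.0)
c = -3.0
cu = (c * u[0], c * u[1])   # scalar multiple with a negative scalar
print(in_W(u), in_W(cu))    # u is in W, but c*u is not
```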

Linear Transformations

The idea of a transformation (or map) is that it takes a vector, say u in a space V, and transforms or maps it into another vector Au in a space W, which may or may not be the same as V. This is like a function f(x) taking a number x and giving us another number y = f(x).


When we are dealing with the Euclidean n-spaces, we can write a transformation as a matrix. In what sense is a matrix a transformation? Let's take a look at an example. Example 4.1 Consider the following 3 × 3 matrix

$$A = \begin{pmatrix} 1 & 2 & 0 \\ 1 & 2 & 1 \\ 3 & 2 & 5 \end{pmatrix} \tag{11}$$

Let's take the vector u = (1, 3, 0) in R3 and transform it with A (i.e. left-multiply it by A) into another vector b in R3

$$Au = \begin{pmatrix} 1 & 2 & 0 \\ 1 & 2 & 1 \\ 3 & 2 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 7 \\ 7 \\ 9 \end{pmatrix} = b \tag{12}$$

In this example, A takes a vector in R3 and transforms it by multiplication into another vector in R3. We write A : R3 → R3 to signify this. This is pronounced "A maps R3 to R3". The output vector b is called the image of u under A. In the general case, an n × m (n rows and m columns) matrix An×m takes any vector in Rm and transforms it by multiplication into a vector in Rn, i.e., An×m : Rm → Rn.
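A general matrix-vector product makes this concrete: the output always has as many entries as the matrix has rows. A sketch applied to the matrix and vector of example 4.1 (entries as given there):

```python
def matvec(A, u):
    # product of an n-by-m matrix (list of rows) with a vector in R^m;
    # the result is a vector in R^n
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

A = [[1, 2, 0],
     [1, 2, 1],
     [3, 2, 5]]
u = [1, 3, 0]
b = matvec(A, u)
print(b)
```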


The domain of An×m is the set of inputs, which is Rm. The range of An×m is the set of all possible outputs (images) in Rn. Let's take a look at some more examples. Example 4.2 Consider the following multiplication (transformation) of a vector u by a 4 × 3 matrix A, which leads to another vector b

$$\underbrace{\begin{pmatrix} 1 & 1 & 5 \\ 0 & 4 & 3 \\ 9 & 2 & 1 \\ 3 & 7 & 3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}}_{u \text{ in } R^3} = \underbrace{\begin{pmatrix} 13 \\ 10 \\ 22 \\ 19 \end{pmatrix}}_{b \text{ in } R^4} \tag{13}$$

In this example we multiply a column vector in R3 (the domain) by A and get a column vector in R4, so A : R3 → R4. The set of possible outputs in R4 is the range of A. Clearly, b is in the range of A (it is one of the possible outputs). Example 4.3 Consider the following multiplication of a vector by a 3 × 2 matrix A

$$\underbrace{\begin{pmatrix} 3 & 7 \\ 1 & 1 \\ 4 & 3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 1 \\ 1 \end{pmatrix}}_{u \text{ in } R^2} = \underbrace{\begin{pmatrix} 10 \\ 2 \\ 7 \end{pmatrix}}_{b \text{ in } R^3} \tag{14}$$


This time we multiply a column vector in R2 (the domain) by A and get a column vector in R3, so A : R2 → R3. The set of possible outputs in R3 is the range of A; b is in the range of A. From the rules of matrix multiplication, we know that for any matrix A and vectors u and v

A(u + v) = Au + Av

and

A(cu) = cAu

(15)

where c is any number (scalar). These rules tell us that A preserves linear combinations. Example 4.4 Let

$$A = \begin{pmatrix} 2 & -3 \\ 1 & -1 \end{pmatrix}, \qquad u = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad v = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{16}$$

Then

$$Au + Av = \begin{pmatrix} 2 & -3 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 2 & -3 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \end{pmatrix} + \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \tag{17}$$

and

$$A(u + v) = A \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \tag{18}$$

Exercise: Check that A(5u) = 5Au, i.e., that A times 5u is the same as 5 times Au. Because matrices satisfy the rules (15) we call them linear transformations (or linear maps).
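The linearity rules (15) can be spot-checked numerically. A sketch, using an illustrative 2 × 2 matrix and vectors (any choice would do):

```python
def matvec(A, u):
    # matrix-vector product, rows dotted with u
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def vadd(u, v):
    # component-wise vector addition
    return [a + b for a, b in zip(u, v)]

A = [[2, -3], [1, -1]]   # an illustrative matrix
u, v = [1, 1], [1, 0]

left = matvec(A, vadd(u, v))                 # A(u + v)
right = vadd(matvec(A, u), matvec(A, v))     # Au + Av
print(left, right)

# and the scalar rule A(5u) = 5(Au):
scaled = matvec(A, [5 * x for x in u])
print(scaled, [5 * y for y in matvec(A, u)])
```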

Span

We want to be able to write all vectors in a space V as sums of some special fundamental vectors. We will build up to this slowly over the next few sections. Without perhaps knowing it, you have already done this in the Euclidean spaces using the standard basis vectors in example 2.1. Let's look at an example. Example 5.1 The standard basis vectors in R3 are

e1 = (1, 0, 0)

e2 = (0, 1, 0)

e3 = (0, 0, 1)

You can write any vector in R3 as a linear combination of e1, e2 and e3. This means that any vector is a constant times e1 plus a constant times e2 plus a constant times e3. For example:


(2, 3, 1) = 2e1 + 3e2 + 1e3 = 2(1, 0, 0) + 3(0, 1, 0) + (0, 0, 1) = (2, 3, 1)   (19)

and this holds in all of the Euclidean spaces. We can generalise this idea of linear combinations to general vector spaces. We say that a vector w in V is a linear combination of vectors v1, v2, v3, ..., vn (all in V) if it can be written as:

w = c1 v1 + c2 v2 + ... + cn vn

(20)

for some scalars c1 , c2 , ...cn . Example 5.2 Is u = (12, 20) in R2 a linear combination of v1 = (1, 2) and v2 = (4, 6)? If it is, then

(12, 20) = c1 (1, 2) + c2 (4, 6)   (21)

or

c1 + 4c2 = 12   and   2c1 + 6c2 = 20   (22)

The solution to these equations is c1 = 4 and c2 = 2. So u = 4v1 + 2v2, i.e., u is a linear combination of v1 and v2.
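The 2 × 2 system of example 5.2 can be solved directly by Cramer's rule; a sketch:

```python
# System: c1 + 4*c2 = 12 and 2*c1 + 6*c2 = 20 (example 5.2).
a11, a12, b1 = 1, 4, 12
a21, a22, b2 = 2, 6, 20

det = a11 * a22 - a12 * a21        # determinant of the coefficient matrix
c1 = (b1 * a22 - a12 * b2) / det   # replace first column by the RHS
c2 = (a11 * b2 - b1 * a21) / det   # replace second column by the RHS
print(c1, c2)
```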

Example 5.3 Is u = (1, 4) in R2 a linear combination of v1 = (2, 10) and v2 = (3, 15)? If it is, then

(1, 4) = c1 (2, 10) + c2 (3, 15)   (23)

or

2c1 + 3c2 = 1   and   10c1 + 15c2 = 4   (24)

Dividing the second equation by 5 gives 2c1 + 3c2 = 4/5.

This contradicts the first equation, so there is no solution; u is NOT a linear combination of v1 and v2. Suppose we have a vector space V. We are interested in finding a set S of vectors from V that allows us to write any vector in V as a linear combination of the vectors in S. Let's look at an example. Example 5.4 Any vector in R3 can be written as a linear combination of the standard basis vectors e1, e2 and e3. We say that these vectors span R3, i.e., all vectors in R3 can be written as linear combinations of them. Remember that we write a set of vectors inside curly brackets, so the set of basis vectors in R3 is written {e1, e2, e3}. To signify that this set spans R3 we write
R3 = span {e1 , e2 , e3 }

We now generalise the idea of a span to general vector spaces:


Let S = {v1, v2, ..., vn} be a set of vectors in a space V and let W be the set of all linear combinations of the vectors in S. The set W is called the span of the vectors v1, v2, ..., vn and we write

W = span S = span {v1, v2, ..., vn}

Example 5.5 Do the following vectors span R3 ?

v1 = (2, 0, 1)   v2 = (−1, 3, 4)   v3 = (1, 1, −2)

If they do, then any vector in R3 , say u = (u1 , u2 , u3 ), can be written as a linear combination of v1 , v2 and v3 :

u = (u1, u2, u3) = c1 v1 + c2 v2 + c3 v3, i.e.

(u1, u2, u3) = c1 (2, 0, 1) + c2 (−1, 3, 4) + c3 (1, 1, −2)   (25)

or

2c1 − c2 + c3 = u1
3c2 + c3 = u2
c1 + 4c2 − 2c3 = u3   (26)

We need to be able to find values for c1, c2 and c3. These equations can be written in matrix form as

Page 18

Engineering Analysis SESA 2021

$$\underbrace{\begin{pmatrix} 2 & -1 & 1 \\ 0 & 3 & 1 \\ 1 & 4 & -2 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{c} = \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}}_{u} \tag{27}$$

To have a solution c, the matrix A has to be invertible, i.e., have an inverse; for then c = A⁻¹u. To have an inverse, the determinant of A has to be non-zero. Exercise: check that the determinant of A is −24. So we can find values for c1, c2 and c3. Therefore span {v1, v2, v3} = R3
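A short cofactor-expansion routine makes the invertibility test of example 5.5 concrete. The entries below use the sign pattern adopted in this presentation of the example; treat them as an illustrative instance of the technique.

```python
def det3(M):
    # 3x3 determinant by cofactor expansion along the first row
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[2, -1, 1],
     [0, 3, 1],
     [1, 4, -2]]   # columns are v1, v2, v3 of example 5.5
print(det3(A))     # nonzero, so the three vectors span R^3
```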


Linear independence

We want to know when a set of vectors S will span the whole of a vector space V, i.e., when we can write all vectors in V as linear combinations of the vectors in S. There are two things we have to make sure of: (i) there are enough vectors in S to describe all of V, and (ii) there are no redundant vectors in S, so that we can write each vector in V as a linear combination in only one way. By a redundant vector, we mean one that is a linear combination of the other vectors in the set, so we don't really need it. In order to reach this goal, we first need to identify when vectors in a set are independent of each other, by which again we mean that none of them is a linear combination of the others. Example 6.1 Consider the vectors v1 = (2, 2, 4), v2 = (3, 5, 4) and v3 = (0, 1, −1) in R3. If these vectors are dependent we can form linear combinations, i.e., we should be able to get

c1 v1 + c2 v2 + c3 v3 = 0   or   c1 v1 = −c2 v2 − c3 v3   (28)

where the scalars c1, c2 and c3 cannot all be zero (otherwise it is not possible to form a linear combination and the vectors are independent). Substituting, we get

c1 (2, 2, 4) + c2 (3, 5, 4) + c3 (0, 1, −1) = (0, 0, 0)

which leads to a system of equations in matrix form

$$\underbrace{\begin{pmatrix} 2 & 3 & 0 \\ 2 & 5 & 1 \\ 4 & 4 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{c} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{29}$$

For equations of the form Ac = 0, there is only the trivial solution (c = 0) if A is invertible. Otherwise, there will be a non-trivial solution (at least one of c1, c2 and c3 will not be zero). Exercise: check that det(A) = 0. So we can find at least one non-trivial solution c to equation (28). Thus, the vectors v1, v2 and v3 are not independent. This leads us on to the definition of linear independence, which is just a generalisation of the dependence concept above. Let S = {v1, v2, ..., vn} be a set of vectors in some vector space V. If the equation c1 v1 + c2 v2 + ... + cn vn = 0 is only satisfied when c1 = c2 = ... = cn = 0, we say that the vectors v1, v2, ..., vn are linearly independent. Otherwise, we say that the vectors are linearly dependent. Let's look at some more examples. Example 6.2 Are the vectors v1 = (3, 1) and v2 = (2, 2) in R2 linearly independent? Let's set up the equation:

Engineering Analysis SESA 2021

c1 v 1 + c2 v 2 = 0

c1 (3, 1) + c2 (2, 2) = (0, 0)

which leads to a system of equations

3c1 + 2c2 = 0

c1 + 2c2 = 0

the only solution to which is c1 = c2 = 0 (the trivial solution). Therefore, v1 and v2 are linearly independent. Example 6.3 The standard basis vectors in R3, e1, e2 and e3, are linearly independent. Try to find numbers c1, c2 and c3, not all zero, such that c1 e1 + c2 e2 + c3 e3 = 0. It's impossible!
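For two vectors in R2 the independence test of example 6.2 reduces to a 2 × 2 determinant; a sketch:

```python
def det2(v1, v2):
    # determinant of the 2x2 matrix whose columns are v1 and v2;
    # nonzero exactly when v1 and v2 are linearly independent
    return v1[0] * v2[1] - v1[1] * v2[0]

v1, v2 = (3, 1), (2, 2)
print(det2(v1, v2))   # nonzero, so only the trivial solution c1 = c2 = 0
```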


Basis and dimension

To this point, we've been using the term standard basis in the Euclidean n-spaces without really knowing what the "basis" part of this expression means. Moreover, in these so-called n-dimensional spaces, what does "dimension" actually mean? In R2 and R3 the dimension is usually thought of geometrically as the number of axes, typically labelled x, y and z. The more general concept of dimension will reduce to this definition. First we will tackle the issue of basis. Earlier, I said we were working towards writing all vectors in a space V as sums (linear combinations) of some special fundamental vectors, S = {v1, v2, ..., vn}. There should be enough vectors to span the whole of V, i.e., V = span S (any vector in V can be obtained from a linear combination of the vectors in S). At the same time, there should be no redundant (linearly dependent) vectors in S because the linear combinations should be unique. These two requirements basically lead to the special set of vectors we are looking for, and we call this set a basis for V. Let S = {v1, v2, ..., vn} be a set of vectors in some vector space V. If V = span {v1, v2, ..., vn} and v1, v2, ..., vn are linearly independent, we call S a basis for V.

Example 7.1 The standard basis vectors e1, e2 and e3 form a basis for R3 (hence the name). (i) We already know from examples 5.1 and 5.4 that R3 = span {e1, e2, e3}.


(ii) From example 6.3 we know that the standard basis vectors are linearly independent. Example 7.2 Determine if the vectors v1 = (1, −1, 1), v2 = (0, 1, 2) and v3 = (3, 0, −1) form a basis for R3. First we have to check whether these vectors are linearly dependent, i.e., can we find c1, c2 and c3 (not all zero) such that c1 v1 + c2 v2 + c3 v3 = 0?

c1 (1, −1, 1) + c2 (0, 1, 2) + c3 (3, 0, −1) = (0, 0, 0)

or in matrix form

$$\underbrace{\begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{c} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{30}$$

Also, to have R3 = span {v1, v2, v3}, any vector (u1, u2, u3) in R3 has to be a linear combination of v1, v2 and v3:

C1 (1, −1, 1) + C2 (0, 1, 2) + C3 (3, 0, −1) = (u1, u2, u3)

or in matrix form

$$\underbrace{\begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix}}_{C} = \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}}_{u} \tag{31}$$


If det(A) is not zero, then (31) has a unique solution C and the only solution to equation (30) is the trivial solution c = 0. Exercise: Check that det(A) = −10. Therefore, v1, v2 and v3 are linearly independent and they span R3. Thus, they form a basis for R3. We now come to the concept of dimension. Suppose that S = {v1, v2, ..., vn} is a basis for a vector space V. If the number of vectors in S is finite, say n, we say that V is finite dimensional with dimension n. We write dim(V) = n. Otherwise, the space is said to be infinite dimensional. It turns out, importantly, that all bases of V contain the same number of vectors.
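The basis test of example 7.2 can be run numerically with the same determinant routine as before; the matrix entries follow the sign pattern used in this presentation of the example.

```python
def det3(M):
    # 3x3 determinant by cofactor expansion along the first row
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 0, 3],
     [-1, 1, 0],
     [1, 2, -1]]   # columns are v1, v2, v3 of example 7.2
print(det3(A))     # nonzero, so {v1, v2, v3} is a basis for R^3
```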

Example 7.3 All the spaces Rn are finite dimensional with dimension n. For example, R3 has dimension 3. All bases for R3 will have 3 vectors. If there are more, they will not be linearly independent. If there are fewer, they will not span R3.


Changing the basis

We've already seen through examples that a basis for a vector space is not unique. For example, the standard basis in R3 and the set of vectors {v1, v2, v3} in example 7.2 are both bases for R3. The standard basis in Rn is generally the easiest one to work with, but there may be cases in which an alternative basis is preferable. Therefore, we need to find a way to convert between different bases. Let's look at an example to sort out some terminology.

Example 8.1 Using the standard basis in R3 we can write the vector (3, 5, 2) as (3, 5, 2) = 3(1, 0, 0) + 5(0, 1, 0) + 2(0, 0, 1) = 3e1 + 5e2 + 2e3

The numbers multiplying the basis vectors, 3, 5 and 2, are called the coordinates of the vector. It is clear that the coordinates will change depending on the basis. For the standard basis, the coordinates are simple to find: they are just the numbers in the vector itself. For other bases, you have to think a bit more.
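Working in a non-standard basis just means forming the corresponding linear combination. A sketch with a sample basis of R3 and sample coordinates (both chosen here purely for illustration):

```python
# A sample basis of R^3 and sample coordinates (illustrative values only).
v1, v2, v3 = (1, -1, 1), (0, 1, 2), (3, 0, -1)
c1, c2, c3 = -2, 3, 4

# The vector with coordinates (c1, c2, c3) in this basis is
# u = c1*v1 + c2*v2 + c3*v3, computed component by component.
u = tuple(c1 * a + c2 * b + c3 * c for a, b, c in zip(v1, v2, v3))
print(u)
```

Going the other way (from a vector u to its coordinates) requires solving a linear system, which is the subject of the next examples.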

We now generalise the idea of coordinates.


Let S = {v1, v2, ..., vn} be a basis for a vector space V. Since S is a basis, we can express any vector u in V as a linear combination of the vectors in S: u = c1 v1 + c2 v2 + ... + cn vn. The numbers c1, c2, ..., cn are called the coordinates of u with respect to the basis S. The coordinates for a vector with respect to a basis S can themselves be written as a vector in Rn, which we call a coordinate vector

(u)S = (c1 , c2 , ..., cn )

The subscript S makes it clear that the coordinates are with respect to S. For the standard bases in Rn , the coordinate vector (u)S is exactly the same as the vector u itself, as seen in the example above.

Example 8.2 Determine the coordinate vector (u)S of the vector u = (10, 5, 0) relative to the following bases. (i) The standard basis in R3 . In this case u = 10e1 + 5e2 + 0e3 so the coordinates are 10, 5 and 0, and the coordinate vector is simply (u)S = (10, 5, 0) = u


(ii) S = {v1, v2, v3} where v1 = (1, −1, 1), v2 = (0, 1, 2) and v3 = (3, 0, −1). In this case, we have to find the coordinates c1, c2 and c3 such that

c1 (1, −1, 1) + c2 (0, 1, 2) + c3 (3, 0, −1) = (10, 5, 0)

This is equivalent to the system of equations

c1 + 3c3 = 10
−c1 + c2 = 5
c1 + 2c2 − c3 = 0   (32)

The answer is c1 = −2, c2 = 3 and c3 = 4. Exercise: Check this result. Therefore, (u)S = (−2, 3, 4). Now onto how to change bases. We will work in R2 to demonstrate the procedure. Suppose we have two bases for the space R2:

B = {v1, v2}   (Basis 1)
C = {w1, w2}   (Basis 2)

Now because B is a basis for R2 , each of the basis vectors in C can be written as a linear combination of the basis vectors in B


w1 = a v1 + b v2,   w2 = c v1 + d v2   (33)

This means that the coordinate vectors for w1 and w2 relative to the basis B are

(w1 )B = (a, b)

and

(w2 )B = (c, d)

Unfortunately, we now have to introduce a new notation for writing these coordinate vectors. Instead of ( )B we are going to write them using [ ]B and call them coordinate matrices.

$$[w_1]_B = \begin{pmatrix} a \\ b \end{pmatrix} \quad \text{and} \quad [w_2]_B = \begin{pmatrix} c \\ d \end{pmatrix} \tag{34}$$

They are basically the same as the coordinate vectors, written as columns. Next, let u be any vector in V . In terms of the basis C, we can write u as

u = c1 w 1 + c2 w 2

(35)

The coordinate matrix of u relative to C is:

$$[u]_C = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \tag{36}$$


Equation (33) tells us how to write the basis vectors in C as linear combinations of the basis vectors B. Substituting equation (33) into equation (35), we get

$$u = c_1 w_1 + c_2 w_2 = c_1 (a v_1 + b v_2) + c_2 (c v_1 + d v_2) = (a c_1 + c c_2) v_1 + (b c_1 + d c_2) v_2 \tag{37}$$

This gives us the coordinate matrix of u relative to the basis B

$$[u]_B = \begin{pmatrix} a c_1 + c c_2 \\ b c_1 + d c_2 \end{pmatrix} \tag{38}$$

Let us re-write this as

$$[u]_B = \begin{pmatrix} a c_1 + c c_2 \\ b c_1 + d c_2 \end{pmatrix} = \underbrace{\begin{pmatrix} a & c \\ b & d \end{pmatrix}}_{P} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = P [u]_C \tag{39}$$

The matrix P is called the transition matrix from C to B: given the coordinate matrix of a vector relative to the basis C, we can use it to nd the coordinate matrix relative to the basis B. Notice that its columns are the coordinate matrices for the basis vectors C relative to B, [w1 ]B and [w2 ]B . We can therefore write P compactly as

P = [[w1]B [w2]B]


Equation (39) can then be written compactly as

[u]B = P [u]C = [[w1 ]B [w2 ]B ] [u]C

We can now generalise this result. Suppose we have two bases for the vector space V :

B = {v1, v2, ..., vn}   (Basis 1)
C = {w1, w2, ..., wn}   (Basis 2)

The transition matrix from C to B is defined as

P = [[w1]B [w2]B ... [wn]B]

(40)

where the ith column of P is the coordinate matrix of wi relative to B. The coordinate matrix of a vector u in V relative to B is then related to the coordinate matrix of u relative to C by

[u]B = P [u]C

(41)

Example 8.3 Consider the standard basis B = {e1, e2, e3} and the basis C = {v1, v2, v3}, where v1 = (1, −1, 1), v2 = (0, 1, 2) and v3 = (3, 0, −1), for R3.


(i) Find the transition matrix from C to B. (ii) Find the transition matrix from B to C. (i) Recall that the columns of the transition matrix are the coordinate matrices for the basis vectors of C relative to B. In other words, we have to find the coordinates of the basis vectors v1, v2 and v3 when they are written as linear combinations of e1, e2 and e3. We know from examples 8.1 and 8.2 that in the standard basis, the coordinate vector (and therefore the coordinate matrix) is simply the vector itself. Thus

$$[v_1]_B = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \quad [v_2]_B = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \quad [v_3]_B = \begin{pmatrix} 3 \\ 0 \\ -1 \end{pmatrix} \tag{42}$$

From equation (40), the transition matrix from C to B is then

$$P = [[v_1]_B\; [v_2]_B\; [v_3]_B] = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix} \tag{43}$$

(ii) To find the transition matrix from B to C we need the coordinate matrices of the standard basis vectors relative to C. In other words, we have to find the coordinates of e1, e2 and e3 when they are written as linear combinations of v1, v2 and v3. This requires more work (for you!). Exercise: Verify that

Page 32

Engineering Analysis SESA 2021

e1 =

1 10 v1

1 10 v2

3 10 v3

2 1 3 e2 = 5 v 1 + 5 v 2 + 5 v 3

e3 =

3 10 v1

3 10 v2

1 10 v3

Therefore, the coordinate matrices of the standard basis vectors relative to C are

[e1]C = \begin{pmatrix} 1/10 \\ 1/10 \\ -3/10 \end{pmatrix}    [e2]C = \begin{pmatrix} -3/5 \\ 2/5 \\ -1/5 \end{pmatrix}    [e3]C = \begin{pmatrix} 3/10 \\ 3/10 \\ 1/10 \end{pmatrix}    (44)

and the transition matrix from B to C is

P' = [ [e1]C  [e2]C  [e3]C ] = \begin{pmatrix} 1/10 & -3/5 & 3/10 \\ 1/10 & 2/5 & 3/10 \\ -3/10 & -1/5 & 1/10 \end{pmatrix}    (45)

Example 8.4 Using the results of the previous example, compute (i) [u]B given (u)C = (2, -3, 4), and (ii) [u]C given (u)B = (-10, -5, 0).

(i) All we need to do now is use equation (41), i.e., some matrix multiplication

[u]B = P [u]C = \begin{pmatrix} 1 & 0 & -3 \\ -1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ -3 \\ 4 \end{pmatrix} = \begin{pmatrix} -10 \\ -5 \\ 0 \end{pmatrix}    (46)

Looking back at example 8.2(ii), we can see that this is the right result. Once we have the transition matrix, we can perform this computation quickly and easily for many vectors. (ii) This time, we swap the bases and use the transition matrix P' instead of P, since we're going from B to C.

[u]C = P' [u]B = \begin{pmatrix} 1/10 & -3/5 & 3/10 \\ 1/10 & 2/5 & 3/10 \\ -3/10 & -1/5 & 1/10 \end{pmatrix} \begin{pmatrix} -10 \\ -5 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \\ 4 \end{pmatrix}    (47)

as expected from part (i).

There is one final observation to make: the transition matrix from the basis B to C is the inverse of the transition matrix from C to B. Exercise: Check that P' is the inverse of P in example 8.4.
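This whole procedure is easy to check numerically. The sketch below uses numpy and a made-up basis C for R3 (not the basis from the examples above): the transition matrix from C to the standard basis has the basis vectors as its columns, and its inverse is the transition matrix going the other way.

```python
import numpy as np

# Hypothetical basis C for R^3 (illustrative choice, not the basis from
# example 8.3); B is the standard basis.
w1 = np.array([1., 0., 1.])
w2 = np.array([0., 1., 1.])
w3 = np.array([1., 1., 0.])

# Transition matrix from C to B: columns are the coordinate matrices of
# w1, w2, w3 relative to the standard basis, i.e. the vectors themselves.
P = np.column_stack([w1, w2, w3])

# Transition matrix from B to C is the inverse of P.
Q = np.linalg.inv(P)

# Check equation (41): [u]_B = P [u]_C, and applying Q recovers [u]_C.
u_C = np.array([2., 3., 4.])
u_B = P @ u_C
assert np.allclose(Q @ u_B, u_C)

# [u]_B should be the actual vector u = 2 w1 + 3 w2 + 4 w3
assert np.allclose(u_B, 2*w1 + 3*w2 + 4*w3)
print(u_B)  # coordinates of u relative to the standard basis
```
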


9 Fundamental subspaces

There are some very important subspaces of Rn that we will be interested in. These subspaces are associated with matrices. Let's look at a general n × m matrix

A_{nm} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}    (48)

It has n rows and m columns. The row vectors are the vectors formed out of the rows of Anm (these are in Rm) and the column vectors are the vectors formed out of the columns of Anm (these are in Rn).

Example 9.1 Consider the 4 × 2 matrix

A = \begin{pmatrix} 1 & 5 \\ 0 & 4 \\ 9 & 2 \\ 3 & 7 \end{pmatrix}    (49)

The row vectors are

r1 = (1, 5)    r2 = (0, 4)    r3 = (9, 2)    r4 = (3, 7)    (50)

which are vectors in R2 (there are m = 2 columns) and the column vectors are

c1 = \begin{pmatrix} 1 \\ 0 \\ 9 \\ 3 \end{pmatrix}    c2 = \begin{pmatrix} 5 \\ 4 \\ 2 \\ 7 \end{pmatrix}    (51)

which are vectors in R4 (there are n = 4 rows). There are three important subspaces of Rn and Rm associated with a matrix Anm. We call them the fundamental subspaces of Anm. First let's recall that a matrix Anm is a linear transformation that takes any column vector in Rm and transforms it by multiplication into a column vector in Rn. We write Anm : Rm → Rn. The domain of Anm is Rm (the set of inputs) and the range of Anm is the set of all possible outputs (images) in Rn (which is generally not all of Rn, just a subspace of it). Now onto the fundamental subspaces of Anm.

(1) The first subspace is related to the zero vector in Rn. The set of all vectors u in the domain Rm that give

Anm u = 0    (52)

is called the null space or kernel of Anm. In other words, those vectors in the domain (inputs) that when operated on by Anm give us the zero vector in Rn. We write the null space of a matrix A as null(A) or ker(A)


(2) The span of the row vectors of Anm , i.e., the set of all linear combinations of the row vectors, is called the row space of Anm . Because the row vectors are in Rm , the row space is a subspace of Rm . We write the row space of a matrix A as row(A)

(3) The span of the column vectors of Anm , i.e., the set of all linear combinations of the column vectors, is called the column space of Anm . Because the column vectors are in Rn , the column space is a subspace of Rn . We write the column space of a matrix A as col(A)

We will be interested in finding bases for each of these spaces. First another example.

Example 9.2 Find the null space ker(A) of the following matrix

A = \begin{pmatrix} 1 & -7 \\ -3 & 21 \end{pmatrix}    (53)

To find the null space, we use equation (52). Let's assume that (u1, u2) is a vector in ker(A). Then equation (52) leads to

Au = \begin{pmatrix} 1 & -7 \\ -3 & 21 \end{pmatrix} \begin{pmatrix} u1 \\ u2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}    (54)

which can be written as a system of linear equations


u1 - 7u2 = 0
-3u1 + 21u2 = 0    (55)

The two equations are equivalent, and are satisfied when (u1, u2) = (7t, t) for any number t. Therefore, ker(A) consists of all vectors of the form (7t, t) for any number t, of which there are infinitely many. Now, this is one way of finding the null space and a basis for it. However, we want to be able to find bases for all the fundamental spaces for more complicated matrices using just one procedure. This procedure is described through another example. Before we move onto the example, we first have to review the concepts of augmented matrices and reduced echelon forms, which you have covered in your first year maths modules. Suppose we have a linear system of homogeneous (right hand sides are zero) equations:

u1 + 2u2 - u3 + 5u4 + 6u5 = 0
-4u1 - 4u2 - 4u3 - 12u4 - 8u5 = 0
-2u1 - 6u3 - 2u4 + 4u5 = 0
3u1 + u2 + 7u3 - 2u4 + 12u5 = 0    (56)

We can write this in matrix form as


\begin{pmatrix} 1 & 2 & -1 & 5 & 6 \\ -4 & -4 & -4 & -12 & -8 \\ -2 & 0 & -6 & -2 & 4 \\ 3 & 1 & 7 & -2 & 12 \end{pmatrix} \begin{pmatrix} u1 \\ u2 \\ u3 \\ u4 \\ u5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}    (57)

where the 4 × 5 matrix is A. A convenient way of writing this system of equations is by forming the augmented matrix

\left(\begin{array}{ccccc|c} 1 & 2 & -1 & 5 & 6 & 0 \\ -4 & -4 & -4 & -12 & -8 & 0 \\ -2 & 0 & -6 & -2 & 4 & 0 \\ 3 & 1 & 7 & -2 & 12 & 0 \end{array}\right)    (58)

The entries to the left of the line represent the coefficients of u1 to u5 in equations (56) and (57). The zeros to the right of the line represent the terms on the right hand sides of the = signs in equations (56) and (57). Now, in the system of equations (56) we can multiply or divide any equation by a constant, we can add or subtract equations or we can swap the equations around without altering the solutions. You do this, e.g., when you solve 2 linear simultaneous equations.


Aside. Solve the following system and make a note of the steps required.

u1 - 2u2 = 2
3u1 + u2 = 2

The same is true, therefore, of the augmented matrix (58), which represents the system of equations (56): We can
- Interchange 2 rows
- Multiply or divide a row by a non-zero number
- Add a multiple of one row to another.
These are called elementary row operations. They are equivalent to adding equations (56), multiplying them by constants and interchanging them. The augmented matrix is just a more compact way of doing it. We also have to be careful about the right hand sides when we perform the operations. However, for the homogeneous system above they are zero, so they do not affect the row operations. We now want to find the reduced row echelon form of the matrix. We get this by performing elementary row operations until the augmented matrix satisfies the following properties.
- In each row, the first non-zero entry from the left is 1. This is called the leading 1.
- The leading 1 in each row is to the right of the leading 1 in the row above.


- All rows consisting entirely of zeros are at the bottom of the matrix.
Exercise: go through the following steps on the augmented matrix (58)
(1) row 2 + 4 × row 1
(2) row 3 + 2 × row 1
(3) row 2 ÷ 4
(4) row 3 ÷ 4
(5) row 3 − row 2
(6) row 4 − 3 × row 1
(7) row 4 + 5 × row 2
(8) row 4 ÷ (−7)
(9) row 3 ↔ row 4
to confirm that the reduced row echelon form is

U = \left(\begin{array}{ccccc|c} 1 & 2 & -1 & 5 & 6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right)    (59)

We now move onto the example.

Example 9.3 Determine a basis for the null space of the following 4 × 5 matrix


A = \begin{pmatrix} 1 & 2 & -1 & 5 & 6 \\ -4 & -4 & -4 & -12 & -8 \\ -2 & 0 & -6 & -2 & 4 \\ 3 & 1 & 7 & -2 & 12 \end{pmatrix}    (60)

To find the null space we need to solve equation (52) for u = (u1, u2, u3, u4, u5) in R5. This is the same as equation (57) above. We put it into the augmented matrix, which is given by matrix (58). Now we need the reduced row echelon form of the matrix. Again, we have done this already. The answer is given by matrix (59)

U = \left(\begin{array}{ccccc|c} 1 & 2 & -1 & 5 & 6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right)    (61)

Thus, we only have 3 equations (the top 3 rows), but 5 unknowns. Let's set u5 = s, where s is any number. The third equation (row) gives u4 = 2u5 = 2s. Now set u3 = t for any number t. The second equation (row) gives u2 = 2u3 - 2u4 - 4u5 = 2t - 8s. Finally, the first equation (row) gives u1 = -2u2 + u3 - 5u4 - 6u5 = -3t


The full solution is

u = \begin{pmatrix} -3t \\ 2t - 8s \\ t \\ 2s \\ s \end{pmatrix} = t \underbrace{\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}}_{u_1} + s \underbrace{\begin{pmatrix} 0 \\ -8 \\ 0 \\ 2 \\ 1 \end{pmatrix}}_{u_2}    (62)

for any numbers t and s. There are infinitely many solutions because the number of unknowns is greater than the number of equations. So, the null space consists of all vectors of the form c1 u1 + c2 u2. In the above example, we haven't quite answered the question - we still haven't specified a basis! It looks like the vectors u1 and u2 could form a basis. They certainly span the whole of the null space, but are they linearly independent? Yes, they are (Exercise: check that they are). So, they satisfy the two properties required to be a basis. We now come to the main reason for solving the system by finding the reduced row echelon form.
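The hand reduction above can be automated. The routine below is a minimal illustrative sketch (my own implementation, not course-supplied code) that reduces a matrix to echelon form with leading 1s and reads off the pivot columns; it is applied here to a small made-up matrix rather than the one from example 9.3.

```python
import numpy as np

def row_echelon(A, tol=1e-12):
    """Reduce A to row echelon form (leading 1s, staircase pattern).
    Returns the reduced matrix and the list of pivot columns."""
    R = A.astype(float).copy()
    n, m = R.shape
    pivots, row = [], 0
    for col in range(m):
        if row >= n:
            break
        # pick the row (at or below 'row') with the largest entry in this column
        piv = row + np.argmax(np.abs(R[row:, col]))
        if abs(R[piv, col]) < tol:
            continue                      # no pivot in this column
        R[[row, piv]] = R[[piv, row]]     # interchange two rows
        R[row] /= R[row, col]             # scale so the leading entry is 1
        for r in range(row + 1, n):       # clear the entries below the leading 1
            R[r] -= R[r, col] * R[row]
        pivots.append(col)
        row += 1
    return R, pivots

# A hypothetical 3 x 4 matrix (illustration only)
A = np.array([[1., 2., 0., 1.],
              [2., 4., 1., 3.],
              [1., 2., 1., 2.]])
U, pivots = row_echelon(A)
print(pivots)   # pivot columns; rank = len(pivots)
```

The free variables correspond to the non-pivot columns, exactly as in the back-substitution carried out by hand above.
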


Let Anm be an n × m matrix.
- The vectors found for the null space of the reduced echelon form of Anm are always linearly independent. They form a basis for the null space of the reduced echelon matrix and for the null space of the original matrix. The dimension of the null space (i.e., the number of basis vectors) is called the nullity of Anm, written nullity(Anm).
- The row vectors containing the leading 1s in the reduced echelon form of Anm form a basis for the row space of the reduced echelon matrix and for the row space of the original matrix Anm.
- The column vectors containing the leading 1s in the reduced echelon form of Anm form a basis for the column space of the reduced echelon form. Suppose that these column vectors correspond to column numbers m1, m2, ..., mk. The column vectors of the original matrix Anm corresponding to column numbers m1, m2, ..., mk form a basis for the column space of the original matrix.


Example 9.4 Let's look again at the matrix A in example 9.3. The reduced row echelon form U is given by equation (59)

U = \begin{pmatrix} 1 & 2 & -1 & 5 & 6 \\ 0 & 1 & -2 & 2 & 4 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}    (63)

We found that there are two vectors in the basis for the null space. All bases have the same number of vectors. Therefore nullity(A) = 2. Rows 1, 2 and 3 of U contain the leading 1s. Therefore, a basis for the row space of both A and U is given by

r1 = (1, 2, -1, 5, 6)    r2 = (0, 1, -2, 2, 4)    r3 = (0, 0, 0, 1, -2)

with dim(row(A)) = 3. Columns 1, 2 and 4 of U contain the leading 1s. Therefore, a basis for the column space of U is given by the 1st, 2nd and 4th column vectors of U

c'1 = (1, 0, 0, 0)    c'2 = (2, 1, 0, 0)    c'4 = (5, 2, 1, 0)

A basis for the column space of A is therefore given by the 1st, 2nd and 4th

column vectors of A

c1 = (1, -4, -2, 3)    c2 = (2, -4, 0, 1)    c4 = (5, -12, -2, -2)

with dim(col(A)) = 3

Notice in this example that dim(row(A)) = dim(col(A)), i.e., the column and row spaces have the same dimension. This is always true.

The row space and column space of a general n × m matrix Anm have the same dimension. We call this dimension the rank of Anm, written rank(Anm).

The second thing to notice from the example above is that nullity(A)+rank(A) = 2 + 3 = 5, i.e., the number of columns. This again is always true.

For a general n × m matrix Anm (m columns):  nullity(Anm) + rank(Anm) = m    (64)

For an n × n matrix A:  nullity(A) + rank(A) = n
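The rank-nullity relation (64) is easy to check numerically. The sketch below (illustrative only, using a made-up matrix with deliberately dependent rows) computes the rank with numpy and reads a null space basis off the singular value decomposition.

```python
import numpy as np

# Hypothetical 4 x 5 matrix (for illustration only)
A = np.array([[1., 2., 3., 4., 5.],
              [0., 1., 1., 0., 1.],
              [1., 3., 4., 4., 6.],     # row 3 = row 1 + row 2 (dependent)
              [2., 4., 6., 8., 10.]])   # row 4 = 2 * row 1 (dependent)

rank = np.linalg.matrix_rank(A)

# nullity = m - rank, equation (64)
n, m = A.shape
nullity = m - rank

# A basis for the null space: right singular vectors beyond the rank
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                  # each row v satisfies A v = 0
assert np.allclose(A @ null_basis.T, 0.)

print(rank, nullity, rank + nullity)    # rank + nullity = m = 5
```
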


10 Square matrices and systems of linear equations

The concepts of rank and nullity are important. Let's consider a square n × n matrix A : Rn → Rn. A typical problem in many applications of engineering is to find a solution u in Rn to the equation

Au = b    (65)

where the vector b in Rn is known. We will look at certain aspects of this problem with an example. Example 10.1 Consider the matrix

A = \begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & -2 \\ -3 & 0 & 2 \end{pmatrix} = (c1  c2  c3)    (66)

where c1, c2 and c3 are the column vectors of A

c1 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}    c2 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}    c3 = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix}    (67)

Now consider the procedure for multiplying a vector u = (u1 , u2 , u3 ) by A


\begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & -2 \\ -3 & 0 & 2 \end{pmatrix} \begin{pmatrix} u1 \\ u2 \\ u3 \end{pmatrix} = \begin{pmatrix} 1 \cdot u1 + (-2) \cdot u2 + 1 \cdot u3 \\ 2 \cdot u1 + 1 \cdot u2 + (-2) \cdot u3 \\ (-3) \cdot u1 + 0 \cdot u2 + 2 \cdot u3 \end{pmatrix}    (68)

= u1 \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} + u2 \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + u3 \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix} = u1 c1 + u2 c2 + u3 c3

i.e., any matrix multiplication leads to a linear combination of the column vectors, i.e., a vector in the column space. From the above example we can see that if we want to solve equation (65), the vector b has to be in the column space of A. It also shows that all output vectors (i.e. the range of A) are in the column space of A

The range of a square matrix is its column space

Next, let's consider the nullity and rank. What happens when the rank of an n × n matrix is less than n? From the definition of rank, we know that if rank(A) < n, some of the column and row vectors will be linearly dependent: they can be obtained from the other rows by forming linear combinations and are, therefore, redundant. If we were to set up the matrix system (65) with some vector b and look for a solution u, then we would not have enough equations or some equations would contradict each other. Therefore, a solution will not exist at all or there will be infinitely many solutions, and A will not have an inverse.

The rank of A is less than n ⟺ A is not invertible

A square n × n matrix A with rank(A) = n is said to have full rank (obviously the rank cannot be any bigger!) If rank(A) < n, the matrix A is said to be rank deficient. We can restate the above as

A is rank deficient ⟺ A is not invertible

By definition, if a matrix A is rank deficient some of the rows are linearly dependent. By performing elementary row operations (adding multiples of rows to other rows) we can get a new matrix B that will have a row of zeros. The determinants of A and B will differ only by a non-zero constant factor. Therefore, since det(B) = 0, we have det(A) = 0, which means that A will not have an inverse.

A is rank deficient ⟺ det(A) = 0

Another way to look at nullity and rank is by considering the solutions to

Av = 0

(69)


The solutions to this equation clearly give us the null space ker(A). The nullity of A is the number of vectors in the basis for ker(A). If there are non-zero solutions to equation (69), then nullity(A) > 0. Equation (64) then tells us that rank(A) < n. In this case, we can write

A(u + v) = Au + Av = b + 0 = b

What does this tell us? It tells us that if u is a solution to equation (65) then so is u + v, and there may be infinitely many such v. This suggests that if a solution to equation (65) exists, it will not be unique.

A is rank deficient ⟹ no unique solution to Au = b
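These statements can be seen in action with numpy. The matrix below is a made-up rank-deficient example (illustrative only): its determinant is zero and a linear solver refuses to produce a unique solution.

```python
import numpy as np

# A hypothetical rank-deficient 3 x 3 matrix: row 3 = row 1 + row 2
A = np.array([[1., 2., 1.],
              [0., 1., 3.],
              [1., 3., 4.]])

assert np.linalg.matrix_rank(A) < 3      # rank deficient
assert abs(np.linalg.det(A)) < 1e-10     # so the determinant is zero

# solve() fails: there is no unique solution to Au = b
b = np.array([1., 1., 1.])
try:
    np.linalg.solve(A, b)
except np.linalg.LinAlgError as err:
    print("no unique solution:", err)
```
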


11 Inner product spaces and orthogonality

There is a special class of spaces that we are going to look at. The Euclidean spaces fall into this category. What we would like to do, as in Rn, is measure the (i) magnitude (or length) of a vector and (ii) angles and distances between vectors. In R2 and R3 you can visualise these but in higher dimensions you can't. The basic idea is to introduce generalisations of the familiar dot product and magnitude of a vector in R2 or R3.

Example 11.1 The dot product in R2 and R3 is defined as follows

u · v = (u1, u2, u3) · (v1, v2, v3) = u1 v1 + u2 v2 + u3 v3

where we multiply the first, second, etc. coordinate of the first vector by the first, second etc. coordinate of the second vector and add the results. The dot product has a geometric interpretation

u · v = |u| |v| cos θ

where |u| = √(u1² + u2² + u3²) and |v| = √(v1² + v2² + v3²) are the magnitudes of the vectors and θ is the angle between the vectors in the plane that contains them both. Notice also that

√(u · u) = √(u1² + u2² + u3²) = |u|


and that

u · v = v · u
(u + v) · w = u · w + v · w
(cu) · v = c (u · v) for any scalar c    (70)
u · u = u1² + u2² + u3² ≥ 0
u · u = 0 if and only if u = 0

In general Rn spaces we can define the same dot product (multiply individual respective components)

u · v = u1 v1 + u2 v2 + .... + un vn

and the length of a vector in Rn is given by

|u| = √(u1² + u2² + ... + un²) = √(u · u)

Now let's look at a general vector space V. We want similar measures of angles and magnitudes.


What we do is extend the idea of the dot product and call it an inner product. Like the dot product of two vectors, the inner product of two vectors gives us a number. As with the dot product, we will be able to use the inner product to measure angles and magnitudes. We write ⟨u, v⟩ to represent the inner product of two vectors.

Example 11.2 The dot product on Euclidean spaces is an example of an inner product. It is called the standard inner product on these spaces.

Example 11.3 Let u = (1, -2, 4), v = (-2, 0, 1) and w = (3, 2, 2). With the standard inner product (i.e., just the dot product)

⟨u, v⟩ = ⟨(1, -2, 4), (-2, 0, 1)⟩ = 1·(-2) + (-2)·0 + 4·1 = 2
⟨v, u⟩ = ⟨(-2, 0, 1), (1, -2, 4)⟩ = (-2)·1 + 0·(-2) + 1·4 = 2 = ⟨u, v⟩
⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩    Exercise: Check this    (71)
⟨cu, v⟩ = ⟨(c, -2c, 4c), (-2, 0, 1)⟩ = -2c + 0 + 4c = 2c = c ⟨u, v⟩
⟨u, cv⟩ = c ⟨u, v⟩    Exercise: Check this
√⟨u, u⟩ = √(1² + (-2)² + 4²) = √21 = |u|


The properties demonstrated in this example always hold. The property ⟨u, v⟩ = ⟨v, u⟩ is called symmetry. The third property is termed linearity in the first argument (the two arguments are the vectors on either side of the comma). Exercise: Show that ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩ for the vectors in the example above. This means that the inner product is linear in the second argument as well as the first. It is, therefore, bilinear. Hardish exercise (used later on): Show that (additivity property)

⟨v1 + v2 + .... + vn, w⟩ = ⟨v1, w⟩ + ⟨v2, w⟩ + ..... + ⟨vn, w⟩    (72)

HINT: We can write v1 + v2 + .... + vn = v1 + (v2 + .... + vn). The sum s = v2 + .... + vn is just a single vector when we perform the addition. Then we can apply the third rule in (71). Repeat the procedure by taking out v2 from the sum s to form a new sum: s2 = v3 + .... + vn. Keep going until the new sum has only the term vn.

Example 11.4 We can define other inner products on the Rn spaces. To fix ideas, let's take vectors u = (u1, u2, u3) and v = (v1, v2, v3) in R3. The following defines an inner product

Standard inner product:  ⟨u, v⟩ = u1 v1 + u2 v2 + u3 v3
New:  ⟨u, v⟩ = w1 u1 v1 + w2 u2 v2 + w3 u3 v3

The new and standard inner products are the same except for the numbers w1, w2 and w3 multiplying the first, second and third terms in the sum respectively. These numbers are called weights. This is an example of a weighted inner product.

A vector space on which we can define an inner product is called an inner product space. The inner product has to satisfy the rules (70) when we swap the dot product for the inner product. We are mainly interested in the vector spaces Rn with the standard inner product, i.e. the dot product.

So how do we measure the magnitude of a vector? In the last computation in example 11.3 you saw that √⟨u, u⟩ is the magnitude of u. Before we go on to define the magnitude in general we are going to rename it. We will not say the magnitude of u but will instead say the norm of u. Moreover, we will not write the norm (magnitude) as |u|; instead we will write it as ‖u‖. A norm can be defined without reference to an inner product. However, we are interested in inner product spaces and the inner product allows us to define a norm as


‖u‖ = √⟨u, u⟩

Example 11.5 In the Euclidean spaces with the standard inner product, the norm induced by the inner product is

‖u‖ = √⟨u, u⟩ = √(u1² + u2² + ... + un²)

Note that for this space the norm ‖u‖ is identical to the magnitude |u|

Example 11.6 Find the norms of the vectors u = (3, 4) and v = (2, -1, 2, -3) using the standard inner product

‖u‖ = √⟨u, u⟩ = √(3² + 4²) = 5    (73)
‖v‖ = √⟨v, v⟩ = √(2² + (-1)² + 2² + (-3)²) = √18 = 3√2

Example 11.7 In the Euclidean spaces, the norm induced by the standard inner product satisfies certain properties. For example, for all vectors u = (u1, u2, u3) in R3

‖u‖ = √⟨u, u⟩ = √(u1² + u2² + u3²) = 0 if and only if u = 0    (74)
‖cu‖ = |c| ‖u‖ for any scalar c

Exercise: For u = (1, 2, 2), check that ‖2u‖ = 2 ‖u‖ = 6


All norms must satisfy these properties. Next we must find a way to compute distances between vectors. In R2 and R3, the distance between u and v is given by |u − v|, i.e., the magnitude of the difference. For a general inner product space we have

The distance between two vectors u and v is given by the metric (also called distance function)

d(u, v) = ‖u − v‖ = √⟨u − v, u − v⟩

Example 11.8 In the Euclidean spaces with the standard inner product, the metric is

d(u, v) = ‖u − v‖ = √⟨u − v, u − v⟩ = √((u1 − v1)² + (u2 − v2)² + ... + (un − vn)²)

Note that for this space, ‖u − v‖ is identical to |u − v|. Exercise: Try to show that

d(u, v) = d(v, u)

HINT: (a − b)² = (b − a)² for any scalars a and b.


Example 11.9 Calculate the metric for u = (3, 4, 1, -1) and v = (2, -1, 2, -3)

d(u, v) = ‖u − v‖ = √((3 − 2)² + (4 + 1)² + (1 − 2)² + (−1 + 3)²) = √31
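Norms and metrics of this kind are one-liners with numpy. The following sketch (with simple vectors chosen for illustration, not those from the examples) checks the induced norm and the symmetry of the distance function.

```python
import numpy as np

# Norm induced by the standard inner product (the dot product).
u = np.array([3., 4.])
assert np.sqrt(np.dot(u, u)) == 5.0        # ||u|| = sqrt(<u,u>) = sqrt(9 + 16)

v = np.array([1., 2., 2.])
assert np.linalg.norm(v) == 3.0            # sqrt(1 + 4 + 4)

# The metric d(a, b) = ||a - b|| and its symmetry d(a, b) = d(b, a)
a = np.array([1., 2., 3.])
b = np.array([4., 6., 3.])
d_ab = np.linalg.norm(a - b)
d_ba = np.linalg.norm(b - a)
assert d_ab == d_ba == 5.0                 # sqrt(3^2 + 4^2 + 0^2)
print(d_ab)
```
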

Exercise: Check that d(u, v) = d(v, u), i.e., ‖u − v‖ = ‖v − u‖. Recall that in R2 and R3, two vectors are at right angles if u · v = 0 because u · v = |u| |v| cos θ. We say that these vectors are orthogonal. In direct analogy, for a general inner product space, we say that

u and v are orthogonal if ⟨u, v⟩ = 0

Example 11.10 The standard basis vectors in Rn are orthogonal to each other with the standard inner product. For example

⟨(1, 0, 0), (0, 1, 0)⟩ = 0,    ⟨(0, 1, 0), (0, 0, 1)⟩ = 0

and so on (remember these are just dot products).

Now, suppose that W is a subspace of an inner product space V. We say that a vector u from V is orthogonal to W if it is orthogonal to every vector in W. The set of all vectors that are orthogonal to W is called the orthogonal complement of W and is denoted by W⊥ ("W perp").


Example 11.11 Consider the space R3 with the standard basis. Let W be the subspace of R3 consisting of all vectors that lie in the xy plane, i.e., of the form q = (q1, q2, 0), for any scalars q1 and q2. The orthogonal complement of W will be all vectors u in R3 that are orthogonal to every vector in W, that is

⟨u, q⟩ = ⟨(u1, u2, u3), (q1, q2, 0)⟩ = 0

For this to be true for any choice of u1, u2, u3, q1 and q2, we must have u1 = u2 = 0. It doesn't matter what u3 is because the third component of q is always zero. So, we are looking at vectors of the form (0, 0, u3). These are vectors in the direction of e3. The span of e3 is all linear combinations of e3, which means vectors of the form c e3 = (0, 0, c) for any c. Therefore

W⊥ = span{e3}

Armed with the definition of orthogonal complement, let's briefly revisit the fundamental subspaces of a matrix. There is actually another one. It is associated with the transpose of the matrix.

Example 11.12 Consider the 3 × 3 matrix A and its transpose AT

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},    AT = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}    (75)


To get AT, we swap the columns for the rows. The 3 column vectors of A are

c1 = (a11, a21, a31),    c2 = (a12, a22, a32),    c3 = (a13, a23, a33)

These are also the 3 row vectors of AT. It follows that

Finding a basis for the column space of A is equivalent to finding a basis for the row space of AT.

Now consider the procedure for multiplying a vector u = (u1 , u2 , u3 ) by AT

AT u = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} u1 \\ u2 \\ u3 \end{pmatrix} = \begin{pmatrix} a_{11} u1 + a_{21} u2 + a_{31} u3 \\ a_{12} u1 + a_{22} u2 + a_{32} u3 \\ a_{13} u1 + a_{23} u2 + a_{33} u3 \end{pmatrix} = \begin{pmatrix} ⟨u, c1⟩ \\ ⟨u, c2⟩ \\ ⟨u, c3⟩ \end{pmatrix}    (76)

Suppose that the vector u is in the null space of AT, i.e., ker(AT). Then AT u = 0 which, looking at equation (76), means that ⟨u, ci⟩ = 0 for i = 1, 2, 3 (u is orthogonal to every one of the column vectors of A). Let v be any vector in col(A), i.e., all linear combinations of c1, c2 and c3. Then v has the form v = a1 c1 + a2 c2 + a3 c3 for some numbers a1, a2 and a3. This gives


⟨u, v⟩ = ⟨u, a1 c1 + a2 c2 + a3 c3⟩ = a1 ⟨u, c1⟩ + a2 ⟨u, c2⟩ + a3 ⟨u, c3⟩ = 0 + 0 + 0 = 0    (77)

which means that u is orthogonal to any vector v in col(A). We have demonstrated that if u is in ker(AT), it must also be in the orthogonal complement of col(A), written as col(A)⊥. Now suppose that u is in col(A)⊥. Then it is orthogonal to every vector in col(A), in particular, to the individual column vectors c1, c2 and c3. From equation (76) we then see that AT u = 0, so u is in ker(AT). We have demonstrated that if u is in col(A)⊥, it must also be in ker(AT). Combining this with the previous result, we conclude that col(A)⊥ and ker(AT) are the same thing! We also know that col(A) and the range of A are the same. Therefore

ker(AT) = col(A)⊥ = range(A)⊥

(HARD) Exercise: Use similar arguments to show that

ker(A) = row(A)⊥

ker(AT) is the fourth fundamental subspace, called the left null space or cokernel.
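This relationship is easy to test numerically. The sketch below (an illustrative made-up matrix, with a basis for ker(AT) read off the singular value decomposition) checks that every vector in ker(AT) is orthogonal to every column of A.

```python
import numpy as np

# Arbitrary 4 x 3 matrix for illustration (rank 2 by construction:
# column 3 = column 1 + column 2)
A = np.array([[1., 0., 1.],
              [2., 1., 3.],
              [0., 1., 1.],
              [1., 1., 2.]])

r = np.linalg.matrix_rank(A)

# Left singular vectors for the zero singular values span ker(A^T)
U, _, _ = np.linalg.svd(A)
left_null = U[:, r:]                      # basis vectors as columns

# They really are in ker(A^T) ...
assert np.allclose(A.T @ left_null, 0.)

# ... and orthogonal to every column of A, i.e. to all of col(A)
assert np.allclose(left_null.T @ A, 0.)
print(left_null.shape)   # two basis vectors for ker(A^T) in R^4
```
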


12 Orthogonal and orthonormal bases

We now come back to the issue of basis. Recall that B is a basis for a vector space V if every vector in V can be written as a linear combination of the vectors in B and the vectors in B are linearly independent (none of them is a linear combination of the others). If B is a basis for V and, furthermore, the space V has an inner product (i.e. it is an inner product space), we can turn B into a special type of basis. This new basis will have important and very useful properties. Before showing you how to construct it, you will need to understand a few basic concepts.

Let S be a set of vectors in an inner product space. If each distinct pair of vectors is orthogonal we call S an orthogonal set. If S is an orthogonal set and each vector in S has a norm of 1, then S is called an orthonormal set.

Example 12.1 Given the vectors v1 = (2, 0, -1), v2 = (0, -1, 0) and v3 = (2, 0, 4) in R3 (a) Show that they form an orthogonal set with the standard inner product but do not form an orthonormal set. (b) Turn them into an orthonormal set u1, u2 and u3. (a) To show that they form an orthogonal set, we have to demonstrate that each distinct pair is orthogonal.


⟨v1, v2⟩ = 2·0 + 0·(-1) + (-1)·0 = 0
⟨v1, v3⟩ = 2·2 + 0·0 + (-1)·4 = 0    (78)
⟨v2, v3⟩ = 0·2 + (-1)·0 + 0·4 = 0

Exercise: Why didn't we compute ⟨v2, v1⟩, ⟨v3, v1⟩ and ⟨v3, v2⟩? Now, to be an orthonormal set, the norms (magnitudes) of v1, v2 and v3 have to be 1. Let's compute them

‖v1‖ = √⟨v1, v1⟩ = √(2² + 0² + (-1)²) = √5
‖v2‖ = √⟨v2, v2⟩ = √(0² + (-1)² + 0²) = 1    (79)
‖v3‖ = √⟨v3, v3⟩ = √(2² + 0² + 4²) = √20 = 2√5

(b) Most of the work is done. All we have to do is divide each vector by its norm

u1 = v1/‖v1‖ = (1/√5)(2, 0, -1) = (2/√5, 0, -1/√5)
u2 = v2/‖v2‖ = (0, -1, 0)    (80)
u3 = v3/‖v3‖ = (1/(2√5))(2, 0, 4) = (1/√5, 0, 2/√5)

Exercise: Verify that the norms of these vectors are 1 and that they are orthogonal.
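The same normalisation step can be checked in a few lines of numpy. The set below is a made-up orthogonal set (illustrative, not the vectors from example 12.1); after dividing by the norms, the matrix of pairwise inner products should be the identity.

```python
import numpy as np

# An orthogonal (but not orthonormal) set in R^3 -- illustrative choice
v1 = np.array([1., 1., 0.])
v2 = np.array([1., -1., 0.])
v3 = np.array([0., 0., 3.])

# Pairwise inner products are zero ...
for a, b in [(v1, v2), (v1, v3), (v2, v3)]:
    assert np.dot(a, b) == 0.0

# ... but the norms are not all 1, so the set is not orthonormal
S = np.array([v1, v2, v3])
print(np.linalg.norm(S, axis=1))          # [sqrt(2), sqrt(2), 3]

# Divide each vector by its norm to get an orthonormal set
U = S / np.linalg.norm(S, axis=1, keepdims=True)

# Check: the matrix of pairwise inner products is the identity
assert np.allclose(U @ U.T, np.eye(3))
```
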

Example 12.2 The standard basis vectors in Rn form an orthonormal set with the standard inner product. For example, e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) in R3. Exercise: Compute the norms of these vectors and their pairwise inner products to show that they form an orthonormal set.

There is a special property of orthogonal/orthonormal sets that will come in very handy

If S is an orthogonal set of vectors in an inner product space, then S is also a set of linearly independent vectors

How can we show this? Let S = {v1, v2, ...., vn} be the set of vectors in question. We know they are orthogonal. Let's recall the definition of linear independence: The vectors v1, v2, ...., vn are linearly independent if the only way to get

c1 v1 + c2 v2 + .... + cn vn = 0    (81)

is by having all the numbers c1, c2, ..., cn equal to zero. This is equivalent to saying that no vector can be a linear combination of the others. Let's now take the inner product of both sides of (81) with any of the vectors, let's say v1

⟨c1 v1 + c2 v2 + .... + cn vn, v1⟩ = ⟨0, v1⟩    (82)

The inner product has to satisfy equation (72) (called additivity), which gives us a way to simplify the left hand side of equation (82)


⟨c1 v1 + c2 v2 + .... + cn vn, v1⟩ = c1 ⟨v1, v1⟩ + c2 ⟨v2, v1⟩ + ..... + cn ⟨vn, v1⟩ = c1 ⟨v1, v1⟩    (83)

What happened to all the terms after c1 ⟨v1, v1⟩? Remember that the set S = {v1, v2, ...., vn} is orthogonal. Therefore, the inner product of two distinct vectors is zero, so the only nonzero term in (83) is c1 ⟨v1, v1⟩. The right hand side of equation (82) is obviously zero, so we end up with

c1 ⟨v1, v1⟩ = 0

Now ⟨v1, v1⟩ = ‖v1‖² > 0 unless v1 is the zero vector, which it isn't. Therefore, we must have c1 = 0. If we perform the same procedure with v2 instead of v1, we will get c2 = 0, and so on with all the other scalars. Therefore, the set S is linearly independent. The great thing about having an orthogonal/orthonormal basis for a space V is that we can easily find the coordinates of any vector in V with respect to this basis. Remember that the coordinates are the numbers multiplying the basis vectors in the linear combination: if S = {v1, v2, ...., vn} is the orthogonal/orthonormal basis for V, then any vector u (in V) can be written as u = c1 v1 + c2 v2 + .... + cn vn


Let's take the inner product of both sides with v1 (same as the procedure above)

⟨u, v1⟩ = ⟨c1 v1 + c2 v2 + .... + cn vn, v1⟩ = c1 ⟨v1, v1⟩ + c2 ⟨v2, v1⟩ + ..... + cn ⟨vn, v1⟩ = c1 ⟨v1, v1⟩    (84)

Since we know u and we know v1 we can find c1

c1 = ⟨u, v1⟩ / ⟨v1, v1⟩ = ⟨u, v1⟩ / ‖v1‖²

using the definition of the norm. Similarly

c2 = ⟨u, v2⟩ / ‖v2‖²,    c3 = ⟨u, v3⟩ / ‖v3‖²,    .........    cn = ⟨u, vn⟩ / ‖vn‖²

Therefore, we can write the vector u as

u = (⟨u, v1⟩/‖v1‖²) v1 + (⟨u, v2⟩/‖v2‖²) v2 + .... + (⟨u, vn⟩/‖vn‖²) vn    (85)

If {v1, v2, ....., vn} is an orthonormal basis, then

u = ⟨u, v1⟩ v1 + ⟨u, v2⟩ v2 + .... + ⟨u, vn⟩ vn    (86)
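This "coordinates by inner products" recipe takes one line per coordinate in code. The sketch below uses a made-up orthonormal basis for R2 (a 45-degree rotation of the standard basis, chosen for illustration) and reconstructs a vector from its inner-product coordinates.

```python
import numpy as np

# An orthonormal basis for R^2: the standard basis rotated by 45 degrees
s = 1 / np.sqrt(2)
v1 = np.array([s,  s])
v2 = np.array([-s, s])

u = np.array([3., 1.])

# The coordinates are just inner products: c_i = <u, v_i>
c1, c2 = np.dot(u, v1), np.dot(u, v2)

# The linear combination c1 v1 + c2 v2 reconstructs u exactly
assert np.allclose(c1 * v1 + c2 * v2, u)
print(c1, c2)   # 4/sqrt(2) and -2/sqrt(2)
```
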


Figure 4: The orthogonal projection of a vector u = (2, 2, 1) in R3 on the xy plane W: proj_W u = (2, 2, 0) and v = (0, 0, 1) (example 13.1).

13 Orthogonal projections

We now introduce the idea of orthogonal projections. Let's look at a simple example. Example 13.1 Let's take the vector u = (2, 2, 1) = 2e1 + 2e2 + e3 in R3. We can define a subspace W of R3 as that space with all vectors of the form q = (q1, q2, 0), where q1 and q2 are any scalars. This is nothing more than those vectors in R3 that lie in the xy plane. They are linear combinations of e1 and e2. The orthogonal projection of u on the xy plane W is the vector (2, 2, 0). What is this exactly? Basically, what we do is drop a straight line from the point P in Figure 4 to the xy plane, landing at a point Q. The vector v = QP joining Q to P has to be perpendicular to the xy plane. The only possibility for this is v = (0, 0, 1), i.e., it is parallel to the z axis. The vector OQ is the orthogonal projection. We write it as proj_W u. Notice that it lies in W (the xy plane).


Why do we call it orthogonal? Well, there is the obvious reason that the line we drop from P to Q is perpendicular (orthogonal) to the xy plane. Notice that the vector v is orthogonal to every vector q = (q1, q2, 0) in W (the xy plane):

⟨v, q⟩ = ⟨(0, 0, 1), (q1, q2, 0)⟩ = 0

Therefore, v is in the orthogonal complement W⊥ of W (see example 11.11). The orthogonal projection proj_W u, on the other hand, is in W, and

v + proj_W u = (0, 0, 1) + (2, 2, 0) = (2, 2, 1) = u

What we have managed to do is decompose u into two parts, one in W and the other in W⊥. The two parts are orthogonal to each other. We can get these two parts by splitting the linear combination of orthogonal basis vectors

u = \underbrace{2e1 + 2e2}_{proj_W u \ (in \ W)} + \underbrace{e3}_{v \ (in \ W⊥)}

Finally, we can see from Figure 4 that the length of v is the shortest distance between P and the plane W. If we wanted to approximate the vector u using only the basis vectors in W (e1 and e2), proj_W u would be the best approximation. Now this is all well and good, but what if we have a vector in a general Rn space and we want to approximate it by a vector in a general subspace of Rn? For instance, in the above example, rather than choosing the subspace as the xy

Engineering Analysis SESA 2021

plane we could have chosen another plane, such as 2x + 3y z = 2. We would then have to approximate the vector u by a linear combination of basis vectors that describe this plane in order to obtain the orthogonal projection. Let u be a vector in Rn endowed with the standard inner product. Let W be a subspace of Rn with an orthogonal basis {v1 , v2 , .....vk }, where k n. The orthogonal projection of u on W is given by

proj_W u = (⟨u, v1⟩/‖v1‖²) v1 + (⟨u, v2⟩/‖v2‖²) v2 + .... + (⟨u, vk⟩/‖vk‖²) vk    (87)

If {v1, v2, ..., vk} is an orthonormal basis for W, then

proj_W u = ⟨u, v1⟩ v1 + ⟨u, v2⟩ v2 + .... + ⟨u, vk⟩ vk    (88)

proj_W u is in W and the vector v = (u − proj_W u) is in W⊥. The vectors proj_W u and v are, therefore, orthogonal. The shortest distance between the vector u and the subspace W is the norm (magnitude) of v: ‖v‖ = shortest distance between u and W.

Of all the vectors in the subspace W, the vector proj_W u is the best approximation to u.
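Formula (87) is easy to evaluate directly. Here is a minimal Python sketch (plain lists and the standard inner product; the helper names `dot` and `proj` are my own, not from the notes), applied to example 13.1:

```python
def dot(a, b):
    # standard inner product on R^n
    return sum(x * y for x, y in zip(a, b))

def proj(u, basis):
    # orthogonal projection of u onto span(basis), equation (87);
    # 'basis' must be an orthogonal set of vectors
    p = [0.0] * len(u)
    for w in basis:
        c = dot(u, w) / dot(w, w)   # <u, w> / ||w||^2
        p = [pi + c * wi for pi, wi in zip(p, w)]
    return p

# Example 13.1: project u = (2, 2, 1) onto the xy plane, W = span{e1, e2}
u = [2, 2, 1]
e1, e2 = [1, 0, 0], [0, 1, 0]
p = proj(u, [e1, e2])                    # the part of u in W: (2, 2, 0)
v = [ui - pi for ui, pi in zip(u, p)]    # the part of u in W-perp: (0, 0, 1)
```

Replacing {e1, e2} by any other orthogonal basis gives the projection onto the corresponding subspace.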

Most of these facts are suggested by example 13.1, but we haven't quite shown that they hold in the general case. Let's start with the claim that the vectors v = (u − proj_W u) and proj_W u are orthogonal. To simplify the notation, let's assume that the basis {v1, v2, ..., vk} is orthonormal, i.e., all the vi's have ‖vi‖ = 1. Then

⟨v, proj_W u⟩ = ⟨(u − proj_W u), proj_W u⟩
            = ⟨u, proj_W u⟩ − ⟨proj_W u, proj_W u⟩
            = ⟨u, ⟨u, v1⟩v1 + ⟨u, v2⟩v2 + .... + ⟨u, vk⟩vk⟩ − ⟨proj_W u, proj_W u⟩    (89)
            = ⟨u, v1⟩⟨u, v1⟩ + .... + ⟨u, vk⟩⟨u, vk⟩ − ⟨proj_W u, proj_W u⟩
            = ⟨u, v1⟩² + ⟨u, v2⟩² + .... + ⟨u, vk⟩² − ⟨proj_W u, proj_W u⟩
            = ⟨proj_W u, proj_W u⟩ − ⟨proj_W u, proj_W u⟩ = 0

(the last step uses the fact that, for an orthonormal basis, ⟨proj_W u, proj_W u⟩ = ⟨u, v1⟩² + .... + ⟨u, vk⟩²), so they are indeed orthogonal.

Exercise: Repeat this procedure for an orthogonal (but not orthonormal) basis for W.

Now, proj_W u clearly lies in W by the way it is defined (a linear combination of the basis vectors of W). How do we show that v is in W⊥? If it is, then v is orthogonal to every vector in W. Since every vector in W is a linear combination of the vectors in {v1, v2, ..., vk}, we just need to show that v is orthogonal to each of these basis vectors (why?). Again, let's assume they are orthonormal. We choose any one of them, say v1, and take the inner product

⟨v, v1⟩ = ⟨(u − proj_W u), v1⟩
        = ⟨u, v1⟩ − ⟨proj_W u, v1⟩
        = ⟨u, v1⟩ − ⟨⟨u, v1⟩v1 + ⟨u, v2⟩v2 + .... + ⟨u, vk⟩vk, v1⟩    (90)
        = ⟨u, v1⟩ − [⟨u, v1⟩⟨v1, v1⟩ + ⟨u, v2⟩⟨v2, v1⟩ + .... + ⟨u, vk⟩⟨vk, v1⟩]
        = ⟨u, v1⟩ − ⟨u, v1⟩⟨v1, v1⟩    (since ⟨vi, v1⟩ = 0 for i ≠ 1)
        = ⟨u, v1⟩ − ⟨u, v1⟩ = 0

We can do the same with all the basis vectors.

Exercise: Repeat this procedure for an orthogonal (but not orthonormal) basis for W.

Now onto the statements about shortest distance and best approximation. In R2 and R3 (as you can see in Figure 4), the vector v takes us from the point P to the closest point on the subspace W, because the shortest distance between two points is a straight line! It is essentially this concept that we want to generalise to higher dimensions. Let's restate clearly what we want to do: find the vector in W that gives us the best approximation to a general vector u in Rn. We are claiming that this vector is proj_W u.

Figure 5: Illustration of the Pythagorean theorem, c² = a² + b².

Let's start by choosing any vector w in W that is NOT the same as proj_W u. We can write (a simple mathematical trick that will help us)

u − w = (u − proj_W u) + (proj_W u − w)

The vector (proj_W u − w) is a combination of (basis) vectors in W and so belongs to W itself. We already know that the vector v = u − proj_W u is in W⊥. Therefore (u − proj_W u) and (proj_W u − w) are orthogonal. To proceed, we look at a familiar concept: the Pythagorean theorem for a right triangle, demonstrated in Figure 5. For two vectors a and b in R2 or R3 that are at right angles (orthogonal), the Pythagorean theorem becomes |a + b|² = |a|² + |b|². For two orthogonal vectors a and b in a general inner product space, the equivalent theorem is

‖a + b‖² = ‖a‖² + ‖b‖²


Putting a = (u − proj_W u) and b = (proj_W u − w) in this formula we get

‖u − w‖² = ‖u − proj_W u‖² + ‖proj_W u − w‖² > ‖u − proj_W u‖²    because w ≠ proj_W u

Therefore

‖u − proj_W u‖² < ‖u − w‖²    for all vectors w in W, except w = proj_W u

In turn, this means that: the shortest distance between the vector u and W is the norm (magnitude) of the vector v = u − proj_W u, and proj_W u gives us the best approximation to u by a vector in W.

One final note. We have shown that, given any subspace W of Rn for which we have an orthogonal basis, we can write any vector in Rn as a sum of a vector in W and a vector in W⊥. The dimension of Rn has to be n. Therefore the dimensions (numbers of basis vectors) of W and W⊥ have to sum to n:

dim W + dim W⊥ = n

We have essentially partitioned Rn into the two spaces W and W⊥. We say, therefore, that Rn is the direct sum of W and W⊥. This is written as

Rn = W ⊕ W⊥
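The decomposition u = proj_W u + v into a part in W and a part in W⊥ can be checked numerically. A short Python sketch (my own illustration; the subspace and vector below are chosen for demonstration, not taken from the notes):

```python
def dot(a, b):
    # standard inner product on R^n
    return sum(x * y for x, y in zip(a, b))

def proj(u, basis):
    # equation (87): sum over the orthogonal basis of W of <u,w>/||w||^2 * w
    p = [0.0] * len(u)
    for w in basis:
        c = dot(u, w) / dot(w, w)
        p = [pi + c * wi for pi, wi in zip(p, w)]
    return p

# W = span{(2,1,0), (1,-2,5)}, an orthogonal pair; u is a vector in R^3
u = [3, -7, 1]
basis_W = [[2, 1, 0], [1, -2, 5]]
p = proj(u, basis_W)                     # the part of u in W
v = [ui - pi for ui, pi in zip(u, p)]    # the part of u in W-perp

# u = p + v, with v orthogonal to every basis vector of W
assert all(abs(pi + vi - ui) < 1e-9 for pi, vi, ui in zip(p, v, u))
assert all(abs(dot(v, w)) < 1e-9 for w in basis_W)
```

The residual part v here comes out as (8/3)(1, −2, −1), the same vector that appears in example 14.2 below.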



14   The Gram-Schmidt process

The first important application of orthogonal projections is the Gram-Schmidt process. We will meet another in the next section. Suppose we have an arbitrary (non-orthogonal) basis for Rn. The basis vectors are linearly independent, but we would prefer them to be orthogonal too. It turns out that this is always possible: given n linearly independent vectors for Rn, we can turn them into an orthogonal basis. The way we do this is called the Gram-Schmidt process. We will illustrate the procedure using two examples. Throughout this section, the standard inner product on Rn will be assumed.

Figure 6: An illustration of the Gram-Schmidt process in R2 (see example 14.1). The orthogonal basis u1 = v1, u2 = v2 − proj_W v2 (the new basis) is constructed from the old basis v1, v2.

Example 14.1 Consider the basis S = {v1, v2} for R2, where v1 = (3, 1) and v2 = (2, 2). This may not be a particularly convenient basis, unlike the standard basis, where the vectors e1 and e2 are perpendicular (orthogonal). We've seen how easy it is in that case to write down the coordinates of a general vector. What if we could turn these linearly independent vectors into an orthogonal basis {u1, u2}? It would then resemble the standard basis, but with a different orientation. In fact we can! We start by putting u1 = v1. This will be our first orthogonal basis vector. We now need to construct a second vector that is orthogonal to u1. We could do this by simply looking at Figure 6 and making the simple observation that in order for u2 to be orthogonal to u1 = v1, it must be of the form u2 = t(−1, 3) for any number t. However, we want a systematic way of doing it, because there are generally many more than two basis vectors. Let's form the subspace W = span{u1} of R2. W is the set of all linear combinations (in this case, multiples) of u1. We can project the vector v2 from the original basis S onto W. This orthogonal projection, proj_W v2, is shown in Figure 6. It is nothing more than the component of v2 that points in the direction of u1. It is in the space W because it is a multiple of u1. In the previous section we saw that any vector u in Rn can be written as a sum of two vectors: (i) the projection proj_W u of u onto a subspace W (which has an orthogonal basis) and (ii) the vector u − proj_W u in W⊥. We can apply this here by putting u = v2 and W = span{u1}. The vector v2 − proj_W v2 is orthogonal to every vector in W = span{u1}, in particular to u1. So we set u2 = v2 − proj_W v2 to get a vector orthogonal to u1 = v1. By equation (87), the formula for the projection is

proj_W v2 = (⟨v2, u1⟩/‖u1‖²) u1    (only one basis vector u1 in W)    (91)



This gives

u2 = v2 − (⟨v2, u1⟩/‖u1‖²) u1 = v2 − (4/5) u1 = (2/5)(−1, 3)    (92)

Exercise: Check that u1 and u2 are orthogonal.
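The calculation in example 14.1 takes only a couple of lines. A quick Python check (a sketch; `dot` is an ordinary dot-product helper of my own):

```python
def dot(a, b):
    # standard inner product on R^n
    return sum(x * y for x, y in zip(a, b))

v1, v2 = [3, 1], [2, 2]
u1 = v1                                # step 1: keep the first vector
c = dot(v2, u1) / dot(u1, u1)          # <v2,u1>/||u1||^2 = 8/10 = 4/5
u2 = [v2i - c * u1i for v2i, u1i in zip(v2, u1)]   # u2 = v2 - proj_W v2

# u2 is (2/5)(-1, 3), and dot(u1, u2) vanishes, so u1 and u2 are orthogonal
```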

Example 14.2 Given the basis v1 = (2, 1, 0), v2 = (1, 0, 1) and v3 = (3, −7, 1), find an orthogonal basis {u1, u2, u3} for R3. As in the last example, we set u1 = v1 and form the subspace W1 = span{u1} of R3. Again, we project v2 on W1 to get the component of v2 in the direction of u1. The component of v2 in W1⊥ then gives us u2:

u2 = v2 − proj_W1 v2 = v2 − (⟨v2, u1⟩/‖u1‖²) u1 = (1, 0, 1) − (2/5)(2, 1, 0) = (1/5)(1, −2, 5)    (93)

(here ⟨v2, u1⟩ = 2 and ‖u1‖² = 5)

Now what? We repeat the previous steps. We want a vector orthogonal to both u1 and u2. The subspace W2 = span{u1, u2} of R3 consists of all linear combinations of u1 and u2. Our task is, therefore, to find a vector u3 that lies in W2⊥, the subspace of all vectors that are orthogonal to both u1 and u2. So let's project v3 on W2 to get a vector proj_W2 v3 in the subspace W2. Then, the vector v = v3 − proj_W2 v3 lies in W2⊥. The vector proj_W2 v3 is again given by equation (87), so u3 is

u3 = v3 − proj_W2 v3 = v3 − [(⟨v3, u1⟩/‖u1‖²) u1 + (⟨v3, u2⟩/‖u2‖²) u2]
   = v3 − [(−1/5) u1 + ((22/5)/(6/5)) u2]    (94)
   = (8/3)(1, −2, −1)

Exercise: Check that u1, u2 and u3 are mutually orthogonal.

Exercise: How do we know that the orthogonal set of vectors we have constructed in this example, {u1 , u2 , u3 }, is actually a basis for R3 ? In other words, does this set of vectors span the whole of R3 ? HINT: Think about (i) the number of basis vectors required to span Rn , and (ii) the relationship between linear independence and orthogonality. In the two examples above we have developed a procedure for turning a general basis into an orthogonal basis. We now summarise the procedure.


Let v1, v2, ..., vn be a set of linearly independent vectors in Rn. Then an orthogonal basis u1, u2, ..., un for Rn can be found by the following Gram-Schmidt process.

Step 1.

u1 = v1

Step 2.

u2 = v2 − (⟨v2, u1⟩/‖u1‖²) u1

Step 3.

u3 = v3 − [(⟨v3, u1⟩/‖u1‖²) u1 + (⟨v3, u2⟩/‖u2‖²) u2]

. . . . . .

Step n.

un = vn − [(⟨vn, u1⟩/‖u1‖²) u1 + (⟨vn, u2⟩/‖u2‖²) u2 + ...... + (⟨vn, un−1⟩/‖un−1‖²) un−1]
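The n-step procedure above translates directly into code. Here is a Python sketch (the function names `gram_schmidt` and `normalise` are my own), checked against the vectors of example 14.2:

```python
def dot(a, b):
    # standard inner product on R^n
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vs):
    # Turn linearly independent vectors vs into an orthogonal basis.
    # Step k: u_k = v_k - sum over j < k of (<v_k,u_j>/||u_j||^2) u_j
    us = []
    for v in vs:
        u = list(map(float, v))
        for w in us:
            c = dot(v, w) / dot(w, w)
            u = [ui - c * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us

def normalise(us):
    # divide each vector by its norm to get an orthonormal basis
    return [[x / dot(u, u) ** 0.5 for x in u] for u in us]

# Example 14.2: expect u1 = (2,1,0), u2 = (1/5)(1,-2,5), u3 = (8/3)(1,-2,-1)
u1, u2, u3 = gram_schmidt([[2, 1, 0], [1, 0, 1], [3, -7, 1]])
ws = normalise([u1, u2, u3])   # the orthonormal basis of example 14.3
```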



Note that to obtain an orthonormal basis from the new orthogonal basis, we simply divide each member of the orthogonal basis by its norm.

Example 14.3 Convert the orthogonal basis found in example 14.2 into an orthonormal basis {w1, w2, w3}. The orthogonal basis is

u1 = (2, 1, 0)    u2 = (1/5)(1, −2, 5)    u3 = (8/3)(1, −2, −1)

We compute

‖u1‖ = √5    ‖u2‖ = √30/5    ‖u3‖ = 8√6/3

which yields

w1 = u1/‖u1‖ = (1/√5)(2, 1, 0)
w2 = u2/‖u2‖ = (1/√30)(1, −2, 5)
w3 = u3/‖u3‖ = (1/√6)(1, −2, −1)

Exercise: (a) Given the basis v1 = (1, 1, 1, 1), v2 = (1, 1, 1, 0), v3 = (1, 1, 0, 0) and v4 = (1, 0, 0, 0) for R4 , construct an orthogonal basis for R4 . (b) Convert the orthogonal basis found into an orthonormal basis.



15   Least squares approximations

We now come to a second important application of orthogonal projections. Remember the temperature data example 1.2, in which we wanted to fit a line to the data but ended up with a system that had more equations than unknowns? This type of system is called overdetermined. The other way round, when we have more unknowns than equations, the system is called underdetermined. Systems like these typically have no exact solution, in which case they are called inconsistent. Suppose that we have an inconsistent system of n equations in m unknowns. In matrix form, the system is:

Au = b

for an n × m matrix A and a vector b in Rn. There is no solution u (in Rm) to this equation, i.e., there is no vector u such that Au = b. Perhaps, however, we can look for a vector u such that Au will be close to b. To this end, let's define a residual r as follows:

r = b − Au    (95)

r is obviously a vector (both b and Au are vectors). It is a measure of how close a vector u is to satisfying the equation. What we do is look for the vector u that makes the norm (magnitude) of r as small as possible. This leads to the least squares solution.



Given an inconsistent system Au = b, the vector ul that makes ‖r‖ = ‖b − Aul‖ as small as possible is called the least squares solution.

Okay, this has given us some sort of criterion, but how exactly do we find this vector ul? Recall example 10.1, in which it was shown that the multiplication of a vector by a matrix results in a linear combination of the matrix column vectors, i.e., all outputs Au are in col(A), the column space. Now put W = col(A). Au will be in W for any u. Indeed, the set of outputs Au for all choices of u in Rm will span W; any linear combination of the column vectors is possible for the right choice of u = (u1, u2, ..., um). Therefore:

range(A) = col(A) = W

At this point let's state the least squares problem in a different way: Given an inconsistent system Au = b for some b in Rn, find the vector Aul in W = col(A) that is the closest approximation to b, i.e., ‖b − Aul‖ < ‖b − Au‖ for all possible choices of Au.

Let's restate a result from section 13 on orthogonal projections. Suppose W is a subspace of Rn and x is a vector in Rn. The closest approximation to x by a vector in W is given by proj_W x, the orthogonal projection of x on W. proj_W x is in W by definition and x − proj_W x is in W⊥. Let's put W = col(A) and swap x for Au (all vectors in the column space (range) of A). Then the closest approximation to b by a vector in W is proj_W b. This is the vector Aul that we want:

Aul = proj_W b

We could find Aul this way and invert the result to find ul, but there is a better way to solve the problem. The least squares solution ul to the problem Au = b also satisfies the normal system:

AᵀA ul = Aᵀb    (96)

This system is always consistent. If the equation Ax = 0 has only the trivial solution x = 0, the least squares problem has the unique solution

ul = (AᵀA)⁻¹ Aᵀb

Before we move on to an example, let's see why the above statements are true. We've determined that Aul = proj_W b, which is a vector in W = col(A). We can always find ul = (u1, u2, ..., um), with the right choice of coordinates, such that Aul gives us the projection vector we want. This means we always have a solution.


The residual, given by equation (95), satisfies:

r = b − Aul = b − proj_W b

The vector on the right-hand side is in W⊥ = col(A)⊥, as stated above. In example 11.12 we showed that for a matrix A, col(A)⊥ is the same as ker(Aᵀ), the null space of Aᵀ. So, for the least squares solution, the residual is in ker(Aᵀ), which means that Aᵀr = 0, i.e.,

Aᵀr = Aᵀ(b − Aul) = 0

or

Aᵀb = AᵀA ul

For uniqueness, suppose Ax = 0 has only the trivial solution and that we had two least squares solutions ul and ul′. Then we would have

Aul = Aul′   or   A(ul − ul′) = 0

But this has only the solution (ul − ul′) = 0, so ul = ul′. In other words, we cannot have two distinct solutions.
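The normal system (96) can be assembled and solved directly. A minimal Python sketch (no libraries; the helper names are my own), fitting a line y = ax + b to three hypothetical data points:

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    # ordinary matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def normal_system(A, b):
    # build A^T A and A^T b of equation (96)
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = matmul(At, [[bi] for bi in b])
    return AtA, [row[0] for row in Atb]

def solve2(M, r):
    # Cramer's rule for the 2x2 system M x = r
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - r[0] * M[1][0]) / det]

# fit y = a*x + b to the points (0, 1), (1, 3), (2, 4): overdetermined
A = [[0, 1], [1, 1], [2, 1]]
b = [1, 3, 4]
AtA, Atb = normal_system(A, b)
a_ls, b_ls = solve2(AtA, Atb)   # least squares slope and intercept
```

Here the normal system comes out as [[5, 3], [3, 3]] (a, b)ᵀ = (11, 8)ᵀ, giving slope 3/2 and intercept 7/6.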

Example 15.1 Use a least squares approximation to find the equation of the line that best approximates the points (x, y) = (2, 65), (−1, 20), (7, 105) and (−5, −34). The line will have the form y = ax + b. If we put each of the x and y values into y = ax + b, we will get 4 equations for the 2 unknowns a and b (clearly too many!). The system is overdetermined. It is written in matrix form as follows:

[  2  1 ]           [  65 ]
[ −1  1 ]   [ a ]   [  20 ]
[  7  1 ]   [ b ] = [ 105 ]
[ −5  1 ]    (u)    [ −34 ]
   (A)                (b)         (97)

The normal system (96) for the least squares solution is given by multiplying both sides by the transpose of A:

[ 2 −1  7 −5 ] [  2  1 ]           [ 2 −1  7 −5 ] [  65 ]
[ 1  1  1  1 ] [ −1  1 ]   [ a ]   [ 1  1  1  1 ] [  20 ]
     (Aᵀ)      [  7  1 ]   [ b ] =      (Aᵀ)      [ 105 ]
               [ −5  1 ]    (u)                   [ −34 ]
                  (A)                               (b)         (98)

which leads to a much simpler equation

[ 79  3 ] [ a ]   [ 1015 ]
[  3  4 ] [ b ] = [  156 ]        (99)

This is an easy system to solve. The answer is a = 3592/307 ≈ 11.7 and b = 9279/307 ≈ 30.2.
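We can verify the answer by solving the normal system (99) with Cramer's rule (a quick Python check):

```python
# normal system (99): [[79, 3], [3, 4]] (a, b)^T = (1015, 156)^T
M = [[79, 3], [3, 4]]
r = [1015, 156]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]    # 79*4 - 3*3 = 307
a = (r[0] * M[1][1] - M[0][1] * r[1]) / det    # (4060 - 468)/307
b = (M[0][0] * r[1] - r[0] * M[1][0]) / det    # (12324 - 3045)/307
print(round(a, 1), round(b, 1))                # 11.7 30.2
```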

Exercise: Find the least squares solution to the following system:

[ 2  1  1 ]           [ 4 ]
[ 1  5  2 ]   [ a ]   [ 2 ]
[ 1  4  5 ]   [ b ] = [ 3 ]
[ 1  1  1 ]   [ c ]   [ 1 ]
    (A)        (u)     (b)        (100)

