
Vector Algebra and Elements of Linear Algebra

Werner Stulpe
Contents

1 Vector Algebra and Geometry 1


1.1 Points, Vectors, and Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 The Vector Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Straight Lines and Planes in Space . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2 Elements of Linear Algebra 26


2.1 Systems of Linear Equations I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Linear Independence, Bases, and Dimension . . . . . . . . . . . . . . . . . . . . . 34
2.4 Linear Maps and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5 Kernel, Image, and Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6 Systems of Linear Equations II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.7 Remarks on the Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.8 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.9 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

References 80

Chapter 1

Vector Algebra and Geometry

1.1 Points, Vectors, and Coordinate Systems


Since the elementary concept of vectors is closely related to geometry, we recall some fundamental
geometrical notions intuitively. Everything happens somewhere in space; we denote the latter
by P3 . The space P3 consists of points P ; set-theoretically speaking, the points are the elements
of the set P3, P ∈ P3. Points have no extension and no dimension. A line segment PQ with
the end points P and Q, P ≠ Q, is realized by the "shortest" junction of P and Q; we have
PQ = QP ⊂ P3. A straight line L through the points P and Q is an infinite extension of the
line segment PQ, PQ ⊂ L ⊂ P3. We assume that we know what the distance d ≥ 0 between
two points P and Q is (resp., the length l > 0 of the line segment PQ where P ≠ Q). Moreover,
we take the concept of the angle α between two straight lines, half-lines (rays), or line segments
for granted, where 0° ≤ α ≤ 180°, respectively, 0 ≤ α ≤ π. Finally, according to our experience
from everyday life we say that the space P3 is three-dimensional. The set P3 with the indicated
structures is called the three-dimensional affine-Euclidean space. It is rather obvious that in
P3 the Pythagoras theorem for right triangles holds; later we shall prove this characteristic
statement on the basis of vectors.
A plane H ⊂ P3 is determined by two different intersecting or parallel straight lines or by
three (different) points not lying on one straight line. In P3, two distinct non-parallel straight
lines in general do not intersect. To study plane figures and curves like triangles, quadrilaterals,
polygons, circles, discs, etc., it is sufficient to consider these as subsets of one plane that is kept
fixed, distinguished by definition, and called the two-dimensional affine-Euclidean space P2 ; the
corresponding aspect of geometry is called plane geometry. Spatial geometry moreover deals
with the study of solids, (curved) surfaces, and spatial curves, like cubes, balls, spheres, helices,
etc., where the spatial objects are considered as subsets of P3.

Definition 1.1

(a) A vector ~a = ~PQ is given by an ordered pair of points P, Q ∈ P, where P denotes P3 or P2.
Two pairs of points represent the same vector, i.e.,

~a = ~PQ = ~RS,

if and only if the quadrilateral PQSR is a (possibly degenerate) parallelogram (Figure 1.1).
A vector can also be seen as a quantity determined by its length (magnitude) and its
(oriented) direction. In any case a vector can be represented by an arrow.

(b) We denote the set of all vectors ~a acting in P3 by E3; thus, ~a ∈ E3. Analogously, if ~a acts
in P2, ~a ∈ E2. The symbol E stands for E3 or E2.

Figure 1.1: Two representatives of the same vector

(c) The vector ~0 := ~PP, P ∈ P, is called the zero vector; in general the arrow is
omitted, i.e., ~0 =: 0. The inverse of ~a ∈ E, ~a = ~PQ, is defined to be the vector −~a := ~QP.

(d) The length of a vector ~a is denoted by |~a|. A vector ~e is called a unit vector if |~e| = 1.

(e) We have the following algebraic operations with vectors:

(i) The addition of two vectors ~a, ~b ∈ E results in the sum ~a + ~b ∈ E that is defined
according to the parallelogram law, respectively, by

~a + ~b = ~PQ + ~QR := ~PR

where ~a = ~PQ and ~b = ~QR (Figure 1.2).
(ii) The (scalar) multiplication of a vector ~a by a real number λ ∈ R results in the product
λ~a ∈ E that is defined to be the vector of length λ|~a| in the direction of ~a if λ > 0,
resp., the vector of length |λ||~a| in the opposite direction of ~a if λ < 0, resp., the zero
vector if λ = 0 (Figure 1.3).

(f) The sets E3 and E2 , equipped with the structures of addition of vectors, multiplication
of vectors by numbers, length of a vector, and angle between two vectors, are called the
three- and the two-dimensional Euclidean vector space, respectively.

Figure 1.2: Addition of two vectors
Figure 1.3: Multiplication of a vector by a number
The algebraic operations with vectors satisfy a number of rules, summarized in the following
theorem.

Theorem 1.2 Let ~a, ~b, ~c ∈ E be vectors and λ, µ ∈ R real numbers. The following statements
hold:

(a) Vector-space axioms:

(i) ~a + ~b = ~b + ~a commutative law


(ii) (~a + ~b) + ~c = ~a + (~b + ~c) associative law
(iii) ~a + ~0 = ~a zero vector
(iv) ~a + (−~a) = ~0 inverse vector
(v) λ(~a + ~b) = λ~a + λ~b
distributive laws
(vi) (λ + µ)~a = λ~a + µ~a
(vii) λ(µ~a) = (λµ)~a =: λµ~a mixed associative law
(viii) 1~a = ~a

(b)

(i) ~a + ~x = ~b ⇐⇒ ~x = ~b + (−~a) =: ~b − ~a
(ii) λ~x = ~0 ⇐⇒ λ = 0 or ~x = ~0
(iii) (−1)~a = −~a

(c)

(i) |~a| ≥ 0
(ii) |~a| = 0 ⇐⇒ ~a = ~0
(iii) |−~a| = |~a|
(iv) |λ~a| = |λ||~a|
(v) |~a/|~a|| = 1 for ~a ≠ ~0.

All these rules are obvious consequences of our geometrical definition of vectors and their
algebraic operations. The statements of (a) are called the axioms of vector space since they have
a more general meaning in mathematics and are the basis of many important conclusions, as we
shall see in Chapter 2.
To understand rule (v) of (c), we have to define the division of a vector by a number λ ≠ 0,
namely, ~a/λ := (1/λ)~a. Rule (v) then means that every nonzero vector ~a divided by its length gives
a unit vector ~e in the direction of ~a, ~e = ~a/|~a|; equivalently, ~a = |~a|~e. Note that division by
a vector is not defined.

Proof of 1.2: The vector-space axioms can only be justified by geometrical evidence,
whereas the statements of group (b) are implied by (a). The associative law is clear from Figure
1.4. The distributive law (v) follows from Figure 1.5; in fact, we have that ~AC = λ~a + λ~b as well
as ~AC = λ(~a + ~b), hence, λ(~a + ~b) = λ~a + λ~b. All the other vector-space axioms are obvious.
To show the first of (b), consider the equation

~a + ~x = ~b (1.1)

and add −~a to both sides:


−~a + (~a + ~x) = −~a + ~b.

Figure 1.4: Associative law
Figure 1.5: Distributive law

Taking account of rules (ii), (iv), (iii), and (i) of (a), we obtain

~x = −~a + ~b = ~b + (−~a) =: ~b − ~a. (1.2)

That is, Eq. (1.1) always has the unique solution (1.2) (Figure 1.6). Now we prove the second
statement of (b). According to rules (viii) and (vi) of (a), we can write

~x = 1~x = (1 + 0)~x = 1~x + 0~x,

i.e.,
~x = ~x + 0~x.
It follows that 0~x = ~x − ~x = ~x + (−~x) = ~0, i.e., 0~x = ~0. Similarly,

λ~a = λ(~a + ~0) = λ~a + λ~0

which implies λ~0 = λ~a − λ~a = ~0, i.e., λ~0 = ~0. Conversely, let

λ~x = ~0. (1.3)


If λ = 0, there is nothing to prove. If λ ≠ 0, we multiply both sides of (1.3) by 1/λ and obtain

(1/λ)(λ~x) = (1/λ)~0

where we already know that the right-hand side is equal to ~0. The left-hand side can be written
as ((1/λ)λ)~x = 1~x = ~x; hence, ~x = ~0. Finally, to show (iii) of (b), observe that

~a + (−1)~a = 1~a + (−1)~a = (1 + (−1))~a = 0~a = ~0,

from which it follows that (−1)~a = ~0 + (−~a) = −~a.


The statements of (c) are clear by definition; (v) is also a consequence of (iv). □

Example 1.3 Show that the diagonals of a parallelogram divide each other into two halves of
equal length.
Consider a parallelogram in P2 and denote its vertices counterclockwise by A, B, C, and D
(Figure 1.7). Defining ~a = ~AB and ~b = ~AD as well as ~x = ~AM and ~y = ~BM, where M is the
intersection point of the diagonals of the parallelogram, we have that

~x = λ(~a + ~b) (1.4)


~y = µ(~b − ~a) (1.5)
~x = ~a + ~y . (1.6)

Figure 1.6: Difference of two vectors
Figure 1.7: The diagonals of a parallelogram

We have to show that λ = 1/2 = µ. Inserting Eqs. (1.4) and (1.5) into (1.6), we obtain

λ(~a + ~b) = ~a + µ(~b − ~a),

resp.,
(λ − 1 + µ)~a = (µ − λ)~b.
Since the vectors ~a, ~b ≠ ~0 neither have the same nor the opposite direction (otherwise they would
not span a real parallelogram), it follows that

λ−1+µ = 0
µ − λ = 0.
Hence, λ = 1/2 = µ and

~x = (1/2)(~a + ~b)
~y = (1/2)(~b − ~a),
showing that the point M is in fact the midpoint of each of the two diagonals.

We remark that the system of the three equations (1.4)–(1.6) determines the four unknowns
~x, ~y , λ, and µ uniquely. The reason is that a vector of E2 corresponds to two real numbers
(namely, to its components w.r.t. a coordinate system, as we shall see next); therefore, the
three vectorial equations are equivalent to a system of six real equations in six real unknowns.—
Although the solution presented in the example is quite instructive, the problem can be solved
more easily. Namely, let ~z be the vector from the point B to the midpoint of the diagonal from A to
C. Then

~z = (1/2)(~a + ~b) − ~a.

The latter implies that ~z = (1/2)~b − (1/2)~a = (1/2)(~b − ~a), i.e., ~z is also the vector from B to the midpoint
of the other diagonal.
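The little 2 × 2 system for λ and µ can also be checked numerically. The following sketch assumes the Python library NumPy and picks an arbitrary non-degenerate parallelogram; the names are chosen only for illustration.

    import numpy as np

    # any two vectors spanning a real parallelogram will do (illustrative choice)
    a = np.array([3.0, 0.0])
    b = np.array([1.0, 2.0])

    # lambda*(a + b) - mu*(b - a) = a, written as a 2x2 linear system for (lambda, mu)
    M = np.column_stack((a + b, -(b - a)))
    lam, mu = np.linalg.solve(M, a)
    print(lam, mu)   # both are 0.5, in agreement with the result above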

Definition 1.4 A right-handed Cartesian coordinate system in the affine-Euclidean space P3 is
given by a quadruple (O; ~e1, ~e2, ~e3) (Figure 1.8) where
(i) O ∈ P3 is a fixed point, called the origin of the coordinate system
(ii) ~e1 , ~e2 , ~e3 ∈ E3 are mutually orthogonal unit vectors (i.e., ~e1 , ~e2 , ~e3 are unit vectors perpen-
dicular to each other)
(iii) ~e1 , ~e2 , ~e3 constitute a right-handed system.
Similarly, a “right-handed” Cartesian coordinate system in P2 is given by a triple (O; ~e1 , ~e2 )
consisting of a fixed point O ∈ P2 and a positively oriented system ~e1 , ~e2 of two orthogonal
vectors of E2 .

Figure 1.8: Two right-handed Cartesian coordinate systems in P3 and two positively oriented ones in P2

It is obvious that, w.r.t. a coordinate system, every vector ~a ∈ E3 can uniquely be decomposed
according to
~a = ~v1 + ~v2 + ~v3 = a1~e1 + a2~e2 + a3~e3 (1.7)
where the vectorial components of ~a, ~v1 , ~v2 , ~v3 ∈ E3 , are parallel to the coordinate axes and the
(scalar) components a1 , a2 , a3 ∈ R of ~a are, up to the sign, just the respective lengths of ~v1 , ~v2 , ~v3
(Figure 1.9). For vectors ~a ∈ E2 , Eq. (1.7) reads
~a = ~v1 + ~v2 = a1~e1 + a2~e2
(Figure 1.10). In the remainder of this chapter, we formulate our statements in most cases only
for E3, resp., P3. Next we summarize the observation (1.7) and two further ones.

Figure 1.9: Components of a vector ~a ∈ E3
Figure 1.10: Components of a vector ~a ∈ E2

Observation/Definition 1.5 Let a fixed coordinate system (O; ~e1 , ~e2 , ~e3 ) in P3 be given.
(a) Every vector ~a ∈ E3 is uniquely characterized by its components a1, a2, a3 ∈ R:

~a = a1~e1 + a2~e2 + a3~e3 =: (a1, a2, a3)^T,

where (a1, a2, a3)^T denotes the column vector with the entries a1, a2, a3.

(b) Every point P ∈ P3 corresponds uniquely to a vector ~r ∈ E3 such that

~r = ~OP.

Such a vector ~r is called a position vector (Figure 1.11).

(c) For every point P ∈ P3, we have

~r = ~OP = x1~e1 + x2~e2 + x3~e3 = (x1, x2, x3)^T =: (x, y, z)^T.

The components x1 = x, x2 = y, x3 = z of the position vector ~r ∈ E3 are called the
coordinates of the point P.

Figure 1.11: Position vector and coordinates of a point
Figure 1.12: The sum of the position vectors of two points depends on the origin

Again we emphasize the difference between points and vectors. A vector ~a ∈ E is represented
by an ordered pair of points; given any point P ∈ P, there is exactly one point Q ∈ P such
that ~a = ~PQ. One can say that the vector ~a transforms the point P into Q; in this sense, the
vectors of E act on the points of P as translations. Whereas vectors can be added, the addition
of two points is not defined. There is no meaningful interpretation of the sum of the position
vectors of two points; in particular, the result depends on the coordinate system (Figure 1.12).
Next we present several rules and formulas which in particular express the operations with
vectors in terms of their components.

Observation 1.6 With respect to a given coordinate system, we have that

(i) ~a + ~b = (a1, a2, a3)^T + (b1, b2, b3)^T = (a1 + b1, a2 + b2, a3 + b3)^T, where ~a, ~b ∈ E3

(ii) λ~a = λ(a1, a2, a3)^T = (λa1, λa2, λa3)^T, where λ ∈ R and ~a ∈ E3

(iii) the length of the vector ~a = (a1, a2, a3)^T ∈ E3 is given by

|~a| = √(a1² + a2² + a3²)

(iv) the distance d of a point P ∈ P3 from the origin O of the coordinate system is

d = |~r| = √(x² + y² + z²)

where ~r = ~OP = (x, y, z)^T

(v) the distance d of any two points P1, P2 ∈ P3 is

d = |~r1 − ~r2| = √((x1 − x2)² + (y1 − y2)² + (z1 − z2)²)

where ~r1 = ~OP1 = (x1, y1, z1)^T and ~r2 = ~OP2 = (x2, y2, z2)^T (Figure 1.13).

Proof: Rule (i) follows according to

~a + ~b = (a1~e1 + a2~e2 + a3~e3) + (b1~e1 + b2~e2 + b3~e3) = (a1 + b1)~e1 + (a2 + b2)~e2 + (a3 + b3)~e3 = (a1 + b1, a2 + b2, a3 + b3)^T

where we have used some of the rules stated in Theorem 1.2. Similarly we obtain

λ~a = λ(a1~e1 + a2~e2 + a3~e3) = λa1~e1 + λa2~e2 + λa3~e3 = (λa1, λa2, λa3)^T.

The vector ~a = a1~e1 + a2~e2 + a3~e3 can be considered as one of the spatial diagonals of a
rectangular box whose sides are given by the vectorial components a1~e1 , a2~e2 , and a3~e3 (Figure
1.9). The lengths of the sides are |a1 |, |a2 |, and |a3 |, and a twofold application of the Pythagoras
theorem yields
|~a|² = (|a1|² + |a2|²) + |a3|² = a1² + a2² + a3²,
i.e., statement (iii). Statement (iv) follows from d = |~r| and (iii), and statement (v) is implied
by

d = |~r1 − ~r2| = |(x1, y1, z1)^T − (x2, y2, z2)^T| = |(x1 − x2, y1 − y2, z1 − z2)^T|

and (iii) again; moreover, (iv) is a consequence of (v). □

Figure 1.13: Distance of two points
Figure 1.14: Midpoint of a line segment

W.r.t. a given coordinate system, we can identify the points with their position vectors;
instead of using the precise formulation “the point with the position vector ~r,” we say briefly
“the point ~r ”.
Example 1.7 Determine the length of the line segment with the end points ~r1 = (1, 3, −1)^T and
~r2 = (−4, 5, 2)^T as well as the coordinates of its midpoint (Figure 1.14).

The length is

d = |~r1 − ~r2| = |(1, 3, −1)^T − (−4, 5, 2)^T| = |(5, −2, −3)^T| = √(25 + 4 + 9) = √38,

and the midpoint is given by

~rM = ~r2 + (1/2)(~r1 − ~r2) = (1/2)(~r1 + ~r2) = (1/2)[(1, 3, −1)^T + (−4, 5, 2)^T] = (1/2)(−3, 8, 1)^T = (−3/2, 4, 1/2)^T.
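As a quick numerical cross-check of Example 1.7 (a sketch assuming NumPy), the length and the midpoint follow directly from Observation 1.6:

    import numpy as np

    r1 = np.array([1.0, 3.0, -1.0])
    r2 = np.array([-4.0, 5.0, 2.0])

    d = np.linalg.norm(r1 - r2)    # sqrt(38), the length of the segment
    rM = 0.5 * (r1 + r2)           # (-3/2, 4, 1/2), the midpoint
    print(d, rM)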

The concepts point, vector, length, area, volume are defined without respect to a coordinate
system, whereas the components of a vector, the position vector of a point, and the coordinates
of a point depend on the chosen coordinate system; the components of a vector depend on the
direction of the coordinate axes, the position vector of a point depends on the origin, and the
coordinates of a point depend on the origin as well as on the direction of the axes. In physics,
scalar quantities are those that are given by their magnitude, i.e., by a number and a unit, and do
not depend on the coordinate system (e.g., length, area, volume, mass, temperature); vectorial
quantities are determined by their magnitude and their direction and consequently do not depend
on the coordinate system (e.g., velocity, force, momentum, electric field strength), provided
that their definition does not involve the position vector (e.g., torque or angular momentum).
The components of a vectorial physical quantity or the coordinates of a point are given by their
values with respect to a coordinate system, so these are not scalar physical quantities. Finally,
we remark that, although a vector is invariant under translation, in physics the point of application
is often essential (e.g., where a force applies).

1.2 The Scalar Product


In the remaining sections of this chapter, we need, besides the concept of vectors, only to know

(i) the elementary trigonometric definition of sin φ and cos φ in the context of a right triangle,
0 ≤ φ ≤ π/2

(ii) sin φ = sin(π − φ) and cos φ = − cos(π − φ) where 0 ≤ φ ≤ π

(iii) the definition of the angle φ between two vectors ~a, ~b ≠ ~0, in particular, 0 ≤ φ ≤ π.

Definition 1.8 The scalar product (inner product, dot product) of two vectors ~a, ~b ∈ E is defined
by
~a · ~b := |~a||~b| cos φ
where φ is the angle between ~a and ~b. (Clearly, if ~a = 0 or ~b = 0, it is understood that ~a · ~b = 0
although the angle between the vectors is not uniquely defined.)

Figure 1.15: Scalar product of two vectors including an acute, resp., obtuse angle
Remark 1.9 We have that

(i) if 0 < φ < π/2, ~a · ~b = |~a||~b| cos φ = |~a|p where p = |~b| cos φ is the length of the projection of
~b onto ~a (Figure 1.15)

(ii) if π/2 < φ < π, ~a · ~b = |~a||~b| cos φ = −|~a|p where p = |~b| cos(π − φ) = −|~b| cos φ is the length
of the projection of ~b onto the straight line determined by ~a

(iii) if φ = 0, ~a · ~b = |~a||~b|

(iv) if φ = π/2, ~a · ~b = 0

(v) if φ = π, ~a · ~b = −|~a||~b|.

The following theorem summarizes the algebraic properties of the scalar product.

Theorem 1.10 Let ~a, ~b, ~c ∈ E and λ ∈ R. Then

(i) ~a · ~b = ~b · ~a commutative law


(ii) ~a · (~b + ~c) = ~a · ~b + ~a · ~c
distributive laws
(~a + ~b) · ~c = ~a · ~c + ~b · ~c
(iii) ~a · (λ~b) = λ(~a · ~b) = (λ~a) · ~b =: λ~a · ~b
(iv) ~a · ~a = |~a|² ≥ 0
(v) ~a · ~a = 0 ⇐⇒ ~a = ~0
(vi) ~a · ~b = 0 ⇐⇒ ~a = ~0 or ~b = ~0 or (~a, ~b ≠ ~0 and ~a ⊥ ~b).

Proof: The commutative law is clear. To show the first distributive law, assume that ~a and ~b
as well as ~a and ~c include an acute angle (Figure 1.16). Let p be the length of the projection of ~b
onto ~a and q the length of the projection of ~c onto ~a. It is geometrically evident that the length
of the projection of ~b + ~c is just the sum p + q. According to statement (i) of the preceding
remark, it follows that

~a · (~b + ~c) = |~a|(p + q) = |~a|p + |~a|q = ~a · ~b + ~a · ~c.

The cases where at least one of the considered angles is not acute are treated similarly. By the
commutative law, the second distributive law is a consequence of the first.
If φ is the angle between ~a and ~b, we conclude that

~a · (λ~b) = |~a| λ|~b| cos φ            if λ ≥ 0,
~a · (λ~b) = |~a| |λ||~b| cos(π − φ)     if λ < 0,

and in both cases this equals λ|~a||~b| cos φ = λ(~a · ~b),

Figure 1.16: Distributive law for the scalar product
i.e.,
~a · (λ~b) = λ(~a · ~b). (1.8)
The latter implies
(λ~a) · ~b = ~b · (λ~a) = λ(~b · ~a) = λ(~a · ~b); (1.9)
Eqs. (1.8) and (1.9) yield statement (iii) of the theorem.
Finally, ~a · ~a = |~a|² cos 0 = |~a|² ≥ 0; ~a · ~a = 0 if and only if |~a| = 0, i.e., ~a = ~0; and ~a · ~b = 0 if
and only if |~a||~b| cos φ = 0, i.e., ~a = ~0 or ~b = ~0 or φ = π/2. □

Next we look at the component representation of the scalar product.

Observation 1.11 If ~a = (a1, a2, a3)^T and ~b = (b1, b2, b3)^T w.r.t. a coordinate system, then

~a · ~b = a1 b1 + a2 b2 + a3 b3.

Proof: Since ~e1, ~e2, ~e3 is an orthogonal system of unit vectors, we conclude that ~e1 · ~e1 = 1,
~e1 · ~e2 = 0, etc. Hence, a straightforward calculation yields

~a · ~b = (a1, a2, a3)^T · (b1, b2, b3)^T = (a1~e1 + a2~e2 + a3~e3) · (b1~e1 + b2~e2 + b3~e3)
= a1 b1 ~e1 · ~e1 + a1 b2 ~e1 · ~e2 + a1 b3 ~e1 · ~e3 + a2 b1 ~e2 · ~e1 + a2 b2 ~e2 · ~e2 + a2 b3 ~e2 · ~e3
  + a3 b1 ~e3 · ~e1 + a3 b2 ~e3 · ~e2 + a3 b3 ~e3 · ~e3
= a1 b1 + a2 b2 + a3 b3.

Combining Definition 1.8 and Observation 1.11, we obtain

~a · ~b = |~a||~b| cos φ = a1 b1 + a2 b2 + a3 b3 (1.10)

which allows one to calculate the angle between two vectors from their components. Moreover,
setting ~b = ~a in (1.10), it follows that |~a|² = a1² + a2² + a3², resp.,

|~a| = √(a1² + a2² + a3²). (1.11)

Setting ~b = ~e1 = (1, 0, 0)^T in (1.10), it follows that |~a| cos α1 = a1 where α1 is the angle between
~a and the first coordinate axis. The choice ~b = ~e2 = (0, 1, 0)^T, resp., ~b = ~e3 = (0, 0, 1)^T yields two
analogous equations. Summarizing,

a1 = |~a| cos α1
a2 = |~a| cos α2 (1.12)
a3 = |~a| cos α3.

Note that Eq. (1.11) was already obtained from the Pythagoras theorem as part (iii) of
Observation 1.6 and that Eqs. (1.12) are also geometrically evident. Finally, (1.12) and (1.11)
imply

cos² α1 + cos² α2 + cos² α3 = 1.
Example 1.12

(a) What is the angle between ~a = (1, 2, 3)^T and ~b = (−1, 0, 5)^T ?

According to (1.10) and (1.11) we obtain

cos φ = (~a · ~b)/(|~a||~b|) = (a1 b1 + a2 b2 + a3 b3) / (√(a1² + a2² + a3²) √(b1² + b2² + b3²))   (1.13)
      = (−1 + 0 + 15) / (√(1 + 4 + 9) √(1 + 25)) = 14/(√14 √26) = √(14/26) = √(7/13).

Using a pocket calculator, we find φ ≈ 42.79°. The angle φ can also be calculated by
means of the cosine theorem of elementary trigonometry. To that end, consider the triangle
spanned by the vectors ~a and ~b (Figure 1.17). That is, two sides of this triangle are given
by ~a and ~b including the angle φ, and the third is given by ~b − ~a or ~a − ~b. Now the cosine
theorem yields

|~b − ~a|² = |~a|² + |~b|² − 2|~a||~b| cos φ,

resp.,

cos φ = (|~a|² + |~b|² − |~a − ~b|²) / (2|~a||~b|). (1.14)

The use of formula (1.14) requires more calculation work than the use of (1.13). More
importantly, the cosine theorem itself is a consequence of vector algebra, as the next
example shows.
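Formula (1.13) can also be checked numerically; the following sketch (assuming NumPy) reproduces the angle of Example 1.12(a):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([-1.0, 0.0, 5.0])

    cos_phi = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))   # sqrt(7/13)
    phi = np.degrees(np.arccos(cos_phi))                               # about 42.79 degrees
    print(cos_phi, phi)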

(b) Proof of the cosine theorem


Let a, b, and c be the lengths of the sides of a triangle and let γ be the angle opposite
the side of length c. Introduce vectors ~a, ~b, and ~c for the corresponding sides such that
~a + ~b = −~c (Figure 1.18); note that the angle between ~a and ~b is π − γ. We conclude that

|~c|2 = ~c · ~c = (−~c) · (−~c) = (~a + ~b) · (~a + ~b)


= ~a · ~a + ~a · ~b + ~b · ~a + ~b · ~b
= |~a|2 + 2~a · ~b + |~b|2
= |~a|2 + |~b|2 + 2|~a||~b| cos(π − γ)
= |~a|2 + |~b|2 − 2|~a||~b| cos γ,

i.e.,
c2 = a2 + b2 − 2ab cos γ

Figure 1.17: Angle between two vectors
Figure 1.18: Triangle
which is the cosine theorem. For γ = π/2, we obtain the Pythagoras theorem

c2 = a2 + b2

as a special case.

(c) Projection of ~b along ~a

The projection of ~b ∈ E along ~a ∈ E, ~a ≠ 0, is the vector

~p = |~b| cos φ ~e

where φ is the angle between ~b and ~a and ~e is the unit vector satisfying ~a = |~a|~e (Figure
1.19). It follows that

~p = (~e · ~b)~e = ((~a/|~a|) · ~b) ~a/|~a|;

an expression like (~e · ~b)~e is sometimes written as ~e · ~b ~e. The result for ~p is

~p = ((~a · ~b)/|~a|²) ~a = ((~a · ~b)/(~a · ~a)) ~a.

According to the decomposition ~b = ~p + ~q, the vector ~q = ~b − ~p must be orthogonal to ~a;
in fact,

~q · ~a = ~b · ~a − ~p · ~a = ~a · ~b − ((~a · ~b)/|~a|²) ~a · ~a = 0.

For instance, if ~a = (2, −1)^T and ~b = (−1, 1)^T, we obtain ~p = −(3/5)(2, −1)^T and ~q = (1/5)(1, 2)^T.

Figure 1.19: Projection of a vector along some other one
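The projection of Example 1.12(c) translates into a small helper function; this is only a sketch assuming NumPy, and the name project is chosen here for illustration.

    import numpy as np

    def project(b, a):
        """Projection p of b along the nonzero vector a, and the orthogonal rest q = b - p."""
        p = (np.dot(a, b) / np.dot(a, a)) * a
        return p, b - p

    a = np.array([2.0, -1.0])
    b = np.array([-1.0, 1.0])
    p, q = project(b, a)
    print(p, q, np.dot(q, a))   # p = -3/5*(2, -1), q = 1/5*(1, 2), and q.a = 0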

We conclude this section with several remarks. First, a product of the kind ~a · ~b · ~c cannot
be defined since, on the one hand, it should be equal to ~a(~b · ~c) and, on the other hand, equal to
(~a · ~b)~c; however, a multiple of the vector ~a is in general different from a multiple of the vector ~b.
Second, the square of a vector can be understood according to ~a2 := ~a · ~a = |~a|2 —we do not use
this; higher powers of ~a are not defined. Third, as already indicated, the product ~a · ~b~c (without
a second dot!) is defined, namely, ~a · ~b~c := (~a · ~b)~c; it is, however, clearer to set the parentheses.
Finally, one cannot divide a number λ by a vector ~a, since the equation ~a · ~x = λ, ~a ≠ ~0, always
has infinitely many solutions. In fact, we have |~x| cos φ = λ/|~a|; that is, if λ > 0, every vector
~x that includes an acute angle with ~a and whose projection onto ~a is of length λ/|~a| is a solution
of ~a · ~x = λ.

1.3 The Vector Product
Definition 1.13 The vector product (cross product) of two vectors ~a, ~b ∈ E3 is defined to be
that vector ~a × ~b ∈ E3 that is determined as follows (Figure 1.20):

(i) the length of ~a × ~b is equal to the area of a parallelogram spanned by ~a and ~b, i.e.,

|~a × ~b| = |~a||~b| sin φ

where φ is the angle between ~a and ~b

(ii) ~a × ~b is perpendicular to ~a as well as to ~b

(iii) ~a, ~b, ~a × ~b constitute a right-handed system.

We remark that the vector product of two vectors is defined only in the (oriented) three-dimensional Euclidean vector space E3. There is no analog in the space E2. The following theorem summarizes the algebraic properties of the vector product.

Figure 1.20: Vector product of two vectors

Theorem 1.14 Let ~a, ~b, ~c ∈ E3 and λ ∈ R. Then

(i) ~a × ~b = −(~b × ~a) =: −~b × ~a anticommutative law


(ii) ~a × (~b + ~c) = ~a × ~b + ~a × ~c
distributive laws
(~a + ~b) × ~c = ~a × ~c + ~b × ~c
(iii) ~a × (λ~b) = λ(~a × ~b) = (λ~a) × ~b =: λ~a × ~b
(iv) ~a × ~a = ~0
(v) ~a × ~b = ~0 ⇐⇒ ~a = ~0 or ~b = ~0 or ~a = µ~b, µ ∈ R
⇐⇒ ~a = µ~b or ~b = ν~a, µ, ν ∈ R.

Proof: The anticommutative law is a direct consequence of the preceding definition, in par-
ticular, of the defining property (iii). To show the second distributive law, consider a plane
perpendicular to ~c and project the vector ~a orthogonally onto that plane (see Figure 1.21). The
projected vector has the length |~a| sin φ where φ is the angle between ~a and ~c. Stretching the
projected vector by the factor |~c| and rotating it clockwise by a right angle around ~c, we obtain

Figure 1.21: Distributive law for the vector product

the vector ~a × ~c. The vectors ~b × ~c and (~a + ~b) × ~c are constructed in the same manner. Looking
at the parallelogram spanned by ~a × ~c and ~b × ~c, we conclude that

(~a + ~b) × ~c = ~a × ~c + ~b × ~c.

The first distributive law is implied by the second and by (i) according to

~a × (~b + ~c) = −((~b + ~c) × ~a) = −(~b × ~a + ~c × ~a) = ~a × ~b + ~a × ~c.

Statement (iii) of the theorem is obvious. From |~a × ~a| = |~a||~a| sin 0 = 0 we obtain ~a × ~a = ~0.
Finally, ~a × ~b = ~0 is equivalent to |~a × ~b| = 0, i.e., to |~a||~b| sin φ = 0. Hence, ~a = ~0 or ~b = ~0
or φ = 0 or φ = π; that is, ~a = ~0 or ~b = ~0 or ~a = µ~b with µ ≠ 0. Equivalently, ~a = µ~b or ~b = ν~a. □

Next we look at the component representation of the vector product.

Observation 1.15 If ~a = (a1, a2, a3)^T and ~b = (b1, b2, b3)^T w.r.t. a coordinate system, then

~a × ~b = (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1)^T.

Proof: Since ~e1, ~e2, ~e3 is a right-handed orthogonal system of unit vectors, it follows that
~e1 × ~e1 = ~0, ~e1 × ~e2 = ~e3, ~e2 × ~e1 = −~e3, etc. In particular, ~ei × ~ej = ~ek for any cyclic
permutation of the indices i, j, k = 1, 2, 3. Hence, a straightforward calculation yields

~a × ~b = (a1~e1 + a2~e2 + a3~e3) × (b1~e1 + b2~e2 + b3~e3)
= a1 b1 ~e1 × ~e1 + a1 b2 ~e1 × ~e2 + a1 b3 ~e1 × ~e3 + a2 b1 ~e2 × ~e1 + a2 b2 ~e2 × ~e2 + a2 b3 ~e2 × ~e3
  + a3 b1 ~e3 × ~e1 + a3 b2 ~e3 × ~e2 + a3 b3 ~e3 × ~e3
= (a2 b3 − a3 b2)~e1 + (a3 b1 − a1 b3)~e2 + (a1 b2 − a2 b1)~e3.
Combining Definition 1.13 and Observation 1.15, we obtain

~a × ~b = |~a||~b| sin φ ~e = (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1)^T

where ~e is a unit vector in the direction of ~a × ~b. The relation

|~a × ~b| = |~a||~b| sin φ = |(a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1)^T| (1.15)

enables the calculation of the area of a parallelogram spanned by the vectors ~a and ~b directly
from the components of ~a and ~b.
Example 1.16

(a) What is the area of the parallelogram spanned by ~a = (1, 2, 3)^T and ~b = (−1, 0, 5)^T ?

According to (1.15) we obtain

A = |~a × ~b| = |(1, 2, 3)^T × (−1, 0, 5)^T| = |(10 − 0, −3 − 5, 0 − (−2))^T| = |(10, −8, 2)^T|
  = √(100 + 64 + 4) = √168.
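The area computation of Example 1.16(a) in code, as a small sketch assuming NumPy:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([-1.0, 0.0, 5.0])

    n = np.cross(a, b)        # (10, -8, 2)
    A = np.linalg.norm(n)     # sqrt(168)
    print(n, A)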
(b) Area of a parallelogram spanned by ~a = (a1, a2)^T and ~b = (b1, b2)^T

We identify the two-dimensional Euclidean plane P2 with a plane of the three-dimensional
Euclidean space P3 and extend the coordinate system (O; ~e1, ~e2) in P2 to a coordinate
system (O; ~e1, ~e2, ~e3) in P3. The parallelogram (Figure 1.22) is then spanned by the vectors
~A = (a1, a2, 0)^T and ~B = (b1, b2, 0)^T and its area is consequently

A = |~A × ~B| = |(0, 0, a1 b2 − a2 b1)^T| = √(0 + 0 + (a1 b2 − a2 b1)²),

i.e.,

A = |a1 b2 − a2 b1|. (1.16)

(c) Proof of the sine theorem

The area of the triangle of Figure 1.18 is

A = (1/2)|~a × ~b| = (1/2)|~b × ~c| = (1/2)|~c × ~a|

from which it follows that

(1/2)|~a||~b| sin(π − γ) = (1/2)|~b||~c| sin(π − α) = (1/2)|~c||~a| sin(π − β).

Denoting the lengths of the sides of the triangle simply by a, b, and c, we obtain

ab sin γ = bc sin α = ac sin β

or, equivalently,

a/c = sin α / sin γ,   b/c = sin β / sin γ,   a/b = sin α / sin β,

which is known as the sine theorem.

Figure 1.22: Area of a parallelogram in the plane P2

Remark 1.17 A real 2 × 2 matrix is a square array of four real numbers, e.g., the matrix with the
columns (a1, a2)^T and (b1, b2)^T. The determinant of such a matrix is denoted and defined by

    [ a1  b1 ]     | a1  b1 |
det [ a2  b2 ]  =  | a2  b2 |  :=  a1 b2 − a2 b1.

According to Eq. (1.16) the determinant of a 2 × 2 matrix is, up to the sign, just the area of the
parallelogram spanned by its column vectors.

Observation/Definition 1.18

(a) The volume of a parallelepiped spanned by the vectors ~a, ~b, ~c ∈ E3 is given by

V = |(~a × ~b) · ~c|;

in particular,
V = (~a × ~b) · ~c
if the system ~a, ~b, ~c is right-handed. The number (~a × ~b) · ~c is called the box product of
~a, ~b, ~c ∈ E3 .

(b) The box product is positive if ~a, ~b, ~c is a right-handed system; it is negative if ~a, ~b, ~c is
left-handed. The box product is zero if and only if the vectors ~a, ~b, and ~c lie in one plane,
the latter including the case that ~a, ~b, or ~c is zero.

(c) The box product is invariant under cyclic permutation of its factors:

(~a × ~b) · ~c = (~b × ~c) · ~a = (~c × ~a) · ~b.

(d) Representing the vectors ~a, ~b, ~c w.r.t. a coordinate system, we have

(~a × ~b) · ~c = (a2 b3 − a3 b2 )c1 + (a3 b1 − a1 b3 )c2 + (a1 b2 − a2 b1 )c3 . (1.17)

Proof: Consider the parallelogram spanned by ~a and ~b as the base of the parallelepiped and
let h be the corresponding height (Figure 1.23). If ~a, ~b, ~c is a right-handed system, the angle φ
between ~c and ~a × ~b is acute, and we have h = |~c| cos φ. In consequence,

V = Ah = |~a × ~b||~c| cos φ = (~a × ~b) · ~c.

Figure 1.23: Volume of a parallelepiped

If ~a, ~b, ~c is left-handed, the angle φ between ~c and ~a × ~b is obtuse, and h = |~c| cos(π − φ).
Consequently,

V = Ah = |~a × ~b||~c| cos(π − φ) = −|~a × ~b||~c| cos φ = −(~a × ~b) · ~c.

Hence, if the vectors ~a, ~b, and ~c span a real parallelepiped, i.e., if they do not lie in one plane,
the box product is positive in the right-handed case and negative in the left-handed case; in
both cases the volume of the parallelepiped is V = |(~a × ~b) · ~c|. From

(~a × ~b) · ~c = |~a||~b| sin θ|~c| cos φ

where θ is the angle between ~a and ~b, it follows that the box product is zero if and only if the
three vectors lie in one plane.
Since the systems ~a, ~b, ~c and ~b, ~c, ~a as well as ~c, ~a, ~b have the same orientation and their vectors
span the same (possibly degenerated) parallelepiped, statement (c) is implied by (a) and (b).
Finally, a straightforward calculation involving Observations 1.11 and 1.15 yields

(~a × ~b) · ~c = (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1)^T · (c1, c2, c3)^T
= (a2 b3 − a3 b2)c1 + (a3 b1 − a1 b3)c2 + (a1 b2 − a2 b1)c3.

Remark 1.19

(a) A real 3 × 3 matrix is a square array of nine real numbers, e.g., the matrix with the columns
(a1, a2, a3)^T, (b1, b2, b3)^T, and (c1, c2, c3)^T. The determinant of such a matrix is denoted and defined by

    [ a1  b1  c1 ]     | a1  b1  c1 |
det [ a2  b2  c2 ]  =  | a2  b2  c2 |
    [ a3  b3  c3 ]     | a3  b3  c3 |

   := c1 | a2  b2 | − c2 | a1  b1 | + c3 | a1  b1 |        (1.18)
         | a3  b3 |      | a3  b3 |      | a2  b2 |

    = c1 (a2 b3 − a3 b2) − c2 (a1 b3 − a3 b1) + c3 (a1 b2 − a2 b1)
    = c1 (a2 b3 − a3 b2) + c2 (a3 b1 − a1 b3) + c3 (a1 b2 − a2 b1)

(cf. Remark 1.17). Comparing the last expression with the right-hand side of Eq. (1.17),
we see that the box product can be represented as a determinant:

                 | a1  b1  c1 |
(~a × ~b) · ~c = | a2  b2  c2 | .
                 | a3  b3  c3 |

In consequence, the determinant of a 3 × 3 matrix is, up to the sign, just the volume of
the parallelepiped spanned by its column vectors.

(b) From

(~a × ~b) · ~c = −(~b × ~a) · ~c

it follows that

| a1  b1  c1 |       | b1  a1  c1 |
| a2  b2  c2 |  = −  | b2  a2  c2 | .
| a3  b3  c3 |       | b3  a3  c3 |
More generally, using the invariance of the box product under cyclic permutation of its
factors as well as the anticommutative law of the vector product, one can easily show that
a 3 × 3 determinant changes its sign if any two columns are interchanged.

(c) The following determinant contains the unit vectors ~e1, ~e2, ~e3 and is not a real determinant.
Applying the definition (1.18) to this formal determinant, we obtain

| a1  b1  ~e1 |
| a2  b2  ~e2 |  =  ~e1 | a2  b2 | − ~e2 | a1  b1 | + ~e3 | a1  b1 |
| a3  b3  ~e3 |         | a3  b3 |       | a3  b3 |       | a2  b2 |

= ~e1 (a2 b3 − a3 b2) − ~e2 (a1 b3 − a3 b1) + ~e3 (a1 b2 − a2 b1)
= (a2 b3 − a3 b2)~e1 + (a3 b1 − a1 b3)~e2 + (a1 b2 − a2 b1)~e3
= (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1)^T
= ~a × ~b.

The formal result

            | a1  b1  ~e1 |
~a × ~b  =  | a2  b2  ~e2 |
            | a3  b3  ~e3 |

is often used to memorize the component representation of the vector product.



Example 1.20 What is the volume of the parallelepiped spanned by ~a = (1, 2, 3)^T, ~b = (−1, 0, 5)^T,
and ~c = (1, 1, −2)^T ?
According to part (a) of the preceding remark, we obtain

V = |det of the 3 × 3 matrix with the columns ~a, ~b, ~c|
  = |1 · (2 · 5 − 0 · 3) − 1 · (1 · 5 − (−1) · 3) + (−2) · (1 · 0 − (−1) · 2)|
  = |10 − 8 − 4| = 2.
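The volume of Example 1.20 can be checked both via the box product and via the 3 × 3 determinant; a minimal sketch assuming NumPy:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([-1.0, 0.0, 5.0])
    c = np.array([1.0, 1.0, -2.0])

    box = np.dot(np.cross(a, b), c)                   # (a x b) . c = -2
    det = np.linalg.det(np.column_stack((a, b, c)))   # determinant with columns a, b, c, also -2
    print(box, det, abs(box))                         # the volume is 2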

Finally, we prove the so-called “bac-cab” rule for twofold vector products which is often used
in physics.

Theorem 1.21 (bac-cab rule) For ~a, ~b, ~c ∈ E3 , we have

~a × (~b × ~c) = (~a · ~c)~b − (~a · ~b)~c = ~b(~a · ~c) − ~c(~a · ~b). (1.19)

Proof: We calculate the first component of ~a × (~b × ~c) w.r.t. a coordinate system and denote it
by (~a × (~b × ~c))1 :

(~a × (~b × ~c))1 = a2 (~b × ~c)3 − a3 (~b × ~c)2


= a2 (b1 c2 − b2 c1 ) − a3 (b3 c1 − b1 c3 )
= b1 (a2 c2 + a3 c3 ) − c1 (a2 b2 + a3 b3 ).

Adding a1 b1 c1 to the last expression and subtracting a1 b1 c1 from that, we obtain

(~a × (~b × ~c))1 = b1 (a2 c2 + a3 c3 + a1 c1 ) − c1 (a2 b2 + a3 b3 + a1 b1 ) = (~a · ~c)b1 − (~a · ~b)c1 .

Hence, the first component of the left-hand side of (1.19) is equal to the first component of the
right-hand side of (1.19). The analogous calculations for the other two components then prove
Eq. (1.19). □
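The bac-cab rule is easy to test numerically for random vectors; the following sketch (assuming NumPy) compares both sides of (1.19):

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c = rng.standard_normal((3, 3))   # three random vectors, given by their components

    lhs = np.cross(a, np.cross(b, c))
    rhs = np.dot(a, c) * b - np.dot(a, b) * c
    print(np.allclose(lhs, rhs))            # True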

1.4 Straight Lines and Planes in Space


We look at some ideas concerning the description of geometrical objects in terms of equations
involving vectors. A straight line L in the two- or three-dimensional space P is determined by
one of its points and by its direction, i.e., if a coordinate system or at least an origin is given, by
the position vector ~r0 of that point and a vector ~v lying in the line (cf. Figure 1.24). Denoting
the position vector of any point of L by ~r, we have that

~r = ~r0 + t~v (1.20)

where t ∈ R. Eq. (1.20) is called a parametric equation of the straight line L. If ~r0′ is the
position vector of any other given point of L and ~v′ any other direction vector, we obtain a
second parametric equation of the same straight line, namely,

~r = ~r0′ + s~v′, (1.21)

s ∈ R. According to (1.20) and (1.21), the same point ~r (again notice the abuse of language) is
characterized by different values of the parameters t and s.
Introducing the coordinates of the points and the components of the vectors, ~r = (x, y, z)^T,
~r0 = (x0, y0, z0)^T, ~v = (v1, v2, v3)^T, the vectorial equation (1.20) of a line L in the three-dimensional
space P3 is equivalent to the three equations

x = x0 + v1 t
y = y0 + v2 t
z = z0 + v3 t.

In the two-dimensional space P2 , we have only two equations:

x = x0 + v1 t (1.22)
y = y0 + v2 t. (1.23)
Using (1.22) to eliminate the parameter t, we obtain t = (x − x0)/v1 and

y − y0 = (v2/v1)(x − x0), (1.24)

provided that v1 ≠ 0. Eq. (1.24) can be written in the usual form

y = ax + b (1.25)

where a = v2/v1 is the slope of the line. One can also eliminate t by means of (1.23), yielding

x − x0 = (v1/v2)(y − y0),

provided that v2 ≠ 0.
Similarly, a plane H in the three-dimensional space P3 is determined by the position vector
~r0 of one of its points and two nonzero vectors ~u and ~v lying in the plane, ~u and ~v having neither
the same nor the opposite direction, i.e., ~v ≠ λ~u (Figure 1.25). Any point ~r of the plane can be
represented according to
~r = ~r0 + s~u + t~v (1.26)
where s, t ∈ R. Eq. (1.26) is called a parametric equation of the plane H. Again, ~r0 , ~u, and ~v
are not uniquely determined by the plane.
The fact that a plane H is also determined by a fixed point ~r0 and a normal vector ~n (Figure
1.26) leads to the normal equation of H; by a normal vector of H we understand a nonzero
vector perpendicular to the plane, not necessarily a unit vector. Namely, if ~r is any point of H,
the vector ~r − ~r0 lies in the plane and consequently

~n · (~r − ~r0 ) = 0. (1.27)

Writing this equation as ~n · ~r − ~n · ~r0 = 0 and introducing p := ~n · ~r0 , we obtain

~n · ~r = p (1.28)

which is called the normal equation of H. Eq. (1.28) is a consequence of (1.27). Conversely,
(1.28) implies (1.27). In fact, since a given point ~r0 of H satisfies (1.28), it follows that p = ~n · ~r0 and
thus ~n · (~r − ~r0) = 0. Representing the vectors by their components, ~n = (n1, n2, n3)^T, ~r = (x, y, z)^T,
Eq. (1.28) reads

n1 x + n2 y + n3 z = p. (1.29)
If n3 ≠ 0, (1.29) can be written in the form

z = −(n1/n3) x − (n2/n3) y + p/n3 (1.30)

or, with suitable abbreviations,
or, with suitable abbreviations,
z = ax + by + c. (1.31)
Eq. (1.31) for a plane in P3 is the analog of the common representation (1.25) of a straight line
in the two-dimensional space P2 .
Next we discuss the geometrical meaning of the constant p in (1.28). Let ~r1 be the position
vector of that point of the plane that is closest to the origin of the coordinate system (Figure
1.27). If the plane does not pass through the origin and the normal vector ~n is not directed
towards the origin, we have that

~n · ~r = p = ~n · ~r1 = |~n||~r1 | cos 0 = |~n||~r1 | > 0.


Therefore, p > 0 and the distance of the plane H from the origin O is d = |~r1| = p/|~n| = |p|/|~n|. If H
does not pass through O and ~n is directed towards O, then

~n · ~r = p = ~n · ~r1 = |~n||~r1 | cos π = −|~n||~r1 | < 0.

In this case, p < 0 and the distance of H from O is d = |~r1| = −p/|~n| = |p|/|~n|. Finally, if the plane
passes through the origin,
~n · ~r = p = ~n · ~0 = 0
holds because O is a point of H and the corresponding position vector ~0 satisfies (1.28). Hence,
d = 0 as well as p = 0.
Summarizing, if p > 0, the normal vector ~n is not directed towards the origin; if p < 0, ~n is
directed towards the origin; and if p = 0, the plane passes through O. In each case,

d = |p| / |~n| (1.32)

is the distance of the plane from the origin. If ~n is a unit vector, then d = |p|.
For vectors ~n, ~r ∈ E2 and a number p ∈ R, ~n · ~r = p is the equation of a straight line in
P2 with normal vector ~n and distance |p|/|~n| from the origin; in terms of components the equation
reads n1 x + n2 y = p or, if n2 ≠ 0, y = −(n1/n2) x + p/n2 (cf. Eqs. (1.24), (1.25), and (1.30)).
Now we consider several typical problems of analytical geometry:

1. Line through two points ~r1 , ~r2


Choosing ~r0 = ~r1 and ~v = ~r2 − ~r1 in Eq. (1.20), we obtain

~r = ~r1 + t(~r2 − ~r1 ).

2. Plane through three points ~r1 , ~r2 , ~r3


Eq. (1.26) and the choice ~r0 = ~r1 , ~u = ~r2 − ~r1 , and ~v = ~r3 − ~r1 yield

~r = ~r1 + s(~r2 − ~r1 ) + t(~r3 − ~r1 ).

3. Normal equation of a plane from its parametric equation


Multiplying each side of (1.26) in the sense of the scalar product by ~u × ~v , it follows that

(~u × ~v ) · ~r = (~u × ~v ) · ~r0 .

Defining ~n := ~u × ~v and p := (~u × ~v ) · ~r0 where ~n is a normal vector of the plane, we obtain

~n · ~r = p.

4. Distance d of a point ~R from a plane ~n · ~r = p

Let ~r1 be that point of the plane that is closest to ~R. Since the vector ~R − ~r1 is perpendicular
to the plane, we have that

~n · ~r1 = p (1.33)
~R − ~r1 = λ~n (1.34)

where λ is some real number. The scalar equation (1.33) and the vectorial equation (1.34)
are equivalent to a system of four real equations in four real unknowns. Multiplying (1.34)
in the sense of the dot product by ~n and taking account of (1.33), we can eliminate ~r1.
Thus,

~n · ~R − p = λ|~n|²,

i.e.,

λ = (~n · ~R − p) / |~n|². (1.35)

Since the sought distance is just d = |~R − ~r1|, it follows from (1.34) that d = |λ||~n|. The
result (1.35) now implies

d = |~n · ~R − p| / |~n|. (1.36)

For ~R = ~0, formula (1.36) reduces to (1.32). To calculate the point ~r1, use (1.34) and
insert the value of λ according to (1.35).
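Formulas (1.34)–(1.36) in code, as a sketch assuming NumPy; the function name is chosen for illustration, and the routine also returns the nearest plane point ~r1.

    import numpy as np

    def point_plane_distance(R, n, p):
        """Distance of the point R from the plane n.r = p, and the nearest plane point r1."""
        lam = (np.dot(n, R) - p) / np.dot(n, n)        # Eq. (1.35)
        r1 = R - lam * n                               # from Eq. (1.34)
        d = abs(np.dot(n, R) - p) / np.linalg.norm(n)  # Eq. (1.36)
        return d, r1

    n = np.array([1.0, 1.0, 1.0])
    R = np.array([2.0, 0.0, 1.0])
    print(point_plane_distance(R, n, 0.0))   # distance sqrt(3), nearest point (1, -1, 0)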
5. Distance d of a point ~R from a straight line ~r = ~r0 + t~v

Let ~r1 be that point of the straight line that is closest to ~R. Since the vector ~R − ~r1 is
perpendicular to the line, we have that

~r1 = ~r0 + t1~v (1.37)
(~R − ~r1) · ~v = 0 (1.38)

where t1 is the parameter value corresponding to ~r1. Replacing ~r1 in (1.38) by the right-hand
side of (1.37), we obtain an equation to determine t1. Inserting the solution t1 into
(1.37), we can calculate ~r1 and then d = |~R − ~r1|. If we are only interested in d and
not in the result for ~r1, we can proceed differently. Taking only (1.37) into account and
multiplying

~R − ~r1 = ~R − ~r0 − t1~v

in the sense of the cross product by ~v, it follows that

(~R − ~r1) × ~v = (~R − ~r0) × ~v. (1.39)

Since |(~R − ~r1) × ~v| = |~R − ~r1||~v| sin(π/2) = |~R − ~r1||~v|, (1.39) implies

|~R − ~r1||~v| = |(~R − ~r0) × ~v|.

From this and |~R − ~r1| = d we conclude that

d = |(~R − ~r0) × ~v| / |~v|. (1.40)
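Formula (1.40) as a small sketch assuming NumPy (the function name is illustrative):

    import numpy as np

    def point_line_distance(R, r0, v):
        """Distance of the point R from the straight line r = r0 + t*v."""
        return np.linalg.norm(np.cross(R - r0, v)) / np.linalg.norm(v)

    R = np.array([0.0, 0.0, 1.0])
    r0 = np.zeros(3)
    v = np.array([1.0, 0.0, 0.0])
    print(point_line_distance(R, r0, v))   # 1.0, the distance of (0, 0, 1) from the x-axis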

6. Distance d of two skew straight lines ~r = ~r0 + t~v and ~r = ~R0 + t~w

Let ~r1 be that point of the first straight line and ~r2 be that point of the second straight
line such that the vector ~r1 − ~r2 is perpendicular to both lines. Then d = |~r1 − ~r2| and

~r1 = ~r0 + t1~v (1.41)
~r2 = ~R0 + t2~w (1.42)
~r1 − ~r2 = λ ~v × ~w (1.43)

where λ is some real number. Eqs. (1.41)–(1.43) constitute a system of nine real equations
in nine real unknowns. To conclude a formula for the distance d, we proceed similarly as
in the context of (1.40). Eliminating ~r1 and ~r2 in (1.43) by means of (1.41) and (1.42), we
obtain

~r0 − ~R0 + t1~v − t2~w = λ ~v × ~w.

The multiplication of both sides of this equation by ~v × ~w in the sense of the dot product
yields

(~r0 − ~R0) · (~v × ~w) = λ|~v × ~w|²,

i.e.,

λ = (~r0 − ~R0) · (~v × ~w) / |~v × ~w|². (1.44)

Note that |~v × ~w| ≠ 0 since we have supposed that the lines are not parallel. From (1.43)
it follows that d = |~r1 − ~r2| = |λ||~v × ~w|. Hence, by (1.44),

d = |(~r0 − ~R0) · (~v × ~w)| / |~v × ~w|.
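The distance formula for skew lines in code, again only a sketch assuming NumPy and illustrative names:

    import numpy as np

    def skew_line_distance(r0, v, R0, w):
        """Distance of the non-parallel lines r = r0 + t*v and r = R0 + t*w."""
        n = np.cross(v, w)
        return abs(np.dot(r0 - R0, n)) / np.linalg.norm(n)

    # the x-axis and the line through (0, 0, 1) in the y-direction have distance 1
    print(skew_line_distance(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                             np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])))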

7. Intersection of a straight line ~r = ~r0 + t~v and a plane ~n · ~r = p


An intersection point ~r1 satisfies the equations

~r1 = ~r0 + t1~v (1.45)


~n · ~r1 = p (1.46)

with some parameter value t1 . Eliminating ~r1 in (1.46) by means of (1.45), we obtain

~n · ~r0 + t1~n · ~v = p

and in consequence

t1 = (p − ~n · ~r0) / (~n · ~v),

provided that ~n · ~v ≠ 0. Inserting t1 into (1.45), one can calculate ~r1. The case ~n · ~v = 0
means that the line is parallel to the plane or lies in the plane.
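The intersection computation of problem 7 as a sketch assuming NumPy; the parallel case ~n · ~v = 0 is reported as an error.

    import numpy as np

    def line_plane_intersection(r0, v, n, p):
        """Intersection point of the line r = r0 + t*v with the plane n.r = p."""
        nv = np.dot(n, v)
        if np.isclose(nv, 0.0):
            raise ValueError("line is parallel to the plane or lies in it")
        t1 = (p - np.dot(n, r0)) / nv
        return r0 + t1 * v

    # the z-axis meets the plane z = 2 in the point (0, 0, 2)
    print(line_plane_intersection(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                                  np.array([0.0, 0.0, 1.0]), 2.0))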

1.5 Exercises

1.1 The center of gravity of n points in space described by the position vectors ~r1, . . . , ~rn is
defined by

~rG := (~r1 + . . . + ~rn)/n = (1/n) Σ_{i=1}^{n} ~ri.

a) Consider the straight lines through the vertices of a triangle and the midpoints of the
opposite sides. Prove that these lines intersect in one point which divides the line segments
between the vertices and the midpoints in ratio 2 : 1 and which is the center of gravity of
the vertices, resp., the center of gravity of the triangle.

b) Consider the straight lines through the vertices of a triangle being perpendicular to the
opposite sides. Show that these lines intersect in one point.

c) Consider a tetrahedron in three-dimensional space and the lines through the vertices and
the centers of gravity of the opposite faces. Show that these lines intersect in the center
of gravity of the tetrahedron, dividing the line segments between the vertices and the face
centers in ratio 3 : 1.

1.2

a) Prove that the midpoints of the sides of an arbitrary quadrilateral are the vertices of a parallelogram.

b) Show that the spatial diagonals of a parallelepiped intersect in one point which divides
each diagonal into two halves of equal length.

1.3 Determine the lengths of the sides and the angles of a triangle whose vertices are

~r1 = (2, −1, 1)^T,   ~r2 = (1, −3, −5)^T,   ~r3 = (3, −4, −4)^T.
1.4 Prove that
a) |~a ± ~b|2 = |~a|2 ± 2~a · ~b + |~b|2
b) |~a ± ~b| ≤ |~a| + |~b| (triangle inequality)
c) |~r1 − ~r2 | ≤ |~r1 − ~r3 | + |~r2 − ~r3 | (triangle inequality)
d) |~a + ~b|2 + |~a − ~b|2 = 2|~a|2 + 2|~b|2 (parallelogram identity)
e) ~a · ~b = (1/4)(|~a + ~b|² − |~a − ~b|²).

1.5 Let the points with the position vectors

~r1 = (1, 0, −1)^T,   ~r2 = (2, 1, −3)^T,   ~r3 = (−1, 2, 1)^T,   ~r4 = (0, −2, 1)^T

be given. Find
a) the parametric as well as the normal equation of the plane through the points ~r1 , ~r2 , and ~r3
b) the distance of the point ~r4 from that plane as well as the point of the plane nearest to ~r4
c) the distance of the point ~r4 from the straight line through ~r1 and ~r2 as well as that point
of the line nearest to ~r4
d) the distance of the line through ~r1 and ~r2 from the line through ~r3 and ~r4 .

1.6 Determine

a) the intersection of the straight line

~r = (0, 4, 1)^T + t (1, −2, −3)^T

and the plane

4x − 3y + 5z = 0
as well as the angle between these
b) the intersection of the two planes

2x − y + z = 0 and x + 2y − z = 1

as well as the angle between these


c) the distances of the planes

−x + 3y + z = 2 and − 3x + 9y + 3z = −1

from the origin and each other.

1.7
a) Show that the volume of the tetrahedron spanned by three vectors ~a, ~b, and ~c not lying in
one plane is
V = (1/6) |(~a × ~b) · ~c|.
b) Calculate the volume of the tetrahedron whose vertices are the four points of Exercise 1.5.

Chapter 2

Elements of Linear Algebra

2.1 Systems of Linear Equations I


By way of preparation, we begin our discussion of linear algebra with a preliminary
investigation of simultaneous systems of linear equations.

Definition 2.1 A system of m linear equations in n unknowns x1 , . . . , xn ∈ R is given by

a11 x1 + a12 x2 + . . . + a1n xn = b1


a21 x1 + a22 x2 + . . . + a2n xn = b2
...
am1 x1 + am2 x2 + . . . + amn xn = bm,

briefly,

Σ_{j=1}^{n} aij xj = bi,   i = 1, . . . , m,

where aij ∈ R and bi ∈ R are given numbers. A solution is an n-tuple x = (x1, . . . , xn)^T whose
components satisfy the system.
If all bi are zero, we have a homogeneous system:

Σ_{j=1}^{n} aij xj = 0,   i = 1, . . . , m.

Example 2.2

(a) The system

x1 + x2 = 2
−x1 + 2x2 = 1
has the only solution x2 = 1, x1 = 1, i.e., x = (1, 1)^T.

(b) Consider the system

x1 + x2 + x3 = 1 (2.1)
x1 + x2 − x3 = 0 (2.2)

of two linear equations in three unknowns. The subtraction of the two equations yields
x3 = 1/2. Inserting this value into (2.1), we obtain

x1 + x2 = 1/2; (2.3)

Eq. (2.2) implies the same. In consequence, x1 = 1/2 − x2 where x2 can take any real value
t, so x2 = t and x1 = 1/2 − t. The solution vectors are

x = (x1, x2, x3)^T = (1/2 − t, t, 1/2)^T = (1/2, 0, 1/2)^T + (−t, t, 0)^T,

that is,

x = (1/2, 0, 1/2)^T + t (−1, 1, 0)^T (2.4)

where t ∈ R. From (2.3) we can also conclude that x2 = 1/2 − x1 where x1 can take any
real value s; therefore,

x = (0, 1/2, 1/2)^T + s (1, −1, 0)^T. (2.5)

Eqs. (2.4) and (2.5) are different representations of the same set of solutions. In fact, setting
t = −s + 1/2, (2.4) transforms into (2.5). Geometrically, (2.4) and (2.5) can be interpreted as
two different parametric equations of the same straight line. We emphasize that, e.g., a
system of three linear equations in three unknowns can also have infinitely many solutions.

(c) The system

2x1 + 5x2 = 2
−6x1 − 15x2 = −1

obviously has no solution; in fact, the second equation is equivalent to 2x1 + 5x2 = 1/3 and
thus contradicts the first.
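The three situations of Example 2.2 can be reproduced numerically; the following sketch assumes NumPy and is meant only as an illustration of the three cases.

    import numpy as np

    # (a) exactly one solution
    A = np.array([[1.0, 1.0], [-1.0, 2.0]])
    print(np.linalg.solve(A, [2.0, 1.0]))                 # [1. 1.]

    # (b) infinitely many solutions: lstsq returns one particular solution,
    #     and adding any multiple of (-1, 1, 0) gives another one
    B = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, -1.0]])
    x_part, *_ = np.linalg.lstsq(B, [1.0, 0.0], rcond=None)
    print(x_part, B @ (x_part + 7.0 * np.array([-1.0, 1.0, 0.0])))

    # (c) no solution: even the least-squares "solution" does not reproduce the right-hand side
    C = np.array([[2.0, 5.0], [-6.0, -15.0]])
    x_ls, *_ = np.linalg.lstsq(C, [2.0, -1.0], rcond=None)
    print(C @ x_ls)                                       # differs from (2, -1)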

Summarizing, a system of linear equations can have exactly one solution or many solutions
or none. In the second case there are always infinitely many solutions which, geometrically interpreted,
form a straight line or some plane in an n-dimensional space, as we shall see later. It is one
central topic in linear algebra to give statements on the existence and structure of the solutions
of systems of linear equations. The results are of far-reaching significance, for instance, in the
theory of linear differential equations. The following theorem is our first essential step into the
realm of linear algebra, the statement, however, is intuitively clear.

Theorem 2.3 The homogeneous system

a11 x1 + . . . + a1n xn = 0
...                                  (2.6)
am1 x1 + . . . + amn xn = 0

always has the trivial solution x = 0 = (0, . . . , 0)^T. If n > m (i.e., more unknowns than equations),
there exist nontrivial solutions x ≠ 0.

Proof: It is obvious that a homogeneous system always has the trivial solution. By induction
over m, we prove that a system of m equations in n > m unknowns also has nontrivial solutions.
For m = 1, there is only one equation:

a1 x1 + . . . + an xn = 0. (2.7)

If a1 = 0, there is a solution x ≠ 0, e.g., x = (1, 0, . . . , 0)^T. If a1 ≠ 0, it follows that
x1 = −(1/a1)(a2 x2 + a3 x3 + . . . + an xn) with no restriction for x2, x3, . . . , xn. Choosing x2 = 1
and x3 = . . . = xn = 0, we obtain the solution x = (−a2/a1, 1, 0, . . . , 0)^T of (2.7).
Now assume that the statement is true for a system of m − 1 equations in more than m − 1
unknowns. We have to show that the system (2.6) of m equations in more than m unknowns
has a solution x ≠ 0. If, on the one hand, all first coefficients of the equations (2.6) are zero, i.e.,
if a11 = a21 = . . . = am1 = 0, then again x = (1, 0, . . . , 0)^T is a nontrivial solution. If, on the other
hand, at least one of these coefficients is not zero, say ak1 ≠ 0, we can solve the k-th equation
of (2.6) for x1:

ak1 x1 = −(ak2 x2 + . . . + akn xn)

x1 = −(1/ak1) Σ_{j=2}^{n} akj xj. (2.8)

Inserting (2.8) into the other equations of (2.6),

ai1 x1 + ai2 x2 + . . . + ain xn = 0,   i = 1, . . . , m, i ≠ k,

we obtain

−(ai1/ak1) Σ_{j=2}^{n} akj xj + Σ_{j=2}^{n} aij xj = 0

or, equivalently,

Σ_{j=2}^{n} (aij − (ai1/ak1) akj) xj = 0 (2.9)
where i = 1, . . . , m, i ≠ k. Eqs. (2.8) and (2.9) are equivalent to the system (2.6). By induction
hypothesis, the homogeneous system (2.9) of m − 1 linear equations in the unknowns x2, . . . , xn
does have a nontrivial solution, say, (x2, . . . , xn)^T ≠ 0. Supplementing this solution by x1 according
to (2.8), we have constructed a nontrivial solution

x = (x1, x2, . . . , xn)^T ≠ 0

of the system (2.6). Thus, the proof of the theorem is finished. □
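Theorem 2.3 guarantees nontrivial solutions whenever n > m. One way to exhibit such a solution numerically is the singular value decomposition; the following sketch (assuming NumPy) does this for the homogeneous version of the system of Example 2.2(b).

    import numpy as np

    # m = 2 homogeneous equations in n = 3 unknowns
    A = np.array([[1.0, 1.0, 1.0],
                  [1.0, 1.0, -1.0]])

    # the rows of Vt beyond the rank of A span the solution set of A x = 0;
    # here the rank is 2, so there is exactly one such row
    _, s, Vt = np.linalg.svd(A)
    x = Vt[-1]                      # a multiple of (-1, 1, 0)
    print(x, A @ x)                 # A @ x is numerically the zero vector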

2.2 Vector Spaces
Linear algebra is founded on the famous vector-space axioms which are stated in the next defini-
tion. The concept of vector space refers to a common structure of different concrete mathematical
objects.

Definition 2.4 A vector space V over the real numbers R is a set of elements for which an
addition and a multiplication by numbers are defined, i.e.,

x+y ∈ V for any x, y ∈ V


λx ∈ V for any λ ∈ R and any x ∈ V,

such that the following rules hold:

I. (i) for all x, y ∈ V, x + y = y + x (commutative law)


(ii) for all x, y, z ∈ V, (x + y) + z = x + (y + z) (associative law)
(iii) there exists a uniquely determined zero element 0 ∈ V such that for all x ∈ V,
x+0=x
(iv) for each x ∈ V, there exists a uniquely determined inverse element −x ∈ V such that
x + (−x) = 0

II. (i) for all λ ∈ R and all x, y ∈ V, λ(x + y) = λx + λy (first distributive law)
(ii) for all λ, µ ∈ R and all x ∈ V, (λ + µ)x = λx + µx (second distributive law)
(iii) for all λ, µ ∈ R and all x ∈ V, λ(µx) = (λµ)x (mixed associative law)
(iv) for all x ∈ V, 1x = x.

The elements of V are called vectors and, in this context, the numbers are called scalars. The
multiplication of the vectors by numbers is often called the scalar multiplication.

The sum of more than two vectors is successively defined according to

x1 + x2 + x3 := (x1 + x2 ) + x3 = x1 + (x2 + x3 ),

etc. As in the case of numbers, one writes

x1 + x2 + . . . + xn =: Σ_{i=1}^{n} xi.

From the vector-space axioms listed under I it follows that, for given vectors a, b ∈ V, the
equation
a+x=b (2.10)
always has the unique solution
x = b + (−a) =: b − a.
Namely, adding −a to both sides of (2.10), we obtain

(−a) + (a + x) = (−a) + b. (2.11)

The application of the axioms (ii), (i), (iv), and (iii) of I to the left-hand side of (2.11) yields

(−a) + (a + x) = ((−a) + a) + x = (a + (−a)) + x = 0 + x = x + 0 = x;

hence, by (2.11), x = b + (−a) which is, by definition, the difference b − a. Moreover, the
following statements are valid.

Theorem 2.5
(a) Let λ ∈ R and x ∈ V. Then
λx = 0 ⇐⇒ λ=0 or x = 0.

(b) For all x ∈ V,


(−1)x = −x.
Proof: If λ = 0, we have that λx = 0x = (0 + 0)x = 0x + 0x and so 0x = 0x + 0x. Adding
−(0x) to both sides of the latter equation, we obtain 0 = 0x, i.e., 0x = 0. Similarly, if x = 0,
λx = λ0 = λ(0 + 0) = λ0 + λ0, so λ0 = λ0 + λ0 and hence λ0 = 0.
Conversely, let
λx = 0. (2.12)
If λ = 0, there is nothing to prove. If λ ≠ 0, the multiplication of (2.12) by 1/λ yields

(1/λ)(λx) = (1/λ) 0. (2.13)

We already know that the right-hand side of (2.13) is the zero vector. Applying axioms (iii) and
(iv) of II to the left-hand side, we obtain (1/λ)(λx) = ((1/λ)λ)x = 1x = x. Hence, by (2.13), x = 0.
To show statement (b), consider the equality chain 0 = 0x = (1 + (−1))x = 1x + (−1)x =
x + (−1)x which implies x + (−1)x = 0. Hence, by axiom I.(iv), (−1)x = −x. □
Example 2.6
(a) Consider the set of all n-tuples of real numbers, i.e.,

R^n := { x = (x1, . . . , xn)^T | x1, . . . , xn ∈ R }.

For any x, y ∈ R^n and any λ ∈ R, define

x + y = (x1, . . . , xn)^T + (y1, . . . , yn)^T := (x1 + y1, . . . , xn + yn)^T ∈ R^n

and

λx = λ(x1, . . . , xn)^T := (λx1, . . . , λxn)^T ∈ R^n.
To show that Rn equipped with this addition and this scalar multiplication is a vector
space, we have to verify the vector-space axioms. The commutative law and the associative
law are simple consequences of the corresponding laws for numbers:
   
x + y = (x1 + y1, . . . , xn + yn)^T = (y1 + x1, . . . , yn + xn)^T = y + x,

(x + y) + z = (x1 + y1, . . . , xn + yn)^T + (z1, . . . , zn)^T = ((x1 + y1) + z1, . . . , (xn + yn) + zn)^T
            = (x1 + (y1 + z1), . . . , xn + (yn + zn))^T = x + (y + z).
The vectors

0 := (0, . . . , 0)^T,   −x := (−x1, . . . , −xn)^T

are the zero vector and the inverse of x ∈ R^n according to axioms (iii) and (iv) of I. The
axioms of II are again inherited from the corresponding laws for numbers.

(b) The three- or two-dimensional Euclidean vector space E (E = E3 , resp., E = E2 ) introduced


geometrically in Chapter 1 is a vector space in the sense of Definition 2.4, as Theorem
1.2 shows. The vectors of E are defined as equivalence classes of ordered pairs of points
and are usually represented by arrows (cf. Definition 1.1). The dimensions of E2 and E3
are intuitively clear, in the next section we shall give a precise definition of the concept
of dimension. The attribute Euclidean refers to the fact that in the vector space E we
know—again intuitively—what the length of a vector and the angle between two vectors
is; correspondingly, we can think about a vector of E as a quantity determined by its length
and its direction. It is essential to understand that in a general vector space the concepts
length and angle are not defined.
Given the three basis vectors ~e1 , ~e2 , ~e3 of a coordinate system, a vector ~x ∈ E3 can be
characterized by its components and these can be summarized in terms of a column vector
x ∈ R3 :

~x = x1~e1 + x2~e2 + x3~e3 ←→ x = (x1 , x2 , x3 )^T . (2.14)
In other words, there is a one-one correspondence between the vectors of E and the column
vectors of R3 . With respect to a second coordinate system with basis vectors ~e1', ~e2', ~e3',
we have

~x = x1'~e1' + x2'~e2' + x3'~e3' ←→ x' = (x1', x2', x3')^T . (2.15)

Note that in (2.14) and (2.15) ~x ∈ E3 is the same vector whereas x ∈ R3 and x' ∈ R3 are
different column vectors! Accordingly, (2.14) and (2.15) represent two different bijective
maps between the vector spaces E3 and R3 .
In Chapter 1, we considered a coordinate system as given and kept it fixed. Thus, we
were allowed to identify the vectors ~x ∈ E3 with the column vectors of R3 ; in this chapter,
however, we distinguish between these two kinds of vectors and consider E3 and R3 as
different vector spaces.
Finally, with respect to a given coordinate system (O; ~e1 , ~e2 , ~e3 ) in the Euclidean point
space P3 we can represent the vectors by position vectors, thus obtaining a one-one cor-
respondence between points and vectors. Hence, there is also a one-one correspondence
between the points P ∈ P3 and the column vectors x = (x1 , x2 , x3 )^T ∈ R3 ; in fact, x1 , x2 , x3
are just the coordinates of P . The three bijective maps P ↔ ~x, ~x ↔ x, and P ↔ x depend
on the coordinate system.

(c) This example shows the generality of the vector-space concept: it exhibits vector spaces
that a student with knowledge only from school would hardly expect.
Consider functions f : R → R, x 7→ y = f (x); remember that f denotes the function
as a rule uniquely assigning numbers y to numbers x whereas f (x) denotes the assigned
number y, the value of the function at x. If g is a second such function with domain R, we
can define a third function h according to h(x) := f (x) + g(x) for all x ∈ R. This function
h is denoted by f + g; f + g is that function that associates each number x with the sum

f (x) + g(x) of the values of f and g at x. Similarly, if λ is a constant real number, the
function λf associates x with the product λf (x).
In consequence, the set of the continuous functions on R for instance,

C 0 (R) := {f : R → R | f continuous},

becomes a vector space by the pointwise defined sum of two functions,

(f + g)(x) := f (x) + g(x), (2.16)

and the pointwise defined product of a function by a number λ ∈ R,

(λf )(x) := λf (x). (2.17)

In fact, if f and g are continuous, then f + g and λf are continuous, and the vector-space
axioms are satisfied. We verify some of them explicitly, e.g., the associative law. From

((f + g) + h)(x) = (f + g)(x) + h(x) = (f (x) + g(x)) + h(x)


= f (x) + (g(x) + h(x))
= f (x) + (g + h)(x)
= (f + (g + h))(x)

for all x ∈ R it follows that (f + g) + h = f + (g + h), i.e., the associative law. The zero
vector is the function 0 that vanishes identically, i.e., 0(x) := 0 for all x ∈ R. The function
−f defined by (−f )(x) := −f (x) for all x ∈ R is inverse to f with respect to the addition
of functions. Finally, to show the first distributive law, consider

(λ(f + g))(x) = λ(f + g)(x) = λ(f (x) + g(x))


= λf (x) + λg(x)
= (λf )(x) + (λg)(x)
= (λf + λg)(x)

for all x ∈ R, which implies λ(f + g) = λf + λg.

(d) Let P(R) be the set of all real polynomials on R, i.e., p ∈ P(R) is a function of the form

p(x) = an x^n + an−1 x^(n−1) + . . . + a2 x^2 + a1 x + a0

where n is any natural number and a0 , a1 , . . . , an are real constants. Defining the sum
p1 + p2 of two polynomials and the product λp of a polynomial by a number pointwise
as in (2.16) and (2.17), P(R) becomes a vector space where the vector-space axioms are
verified the same way as in the preceding example.
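The following small numerical sketch is not part of the original exposition; it merely illustrates
parts (a) and (d) of Example 2.6 with NumPy, treating column vectors of Rn as arrays and
polynomials of P(R) as lists of coefficients (the helper poly_add is an illustrative name, not a
standard routine).

```python
import numpy as np

# Componentwise operations of R^n, as in Example 2.6 (a).
x = np.array([1.0, -2.0, 0.5])
y = np.array([3.0,  4.0, 1.5])
print(x + y)          # componentwise sum: [4. 2. 2.]
print(2.0 * x)        # scalar multiple:   [ 2. -4.  1.]
print(x + (-1) * x)   # the zero vector, consistent with (-1)x = -x

def poly_add(p, q):
    # Pointwise sum of two polynomials of Example 2.6 (d), acting on the
    # coefficient lists [a0, a1, ..., an].
    n = max(len(p), len(q))
    p = np.pad(np.asarray(p, float), (0, n - len(p)))
    q = np.pad(np.asarray(q, float), (0, n - len(q)))
    return p + q

print(poly_add([1.0, 0.0, 2.0], [0.0, 3.0]))  # [1. 3. 2.], i.e. 1 + 3x + 2x^2
```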

Definition 2.7 A nonempty subset S of a vector space V is called a subspace of V if


(i) for all x, y ∈ S, x + y ∈ S

(ii) for all λ ∈ R and all x ∈ S, λx ∈ S.

The definition says that a subspace S ⊆ V is closed under the addition and scalar mul-
tiplication defined in the vector space V so that S itself is equipped with an addition and a
multiplication by numbers. Since S is supposed to be nonempty, it contains an element x and,
as a consequence of the defining property (ii), the zero vector 0 = 0x. As a further consequence
of (ii), S contains, with each x ∈ S, also the inverse vector −x = (−1)x. Thus, the vector-space
axioms (iii) and (iv) of I are satisfied, and all the other vector-space axioms hold in S because
they hold in V. Hence, a subspace is a vector space itself.

Example 2.8
(a) In the Euclidean vector space E3 , all vectors that are parallel to a fixed plane constitute
a subspace S. Representing all vectors of E3 by position vectors w.r.t. the origin of a
coordinate system and interpreting the position vectors as points, S becomes a plane
through the origin.
In fact, besides the trivial cases {~0} and E3 , the subspaces of E3 can be imagined as the
planes and straight lines through the origin of a coordinate system. A straight line or a
plane not passing through the origin does not represent a subspace since, for instance, the
sum of the position vectors of two points of such a plane corresponds to a point outside
the plane. Another example of a subset of E3 that is not a subspace is the set of all vectors
with length less than or equal to 1; this set corresponds to a ball of radius 1 centered at
the origin.
(b) The set S of all solutions x ∈ Rn of the homogeneous system (2.6) of linear equations is
a subspace of Rn . Namely, if x = (x1 , . . . , xn )^T is one solution of (2.6) and
y = (y1 , . . . , yn )^T is another one, then the addition of the i-th equation of (2.6) and the i-th equation of (2.6)
with y instead of x yields, for all i = 1, . . . , m,
a11 (x1 + y1 ) + . . . + a1n (xn + yn ) = 0
..
.
am1 (x1 + y1 ) + . . . + amn (xn + yn ) = 0;
that is, with x and y being solutions, x + y is also a solution. Furthermore, multiplying
the equations (2.6) by λ ∈ R, we obtain
a11 (λx1 ) + . . . + a1n (λxn ) = 0
..
.
am1 (λx1 ) + . . . + amn (λxn ) = 0;
that is, λx is a solution if x is. Since the homogeneous system (2.6) always has the trivial
solution x = 0, S is in particular not empty. Hence, the solution set of a homogeneous
system of linear equations in n unknowns is a subspace of Rn .
(c) Consider one homogeneous linear equation in three unknowns,
a1 x1 + a2 x2 + a3 x3 = 0, (2.18)
which is a particular case of the system (2.6). The solution vectors of (2.18) constitute a
subspace of R3 . Assuming that at least one of the coefficients a1 , a2 , a3 is not zero and
interpreting the solution vectors x ∈ R3 w.r.t. a coordinate system as points, (2.18) is
the equation of a plane through the origin which, as we already know, corresponds to a
subspace of E3 .
The inhomogeneous equation
a1 x1 + a2 x2 + a3 x3 = b (2.19)
where b 6= 0 and not all coefficients are zero, has infinitely many solutions that do not
constitute a subspace of R3 . Namely, the requirements (i) and (ii) of Definition 2.7 are
not fulfilled; for instance, if x, y ∈ R3 satisfy (2.19), x + y satisfies the equation
a1 (x1 + y1 ) + a2 (x2 + y2 ) + a3 (x3 + y3 ) = 2b
which is different from (2.19). Geometrically speaking, the plane described by Eq. (2.19)
does not pass through the origin and does consequently not represent a subspace of E3 .

(d) The vector space P(R) of all real polynomials is a subspace of the space C 0 (R) of all
continuous functions on R.

The next theorem gives a method to construct subspaces. Let v1 , . . . , vm be a system of
vectors of a vector space V. A vector x ∈ V is called a linear combination of v1 , . . . , vm if
x = Σ_{i=1}^m λi vi for some numbers λ1 , . . . , λm ∈ R.

Theorem 2.9 Let V be a real vector space and let v1 , . . . , vm ∈ V. The set of all linear combi-
nations
x = Σ_{i=1}^m λi vi

where λ1 , . . . , λm are any real numbers, is a subspace S of V.

Proof: Since, e.g., v1 = 1v1 + 0v2 + . . . + 0vm , v1 is also a linear combination of v1 , . . . , vm and
thus v1 ∈ S. Consequently, S ≠ ∅, i.e., the set S is not empty. If x, y ∈ S, then, according to

x + y = Σ_{i=1}^m λi vi + Σ_{i=1}^m µi vi = Σ_{i=1}^m (λi + µi )vi ,

x + y ∈ S. Further, if λ ∈ R and x ∈ S, then

λx = Σ_{i=1}^m (λλi )vi ,

i.e., λx ∈ S. Hence, S is a subspace. 2

The subspace S of the theorem is called the subspace generated by v1 , . . . , vm or spanned by


v1 , . . . , vm .

Example 2.10 Two nonzero vectors ~x, ~y ∈ E3 where ~y 6= λ~x, λ ∈ R, span a subspace corre-
sponding to a plane through the origin. If ~y is some multiple of ~x 6= ~0, the spanned subspace
corresponds to a straight line through the origin, and if ~x and ~y are zero, the spanned subspace
is the trivial subspace {~0}.
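A brief computational illustration of Example 2.10 and Theorem 2.9 (a sketch, not part of the
text): the dimension of the spanned subspace can be read off as the rank of the matrix whose
columns are the spanning vectors, a fact made precise in Sections 2.3 and 2.5. The helper
span_dimension is an illustrative name.

```python
import numpy as np

def span_dimension(*vectors):
    # Dimension of the subspace spanned by the given vectors of R^n.
    return np.linalg.matrix_rank(np.column_stack(vectors))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(span_dimension(x, y))          # 2 -> a plane through the origin
print(span_dimension(x, 2 * x))      # 1 -> a straight line through the origin
print(span_dimension(0 * x, 0 * y))  # 0 -> the trivial subspace {0}
```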

2.3 Linear Independence, Bases, and Dimension


All our further investigations in linear algebra are based on the fundamental concept of linear
dependence of vectors, respectively, linear independence.

Definition 2.11 A system v1 , . . . , vm of vectors of a vector space V is called linearly independent
if

Σ_{i=1}^m λi vi = 0

is possible only if

λ1 = λ2 = . . . = λm = 0.

The system v1 , . . . , vm ∈ V is linearly dependent if there exist numbers λ1 , . . . , λm not all being
zero such that

Σ_{i=1}^m λi vi = 0.

Remark 2.12

(a) For two linearly dependent vectors v1 , v2 ∈ V we have

λ1 v1 + λ2 v2 = 0

for a nontrivial choice of the two coefficients. Assuming λ1 ≠ 0, we obtain v1 = −(λ2 /λ1 )v2 .


That is, in the Euclidean vector space E two vectors ~v1 , ~v2 6= ~0 are linearly dependent if
and only if they have the same or opposite direction.

(b) For three linearly dependent vectors v1 , v2 , v3 ∈ V we have

λ1 v1 + λ2 v2 + λ3 v3 = 0

for a nontrivial choice of the three coefficients. Assuming λ1 ≠ 0, we obtain
v1 = −(λ2 /λ1 )v2 − (λ3 /λ1 )v3 . That is, in the Euclidean vector space E3 three vectors ~v1 , ~v2 , ~v3 are lin-
early dependent if and only if they can be represented by arrows lying in one plane. In
particular, in E2 any three vectors are linearly dependent.

(c) If one of the vectors v1 , . . . , vm is zero, the system is linearly dependent. If v1 , . . . , vm is


linearly dependent, every larger system v1 , . . . , vm , vm+1 , . . . , vn is also linearly dependent.

Example 2.13
(a) The vectors (0, 1, 1)^T , (0, 2, 1)^T , (1, 5, 3)^T ∈ R3 are linearly independent. Namely,

λ(0, 1, 1)^T + µ(0, 2, 1)^T + ν(1, 5, 3)^T = 0

is equivalent to

(ν, λ + 2µ + 5ν, λ + µ + 3ν)^T = (0, 0, 0)^T ,
resp.,

ν = 0
λ + 2µ + 5ν = 0
λ + µ + 3ν = 0.

That is,

ν = 0
λ + 2µ = 0
λ + µ = 0,

resp., ν = 0, µ = 0, and λ = 0.
(b) Are the vectors (1, 1, 1)^T , (0, −1, 2)^T , and (3, 4, 1)^T ∈ R3 linearly independent or not? The
vectorial equation

λ(1, 1, 1)^T + µ(0, −1, 2)^T + ν(3, 4, 1)^T = 0

yields
λ + 3ν = 0
λ − µ + 4ν = 0
λ + 2µ + ν = 0.
The latter system is equivalent to
λ + 3ν = 0
−µ + ν = 0
2µ − 2ν = 0,
resp.,
λ + 3ν = 0
µ = ν.
Hence, we have nontrivial solutions; one is, e.g., ν = 1, µ = 1, λ = −3. So the vectors are
linearly dependent.
(c) Consider V = Rn and let

e1 := (1, 0, . . . , 0)^T , e2 := (0, 1, 0, . . . , 0)^T , . . . , en := (0, . . . , 0, 1)^T . (2.20)

The system e1 , . . . , en is linearly independent:

0 = Σ_{i=1}^n λi ei = (λ1 , λ2 , . . . , λn )^T =⇒ λ1 = λ2 = . . . = λn = 0.

(d) Consider the vector space P(R) of the real polynomials and let qi ∈ P(R) be defined by
qi (x) := x^i
where x ∈ R and i = 0, 1, 2, . . .. Note that every polynomial p ∈ P(R) is a linear combi-
nation of the monomials qi :
p(x) = an x^n + an−1 x^(n−1) + . . . + a1 x + a0
     = an qn (x) + an−1 qn−1 (x) + . . . + a1 q1 (x) + a0 q0 (x)
     = (an qn + an−1 qn−1 + . . . + a1 q1 + a0 q0 )(x),

i.e.,

p = an qn + an−1 qn−1 + . . . + a1 q1 + a0 q0 = Σ_{i=0}^n ai qi .
The polynomials q0 , q1 , . . . , qn are linearly independent. In fact,
Σ_{i=0}^n λi qi = 0

means

0 = ( Σ_{i=0}^n λi qi )(x) = Σ_{i=0}^n λi qi (x) = Σ_{i=0}^n λi x^i
for all x ∈ R, hence, λ0 = λ1 = . . . = λn = 0.
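The computations of Example 2.13, parts (a) and (b), can be checked numerically; the following
sketch (not part of the text) uses the fact, made precise below, that the vectors are linearly
independent exactly when the matrix having them as columns has full rank.

```python
import numpy as np

A = np.column_stack(([0, 1, 1], [0, 2, 1], [1, 5, 3]))   # part (a)
B = np.column_stack(([1, 1, 1], [0, -1, 2], [3, 4, 1]))  # part (b)

print(np.linalg.matrix_rank(A))  # 3 -> linearly independent
print(np.linalg.matrix_rank(B))  # 2 -> linearly dependent

# A nontrivial relation for part (b): lambda = -3, mu = 1, nu = 1.
v1, v2, v3 = np.array([1, 1, 1]), np.array([0, -1, 2]), np.array([3, 4, 1])
print(-3 * v1 + 1 * v2 + 1 * v3)  # [0 0 0]
```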

The next two definitions and Theorem 2.18 are crucial.

Definition 2.14 A vector space V is called n-dimensional (briefly, dim V = n) if

(i) there exists a system of n linearly independent vectors v1 , . . . , vn ∈ V

(ii) every system of n + 1 vectors w1 , . . . , wn+1 ∈ V is linearly dependent.

A vector space V is called infinite-dimensional if, for every n ∈ N, there exists a linearly inde-
pendent system v1 , . . . , vn ∈ V. The trivial vector space {0} has the dimension 0.

In other words, n = dim V is the maximal number of linearly independent vectors of V.

Example 2.15

(a) We prove that the dimension of Rn is n. By part (c) of Example 2.13 we already know that
there are n linearly independent vectors in Rn , namely, the vectors e1 , . . . , en according to
(2.20). We have to show that any n + 1 vectors u1 , . . . , un+1 ∈ Rn are linearly dependent.
The equation
λ1 u1 + . . . + λn+1 un+1 = 0
is equivalent to

λ1 (u11 , . . . , u1n )^T + λ2 (u21 , . . . , u2n )^T + . . . + λn+1 (un+1,1 , . . . , un+1,n )^T = (0, . . . , 0)^T ,
resp.,

λ1 u11 + . . . + λn+1 un+1,1 = 0


..
.
λ1 u1n + . . . + λn+1 un+1,n = 0

where uij are the components of the column vectors ui . According to Theorem 2.3, the
latter homogeneous system of n linear equations in the n + 1 unknowns λj has a nontrivial
solution (λ1 , . . . , λn+1 )^T ≠ 0. Hence, u1 , . . . , un+1 are linearly dependent.

(b) For the space of the polynomials, we have that dim P(R) = ∞. In fact, according to part
(d) of Example 2.13, the n + 1 monomials q0 , . . . , qn are linearly independent, for every
n ∈ N.

Definition 2.16 A system v1 , . . . , vn ∈ V is called a basis of V (or a basis in V) if every vector


x ∈ V has a representation of the form
x = Σ_{i=1}^n ξi vi

where the coefficients ξi ∈ R are uniquely determined. The numbers ξi are the components of x
w.r.t. the basis v1 , . . . , vn .

Example 2.17

(a) In the two-dimensional Euclidean vector space E2 , two unit vectors ~e1 , ~e2 being perpen-
dicular to each other form a basis since every vector ~x ∈ E2 can uniquely be represented
according to ~x = x1~e1 + x2~e2 . The vectors ~v1 = ~e1 + ~e2 and ~v2 = −~e1 + ~e2 obviously
constitute a new basis in E2 (~v1 and ~v2 are perpendicular to each other as well, but they
are not unit vectors). We calculate the components ξ1 and ξ2 of a vector ~x w.r.t. the
new basis from those w.r.t. the old basis. From

~x = x1~e1 + x2~e2 = ξ1~v1 + ξ2~v2


= ξ1 (~e1 + ~e2 ) + ξ2 (~e2 − ~e1 )
= (ξ1 − ξ2 )~e1 + (ξ1 + ξ2 )~e2

it follows that x1 = ξ1 − ξ2 and x2 = ξ1 + ξ2 , i.e.,


ξ1 = (x1 + x2 )/2
ξ2 = (x2 − x1 )/2 . (2.21)
The three vectors ~e1 , ~e2 , ~v1 , for instance, are not a basis since the representation of any
vector ~x ∈ E2 as a linear combination of these is not unique, as the equality
~x = x1~e1 + x2~e2 + 0~v1 = (1/2)x1~e1 + (x2 − (1/2)x1 )~e2 + (1/2)x1 (~e1 + ~e2 )
   = (1/2)x1~e1 + (x2 − (1/2)x1 )~e2 + (1/2)x1~v1

shows. Notice that any two linearly independent vectors of E2 form a basis; they need be
neither orthogonal nor unit vectors.

(b) In the three-dimensional Euclidean vector space E3 , three orthogonal unit vectors ~e1 , ~e2 , ~e3
are a basis; every vector x ∈ E3 has the unique representation ~x = x1~e1 + x2~e2 + x3~e3 . Any
three linearly independent vectors of E3 obviously constitute a basis, whereas two vectors,
three vectors lying in one plane, or four vectors do not form a basis.
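As a numerical aside to Example 2.17, part (a) (a sketch, not part of the text): the components
of a vector w.r.t. a new basis can be obtained by solving a linear system whose coefficient matrix
has the new basis vectors as columns. The vector ~x = 3~e1 + ~e2 below is an arbitrary choice; the
result reproduces formula (2.21).

```python
import numpy as np

v1 = np.array([1.0, 1.0])    # v1 = e1 + e2, written in the old basis
v2 = np.array([-1.0, 1.0])   # v2 = -e1 + e2
x = np.array([3.0, 1.0])     # x = 3 e1 + 1 e2

# Solve x = xi1*v1 + xi2*v2 for the new components (xi1, xi2).
xi = np.linalg.solve(np.column_stack((v1, v2)), x)
print(xi)  # [ 2. -1.]  -- agrees with (2.21): xi1 = (x1+x2)/2, xi2 = (x2-x1)/2
```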

The following theorem states how the fundamental concepts of linear independence, dimen-
sion, and basis are related.

Theorem 2.18 A system v1 , . . . , vn ∈ V is a basis of V if and only if

(i) v1 , . . . , vn is linearly independent

(ii) n = dim V (in particular, dim V < ∞).

Proof: The proof consists of two parts. First, we prove that the conditions (i) and (ii) imply
that v1 , . . . , vn is a basis.
Suppose the system v1 , . . . , vn satisfies (i) and (ii). For any vector x ∈ V, the system
v1 , . . . , vn , x is, according to (ii), linearly dependent. Let
Σ_{i=1}^n λi vi + λx = 0 (2.22)

where not all of the coefficients λ1 , . . . , λn , λ are zero. Assume λ = 0. It follows that Σ_{i=1}^n λi vi =
0 and, by (i), λ1 = . . . = λn = 0. In consequence, λ1 = . . . = λn = λ = 0 which contradicts our

nontrivial choice of the coefficients in (2.22). Hence, λ 6= 0, and we can solve Eq. (2.22) for x,
obtaining
x = −(1/λ) Σ_{i=1}^n λi vi = Σ_{i=1}^n (−λi /λ)vi = Σ_{i=1}^n ξi vi

where ξi := −λi /λ. That is, every vector x is a linear combination of the vectors vi . It remains to
show the uniqueness of the coefficients ξi . The equality

x = Σ_{i=1}^n ξi vi = Σ_{i=1}^n ηi vi

implies Σ_{i=1}^n (ξi − ηi )vi = 0; consequently, by (i) again, ξ1 = η1 , . . . , ξn = ηn . Hence, the system
v1 , . . . , vn is a basis of V.
Second, we have to prove that a basis v1 , . . . , vn has the properties (i) and (ii).
Suppose the system v1 , . . . , vn is a basis. In particular, the zero vector has the unique
representation
0 = Σ_{i=1}^n ξi vi ,

so all the coefficients ξi must be zero. Hence, the n vectors v1 , . . . , vn are linearly independent.
It remains to show that dim V = n. To that end, we prove that any system w1 , . . . , wm ∈ V,
m > n, is linearly dependent. Inserting the representation
wi = Σ_{j=1}^n aij vj ,

i = 1, . . . , m, into

Σ_{i=1}^m λi wi = 0, (2.23)

we obtain that

0 = Σ_{i=1}^m λi Σ_{j=1}^n aij vj = Σ_{i=1}^m Σ_{j=1}^n λi aij vj = Σ_{j=1}^n ( Σ_{i=1}^m λi aij ) vj .

Since we already know that the vectors v1 , . . . , vn are linearly independent, we conclude that
Σ_{i=1}^m aij λi = 0 (2.24)

for all j = 1, . . . , n. According to Theorem 2.3, the homogeneous system (2.24) of n linear
equations in the m > n unknowns λi has a nontrivial solution (λ1 , . . . , λm )^T ≠ 0. Hence, by the
equivalence of (2.23) and (2.24), w1 , . . . , wm are linearly dependent; consequently, n = dim V. 2

Remark 2.19 If v1 , . . . , vn and w1 , . . . , wm are bases of V, then n = m = dim V.

Example 2.20

(a) From Example 2.15, part (a), we know that dim Rn = n. Hence, the linearly independent
vectors e1 , . . . , en introduced in part (c) of Example 2.13 are a basis of Rn . This can also

be seen directly. Namely, every x ∈ Rn is a unique linear combination of the vectors ei ; in
fact,

x = (x1 , . . . , xn )^T = Σ_{i=1}^n ξi ei

if and only if ξi = xi . Among all the bases of Rn , the basis e1 , . . . , en is distinguished by


the fact that the components of every vector x ∈ Rn are just the entries of the column;
e1 , . . . , en is called the canonical or the standard basis of Rn .

(b) As an example of another basis of Rn for n = 4, consider the four linearly independent
vectors

(1, 1, 1, 1)^T , (1, 1, 1, 0)^T , (1, 1, 0, 0)^T , (1, 0, 0, 0)^T
of R4 which, according to the preceding theorem, are a basis of R4 .

(c) The first two of the four vectors

e1 = (1, 0)^T , e2 = (0, 1)^T ; v1 := (1, 1)^T , v2 := (−1, 1)^T

of R2 constitute the canonical basis of R2 , the second ones are linearly independent and
form a second basis. For any x ∈ R2 we have

x = (x1 , x2 )^T = x1 e1 + x2 e2 = ξ1 v1 + ξ2 v2 .

From

(x1 , x2 )^T = ξ1 v1 + ξ2 v2 = ξ1 (1, 1)^T + ξ2 (−1, 1)^T = (ξ1 − ξ2 , ξ1 + ξ2 )^T
it follows x1 = ξ1 − ξ2 , x2 = ξ1 + ξ2 , respectively, Eqs. (2.21). In fact, the current example
is analogous to Example 2.17, part (a).
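A small check of Example 2.20, part (b) (a sketch, not part of the text): the four vectors form a
basis of R4 because the matrix having them as columns has rank 4, and the components of a
vector w.r.t. this basis are found by solving a linear system; the vector x below is an arbitrary
choice for illustration.

```python
import numpy as np

V = np.column_stack(([1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]))
print(np.linalg.matrix_rank(V))   # 4 -> the columns are a basis of R^4

x = np.array([4, 3, 2, 1])
xi = np.linalg.solve(V, x)        # components of x w.r.t. this basis
print(xi)                         # [1. 1. 1. 1.]
print(np.allclose(V @ xi, x))     # True
```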

Clearly, if S is a subspace of a finite-dimensional vector space V, then dim S ≤ dim V, and


dim S = dim V if and only if S = V. Moreover, the following statement holds.

Theorem 2.21 Let S be a (nontrivial) subspace of a vector space V and dim S = m < n =
dim V. Then every basis v1 , . . . , vm of S can be supplemented to a basis

v1 , . . . , vm , vm+1 , . . . , vn

of V (in particular, v1 , . . . , vm ∈ V, vm+1 , . . . , vn ∈ V \ S).

Proof: Take any basis v1 , . . . , vm of S and choose any vector vm+1 ∈ V \ S (such a vector exists
since S is a proper subspace of V because of dim S < dim V). The system v1 , . . . , vm , vm+1 is
linearly independent. Namely, the assumption λm+1 6= 0 in
Σ_{i=1}^m λi vi + λm+1 vm+1 = 0

implies
vm+1 = −(1/λm+1 ) Σ_{i=1}^m λi vi .

Since S is a subspace, the right-hand side of the latter equation is a vector of S, whereas vm+1 6∈
S. Because of this contradiction we obtain λm+1 = 0, and because of the linear independence of
v1 , . . . , vm we conclude that λ1 = . . . = λm = 0. Hence, the vectors v1 , . . . , vm , vm+1 are linearly
independent.
If m + 1 = n, the theorem has been proved. If m + 1 < n, consider the subspace S1 generated
by v1 , . . . , vm , vm+1 ; S1 is a proper subspace of V. Choose a vector vm+2 ∈ V \ S1 and show as
above that the system v1 , . . . , vm , vm+1 , vm+2 is linearly independent. Thus, after n − m steps
of this kind, we obtain a basis v1 , . . . , vm , vm+1 , . . . , vn of V. 2

2.4 Linear Maps and Matrices


Besides the concepts of vector space and linear independence, the concept of a linear mapping
is the most fundamental one. We motivate this important concept by the next example.

Example 2.22 Consider the rotation of the vectors of the two-dimensional Euclidean space E2
by an angle φ in the positive sense. Represent all vectors by arrows with the same beginning
point, say, as position vectors w.r.t. the origin of a coordinate system, and rotate the position
vectors counterclockwise around the origin by φ. For ~x ∈ E2 , call the rotated vector L(~x); that
is, L : E2 → E2 is a map transforming the vectors into the rotated ones. It is evident that the
sum ~x + ~y of two vectors coincides after rotation with the sum of the rotated vectors L(~x) and
L(~y ), i.e., L(~x + ~y ) = L(~x) + L(~y ). Furthermore, L(λ~x) = λL(~x) for any real number λ.

Definition 2.23 Let V and W be real vector spaces. A map

L: V → W
x 7→ L(x)

assigning a vector L(x) ∈ W to each vector x ∈ V, is called linear if

(i) for all x, y ∈ V, L(x + y) = L(x) + L(y)

(ii) for all λ ∈ R and all x ∈ V, L(λx) = λL(x).

A linear map L : V → V is also called a linear transformation and a linear map l : V → R a


linear function.

The rotation L of Example 2.22 is a particular linear transformation with additional prop-
erties; for instance, it preserves the lengths of vectors (i.e., |L(~x)| = |~x|) and the angles be-
tween vectors. As follows from Definition 2.23, every linear map L : V → W preserves sums
and linear combinations, e.g., L(x + y + z) = L(x) + L(y) + L(z), L( Σ_{i=1}^m xi ) = Σ_{i=1}^m L(xi ),
L(λx + µy) = λL(x) + µL(y), and L(x − y) = L(x + (−1)y) = L(x) + (−1)L(y) = L(x) − L(y).
The general statement reads

L( Σ_{i=1}^m λi xi ) = Σ_{i=1}^m λi L(xi )
where λi ∈ R and xi ∈ V.—We consider some further examples for linear maps.

Example 2.24
(a) Let x = (x1 , x2 , x3 )^T ∈ R3 . According to

L(x) := (3x1 + 2x2 − 4x3 , x1 − x2 + 2x3 )^T ∈ R2 ,

a map L : R3 → R2 is defined. By an easy calculation we obtain
L(x + y) = (3(x1 + y1 ) + 2(x2 + y2 ) − 4(x3 + y3 ), (x1 + y1 ) − (x2 + y2 ) + 2(x3 + y3 ))^T = L(x) + L(y)

and

L(λx) = (3λx1 + 2λx2 − 4λx3 , λx1 − λx2 + 2λx3 )^T = λL(x),
that is, L is linear.

(b) The (orthogonal) projection p~ of a vector ~x ∈ E along a unit vector ~u ∈ E is

p~ = |~x| cos φ~u = (~u · ~x)~u

where φ is the angle between ~x and ~u (cf. Example 1.12, part (c)). Keeping the unit vector
~u fixed, a map L : E → E is defined by ~x 7→ p~ =: L(~x). The projection p~ depends linearly
on ~x, i.e., L is a linear map:

L(~x + ~y ) = (~u · (~x + ~y ))~u = (~u · ~x)~u + (~u · ~y )~u = L(~x) + L(~y ),
L(λ~x) = (~u · (λ~x))~u = λ(~u · ~x)~u = λL(~x).

The linear map L is called the orthogonal projection onto the one-dimensional subspace
spanned by ~u.

(c) Let x ∈ R4 . According to

l(x) := x1 + x2 + 4x3 − 2x4 ∈ R,

a linear function l : R4 → R is defined.


(d) The definite integral ∫_a^b f (x) dx of a continuous function f : [a, b] → R is a real num-
ber. Hence, we can define a linear function l : C 0 ([a, b]) → R on the vector space of the
continuous functions on [a, b] by

l(f ) := ∫_a^b f (x) dx.

Since l acts on vectors that are functions, l is also called a linear functional.
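The linearity conditions of Definition 2.23 can be tested numerically for the map of Example
2.24, part (a); the following sketch (not part of the text) does so for arbitrarily chosen vectors
and a scalar.

```python
import numpy as np

def L(x):
    # The map of Example 2.24 (a): R^3 -> R^2.
    return np.array([3*x[0] + 2*x[1] - 4*x[2], x[0] - x[1] + 2*x[2]])

x, y, lam = np.array([1., 2., 3.]), np.array([-1., 0., 2.]), 5.0
print(np.allclose(L(x + y), L(x) + L(y)))   # True: additivity
print(np.allclose(L(lam * x), lam * L(x)))  # True: homogeneity
```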

Linear maps acting between finite-dimensional vector spaces can be represented by matrices,
as we are going to show. Let V be a vector space of dimension n, W a vector space of dimension
m, and L : V → W be a linear map. Choose a basis v1 , . . . , vn in V and a basis w1 , . . . , wm in
W. The images of the basis vectors vj under L can be decomposed with respect to the basis in
W,
L(vj ) = Σ_{i=1}^m aij wi , j = 1, . . . , n, (2.25)

and the coefficients aij can be summarized by an m × n matrix A:


 
A := ( a11 . . . a1n )
     ( ..         .. ) (2.26)
     ( am1 . . . amn )

Note that the first index of the entries aij counts the rows and the second index the columns.
The j-th column of A consists of the components of the vector L(vj ); A is called the matrix of
L w.r.t. v1 , . . . , vn and w1 , . . . , wm . By means of A, one can calculate the image y = L(x) of
every vector x ∈ V under the linear map L. Writing x = Σ_{j=1}^n ξj vj , it follows from the linearity
of L and (2.25) that

y = L(x) = L( Σ_{j=1}^n ξj vj ) = Σ_{j=1}^n ξj L(vj ) = Σ_{j=1}^n ξj Σ_{i=1}^m aij wi .

Applying some rules for calculations with sums (which here are consequences of the vector-space
axioms) to the last expression, we obtain

Σ_{j=1}^n ξj Σ_{i=1}^m aij wi = Σ_{j=1}^n Σ_{i=1}^m ξj aij wi = Σ_{i=1}^m Σ_{j=1}^n ξj aij wi = Σ_{i=1}^m ( Σ_{j=1}^n aij ξj ) wi .

Hence,

y = L(x) = Σ_{i=1}^m ( Σ_{j=1}^n aij ξj ) wi = Σ_{i=1}^m ηi wi (2.27)

where ηi are the components of y w.r.t. the basis w1 , . . . , wm .


The result (2.27) states that, w.r.t. the given bases, the components ηi of y = L(x) can be
calculated from the components ξj of x according to
ηi = Σ_{j=1}^n aij ξj .

It is convenient to introduce column vectors X ∈ Rn and Y ∈ Rm corresponding to x ∈ V and
y = L(x) ∈ W, namely,

X := (ξ1 , . . . , ξn )^T , Y := (η1 , . . . , ηm )^T = ( Σ_{j=1}^n a1j ξj , . . . , Σ_{j=1}^n amj ξj )^T . (2.28)

Using the definition of the product of an m × n matrix by a column vector of Rn ,

( a11 . . . a1n ) ( ξ1 )      ( Σ_{j=1}^n a1j ξj )
( ..         .. ) ( .. )  :=  (        ..        ) , (2.29)
( am1 . . . amn ) ( ξn )      ( Σ_{j=1}^n amj ξj )

we finally obtain
Y = AX. (2.30)
We summarize our results by the following theorem.

Theorem 2.25 If, according to (2.25) and (2.26), A is the matrix of a linear map L : V → W
w.r.t. the bases v1 , . . . , vn and w1 , . . . , wm , then
y = L(x) ⇐⇒ ηi = Σ_{j=1}^n aij ξj , i = 1, . . . , m ⇐⇒ Y = AX

where X ∈ Rn and Y ∈ Rm consist of the components ξj of x and ηi of y, respectively. Con-
versely, an m × n matrix A with entries aij defines a linear map L : V → W according to

L(x) := Σ_{i=1}^m ( Σ_{j=1}^n aij ξj ) wi (2.31)

where again x = Σ_{j=1}^n ξj vj .

Proof: Because of the preceding discussion, it only remains to show the converse statement of
the theorem. But it is obvious that the map L defined by (2.31) is linear. 2

We emphasize that the matrix of a linear map depends on the bases chosen in V and W—like
the components of a vector depend on the basis. In Eq. (2.31) a matrix A and bases are used
to define a linear map L, but the vectors x, y = L(x), and the resulting map L do not depend
on any bases. The comparison with (2.25) or (2.27) shows that the matrix of the linear map
defined by (2.31) is again A.
In the case of a linear map L : Rn → Rm , i.e., V = Rn and W = Rm , one commonly works
with the canonical bases e1 , . . . , en of Rn and ê1 , . . . , êm of Rm :
       
e1 = (1, 0, . . . , 0)^T , . . . , en = (0, . . . , 0, 1)^T ;  ê1 = (1, 0, . . . , 0)^T , . . . , êm = (0, . . . , 0, 1)^T .

Note that the column vectors ej ∈ Rn have n entries whereas êi ∈ Rm have m entries. Since
the entries of a column vector x ∈ Rn coincide with its components w.r.t. the basis e1 , . . . , en ,
x coincides with the column vector X introduced in (2.28); analogously, y = L(x) = Y . Hence,
when working with column vectors and the canonical bases, it follows that y = L(x) = Ax. The
columns of the matrix A of L are just the vectors L(e1 ), . . . , L(en ).
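The preceding remark suggests a simple computational recipe (a sketch under these conventions,
not part of the text): to obtain the matrix of a linear map L : Rn → Rm w.r.t. the canonical
bases, apply L to e1 , . . . , en and collect the results as columns. The helper matrix_of is an
illustrative name.

```python
import numpy as np

def matrix_of(L, n):
    # j-th column of the representing matrix is L(e_j), for the canonical basis.
    return np.column_stack([L(np.eye(n)[:, j]) for j in range(n)])

def L(x):
    # The map of Example 2.24 (a).
    return np.array([3*x[0] + 2*x[1] - 4*x[2], x[0] - x[1] + 2*x[2]])

print(matrix_of(L, 3))
# [[ 3.  2. -4.]
#  [ 1. -1.  2.]]
```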

Example 2.26
(a) Consider the rotation L : E2 → E2 of Example 2.22 and choose a positively oriented
orthonormal basis ~e1 , ~e2 in V = W = E2 . (An orthonormal basis in E2 consists of two
orthogonal unit vectors. Any basis ~v1 , ~v2 in E2 is positively oriented if the vector ~v2 follows
~v1 counterclockwise.) It is geometrically evident that the rotated basis vectors are given
by

L(~e1 ) = cos φ~e1 + sin φ~e2


L(~e2 ) = − sin φ~e1 + cos φ~e2 .

Therefore, according to (2.25) and (2.26),


A = ( cos φ   − sin φ )
    ( sin φ     cos φ ) (2.32)

is the matrix of the rotation w.r.t. the basis ~e1 , ~e2 . Consequently, for an arbitrary vector
~x = x1~e1 + x2~e2 , the components of the rotated vector ~y = L(~x) can be calculated by Eq.
(2.30), yielding

y1 = x1 cos φ − x2 sin φ
y2 = x1 sin φ + x2 cos φ.

Note that all vectors are referred to the same basis. W.r.t. a basis that is not orthonormal
and positively oriented, the matrix of a rotation is different from (2.32).

(b) We are going to calculate the matrix of the projection map of Example 2.24, part (b),
w.r.t. an orthonormal basis ~e1 , ~e2 , ~e3 in E3 . (An orthonormal basis in E3 consists of three
orthogonal unit vectors.) From the decompositions

~x = x1~e1 + x2~e2 + x3~e3


~u = u1~e1 + u2~e2 + u3~e3
p~ = p1~e1 + p2~e2 + p3~e3

and
p~ = L(~x) = (~u · ~x)~u
it follows that

p1 = (~u · ~x)u1 = (u1 x1 + u2 x2 + u3 x3 )u1 = u1^2 x1 + u1 u2 x2 + u1 u3 x3

and analogously

p2 = u2 u1 x1 + u2^2 x2 + u2 u3 x3
p3 = u3 u1 x1 + u3 u2 x2 + u3^2 x3 .

In matrix denotation we can write


    
( p1 )   ( u1^2    u1 u2   u1 u3 ) ( x1 )
( p2 ) = ( u2 u1   u2^2    u2 u3 ) ( x2 )
( p3 )   ( u3 u1   u3 u2   u3^2  ) ( x3 )

or, briefly, P = AX where

A = ( u1^2    u1 u2   u1 u3 )
    ( u2 u1   u2^2    u2 u3 ) (2.33)
    ( u3 u1   u3 u2   u3^2  ) .

Note that the matrix A of the projection map L is symmetric and that the entries are
products of the components of the fixed unit vector ~u. Since the components of ~u depend on
the chosen orthonormal basis, A depends on that. For a basis that is not orthonormal, the
matrix A looks different from (2.33), i.e., the entries of A are no longer simply aij = ui uj .

(c) The linear map L : R3 → R2 of Example 2.24, part (a), can directly be written in matrix
form:

L(x) = (3x1 + 2x2 − 4x3 , x1 − x2 + 2x3 )^T = ( 3    2   −4 ) x .
                                              ( 1   −1    2 )

According to the remark preceding these examples, the matrix

A := ( 3    2   −4 )
     ( 1   −1    2 )

is just the matrix of L w.r.t. the canonical bases of R3 and R2 . This corresponds to the fact that
L(e1 ), L(e2 ), and L(e3 ) are the columns of A.

(d) Let L : R2 → R2 be the linear map defined by


L(x) := (x1 + 2x2 , 3x1 + 4x2 )^T = ( 1   2 ) x = Ax. (2.34)
                                    ( 3   4 )

Clearly, A is the matrix of L w.r.t. the canonical basis of R2 . What is the matrix A' of L
w.r.t. the basis v1 = (1, 1)^T , v2 = (−1, 1)^T ?

We have to look at the equation

Y' = A'X'

where the entries of the column vectors X' and Y' are the components of x = (x1 , x2 )^T =
ξ1 v1 + ξ2 v2 and y = L(x) = (y1 , y2 )^T = η1 v1 + η2 v2 w.r.t. the basis v1 , v2 . According to
Example 2.20, part (c), we have

x1 = ξ1 − ξ2
(2.35)
x2 = ξ1 + ξ2 ,

ξ1 = (x1 + x2 )/2
ξ2 = (x2 − x1 )/2 ,

as well as

η1 = (y1 + y2 )/2 (2.36)
η2 = (y2 − y1 )/2 .
Using (2.36) and
y1 = x1 + 2x2
y2 = 3x1 + 4x2
which is implied by Eq. (2.34), it follows that
Y' = ( η1 ) = ( (y1 + y2 )/2 ) = ( ((x1 + 2x2 ) + (3x1 + 4x2 ))/2 )
     ( η2 )   ( (y2 − y1 )/2 )   ( ((3x1 + 4x2 ) − (x1 + 2x2 ))/2 ) ,

i.e.,

Y' = ( 2x1 + 3x2 )
     (  x1 + x2  ) .

Replacing x1 and x2 by ξ1 and ξ2 according to (2.35), we obtain

Y' = ( 2(ξ1 − ξ2 ) + 3(ξ1 + ξ2 ) ) = ( 5ξ1 + ξ2 ) = ( 5   1 ) ( ξ1 )
     (  (ξ1 − ξ2 ) + (ξ1 + ξ2 )  )   (    2ξ1   )   ( 2   0 ) ( ξ2 ) ,

i.e., Y' = A'X' where

A' = ( 5   1 )
     ( 2   0 ) .

(e) Let P2 (R) be the set of all real polynomials of degree 2 or less. A polynomial p ∈ P2 (R) is of
the form
p(x) = a0 + a1 x + a2 x^2 (2.37)

where x ∈ R. The sum of two such polynomials and the product of such a polynomial
by a number are polynomials of the same type. Since, moreover, the addition and
the scalar multiplication in P2 (R) satisfy the vector-space axioms, P2 (R) is a vector space
(and in fact a subspace of the vector space P(R) of all polynomials, cf. Example 2.6,
part (d), Example 2.13, part (d), and Example 2.15, part (b)). Using the monomials
q0 , q1 , q2 defined by q0 (x) = 1, q1 (x) = x, and q2 (x) = x^2 , it follows from (2.37) that
p(x) = a0 q0 (x) + a1 q1 (x) + a2 q2 (x) or briefly

p = a0 q0 + a1 q1 + a2 q2 . (2.38)

Hence, q0 , q1 , q2 is a basis of P2 (R) and the coefficients a0 , a1 , a2 of the polynomial p are


the components of p, considered as a vector, w.r.t. this basis. In particular, P2 (R) is a
three-dimensional vector space.
Now we use the differentiation of polynomials to define a map that to each polynomial
p ∈ P2 (R) assigns its derivative p':

d/dx : P2 (R) → P2 (R), p ↦ p' = (d/dx)p.

Here we understand the symbol d/dx as a denotation for the mapping p ↦ p'. Since d/dx
is obviously a linear map, it can be represented by a matrix D w.r.t. the basis q0 , q1 , q2 .
From (2.37) it follows that p'(x) = a1 + 2a2 x, i.e.,

p' = a1 q0 + 2a2 q1 . (2.39)

According to (2.38) and (2.39), we associate the polynomials p and p' with the column
vectors P = (a0 , a1 , a2 )^T and P' = (a1 , 2a2 , 0)^T . From P' = DP and the obvious equality

( a1  )   ( 0   1   0 ) ( a0 )
( 2a2 ) = ( 0   0   2 ) ( a1 )
( 0   )   ( 0   0   0 ) ( a2 )

we conclude that the matrix of d/dx w.r.t. the given basis is

D = ( 0   1   0 )
    ( 0   0   2 )
    ( 0   0   0 ) .
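Parts (d) and (e) of Example 2.26 can be verified numerically. For part (d), one standard route
(not spelled out in the text, but equivalent to the calculation above) is the change-of-basis
formula A' = T^{-1} A T, where T has the new basis vectors as columns w.r.t. the canonical
basis; the sketch below reproduces the matrix A' obtained above and then applies the
differentiation matrix D of part (e) to an arbitrarily chosen coefficient vector.

```python
import numpy as np

# Part (d): change of basis for the matrix of L.
A = np.array([[1., 2.], [3., 4.]])
T = np.column_stack(([1., 1.], [-1., 1.]))    # columns v1, v2
A_prime = np.linalg.inv(T) @ A @ T
print(A_prime)                                # reproduces A' = [[5, 1], [2, 0]]

# Part (e): D acting on the coefficients (a0, a1, a2) of p(x) = a0 + a1 x + a2 x^2.
D = np.array([[0., 1., 0.], [0., 0., 2.], [0., 0., 0.]])
print(D @ np.array([4., -1., 3.]))            # [-1. 6. 0.], i.e. p'(x) = -1 + 6x
```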

The composition of linear maps is closely related to the multiplication of matrices. We begin
this dicussion with the definition of the latter.

Definition 2.27 Let A be an m × n matrix and B an n × p matrix. The product C = AB is


the m × p matrix with the entries
cik = Σ_{j=1}^n aij bjk , i = 1, . . . , m, k = 1, . . . , p. (2.40)

Note that the matrix product AB is only defined when the number of columns of A coincides
with the number of rows of B. The entry cik of C is some kind of scalar product of the i-th row
of A and the k-th column of B.
     
         (     ..      ) ( . .  b1k  . . )   (      ..       )
C = AB = ( ai1 . . ain ) ( . .   ..  . . ) = ( . .  cik  . .  )
         (     ..      ) ( . .  bnk  . . )   (      ..       )
              m×n               n×p                 m×p

Notice also that, since a column vector X ∈ Rn is an n × 1 matrix, the definition of the product
of an m × n matrix A by X ∈ Rn is a particular case of Definition 2.27. That is, Eq. (2.29) is a
particular case of (2.40).

Example 2.28 We calculate the product of a 3 × 3 matrix A and a 3 × 2 matrix B, the result
being a 3 × 2 matrix:

AB = ( 1    0   2 ) (  1   1 )   ( 5    7 )
     ( 3   −1   0 ) ( −1   2 ) = ( 4    1 )
     ( 2    5   3 ) (  2   3 )   ( 3   21 ) .
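The product can be checked with a few lines of NumPy (a sketch, not part of the text):

```python
import numpy as np

A = np.array([[1, 0, 2], [3, -1, 0], [2, 5, 3]])
B = np.array([[1, 1], [-1, 2], [2, 3]])
print(A @ B)
# [[ 5  7]
#  [ 4  1]
#  [ 3 21]]
```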

Theorem 2.29 Let L : V → W and K : W → U be linear maps. Then the composite map
K ◦ L : V → U is linear and, if L is represented by the m × n matrix B and K by the l × m
matrix A, K ◦ L is represented by the l × n matrix C = AB.

Proof: The linearity of L and K implies the linearity of K ◦ L according to

(K ◦ L)(x + y) = K(L(x + y)) = K(L(x) + L(y)) = K(L(x)) + K(L(y)) = (K ◦ L)(x) + (K ◦ L)(y)

and
(K ◦ L)(λx) = K(L(λx)) = K(λL(x)) = λK(L(x)) = λ(K ◦ L)(x)
where x, y ∈ V and λ ∈ R.
Now let v1 , . . . , vn ∈ V, w1 , . . . , wm ∈ W, and u1 , . . . , ul ∈ U be bases of the respective vector
spaces, let x be any vector of V, and consider the following decompositions:
x = Σ_{k=1}^n ξk vk
y := L(x) = Σ_{j=1}^m ηj wj
z := (K ◦ L)(x) = K(L(x)) = K(y) = Σ_{i=1}^l ζi ui .

If, w.r.t. the given bases, the linear maps K, L, and K ◦ L are represented by the matrices A,
B, and C, then, according to Theorem 2.25,
ηj = Σ_{k=1}^n bjk ξk , j = 1, . . . , m (2.41)
ζi = Σ_{j=1}^m aij ηj , i = 1, . . . , l (2.42)
ζi = Σ_{k=1}^n cik ξk , i = 1, . . . , l. (2.43)

Inserting (2.41) into (2.42), we obtain


 
ζi = Σ_{j=1}^m aij Σ_{k=1}^n bjk ξk = Σ_{k=1}^n ( Σ_{j=1}^m aij bjk ) ξk . (2.44)

Since x ∈ V is arbitrary, its components ξk can take all real values. Therefore, the comparison
of Eqs. (2.43) and (2.44) yields
cik = Σ_{j=1}^m aij bjk ,

i.e., C = AB. 2

The first part of the following concluding remark is addressed to readers with strong interest
in mathematics whereas the second part is important for everyone.

Remark 2.30

(a) Consider the set of all linear maps between the same two vector spaces V and W:

L(V, W) := {L : V → W | L linear}.

The sum of two linear maps K : V → W and L : V → W is defined according to

(K + L)(x) := K(x) + L(x)

and the product of a linear map K by a real number λ ∈ R is given by

(λK)(x) := λK(x).

It is easy to see that the maps K + L : V → W and λL : V → W are again linear; that
is, if K, L ∈ L(V, W) and λ ∈ R, then K + L ∈ L(V, W) and λL ∈ L(V, W). Moreover,
one easily verifies the validity of the vector-space axioms for L(V, W); hence, L(V, W) is
a vector space of linear maps.
Assuming that V and W are finite-dimensional spaces and choosing fixed bases in V and
W, each linear map L ∈ L(V, W) corresponds to an m × n matrix B; denote this one-one
correspondence by L ↔ B. The set of all m × n matrices is also a vector space, denoted
by Mmn . From K ↔ A, L ↔ B, and λ ∈ R it follows that K + L ↔ A + B and λL ↔ λB.
That is, the vector spaces L(V, W) and Mmn have completely the same structure, they
are isomorphic. In particular, they have the same dimension; consequently, L(V, W) is an
mn-dimensional vector space.

(b) For the multiplication of matrices, the following rules hold (Mmn denoting the vector
space of the real m × n matrices):

(i) for A ∈ Mmn and B ∈ Mnm , AB 6= BA in general (i.e., the matrix multiplication is
not commutative)
(ii) for all A ∈ Mmn , all B ∈ Mnp , and all C ∈ Mpq , (AB)C = A(BC) =: ABC
(associative law)
(iii) for all A, B ∈ Mmn and all C ∈ Mnp , (A + B)C = AC + BC (first distributive law)
(iv) for all A ∈ Mmn and all B, C ∈ Mnp , A(B + C) = AB + AC (second distributive
law)
(v) for all numbers λ ∈ R, all A ∈ Mmn , and all B ∈ Mnp , (λA)B = λ(AB) = A(λB) =:
λAB (mixed associative law)
(vi) for all A ∈ Mmn and all B ∈ Mnp , (AB)T = B T AT (the superscript T denoting
transposition).

The proof of the rules of part (b) of the remark is left to the reader as an exercise; the reader,
meanwhile equipped with some mathematical experience, will find it quite simple.
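A small numerical illustration of rules (i) and (vi) above (a sketch, not part of the text; the
matrices are arbitrary choices):

```python
import numpy as np

A = np.array([[1, 2], [0, 1]])
B = np.array([[0, 1], [1, 0]])
print(A @ B)                                  # [[2 1], [1 0]]
print(B @ A)                                  # [[0 1], [1 2]] -- not equal to A @ B
print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T
```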

2.5 Kernel, Image, and Rank


The next concepts are useful to investigate the properties of linear maps. They have applications
in the context of linear equations and the invertibility of matrices, the latter being defined in
this section later.

Definition 2.31

(a) Let L : V → W be a linear map. The kernel and the image of L are defined by

Ker L := {x ∈ V | L(x) = 0}
Im L := {y ∈ W | y = L(x) for some x ∈ V}.

The image of L is the same as its range, Im L = RL .

(b) The rank of an m × n matrix A, briefly rank A, is the maximal number of the linearly
independent columns of A (where the columns are considered as vectors of Rm , resp., as
m × 1 matrices).

Remember that a (not necessarily linear) map L : V → W is called injective (or one-one)
if L(x1 ) = L(x2 ) implies x1 = x2 (i.e., x1 6= x2 implies L(x1 ) 6= L(x2 )). The map L is called
surjective (or a map onto W) if, for each y ∈ W, y = L(x) for some x ∈ V. A map that is both
injective and surjective is bijective.

Theorem 2.32 Let L : V → W be a linear map.

(a) The set Ker L ⊆ V is a subspace of V.

(b) The set Im L ⊆ W is a subspace of W.

(c) The linear map L is injective if and only if Ker L = {0}.

(d) The map L is surjective if and only if Im L = W.

(e) If the vector spaces V and W are finite-dimensional and the linear map L is represented
by a matrix A w.r.t. some bases, then

rank A = dim Im L.

In particular, different matrices representing the same linear map w.r.t. different bases,
have the same rank.

Proof: Since L(0) = L(x − x) = L(x) − L(x) = 0, 0 ∈ Ker L, and Ker L is not empty.
If u, v ∈ Ker L, then L(u) = L(v) = 0 and consequently L(u + v) = L(u) + L(v) = 0, so
u + v ∈ Ker L. Similarly, λ ∈ R and u ∈ Ker L implies L(λu) = λL(u) = 0, i.e., λu ∈ Ker L.
Hence, Ker L is a subspace of V.
From L(0) = 0 we further conclude that 0 ∈ Im L, so Im L 6= ∅. If y, z ∈ Im L, then y = L(u)
and z = L(v) for some u, v ∈ V; consequently, y + z = L(u) + L(v) = L(u + v), so y + z ∈ Im L.
Similarly, λ ∈ R and y ∈ Im L implies λy = λL(u) = L(λu), i.e., λy ∈ Im L. Hence, Im L is a
subspace of W.
To show statement (c), suppose L is injective. The equation L(x) = 0 has the solution x = 0,
and this is, since L is injective, the only solution. Hence, Ker L = {0}. Conversely, suppose
Ker L = {0}. The equality L(u) = L(v) implies L(u − v) = 0 and consequently, because of the
supposition, u − v = 0, i.e., u = v. Hence, L is injective.
Statement (d) is just a reformulation of the definition of a surjective map. In fact, L is
surjective if and only if every y ∈ W is of the form y = L(x), x ∈ V. Equivalently, W = Im L.
To prove statement (e), let v1 , . . . , vn be a basis of V and w1 , . . . , wm a basis of W. The
matrix A of L w.r.t. these bases is defined according to
L(vj ) = Σ_{i=1}^m aij wi , j = 1, . . . , n

(cf. Eqs. (2.25) and (2.26)). Consider the equation


Σ_{j=1}^n λj L(vj ) = 0 (2.45)

where λ1 , . . . , λn ∈ R. Because of
 
Σ_{j=1}^n λj L(vj ) = Σ_{j=1}^n λj Σ_{i=1}^m aij wi = Σ_{i=1}^m ( Σ_{j=1}^n aij λj ) wi ,

(2.45) is equivalent to Σ_{j=1}^n λj aij = 0 for all i = 1, . . . , m. The latter statement means

Σ_{j=1}^n λj (a1j , . . . , amj )^T = 0. (2.46)
Therefore, Eq. (2.45) can be satisfied only for λ1 = . . . = λn = 0 if and only if Eq. (2.46) can be
satisfied only for λ1 = . . . = λn = 0, and the vectors L(v1 ), . . . , L(vn ) are linearly independent
if and only if the columns of the matrix A are linearly independent. Moreover, by the same
argumentation, any subsystem of p ≤ n vectors of L(v1 ), . . . , L(vn ) is linearly independent (resp.,
dependent) if and only if the corresponding columns are linearly independent (resp., dependent).
Hence,
rank A := maximal number of linearly independent columns of A
(2.47)
= maximal number of linearly independent vectors of L(v1 ), . . . , L(vn ).
Now observe that every vector y ∈ Im L is a linear combination of the vectors L(vj ); namely,
from y = L(x) and x = Σ_{j=1}^n ξj vj it follows that y = L( Σ_{j=1}^n ξj vj ), i.e.,

y = Σ_{j=1}^n ξj L(vj ). (2.48)

If rank A = n, the vectors L(v1 ), . . . , L(vn ) are, according to (2.47), linearly independent, and
the coefficients in (2.48) are uniquely determined. Hence, the vectors L(v1 ), . . . , L(vn ) are a basis
of Im L, and consequently dim Im L = n = rank A. If r := rank A < n, then, again according to
(2.47), the system L(v1 ), . . . , L(vn ) contains r linearly independent vectors whereas any r + 1
vectors of L(v1 ), . . . , L(vn ) are linearly dependent. Without loss of generality, assume that just
the first r vectors L(v1 ), . . . , L(vr ) are linearly independent. The equation
Σ_{j=1}^r λj L(vj ) + λk L(vk ) = 0 (2.49)

where k = r + 1, . . . , n can be satisfied for a nontrivial choice of the coefficients λ1 , . . . , λr , λk ,


and the same argumentation as used in the context of Eq. (2.22) shows that in particular λk 6= 0.
In consequence, we can solve Eq. (2.49) for vk , obtaining that
L(vk ) = Σ_{j=1}^r αkj L(vj ) (2.50)

with some coefficients αkj , k = r + 1, . . . , n. From Eqs. (2.48) and (2.50) it follows that
y = Σ_{j=1}^n ξj L(vj ) = Σ_{j=1}^r ξj L(vj ) + Σ_{j=r+1}^n ξj L(vj ) = Σ_{j=1}^r ξj L(vj ) + Σ_{k=r+1}^n ξk L(vk )
  = Σ_{j=1}^r ξj L(vj ) + Σ_{k=r+1}^n ξk Σ_{j=1}^r αkj L(vj )
  = Σ_{j=1}^r ξj L(vj ) + Σ_{j=1}^r ( Σ_{k=r+1}^n αkj ξk ) L(vj )
  = Σ_{j=1}^r ( ξj + Σ_{k=r+1}^n αkj ξk ) L(vj ).

That is, every y ∈ Im L is a linear combination of the r linearly independent vectors L(v1 ), . . . , L(vr ).
Hence, these vectors are a basis of Im L and dim Im L = r = rank A. 2

Example 2.33 We determine the kernel and the image of the linear map L : R2 → R2 given by
L(x) := (x1 + 2x2 , 3x1 + 6x2 )^T = ( 1   2 ) x = Ax.
                                    ( 3   6 )

The equation L(x) = 0 is equivalent to the homogeneous system

x1 + 2x2 = 0
3x1 + 6x2 = 0

with the solutions x = λ(−2, 1)^T ; hence,

Ker L = { x ∈ R2 | x = λ(−2, 1)^T , λ ∈ R } .

The vectors L(x) can be written according to

L(x) = (x1 + 2x2 , 3x1 + 6x2 )^T = x1 (1, 3)^T + x2 (2, 6)^T = x1 (1, 3)^T + 2x2 (1, 3)^T = (x1 + 2x2 )(1, 3)^T

where the real number x1 + 2x2 can take any value; hence,

Im L = { y ∈ R2 | y = µ(1, 3)^T , µ ∈ R } .

The linear map L is neither injective nor surjective, and rank A = 1 = dim Im L. Moreover,

dim Ker L + dim Im L = 1 + 1 = 2 = dim R2 (2.51)

where R2 is in particular the vector space on which L is defined.
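The statements of Example 2.33 can be confirmed numerically (a sketch, not part of the text):

```python
import numpy as np

A = np.array([[1., 2.], [3., 6.]])
print(np.linalg.matrix_rank(A))   # 1 = dim Im L
print(A @ np.array([-2., 1.]))    # [0. 0.]: (-2, 1) spans Ker L
print(A @ np.array([1., 0.]))     # [1. 3.]: a multiple of (1, 3), which spans Im L
# dim Ker L + dim Im L = 1 + 1 = 2 = dim R^2, in line with (2.51)
```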

The result (2.51) is a particular case of a general result which is in fact the central statement
on linear maps between finite-dimensional vector spaces.

Theorem 2.34 Let L : V → W be a linear map and let dim V = n (and dim W = m). Then

dim Ker L + dim Im L = dim V = n.

Proof: If Im L = {0} (which is of course a trivial case), then L(x) = 0 for all x ∈ V and

Ker L = V, dim Ker L + dim Im L = n + 0 = n = dim V.

Assume Im L 6= {0}. Consequently, r := dim Im L ≥ 1. Choose a basis w1 , . . . , wr of Im L and


write
w1 = L(v1 ), . . . , wr = L(vr ) (2.52)
with some vectors v1 , . . . , vr ∈ V. Furthermore, choose a basis u1 , . . . , us of Ker L (provided that
Ker L ≠ {0}) and consider the system of the vectors

u1 , . . . , us , v1 , . . . , vr (2.53)

(if Ker L = {0}, consider only v1 , . . . , vr ). We show now that the system (2.53) has two particular
properties.
First, the vectors u1 , . . . , us , v1 , . . . , vr are linearly independent. Namely, let

λ1 u1 + . . . + λs us + µ1 v1 + . . . + µr vr = 0. (2.54)

Applying the linear map L to both sides of this equation, we obtain

λ1 L(u1 ) + . . . + λs L(us ) + µ1 L(v1 ) + . . . + µr L(vr ) = 0. (2.55)

Since u1 , . . . , us ∈ Ker L, L(u1 ) = . . . = L(us ) = 0. Because of (2.52), it follows from (2.55) that

µ1 w1 + . . . + µr wr = 0.

This implies µ1 = . . . = µr = 0 since the vectors w1 , . . . , wr are linearly independent because


they are a basis of Im L. In consequence, Eq. (2.54) reduces to

λ1 u1 + . . . + λs us = 0.

This implies λ1 = . . . = λs = 0 because of the basis property of u1 , . . . , us . Hence, Eq. (2.54)


can be satisfied only if λ1 = . . . = λs = µ1 = . . . = µr = 0, and the system (2.53) is linearly
independent.
Second, the vectors u1 , . . . , us , v1 , . . . , vr generate the vector space V, i.e., every vector x ∈ V
can be written as a linear combination of them. In fact, since w1 , . . . , wr is a basis of Im L, it
follows that
L(x) = η1 w1 + . . . + ηr wr
where η1 , . . . , ηr ∈ R; further, taking account of (2.52) and using the linearity of L,

L(x) = η1 L(v1 ) + . . . + ηr L(vr ) = L(η1 v1 + . . . + ηr vr ).

Again by linearity, we obtain

L(x − η1 v1 − . . . − ηr vr ) = 0;

in consequence, x − η1 v1 − . . . − ηr vr ∈ Ker L and

x − η1 v1 − . . . − ηr vr = ξ1 u1 + . . . + ξs us ,

ξ1 , . . . , ξs ∈ R. Hence,
x = ξ1 u1 + . . . + ξs us + η1 v1 + . . . + ηr vr ,
that is, every vector x ∈ V is a linear combination of the system (2.53).
Because the system (2.53) is linearly independent, every vector x ∈ V is a unique linear
combination of the system (2.53). Hence, the vectors u1 , . . . , us , v1 , . . . , vr constitute a basis of
V. Since every basis of V consists of n vectors, we conclude that

dim Ker L + dim Im L = s + r = n = dim V.

Remark 2.35

(a) In the proof of the theorem it has not been used that W is a finite-dimensional vector
space. In fact, the statement of Theorem 2.34 holds for all linear maps L : V → W where
V is of dimension n < ∞, but W can be infinite-dimensional. However, the range of L is
finite-dimensional, namely, dim Im L = n − dim Ker L.

(b) Without proof, we mention the following interesting (and, at first sight, surprising) result.
If A is an m × n matrix, then

rank A := maximal number of linearly independent columns of A


= maximal number of linearly independent rows of A.

(The rows can be considered as 1 × n matrices which form an n-dimensional vector space,
so the linear independence of the rows of A is defined.) If the m × n matrix A represents
a linear map L : V → W, then

rank A := column rank of A = row rank of A = dim Im L

(where the definition of the column rank, resp., the row rank is obvious).
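A quick numerical check of the equality of column rank and row rank (a sketch with an
arbitrarily chosen matrix, not part of the text):

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 4, 6], [0, 1, 1]])
# Column rank of A equals row rank of A, i.e. the rank of A^T.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))  # 2 2
```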

We draw some simple, but important conclusions from Theorem 2.34.

Conclusion 2.36 Let L : V → W be a linear map, dim V = n, and dim W = m. Then

(a) if m < n, L cannot be injective

(b) if m > n, L cannot be surjective

(c) if m = n, L is injective if and only if L is surjective (i.e., if m = n, an injective or


surjective linear map is automatically bijective)

(d) if m 6= n, L cannot be bijective.

Proof: Let m < n. From dim Im L ≤ m < n and Theorem 2.34 it follows that dim Ker L ≥ 1.
Hence, Ker L 6= {0}, and, according to statement (c) of Theorem 2.32, L is not injective.
Let m > n. From Theorem 2.34 it follows that dim Im L ≤ n < m, and dim Im L < m =
dim W implies that Im L is a proper subspace of W, i.e., Im L ⊆ W and Im L 6= W (briefly,
Im L ⊂ W). Hence, according to statement (d) of Theorem 2.32, L is not surjective.
Let m = n and L be injective. The latter is equivalent to Ker L = {0}. That is, dim Ker L =
0 or (again by Theorem 2.34), equivalently, dim Im L = n = m = dim W. The statement
dim Im L = dim W is equivalent to Im L = W which means that L is surjective and consequently
even bijective.
If m 6= n, then, according to parts (a) and (b) of the conclusion, L is not injective or not
surjective and consequently not bijective. 2

The inverse of a bijective linear map is related to the inverse of a matrix. We begin this
discussion with the definition of the unit matrix and the inverse of a matrix.

Definition 2.37 The n × n unit matrix is defined by


 
In := ( 1   0   . . .   0 )
      ( 0   1   . . .   0 )
      ( ..  ..    . .  .. )
      ( 0   0   . . .   1 ) ,

also simply denoted by I. The entries of the unit matrix are denoted by the Kronecker symbol
δij , i.e.,

δij = 1 if i = j,    δij = 0 if i ≠ j.

For any n × m matrix A and any m × n matrix B we have that

In A = A
BIn = B.

Namely,
(In A)ik = Σ_{j=1}^n δij ajk = aik
(BIn )ik = Σ_{j=1}^n bij δjk = bik ,

(In A)ik and (BIn )ik denoting the entries of the respective product matrices.
Next let A be a quadratic n × n matrix and assume there exists an n × n matrix B such that

BA = AB = In . (2.56)

As a preparation for the following definition, we show that B is uniquely determined. Namely,
given A, let C be a second n × n matrix satisfying (2.56). From

BA = AB = In
CA = AC = In

it follows that
C = CIn = C(AB) = (CA)B = In B = B,
i.e., C = B.

Definition 2.38 An n × n matrix is called invertible if there exists an n × n matrix B such that

BA = AB = In .

The uniquely determined matrix B is called the inverse of A, briefly, B =: A−1 . Thus, A−1 A =
AA−1 = In .
We remark that an n × n matrix need not have an inverse; take, e.g., the 2 × 2 matrix

A = ( 1   0 )
    ( 0   0 ) .

If A had an inverse

B = ( b11   b12 )
    ( b21   b22 ) ,

the equation AB = I, i.e., the equation

( 1   0 ) ( b11   b12 )   ( 1   0 )
( 0   0 ) ( b21   b22 ) = ( 0   1 ) ,

would imply

( b11   b12 )   ( 1   0 )
(  0     0  ) = ( 0   1 )
which is a contradiction. Hence, the inverse of A does not exist.
The theorem now states the relation between invertible linear maps and invertible matrices.

Theorem 2.39 Let L : V → W be linear, dim V = dim W = n, and let A be a corresponding


n × n matrix. The following statements are then equivalent:

(i) A is invertible

(ii) L is bijective

(iii) Ker L = {0}

(iv) the homogeneous linear system AX = 0, X ∈ Rn , has only the trivial solution X = 0

(v) rank A = n.

Moreover, the inverse matrix A−1 corresponds to the inverse map L−1 : W → V which is also
linear.

Proof: Assume L is bijective. First of all, we show that L−1 is also linear. Let y, z ∈ W.
Since L is bijective, there exist uniquely determined vectors x, u ∈ V such that y = L(x) and
z = L(u). In consequence, y + z = L(x + u) and

L−1 (y + z) = x + u = L−1 (y) + L−1 (z). (2.57)

Now let y ∈ W and λ ∈ R. Then y = L(x), λy = L(λx), and consequently

L−1 (λy) = λx = λL−1 (y). (2.58)

Hence, by (2.57) and (2.58), L−1 is linear.


Let A be the matrix of L and B the matrix of L−1 w.r.t. a basis in V and a basis in W.
According to Theorems 2.25 and 2.29, the equations

L−1 (L(x)) = x
(2.59)
L(L−1 (y)) = y,
resp.,

(L−1 ◦ L)(x) = x
(L ◦ L−1 )(y) = y

read in matrix representation

BAX = X
ABY = Y

where the column vectors X, Y ∈ Rn represent x ∈ V and y ∈ W. The last two equations can
be written as
BAX = In X
(2.60)
ABY = In Y.
Since Eqs. (2.59) hold for all x ∈ V and all y ∈ W, Eqs. (2.60) hold for all X, Y ∈ Rn ;
consequently, BA = In as well as AB = In . Hence, the inverse A−1 exists, and A−1 = B
corresponds to the linear map L−1 .
Now assume the matrix A is invertible. Let y ∈ W be arbitrary and consider the equation

y = L(x) (2.61)

which is equivalent to Y = AX. For every Y ∈ Rn , the latter equation is uniquely solved by
X = A−1 Y . Therefore, (2.61) has always a unique solution x ∈ V, and the linear map L is
bijective.
It remains to show the equivalence of the statements (ii)–(v). Because of dim V = dim W
and Conclusion 2.36, part (c), the linear map L is bijective if and only if it is injective; the
latter is, by Theorem 2.32, part (c), equivalent to Ker L = {0}. Statement (iii) means that the
equation L(x) = 0 has only the trivial solution x = 0, i.e., AX = 0 has only the trivial solution
X = 0. Finally, Ker L = {0} if and only if dim Ker L = 0, that is, according to Theorem 2.32,
part (e) and Theorem 2.34, rank A = dim Im L = n − dim Ker L = n. 2

The first part of the following remark completes the rules for calculations with matrices
whereas the second part is again addressed to readers with strong interests in mathematics.

Remark 2.40

(a) We supplement the rules (i)–(vi) of part (b) of Remark 2.30 by two further rules:

(vii) for any two invertible matrices A, B ∈ Mnn , (AB)−1 = B −1 A−1


(viii) for any invertible matrix A ∈ Mnn , (A−1 )−1 = A.

In fact, from

(B −1 A−1 )(AB) = B −1 (A−1 (AB)) = B −1 ((A−1 A)B) = B −1 (In B) = In

and
(AB)(B −1 A−1 ) = In
it follows that (AB)−1 exists and (AB)−1 = B −1 A−1 . From

AA−1 = A−1 A = In

it is obvious that (A−1 )−1 exists and (A−1 )−1 = A.

(b) In Definition 2.38 the two conditions BA = In and AB = In have been used to define the
invertibility of the n × n matrix A. In fact, one of the two conditions is sufficient, i.e., one
implies the other one. For instance, if, for an n × n matrix A, there is an n × n matrix B
satisfying
BA = In , (2.62)
then B = A−1 . To prove this statement, define a linear map K : Rn → Rn according
to K(x) := Ax and a linear map L : Rn → Rn according to L(x) := Bx. Eq. (2.62) is
equivalent to BAx = x for all x ∈ Rn . The latter means (L ◦ K)(x) = x for all x, i.e.,

L(K(x)) = x (2.63)

for all x. From Eq. (2.63) it follows that the map K is injective. Namely, K(x1 ) = K(x2 )
implies L(K(x1 )) = L(K(x2 )) and, by (2.63), x1 = x2 . Hence, K is injective and, by part
(c) of Conclusion 2.36, even bijective. Eq. (2.63) then states that L = K −1 ; consequently,
the matrix A is invertible and B = A−1 .

Finally, we discuss how the inverse of a matrix can be calculated. The most convenient
method is a version of Gauss–Jordan elimination. According to part (b) of the preceding remark,
the inverse of an invertible n × n matrix A is uniquely determined by the matrix equation

AA−1 = In . (2.64)

Denoting the entries of A by aij and the entries of A−1 by xij , (2.64) is equivalent to the n^2
linear equations

Σ_{j=1}^n aij xjk = δik , (2.65)

i, k = 1, . . . , n. For k = 1, these equations read explicitly

a11 x11 + . . . + a1n xn1 = 1


a21 x11 + . . . + a2n xn1 = 0
..
.
an1 x11 + . . . + ann xn1 = 0

and constitute a system of n linear equations in the n unknowns x11 , . . . , xn1 . We can write this
system as the vector equation
AX1 = e1

where X1 is the first column of A−1 and e1 the first vector of the canonical basis of Rn . For
k = 2, . . . , n, we obtain analogous equations; in fact, the equations (2.65) are equivalent to the
n vector equations

AX1 = e1 , AX2 = e2 , ..., AXn = en (2.66)

involving the columns of A−1 and the canonical basis of Rn . The n systems (2.66) of the
respective n linear equations can be solved simultaneously by Gauss–Jordan elimination. The
corresponding augmented matrix is
$$ \left( A \,\middle|\, e_1 \ \dots \ e_n \right) = \left( A \,\middle|\, I_n \right) = \left( \begin{array}{ccc|ccc} a_{11} & \dots & a_{1n} & 1 & \dots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} & 0 & \dots & 1 \end{array} \right). \qquad (2.67) $$

Since (2.64)–(2.67) are equivalent, the elimination procedure yields a unique result, namely
$$ \left( \begin{array}{ccc|ccc} 1 & \dots & 0 & x_{11} & \dots & x_{1n} \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \dots & 1 & x_{n1} & \dots & x_{nn} \end{array} \right) = \left( I_n \,\middle|\, A^{-1} \right), $$
i.e.,
$$ A^{-1} = \begin{pmatrix} x_{11} & \dots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{nn} \end{pmatrix}. $$

Example 2.41 Determine the inverse of the matrix


 
$$ A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ -1 & 2 & 0 \end{pmatrix}. $$

Solving the homogeneous system Ax = 0 (e.g., by Gauss-Jordan elimination), one verifies that
x = 0 is the only solution. Hence, according to Theorem 2.39, A−1 exists. The corresponding
Gauss-Jordan elimination procedure yields
$$ \left( \begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & 0 & 1 & 0 \\ -1 & 2 & 0 & 0 & 0 & 1 \end{array} \right) \Longleftrightarrow \left( \begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & 0 & 1 & 0 \\ 0 & 4 & 3 & 1 & 0 & 1 \end{array} \right) \Longleftrightarrow $$
$$ \left( \begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac34 & \tfrac14 & 0 & \tfrac14 \end{array} \right) \Longleftrightarrow \left( \begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & -\tfrac13 & 0 \\ 0 & 0 & -\tfrac54 & \tfrac14 & \tfrac13 & \tfrac14 \end{array} \right) \Longleftrightarrow $$
$$ \left( \begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & -\tfrac13 & 0 \\ 0 & 0 & 1 & -\tfrac15 & -\tfrac{4}{15} & -\tfrac15 \end{array} \right) \Longleftrightarrow \left( \begin{array}{ccc|ccc} 1 & 2 & 0 & \tfrac85 & \tfrac45 & \tfrac35 \\ 0 & 1 & 0 & \tfrac25 & \tfrac15 & \tfrac25 \\ 0 & 0 & 1 & -\tfrac15 & -\tfrac{4}{15} & -\tfrac15 \end{array} \right) \Longleftrightarrow $$
$$ \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & \tfrac45 & \tfrac25 & -\tfrac15 \\ 0 & 1 & 0 & \tfrac25 & \tfrac15 & \tfrac25 \\ 0 & 0 & 1 & -\tfrac15 & -\tfrac{4}{15} & -\tfrac15 \end{array} \right), $$
i.e.,
$$ A^{-1} = \frac{1}{5} \begin{pmatrix} 4 & 2 & -1 \\ 2 & 1 & 2 \\ -1 & -\tfrac43 & -1 \end{pmatrix}. $$
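As a quick numerical cross-check of Example 2.41, the following short Python sketch (NumPy is assumed as a tool here; the code is an illustration, not part of the original text) forms the augmented matrix (A | I3), row-reduces it, and compares the result with NumPy's built-in inverse.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, -3.0, -6.0],
                  [-1.0, 2.0, 0.0]])

    # Augmented matrix (A | I); Gauss-Jordan elimination with partial pivoting.
    M = np.hstack([A, np.eye(3)])
    n = A.shape[0]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))   # choose pivot row
        M[[i, p]] = M[[p, i]]
        M[i] = M[i] / M[i, i]                 # normalize pivot to 1
        for k in range(n):
            if k != i:
                M[k] -= M[k, i] * M[i]        # eliminate column i in the other rows
    A_inv = M[:, n:]

    print(A_inv)                                  # approx. (1/5) * [[4, 2, -1], [2, 1, 2], [-1, -4/3, -1]]
    print(np.allclose(A_inv, np.linalg.inv(A)))   # True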

2.6 Systems of Linear Equations II
We now draw important conclusions for simultaneous systems of linear equations where we shall
essentially use the results of the preceding section. We write a system of m linear equations in
n unknowns,

a11 x1 + a12 x2 + . . . + a1n xn = b1


a21 x1 + a22 x2 + . . . + a2n xn = b2
..
.
am1 x1 + am2 x2 + . . . + amn xn = bm ,

briefly in matrix form, i.e.,


Ax = b, x ∈ Rn , b ∈ Rm (2.68)
where A is an m × n matrix. Introducing the linear map L : Rn → Rm , L(x) := Ax, (2.68) can
be rewritten as
L(x) = b.
The first step in our discussion of the solutions of a system of linear equations is the following
simple, but important theorem.

Theorem 2.42 The general solution of Ax = b is given by the sum of a fixed particular solution
x0 (if there is any) and any homogeneous solution xh , i.e.,

x = x0 + xh

where Ax0 = b and Axh = 0. (If there is no particular solution, then there is no solution of
Ax = b.)
Briefly, if Ax0 = b, x0 fixed, then

{x ∈ Rn | Ax = b} = {x ∈ Rn | x = x0 + xh , Axh = 0} =: x0 + Ker L.

Since L(x) = Ax, the kernel of L just consists of the solutions xh of the homogeneous system
Ax = 0 of linear equations. The notation x0 + Ker L is shorthand for the set of all vectors
of the form x = x0 + x1 with x0 fixed and x1 ∈ Ker L, i.e., for the set of all vectors x = x0 + xh .

Proof of 2.42: Adding the two equations

Ax0 = b
Axh = 0,

it follows that A(x0 + xh ) = b, i.e., x = x0 + xh is a solution of the inhomogeneous system


Ax = b. Conversely, let x and x0 be solutions of the inhomogeneous system. Subtracting

Ax = b
Ax0 = b,

we obtain A(x − x0 ) = 0, i.e., xh := x − x0 is a solution of the homogeneous system. In conse-


quence, any solution of Ax = b is of the form x = x0 + xh . 2

By the definition of the rank of a matrix (Definition 2.31, part (b)), it is clear that r :=
rank A ≤ n. By part (e) of Theorem 2.32, we know that r = rank A = dim Im L. From
Im L ⊆ Rm it follows that r = rank A = dim Im L ≤ dim Rm = m; thus, r = rank A ≤ m (this
is also implied by part (b) of Remark 2.35). Furthermore, according to Theorem 2.34, we have
that dim Ker L + dim Im L = n. Hence, r ≤ n, m, r = n − dim Ker L, and dim Ker L = n − r.

Choosing a basis v1 , . . . , vn−r of Ker L, we can write every solution of Ax = b as

x = x0 + xh = x0 + t1 v1 + . . . + tn−r vn−r

where t1 , . . . , tn−r ∈ R. This representation of the solutions is similar to the parametric equation
of a straight line or a plane (cf. Eqs. (2.20) and (2.26)). Thus, we have proved the following
result.

Conclusion 2.43 The solutions of a system of linear equations of rank r in n unknowns form
an (n − r)-dimensional plane in Rn through x0 (provided that there are solutions). If r = n, this
plane degenerates to a point.
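Conclusion 2.43 can be illustrated numerically. The following Python sketch (NumPy assumed; the system is a made-up example) computes one particular solution x0 of Ax = b with a least-squares call and a basis of Ker L from the singular value decomposition, so the (n − r)-dimensional solution plane x0 + t1 v1 + . . . + tn−r vn−r can be written down explicitly.

    import numpy as np

    # A hypothetical 2 x 3 system of rank 2; its solution set is a line in R^3.
    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([6.0, 2.0])

    x0, *_ = np.linalg.lstsq(A, b, rcond=None)   # one particular solution
    _, s, Vt = np.linalg.svd(A)
    r = np.sum(s > 1e-12)                        # numerical rank
    N = Vt[r:].T                                 # columns form a basis of Ker L

    print(r, N.shape)             # rank 2, kernel basis is 3 x 1
    t = np.array([5.0])           # an arbitrary parameter value
    x = x0 + N @ t                # another point of the solution plane
    print(np.allclose(A @ x, b))  # True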

We investigate the question when a system of linear equations does have a solution (this
means at least one). The system Ax = b has a solution if and only if there exists an x ∈ Rn
such that b = Ax = L(x), i.e., if and only if b ∈ Im L. If r < m, i.e., if dim Im L < m, then Im L
is a proper subset of Rm ; in this case it can happen that b 6∈ Im L and there is no solution. If
r = m, then dim Im L = m and Im L = Rm ; consequently, b ∈ Im L, and Ax = b has a solution.
Moreover, we have the following criterion.

Theorem 2.44 The system Ax = b has a solution if and only if

rank A = rank (A | b)

where
$$ (A \,|\, b) = \begin{pmatrix} a_{11} & \dots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \dots & a_{mn} & b_m \end{pmatrix} $$
is the augmented matrix of the system.

Proof: First assume that r = rank A = rank (A | b). Then there are r linearly independent
columns of A, say, Ci1 , . . . , Cir , but the r + 1 columns Ci1 , . . . , Cir , b are linearly dependent. So

λ1 Ci1 + . . . + λr Cir + λb = 0

where not all coefficients are zero; in particular, λ 6= 0, since otherwise Ci1 , . . . , Cir would be linearly dependent. Therefore, we can solve the equation for b:
$$ b = -\frac{\lambda_1}{\lambda}\, C_{i_1} - \dots - \frac{\lambda_r}{\lambda}\, C_{i_r}. $$
Observing that Cij = Aeij where j = 1, . . . , r and eij is the ij -th canonical basis vector, we obtain
$$ b = -\frac{\lambda_1}{\lambda}\, A e_{i_1} - \dots - \frac{\lambda_r}{\lambda}\, A e_{i_r} = A\left( -\frac{\lambda_1}{\lambda}\, e_{i_1} - \dots - \frac{\lambda_r}{\lambda}\, e_{i_r} \right). $$
Hence, we have constructed a solution of Ax = b, namely, $x_0 := -\frac{\lambda_1}{\lambda}\, e_{i_1} - \dots - \frac{\lambda_r}{\lambda}\, e_{i_r}$.
Conversely, assume that Ax = b has a solution, say $x = \sum_{i=1}^{n} x_i e_i$. Then
$$ b = A\left( \sum_{i=1}^{n} x_i e_i \right) = \sum_{i=1}^{n} x_i A e_i = \sum_{i=1}^{n} x_i C_i. \qquad (2.69) $$

Again, let Ci1 , . . . , Cir be a system of r = rank A linearly independent columns. Since every larger
system Ci1 , . . . , Cir , Ck of columns of A is linearly dependent, every column Ck , k = 1, . . . , n,
k 6= i1 , . . . , ir , is a linear combination of Ci1 , . . . , Cir . According to (2.69), b can also be written
as a linear combination of Ci1 , . . . , Cir (in fact, these columns form a basis of Im L). Hence, the
system Ci1 , . . . , Cir , b is linearly dependent and consequently rank (A | b) = rank A. 2

We remark that either rank (A | b) = rank A or rank (A | b) = rank A + 1. We now give a
summarizing discussion of the different cases that can occur in the context of the solution of a
system of linear equations.
Case 1: The rank of A coincides with the number of rows, i.e., r = rank A = m. This
is possible only if n ≥ m because r ≤ m, n. Since the matrix (A | b) has also m rows, it
follows that rank (A | b) ≤ m. Therefore, m = rank A ≤ rank (A | b) ≤ m, which implies that
rank A = rank (A | b). Hence, according to the preceding theorem, Ax = b has a solution. This
result can also be concluded from dim Im L = rank A = m, as indicated above after Conclusion
2.43. Furthermore, the general solution of Ax = b has n − r parameters and is unique if r = n.
A matrix satisfying m = r = n is invertible, so the unique solution is

x = A−1 b. (2.70)

Case 2: The rank of A is smaller than the number of rows, i.e., r = rank A < m. Then two
subcases are possible. First, rank A = rank (A | b). Then Ax = b has a solution, and the general
solution involves n − r parameters. If r = n, the solution is unique, but cannot be represented
by (2.70) since n = r < m and A−1 is not defined. Moreover, because m > r = rank (A | b) and
r is also the maximal number of linearly independent rows of A (cf. Remark 2.35, part (b)),
m − r rows of the augmented matrix (A | b) can be represented by r linearly independent ones.
Since each row of the augmented matrix corresponds to an equation of Ax = b, the system of
the m linear equations is equivalent to a system of r linear equations and m − r equations are
unnecessary.—The second subcase is r = rank A < rank (A | b) = r + 1. In this case m − (r + 1)
equations are unnecessary; however, neither Ax = b nor the reduced system has a solution.

Example 2.45 Let A be a 7 × 4 matrix of maximally possible rank, i.e., r = 4, and consider
the system Ax = b, b ∈ R7 . If rank (A | b) = 4, then there is exactly one solution of the system,
and three equations are unnecessary. If rank (A | b) = 5, then the system of the seven equations
can be reduced to an equivalent system of five equations, but there is no solution.

Summarizing, if r = m, Ax = b has a solution. If r < m, Ax = b has a solution if and


only if r = rank (A | b). Whenever there exists a solution, the general solution contains n − r
parameters (and is unique if n = r).
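The solvability criterion of Theorem 2.44 and the case distinction above are easy to check numerically. The following Python sketch (NumPy assumed; the matrices and right-hand sides are hypothetical examples, not taken from the text) compares rank A with rank (A | b).

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 1.0]])      # rank 2, m = 3 rows, n = 2 unknowns

    def solvable(A, b):
        # Ax = b has a solution iff rank A = rank (A | b)  (Theorem 2.44).
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

    b1 = np.array([1.0, 2.0, 3.0])  # lies in the image of A
    b2 = np.array([1.0, 0.0, 0.0])  # does not lie in the image of A

    print(solvable(A, b1))   # True  -> a solution exists, unique since r = n = 2
    print(solvable(A, b2))   # False -> rank (A | b2) = rank A + 1, no solution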

2.7 Remarks on the Scalar Product


In the Euclidean vector space E (the symbol E means E3 or E2 , cf. Definition 1.1), we defined
the scalar product of two vectors by ~x · ~y := |~x||~y | cos φ (Definition 1.8). According to Theorem
1.10, the scalar product has the following properties:

(i) symmetry: ~x · ~y = ~y · ~x

(ii) linearity in the second argument: ~x · (~y + ~z) = ~x · ~y + ~x · ~z, ~x · (λ~y ) = λ(~x · ~y ); linearity in
the first argument: (~x + ~y ) · ~z = ~x · ~z + ~y · ~z, (λ~x) · ~y = λ(~x · ~y )

(iii) positive definiteness: ~x · ~x ≥ 0, ~x · ~x = 0 if and only if ~x = 0.

In abstract linear algebra, these properties are used to define a scalar product in a general vector
space V; we do not consider this general definition.
According to the discussion following Definition 1.4, three orthogonal unit vectors ~e1 , ~e2 , ~e3
constitute a basis of E3 ; then, by Theorem 2.18, three orthogonal unit vectors are linearly
independent, and the dimension of E3 is three. Moreover, again by Theorem 2.18, any linearly
independent system ~v1 , ~v2 , ~v3 in E3 is a basis of E3 . More than three vectors of E3 are necessarily
linearly dependent, but does there exist an orthogonal system of more than three nonzero vectors

of E3 ? Obviously not; we can prove this as follows. Let ~v1 , . . . , ~vm be an orthogonal system of
nonzero vectors of E3 , i.e.,
~vi 6= 0, ~vi · ~vj = 0 (2.71)
where i, j = 1, . . . , m and i 6= j. The equation

λ1~v1 + . . . + λm~vm = ~0

implies that
~vi · (λ1~v1 + . . . + λm~vm ) = 0.
Using (2.71), it follows that λi~vi · ~vi = 0, i.e., λi = 0 for all i = 1, . . . , m. Hence, the system
~v1 , . . . , ~vm is linearly independent and thus m ≤ 3.

Definition 2.46 A system of vectors ~e1 , ~e2 , ~e3 ∈ E3 satisfying


$$ \vec{e}_i \cdot \vec{e}_j = \delta_{ij} = \begin{cases} 0, & i \neq j \\ 1, & i = j \end{cases} $$
is called an orthonormal basis of (in) E3 .

The orthonormal bases of E3 constitute a distinguished class of bases in E3 and are used
for convenience. Let ~e1 , ~e2 , ~e3 be an orthonormal basis in E3 and let ~x be any vector of E3 .
Multiplying the equation
~x = x1~e1 + x2~e2 + x3~e3
in the sense of the scalar product by ~ei , we obtain ~ei · ~x = xi , i.e.,

xi = ~ei · ~x = |~x| cos αi (2.72)

where αi is the angle between ~x and ~ei . The scalar product of two vectors ~x, ~y ∈ E3 reads in
terms of components

~x · ~y = (x1~e1 + x2~e2 + x3~e3 ) · (y1~e1 + y2~e2 + y3~e3 ) = x1 y1 + x2 y2 + x3 y3 , (2.73)

the length of ~x is given by
$$ |\vec{x}| = \sqrt{\vec{x} \cdot \vec{x}} = \sqrt{x_1^2 + x_2^2 + x_3^2}, \qquad (2.74) $$
and the distance d of two points with position vectors ~x and ~y is
$$ d = |\vec{x} - \vec{y}| = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}. \qquad (2.75) $$

For a nonorthonormal basis ~v1 , ~v2 , ~v3 , formulas (2.72)–(2.75) become more complicated. The
analog of (2.73), for instance, is
$$ \vec{x} \cdot \vec{y} = \left( \sum_{i=1}^{3} \xi_i \vec{v}_i \right) \cdot \left( \sum_{j=1}^{3} \eta_j \vec{v}_j \right) = \sum_{i,j=1}^{3} \xi_i \eta_j\, \vec{v}_i \cdot \vec{v}_j = \sum_{i,j=1}^{3} g_{ij}\, \xi_i \eta_j $$
where $g_{ij} := \vec{v}_i \cdot \vec{v}_j$.


It is clear how Definition 2.46 and the results presented for E3 read in the case E2 .—Next we
introduce a scalar product in Rn .

Definition 2.47 The scalar product in Rn is defined according to


$$ x \cdot y := x_1 y_1 + \dots + x_n y_n = \sum_{i=1}^{n} x_i y_i = x^T y $$
where x, y ∈ Rn and xT y is the product of the matrices xT = (x1 . . . xn ) and $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$.

It is easy to show that the scalar product in Rn satisfies the same rules as the scalar product
in E3 ; in particular, it has the following properties:

(i) symmetry: x · y = y · x

(ii) bilinearity: x·(y +z) = x·y +x·z, x·(λy) = λ(x·y); (x+y)·z = x·z +y ·z, (λx)·y = λ(x·y)

(iii) positive definiteness: x · x ≥ 0, x · x = 0 if and only if x = 0.

The following definition is analogous to statements (2.74) and (2.75) as well as to Definition
2.46.

Definition 2.48 One defines

(i) the length (Euclidean norm) of x ∈ Rn by


$$ |x| := \sqrt{x \cdot x} = \sqrt{x_1^2 + \dots + x_n^2} $$

(ii) the distance of two points x, y ∈ Rn by


$$ d := |x - y| = \sqrt{(x - y) \cdot (x - y)} = \sqrt{(x_1 - y_1)^2 + \dots + (x_n - y_n)^2} $$

(iii) a system of vectors a1 , . . . , an ∈ Rn being an orthonormal basis of Rn if

ai · aj = δij .
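A minimal numerical illustration of Definition 2.48 (Python with NumPy assumed; the vectors are arbitrary examples): lengths and distances come directly from the dot product, and orthonormality of a candidate basis can be checked by forming all pairwise scalar products at once.

    import numpy as np

    x = np.array([1.0, 2.0, 2.0])
    y = np.array([4.0, 0.0, 3.0])

    length_x = np.sqrt(x @ x)               # |x| = sqrt(x . x) = 3
    dist_xy  = np.sqrt((x - y) @ (x - y))   # |x - y|

    # Orthonormality check for candidate basis vectors a1, a2 (columns of V):
    V = np.array([[1.0,  2.0],
                  [2.0, -1.0]]) / np.sqrt(5.0)
    print(np.allclose(V.T @ V, np.eye(2)))  # True: a_i . a_j = delta_ij
    print(length_x, dist_xy)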

The concepts of scalar product, length, and distance in R3 are closely related to their coun-
terparts in E3 . Let ~e1 , ~e2 , ~e3 and ~e10 , ~e20 , ~e30 be two orthonormal bases of E3 and let ~x be any
vector of E3 . According to

$$ \vec{x} = x_1 \vec{e}_1 + x_2 \vec{e}_2 + x_3 \vec{e}_3 = x_1' \vec{e}_1\,' + x_2' \vec{e}_2\,' + x_3' \vec{e}_3\,', $$
the vector ~x can, w.r.t. the basis ~e1 , ~e2 , ~e3 , be represented by the column vector $x := (x_1, x_2, x_3)^T \in \mathbb{R}^3$
and, w.r.t. the other basis $\vec{e}_1\,', \vec{e}_2\,', \vec{e}_3\,'$, by the column vector $x' := (x_1', x_2', x_3')^T \in \mathbb{R}^3$. Note that
$\vec{x} \neq x \neq x' \neq \vec{x}$, but $|\vec{x}| = |x| = |x'|$. Moreover, if

$$ \vec{y} = y_1 \vec{e}_1 + y_2 \vec{e}_2 + y_3 \vec{e}_3 = y_1' \vec{e}_1\,' + y_2' \vec{e}_2\,' + y_3' \vec{e}_3\,' $$
is a second vector of E3 and $y := (y_1, y_2, y_3)^T$, $y' := (y_1', y_2', y_3')^T$, then
$$ \vec{x} \cdot \vec{y} = x_1 y_1 + x_2 y_2 + x_3 y_3 = x_1' y_1' + x_2' y_2' + x_3' y_3' = x \cdot y = x' \cdot y'. $$

Example 2.49
(a) The vectors $v_1 := \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $v_2 := \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ -1 \end{pmatrix}$ satisfy v1 · v1 = 1, v2 · v2 = 1, and v1 · v2 = 0, so they form an orthonormal basis of R2 . What are the components of $x := \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ w.r.t. v1 , v2 ?
From x = ξ1 v1 + ξ2 v2 it follows that
$$ \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \xi_1 \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \xi_2 \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ -1 \end{pmatrix}, $$

that is,
$$ 1 = \frac{1}{\sqrt{5}} (\xi_1 + 2\xi_2), \qquad 1 = \frac{1}{\sqrt{5}} (2\xi_1 - \xi_2). $$

From these equations we obtain $\xi_1 = \frac{3}{\sqrt{5}}$ and $\xi_2 = \frac{1}{\sqrt{5}}$. Using the fact that v1 , v2 is an
orthonormal basis, we can find this result more easily:
$$ \xi_1 = v_1 \cdot x = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{3}{\sqrt{5}}, \qquad \xi_2 = v_2 \cdot x = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{5}}. $$

(b) The canonical basis e1 , . . . , en of Rn is an orthonormal basis.
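The computation of Example 2.49, part (a), can be reproduced in a few lines of Python (NumPy assumed; the code is an illustrative sketch, not part of the original text): for an orthonormal basis, the components of x are simply the scalar products vi · x.

    import numpy as np

    v1 = np.array([1.0, 2.0]) / np.sqrt(5.0)
    v2 = np.array([2.0, -1.0]) / np.sqrt(5.0)
    x  = np.array([1.0, 1.0])

    xi1 = v1 @ x            # 3 / sqrt(5)
    xi2 = v2 @ x            # 1 / sqrt(5)

    # Reconstruct x from its components w.r.t. the orthonormal basis v1, v2:
    print(np.allclose(xi1 * v1 + xi2 * v2, x))   # True
    print(xi1, xi2)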

A nontrivial vector space has infinitely many bases, and in general no basis is distinguished.
In a vector space in which a scalar product is defined (in our context, E2 , E3 , and Rn ), there is a
distinguished class of bases, namely, the orthonormal bases. In the Euclidean vector spaces E2
and E3 , there is, among the orthonormal bases, again no distinguished basis; however, the vector
space Rn has, due to its structure, a distinguished (orthonormal) basis, namely, its canonical
basis. If one chooses an orthonormal basis ~e1 , ~e2 , ~e3 in E3 and refers every vector ~x ∈ E3 to this
fixed basis, ~x = x1~e1 + x2~e2 + x3~e3 , then ~x can be identified with its representative x ∈ R3 ,
$x = (x_1, x_2, x_3)^T$; that is, E3 and R3 can be considered as the same vector space. The identification
of E3 and R3 (w.r.t. an orthonormal basis of E3 !) makes sense because

(i) for any two vectors ~x, ~y ∈ E3 with representatives x, y ∈ R3 , ~x + ~y corresponds to x + y,

(ii) for any vector ~x with representative x and any number λ ∈ R, λ~x corresponds to λx,

(iii) for any two vectors ~x and ~y with representatives x and y, ~x · ~y = x1 y1 + x2 y2 + x3 y3 = x · y.

In Chapter 1 on elementary vector algebra, we identified E3 and R3 . Moreover, if a fixed Cartesian


coordinate system (O; ~e1 , ~e2 , ~e3 ) (not necessarily right-handed) in the three-dimensional affine-
Euclidean space P3 of points is given, then every point P ∈ P3 can be identified with its position
vector $\vec{x} = \overrightarrow{OP}$. Since ~x can be identified with x, the point P can finally be identified with the
column vector $x = (x_1, x_2, x_3)^T$.
Conversely, by means of a three-dimensional coordinate system, every column vector $x \in \mathbb{R}^3$,
$x = (x_1, x_2, x_3)^T$, can be interpreted as a vector ~x ∈ E3 with components x1 , x2 , x3 or as a point
P ∈ P3 with coordinates x1 , x2 , x3 .—We emphasize that in general it neither makes sense to
identify the vector spaces E3 and R3 nor to identify E3 and P3 . We see this clearly by our
Exercises 2.26 and 2.28, for instance, and by physics: The laws of physics cannot depend on the
choice of the coordinate system, so a coordinate-free formulation of the laws is necessary. For
this reason vectorial physical quantities are described by vectors of E3 , and not of R3 . Linear
relations between vectorial physical quantities are linear transformations L : E3 → E3 in the sense
of linear algebra; in physics these are often called tensors (although in mathematics the concept
of tensor is somehow more general); w.r.t. a coordinate system, such a tensor is represented by
a 3 × 3 matrix, but one has to distinguish between a tensor and a matrix (a matrix is a trivial
concept, a tensor or a linear transformation is not).

2.8 Determinants
To motivate determinants, we consider the solution of the system

a11 x1 + a12 x2 = b1 (2.76)


a21 x1 + a22 x2 = b2 (2.77)

of two linear equations. Multiplying Eq. (2.76) by a22 and (2.77) by a12 , we obtain

a11 a22 x1 + a12 a22 x2 = a22 b1


a12 a21 x1 + a12 a22 x2 = a12 b2 .

The subtraction of the equations yields


$$ x_1 = \frac{a_{22} b_1 - a_{12} b_2}{a_{11} a_{22} - a_{12} a_{21}}, \qquad (2.78) $$
provided that the denominator is not zero. Similarly, we find
$$ x_2 = \frac{a_{11} b_2 - a_{21} b_1}{a_{11} a_{22} - a_{12} a_{21}}; \qquad (2.79) $$
the uniquely determined values of the unknowns give the unique solution $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$. If a11 a22 −
a12 a21 = 0, Eqs. (2.78) and (2.79) do not make sense. In fact, a11 a22 − a12 a21 = 0 implies that
$\frac{a_{11}}{a_{12}} = \frac{a_{21}}{a_{22}} =: \lambda$ (provided that a12 and a22 are not zero), and the matrix A of the coefficients of
the system (2.76, 2.77) reads
$$ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} \lambda a_{12} & a_{12} \\ \lambda a_{22} & a_{22} \end{pmatrix}; $$
that is, the rank of A is one, and the system of the two linear equations has no solution or
infinitely many solutions.
The number a11 a22 − a12 a21 obviously plays an important role; it is called the determinant
of the 2 × 2 matrix A (cf. Remark 1.17) and is written as
$$ \det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} := a_{11} a_{22} - a_{12} a_{21}. \qquad (2.80) $$

By means of this definition, formulas (2.78) and (2.79) can be rewritten according to
$$ x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}. \qquad (2.81) $$

We investigate the essential properties of 2 × 2 determinants. From Definition (2.80) it follows


that

(i)
$$ \det \begin{pmatrix} a + \tilde{a} & b \\ c + \tilde{c} & d \end{pmatrix} = (a + \tilde{a}) d - b (c + \tilde{c}) = ad - bc + \tilde{a} d - b \tilde{c} = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} + \det \begin{pmatrix} \tilde{a} & b \\ \tilde{c} & d \end{pmatrix} $$

(ii)
$$ \det \begin{pmatrix} \lambda a & b \\ \lambda c & d \end{pmatrix} = \lambda a d - \lambda b c = \lambda (ad - bc) = \lambda \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} $$

(iii)
$$ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc = -(bc - ad) = -\det \begin{pmatrix} b & a \\ d & c \end{pmatrix} $$

(iv)
$$ \det I_2 = \det \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1. $$

Properties (i) and (ii) say that det A is linear in the first column, property (iii) says that det A
is alternating w.r.t. the columns, and property (iv) is a normalization property. However, det A
is not linear in A; in fact, in general det(A + B) 6= det A + det B and det λA = λ2 det A.
Requiring properties (i)–(iv) for general n × n determinants, we obtain, by means of the
following theorem, a very aesthetic definition of general n × n determinants.
 
Theorem/Definition 2.50 To each n × n matrix $A = (C_1, \dots, C_n) = \begin{pmatrix} R_1 \\ \vdots \\ R_n \end{pmatrix}$, Ci denoting
the columns and Ri the rows, i = 1, . . . , n, one can assign a number det A uniquely such that

(i) det A is linear in the first column, i.e.,


$$ \det(C_1 + \tilde{C}_1, C_2, \dots, C_n) = \det(C_1, C_2, \dots, C_n) + \det(\tilde{C}_1, C_2, \dots, C_n) $$
$$ \det(\lambda C_1, C_2, \dots, C_n) = \lambda \det(C_1, C_2, \dots, C_n) $$

(ii) the interchange of two columns of A changes the sign of det A, i.e.,

det(C1 , . . . , Ci , . . . , Cj , . . . , Cn ) = − det(C1 , . . . , Cj , . . . , Ci , . . . , Cn )

(iii) det In = 1.

The number
$$ \det A = \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix} $$
is called the determinant of A.

We do not prove this theorem and the statements on determinants we present now; however,
we indicate some reasons why the rules are as they are.

Statements, Properties, and Rules

1. It is not very hard to show that an association A 7→ det A satisfying the conditions (i)–(iii)
stated in the theorem, is necessarily given by
$$ \det A = \sum_{i_1, \dots, i_n = 1}^{n} \epsilon(i_1, \dots, i_n)\, a_{i_1 1} a_{i_2 2} \dots a_{i_n n} \qquad (2.82) $$

where
$$ \epsilon(i_1, \dots, i_n) := \begin{cases} 0 & \text{if any two of the indices } i_1, \dots, i_n \text{ are equal} \\ 1 & \text{if } i_1, \dots, i_n \text{ is an even permutation of } 1, \dots, n \\ -1 & \text{if } i_1, \dots, i_n \text{ is an odd permutation of } 1, \dots, n. \end{cases} $$
Conversely, from (2.82) follow the conditions (i)–(iii) stated in the theorem, thus proving
the existence of an association A 7→ det A satisfying these conditions. Hence, formula
(2.82) is equivalent to the statement of Theorem 2.50 and is often used as the definition
of an n × n determinant.
An arrangement i1 , . . . , in of the numbers 1, . . . , n is called an even (odd) permutation of
1, . . . , n if i1 , . . . , in can be obtained from 1, . . . , n by interchanging two numbers an even
(odd) number of times. As an example, we calculate a 3 × 3 determinant. From (2.82)
and the table
i1 i2 i3 ²(i1 , i2 , i3 )
1 2 3 1
1 3 2 −1
2 1 3 −1
2 3 1 1
3 1 2 1
3 2 1 −1

it follows that
$$ \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} a_{22} a_{33} - a_{11} a_{32} a_{23} - a_{21} a_{12} a_{33} + a_{21} a_{32} a_{13} + a_{31} a_{12} a_{23} - a_{31} a_{22} a_{13}. \qquad (2.83) $$

Formula (2.80) for 2 × 2 determinants is of course the particular case of (2.82) for n =
2. One can use (2.82) to calculate any n × n determinant, but for n > 3 this requires
much work. There are more suitable methods which are consequences of (2.82) and are
mentioned below. The main meaning of (2.82) is that it is the key to prove many results
on determinants.
2. The determinant of a matrix is also given by
$$ \det A = \sum_{i_1, \dots, i_n = 1}^{n} \epsilon(i_1, \dots, i_n)\, a_{1 i_1} a_{2 i_2} \dots a_{n i_n}. \qquad (2.84) $$

The comparison of (2.82) and (2.84) shows that

det A = det AT , (2.85)

i.e., the determinant of a matrix does not change under transposition.


Applying (2.84) to a 3 × 3 determinant, we obtain
$$ \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31} \qquad (2.86) $$

which in fact coincides with the result (2.83).

3. The following so-called Sarrus’ rule applies only to the determinants of 3 × 3 matrices.
Copy the first column of the matrix as a fourth column and the second as a fifth, as written
in (2.87). Multiply the three entries in each diagonal running from upper left to lower right (↘) and add the three
products, then multiply the three entries in each diagonal running from lower left to upper right (↗) and subtract
these three products from the sum calculated first; the result is the 3 × 3 determinant, as
one sees by comparison with (2.86).

$$ \begin{array}{ccc|cc} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} \end{array} \qquad (2.87) $$

4. The association A 7→ det A is linear w.r.t. every column or row of A. The linearity w.r.t.
every column follows from statements (i) and (ii) of Theorem 2.50, the linearity w.r.t. the
rows is then a consequence of (2.85).

5. The association A 7→ det A is alternating w.r.t. the columns as well as w.r.t. the rows of
A, i.e., the interchange of any two columns or any two rows (but not the interchange of a
column with a row) changes the sign of det A.

6. For an n × n matrix, the following statements are equivalent:

(i) det A = 0
(ii) the columns of A are linearly dependent
(iii) the rows of A are linearly dependent.

We prove the equivalence of the three statements and show first that (ii) implies (i). If
the columns C1 , . . . , Cn are linearly dependent, then the equation $\sum_{i=1}^{n} \lambda_i C_i = 0$ can be
satisfied for a nontrivial choice of the coefficients. Without loss of generality, assume
λ1 6= 0. Then
$$ C_1 = -\frac{1}{\lambda_1} \sum_{i=2}^{n} \lambda_i C_i = \sum_{i=2}^{n} \left( -\frac{\lambda_i}{\lambda_1} \right) C_i; $$
therefore, using the linearity of det A in the first column,
$$ \det A = \det(C_1, \dots, C_n) = \det\!\left( \sum_{i=2}^{n} \left( -\frac{\lambda_i}{\lambda_1} \right) C_i,\ C_2, \dots, C_n \right) = \sum_{i=2}^{n} \left( -\frac{\lambda_i}{\lambda_1} \right) \det(C_i, C_2, \dots, C_i, \dots, C_n) = 0. $$

The last step in this equality chain follows from the fact that the determinant of a matrix is
zero if two columns coincide. This is a consequence of the alternating property; we have,
for instance,

det(Ci , C2 , . . . , Ci , . . . , Cn ) = − det(Ci , C2 , . . . , Ci , . . . , Cn )

where the first and the i-th column have been interchanged, both being equal to Ci . So
det(Ci , C2 , . . . , Ci , . . . , Cn ) = 0.
Now let det A = 0 and assume that the columns C1 , . . . , Cn are linearly independent.
Then the columns C1 , . . . , Cn constitute a basis of Rn and the columns D1 , . . . , Dn of

any other matrix B are linear combinations of the Ci . By linearity, det B is a linear
combination of the determinants of the matrices (Ci1 , . . . , Cin ); by the alternating property,
the determinant of each (Ci1 , . . . , Cin ) is equal to det A, − det A, or 0. Since det A = 0,
the determinant of all matrices (Ci1 , . . . , Cin ) is zero; consequently, det B = 0 for every matrix B, which
is a contradiction since, e.g., det In = 1. Hence, the columns C1 , . . . , Cn are linearly
dependent.
By means of (2.85), the equivalence of statements (i) and (ii) implies the equivalence of
(i) and (iii).

7. For an n × n matrix, the following statements are equivalent:

(i) det A 6= 0
(ii) rank A = n
(iii) A−1 exists
(iv) Ax = 0 has only the trivial solution
(v) Ax = b has a unique solution for every b ∈ Rn .

According to point 6, det A 6= 0 is equivalent to the linear independence of the columns of
the matrix A, i.e., equivalent to rank A = n. The rest follows essentially from Theorem
2.39.

8. The determinant of a matrix is not changed by adding a multiple of any column to any
other column or a multiple of any row to any other row. For instance,
$$ \det A = \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \lambda a_{11} & a_{13} & \dots & a_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \lambda a_{n1} & a_{n3} & \dots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} + \lambda a_{11} & a_{13} & \dots & a_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} + \lambda a_{n1} & a_{n3} & \dots & a_{nn} \end{vmatrix}; $$

in fact, what has been added to det A is λ det(C1 , C1 , C3 , . . . , Cn ) = 0.

9. From (2.86) we obtain


$$ \det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} (a_{22} a_{33} - a_{23} a_{32}) - a_{12} (a_{21} a_{33} - a_{23} a_{31}) + a_{13} (a_{21} a_{32} - a_{22} a_{31}) $$
$$ = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} $$

(cf. Eq. (1.18)). The 3 × 3 determinant det A has been expanded w.r.t. the first row. This
is a particular case of Laplace’s expansion theorem, stating that, for an n × n matrix,
$$ \det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}, \qquad i = 1, \dots, n, \qquad (2.88) $$
$$ \det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}, \qquad j = 1, \dots, n, \qquad (2.89) $$

where Aij is the (n − 1) × (n − 1) matrix obtained from A by deleting the i-th row and the
j-th column. Choosing any i and keeping it fixed, formula (2.88) can be used to reduce the

calculation of det A to the calculation of n (n − 1) × (n − 1) determinants; det A is expanded
w.r.t. the i-th row. According to (2.89), det A can also be expanded w.r.t. the j-th column.
The number det Aij is called the subdeterminant w.r.t. (i, j); the sign (−1)i+j in front of aij det Aij is given
by the chess-board rule.
An upper (lower) triangular matrix is a matrix where all entries below (above) the diagonal
are zero. The determinant of a triangular matrix is the product of the diagonal entries,
as we show for an upper triangular matrix. Namely, expanding w.r.t. the first column, we
obtain
$$ \begin{vmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ 0 & a_{22} & a_{23} & \dots & a_{2n} \\ 0 & 0 & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & a_{nn} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} & \dots & a_{2n} \\ 0 & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & a_{nn} \end{vmatrix} = a_{11} a_{22} \begin{vmatrix} a_{33} & \dots & a_{3n} \\ \vdots & & \vdots \\ 0 & \dots & a_{nn} \end{vmatrix} = \dots = a_{11} a_{22} a_{33} \dots a_{nn}. $$
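The Laplace expansion (2.88) translates directly into a small recursive routine. The following Python sketch (NumPy assumed; written purely for illustration, with no attempt at efficiency, which for large n is indeed poor compared with elimination methods) expands a determinant w.r.t. the first row.

    import numpy as np

    def det_laplace(A):
        # Recursive cofactor expansion along the first row (formula (2.88) with i = 1).
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # the submatrix A_{1j}
            total += (-1) ** j * A[0, j] * det_laplace(minor)
        return total

    A = [[1, 2, 3], [0, -3, -6], [-1, 2, 0]]
    print(det_laplace(A), np.linalg.det(A))   # both approximately 15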

10. For the calculation of 3 × 3 determinants, the expansion in terms of 2 × 2 determinants is


suitable. For the numerical calculation of larger determinants, again Gauß elimination is
useful:
$$ \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ 0 & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{n2} & \dots & b_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ 0 & b_{22} & b_{23} & \dots & b_{2n} \\ 0 & 0 & c_{33} & \dots & c_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & x_{nn} \end{vmatrix} = a_{11} b_{22} c_{33} \dots x_{nn}. $$

11. The determinant-multiplication theorem states that

det AB = det A det B

where A and B are n × n matrices.

12. If the inverse of an n×n matrix A exists, then it follows from the determinant-multiplication
theorem that det A det A−1 = det AA−1 = det In = 1. In consequence, det A 6= 0 and
$$ \det A^{-1} = \frac{1}{\det A}. $$
It is already clear by point 7 that the existence of A−1 implies that det A 6= 0; in addition,
point 7 says that det A 6= 0 is also sufficient for the existence of A−1 .

13. If det A 6= 0, then, again according to point 7, the system Ax = b of linear equations
has a unique −1
 solution, namely, x = A b. Cramer’s rule now states that the solution
x1
x= ..
.
 is given by
xn
$$ x_i = \frac{\begin{vmatrix} a_{11} & \dots & a_{1,i-1} & b_1 & a_{1,i+1} & \dots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \dots & a_{n,i-1} & b_n & a_{n,i+1} & \dots & a_{nn} \end{vmatrix}}{\det A}, \qquad i = 1, \dots, n; $$
(2.81) is a particular case of this rule. Cramer’s rule is mainly of theoretical interest since,
beginning with n ≥ 3, its application to solving systems of linear equations requires too

many steps of calculation; besides this, it applies only to systems of n linear equations in
n unknowns with a unique solution. Gauß-Jordan elimination requires many fewer steps
of calculation and applies to every system of linear equations.
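As a small check of Cramer's rule (and of the warning that it is mostly of theoretical interest), the following Python sketch (NumPy assumed; the 2 × 2 system is a made-up example) computes each xi as a quotient of determinants and compares the result with a standard solver.

    import numpy as np

    def cramer(A, b):
        # Cramer's rule: x_i = det(A with column i replaced by b) / det A.
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    b = np.array([3.0, 5.0])
    print(cramer(A, b))                                       # [0.8, 1.4]
    print(np.allclose(cramer(A, b), np.linalg.solve(A, b)))   # True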

14. The inverse of a matrix can also be represented in terms of determinants. If det A 6= 0,
then
$$ A^{-1} = \text{transpose of the matrix with the entries } (-1)^{i+j}\, \frac{\det A_{ij}}{\det A} \qquad (2.90) $$
where det Aij is the subdeterminant w.r.t. (i, j); (−1)i+j det Aij is called the cofactor w.r.t.
(i, j). The representation (2.90) is closely related to Cramer’s rule, and it is also mainly of
theoretical interest. Already for n = 3, the calculation of the inverse of an n × n matrix
according to (2.90) is tedious; the application of Gauß-Jordan elimination is again much
more suitable for numerical purposes. The case n = 2 is simple; in fact, we find
$$ A^{-1} = \frac{1}{\det A} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}. $$

15. Finally, determinants also have a geometrical meaning. Namely, if ~a, ~b, ~c is a right-handed
system of vectors of E3 , ~e1 , ~e2 , ~e3 a right-handed orthonormal basis of E3 , and

~a = a1~e1 + a2~e2 + a3~e3


~b = b1~e1 + b2~e2 + b3~e3
~c = c1~e1 + c2~e2 + c3~e3 ,

then, according to Remark 1.19,
$$ \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} $$

is the volume of the parallelepiped spanned by the vectors ~a, ~b, and ~c (cf. also Remark
1.17). By definition, an n × n determinant is, up to the sign, the volume of an n-dimensional
parallelotope.

2.9 Eigenvalue Problems


We finish our study of linear algebra with the so-called eigenvalue problems of linear transforma-
tions which play an important part in fields as different as geometry, differential equations,
mechanics, and quantum mechanics.

Definition 2.51 Let V be a real vector space and L : V → V, x 7→ y = L(x), be a linear


transformation. A number λ ∈ R is called an eigenvalue of L if there exists a vector u 6= 0 such
that
L(u) = λu. (2.91)
The vector u 6= 0 is called an eigenvector belonging to the eigenvalue λ. The set Sλ of all
eigenvectors belonging to λ together with the zero vector, obviously being a subspace of V, is
called the eigenspace belonging to λ.
Now let dim V = n and let Y = AX be the matrix representation of y = L(x) w.r.t. a basis
of V. In matrix form the eigenvalue equation (2.91) reads

AU = λU (2.92)

where λ is also called an eigenvalue of A and U ∈ Rn , U 6= 0, an eigenvector of A.

Without the requirement u 6= 0, every real number λ would be an eigenvalue since Eq. (2.91)
is always satisfied for u = 0. So, for an eigenvalue λ, it makes sense to call only the nontrivial
solutions u of (2.91) eigenvectors. However, the number 0 can be an eigenvalue; λ = 0 is an
eigenvalue if there exists a vector u 6= 0 such that L(u) = 0u, i.e., if

L(u) = 0. (2.93)

By definition, the eigenspace Sλ belonging to an eigenvalue λ consists of all corresponding


eigenvectors together with the zero vector u = 0, because without the zero vector, Sλ would not be a subspace. We quickly
show that Sλ is a subspace; we have to verify the conditions of Definition 2.7. Since Sλ contains
an eigenvector as well as the zero vector, Sλ is not empty. If u1 , u2 ∈ Sλ , then L(u1 ) = λu1 and
L(u2 ) = λu2 . From these two equations it follows that

L(u1 + u2 ) = L(u1 ) + L(u2 ) = λu1 + λu2 = λ(u1 + u2 ),

i.e., u1 + u2 ∈ Sλ . Similarly, if µ ∈ R and u ∈ Sλ , then

L(µu) = µL(u) = µλu = λ(µu),

i.e., µu ∈ Sλ . Hence, Sλ is a subspace of V.—If λ = 0 is an eigenvalue of L, then the vectors of


the corresponding eigenspace satisfy Eq. (2.93), which means that the eigenspace coincides with
Ker L (cf. Definition 2.31).

Example 2.52

(a) Let L : R3 → R3 be the linear transformation defined by the matrix


 
$$ A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & 2 & 4 \end{pmatrix} $$

according to L(x) := Ax. The eigenvalue problem of L, L(u) = λu, reads

Au = λu (2.94)

(compared with the general situation and Eq. (2.92), we have in this example u = U ). Eq.
(2.94) can be rewritten as
(A − λI)u = 0. (2.95)
The number λ is an eigenvalue if this equation has nontrivial solutions u 6= 0. According
to Theorem 2.39 and statement 7 of Section 2.8, (2.95) has nontrivial solutions if and only
if the matrix A − λI is not invertible, i.e., if and only if det(A − λI) = 0. Hence, we can
find the eigenvalues as the solutions of the equation det(A − λI) = 0. For the given matrix
we obtain
$$ \det(A - \lambda I) = \begin{vmatrix} 2-\lambda & 1 & 0 \\ 0 & 1-\lambda & -1 \\ 0 & 2 & 4-\lambda \end{vmatrix} = (2-\lambda) \begin{vmatrix} 1-\lambda & -1 \\ 2 & 4-\lambda \end{vmatrix} = (2-\lambda)\big((1-\lambda)(4-\lambda) + 2\big) = (2-\lambda)(\lambda^2 - 5\lambda + 6) = 0. $$

The eigenvalues are the roots of a cubic equation, resp., the zeros of the cubic polynomial
p given by p(λ) := (2 − λ)(λ2 − 5λ + 6). The zeros are λ1 = 2 and λ2 = 3; corresponding
to p(λ) = −(λ − 2)2 (λ − 3), λ1 = 2 is a twofold zero.

For each of the two eigenvalues we calculate the corresponding eigenvectors according to
Eq. (2.94) or (2.95). For λ1 , (2.95) reads (A − 2I)u = 0; that is,
  
$$ \begin{pmatrix} 0 & 1 & 0 \\ 0 & -1 & -1 \\ 0 & 2 & 2 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix} = 0 \qquad (2.96) $$
where $u = (\xi_1, \xi_2, \xi_3)^T$. Eq. (2.96) is equivalent to

ξ2 = 0
−ξ2 − ξ3 = 0 (2.97)
2ξ2 + 2ξ3 = 0.

This system of three equations reduces to a system of two equations, which is not an
accident but related to the fact that (2.96) must have nontrivial solutions. From Eqs. (2.97)
we obtain ξ2 = 0, ξ3 = 0, and ξ1 = t where t is a parameter. Hence, $u = \begin{pmatrix} t \\ 0 \\ 0 \end{pmatrix} = t \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$,
and the eigenspace corresponding to λ1 is
$$ S_1 = \left\{ u \in \mathbb{R}^3 \;\middle|\; u = t \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\ t \in \mathbb{R} \right\}; $$

S1 is one-dimensional although λ1 is a twofold zero of the above polynomial p.


For λ2 , (2.95) reads (A − 3I)u = 0; that is,
  
$$ \begin{pmatrix} -1 & 1 & 0 \\ 0 & -2 & -1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix} = 0, $$
resp.,

−ξ1 + ξ2 = 0
−2ξ2 − ξ3 = 0
2ξ2 + ξ3 = 0.
Setting ξ3 = t, we obtain $\xi_2 = -\frac{t}{2}$ and $\xi_1 = -\frac{t}{2}$. Hence,
$$ u = \begin{pmatrix} -\tfrac{t}{2} \\ -\tfrac{t}{2} \\ t \end{pmatrix} = t \begin{pmatrix} -\tfrac12 \\ -\tfrac12 \\ 1 \end{pmatrix} = -\frac{t}{2} \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix} = s \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}, $$
and the eigenspace corresponding to λ2 is
$$ S_2 = \left\{ u \in \mathbb{R}^3 \;\middle|\; u = s \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix},\ s \in \mathbb{R} \right\}. $$
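The eigenvalue problem of Example 2.52, part (a), can also be checked with a numerical eigensolver. The following Python sketch (NumPy assumed; not part of the original text) confirms the eigenvalues 2 (a twofold zero of the characteristic polynomial) and 3, and that only one independent eigenvector belongs to λ = 2.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [0.0, 2.0, 4.0]])

    vals, vecs = np.linalg.eig(A)
    print(np.round(np.sort(vals.real), 6))    # approximately [2. 2. 3.]

    # Eigenspace of lambda = 2: dim Ker (A - 2I) = 3 - rank(A - 2I) = 1,
    # so there is no basis of R^3 consisting of eigenvectors of A.
    print(3 - np.linalg.matrix_rank(A - 2.0 * np.eye(3)))   # 1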

(b) Consider the projection map L : E3 → E3 of Example 2.24, part (b), L(~x) = (~a · ~x)~a where
~a is a unit vector. We solve the eigenvalue problem of L. First, let ~u ∈ E3 be any vector
satisfying ~a · ~u = 0, ~u 6= ~0. It follows that

L(~u) = (~a · ~u)~a = ~0 = 0~u,

i.e., L(~u) = 0~u. Hence, λ = λ1 = 0 is an eigenvalue and ~u an eigenvector. The correspond-


ing eigenspace is
S1 = {~u ∈ E3 | ~a · ~u = 0}. (2.98)

Assume now that there is an eigenvector ~u satisfying ~a · ~u 6= 0. The eigenvalue equation
L(~u) = λ~u reads
(~a · ~u)~a = λ~u. (2.99)
The dot multiplication of (2.99) by ~a yields

(~a · ~u)|~a|2 = λ~u · ~a

which implies, since ~a is a unit vector, λ = λ2 = 1. From (2.99) and λ = 1 we obtain


~u = (~a · ~u)~a, i.e., ~u is a multiple of ~a. Conversely, every vector ~u = t~a where t ∈ R, t 6= 0,
satisfies
L(~u) = L(t~a) = (~a · t~a)~a = t~a = ~u,
i.e., L(~u) = ~u. Hence, ~u is an eigenvector to the eigenvalue λ2 = 1. The corresponding
eigenspace is
S2 = {~u ∈ E3 | ~u = t~a, t ∈ R}. (2.100)
Note that this result is geometrically evident: The vectors that have the same or opposite
direction as ~a are not changed by the projection, thus being eigenvectors to the eigenvalue
1; the vectors that are perpendicular to ~a are annihilated, thus being eigenvectors to the
eigenvalue 0. It is also evident that the subspace S1 is one-dimensional and the subspace
S2 is two-dimensional. Moreover, we have that S1 = Ker L and S2 = Im L; the first is a
general property of every linear transformation with eigenvalue 0, whereas the second is a
particular property of this example.
Although the solution of the eigenvalue problem of the considered projection map is very
obvious, it is instructive to solve the eigenvalue problem in matrix representation. The
matrix representation of our projection map L was discussed in Example 2.26, part (b).
With reference to an orthonormal basis, the matrix of L is
$$ A = \begin{pmatrix} a_1^2 & a_1 a_2 & a_1 a_3 \\ a_1 a_2 & a_2^2 & a_2 a_3 \\ a_1 a_3 & a_2 a_3 & a_3^2 \end{pmatrix} \qquad (2.101) $$

where a1 , a2 , and a3 are the components of the unit vector ~a, ~a = a1~e1 + a2~e2 + a3~e3 (cf.
Eq. (2.33)). To determine the eigenvalues of this matrix, we again rewrite Eq. (2.92) as
(A − λI)U = 0 and use the fact that the latter equation has nontrivial solutions if and
only if det(A − λI) = 0. We obtain that
$$ \det(A - \lambda I) = \begin{vmatrix} a_1^2 - \lambda & a_1 a_2 & a_1 a_3 \\ a_1 a_2 & a_2^2 - \lambda & a_2 a_3 \\ a_1 a_3 & a_2 a_3 & a_3^2 - \lambda \end{vmatrix} $$
$$ = (a_1^2 - \lambda)\big[(a_2^2 - \lambda)(a_3^2 - \lambda) - a_2^2 a_3^2\big] - a_1 a_2 \big(a_1 a_2 (a_3^2 - \lambda) - a_1 a_2 a_3^2\big) + a_1 a_3 \big(a_1 a_2^2 a_3 - a_1 a_3 (a_2^2 - \lambda)\big) $$
$$ = (a_1^2 - \lambda)\big(\lambda^2 - (a_2^2 + a_3^2)\lambda\big) + a_1^2 a_2^2 \lambda + a_1^2 a_3^2 \lambda = a_1^2 \lambda^2 - \lambda^3 - a_1^2 (a_2^2 + a_3^2)\lambda + (a_2^2 + a_3^2)\lambda^2 + a_1^2 (a_2^2 + a_3^2)\lambda = (a_1^2 + a_2^2 + a_3^2)\lambda^2 - \lambda^3 $$

and, since ~a is a unit vector,

p(λ) := det(A − λI) = λ2 − λ3 = λ2 (1 − λ).

From p(λ) = det(A − λI) = 0 it follows that λ = λ1 = 0 and λ = λ2 = 1 where λ1 = 0 is


a twofold zero of the cubic polynomial p. The zeros of p are the eigenvalues of the matrix
A and thus of the linear transformation L.

The eigenvectors of the matrix A belonging to the eigenvalue λ1 = 0 are the nontrivial
solutions of AU = 0 which can explicitly be written as
$$ \begin{aligned} a_1^2 \xi_1 + a_1 a_2 \xi_2 + a_1 a_3 \xi_3 &= 0 \\ a_1 a_2 \xi_1 + a_2^2 \xi_2 + a_2 a_3 \xi_3 &= 0 \\ a_1 a_3 \xi_1 + a_2 a_3 \xi_2 + a_3^2 \xi_3 &= 0 \end{aligned} \qquad (2.102) $$
where $U = (\xi_1, \xi_2, \xi_3)^T$ and ~u = ξ1~e1 + ξ2~e2 + ξ3~e3 . Since ~a is a unit vector, at least one of its
components is not zero; without loss of generality, let us assume that a1 6= 0. Dividing the
first equation of (2.102) by a1 , we obtain
a1 ξ1 + a2 ξ2 + a3 ξ3 = 0. (2.103)
The other two equations of (2.102) are equivalent to (2.103), as the multiplication of the
latter by a2 , resp., a3 shows. Eq. (2.103) means ~a · ~u = 0 which implies our former result
(2.98).—Solving (2.103) for ξ1 and setting ξ2 = s and ξ3 = t, we obtain
$$ U = \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix} = \begin{pmatrix} -\frac{a_2}{a_1} s - \frac{a_3}{a_1} t \\ s \\ t \end{pmatrix} = s \begin{pmatrix} -\frac{a_2}{a_1} \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -\frac{a_3}{a_1} \\ 0 \\ 1 \end{pmatrix}, $$
i.e., $\vec{u} = s\left(-\frac{a_2}{a_1}\vec{e}_1 + \vec{e}_2\right) + t\left(-\frac{a_3}{a_1}\vec{e}_1 + \vec{e}_3\right) = s \vec{v}_1 + t \vec{v}_2$ where $\vec{v}_1 := -\frac{a_2}{a_1}\vec{e}_1 + \vec{e}_2$ and $\vec{v}_2 := -\frac{a_3}{a_1}\vec{e}_1 + \vec{e}_3$. One easily verifies that ~a · ~v1 = 0 and ~a · ~v2 = 0, from which it follows again
that ~a ·~u = 0; the two linearly independent vectors ~v1 and ~v2 form a basis in the eigenspace
S1 .
The eigenvectors of the matrix A belonging to the eigenvalue λ2 = 1 are the nontrivial
solutions of (A − I)U = 0 which reads explicitly
$$ \begin{aligned} (a_1^2 - 1)\xi_1 + a_1 a_2 \xi_2 + a_1 a_3 \xi_3 &= 0 \\ a_1 a_2 \xi_1 + (a_2^2 - 1)\xi_2 + a_2 a_3 \xi_3 &= 0 \\ a_1 a_3 \xi_1 + a_2 a_3 \xi_2 + (a_3^2 - 1)\xi_3 &= 0. \end{aligned} \qquad (2.104) $$
Taking account of $a_1^2 + a_2^2 + a_3^2 = 1$, one can verify that the system (2.104) reduces
to a system of two equations with the solution $U = (\xi_1, \xi_2, \xi_3)^T = t\,(a_1, a_2, a_3)^T$. Hence, ~u =
t(a1~e1 + a2~e2 + a3~e3 ) = t~a, which is our former result (2.100).
The matrix solution of the eigenvalue problem of the projection map L simplifies essentially
by the choice of an orthonormal basis that is adapted to the situation. Choosing ~e1 , ~e2 , ~e3
such that, for instance, ~e1 = ~a, we obtain a1 = 1 and a2 = a3 = 0. According to (2.101),
the matrix A then takes the simple form
$$ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. $$

This immediately implies that $p(\lambda) = \det(A - \lambda I) = \lambda^2 (1 - \lambda)$ and that the eigenvalues
are λ1 = 0 and λ2 = 1. The eigenvectors of A belonging to λ1 are $U = (0, s, t)^T = s\,(0,1,0)^T + t\,(0,0,1)^T$, which means, for the eigenvectors of L, ~u = s~e2 + t~e3 , the latter implying
again ~a · ~u = ~e1 · ~u = 0 and hence (2.98). The eigenvectors of A belonging to λ2 are
$U = (t, 0, 0)^T = t\,(1,0,0)^T$, which means, for the eigenvectors of L, ~u = t~e1 = t~a, the latter
implying again (2.100).
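A numerical counterpart of Example 2.52, part (b): for a unit vector a, the projection matrix (2.101) is just the outer product a a^T, and its eigenvalues are 0 (twice) and 1. A small Python sketch (NumPy assumed; the particular vector a is an arbitrary choice for illustration):

    import numpy as np

    a = np.array([2.0, -1.0, 2.0])
    a = a / np.linalg.norm(a)         # unit vector

    A = np.outer(a, a)                # matrix (2.101) of the projection (a . x) a

    vals, vecs = np.linalg.eigh(A)    # A is symmetric, so eigh is appropriate
    print(np.round(vals, 6))          # [0. 0. 1.]

    # The eigenvector for eigenvalue 1 is a multiple of a itself:
    print(np.allclose(np.abs(vecs[:, 2]), np.abs(a)))   # True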

We now summarize the basic statements on eigenvalue problems.

Theorem 2.53 Let L : V → V be a linear transformation and A a representing n × n matrix.


(a) The eigenvalues of L (resp., of A) are the solutions of the characteristic equation

p(λ) := det(A − λI) = 0

where p is a polynomial of degree n, the characteristic polynomial.

(b) There are at most n eigenvalues of L (resp., of A) and possibly no (real) eigenvalue. If n
is odd, then there exists at least one (real) eigenvalue.

(c) Let λ1 , . . . , λm be the different eigenvalues of L and S1 , . . . , Sm the corresponding eigen-


spaces. Eigenvectors u1 , . . . , um belonging to these different eigenvalues (i.e., u1 ∈ S1 , . . . ,
um ∈ Sm , u1 , . . . , um 6= 0) are linearly independent. Moreover,

dim S1 + . . . + dim Sm ≤ dim V = n.

Proof:
(a) For n = 2, we have
$$ p(\lambda) = \det(A - \lambda I) = \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21} $$

where p is a polynomial of degree 2. For n = 3, we obtain


$$ p(\lambda) = \det(A - \lambda I) = \begin{vmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ a_{21} & a_{22} - \lambda & a_{23} \\ a_{31} & a_{32} & a_{33} - \lambda \end{vmatrix} = (a_{11} - \lambda) \begin{vmatrix} a_{22} - \lambda & a_{23} \\ a_{32} & a_{33} - \lambda \end{vmatrix} - a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} - \lambda \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} - \lambda & a_{23} \end{vmatrix} $$

where p is obviously a polynomial of degree 3. By induction one can show that, for an
n × n matrix, λ 7→ p(λ) is a polynomial of degree n.
An eigenvector U of the matrix A is a nontrivial solution of the equation AU = λU ,
resp., of the homogeneous linear system (A − λI)U = 0. According to Theorem 2.39
and statement 7 of Section 2.8, the latter system has nontrivial solutions if and only if
rank (A − λI) < n, i.e., if and only if det(A − λI) = 0. Hence, the eigenvalues are the zeros
of the characteristic polynomial p.

(b) Since a polynomial of degree n has at most n zeros and possibly no real zero, an n × n
matrix A can have at most n eigenvalues and possibly none. However, a real polynomial
of odd degree has at least one real zero, so A has at least one eigenvalue if n is odd.

(c) Consider two different eigenvalues λ1 , λ2 with two corresponding eigenvectors u1 , u2 ,

L(u1 ) = λ1 u1
(2.105)
L(u2 ) = λ2 u2 ,

u1 , u2 6= 0. Let
µ1 u1 + µ2 u2 = 0 (2.106)
where µ1 , µ2 ∈ R. Applying the linear transformation L to both sides of (2.106), we obtain
µ1 L(u1 ) + µ2 L(u2 ) = 0; i.e., by (2.105),

µ1 λ1 u1 + µ2 λ2 u2 = 0. (2.107)

Multiplying Eq. (2.106) by λ1 , we obtain

λ1 µ1 u1 + λ1 µ2 u2 = 0. (2.108)

The subtraction of Eq. (2.108) from (2.107) yields (λ1 − λ2 )µ2 u2 = 0. Since λ1 6= λ2 and
u2 6= 0, it follows that µ2 = 0 and, by (2.106) and u1 6= 0, µ1 = 0. Hence, u1 and u2 are
linearly independent. By induction, this procedure can be generalized to m > 2, i.e., to
the case of more than two eigenvalues.
To prove the statement on the dimensions, choose a basis in each eigenspace:
$$ \begin{aligned} \text{basis of } S_1:&\quad u_1^{(1)}, \dots, u_1^{(n_1)} \\ \text{basis of } S_2:&\quad u_2^{(1)}, \dots, u_2^{(n_2)} \\ &\ \ \vdots \\ \text{basis of } S_m:&\quad u_m^{(1)}, \dots, u_m^{(n_m)}. \end{aligned} $$

We show that the system of the vectors


$$ u_1^{(1)}, \dots, u_1^{(n_1)},\ u_2^{(1)}, \dots, u_2^{(n_2)},\ \dots,\ u_m^{(1)}, \dots, u_m^{(n_m)} \qquad (2.109) $$

is linearly independent. Let


$$ \sum_{i=1}^{m} \sum_{j=1}^{n_i} \lambda_i^{(j)} u_i^{(j)} = 0. \qquad (2.110) $$

Defining
$$ w_i := \sum_{j=1}^{n_i} \lambda_i^{(j)} u_i^{(j)}, \qquad (2.111) $$

Eq. (2.110) can be written as

w1 + . . . + wi + . . . + wm = 0. (2.112)
Note that wi ∈ Si for i = 1, . . . , m since $u_i^{(j)} \in S_i$ and Si is a subspace. Assume that not
all wi are zero; denote the non-zero vectors of (2.112) by wi1 , . . . , wir . Eq. (2.112) then
implies that
wi1 + . . . + wir = 0. (2.113)
The vectors wij ∈ Sij , wij 6= 0, j = 1, . . . , r, are eigenvectors of L. According to Eq.
(2.113), these vectors are linearly dependent (each wij has the coefficient 1), which is a
contradiction. Hence, our assumption is wrong and all vectors w1 , . . . , wm in (2.112) are
zero. From (2.111) it then follows that
$$ \sum_{j=1}^{n_i} \lambda_i^{(j)} u_i^{(j)} = 0 $$

for all i = 1, . . . , m. Since, for each i, the vectors $u_i^{(1)}, \dots, u_i^{(n_i)}$ are linearly independent,
we conclude that $\lambda_i^{(j)} = 0$ for all j = 1, . . . , ni and all i = 1, . . . , m. Hence, by (2.110),
the system (2.109) is linearly independent.—The number of the vectors of the linearly
independent system (2.109) is n1 + . . . + nm . In consequence,

n1 + . . . + nm ≤ dim V = n,

and from n1 = dim S1 , . . . , nm = dim Sm we finally obtain the statement on the dimensions
of the eigenspaces. 2

Finally, we summarize some consequences of parts (b) and (c) of the preceding theorem.
One can distinguish the following three cases.
Case 1: The linear transformation L has n (different) eigenvalues λ1 , . . . , λn (n = dim V).
Then
(i) corresponding eigenvectors u1 , . . . , un are linearly independent (according to part (c) of
Theorem 2.53)

(ii) each eigenspace is one-dimensional (by the dimension statement of part (c) of the theorem)

(iii) u1 , . . . , un is a basis of V consisting of eigenvectors of L (since dim V = n).


Case 2: The linear map L has fewer than n eigenvalues, namely, λ1 , . . . , λm (m < n) with
eigenspaces S1 , . . . , Sm , but

dim S1 + . . . + dim Sm = n = dim V.

Then, as in the proof of the theorem, we can choose respective bases of S1 , . . . , Sm and join them,
thus obtaining a system of the type (2.109) of linearly independent vectors. Since n1 +. . .+nm =
n, this system is a basis of V consisting of eigenvectors of L.
Case 3: The map L has fewer than n eigenvalues, λ1 , . . . , λm (m < n) with eigenspaces
S1 , . . . , Sm and
dim S1 + . . . + dim Sm < n = dim V.
Then there is no basis of V consisting entirely of eigenvectors.
In the first two cases, we can draw some further important conclusions. For simplicity of
formulation, we consider only Case 1. That is,

L(ui ) = λi ui , ui 6= 0, i = 1, . . . , n.

Let v1 , . . . , vn be any basis of V; u1 , . . . , un is a basis of eigenvectors. For every x ∈ V and every


y := L(x), we have
$$ x = \sum_{i=1}^{n} \xi_i v_i = \sum_{i=1}^{n} \alpha_i u_i, \qquad y = \sum_{i=1}^{n} \eta_i v_i = \sum_{i=1}^{n} \beta_i u_i $$

and        
η1 ξ1 β1 α1
 ..   ..  ,  ..  0 .. 
 .  = A .   . =A  .  (2.114)
ηn ξn βn αn
where A is the matrix of L w.r.t. the basis v1 , . . . , vn and A0 the matrix of L w.r.t. the basis
u1 , . . . , un . By Exercise 2.25, we know that these matrices are related according to

A0 = BAB −1

where B is the matrix of the basis transformation. However, we can determine A0 more easily.
Using the eigenvector basis u1 , . . . , un , we obtain
$$ y = L(x) = L\!\left( \sum_{i=1}^{n} \alpha_i u_i \right) = \sum_{i=1}^{n} \alpha_i L(u_i) = \sum_{i=1}^{n} \alpha_i \lambda_i u_i. $$
Comparing the last expression for y with $y = \sum_{i=1}^{n} \beta_i u_i$, we conclude that

βi = λi αi

where i = 1, . . . , n. Since the numbers βi and αi are related by the matrix A0 according to
(2.114), it follows that  
$$ A' = \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{pmatrix}. $$
In fact, we have proved the following result.

Theorem 2.54 Let L be a linear transformation satisfying the condition of Case 1. Then the
matrix of L w.r.t. a basis consisting of eigenvectors is diagonal, the eigenvalues being the diagonal
entries.
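Theorem 2.54 can be seen at work numerically: if a matrix has a basis of eigenvectors, collecting them as columns of a matrix U gives U^{-1} A U = diag(λ1 , . . . , λn). A short Python sketch (NumPy assumed; the matrix is a hypothetical example with two distinct eigenvalues, chosen only for illustration):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])       # example matrix with eigenvalues 5 and 2

    vals, U = np.linalg.eig(A)       # columns of U are eigenvectors u1, u2
    D = np.linalg.inv(U) @ A @ U     # matrix of L w.r.t. the eigenvector basis

    print(np.round(vals, 6))         # 5 and 2 (order may vary)
    print(np.round(D, 6))            # diagonal matrix with the eigenvalues on the diagonal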

2.10 Exercises
2.1 Show that

(i) the set Mmn of all real m × n matrices with the usual addition of matrices and the usual
multiplication by real numbers is a real vector space

(ii) a subset of m × n matrices with the entries 0 at the same places is a subspace of Mmn

(iii) the subset Sn of the symmetric n × n matrices is a subspace of Mnn

(iv) the subset An of the antisymmetric n × n matrices is a subspace of Mnn .

Give an example of a subset of Mmn that is not a subspace.

2.2 Let C 0 ([a, b]) be the space of the real-valued continuous functions on the interval [a, b],
a < b. Investigate which of the following subsets are subspaces of C 0 ([a, b]):

(i) the set of all functions being differentiable on [a, b]

(ii) the set of all functions f being continuous on [a, b] and satisfying f (a) = 0

(iii) the set of all functions f being continuous on [a, b] and satisfying f (x) ≥ 0 for all x ∈ [a, b]
(iv) the set of all continuous functions satisfying $\int_a^b f(x)\,dx = 0$

(v) the set of all continuous functions satisfying $\int_a^b f(x)\,dx = 1$.

2.3 Verify that the set R+ of all strictly positive real numbers with the operations

x ⊕ y := xy and λ ◦ x := xλ

where x, y > 0 and λ ∈ R, is a vector space.

2.4 Determine whether the following systems of vectors of Rn are linearly dependent or inde-
pendent.

a) $\begin{pmatrix} 4 \\ -1 \\ 2 \end{pmatrix}$, $\begin{pmatrix} -4 \\ 10 \\ 2 \end{pmatrix}$

b) $\begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 3 \\ 2 \\ 5 \end{pmatrix}$, $\begin{pmatrix} 6 \\ -1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 7 \\ 0 \\ 2 \end{pmatrix}$

c) $\begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \end{pmatrix}$, $\begin{pmatrix} 3 \\ 3 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 1 \\ 0 \\ -1 \end{pmatrix}$

2.5 Determine whether the following systems of functions f, g, h ∈ C 0 (R) are linearly depen-
dent or independent.
a) f (x) := 1, g(x) := x, h(x) := ex

b) f (x) := sin x, g(x) := cos x

c) f (x) := 6, g(x) := sin2 x, h(x) := cos2 x

2.6 Let Sn be the subspace of the symmetric matrices of Mnn and An the subspace of the
antisymmetric matrices (cf. Exercise 2.1). What are the dimensions of Sn and An ?

2.7 What is the dimension of the vector space of Exercise 2.3?

2.8 Applying the method of Gauß elimination, solve the following system of linear equations:

x + y + 2z = 9
2x + 4y − 3z = 1
3x + 6y − 5z = 0.

Represent the system by a matrix of row-echelon type. Furthermore, solve the system according
to Gauß-Jordan elimination and determine the corresponding matrix of reduced row-echelon
type.

2.9 Solve the equation Ax = b, i.e., the corresponding system of linear equations, where

a) $A = \begin{pmatrix} 4 & 2 & -2 \\ -3 & 1 & 0 \\ 1 & -4 & 2 \end{pmatrix}$ and $b = b_1 = \begin{pmatrix} -2 \\ 6 \\ -9 \end{pmatrix}$, resp., $b = b_2 = \begin{pmatrix} 26 \\ 12 \\ 2 \end{pmatrix}$

b) $A = \begin{pmatrix} 1 & -3 & 5 \\ 2 & -2 & 1 \\ -3 & 5 & -6 \end{pmatrix}$ and $b = \begin{pmatrix} -2 \\ 6 \\ -9 \end{pmatrix}$

c) $A = \begin{pmatrix} 1 & 2 & -7 & 2 \\ 4 & 7 & -26 & 9 \\ -3 & -5 & 19 & -7 \end{pmatrix}$ and $b = b_1 = \begin{pmatrix} -2 \\ 6 \\ -9 \end{pmatrix}$, resp., $b = b_2 = \begin{pmatrix} -3 \\ -10 \\ 7 \end{pmatrix}$.

2.10 Let ~e1 , ~e2 be an orthonormal basis in E2 and ~x ∈ E2 any vector. Dilate ~x w.r.t. the
direction of ~e1 by the factor 2 and then reflect the dilated vector at ~e2 , thus obtaining a vector
~y .
(a) Calculate ~y from ~x and show that the transformation ~x 7→ ~y defines a linear map
L : E2 → E2 .

(b) What is the matrix of L w.r.t. the basis ~e1 , ~e2 ?

(c) Introducing the new basis ~v1 := ~e1 − ~e2 , ~v2 := ~e1 + 2~e2 , find the relation between the
components of ~x w.r.t. the two bases.

(d) Determine the matrix of L w.r.t. the new basis.

2.11 Let e1 , . . . , en be the standard basis of Rn and let b1 , . . . , bn be fixed vectors of Rm . Show
that the map L : Rn → Rm defined by
$$ L(x) = L\!\left( \sum_{i=1}^{n} x_i e_i \right) := \sum_{i=1}^{n} x_i b_i $$

is linear and determine its matrix A with respect to the standard basis.

2.12 Consider the linear map L : R2 → R2 defined by


$$ L(x) := \begin{pmatrix} x_1 + 2 x_2 \\ 3 x_1 + 4 x_2 \end{pmatrix}. $$
Determine the matrices of L with respect to the standard basis e1 , e2 as well as with respect to
the basis $v_1 := \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $v_2 := \begin{pmatrix} -1 \\ 1 \end{pmatrix}$.

2.13
a) Let A be an m × n matrix and b ∈ Rm . Is the map x 7→ L(x) := Ax + b, x ∈ Rn , linear?

b) What are the linear maps from R1 = R into itself?

2.14 Describing a parallelogram in P2 in terms of position vectors ~r ∈ E2 , show that its image
under an affine map ~r 7→ L(~r) + ~b where L is a linear transformation of E2 and ~b ∈ E2 a constant
vector, is again a parallelogram or is degenerated. The corresponding statement is true for
parallelograms and parallelepipeds in three-dimensional space P3 .

2.15 Show that the multiplication of matrices is associative, but not commutative. Moreover,
the multiplication is distributive with respect to the addition of matrices and mixed-associative
with respect to the multiplication of matrices by numbers. Finally, if the matrix A can be
multiplied by B, then (AB)T = B T AT .

2.16 For the square matrices
$$ A = \begin{pmatrix} 3 & 0 & -2 \\ 4 & 1 & -3 \\ 0 & 5 & 6 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & -2 & 4 \\ 3 & 1 & 2 \\ 1 & 3 & 5 \end{pmatrix}, $$
calculate AB, BA, (A + B)^2 , (A − B)^2 , and (A + B)(A − B), and for
$$ C_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad C_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad C_4 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, $$
calculate $C_i^2$, i = 1, 2, 3, 4.

2.17 Consider the three-dimensional vector space P3 of all real polynomials of degree at most
two. With respect to the canonical basis of this space, the linear operator of
differentiation, d/dx, has the matrix
$$ D = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}. $$

Show that D3 = 0 and explain why this is so. Furthermore, construct matrices A and B such
that A3 6= 0, A4 = 0 and B 4 6= 0, B 5 = 0.

2.18 Let ~e1 , ~e2 , ~e3 be a right-handed orthonormal basis in the Euclidean space E3 .

a) Describe positively oriented rotations of vectors around the basis vectors by an angle α > 0
by matrices.

b) The vector ~x = ~e1 + ~e2 + 3~e3 is rotated first around ~e1 in the positive sense by α = 30◦
and then around ~e2 in the positive sense by β = 60◦ . What is the vector obtained this
way?

c) The same question for α = β = 90◦ .

d) The same question for α = β = 90◦ and the converse order of rotations.

e) Find the matrix that describes the operation of a positively oriented rotation around ~e1
by α followed by such a rotation around ~e2 by β.

2.19 Consider the orthogonal projection p~ of a vector ~x ∈ E3 onto a plane with normal vector
~n where |~n| = 1. Let ~e1 , ~e2 , ~e3 be an orthonormal basis.

a) Represent p~ in terms of ~x and ~n.

b) Choose an orthonormal basis ~v1 , ~v2 in the subspace given by ~n · ~r = 0 and represent p~ in
terms of ~x and ~v1 , ~v2 .

c) Show that the map ~x 7→ p~ =: L(~x) is linear.

d) Determine the matrix A of L w.r.t. ~e1 , ~e2 , ~e3 as well as the matrix A0 w.r.t. the orthonormal
basis ~v1 , ~v2 , ~n.

e) What are Ker L and Im L?

f) For the particular case $\vec{n} = \frac{1}{\sqrt{3}}(\vec{e}_1 + \vec{e}_2 + \vec{e}_3)$, choose some ~v1 , ~v2 and calculate A, A0 , as
well as the projection p~ of ~x = 2~e1 − 6~e2 + 3~e3 .

2.20 An m × n matrix A defines a linear map L : Rn → Rm according to L(x) := Ax. For


the following matrices, determine dim Ker L and rank A = dim Im L as well as a basis of Ker L,
resp., Im L.
$$ A = A_1 = \begin{pmatrix} 1 & 1 & 2 & 0 \\ -3 & 2 & 0 & 1 \\ 8 & -2 & -2 & 2 \end{pmatrix}, \qquad A = A_2 = \begin{pmatrix} 5 & 2 & -1 & 0 \\ 19 & -4 & 31 & -18 \\ 8 & -1 & 8 & -4 \\ 2 & 4 & -16 & 10 \end{pmatrix} $$

2.21

a) What is the dimension of the subspace of Rn consisting of those vectors x that satisfy one
homogeneous equation in n unknowns?

b) What is the dimension of the subspace of the space of the n × n matrices that consists of
those matrices satisfying
$$ \operatorname{tr} A := \sum_{i=1}^{n} a_{ii} = 0? $$

2.22 Show that the following matrices are regular and determine their inverses:
$$ A = \begin{pmatrix} 3 & 1 & 4 \\ 1 & 2 & 0 \\ 0 & 1 & -2 \end{pmatrix}, \qquad B = \begin{pmatrix} 4 & 5 & -1 \\ 2 & 0 & 1 \\ 3 & 1 & 0 \end{pmatrix}. $$

2.23 Consider two bases v1 , . . . , vn and v10 , . . . , vn0 of a vector space V of dimension n. According
to
$$ v_j' = \sum_{i=1}^{n} \beta_{ij} v_i, \qquad v_j = \sum_{i=1}^{n} \gamma_{ij} v_i', $$
j = 1, . . . , n, introduce two matrices B and C with the entries βij and γij .
a) Show that C = B −1 .
b) Show that the components of a vector x ∈ V, $x = \sum_{i=1}^{n} \xi_i v_i = \sum_{i=1}^{n} \xi_i' v_i'$, transform
according to
$$ \xi_i = \sum_{j=1}^{n} \beta_{ij} \xi_j', \qquad \xi_i' = \sum_{j=1}^{n} \gamma_{ij} \xi_j, $$
i = 1, . . . , n.
The matrix B is called the matrix of the basis transformation.

2.24 Consider the Euclidean vector space E3 .


a) Show that, for two orthonormal bases ~e1 , ~e2 , ~e3 and ~e10 , ~e20 , ~e30 , the matrix B of the basis
transformation is orthogonal, i.e., B satisfies B −1 = B T . What is the geometrical meaning
of the entries βij of B?
b) Conversely, if B is an orthogonal matrix and ~e1 , ~e2 , ~e3 an orthonormal basis, a new orthonormal basis is defined by $\vec{e}_j\,' := \sum_{i=1}^{3} \gamma_{ij} \vec{e}_i = \sum_{i=1}^{3} \beta_{ji} \vec{e}_i$ where γij are the entries of
B −1 .

2.25 Let A be the matrix of a linear transformation L : V → V w.r.t. the basis v1 , . . . , vn and
A0 the matrix of L w.r.t. the basis v10 , . . . , vn0 . Show that

A0 = BAB −1

where B is the matrix of the basis transformation. That is, the matrix of a linear transformation
transforms according to a similarity transformation.

2.26 Consider a right-handed orthonormal basis ~e1 , ~e2 , ~e3 in the Euclidean vector space E3 .
Using the results of the preceding problems, find the matrix of a positively oriented rotation by
an angle α around the axis given by the unit vector $\vec{n} = \frac{1}{\sqrt{2}}(\vec{e}_2 + \vec{e}_3)$.

2.27 Let L be a linear transformation acting in E3 such that its matrix w.r.t. one orthonormal
basis is symmetric. Show that then the matrix of L w.r.t. any orthonormal basis is symmetric.
In other words, L is a symmetric tensor in E3 .

2.28 A homogeneous elastic cylindrical body whose center lies in the origin of a Cartesian
coordinate system and whose axis has the direction $\frac{1}{\sqrt{2}}(\vec{e}_2 + \vec{e}_3)$, is stretched by suitable forces
acting at the end surfaces. The length of the cylinder increases by a factor 1 + α whereas any
diameter perpendicular to its axis decreases by a factor 1 − β, α and β being small positive
numbers. Assume that the position vector ~r 0 = L(~r) of a material point of the body after
deformation depends linearly on its position ~r before deformation. What are the components
of the (symmetric) tensor L w.r.t. the given coordinate system? (Hint: Introduce a second
coordinate system adapted to the situation.)

2.29 Calculate the determinant
$$ \begin{vmatrix} 2 & 5 & 1 & 4 \\ -5 & 3 & 0 & 0 \\ 1 & 7 & 0 & -3 \\ 9 & 3 & 4 & 5 \end{vmatrix}. $$

2.30 Find the eigenvalues, eigenvectors, and eigenspaces of the following matrices, and if pos-
sible, give a basis of eigenvectors.
$$ A = \begin{pmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -3 & 3 \\ 3 & -5 & 3 \\ 6 & -6 & 4 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} $$

2.31 W.r.t. a Cartesian coordinate system in P2 , consider all points satisfying the equation

$$ x_1^2 - x_1 x_2 + x_2^2 = 1. $$

Writing this equation in the form ~r · L(~r) = 1 where ~r is a position vector and L a suitable linear
transformation, show that the considered curve is an ellipse and determine the direction as well
as the length of its half-axes.

2.32 By means of a suitable coordinate transformation, diagonalize the symmetric matrix


occurring in the following equation representing a plane curve:
$$ x_1^2 + 2\sqrt{2}\, x_1 x_2 = 1, \qquad \text{resp.,} \qquad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \begin{pmatrix} 1 & \sqrt{2} \\ \sqrt{2} & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 1. $$

2.33 Rotate the curves given by

$$ \frac{x_1^2}{4} + x_2^2 = 1, \qquad x_1 x_2 = 1 $$
in the positive sense by an angle of 45◦ and determine the equations of the rotated curves.

2.34 Again, transformation to principal axes. Let A be a fixed real symmetric 3 × 3 matrix
and $X = (x_1, x_2, x_3)^T$. Show that the equation

X · AX = 1, resp., X T AX = 1

represents

a) an ellipsoid if $A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$

b) an elliptic cylinder if $A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}$

c) a connected hyperboloid if $A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 1 \end{pmatrix}$

d) a disconnected hyperboloid if $A = \begin{pmatrix} 1 & -1 & 1 \\ -1 & -1 & -1 \\ 1 & -1 & -1 \end{pmatrix}$.
In each case, determine the axes of the surfaces. What is, for the respective matrices, described
by X · AX = 0?

