
Differentiable Manifolds 3 / 34

Lecture Notes
2015
Jonathan Robbins
School of Mathematics
University of Bristol
October 27, 2015

© University of Bristol 2015. This material is copyright of the University unless explicitly stated otherwise. It is provided exclusively for educational purposes at the University and is to be downloaded or copied for your private study only.

Contents
1 Vector Fields, Flows and Diffeomorphisms 3
1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Open and closed sets in ℝ^m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Continuous and differentiable maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 The Inverse Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Diffeomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6 ODEs and vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.7 Push-forward map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8 Jacobi bracket . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.9 The Pull-back (functions only) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.10 Noncommuting flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.11 The Frobenius Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.12 More general versions of the Frobenius Theorem [not presented in lectures] (*nonexaminable) 51

2 Algebraic k-forms 55
2.1 Dual space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.3 Algebraic k-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4 Basis k-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.5 Wedge product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.6 *Proof of properties of the wedge product [nonexaminable] . . . . . . . . . . . . . . . . . . . 71
2.7 Contraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.8 *Proof of properties of contraction [nonexaminable] . . . . . . . . . . . . . . . . . . . . . . . 78
2.9 Algebraic forms on ℝ³ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

3 Differential forms 83
3.1 Definition of differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2 The exterior derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.3 The Pullback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4 The Lie derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 The Poincaré Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

4 Integration of Differential Forms 101
4.1 Singular k-cubes and integration of differential forms . . . . . . . . . . . . . . . . . . . . . . 102
4.2 Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.3 Stokes theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.4 *Proofs of results for the boundary map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

5 Bibliography 118

6 Summary notes 119

1 Vector Fields, Flows and Diffeomorphisms
1.1 Notation
We denote points in ℝ^m as follows:

x = (x^1, …, x^m) ∈ ℝ^m.    (1)

Note that points are denoted in plain text, not boldface (we write x rather than boldface x). Note too that indices are written as superscripts, not subscripts. This will take some getting used to, but it turns out to be a very useful convention (writing indices as superscripts is standard in discussions of differentiable manifolds).
The inner product is denoted as follows:

x · y = x^1 y^1 + ⋯ + x^m y^m.    (2)

The norm is denoted as follows:

‖x‖ = (x · x)^{1/2} = ((x^1)² + ⋯ + (x^m)²)^{1/2}.    (3)

1.2 Open and closed sets in ℝ^m


The notion of open and closed sets is basic in calculus and analysis, in particular to the notions of continuous and differentiable functions and maps. If you have taken Analysis 2, the definitions will be familiar to you. Otherwise, open and closed sets in ℝ^m generalise the familiar notions of open and closed intervals in ℝ. Besides their definitions, we will state some of the basic properties of open and closed sets. I'll just remark that the notion of open and closed sets may be extended from ℝ^m to more general spaces. In the most general setting, this subject is called point-set topology, or general topology. But we will stick to ℝ^m, and our discussion will be brief and elementary. A number of the results appear as exercises in Problem Sheet 1, and proofs can be found in the solutions.

1.2.1 Open sets


Definition 1.2.1 (Open ball). Given x ∈ ℝ^m and ε > 0, the open ε-ball about x, also called the ε-neighbourhood of x, denoted B_ε(x), is given by

B_ε(x) = {y ∈ ℝ^m | ‖x − y‖ < ε}.

See Figure 1.

Figure 1

For example, for m = 1, B_ε(x) is just the open interval (x − ε, x + ε).

Definition 1.2.2 (Open set). Let U ⊆ ℝ^m. U is open if ∀x ∈ U, ∃ε > 0 such that B_ε(x) ⊆ U. See Figure 2.

Proposition 1.2.3 (Properties of open sets).

(i) The union of two (in fact, arbitrarily many) open sets is open.

Figure 2

(ii) The intersection of two (or any finite collection of) open sets is open.

Proof. See Problem Sheet 1. The argument makes use of the triangle inequality, ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Note: The intersection of infinitely many open sets may not be open. For example, the intervals (−1/N, 1/N), where N is a positive integer, are open. 0 is the only point which belongs to every interval. Therefore,

⋂_{N=1}^∞ (−1/N, 1/N) = {0}.

But {0} is not open; it contains no ε-neighbourhood of 0.
Example 1.2.4 (Examples and non-examples of open sets).
a) B_ε(x) is open (see Problem Sheet 1).
b) [0, 1] ⊂ ℝ is not open, because it contains no ε-neighbourhood of 0 or 1.
c) The set of points x ∈ ℝ^m with x^1 rational is not open.
Remark 1.2.5. Open sets will be important for us because we will be studying maps between ℝ^m and ℝ^n, as in Calculus 2, which are defined only on a subset of ℝ^m. On an open subset, it makes sense to ask whether the map is differentiable at any point, because the difference f(x + h) − f(x) is defined for h sufficiently small.

1.2.2 Closed sets


Definition 1.2.6 (Limit point). Let X ⊆ ℝ^m. A point p ∈ ℝ^m is a limit point of X if every ε-neighbourhood of p contains at least one point of X, i.e.

∀ε > 0, B_ε(p) ∩ X ≠ ∅,

where ∅ denotes the empty set. See Figure 3.

Limit points are also called boundary points. Note that every x ∈ X is a limit point of X (why?).
Definition 1.2.7 (Closed set). X ⊆ ℝ^m is closed if X contains all of its limit points.
Example 1.2.8 (Examples and non-examples of closed sets).
a) [0, 1] ⊂ ℝ is closed.
b) (0, 1) ⊂ ℝ is not closed, since 0 and 1 are limit points (why?) but are not contained in (0, 1).
Remark 1.2.9. A set can be both open and closed. For example, ℝ^m is both open and closed.
Definition 1.2.10 (Complement). Given X ⊆ ℝ^m, its complement, denoted X^c, is given by

X^c = {y ∈ ℝ^m | y ∉ X}.

Proposition 1.2.11 (Relation between open and closed sets). Let U ⊆ ℝ^m. Then U is open if and only if U^c is closed.
Proof. See Problem Sheet 1 and solutions.

Figure 3: p is a limit point of X, since every neighbourhood of p has a nonempty intersection with X. q is not a limit point of X, as it has a neighbourhood which does not intersect with X.

1.3 Continuous and differentiable maps


1.3.1 Continuous maps
Let U ⊆ ℝ^m and V ⊆ ℝ^n, and let

F : U → V; x ↦ F(x)

denote a map from U to V. We write

F(x) = (F^1(x), …, F^n(x)),

where F^j : U → ℝ denotes the j-th component of F.

Definition 1.3.1 (Continuous maps). Let U ⊆ ℝ^m and V ⊆ ℝ^n be open sets, and let F : U → V be a map from U to V. F is continuous if for all x ∈ U and for all ε > 0, there exists δ > 0, which may depend on ε and x, such that if x′ ∈ U and ‖x′ − x‖ < δ, then ‖F(x′) − F(x)‖ < ε. Equivalently, x′ ∈ B_δ(x) implies that F(x′) ∈ B_ε(F(x)).
Notation. We denote the set of continuous maps from U to V by C^0(U, V).

There is a nice characterisation of continuous maps in terms of open sets. Before giving this characterisation, we need to define the inverse image of a set under a map.
Definition 1.3.2 (Inverse image). Let U ⊆ ℝ^m and V ⊆ ℝ^n, and let F : U → V be a map from U to V. Let Y ⊆ V. The inverse image of Y under F, denoted F^{-1}(Y), is the subset of U given by

F^{-1}(Y) = {x ∈ U | F(x) ∈ Y}.

See Figure 4.

Figure 4: The inverse image F^{-1}(Y) of the set Y.

Example 1.3.3 (Example of inverse image). Let U = V = ℝ, and f : ℝ → ℝ be given by f(x) = x². Let Y = (1, 4) ⊂ ℝ. Then

f^{-1}(Y) = (−2, −1) ∪ (1, 2).

As this example shows, f needn't be invertible in order to define the inverse image. Note that when U and V are subsets of ℝ, the map is a function, and we will often denote it by a small letter instead of a capital letter, e.g. f instead of F.
We shall also have occasion to refer to the image of a set under a map.
Definition 1.3.4 (Image). Let U ⊆ ℝ^m and V ⊆ ℝ^n, and let F : U → V be a map from U to V. Let X ⊆ U. The image of X under F, denoted F(X), is the subset of V given by

F(X) = {y ∈ V | y = F(x) for some x ∈ X}.

See Figure 5.

Figure 5: The image F(X) of the set X.

Example 1.3.5 (Example of image). Let U = V = ℝ, and f : ℝ → ℝ be given by f(x) = x². Let X = (−1, 1) ⊂ ℝ. Then

f(X) = [0, 1).

Proposition 1.3.6 (Continuity and open sets). Let U ⊆ ℝ^m and V ⊆ ℝ^n be open sets, and let F : U → V be a map from U to V. Then F is continuous if and only if for all open sets Y ⊆ V, F^{-1}(Y) is open. That is, F is continuous if and only if the inverse image of every open set is open.
Proof. See Problem Sheet 1.
Note: The image of an open set under a continuous map is not necessarily open, as Example 1.3.5 shows.
One advantage of this characterisation of continuity, in terms of open sets, is that it generalises to cases where the ε/δ description is either artificial or indeed not available (there may be no natural notion of distance, which is required for the ε/δ definition). Another advantage is that it simplifies certain arguments, like the fact that the composition of two continuous maps is continuous.
Definition 1.3.7 (Composition of maps). Let U ⊆ ℝ^m, V ⊆ ℝ^n and W ⊆ ℝ^p, and let F : U → V and G : V → W. The composition of F and G, denoted G ∘ F, is the map G ∘ F : U → W defined by

(G ∘ F)(x) = G(F(x)).

Proposition 1.3.8 (Composition of continuous maps). Let U, V, W and F, G be as above. Suppose that U, V, W are open sets and that F and G are continuous. Then G ∘ F is continuous. In other words, if F ∈ C^0(U, V) and G ∈ C^0(V, W), then G ∘ F ∈ C^0(U, W).

Proof. Let Z ⊆ W. Then (G ∘ F)^{-1}(Z) = F^{-1}(G^{-1}(Z)), since x ∈ (G ∘ F)^{-1}(Z) if and only if G(F(x)) ∈ Z, which holds if and only if F(x) ∈ G^{-1}(Z), which holds if and only if x ∈ F^{-1}(G^{-1}(Z)).
Now suppose that Z is open. Since G is continuous, G^{-1}(Z) is open. Since F is continuous, F^{-1}(G^{-1}(Z)) is open, which from the preceding is equivalent to saying that (G ∘ F)^{-1}(Z) is open. This implies that G ∘ F is continuous.

1.3.2 Differentiable maps
This is a brief review of some material from Calculus 2.
Let e_{(k)} denote the k-th unit vector in ℝ^m, i.e.

e_{(k)} = (0, …, 0, 1, 0, …, 0),

where the 1 occurs in the k-th component. Note that the subscript (k) is a label, not an index.
Definition 1.3.9 (Partial derivative). Let U ⊆ ℝ^m and V ⊆ ℝ^n be open, and let F : U → V be a map. The k-th partial derivative of F^j, denoted ∂F^j/∂x^k(x), is defined by

∂F^j/∂x^k(x) = lim_{t→0} [F^j(x + t e_{(k)}) − F^j(x)] / t.

Note that F^j(x + t e_{(k)}) is defined for t sufficiently small, since U is open. Note too, however, that the limit might not exist!
Definition 1.3.10 (Continuously differentiable maps). The map F : U → V is continuously differentiable if the partial derivatives ∂F^j/∂x^k(x) exist for all 1 ≤ k ≤ m and for all 1 ≤ j ≤ n, and moreover are continuous functions on U.

Notation. Let C^1(U, V) denote the set of continuously differentiable maps F : U → V. Let F′(x) denote the n × m matrix of partial derivatives of F, i.e.

[F′(x)]^j_k = ∂F^j/∂x^k(x).    (4)

Proposition 1.3.11 (Linear approximation). Let F ∈ C^1(U, V). Then ∀x ∈ U, v ∈ ℝ^m and ε > 0,

F(x + εv) = F(x) + εF′(x)·v + r(ε, x, v),

where the remainder term r(ε, x, v) goes to zero faster than ε, i.e.

lim_{ε→0} ‖r(ε, x, v)‖ / ε = 0.

See Figure 6.
Proof. See Calculus 2 notes.

Figure 6: F maps x to F(x), and F maps a nearby point, x + εv, to F(x + εv). F(x + εv) − F(x) is given approximately by εF′(x)·v. Note: this figure is NOT to scale. What I mean is, you should think of the vector displacements as being small, of order ε, even though they look big in the picture.

The Chain Rule establishes that the composition of differentiable maps is differentiable and gives a
formula for the derivative of the composition.

Proposition 1.3.12 (Chain rule). Let U, V and W be open sets in ℝ^m, ℝ^n and ℝ^p, respectively. Let F ∈ C^1(U, V) and G ∈ C^1(V, W). Then G ∘ F ∈ C^1(U, W), and

(G ∘ F)′(x) = G′(F(x)) F′(x).

Proof. See Calculus 2.


Definition 1.3.13 (Second-order partial derivatives). Let U ⊆ ℝ^m, V ⊆ ℝ^n be open sets, and let F : U → V be a map from U to V. The second-order partial derivatives of F, denoted ∂²F^j/∂x^k∂x^l(x), where 1 ≤ j ≤ n, 1 ≤ k, l ≤ m, are given by

∂²F^j/∂x^k∂x^l(x) = ∂/∂x^k (∂F^j/∂x^l)(x).

Note that the second-order partial derivatives might not all exist.
Higher-order partial derivatives are defined similarly, by induction. Suppose that we have defined partial derivatives up to order r, for r ≥ 1.
Definition 1.3.14 ((r + 1)st-order partial derivatives). Let U ⊆ ℝ^m, V ⊆ ℝ^n be open sets, and let F : U → V be a map from U to V. The (r + 1)st-order partial derivatives of F, denoted ∂^{r+1}F^j/∂x^{k_{r+1}}⋯∂x^{k_1}(x), where 1 ≤ j ≤ n, 1 ≤ k_1, …, k_{r+1} ≤ m, are given by

∂^{r+1}F^j/∂x^{k_{r+1}}∂x^{k_r}⋯∂x^{k_1}(x) = ∂/∂x^{k_{r+1}} (∂^r F^j/∂x^{k_r}⋯∂x^{k_1})(x).


Notation. The set of maps F : U → V for which all r-th-order partial derivatives exist and are continuous functions on U is denoted by C^r(U, V). The set of maps F : U → V for which partial derivatives of all orders exist is denoted C^∞(U, V).
Definition 1.3.15 (Smooth maps). A map F ∈ C^∞(U, V) is called a smooth map.
In this course, we will be primarily concerned with smooth maps.
Proposition 1.3.16 (Equality of mixed partials). Let F ∈ C^r(U, V), r ≥ 2. Then

∂²F^j/∂x^k∂x^l(x) = ∂²F^j/∂x^l∂x^k(x).

Proof. See Calculus 2 notes. (In fact, the result holds under much weaker assumptions. For simplicity, we state the assumptions for the case where U = ℝ² and V = ℝ, i.e. for a function f(x, y). Suppose f ∈ C^1(ℝ², ℝ) and that ∂(∂f/∂x)/∂y exists and is continuous at some point (x_*, y_*). Then ∂(∂f/∂y)/∂x also exists at (x_*, y_*) and moreover is equal to ∂(∂f/∂x)/∂y(x_*, y_*). That is, for the equality of mixed partials to hold at a point, it is enough that both first derivatives are continuous and that one of the mixed partials exists and is continuous at that point.)
The equality of mixed partials will be very important in this course. A number of results make essential use of this fact. Note that the equality of second-order mixed partials implies that for smooth functions, the ordering of any number of partial derivatives does not matter. For example, if f : ℝ² → ℝ; (x, y) ↦ f(x, y) is a smooth function of two variables, then

∂³f/∂y∂x∂x(x, y) = ∂³f/∂x∂y∂x(x, y) = ∂³f/∂x∂x∂y(x, y).
Example 1.3.17 (Examples and non-examples of differentiable and smooth maps).
a) Let m = n = 1, U = V = ℝ. Then

f(x) = 1/(1 + x²)

is smooth.

g(x) = 1/(1 − x²)

is not even continuous (it has singularities at x = ±1). But if we take U = (1, ∞) and V = ℝ, then g : U → V; x ↦ 1/(1 − x²) is smooth. (We've excluded the singularities at x = ±1 from the domain of definition.)

b) m = n = 2. Then

F(x, y) = (sin y, x²) ∈ C^∞(ℝ², ℝ²),

i.e. F is smooth. Let

G(x, y) = (x⁴/(x² + y²), y) for (x, y) ≠ (0, 0),    G(0, 0) = (0, 0).

Then

G ∈ C^1(ℝ², ℝ²),
G ∉ C^2(ℝ², ℝ²),
G ∈ C^∞(ℝ² \ {(0, 0)}, ℝ²).

That is, on ℝ², G has everywhere continuous first partial derivatives but does not have everywhere continuous second partial derivatives; the second partial derivatives of G^1 are singular at the origin. If, however, the origin is excluded from the domain, then G is smooth. Note that ℝ² \ {(0, 0)} denotes the plane with the origin removed, and this is an open set. (Why?)
(Here are a few more details. The partial derivative of G^1 with respect to x is given by

∂G^1/∂x(x, y) = −2x⁵/(x² + y²)² + 4x³/(x² + y²),

for (x, y) ≠ (0, 0). To compute ∂G^1/∂x at the origin, however, we can't use the preceding expression (because the definition of G^1 at the origin is special). However, we can resort to the definition of the partial derivative:

∂G^1/∂x(0, 0) = lim_{h→0} [G^1(h, 0) − G^1(0, 0)] / h = lim_{h→0} (h² − 0)/h = 0.

We obtain

∂G^1/∂x(x, y) = −2x⁵/(x² + y²)² + 4x³/(x² + y²) for (x, y) ≠ (0, 0),    ∂G^1/∂x(0, 0) = 0.

You can check that ∂G^1/∂x is continuous at the origin: for (x, y) small, ∂G^1/∂x is small.
Next, let's compute the second partial of G^1 with respect to x. We obtain

∂²G^1/∂x²(x, y) = 8x⁶/(x² + y²)³ − 18x⁴/(x² + y²)² + 12x²/(x² + y²),    (x, y) ≠ (0, 0).

∂²G^1/∂x² is not continuous at the origin. Indeed, we have, for all ε ≠ 0, that

∂²G^1/∂x²(ε, 0) = 2,    ∂²G^1/∂x²(0, ε) = 0.

Thus, ∂²G^1/∂x² is constant and equal to 2 along the x-axis (away from the origin), while it is constant and equal to 0 along the y-axis. A SymPy check of these computations is sketched after this example.)
c) Linear maps. Let A ∈ ℝ^{n×m}; that is, A is a real n × m matrix. Let F(x) = A·x. Then F′(x) = A, so that the first partial derivatives of F are constants. All second- and higher-order partial derivatives of F are identically 0, so F ∈ C^∞(ℝ^m, ℝ^n), i.e. F is smooth.
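If you want to check the computations in part b) by machine, here is a short Python sketch using SymPy (an illustrative aside, not part of the course; the variable names are arbitrary). It recomputes ∂G^1/∂x and ∂²G^1/∂x² away from the origin and evaluates the second partial along the two axes, reproducing the values 2 and 0 found above.

# Sanity check of Example 1.3.17 b) with SymPy (illustrative sketch only).
import sympy as sp

x, y, eps = sp.symbols('x y epsilon', real=True)
G1 = x**4 / (x**2 + y**2)          # first component of G away from the origin

dG1_dx = sp.simplify(sp.diff(G1, x))        # -2x^5/(x^2+y^2)^2 + 4x^3/(x^2+y^2)
d2G1_dx2 = sp.simplify(sp.diff(G1, x, 2))
print(dG1_dx)

# Second partial along the x-axis and along the y-axis (eps != 0):
print(sp.simplify(d2G1_dx2.subs({x: eps, y: 0})))   # -> 2
print(sp.simplify(d2G1_dx2.subs({x: 0, y: eps})))   # -> 0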

1.4 The Inverse Function Theorem


Let U, V ⊆ ℝ^n be open sets (so both U and V are subsets of ℝ^n in this case). Let F ∈ C^r(U, V).
Here are some basic questions we might ask. Given y ∈ V, can we find x ∈ U such that F(x) = y? That is, can we solve the equation F(x) = y to obtain x in terms of y; in other words, is F onto? If so, is x unique? In other words, is F one-to-one, or 1-1; that is, if F(x) = F(x′), does it follow that x = x′? If F is both 1-1 and onto, then we can define the inverse of F, which we denote by F^{-1} : V → U. We say that F is invertible. With x and y as above, we have that F^{-1}(y) = x. Equivalently,

F^{-1} ∘ F = Id_U,    F ∘ F^{-1} = Id_V,    (5)

where Id_U and Id_V denote the identity maps on U and V, i.e.

Id_U(x) = x, ∀x ∈ U,    Id_V(y) = y, ∀y ∈ V.    (6)

A final question: if F^{-1} exists, does it follow that F^{-1} ∈ C^s(V, U) for some s?


Example 1.4.1 (Examples and non-examples of maps with smooth inverses).
a) Linear maps. Let F(x) = A·x, A ∈ ℝ^{n×n}. That is, A is a real n × n matrix. Then F is invertible if and only if det A ≠ 0, i.e. if and only if A is invertible. Note that F′(x) = A, so we can write that F is invertible if and only if F′(x) is invertible. In this case, F^{-1}(y) = A^{-1}·y. Thus F^{-1} is also a linear map. It follows that F^{-1} is smooth, since linear maps are smooth.
b) n = 1, U = V = ℝ, f : U → V; x ↦ f(x).

Figure 7: f is not invertible. For y slightly greater than b, f(x) = y has no solutions x near a; for y slightly less than b, f(x) = y has two solutions near a.

See Figure 7. Suppose f′(a) = 0, f″(a) < 0. Let f(a) = b (an example would be f(x) = b − (x − a)²). Given y near b, can we solve f(x) = y for x? For y > b, there are no solutions, at least near x = a (so f may not be onto). For y < b, there are two solutions near x = a (so f is not 1-1).

c) f(x) = x³. Does f^{-1} exist? Let x³ = y. Then x = y^{1/3}. So f^{-1}(y) = y^{1/3}, and the inverse exists. Let's consider whether the inverse is smooth. f itself is smooth, i.e. f ∈ C^∞(ℝ, ℝ). But f^{-1} is not smooth, since

(f^{-1})′(y) = (1/3) y^{−2/3},

which blows up at y = 0. You can show that f^{-1} is continuous, i.e. f^{-1} ∈ C^0(ℝ, ℝ). But f^{-1} does not have a continuous derivative, so f^{-1} ∉ C^1(ℝ, ℝ).

What these examples suggest is that difficulties in finding the inverse can arise when f′ = 0, in case n = 1, or when det F′ = 0, in case n > 1.

Theorem 1.4.2 (Inverse Function Theorem). Let U, V ⊆ ℝ^n be open, and let F ∈ C^r(U, V). Take x_0 ∈ U and let F(x_0) = y_0. If det F′(x_0) ≠ 0, then F has a locally defined inverse near y = y_0. That is, there exist open sets X ⊆ U and Y ⊆ V with x_0 ∈ X and y_0 ∈ Y such that F : X → Y is invertible, with inverse F^{-1} : Y → X. Moreover, F^{-1} ∈ C^r(Y, X), and the derivative of F^{-1} is related to the derivative of F by the formula

(F^{-1})′(F(x)) = (F′(x))^{-1}.

See Figure 8.

Figure 8: The Inverse Function Theorem. If F′(x_0) is nonsingular, then F : X → Y is invertible.

Note: the formula for (F^{-1})′ follows from the Chain Rule, Proposition 1.3.12. Indeed, since F^{-1} ∘ F = Id_X and Id′_X = I (the identity map is a linear map whose derivative is the identity matrix), it follows that

(F^{-1} ∘ F)′(x) = (F^{-1})′(F(x)) F′(x) = I,

which yields the formula for the derivative of F^{-1}.


Proof. Proofs may be found in the course references, including Spivak and Hubbard and Hubbard.
Here is an idea of the proof. Given F(x_0) = y_0, we want to solve

F(x_0 + h) = y_0 + k

to get h as a function of k. We know that for k = 0, one solution is given by h = 0. We try an approximation, based on k and h being small, which follows from Proposition 1.3.11:

F(x_0 + h) ≈ F(x_0) + A·h = y_0 + k,

where A = F′(x_0). Eliminating y_0 from both sides, we get that A·h ≈ k, or h ≈ A^{-1}·k. Note that this approximation makes sense only if A is invertible. Of course, this is not a proof, and showing that the inverse actually exists takes more work. In the end, one cannot hope to get an explicit general formula for the inverse.
Example 1.4.3 (Polar coordinates). The transformation between cartesian and polar coordinates provides a familiar example where the Inverse Function Theorem applies, and where the role of the condition on F′ is apparent. In this case, m = n = 2, and we let U = V = ℝ² to start with. We'll denote coordinates on V by (x, y), and coordinates on U by (r, θ). This might look funny at first, since normally we don't think of r as being allowed to be negative. But for now, r and θ are just names of coordinates.
We define the map

F : U → V; (r, θ) ↦ F(r, θ) = (x(r, θ), y(r, θ)),

where

x(r, θ) = r cos θ,    y(r, θ) = r sin θ.

Clearly F ∈ C^∞(U, V). Let us compute the derivative F′(r, θ).

F′(r, θ) = ( ∂x/∂r  ∂x/∂θ )        ( cos θ  −r sin θ )
           ( ∂y/∂r  ∂y/∂θ )(r, θ) = ( sin θ   r cos θ ).

Then

det F′(r, θ) = r.

Let (r_0, θ_0) ∈ U, where r_0 > 0. Let x_0 = r_0 cos θ_0, y_0 = r_0 sin θ_0. According to the Inverse Function Theorem, since r_0 ≠ 0, there exist open sets X, Y ⊆ ℝ² with (r_0, θ_0) ∈ X, (x_0, y_0) ∈ Y, such that F : X → Y is invertible and F^{-1} ∈ C^∞(Y, X).
In fact, we can see this directly. As in Figure 9, we can choose an interval around r_0 of half-width a and an interval around θ_0 of half-width b. We take

X = {(r, θ) | r_0 − a < r < r_0 + a, θ_0 − b < θ < θ_0 + b}.

We take Y = F(X). Then we can write down the inverse map, as follows:

F^{-1}(x, y) = (r(x, y), θ(x, y)),

where

r(x, y) = (x² + y²)^{1/2},    θ(x, y) = tan^{-1}(y/x).

The only ambiguity is which branch to take for tan^{-1}(y/x) (there are many angles whose tangent is y/x, which differ by integer multiples of π). This ambiguity is resolved by taking tan^{-1}(y_0/x_0) = θ_0, and defining the branch elsewhere to make θ continuous in Y.

Figure 9: Polar to cartesian coordinates; see Example 1.4.3.

Note that F : X → Y does not have a smooth inverse if X contains a point with r = 0 (in this case, Y would contain the point (0, 0), and (x² + y²)^{1/2} is not smooth in a neighbourhood of (0, 0), nor can θ be continuously defined).
Note also that in order for an inverse to exist, X can't be too big. In particular, if b > π, then F : X → Y is not 1-1.
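As a quick check of Example 1.4.3, the following Python/SymPy sketch (illustrative only) computes the Jacobian F′(r, θ), its determinant, and its inverse, confirming that det F′ = r and hence that the Inverse Function Theorem applies wherever r ≠ 0.

# Example 1.4.3 revisited with SymPy (illustrative sketch, not from the notes).
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
F = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])   # F(r, theta) = (x, y)

J = F.jacobian([r, theta])        # the derivative F'(r, theta)
print(J)                          # [[cos(theta), -r*sin(theta)], [sin(theta), r*cos(theta)]]
print(sp.simplify(J.det()))       # -> r, so F' is invertible wherever r != 0

# Derivative of the local inverse, (F^{-1})'(F(x)) = (F'(x))^{-1}:
print(sp.simplify(J.inv()))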

1.5 Diffeomorphisms
Definition 1.5.1 (Diffeomorphism). Let U, V ⊆ ℝ^n be open. A diffeomorphism from U to V is a map F : U → V such that

(i) F ∈ C^∞(U, V)
(ii) F is 1-1 and onto (so that F^{-1} : V → U exists)
(iii) F^{-1} ∈ C^∞(V, U)

The Inverse Function Theorem (Theorem 1.4.2) generates examples of diffeomorphisms.

Let Diff(U, V) denote the set of diffeomorphisms from U to V.
Example 1.5.2 (Examples and non-examples of diffeomorphisms). U = V = ℝ throughout. See Figure 10.

a) f(x) = x for x < 0, f(x) = 2x for x ≥ 0. Then f ∉ Diff(ℝ, ℝ), since f is not smooth. However, f is 1-1 and onto.

b) f(x) = x². f ∉ Diff(ℝ, ℝ), since f is neither 1-1 nor onto. However, f is smooth.

c) f(x) = x³. f ∉ Diff(ℝ, ℝ), even though f is smooth, 1-1 and onto, since f^{-1}(y) = y^{1/3} is not smooth.

d) f(x) = x³ + x. f ∈ Diff(ℝ, ℝ), since f is smooth, 1-1 and onto, and f^{-1} is smooth.
Let us verify these assertions for this last example.
• It's clear that f(x) = x³ + x is smooth.
• The fact that f is 1-1 follows from the Mean Value Theorem, which says that

f(x_2) − f(x_1) = f′(c)(x_2 − x_1)

for some c ∈ [x_1, x_2], and the fact that, in this case, f′(x) = 3x² + 1, so that f′(c) is strictly positive, and therefore nonzero. It follows that f(x_2) = f(x_1) implies that x_2 = x_1. (Note: this argument really shows that, in general, if f has nonvanishing derivative, then f is 1-1.)
• The fact that f is onto can be established by appealing to the Intermediate Value Theorem. Note that as x → ±∞, f(x) → ±∞. Since f is continuous, the Intermediate Value Theorem applies, which in this case says that f must assume every value between −∞ and ∞.
• Since f is 1-1 and onto, it follows that its inverse exists. For convenience, let's denote f^{-1} by g. It remains to show that g is smooth. First, we'll show that g is differentiable, and obtain an explicit formula for the derivative. Recalling the Mean Value Theorem above and letting f(x_2) = y_2, f(x_1) = y_1, x_2 = g(y_2), and x_1 = g(y_1), we obtain

y_2 − y_1 = f′(c)(g(y_2) − g(y_1)),

or

[g(y_2) − g(y_1)] / (y_2 − y_1) = 1 / f′(c),

where c lies between g(y_1) and g(y_2). Taking the limit as y_1 → y_2, we see that g is differentiable with derivative given by

g′(y) = 1 / f′(g(y)) = 1 / (3g²(y) + 1).

It is clear that repeated differentiation will produce continuous functions (you can show, say by induction, that the n-th derivative of g will be of the form P(g)/(3g² + 1)^N, where P(g) is a polynomial in g and N is a positive integer), so that g is smooth. (A numerical check of the formula for g′ is sketched after this example.)

You might wonder why we didn't just invoke the Inverse Function Theorem to establish that f is a diffeomorphism. After all, since f′ ≠ 0, the conditions of the theorem are satisfied. The point is that the Inverse Function Theorem is a local result; it would establish that f is a diffeomorphism which maps some open interval U to another open interval V. However, it would not automatically imply that U and V could both be taken to be all of ℝ. In Problem Sheet 1.11 b), there is a two-dimensional example where det F′ ≠ 0, so that the Inverse Function Theorem applies, but nevertheless F is not invertible on all of ℝ².
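Here is the numerical check of g′ promised above: a short Python sketch (using SciPy's brentq root-finder; the bracket [−10, 10] is an arbitrary choice) that inverts f(x) = x³ + x by bisection and compares a finite-difference derivative of g with the formula g′(y) = 1/(3g(y)² + 1).

# Numerical check of Example 1.5.2 d) (a sketch, not part of the notes).
from scipy.optimize import brentq

f = lambda x: x**3 + x

def g(y):
    # f is strictly increasing, so the inverse can be found by root bracketing.
    return brentq(lambda x: f(x) - y, -10.0, 10.0)

y0, h = 2.0, 1e-6
numerical = (g(y0 + h) - g(y0 - h)) / (2*h)    # finite-difference approximation of g'(y0)
formula = 1.0 / (3*g(y0)**2 + 1)               # g'(y) = 1/(3 g(y)^2 + 1)
print(numerical, formula)                      # the two values agree closely (both about 0.25)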
Let U ⊆ ℝ^n be open. We write Diff(U) := Diff(U, U) to denote the set of diffeomorphisms from U to itself. Here are some observations about Diff(U):

i) Id_U ∈ Diff(U).
ii) If F, G ∈ Diff(U), then F ∘ G ∈ Diff(U). This follows from the Chain Rule (Proposition 1.3.12), which implies that F ∘ G is smooth, and the fact that (F ∘ G)^{-1} = G^{-1} ∘ F^{-1}, which again by the Chain Rule is smooth (since F^{-1} and G^{-1} are smooth).
iii) If F, G, H ∈ Diff(U), then (F ∘ G) ∘ H = F ∘ (G ∘ H). In fact, this is true for all maps from U to itself, not just diffeomorphisms; the point is that composition of maps is associative.
iv) If F ∈ Diff(U), then F^{-1} ∈ Diff(U).

These observations can be summarised by the following:
Proposition 1.5.3 (Diffeomorphism group). Let U ⊆ ℝ^n be open. Then Diff(U) is a group, with product given by composition and identity given by Id_U.

Figure 10: Example 1.5.2

1.6 ODEs and vector fields


As we shall see, solutions to first-order systems of ordinary differential equations, or ODEs, are another source of diffeomorphisms.
Definition 1.6.1 (Vector field). Let U ⊆ ℝ^n be open. A vector field on U is a map

X : U → ℝ^n; x ↦ X(x) = (X^1(x), …, X^n(x)).

Definition 1.6.2 (First-order system of ODEs). Let U ⊆ ℝ^n be open, and let X be a vector field on U. A first-order system of autonomous ODEs on U for a curve x(t) ∈ U parameterised by t is given by

ẋ = X(x),    x(0) = x_0,    (*)

where x_0 ∈ U. The equation x(0) = x_0 is the initial condition. In terms of components, the system is given by

ẋ^i = X^i(x),  1 ≤ i ≤ n,    x(0) = x_0.    (*)

First-order means that the first derivative of x(t), ẋ, is expressed as a function of x. (A second-order equation would be one where the second derivative ẍ is expressed in terms of x and ẋ.) Autonomous means that X does not depend explicitly on t (towards the end of the course, we shall have occasion to consider nonautonomous systems, or equivalently vector fields X that depend explicitly on t).
From the ODE2 course, you will know that an mth-order nonautonomous system is equivalent to a first-order autonomous system in a higher dimensional space. Specifically, if the mth-order nonautonomous system is defined on U ⊆ ℝ^n, then the equivalent first-order autonomous system is defined on V ⊆ ℝ^{nm+1}. See Problem Sheet 2.3 and 2.4 for examples. Thus, first-order systems are quite general. (A small numerical illustration, converting a second-order equation to a first-order system, is sketched below.)
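Here is the illustration just mentioned: a minimal Python sketch (using SciPy's solve_ivp; the example equation ẍ = −x is our own choice, not taken from the problem sheets) showing how a second-order equation is rewritten as a first-order autonomous system on ℝ² for (x, v) with vector field X(x, v) = (v, −x), and then integrated numerically.

# Rewriting a 2nd-order ODE as a 1st-order autonomous system (illustrative sketch).
import numpy as np
from scipy.integrate import solve_ivp

def X(t, z):
    # z = (x, v); the 2nd-order equation x'' = -x becomes (x', v') = (v, -x).
    x, v = z
    return [v, -x]

sol = solve_ivp(X, (0.0, 2*np.pi), [1.0, 0.0], rtol=1e-8, atol=1e-10)
print(sol.y[:, -1])   # close to the initial condition (1, 0): the solution is periodic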
A fundamental result in the theory of ODEs is the following:
Theorem 1.6.3 (Existence and uniqueness of solutions of ODEs). Suppose that X : U → ℝ^n is smooth. Then ∀x_0 ∈ U, ∃T > 0 (which may depend on x_0) such that (*) has a unique solution x(t) defined for −T < t < T.

Figure 11: Geometrical description of the solution to an autonomous first-order system. The derivative ẋ(t) of the solution curve x(t) is everywhere given by the vector field X.

Proof. See ODE2. A classic reference is VI Arnold, Ordinary Differential Equations.

In fact, X need not be smooth. It is sufficient that X satisfies a Lipschitz condition. (While we won't have much occasion to refer to the Lipschitz condition, this means that there exists C > 0 such that for all x, x′ ∈ U, ‖X(x′) − X(x)‖ < C‖x − x′‖. C is called the Lipschitz constant.) If X is merely continuous, but does not satisfy a Lipschitz condition, then a solution exists but in general is not unique. (See Problem Sheet 2.6 for an example.)
Note that even if X is smooth, the solution x(t) may not be defined for all t. See Problem Sheet 2.5.

Definition 1.6.4 (Complete vector field). If solutions x(t) to (*) exist for all initial conditions x_0 ∈ U and for all t, and if x(t) ∈ U for all t, then X is said to be a complete vector field on U.
Usually in this course we will restrict our attention to complete vector fields.
Definition 1.6.5 (Linear vector field). Let A be a real n × n matrix. Let U = ℝ^n. The vector field X : U → ℝ^n given by

X(x) = A·x

is a linear vector field on ℝ^n.

As is shown in Example 1.6.10 below, linear vector fields are complete.


Remark 1.6.6 (Vector fields vs maps). You might be wondering, "What is the difference between a vector field X : U → ℝ^n and a map F : U → ℝ^n?" As we have defined them, there is in fact no difference. However, in more advanced treatments, and in particular in the context of differentiable manifolds, they are defined differently; indeed, vector fields are defined as maps from U to U × ℝ^n, under which x ↦ (x, X(x)).
For our purposes, it will suffice to have in mind different interpretations of maps and vector fields, as shown in Figure 12. We shall think of a map F : U → ℝ^n in the usual way, as taking points in U to points in ℝ^n. Vector fields, on the other hand, we shall think of as assigning a vector to each point x ∈ U, with the base of the vector sitting at x. In physical terms, we think of X as assigning a velocity at each point of U (for example, the velocity of a fluid or gas moving in U).

It is useful to indicate explicitly the dependence of the solution x(t) on the initial condition x_0. Thus we will write x(t, x_0) instead of x(t). Then (*) looks like

∂x/∂t(t, x_0) = X(x(t, x_0)),    x(0, x_0) = x_0.

The nature of the dependence on initial conditions is given by the following:
Theorem 1.6.7 (Smooth dependence on initial conditions). Suppose X : U → ℝ^n is smooth, and that x(t, x_0) ∈ U is defined for all x_0 ∈ U and −T < t < T for some T > 0. Then for all t ∈ (−T, T), the map which takes initial conditions to solutions at time t,

x(t, ·) : U → U; x_0 ↦ x(t, x_0),

is smooth.
Proof. See ODE2 references; VI Arnold, Ordinary Differential Equations.

(a) Map from U to ℝ^n    (b) Vector field on U

Figure 12: Distinction between maps and vector fields

Definition 1.6.8 (Flow). Let X : U → ℝ^n be a smooth, complete vector field on U, and let x(t, x_0) denote the solution of (*). The flow of X, denoted φ, is the map on ℝ × U defined by

φ : ℝ × U → U; (t, x_0) ↦ φ_t(x_0) = x(t, x_0).

Note that for all t, φ_t is a map from U to U. Often we will refer to φ_t (as well as φ) as the flow of X. φ_t maps initial conditions at t = 0 to solutions at time t. See Figure 13. In terms of the flow, the system (*) can be written as

∂φ_t/∂t(x_0) = X(φ_t(x_0)),    φ_0(x_0) = x_0.    (**)

If we omit the argument x_0, this can be written more concisely as

∂φ_t/∂t = X ∘ φ_t,    φ_0 = Id_U.    (***)

Figure 13: The flow map φ_t

The flow map φ_t is really just another notation for the solutions to a system of first-order ODEs. However, it brings to the fore certain properties of these solutions, which will be important to us. The following result follows straightforwardly from Theorems 1.6.3 and 1.6.7 (indeed, you might regard it as a restatement of these theorems), but it will be basic to much of what follows:
Proposition 1.6.9 (Properties of the flow). Let φ be the flow of a smooth, complete vector field X : U → ℝ^n on U. Then

i) φ_0 = Id_U
ii) φ_t ∘ φ_s = φ_{t+s}
iii) φ_t ∈ Diff(U) (hence, flows provide examples of diffeomorphisms).
iv) φ ∈ C^∞(ℝ × U, U).

Proof.
i) This is clear from (**) above.
ii) See Figure 14. This follows from the uniqueness of solutions to (*). Fix x_0 ∈ U. Let

x_1(t) = (φ_t ∘ φ_s)(x_0) = φ_t(φ_s(x_0)),

x_2(t) = φ_{t+s}(x_0).

We will show that x_1(t) and x_2(t) satisfy the same system of ODEs and the same initial condition. First, we compute ẋ_1(t). We may write that x_1(t) = φ_t(φ_s(x_0)) = φ_t(y), where we have introduced y = φ_s(x_0) for convenience. Then using (**),

ẋ_1(t) = ∂φ_t/∂t(y) = X(φ_t(y)).

But X(φ_t(y)) is just X(x_1(t)), so we get that x_1(t) satisfies the system

ẋ_1 = X(x_1).

Next, we compute ẋ_2(t). From (**), replacing φ_t by φ_{t+s}, we have that

ẋ_2(t) = ∂φ_{t+s}/∂t(x_0) = X(φ_{t+s}(x_0)).

But this is equivalent to

ẋ_2 = X(x_2).

Thus, x_1(t) and x_2(t) satisfy the same first-order system, namely (*). At t = 0, we have that x_1(0) = φ_0(φ_s(x_0)) = φ_s(x_0), while x_2(0) = φ_s(x_0). Thus, x_1(t) and x_2(t) satisfy the same initial condition, and therefore must be the same, by Theorem 1.6.3.

Figure 14: φ_t ∘ φ_s = φ_{t+s}. That is, you can evolve from initial conditions to time t + s in a single step of duration t + s, or in two successive steps of durations s and t.

iii) The fact that φ_t : U → U is smooth follows from Theorem 1.6.7. It remains to show that φ_t has a smooth inverse. In fact, it is easily seen that (φ_t)^{-1} = φ_{−t}, since

φ_{−t} ∘ φ_t = φ_0 (by (ii)) = Id_U (by (i)).

Also, φ_{−t} ∈ C^∞(U, U), by Theorem 1.6.7. Therefore, φ_t has a smooth inverse, so that φ_t ∈ Diff(U).
iv) Theorem 1.6.7 tells us that for fixed t, φ_t : U → U is smooth, so that all partial derivatives of φ_t with respect to components of x exist. We need to show that all partial derivatives with respect to both components of x as well as t exist. For this calculation, it is probably safest to use components (although it makes for longer and more cumbersome formulas). From (**),

∂φ_t^i/∂t(x) = X^i(φ_t(x))

(note that we are using x rather than x_0 for the argument of φ_t; this makes no real difference, and is only for the sake of clarity). The expression obtained on the right-hand side is evidently smooth in x (it is the composition of functions which are smooth in x), so it follows that we can apply a partial derivative with respect to t to φ_t followed by any number of partial derivatives with respect to x and obtain something smooth. By the equality of mixed partials (the stronger version given in the note in the proof of Proposition 1.3.16), it follows that all mixed partials of φ involving at most one partial derivative with respect to t are smooth. Returning to φ_t, taking a second derivative with respect to t and using the Chain Rule, we get that

∂²φ_t^i/∂t²(x) = Σ_{j=1}^n ∂X^i/∂x^j(φ_t(x)) X^j(φ_t(x)).

The resulting expression is evidently smooth in x, so by repeating the preceding argument we may conclude that all mixed partials of φ involving at most two partial derivatives with respect to t are smooth. We can continue in this way, showing that any number of partial derivatives of φ_t with respect to t followed by any number of partial derivatives with respect to components of x is smooth. It then follows by the strong version of the equality of mixed partials that all partial derivatives of φ_t exist (and are therefore smooth). Thus, φ ∈ C^∞(ℝ × U, U).

Example 1.6.10 (Linear vector fields). Let A be an n × n real matrix. Let U = ℝ^n, and let X(x) = A·x, so that X is a linear vector field. Consider the system

ẋ = X(x) = A·x,    x(0) = x_0.

As was shown in ODE2, the solution is given by

x(t) = e^{At} x_0,

where the matrix exponential may be defined by the power series

e^{At} := Σ_{j=0}^∞ A^j t^j / j!.

Thus, the flow is given by

φ_t(x_0) = e^{At} x_0.

Let us verify Proposition 1.6.9 (ii). We have that

(φ_t ∘ φ_s)(x_0) = φ_t(φ_s(x_0)) = φ_t(e^{As} x_0) = e^{At} e^{As} x_0,

φ_{t+s}(x_0) = e^{A(t+s)} x_0.

The fact that these are equal is equivalent to the statement

e^{At} e^{As} = e^{A(t+s)}.    (7)

This is the familiar formula for the product of exponentials, here applied to the matrix exponential. Note that it is not true that e^A e^B = e^{A+B} for general matrices A and B; this holds only if the matrices A and B commute, i.e. AB = BA.
Let us verify (7) directly, using the power series definition of the exponential. We have that

e^{At} e^{As} = (Σ_{j=0}^∞ A^j t^j / j!)(Σ_{k=0}^∞ A^k s^k / k!) = Σ_{j,k=0}^∞ A^{j+k} t^j s^k / (j! k!).

Rearrange the sum as follows: Let m = j + k, so that m takes values between 0 and ∞. For given m, the index j can take values between 0 and m. For given m and j, we have that k = m − j. Thus, if F(j, k) denotes an arbitrary summand depending on j and k, we have that

Σ_{j,k=0}^∞ F(j, k) = Σ_{m=0}^∞ Σ_{j=0}^m F(j, m − j).

Applying this to the preceding, we get that

e^{At} e^{As} = Σ_{m=0}^∞ Σ_{j=0}^m A^m t^j s^{m−j} / (j! (m − j)!) = Σ_{m=0}^∞ (A^m / m!) Σ_{j=0}^m [m! / (j! (m − j)!)] t^j s^{m−j},

where in the last equality we have multiplied and divided by m!. We can evaluate the sum over j using the binomial theorem,

Σ_{j=0}^m [m! / (j! (m − j)!)] t^j s^{m−j} = Σ_{j=0}^m (m choose j) t^j s^{m−j} = (s + t)^m.

Substituting into the preceding, we get that

e^{At} e^{As} = Σ_{m=0}^∞ A^m (s + t)^m / m! = e^{A(t+s)},

as required.
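The identity (7) can also be checked numerically. The following Python sketch (illustrative only; it uses SciPy's expm and a random matrix of our own choosing) verifies e^{At} e^{As} = e^{A(t+s)} to machine precision.

# Numerical check of formula (7) for a random matrix (a sketch, not in the notes).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
t, s = 0.7, -1.3

lhs = expm(A*t) @ expm(A*s)
rhs = expm(A*(t + s))
print(np.allclose(lhs, rhs))   # True: the flow property phi_t o phi_s = phi_{t+s}
# Note: for two non-commuting matrices B and C, expm(B) @ expm(C) differs from expm(B + C) in general.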
Definition 1.6.11 (One-parameter subgroup of diffeomorphisms). Let U ⊆ ℝ^n be open. A map φ : ℝ × U → U; (t, x) ↦ φ_t(x) such that φ ∈ C^∞(ℝ × U, U), φ_t ∈ Diff(U) and

i) φ_0 = Id_U,
ii) φ_t ∘ φ_s = φ_{t+s}

is called a one-parameter subgroup of diffeomorphisms.
Thus, Proposition 1.6.9 says that the flow of a smooth, complete vector field is a one-parameter subgroup of diffeomorphisms. Example 1.6.10 says that φ_t(x) = e^{At} x is a one-parameter subgroup of diffeomorphisms.

Proposition 1.6.9 shows how vector fields give rise to flows. The following is a sort of converse; it shows that a one-parameter subgroup of diffeomorphisms gives rise to a vector field.
Proposition 1.6.12 (Flows and vector fields). Let U ⊆ ℝ^n be open, and let φ : ℝ × U → U be a one-parameter subgroup of diffeomorphisms on U. Let

X(x) = ∂φ_t/∂t(x)|_{t=0}.

Then φ is the flow of X.
Proof. Let x(t) = φ_t(x_0). We need to show that

ẋ = X(x),    x(0) = x_0.

The initial condition follows from the fact that φ_0 = Id_U. As for the differential equation, we have that

ẋ(t) = lim_{h→0} [x(t + h) − x(t)] / h = lim_{h→0} [φ_{t+h}(x_0) − φ_t(x_0)] / h.

Since φ_{t+h} = φ_h ∘ φ_t, we have that

φ_{t+h}(x_0) − φ_t(x_0) = φ_h(φ_t(x_0)) − φ_t(x_0) = φ_h(y) − y,

where for convenience we have introduced y = φ_t(x_0) (note that y is just x(t), but for now it is easier to write y). Then

ẋ(t) = lim_{h→0} [φ_h(y) − y] / h = ∂φ_h/∂h(y)|_{h=0} = X(y) = X(x(t)),

as required.
Terminology. We will say that the one-parameter subgroup of diffeomorphisms φ_t is generated by X.
Example 1.6.13 (Linear maps). Let A be an n × n real matrix. From Example 1.6.10, we know that φ_t(x) = exp(tA)·x is a one-parameter subgroup of diffeomorphisms on ℝ^n. Let us compute the associated vector field directly, using Proposition 1.6.12. Differentiating term-by-term, we have that

X(x) = ∂φ_t/∂t(x)|_{t=0} = ∂/∂t(e^{tA} x)|_{t=0} = ∂/∂t(x + tA·x + ½ t² A²·x + ⋯)|_{t=0} = A·x.

Not surprisingly, we recover the linear vector field X(x) = A·x.
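The same recovery of X from its flow can be done numerically by a finite difference in t, as in the following Python sketch (illustrative only; the matrix A, the point x_0 and the step h are arbitrary choices).

# Finite-difference version of Proposition 1.6.12 for the linear flow (sketch only).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -0.5]])
phi = lambda t, x: expm(t*A) @ x          # flow of the linear vector field X(x) = A x

x0 = np.array([1.0, -1.0])
h = 1e-6
X_approx = (phi(h, x0) - x0) / h          # d/dt phi_t(x0) at t = 0
print(X_approx, A @ x0)                    # both are (approximately) A x0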

1.7 Push-forward map

Notation: Let 𝒳(U) denote the set of smooth vector fields on an open set U ⊆ ℝ^n. Usually we will assume that vector fields in 𝒳(U) are complete, but this isn't automatically the case. We note that if X, Y ∈ 𝒳(U), then X + Y ∈ 𝒳(U). Also, if f ∈ C^∞(U), then fX ∈ 𝒳(U).

We can motivate the push-forward map by considering a change of variables in a first-order system of ODEs. Let X ∈ 𝒳(U), and consider the system

ẋ = X(x),    x(0) = x_0.    (*)

Let U, V ⊆ ℝ^n be open, and let F ∈ Diff(U, V). Given x(t) satisfying (*), let

y(t) = F(x(t)).

Question: What system of ODEs does y(t) satisfy?

Figure 15: Change of variables in a system of ODEs

This is a computation. It will be clearer if we consider components.

ẏ^i(t) = d/dt F^i(x(t)) = Σ_{j=1}^n ∂F^i/∂x^j(x(t)) dx^j/dt    (by the Chain Rule)
       = Σ_{j=1}^n ∂F^i/∂x^j(x(t)) X^j(x(t))    (using (*)).

The right-hand side of the preceding is a function of x(t). We would like to express it instead as a function of y(t). Since F is a diffeomorphism, we may write

x(t) = F^{-1}(y(t)).

Therefore,

ẏ^i(t) = Σ_{j=1}^n ∂F^i/∂x^j(F^{-1}(y(t))) X^j(F^{-1}(y(t))) = Y^i(y(t)),

where

Y^i(y) = Σ_{j=1}^n ∂F^i/∂x^j(F^{-1}(y)) X^j(F^{-1}(y)).

From now on, to simplify notation, we will use the summation convention. This saves having to write the summation symbol, Σ, in situations where its presence can be inferred. According to the summation convention, we agree that if an index j appears twice on one side of an equation, once as an upper index and once as a lower index, then we sum over that index. Note that an upper index in the denominator of a derivative, e.g. the index j in ∂F^i/∂x^j, counts as a lower index. Therefore, according to the summation convention, we may write

Y^i(y) = ∂F^i/∂x^j(F^{-1}(y)) X^j(F^{-1}(y)),

since the index j is to be summed over.
The preceding change-of-variables calculation motivates the following definition:

Definition 1.7.1. Let U, V ⊆ ℝ^n be open, F ∈ Diff(U, V) and X ∈ 𝒳(U). The push-forward of X by F, denoted F_*X, is the vector field Y ∈ 𝒳(V) defined by either of the following equivalent formulas:

Y^i(y) = ∂F^i/∂x^j(F^{-1}(y)) X^j(F^{-1}(y)),

Y^i(F(x)) = ∂F^i/∂x^j(x) X^j(x).

More compactly, we may omit arguments and component indices and write these formulas as

Y = (F′·X) ∘ F^{-1},

Y ∘ F = F′·X.

Note that the push-forward by the identity map Id_U is just the identity map on 𝒳(U). That is,

Id_{U*} X = X

for all X ∈ 𝒳(U). A pictorial description follows below.

Figure 16: The push-forward of a vector field. Take x ∈ U, and consider the nearby point x + εX(x) (which can be thought of as the point reached from x after moving with velocity X(x) for a short time ε). F maps x into y, and it maps x + εX(x) into a point near y. Up to O(ε²) corrections, F(x + εX(x)) is y + εY(y) (which may be thought of as the point reached from y after moving with velocity Y(y) for a short time ε). Here, of course, Y is the push-forward F_*X.

Note that the push-forward can be regarded as a map F_* : 𝒳(U) → 𝒳(V).


Example 1.7.2 (Linear vector fields). We take U = V = ℝ^n. Let A be an n × n real matrix, and X(x) = A·x the associated linear vector field. Let S be an invertible n × n real matrix, and F(x) = S·x a linear diffeomorphism (cf. Example 1.4.1 a)), with inverse F^{-1}(y) = S^{-1}·y. Then F′(x) = S, from Example 1.3.17 c).
We compute the push-forward F_*X as follows. We have that Y(F(x)) = F′(x)·X(x), which gives Y(S·x) = SA·x. Letting y = S·x, we get that

Y(y) = SAS^{-1}·y.

Thus, in the linear case, the push-forward corresponds to matrix conjugation.

Further examples of calculations of the push-forward can be found in Problem Sheet 3.1 and 3.2, as well as previous exam papers (see Question 2 of the recent exams).
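For concreteness, the push-forward formula of Definition 1.7.1 can also be carried out symbolically. The following SymPy sketch (an illustration, not an official recipe from the notes; the diffeomorphism F(x¹, x²) = (x¹, x² + (x¹)²) and the constant vector field X = (1, 0) are our own choices) computes Y(y) = F′(F^{-1}(y))·X(F^{-1}(y)).

# A small symbolic implementation of Definition 1.7.1 (illustrative sketch).
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2', real=True)

# Diffeomorphism F(x1, x2) = (x1, x2 + x1**2) of R^2, with explicit inverse.
F = sp.Matrix([x1, x2 + x1**2])
F_inv = sp.Matrix([y1, y2 - y1**2])

Xfield = sp.Matrix([1, 0])                  # constant vector field X = (1, 0)

# Y(F(x)) = F'(x) X(x); then substitute x = F^{-1}(y):
Y_at_Fx = F.jacobian([x1, x2]) * Xfield
Y = sp.simplify(Y_at_Fx.subs({x1: F_inv[0], x2: F_inv[1]}))
print(Y.T)                                   # -> Matrix([[1, 2*y1]])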

Our original motivation for introducing the push-forward (changing coordinates in systems of ODEs) is formalised by the following:
Proposition 1.7.3 (Push-forward and flow). Let X ∈ 𝒳(U), and let φ be the flow of X. Let F ∈ Diff(U, V). For t ∈ ℝ, let us define

ψ_t := F ∘ φ_t ∘ F^{-1}.

Then ψ is the flow of F_*X.
Proof. The plan is as follows: First, we'll show that ψ_t is a one-parameter subgroup of diffeomorphisms. Then we'll use Proposition 1.6.12 to establish that ψ_t is generated by F_*X.
Clearly ψ ∈ C^∞(ℝ × V, V), since ψ is a composition of maps which are smooth (from the Chain Rule, again). Also, ψ_t ∈ Diff(V), since ψ_t is smooth, and the inverse of ψ_t is given by

(ψ_t)^{-1} = (F ∘ φ_t ∘ F^{-1})^{-1} = F ∘ (φ_t)^{-1} ∘ F^{-1} = F ∘ φ_{−t} ∘ F^{-1} = ψ_{−t},

and ψ_{−t} (the inverse of ψ_t) is smooth. Finally, ψ_0 = F ∘ φ_0 ∘ F^{-1} = F ∘ F^{-1} = Id_V, and

ψ_t ∘ ψ_s = (F ∘ φ_t ∘ F^{-1}) ∘ (F ∘ φ_s ∘ F^{-1}) = F ∘ φ_t ∘ φ_s ∘ F^{-1} = F ∘ φ_{t+s} ∘ F^{-1} = ψ_{t+s}.

Thus, from Definition 1.6.11, ψ is a one-parameter subgroup of diffeomorphisms.

By Proposition 1.6.12, the vector field Y that generates ψ is given by

Y(y) = ∂ψ_t/∂t(y)|_{t=0}.

Taking one component at a time for clarity, and using the Chain Rule, we get

Y^i(y) = ∂/∂t[F^i(φ_t(F^{-1}(y)))]|_{t=0} = [∂F^i/∂x^j(φ_t(F^{-1}(y))) ∂φ_t^j/∂t(F^{-1}(y))]|_{t=0}.

Evaluating at t = 0 and using (**), we get

Y^i(y) = ∂F^i/∂x^j(F^{-1}(y)) X^j(F^{-1}(y)).

Comparing the preceding expression to Definition 1.7.1 of the push-forward, we see that Y = F_*X.
Example 1.7.4 (Linear case). Let A be a real n × n matrix, and let

X(x) = A·x.

The flow of X is given by

φ_t(x) = e^{tA} x.

Let S be an invertible real n × n matrix, and let

F(x) = S·x.

From Example 1.7.2,

Y(y) = F_*X(y) = SAS^{-1}·y.

As Y is a linear vector field, its flow, ψ_t, is also given by a matrix exponential,

ψ_t(y) = e^{tSAS^{-1}} y.

Let's verify that this expression for ψ_t is consistent with Proposition 1.7.3. According to Proposition 1.7.3, we have that

ψ_t(y) = F(φ_t(F^{-1}(y))) = S e^{tA} S^{-1} y.

Thus, we need to show that

e^{tSAS^{-1}} = S e^{tA} S^{-1}.

This can be seen from the power series for the matrix exponential, as follows:

e^{tSAS^{-1}} = Σ_{j=0}^∞ (t^j/j!)(SAS^{-1})^j = Σ_{j=0}^∞ (t^j/j!) S A^j S^{-1} = S (Σ_{j=0}^∞ (t^j/j!) A^j) S^{-1} = S e^{tA} S^{-1}.
Proposition 1.7.5 (Push-forward by composition). Let U, V, W ⊆ ℝ^n be open. Let F ∈ Diff(U, V), G ∈ Diff(V, W). Then

(G ∘ F)_* = G_* ∘ F_*.

Note that F_* is a map from 𝒳(U) to 𝒳(V), G_* is a map from 𝒳(V) to 𝒳(W), and (G ∘ F)_* is a map from 𝒳(U) to 𝒳(W). Thus, the assertion makes sense.
Proof.
We will apply both maps to a vector field X ∈ 𝒳(U). The result will follow from the Chain Rule. For convenience, let H = G ∘ F, y = F(x), and z = G(y) = H(x). Also, let Y = F_*X.
Consider the left-hand side first. We have that

H_*X(z) = (G ∘ F)′(x)·X(x) = G′(y) F′(x)·X(x),

where we have used the Chain Rule (Proposition 1.3.12) and the last formula for the push-forward given in Definition 1.7.1.
On the right-hand side, we have that

(G_* F_* X)(z) = (G_* Y)(z) = G′(y)·Y(y) = G′(y) F′(x)·X(x).

As the expressions obtained on both sides are the same, the result follows.
Exercise. Show that in the linear case, Proposition 1.7.5 is equivalent to

T (S e^{tA} S^{-1}) T^{-1} = (TS) e^{tA} (TS)^{-1}.

1.8 Jacobi bracket


Next, we consider the push-forward of one vector field by the flow of another.
Let U ⊆ ℝ^n be open, and let X, Y ∈ 𝒳(U). Let ψ_s denote the flow of Y. As we know, ψ_s ∈ Diff(U), so we may define

X_s = ψ_{s*} X.

X_s is a family of vector fields in 𝒳(U) parameterised by s. The derivative of X_s with respect to s is also a vector field in 𝒳(U). We want to evaluate this derivative at s = 0. We proceed as follows: From the last formula in Definition 1.7.1, we have that

X_s(ψ_s(x)) = ψ_s′(x)·X(x).

In terms of components (the second formula in Definition 1.7.1),

X_s^i(ψ_s(x)) = ∂ψ_s^i/∂x^j(x) X^j(x).

Next, differentiate both sides with respect to s and set s = 0. Note that, since ψ_0 = Id_U, we have that X_0 = X. On the right-hand side, we get

∂/∂s[∂ψ_s^i/∂x^j(x)]|_{s=0} X^j(x) = ∂/∂x^j[∂ψ_s^i/∂s|_{s=0}](x) X^j(x) = ∂Y^i/∂x^j(x) X^j(x),    (RHS)

where we have used the relation (**) between the flow ψ_s and its generating vector field Y. On the left-hand side, taking into account both instances of s, we get

∂/∂s[X_s^i(ψ_s(x))]|_{s=0} = ∂X_s^i/∂s|_{s=0}(x) + ∂X^i/∂x^j(x) ∂ψ_s^j/∂s|_{s=0}(x)
                        = ∂X_s^i/∂s|_{s=0}(x) + ∂X^i/∂x^j(x) Y^j(x),    (LHS)

where once again we have used the relation (**) between the flow ψ_s and its generating vector field Y. Equating (RHS) and (LHS), we get

∂X_s^i/∂s|_{s=0}(x) = ∂Y^i/∂x^j(x) X^j(x) − ∂X^i/∂x^j(x) Y^j(x).

Notation. Let

∇ = (∂/∂x^1, …, ∂/∂x^n).

We write

((X·∇) Y)^i = X^j ∂/∂x^j Y^i = X^j ∂Y^i/∂x^j.

Then, dropping the component index i, we may write

∂X_s/∂s|_{s=0} = (X·∇) Y − (Y·∇) X.

Definition 1.8.1 (Jacobi bracket). Let U ⊆ ℝ^n be open, and let X, Y ∈ 𝒳(U). The Jacobi bracket of X and Y, denoted [X, Y], is the vector field in 𝒳(U) given by

[X, Y] := (X·∇) Y − (Y·∇) X.

Proposition 1.8.2 (Jacobi bracket and flows). Let X, Y ∈ 𝒳(U), and let ψ_s be the flow of Y. Then

∂/∂s ψ_{s*} X|_{s=0} = [X, Y].

Proof. See the preceding calculation.

Example 1.8.3 (Simple Jacobi bracket calculation). The following is a typical short part-question from previous exams: Let X = (y, x) and Y = (y², x²) be vector fields on ℝ². Compute [X, Y].
We have that

(X·∇) Y = (y ∂/∂x + x ∂/∂y)(y², x²) = (2xy, 2xy).

Also,

(Y·∇) X = (y² ∂/∂x + x² ∂/∂y)(y, x) = (x², y²).

Then

[X, Y] = (2xy − x², 2xy − y²).
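This kind of bracket computation is easy to check symbolically. The following Python/SymPy sketch (illustrative only; the helper function name is our own) implements (V·∇)W as the Jacobian of W applied to V and reproduces the bracket just computed.

# Checking Example 1.8.3 symbolically (a sketch, not part of the notes).
import sympy as sp

x, y = sp.symbols('x y', real=True)
coords = [x, y]
X = sp.Matrix([y, x])
Y = sp.Matrix([y**2, x**2])

def directional(V, W):
    # (V . nabla) W, i.e. the vector with components V^j dW^i/dx^j
    return W.jacobian(coords) * V

bracket = sp.simplify(directional(X, Y) - directional(Y, X))
print(bracket.T)      # -> Matrix([[2*x*y - x**2, 2*x*y - y**2]])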

Example 1.8.4 (Jacobi bracket of linear vector fields). See Problem Sheet 3.5(a).
Proposition 1.8.5 (Simple properties of the Jacobi bracket). Let X, Y ∈ 𝒳(U). Then

(a) [X, Y] = −[Y, X]    (antisymmetry).

If Z ∈ 𝒳(U), then

(b) [X, Y + Z] = [X, Y] + [X, Z]    (linearity).

Also, if f ∈ C^∞(U), then

(c) [X, fY] = f [X, Y] + (X·∇f) Y    (product rule).

Proof. Properties (a) and (b) follow immediately from the formula for the Jacobi bracket in Definition 1.8.1. Property (c) follows from noting that

(X·∇)(fY) = f (X·∇) Y + (X·∇f) Y.
The following shows that the push-forward preserves the Jacobi bracket.

Proposition 1.8.6 (Push-forward of the Jacobi bracket). Let X, Y ∈ 𝒳(U), and let F ∈ Diff(U, V). Then

[F_*X, F_*Y] = F_*[X, Y].

Proof. It is possible to prove this result directly from the formulas for the push-forward and the Jacobi bracket, but the calculations are surprisingly long and rather tricky. The following argument uses the relations between brackets and flows.
From Proposition 1.8.2,

[X, Y] = ∂/∂s ψ_{s*} X|_{s=0},

where ψ_s is the flow of Y. Then

F_*[X, Y] = F_* ∂/∂s ψ_{s*} X|_{s=0} = ∂/∂s F_* ψ_{s*} X|_{s=0}.

Note that we have interchanged the partial derivative with respect to s and the push-forward F_*. This can be justified by expressing the partial derivative as a limit of quotients, as follows:

F_* ∂/∂s ψ_{s*} X|_{s=0} = F_* lim_{s→0} (ψ_{s*}X − X)/s = lim_{s→0} (F_*ψ_{s*}X − F_*X)/s = ∂/∂s F_* ψ_{s*} X|_{s=0}.

From Proposition 1.7.5,

F_* ψ_{s*} X = (F ∘ ψ_s)_* X = (F ∘ ψ_s ∘ F^{-1} ∘ F)_* X = (F ∘ ψ_s ∘ F^{-1})_* F_* X.

Let χ_s = F ∘ ψ_s ∘ F^{-1}. We have shown that

F_*[X, Y] = ∂/∂s χ_{s*} F_* X|_{s=0}.

By Proposition 1.7.3, χ_s is the flow of the vector field F_*Y. It follows from Proposition 1.8.2 that

∂/∂s χ_{s*} F_* X|_{s=0} = [F_*X, F_*Y].

Therefore,

F_*[X, Y] = [F_*X, F_*Y].

Example 1.8.7 (Push-forward of the Jacobi bracket, linear case). See Problem Sheet 3.5(b).
Next, we consider the push-forward of the Jacobi bracket of two vector fields by the flow of a third. Differentiating with respect to the flow parameter yields an important result, called the Jacobi identity.
Let X, Y, Z ∈ 𝒳(U), and let ψ_s be the flow of Z. We consider ψ_{s*}[X, Y]. From Proposition 1.8.6,

ψ_{s*}[X, Y] = [ψ_{s*}X, ψ_{s*}Y].

Differentiating the left-hand side and setting s = 0, we obtain from Proposition 1.8.2 that

∂/∂s ψ_{s*}[X, Y]|_{s=0} = [[X, Y], Z] = −[Z, [X, Y]].    (LHS)

Differentiating the right-hand side and setting s = 0, we get that

∂/∂s [ψ_{s*}X, ψ_{s*}Y]|_{s=0} = [∂/∂s ψ_{s*}X|_{s=0}, Y] + [X, ∂/∂s ψ_{s*}Y|_{s=0}] = [[X, Z], Y] + [X, [Y, Z]] = −[[Z, X], Y] − [X, [Z, Y]],    (RHS)

where we have used Proposition 1.8.2 again. In fact, some argument is required to justify the first equality in the preceding. This can be done by expressing the partial derivative as the limit of a quotient and using the linearity of the Jacobi bracket (Proposition 1.8.5(b)).

∂/∂s [ψ_{s*}X, ψ_{s*}Y]|_{s=0} = lim_{s→0} (1/s)([ψ_{s*}X, ψ_{s*}Y] − [X, Y])
 = lim_{s→0} (1/s)([ψ_{s*}X, ψ_{s*}Y] − [ψ_{s*}X, Y] + [ψ_{s*}X, Y] − [X, Y])
 = lim_{s→0} [ψ_{s*}X, (ψ_{s*}Y − Y)/s] + lim_{s→0} [(ψ_{s*}X − X)/s, Y]
 = [X, ∂/∂s ψ_{s*}Y|_{s=0}] + [∂/∂s ψ_{s*}X|_{s=0}, Y].

Equating (LHS) and (RHS) and cancelling the minus signs, we get

[Z, [X, Y]] = [[Z, X], Y] + [X, [Z, Y]].

Proposition 1.8.8 (Jacobi identity). Let X, Y, Z ∈ 𝒳(U). Then

[Z, [X, Y]] = [[Z, X], Y] + [X, [Z, Y]].

Proof. See preceding calculation.
Using the antisymmetry of the Jacobi bracket (Proposition 1.8.5(a)), we can also write the Jacobi identity as

[[X, Y], Z] + [[Y, Z], X] + [[Z, X], Y] = 0.

The Jacobi identity can also be proved directly using the explicit formula for the Jacobi bracket; see Problem Sheet 3.3.
Example 1.8.9 (Push-forward of the Jacobi bracket, linear case). See Problem Sheet 3.5(c).
Next, we have an extension of Proposition 1.8.2: there we calculated the derivative with respect to s of the push-forward with respect to a flow, ψ_{s*}, at s = 0. In the following, we evaluate the derivative for arbitrary s.
Proposition 1.8.10. Let U ⊆ ℝ^n be open, let X, Y ∈ 𝒳(U) and let ψ_s denote the flow of Y. Then

∂/∂s ψ_{s*} X = [ψ_{s*} X, Y].

Note that Proposition 1.8.2 follows from the special case where s = 0 (note that ψ_{0*} X = X for any vector field X).
Proof. We express the derivative as the limit of a quotient, and use the composition properties of flows and push-forward maps. We have that

∂/∂s ψ_{s*} X = lim_{ε→0} (1/ε)(ψ_{(s+ε)*} X − ψ_{s*} X).

From Proposition 1.6.9,

ψ_{s+ε} = ψ_ε ∘ ψ_s,

and from Proposition 1.7.5,

ψ_{(s+ε)*} = ψ_{ε*} ψ_{s*}.

Therefore,

∂/∂s ψ_{s*} X = lim_{ε→0} (1/ε)(ψ_{ε*} ψ_{s*} X − ψ_{s*} X) = lim_{ε→0} (1/ε)(ψ_{ε*} X̃ − X̃),

where X̃ = ψ_{s*} X. But the last expression gives

lim_{ε→0} (1/ε)(ψ_{ε*} X̃ − X̃) = ∂/∂ε ψ_{ε*} X̃|_{ε=0} = [X̃, Y],

where in the last equality we have used Proposition 1.8.2. Substituting above and recalling how we defined X̃, we get that

∂/∂s ψ_{s*} X = [ψ_{s*} X, Y].

The following proposition shows that a vector field is invariant under push-forward by its own flow.
Proposition 1.8.11. Let Y ∈ 𝒳(U) and let ψ_s be its flow. Then

ψ_{s*} Y = Y.

Proof. From Proposition 1.7.3 (together with Proposition 1.6.12), for any diffeomorphism F ∈ Diff(U),

F_* Y = ∂/∂t (F ∘ ψ_t ∘ F^{-1})|_{t=0}.

Taking F = ψ_s, we get that

ψ_{s*} Y = ∂/∂t (ψ_s ∘ ψ_t ∘ ψ_{−s})|_{t=0} = ∂ψ_t/∂t|_{t=0},

since ψ_s ∘ ψ_t ∘ ψ_{−s} = ψ_t. Since

∂ψ_t/∂t|_{t=0} = Y,

the result follows.

Theorem 1.8.12. Let X, Y ∈ 𝒳(U) and let φ_t, ψ_s denote the flows of X and Y. Then

φ_t ∘ ψ_s = ψ_s ∘ φ_t, ∀s, t    ⟺    [X, Y] = 0.

Proof. We'll proceed by constructing a chain of equivalent equations, starting with the left-hand side assertion. After composing with ψ_{−s} on the left, we have that

φ_t ∘ ψ_s = ψ_s ∘ φ_t, ∀s, t    ⟺    φ_t = ψ_{−s} ∘ φ_t ∘ ψ_s, ∀s, t.

φ_t is the flow generated by X, of course, and ψ_{−s} ∘ φ_t ∘ ψ_s = ψ_{−s} ∘ φ_t ∘ ψ_{−s}^{-1}, for fixed s, is the flow in t generated by ψ_{−s*} X (cf. Proposition 1.7.3). Flows are the same if and only if their generating vector fields are the same (this is uniqueness of solutions to ODEs). Therefore,

φ_t = ψ_{−s} ∘ φ_t ∘ ψ_s, ∀s, t    ⟺    X = ψ_{−s*} X, ∀s,

and since the right-hand side condition holds for all s, we may replace −s by s there; it is equivalent to X = ψ_{s*} X for all s.
Since X does not depend on s, the right-hand side equation above implies that the derivative of ψ_{s*} X with respect to s must vanish. Conversely, if we know that ψ_{s*} X is independent of s, then it must be equal to ψ_{0*} X, which is just X. Therefore,

X = ψ_{s*} X, ∀s    ⟺    ∂/∂s ψ_{s*} X = 0, ∀s.

From Proposition 1.8.10,

∂/∂s ψ_{s*} X = 0, ∀s    ⟺    [ψ_{s*} X, Y] = 0, ∀s.

Since Y = ψ_{s*} Y (cf. Proposition 1.8.11), the bracket in the right-hand side equation above may be written as [ψ_{s*} X, ψ_{s*} Y], or, using Proposition 1.8.6, ψ_{s*}[X, Y]. Therefore,

[ψ_{s*} X, Y] = 0, ∀s    ⟺    ψ_{s*}[X, Y] = 0, ∀s.

But it is clear that

ψ_{s*}[X, Y] = 0, ∀s    ⟺    [X, Y] = 0.

For if [X, Y] vanishes, then its push-forward vanishes (any push-forward of the zero vector field is zero). And if ψ_{s*}[X, Y] = 0 (for any s in fact, not necessarily all), then application of ψ_{−s*} to both sides of this equation, and the fact that ψ_{−s*} ψ_{s*} is the identity map on vector fields, yields [X, Y] = 0. Thus we come to the end of our chain of equivalences, and the result is proved.
Example 1.8.13 (Simple illustration of Theorem 1.8.12). Let $U = \mathbb{R}^2$ with coordinates $(x, v)$, and let
$X(x, v) = (v, a)$ and $Y = (b, c)$, where $a$, $b$ and $c$ are constants. We have that
$$[X, Y] = (-c, 0),$$
which vanishes if and only if $c = 0$.

The flow of $X$ was found in Problem 2.4 (there we set $a = 1$). The result is
$$\varphi_t(x, v) = (x + vt + at^2/2,\; v + at),$$
and may be interpreted in terms of position $x$ and velocity $v$ under uniform acceleration $a$ as a function
of time. The flow for $Y$ is even simpler; it is just a translation in $x$ and $v$, as follows:
$$\psi_s(x, v) = (x + bs,\; v + cs).$$
We have that
$$(\varphi_t \circ \psi_s)(x, v) = \varphi_t(\psi_s(x, v)) = \varphi_t(x + bs, v + cs) = (x + bs + (v + cs)t + at^2/2,\; v + cs + at).$$
On the other hand,
$$(\psi_s \circ \varphi_t)(x, v) = \psi_s(\varphi_t(x, v)) = \psi_s(x + vt + at^2/2, v + at) = (x + vt + at^2/2 + bs,\; v + at + cs).$$
Comparison of the two shows that
$$(\varphi_t \circ \psi_s)(x, v) - (\psi_s \circ \varphi_t)(x, v) = (cst, 0).$$
Thus, the flows commute if and only if $c = 0$.

This can be interpreted as saying that motion under constant acceleration is invariant under trans-
lations in position (you can translate either at the beginning or the end of the motion) but not under
translations in velocity (if you give an object a kick, it makes a difference whether you do it at the
beginning or at the end of its motion; a kick at the beginning affects the final position, while a kick at
the end doesn't).
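For readers who like to check such computations by machine, here is a small symbolic verification of the commutation failure above. It is not part of the original notes; it assumes only the two flows written in the example.

```python
import sympy as sp

x, v, s, t, a, b, c = sp.symbols('x v s t a b c')

# Flows from Example 1.8.13: phi_t is the flow of X = (v, a), psi_s the flow of Y = (b, c).
def phi(t, p):   # uniform acceleration
    return (p[0] + p[1]*t + a*t**2/2, p[1] + a*t)

def psi(s, p):   # translation
    return (p[0] + b*s, p[1] + c*s)

p = (x, v)
lhs = phi(t, psi(s, p))   # (phi_t o psi_s)(x, v)
rhs = psi(s, phi(t, p))   # (psi_s o phi_t)(x, v)

diff = tuple(sp.simplify(l - r) for l, r in zip(lhs, rhs))
print(diff)   # (c*s*t, 0): the flows commute precisely when c = 0, i.e. when [X, Y] = 0
```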

1.9 The Pull-back (functions only)
Definition 1.9.1 (Pull-back on functions). Let $U, V \subset \mathbb{R}^n$ be open, and let $F \in \mathrm{Diff}(U, V)$. The
pull-back by $F$, denoted by $F^*$, is a map from smooth functions on $V$ to smooth functions on $U$,
which takes a function $g$ to the function $g \circ F$. That is,
$$F^*: C^\infty(V) \to C^\infty(U); \quad g \mapsto F^*g := g \circ F.$$
As we will see in Section 3.3, the definition does not require that $F$ be a diffeomorphism; indeed, $F$
could be any smooth map, and the definition would still make sense.

Figure 17: The pullback. $F$ is a diffeomorphism from $U$ to $V$, $g$ a smooth function on $V$ and $F^*g = g \circ F$
a smooth function on $U$.

The pullback $F^*$ and the diffeomorphism $F$ are equivalent ways of viewing the same basic object. To
elaborate, it is clear from the definition that if we know $F$, then we know $F^*$. The converse is also true,
namely that $F^*$ determines $F$. To see this, let us define functions $\hat{y}^j$ on $V$ by
$$\hat{y}^j(y) = y^j.$$
That is, the function $\hat{y}^j$ picks out the $j$th component of its argument. Now consider the pull-back of $\hat{y}^j$.
We have that
$$(F^*\hat{y}^j)(x) = \hat{y}^j(F(x)) = F^j(x);$$
the pull-back of $\hat{y}^j$ is the $j$th component of $F$. Thus, if we know the pull-backs just of the functions $\hat{y}^j$
for $j = 1, \ldots, n$, then we have determined $F$.
The pull-back $F^*$ is defined on an infinite-dimensional function space (which makes it more
complicated), but it is a linear map (which makes it simpler); that is, if $g_1, g_2 \in C^\infty(V)$ and $a_1, a_2 \in \mathbb{R}$,
then
$$F^*(a_1 g_1 + a_2 g_2) = a_1 F^*g_1 + a_2 F^*g_2.$$
In contrast, $F$ is defined as a map between finite-dimensional spaces, but it may be nonlinear.
For some considerations, it is advantageous to consider one point of view or the other. For our
discussion of non-commuting flows in the following section, the pull-back point of view is useful. Hence
we are developing it here.
Proposition 1.9.2 (Pullback by a composition). Let $U, V, W \subset \mathbb{R}^n$ be open, and let $F \in \mathrm{Diff}(U, V)$,
$G \in \mathrm{Diff}(V, W)$, so that $G \circ F \in \mathrm{Diff}(U, W)$. Then
$$(G \circ F)^* = F^* \circ G^*.$$

Proof. Let $h \in C^\infty(W)$. Then
$$((G \circ F)^*h)(x) = h((G \circ F)(x)) = h(G(F(x))).$$
On the other side, we have that
$$(F^* G^* h)(x) = (G^*h)(F(x)) = h(G(F(x))).$$

What we really want to consider here are pull-backs by flows. Let $X$ be a smooth vector field on $U$
with flow $\varphi_t$, and let $f$ be a function on $U$. Regarding $\varphi_t$ as a diffeomorphism from $U$ to $U$, we can
consider the pull-back of $f$ by $\varphi_t$, namely $\varphi_t^* f$. Let us calculate the derivative of $\varphi_t^* f$ with respect to $t$.
We have that
$$\frac{\partial}{\partial t}(\varphi_t^* f)(x) = \frac{\partial}{\partial t} f(\varphi_t(x)).$$
Using the Chain Rule (Proposition 1.3.12), we get that
$$\frac{\partial}{\partial t} f(\varphi_t(x)) = \frac{\partial f}{\partial x^j}(\varphi_t(x))\,\frac{\partial \varphi_t^j}{\partial t}(x).$$
From the relation between vector fields and flows (see (**) and (***) on page 14), we have that
$$\frac{\partial \varphi_t^j}{\partial t}(x) = X^j(\varphi_t(x)).$$
Thus,
$$\frac{\partial}{\partial t}(\varphi_t^* f)(x) = X^j(\varphi_t(x))\,\frac{\partial f}{\partial x^j}(\varphi_t(x)) = (X \cdot \nabla f)(\varphi_t(x)) = \big(\varphi_t^*(X \cdot \nabla f)\big)(x).$$
Omitting the argument $x$, this can be written compactly as
$$\frac{\partial}{\partial t}\,\varphi_t^* f = \varphi_t^*(X \cdot \nabla f). \qquad (8)$$
Let us take a second $t$-derivative. We get that
$$\frac{\partial^2}{\partial t^2}\,\varphi_t^* f = \frac{\partial}{\partial t}\,\varphi_t^*(X \cdot \nabla f).$$
Letting $g = X \cdot \nabla f$, we may write this as
$$\frac{\partial^2}{\partial t^2}\,\varphi_t^* f = \frac{\partial}{\partial t}\,\varphi_t^* g = \varphi_t^*(X \cdot \nabla g),$$
where in the last equality we have used (8). Finally, we replace $g$ by $X \cdot \nabla f$ to get
$$\frac{\partial^2}{\partial t^2}\,\varphi_t^* f = \varphi_t^*\big((X \cdot \nabla)(X \cdot \nabla f)\big) = \varphi_t^*\big((X \cdot \nabla)^2 f\big).$$
It is clear that this can be generalised to higher $t$-derivatives, as follows:
$$\frac{\partial^j}{\partial t^j}\,\varphi_t^* f = \varphi_t^*\big((X \cdot \nabla)^j f\big). \qquad (9)$$
The expression
$$X \cdot \nabla = X^j \frac{\partial}{\partial x^j}$$
maps smooth functions into smooth functions, and is called a linear first-order differential operator
("operator" because it maps functions to functions, "first-order differential" because it involves first-
order partial derivatives, and "linear" because $X \cdot \nabla(f + g) = X \cdot \nabla f + X \cdot \nabla g$). There is actually a special
name and notation for this operator, which we introduce next.
Definition 1.9.3 (Lie derivative of functions). Let $U \subset \mathbb{R}^n$ be open, and let $X \in \mathcal{X}(U)$. The Lie derivative
with respect to $X$, denoted $L_X$, is the mapping
$$L_X : C^\infty(U) \to C^\infty(U); \quad f \mapsto L_X f = X \cdot \nabla f.$$
The Lie derivative $L_X$ and the vector field $X$ stand in the same relation to each other as do the pullback
$F^*$ and the diffeomorphism $F$. $L_X$ and $F^*$ are linear maps (operators) defined on an infinite-dimensional
function space, while $X$ and $F$ are nonlinear maps defined on a finite-dimensional space.
Using the Lie derivative notation, we may write (9) as
$$\frac{\partial^j}{\partial t^j}\,\varphi_t^* f = \varphi_t^*\big(L_X^j f\big). \qquad (10)$$
With these formulas for the $t$-derivatives of $\varphi_t^* f$, we can develop a power series for $\varphi_t^* f$.

Proposition 1.9.4 (Power series for pull-back by a flow). Suppose $\varphi_t^* f(x)$ is analytic in $t$; that is,
suppose $\varphi_t^* f(x)$ has a convergent power series in $t$. Then that power series is given by
$$\varphi_t^* f = e^{tL_X} f,$$
where
$$e^{tL_X} := \sum_{j=0}^{\infty} \frac{t^j}{j!}\, L_X^j.$$

Proof. By assumption, $\varphi_t^* f$ has a convergent power series, which we may write as
$$\varphi_t^* f = \sum_{j=0}^{\infty} \left(\left.\frac{\partial^j}{\partial t^j}\,\varphi_t^* f\right|_{t=0}\right) \frac{t^j}{j!}.$$
From (10) and the fact that $\varphi_0^* f = f$ for all $f$ (since $\varphi_0$ is the identity map), we have that
$$\left.\frac{\partial^j}{\partial t^j}\,\varphi_t^* f\right|_{t=0} = \left.\varphi_t^*\big(L_X^j f\big)\right|_{t=0} = L_X^j f.$$
Substituting this expression into the power series above, we get
$$\varphi_t^* f = \sum_{j=0}^{\infty} \big(L_X^j f\big)\,\frac{t^j}{j!} = e^{tL_X} f,$$
as required.
Example 1.9.5 (Pull-back: Linear case). Let us illustrate the concepts and results introduced in this
section in the linear case. Throughout, $U = \mathbb{R}^n$.
A linear function $f(x)$ on $\mathbb{R}^n$ is a function of the form
$$f(x) = a \cdot x,$$
where $a \in \mathbb{R}^n$ is a fixed vector.

We have already seen linear diffeomorphisms, starting with Example 1.5.2(a). Let $F(x) = S \cdot x$, where
$S$ is an invertible $n \times n$ matrix. Then
$$(F^* f)(x) = f(F(x)) = f(S \cdot x) = a \cdot (S \cdot x).$$
Thus $F^* f$ is also a linear function, which we may write as $b \cdot x$, where $b = S^T a$, and $S^T$ is the transpose
of $S$.
Let $X(x) = A \cdot x$, where $A$ is an $n \times n$ matrix. Then
$$(L_X f)(x) = X(x) \cdot \nabla f(x).$$
It is clear that $\nabla f(x) = a$. Therefore,
$$(L_X f)(x) = a \cdot X(x) = a \cdot (A \cdot x).$$
It follows that
$$(L_X^j f)(x) = a \cdot (A^j \cdot x).$$
Then
$$\sum_{j=0}^{\infty} \frac{t^j}{j!}\,(L_X^j f)(x) = a \cdot \left(\sum_{j=0}^{\infty} \frac{t^j A^j}{j!}\right) x = a \cdot \big(e^{At} x\big) = \varphi_t^* f(x),$$
where $\varphi_t(x) = e^{At} x$ is the flow of $X(x)$. Thus we verify Proposition 1.9.4 directly.
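A numerical spot-check of this linear example can be done with numpy and scipy. This sketch is not part of the original notes; the matrix, vector and evaluation point are chosen at random.

```python
import math
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))   # vector field X(x) = A x
a = rng.normal(size=n)        # linear function f(x) = a . x
x = rng.normal(size=n)
t = 0.7

# Pull-back by the flow: (phi_t^* f)(x) = f(phi_t(x)) = a . (e^{At} x)
pullback = a @ (expm(A * t) @ x)

# Truncated power series e^{t L_X} f at x: sum_j t^j/j! * a . (A^j x)
series = sum((t**j / math.factorial(j)) * (a @ np.linalg.matrix_power(A, j) @ x)
             for j in range(30))

print(np.isclose(pullback, series))   # True, in accord with Proposition 1.9.4
```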
Just as the diffeomorphism $F$ and pull-back map $F^*$ are different representations of the same object,
so too the vector field $X$ and $L_X$ are different representations of the same object. (We won't elaborate
on this relationship further, but for those of you who wish to pursue it, it can be understood in terms
of the group of diffeomorphisms on an open set $U$ and its natural representation on $C^\infty(U)$.) With this
in mind, we recall that we have an operation, the Jacobi bracket, on vector fields. We would like to
determine what the Jacobi bracket corresponds to in terms of the Lie derivative.

Proposition 1.9.6 (Jacobi bracket and Lie derivative). Let $U \subset \mathbb{R}^n$ be open, and let $X, Y \in \mathcal{X}(U)$. Then
$$L_{[X,Y]} = L_X L_Y - L_Y L_X.$$

Proof. We will apply both sides of the preceding to a smooth function, $f$, and find we obtain the same
result. The key point is that, on the right-hand side, terms involving second partial derivatives vanish,
due to the equality of mixed partials.
On the left-hand side, we get
$$L_{[X,Y]} f = [X, Y]^j \frac{\partial f}{\partial x^j} = \left(X^k \frac{\partial Y^j}{\partial x^k} - Y^k \frac{\partial X^j}{\partial x^k}\right) \frac{\partial f}{\partial x^j}, \qquad \text{(LHS)}$$
where we have used the component expression for $[X, Y]^j$.

On the right-hand side, we get
$$(L_X L_Y - L_Y L_X)\, f = L_X(L_Y f) - L_Y(L_X f).$$
Writing the first term in terms of components, we get
$$L_X(L_Y f) = X^k \frac{\partial}{\partial x^k}(L_Y f) = X^k \frac{\partial}{\partial x^k}\left(Y^j \frac{\partial f}{\partial x^j}\right) = X^k \frac{\partial Y^j}{\partial x^k} \frac{\partial f}{\partial x^j} + X^k Y^j \frac{\partial^2 f}{\partial x^k \partial x^j}.$$
Note that second partial derivatives of $f$ appear. Similarly, we have that
$$L_Y(L_X f) = Y^k \frac{\partial X^j}{\partial x^k} \frac{\partial f}{\partial x^j} + Y^k X^j \frac{\partial^2 f}{\partial x^k \partial x^j} = Y^k \frac{\partial X^j}{\partial x^k} \frac{\partial f}{\partial x^j} + X^k Y^j \frac{\partial^2 f}{\partial x^j \partial x^k},$$
where in the last equality we have interchanged the summation indices $j$ and $k$ (which we are free to
do). Subtracting the preceding expression from the one before and using the equality of mixed partials
(Proposition 1.3.16), we get that
$$(L_X L_Y - L_Y L_X)\, f = X^k \frac{\partial Y^j}{\partial x^k} \frac{\partial f}{\partial x^j} - Y^k \frac{\partial X^j}{\partial x^k} \frac{\partial f}{\partial x^j}. \qquad \text{(RHS)}$$
We see that (LHS) and (RHS) are the same, and the result follows.
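As a concrete sanity check, one can verify this identity symbolically for a particular pair of vector fields. The sketch below is not part of the lecture notes, and the two fields on $\mathbb{R}^2$ are chosen arbitrarily.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Function('f')(x, y)

X = [y, x*y]           # an arbitrary vector field on R^2
Y = [x**2, sp.sin(x)]  # another arbitrary vector field

def lie(V, g):
    """Lie derivative of a function: L_V g = V . grad g."""
    return V[0]*sp.diff(g, x) + V[1]*sp.diff(g, y)

# Jacobi bracket components: [X, Y]^j = X^k dY^j/dx^k - Y^k dX^j/dx^k
bracket = [lie(X, Y[j]) - lie(Y, X[j]) for j in range(2)]

lhs = lie(bracket, f)
rhs = lie(X, lie(Y, f)) - lie(Y, lie(X, f))
print(sp.simplify(sp.expand(lhs - rhs)))   # 0, confirming Proposition 1.9.6
```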

1.10 Noncommuting flows


We want to extend our consideration of the relationship between the Jacobi bracket of vector fields and
the commutativity of their corresponding flows, as in Theorem 1.8.12, to the case where the flows do not
commute.
Theorem 1.10.1 (Jacobi bracket and non-commuting flows). Let $U \subset \mathbb{R}^n$ be open. Let $X, Y \in \mathcal{X}(U)$ be
vector fields on $U$ and let $\varphi_t$, $\psi_s$ be their respective flows. Let
$$\Psi_{s,t} = \psi_{-s} \circ \varphi_{-t} \circ \psi_s \circ \varphi_t.$$
Regarding $s$ and $t$ as small, we have that
$$\Psi_{s,t}^* = I + st\,L_{[X,Y]} + O(3),$$
where $O(3)$ denotes terms of cubic or higher order in $s$ and $t$ (e.g., terms proportional to $s^3$, $s^2 t$, or more
generally, to $s^a t^b$, where $a + b > 2$).
Proof. Consider $\Psi_{s,t}^* f$. By Proposition 1.9.2,
$$\Psi_{s,t}^* f = \varphi_t^*\, \psi_s^*\, \varphi_{-t}^*\, \psi_{-s}^*\, f.$$
By Proposition 1.9.4, this is given by
$$e^{tL_X}\, e^{sL_Y}\, e^{-tL_X}\, e^{-sL_Y} f.$$
Let us expand the exponentials through terms of second order in $s$ and $t$. We obtain
$$\left(1 + tL_X + \tfrac{1}{2}t^2 L_X^2\right)\left(1 + sL_Y + \tfrac{1}{2}s^2 L_Y^2\right)\left(1 - tL_X + \tfrac{1}{2}t^2 L_X^2\right)\left(1 - sL_Y + \tfrac{1}{2}s^2 L_Y^2\right) f + O(3).$$
Let us compute the contributions order by order. At zeroth order in $s$ and $t$, we obtain $f$ (obtained from
taking the term 1 from each of the four factors above). At first order in $s$ and $t$, we obtain
$$tL_X f + sL_Y f - tL_X f - sL_Y f = 0,$$
so there is no first-order contribution. At second order, we obtain contributions from the terms in $s^2$ and $t^2$
in each of the factors as well as from products of first-order terms from pairs of factors. The first set of
terms yields
$$\tfrac{1}{2}\left(t^2 L_X^2 f + s^2 L_Y^2 f + t^2 L_X^2 f + s^2 L_Y^2 f\right) = t^2 L_X^2 f + s^2 L_Y^2 f.$$
The second set yields
$$ts\,L_X L_Y f - t^2 L_X^2 f - ts\,L_X L_Y f - st\,L_Y L_X f - s^2 L_Y^2 f + ts\,L_X L_Y f.$$
Combining the preceding expressions and accounting for cancellations (in particular, the terms in $s^2$ and $t^2$
cancel), we obtain the second-order contribution
$$ts\,L_X L_Y f - ts\,L_X L_Y f - st\,L_Y L_X f + ts\,L_X L_Y f = st\,(L_X L_Y - L_Y L_X)\, f.$$
By Proposition 1.9.6, this can be written as
$$st\,L_{[X,Y]} f.$$
Combining the previous calculations, we obtain
$$\Psi_{s,t}^* f = f + st\,L_{[X,Y]} f + O(3),$$
as required.
In the preceding, we worked with the pull-back representation of flows. In keeping with the discussion
following Definition 1.9.1, there is an equivalent statement in terms of flows. This statement is illustrated
in Figure 18:
$$\Psi_{s,t}(x) = x + st\,[X, Y](x) + O(3).$$

Figure 18: The flows $\varphi_t$, $\psi_s$, $\varphi_{-t}$ and $\psi_{-s}$ are applied in succession to an initial point $x$. For $s$ and $t$
small, the initial and final point differ by the displacement $st\,[X, Y](x)$, up to higher-order corrections.

Example 1.10.2 (Noncommuting flows (following Example 1.8.13)). Let $X(x, v)$, $Y(x, v)$, $\varphi_t$ and $\psi_s$ be
as in Example 1.8.13. There, we saw that
$$[X, Y](x, v) = (-c, 0).$$
We also saw that
$$\varphi_t(x, v) = (x + vt + at^2/2,\; v + at), \qquad \psi_s(x, v) = (x + bs,\; v + cs),$$
and
$$\psi_s(\varphi_t(x, v)) = (x + bs + vt + \tfrac{1}{2}at^2,\; v + cs + at).$$
Then
$$\varphi_{-t}(\psi_s(\varphi_t(x, v))) = (x + bs + vt + \tfrac{1}{2}at^2 - (v + cs + at)t + \tfrac{1}{2}at^2,\; v + cs) = (x + bs - cst,\; v + cs),$$
and
$$\psi_{-s}(\varphi_{-t}(\psi_s(\varphi_t(x, v)))) = (x - cst,\; v) = (x, v) + st\,[X, Y](x, v),$$
in accord with Theorem 1.10.1.
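The same second-order behaviour can be checked numerically for vector fields whose flows are not known in closed form. The following sketch is not from the notes; the two fields are chosen arbitrarily, and the flows are integrated with scipy before comparing the net displacement with $st\,[X, Y](x)$.

```python
import numpy as np
from scipy.integrate import solve_ivp

def X(p):  # an arbitrary vector field on R^2
    x, y = p
    return np.array([y, np.sin(x)])

def Y(p):  # another arbitrary vector field
    x, y = p
    return np.array([x*y, 1.0])

def flow(V, time, p):
    """Integrate dp/dt = V(p) from 0 to `time` (negative times allowed)."""
    sol = solve_ivp(lambda t, q: V(q), (0.0, time), p, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

def bracket(p, h=1e-6):
    """Jacobi bracket [X, Y](p) by finite differences: (X.grad)Y - (Y.grad)X."""
    J = lambda V: np.column_stack([(V(p + h*e) - V(p - h*e)) / (2*h)
                                   for e in np.eye(2)])
    return J(Y) @ X(p) - J(X) @ Y(p)

p0 = np.array([0.3, -0.4])
s = t = 1e-2
q = flow(X, t, p0)    # phi_t
q = flow(Y, s, q)     # psi_s
q = flow(X, -t, q)    # phi_{-t}
q = flow(Y, -s, q)    # psi_{-s}

print(q - p0)             # approximately s*t*[X, Y](p0), up to O(3) corrections
print(s*t*bracket(p0))
```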

1.11 The Frobenius Theorem


The Frobenius theorem states that, under certain conditions (and only under such conditions), certain
systems of first-order PDEs have a unique solution, at least in a neighbourhood of the initial data.
Moreover, when the necessary conditions are satisfied, the theorem provides an explicit formula for the
solution in terms of flows of ODEs.

1.11.1 Some motivation.


Consider the following trivial one-dimensional first-order ordinary differential equation,
$$\dot{x}(t) = f(t), \qquad x(0) = x_0.$$
The reason this equation is trivial is that, because the right-hand side does not depend on the unknown
function $x(t)$, the solution can be written explicitly as an integral,
$$x(t) = x_0 + \int_0^t f(s)\, ds.$$
Let us consider an analogously simple partial differential equation for a function $u$ of two variables,
$x$ and $y$,
$$\frac{\partial u}{\partial x} = f(x, y), \qquad \frac{\partial u}{\partial y} = g(x, y), \qquad u(x_0, y_0) = u_0.$$
Here, too, the right-hand side does not depend on the unknown function $u$. However, in this case,
a solution does not automatically exist! A necessary condition for a solution to exist is the equality of
mixed partials, namely $\partial^2 u/\partial y\,\partial x = \partial^2 u/\partial x\,\partial y$. For this to hold, we must have that
$$\frac{\partial f}{\partial y} = \frac{\partial g}{\partial x}.$$
If this condition is satisfied, then we can write down an explicit formula for $u(x, y)$ in the form of an
integral,
$$u(x, y) = u_0 + \int_{(x_0, y_0)}^{(x, y)} (f, g) \cdot d\mathbf{s},$$
where the integral may be taken along any path joining $(x_0, y_0)$ to $(x, y)$.
In what follows, we generalise to systems of first-order partial differential equations in which, unlike
the case above, the right-hand side depends on the unknown function(s).

1.11.2 Basic example.
We first consider the case of a system of two first-order partial differential equations for a single function
$u(x, y)$ of two variables (later, we will consider the general case of $q$ functions of $p$ variables). Let
$f = f(x, y, z)$ and $g = g(x, y, z)$ be functions on an open set $U \subset \mathbb{R}^3$. Consider the system
$$\frac{\partial u}{\partial x}(x, y) = f(x, y, u(x, y)), \qquad \frac{\partial u}{\partial y}(x, y) = g(x, y, u(x, y)), \qquad u(x_0, y_0) = u_0. \qquad (11)$$
Note that the right-hand side is allowed to depend on the unknown function $u(x, y)$. The condition
$u(x_0, y_0) = u_0$ is called the initial data.

Necessary condition. First, we derive a necessary condition in order that (11) has a solution for all
$x_0, y_0, u_0$. Assuming a solution exists for all initial data, we equate the mixed partial derivatives of $u$
with respect to $x$ and $y$,
$$\frac{\partial}{\partial y}\frac{\partial u}{\partial x} = \frac{\partial}{\partial x}\frac{\partial u}{\partial y}.$$
On the left-hand side, we obtain
$$\frac{\partial}{\partial y}\left(\frac{\partial u}{\partial x}\right)(x, y) = \frac{\partial}{\partial y}\big(f(x, y, u(x, y))\big) = \frac{\partial f}{\partial y}(x, y, u(x, y)) + \frac{\partial f}{\partial z}(x, y, u(x, y))\,\frac{\partial u}{\partial y}(x, y) = \frac{\partial f}{\partial y}(x, y, u(x, y)) + \frac{\partial f}{\partial z}(x, y, u(x, y))\,g(x, y, u(x, y)),$$
where in the last equality we have used the second PDE to replace $\partial u/\partial y$ by $g$. Similarly, on the
right-hand side, we obtain
$$\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial y}\right)(x, y) = \frac{\partial}{\partial x}\big(g(x, y, u(x, y))\big) = \frac{\partial g}{\partial x}(x, y, u(x, y)) + \frac{\partial g}{\partial z}(x, y, u(x, y))\,f(x, y, u(x, y)).$$
Equating the two expressions, we get that
$$\frac{\partial f}{\partial y}(x, y, u(x, y)) + \frac{\partial f}{\partial z}(x, y, u(x, y))\,g(x, y, u(x, y)) = \frac{\partial g}{\partial x}(x, y, u(x, y)) + \frac{\partial g}{\partial z}(x, y, u(x, y))\,f(x, y, u(x, y)).$$
By assumption, a solution $u(x, y)$ exists for all initial data. Thus, for arbitrary $(x, y, z) \in U$, there exists
a solution with $u(x, y) = z$. Since the preceding must hold for this particular solution (as it must for all
solutions), we may conclude that on all of $U$,
$$\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\, g = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\, f. \qquad (12)$$
This is a necessary condition for a solution of (11) to exist for all initial data.

Geometrical setting. We continue to assume that a solution of (11) exists for all initial data. Consider
the graph of a solution, i.e. the surface given by $z = u(x, y)$. We want to construct two vector fields,
$X$ and $Y$, which are everywhere tangent to the graphs of all solutions (satisfying different initial data).
Equivalently, $X$ and $Y$ should be orthogonal to the normal to the surfaces. The normal may be determined
as follows: Let $h(x, y, z) = z - u(x, y)$. Then the surface is given by $h = 0$, and the normal is given by $\nabla h$,
where
$$\nabla h = \left(-\frac{\partial u}{\partial x}, -\frac{\partial u}{\partial y}, 1\right) = (-f, -g, 1).$$
Let $W = (a, b, c)$ denote a vector field which is orthogonal to $\nabla h$ (here, $a$, $b$ and $c$ denote functions of $x$,
$y$, $z$). Then $-af - bg + c = 0$, or
$$c = af + bg.$$
There are two linearly independent solutions, parameterised by $a$ and $b$. For the first, which we take to
be $X$, we take $a = 1$ and $b = 0$. For the second, which we take to be $Y$, we take $a = 0$ and $b = 1$. Thus,
$$X(x, y, z) = (1, 0, f(x, y, z)), \qquad Y(x, y, z) = (0, 1, g(x, y, z)).$$
Figure 19: Graphs of solutions of (11) with different initial data.

Let us compute $[X, Y]$. We have that
$$[X, Y] = (X \cdot \nabla)Y - (Y \cdot \nabla)X = \left(\frac{\partial}{\partial x} + f\frac{\partial}{\partial z}\right)(0, 1, g) - \left(\frac{\partial}{\partial y} + g\frac{\partial}{\partial z}\right)(1, 0, f) = \left(0, 0, \frac{\partial g}{\partial x} + f\frac{\partial g}{\partial z}\right) - \left(0, 0, \frac{\partial f}{\partial y} + g\frac{\partial f}{\partial z}\right).$$
We observe that
$$[X, Y] = 0 \quad \Longleftrightarrow \quad \frac{\partial f}{\partial y} + g\frac{\partial f}{\partial z} = \frac{\partial g}{\partial x} + f\frac{\partial g}{\partial z}.$$
Thus, referring to (12), we see that $[X, Y] = 0$ is just the necessary condition for (11) to have a solution
for all initial data. It turns out that the condition $[X, Y] = 0$ is also sufficient for a solution to exist.

Explicit solution. We assume that $[X, Y] = 0$ and proceed to construct the solution to (11). As shown
in Figure 20, the vector fields $X$ and $Y$ lie tangent to the surface $z = u(x, y)$. This suggests that we can
use the flows of $X$ and $Y$ to navigate from some given point on the surface, say $(x_0, y_0, u_0)$, to an arbitrary
point $(x, y, u(x, y))$. This is indeed how we will construct a solution.
Let $\varphi_t$ and $\psi_s$ denote the flows of $X = (1, 0, f)$ and $Y = (0, 1, g)$. Then $\varphi_t$ is determined by the solutions
to the system
$$\dot{x} = 1, \qquad \dot{y} = 0, \qquad \dot{z} = f(x, y, z).$$
We can easily solve the first two equations to get $x(t) = x_0 + t$ and $y(t) = y_0$, where $x_0$ and $y_0$ are the
initial values of $x$ and $y$. We cannot determine $z(t)$ a priori; it will depend, of course, on the form of $f$.
Therefore, we may write that
$$\varphi_t(x, y, z) = (x + t,\; y,\; *),$$
where $*$ denotes an undetermined component. Similarly, $\psi_s$ is determined by the solutions to the system
$$x' = 0, \qquad y' = 1, \qquad z' = g(x, y, z),$$

Figure 20: (a) The vector field X. (b) The vector field Y.

and is given by
$$\psi_s(x, y, z) = (x,\; y + s,\; *).$$
We note that
$$\psi_{y - y_0}(x_0, y_0, u_0) = (x_0,\; y,\; *),$$
and
$$\varphi_{x - x_0}\big(\psi_{y - y_0}(x_0, y_0, u_0)\big) = \varphi_{x - x_0}(x_0, y, *) = (x,\; y,\; *).$$
Let
$$u(x, y) = \Big[\varphi_{x - x_0}\big(\psi_{y - y_0}(x_0, y_0, u_0)\big)\Big]^3,$$
the third component of the preceding, so that
$$\varphi_{x - x_0}\big(\psi_{y - y_0}(x_0, y_0, u_0)\big) = (x, y, u(x, y)).$$
That is, $u(x, y)$ is defined as the third component of the point reached by first applying the flow $\psi_s$ to
the initial data $(x_0, y_0, u_0)$ for a time $s = y - y_0$, and then applying the flow $\varphi_t$ for a time $t = x - x_0$. See
Figure 21.

We claim that $u(x, y)$ satisfies (11).

First, let's check the initial data. For $x = x_0$ and $y = y_0$, we have that
$$u(x_0, y_0) = \big[\varphi_0(\psi_0(x_0, y_0, u_0))\big]^3 = u_0,$$
as required.

Next, we check the equation for $\partial u/\partial x$. We have that
$$\frac{\partial u}{\partial x}(x, y) = \frac{\partial}{\partial x}\Big[\varphi_{x - x_0}\big(\psi_{y - y_0}(x_0, y_0, u_0)\big)\Big]^3 = X^3\Big(\varphi_{x - x_0}\big(\psi_{y - y_0}(x_0, y_0, u_0)\big)\Big),$$
where we have used the fact that $\partial \varphi_t/\partial t = X \circ \varphi_t$, and we have set $t$ equal to $x - x_0$. But $X^3 = f$, so we
get that
$$\frac{\partial u}{\partial x}(x, y) = f\big(\varphi_{x - x_0}(\psi_{y - y_0}(x_0, y_0, u_0))\big) = f(x, y, u(x, y)),$$
as required.
Finally, we consider the equation for $\partial u/\partial y$. Rather than differentiate the equation
$$\varphi_{x - x_0}\big(\psi_{y - y_0}(x_0, y_0, u_0)\big) = (x, y, u(x, y))$$
directly, we use the fact that the necessary condition must be satisfied, i.e.
$$[X, Y] = 0.$$
By Theorem 1.8.12, it follows that
$$\varphi_t \circ \psi_s = \psi_s \circ \varphi_t.$$

Figure 21: Construction of the solution of (11).

Therefore, interchanging $\varphi_{x - x_0}$ and $\psi_{y - y_0}$, we get that
$$\psi_{y - y_0}\big(\varphi_{x - x_0}(x_0, y_0, u_0)\big) = (x, y, u(x, y)).$$
Repeating the calculation of the preceding paragraph but with the roles of $x$ and $y$ interchanged, we get that
$$\frac{\partial u}{\partial y}(x, y) = \frac{\partial}{\partial y}\Big[\psi_{y - y_0}\big(\varphi_{x - x_0}(x_0, y_0, u_0)\big)\Big]^3 = Y^3\Big(\psi_{y - y_0}\big(\varphi_{x - x_0}(x_0, y_0, u_0)\big)\Big),$$
where we have used the fact that $\partial \psi_s/\partial s = Y \circ \psi_s$, and we have then set $s$ equal to $y - y_0$. But $Y^3 = g$,
so we get that
$$\frac{\partial u}{\partial y}(x, y) = g\big(\psi_{y - y_0}(\varphi_{x - x_0}(x_0, y_0, u_0))\big) = g\big(\varphi_{x - x_0}(\psi_{y - y_0}(x_0, y_0, u_0))\big) = g(x, y, u(x, y)).$$
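This flow-based construction translates directly into a numerical scheme. The sketch below is not part of the notes; the right-hand sides $f = g = z$ are chosen so that the compatibility condition (12) holds, in which case the exact solution with $u(0, 0) = 1$ is $u(x, y) = e^{x+y}$.

```python
import numpy as np
from scipy.integrate import solve_ivp

f = lambda x, y, z: z   # compatible right-hand sides: (12) holds for f = g = z
g = lambda x, y, z: z

def u(x, y, x0=0.0, y0=0.0, u0=1.0):
    # Flow of Y = (0, 1, g) for time y - y0: only y and z change.
    sol = solve_ivp(lambda s, p: [0.0, 1.0, g(*p)], (0.0, y - y0),
                    [x0, y0, u0], rtol=1e-10)
    p = sol.y[:, -1]
    # Flow of X = (1, 0, f) for time x - x0: only x and z change.
    sol = solve_ivp(lambda t, p: [1.0, 0.0, f(*p)], (0.0, x - x0),
                    p, rtol=1e-10)
    return sol.y[2, -1]   # third component is u(x, y)

print(u(0.5, 0.3), np.exp(0.8))   # the two values agree to high accuracy
```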
Uniqueness. (*nonexaminable) Finally, we show that the solution of (11) is unique. The argument is
based on the uniqueness of solutions of ODEs, Theorem 1.6.3.
Suppose that $v(x, y)$ is another solution of (11). First, we show that
$$v(x, y_0) = u(x, y_0) \quad \text{for all } x \text{ such that } (x, y_0) \in U.$$
Let $U(t) = u(x_0 + t, y_0)$ and $V(t) = v(x_0 + t, y_0)$. We have that
$$\dot{U}(t) = \frac{\partial u}{\partial x}(x_0 + t, y_0) = f(x_0 + t, y_0, U(t)) := F(U(t), t), \qquad U(0) = u(x_0, y_0) = u_0,$$
$$\dot{V}(t) = \frac{\partial v}{\partial x}(x_0 + t, y_0) = f(x_0 + t, y_0, V(t)) := F(V(t), t), \qquad V(0) = v(x_0, y_0) = u_0.$$
Thus, $U(t)$ and $V(t)$ satisfy the same ODE and initial condition, and therefore must coincide.
A similar argument shows that
$$v(x, y) = u(x, y) \quad \text{for all } (x, y) \in U.$$
Let $U(s) = u(x, y_0 + s)$ and $V(s) = v(x, y_0 + s)$. We have that
$$\dot{U}(s) = \frac{\partial u}{\partial y}(x, y_0 + s) = g(x, y_0 + s, U(s)) := G(U(s), s), \qquad U(0) = u(x, y_0),$$
$$\dot{V}(s) = \frac{\partial v}{\partial y}(x, y_0 + s) = g(x, y_0 + s, V(s)) := G(V(s), s), \qquad V(0) = v(x, y_0).$$
y

Thus, U (s) and V (s) satisfy the same ODE and, since u(x, y0 ) = v(x, y0 ) from above, the same initial
condition. Therefore, they must coincide.
Example 1.11.1. Verify that the system
$$\frac{\partial u}{\partial x}(x, y) = y(u + 1), \qquad \frac{\partial u}{\partial y}(x, y) = x(u + 1) \qquad (13)$$
has a solution, and find the solution satisfying the initial data
$$u(0, 0) = 2.$$

First, we construct the vector fields $X$ and $Y$. Since $f = y(z + 1)$ and $g = x(z + 1)$, we have that
$$X = (1, 0, y(z + 1)), \qquad Y = (0, 1, x(z + 1)).$$
Then
$$(X \cdot \nabla)Y = \left(\frac{\partial}{\partial x} + y(z + 1)\frac{\partial}{\partial z}\right)(0, 1, x(z + 1)) = \big(0, 0, z + 1 + y(z + 1)x\big).$$
Similarly,
$$(Y \cdot \nabla)X = \left(\frac{\partial}{\partial y} + x(z + 1)\frac{\partial}{\partial z}\right)(1, 0, y(z + 1)) = \big(0, 0, z + 1 + x(z + 1)y\big).$$
It follows that
$$[X, Y] = (X \cdot \nabla)Y - (Y \cdot \nabla)X = 0,$$
so that the necessary condition is satisfied.
Next, let us compute $\varphi_x(\psi_y(0, 0, 2))$, where $\varphi_t$ and $\psi_s$ are the flows of $X$ and $Y$. First, $\psi_s(0, 0, 2)$ is
determined from solutions of the system of ODEs given by
$$x' = 0, \qquad y' = 1, \qquad z' = x(z + 1), \qquad x(0) = y(0) = 0, \quad z(0) = 2.$$
Clearly $x(s) = 0$ and $y(s) = s$. Therefore, $z$ satisfies $z' = 0$ with $z(0) = 2$, so that
$$z(s) = 2.$$
Setting $s = y$, we get that
$$\psi_y(0, 0, 2) = (0, y, 2).$$
Next, we compute $\varphi_x(0, y, 2)$. This is determined from solutions of the system of ODEs given by
$$\dot{x} = 1, \qquad \dot{y} = 0, \qquad \dot{z} = y(z + 1), \qquad x(0) = 0, \quad y(0) = y, \quad z(0) = 2.$$
Clearly $x(t) = t$ and $y(t) = y$. Therefore, $z$ satisfies
$$\dot{z} = y(z + 1), \qquad z(0) = 2.$$
The equation for $z(t)$ may be solved either by separation of variables or by using an integrating factor.
We'll use separation of variables. We have that
$$\int_2^{z(t)} \frac{dz}{z + 1} = \int_0^t y\, dt = yt,$$
or
$$\log(z(t) + 1) - \log 3 = yt,$$
or
$$z(t) = 3e^{yt} - 1.$$
Setting $t = x$, we get that
$$\varphi_x(0, y, 2) = \varphi_x\big(\psi_y(0, 0, 2)\big) = \big(x, y, 3e^{yx} - 1\big) = (x, y, u(x, y)),$$
or
$$u(x, y) = 3e^{xy} - 1.$$
It is easy to confirm that $u$ satisfies the required equation and initial data.
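A quick symbolic check of this answer (not part of the notes):

```python
import sympy as sp

x, y = sp.symbols('x y')
u = 3*sp.exp(x*y) - 1

print(sp.simplify(sp.diff(u, x) - y*(u + 1)))  # 0: first equation of (13) holds
print(sp.simplify(sp.diff(u, y) - x*(u + 1)))  # 0: second equation holds
print(u.subs({x: 0, y: 0}))                    # 2: initial data
```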

1.11.3 General statement and proof of the Frobenius Theorem
Statement of the Frobenius Theorem. Let $x$ denote coordinates on $\mathbb{R}^p$ and $z$ coordinates on $\mathbb{R}^q$. Let
$U \subset \mathbb{R}^p$ and $V \subset \mathbb{R}^q$ be open sets. Let $f_i^\alpha$ denote smooth functions on $U \times V$,
$$f_i^\alpha : U \times V \to \mathbb{R}; \quad (x, z) \mapsto f_i^\alpha(x, z),$$
where $1 \le i \le p$ and $1 \le \alpha \le q$. Consider the system of first-order partial differential equations for
$u : U \to V$ given by
$$\frac{\partial u^\alpha}{\partial x^i}(x) = f_i^\alpha(x, u(x)), \qquad u(x_0) = u_0, \quad x_0 \in U, \ u_0 \in V. \qquad (14)$$
Define $p$ vector fields $X_{(i)}$, $1 \le i \le p$, on $U \times V$ as follows:
$$X_{(i)}^j(x, z) = \delta_i^j, \quad 1 \le j \le p, \qquad X_{(i)}^{p+\alpha}(x, z) = f_i^\alpha(x, z), \quad 1 \le \alpha \le q.$$
That is, among the first $p$ components of $X_{(i)}$, there is a single nonzero component, namely the $i$th, which
is equal to one, while the last $q$ components of $X_{(i)}$ are given by $f_i^1, \ldots, f_i^q$. Suppose the vector fields $X_{(i)}$
are complete.
Theorem 1.11.2 (Frobenius). For all $(x_0, u_0) \in U \times V$, the system (14) has a solution $u(x)$ if and only if
$$[X_{(i)}, X_{(j)}] = 0, \quad 1 \le i, j \le p. \qquad (15)$$
Moreover, if a solution exists, then it is unique.

Above, in (11) we considered the case $p = 2$ and $q = 1$. Instead of $x \in U \subset \mathbb{R}^p$, we wrote $(x, y) \in \mathbb{R}^2$,
and instead of $f_i^\alpha : U \times V \to \mathbb{R}$, we had two functions $f(x, y, z)$ and $g(x, y, z)$ defined on $\mathbb{R}^3$. Instead of
$X_{(i)}$, we introduced two vector fields $X$ and $Y$ on $\mathbb{R}^3$ given by
$$X(x, y, z) = (1, 0, f(x, y, z)), \qquad Y(x, y, z) = (0, 1, g(x, y, z)).$$

Proof.

Necessary condition. We show that condition (15) is necessary. Assume that a solution $u(x)$ of (14)
exists for all initial data. It follows that the mixed partials of $u$ are everywhere equal,
$$\frac{\partial}{\partial x^i}\left(\frac{\partial u^\alpha}{\partial x^j}\right) = \frac{\partial}{\partial x^j}\left(\frac{\partial u^\alpha}{\partial x^i}\right). \qquad (16)$$
We have that
$$\frac{\partial}{\partial x^i}\left(\frac{\partial u^\alpha}{\partial x^j}\right)(x) = \frac{\partial}{\partial x^i}\big(f_j^\alpha(x, u(x))\big) = \frac{\partial f_j^\alpha}{\partial x^i}(x, u(x)) + \frac{\partial f_j^\alpha}{\partial z^\beta}(x, u(x))\,\frac{\partial u^\beta}{\partial x^i}(x) = \left(\frac{\partial f_j^\alpha}{\partial x^i} + \frac{\partial f_j^\alpha}{\partial z^\beta} f_i^\beta\right)(x, u(x)),$$
where we have used the PDE (14). A similar expression is obtained for the RHS of (16). As solutions
are assumed to exist for all initial data, $(x, u(x))$ may be taken to be arbitrary in $U \times V$. It follows that
$$\frac{\partial f_j^\alpha}{\partial x^i} + f_i^\beta \frac{\partial f_j^\alpha}{\partial z^\beta} = \frac{\partial f_i^\alpha}{\partial x^j} + f_j^\beta \frac{\partial f_i^\alpha}{\partial z^\beta} \qquad (17)$$
on $U \times V$.
Consider next the Jacobi bracket $[X_{(i)}, X_{(j)}]$. Since the first $p$ components of $X_{(i)}$ and $X_{(j)}$ are constant,
it follows that the first $p$ components of their bracket vanish. We compute the remaining $q$ components
as follows: For $1 \le \alpha \le q$,
$$[X_{(i)}, X_{(j)}]^{p+\alpha} = \left(\frac{\partial}{\partial x^i} + f_i^\beta \frac{\partial}{\partial z^\beta}\right) f_j^\alpha - \left(\frac{\partial}{\partial x^j} + f_j^\beta \frac{\partial}{\partial z^\beta}\right) f_i^\alpha = \left(\frac{\partial f_j^\alpha}{\partial x^i} + f_i^\beta \frac{\partial f_j^\alpha}{\partial z^\beta}\right) - \left(\frac{\partial f_i^\alpha}{\partial x^j} + f_j^\beta \frac{\partial f_i^\alpha}{\partial z^\beta}\right). \qquad (18)$$
It is evident from (17) and (18) that (16) holds if and only if $[X_{(i)}, X_{(j)}] = 0$.

Explicit construction. Existence of solution. Let $\varphi_{(i)t}$ denote the flow of $X_{(i)}$. The first $p$ components of $\varphi_{(i)t}$ are given by
$$\varphi_{(i)t}^j(x_0) = x_0^j + \delta_i^j\, t, \quad 1 \le j \le p.$$
For simplicity, suppose $x_0 = 0$ in (14) (it is easy to generalise to $x_0 \ne 0$). We define $u(x)$ by
$$(x, u(x)) = \varphi_{(1)x^1}\Big(\cdots \varphi_{(p)x^p}(0, u_0)\Big). \qquad (19)$$
It is clear that $u(0) = u_0$. Also,
$$\frac{\partial}{\partial x^1}(x, u(x)) = \frac{\partial}{\partial x^1}\,\varphi_{(1)x^1}\Big(\cdots \varphi_{(p)x^p}(0, u_0)\Big) = X_{(1)}\Big(\varphi_{(1)x^1}\big(\cdots \varphi_{(p)x^p}(0, u_0)\big)\Big) = X_{(1)}(x, u(x)),$$
whose last $q$ components give $\partial u^\alpha/\partial x^1(x) = f_1^\alpha(x, u(x))$, so that (14) is satisfied for $i = 1$. For $i > 1$, use the commutativity of the flows,
$$\varphi_{(i)s} \circ \varphi_{(j)t} = \varphi_{(j)t} \circ \varphi_{(i)s},$$
which follows from (15), to bring the factor $\varphi_{(i)x^i}$ to the front in (19); then calculate $\partial/\partial x^i$ as we calculated $\partial/\partial x^1$ above.

Uniqueness. (*Nonexaminable.)
The uniqueness of solutions to the system of partial differential equations (14) follows from the
uniqueness of solutions to ordinary differential equations. Suppose that the necessary condition (15)
holds, and let $u(x)$ and $v(x)$ be two solutions of (14). For simplicity, let $x_0 = 0$ and assume that the flows
of the $X_{(j)}$'s are complete.
We show that $u(x) = v(x)$ by induction. Clearly
$$u(0) = v(0),$$
since both $u$ and $v$ satisfy the initial data. Next, we show that if
$$u(x^1, x^2, \ldots, x^{k-1}, 0, \ldots, 0) = v(x^1, x^2, \ldots, x^{k-1}, 0, \ldots, 0), \qquad (20)$$
then
$$u(x^1, x^2, \ldots, x^{k-1}, x^k, 0, \ldots, 0) = v(x^1, x^2, \ldots, x^{k-1}, x^k, 0, \ldots, 0). \qquad (21)$$
Let $x^1, \ldots, x^{k-1}$ be fixed, and let
$$x(t) = (x^1, x^2, \ldots, x^{k-1}, t, 0, \ldots, 0) \in U \subset \mathbb{R}^p.$$
Then
$$\dot{x}^i(t) = \delta_k^i.$$
Let
$$U(t) = u(x(t)) \in V \subset \mathbb{R}^q.$$
Since $u$ satisfies (14), it follows that
$$\dot{U}(t) = \frac{\partial u}{\partial x^i}(x(t))\,\dot{x}^i(t) = \frac{\partial u}{\partial x^k}(x(t)) = f_k(x(t), u(x(t))) = f_k(x(t), U(t)).$$
Therefore, letting
$$F(U, t) = f_k(x(t), U),$$
we get that
$$\dot{U}(t) = F(U(t), t), \qquad U(0) = u(x^1, x^2, \ldots, x^{k-1}, 0, \ldots, 0). \qquad (22)$$
Similarly, letting
$$V(t) = v(x(t)) \in V \subset \mathbb{R}^q,$$
we have that
$$\dot{V}(t) = F(V(t), t), \qquad V(0) = v(x^1, x^2, \ldots, x^{k-1}, 0, \ldots, 0). \qquad (23)$$
From (22) and (23) and the induction hypothesis, $U(t)$ and $V(t)$ satisfy the same differential equation
with the same initial condition. It follows that $U(t) = V(t)$, and in particular, letting $t = x^k$, that
$$u(x^1, \ldots, x^{k-1}, x^k, 0, \ldots, 0) = v(x^1, \ldots, x^{k-1}, x^k, 0, \ldots, 0),$$
as required.
Example 1.11.3 (2011 Examination, Q3(b)).

Question: Show that the system
$$\frac{\partial u}{\partial x} = u, \qquad \frac{\partial^2 u}{\partial y^2} = e^{-x} u^2$$
with initial data $u(0, 0) = 1$ has a unique solution in a neighbourhood of $(0, 0)$ (you don't need to find the
solution explicitly). (Hint: Introduce a second function $v = \partial u/\partial y$.)

Solution: Let $v = \partial u/\partial y$. Then, using the given equation $\partial u/\partial x = u$, we get that
$$\frac{\partial v}{\partial x} = \frac{\partial^2 u}{\partial x\,\partial y} = \frac{\partial}{\partial y}\frac{\partial u}{\partial x} = \frac{\partial u}{\partial y} = v.$$
Thus, we get the following system of first-order PDEs for $u$ and $v$:
$$\frac{\partial u}{\partial x} = u, \qquad \frac{\partial v}{\partial x} = v, \qquad \frac{\partial u}{\partial y} = v, \qquad \frac{\partial v}{\partial y} = e^{-x} u^2.$$
The necessary and sufficient condition for a local solution to exist is that the vector fields $X(x, y, u, v)$, $Y(x, y, u, v)$,
given by
$$X = (1, 0, u, v), \qquad Y = (0, 1, v, e^{-x} u^2),$$
have vanishing Jacobi bracket (equivalently, the mixed second partial derivatives of $u$ must be equal, and
similarly for $v$). Calculation gives
$$[X, Y] = (\partial_x + u\,\partial_u + v\,\partial_v)(0, 1, v, e^{-x} u^2) - (\partial_y + v\,\partial_u + e^{-x} u^2\,\partial_v)(1, 0, u, v) = (0, 0, v, e^{-x} u^2) - (0, 0, v, e^{-x} u^2) = 0,$$
as required.
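A symbolic check of this bracket computation (not part of the notes; it uses the $e^{-x}$ reading of the right-hand side adopted above):

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
coords = (x, y, u, v)

X = [1, 0, u, v]
Y = [0, 1, v, sp.exp(-x)*u**2]

def D(V, h):
    """Directional derivative V . grad h in the coordinates (x, y, u, v)."""
    return sum(Vi*sp.diff(h, c) for Vi, c in zip(V, coords))

bracket = [sp.simplify(D(X, Y[j]) - D(Y, X[j])) for j in range(4)]
print(bracket)   # [0, 0, 0, 0], so the Frobenius condition holds
```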

1.12 More general versions of the Frobenius Theorem [not presented in lectures]
(*nonexaminable)
Consider the system
$$\sum_{j=1}^{p} A_{ij}(x, u(x))\,\frac{\partial u^\alpha}{\partial x^j}(x) = f_i^\alpha(x, u(x)), \qquad u(x_0) = u_0. \qquad (24)$$
We assume that the $A_{ij}$'s are smooth functions on $U \times V$, and moreover that
$$\det A_{ij}(x, z) \ne 0.$$
The original system (14) is recovered by taking $A_{ij} = \delta_{ij}$ (so that $A_{ij}$ is constant in this case). We can
formulate the following generalisation of the Frobenius theorem: Define $p$ vector fields $Y_{(i)}$, $1 \le i \le p$, on
$U \times V$ as follows:
$$Y_{(i)}(x, z) = \big(A_{i1}, \ldots, A_{ip},\; f_i^1, \ldots, f_i^q\big)(x, z).$$
Theorem 1.12.1 (Generalised Frobenius). For all $(x_0, u_0) \in U \times V$, the system (24) has a solution $u(x)$
if and only if
$$[Y_{(i)}, Y_{(j)}] = \sum_{k=1}^{p} c_{ij}^k\, Y_{(k)} \qquad (25)$$
for some smooth functions $c_{ij}^k$ on $U \times V$. Moreover, if a solution exists, then it is unique.
This result can be deduced from the original version Theorem 1.11.2. The main fact that is needed
is explained in the following section, namely that if {X(1) , . . . X(r) } and {Y(1) , . . . Y(r) } are equivalent
distributions on Rn , then {X(1) , . . . X(r) } is integrable if and only if {Y(1) , . . . Y(r) } is integrable.

1.12.1 Distributions
A k-dimensional distribution on an open set U Rn is a set of k smooth linearly independent
vector fields {Y(1) , . . . , Y(k) } on U . Linearly independent means that, at each point x U , the vectors
$Y_{(1)}(x), \ldots, Y_{(k)}(x)$ are linearly independent. In order for this to be the case, we must have that $k \le n$.
(Note: you may have come across another mathematical definition of distribution in analysis, namely
as a linear functional on a suitable function space. The present definition is quite separate.)
A distribution $\{Y_{(1)}, \ldots, Y_{(k)}\}$ is said to be integrable if the Jacobi bracket of any two vector fields $Y_{(r)}$
and $Y_{(s)}$ can be expressed as a linear combination of the $Y_{(j)}$'s, i.e.
$$[Y_{(r)}, Y_{(s)}](x) = \sum_{j=1}^{k} c_{rs}^j(x)\, Y_{(j)}(x)$$
for some coefficients $c_{rs}^j(x)$. Here are some examples:


1. A 1-dimensional distribution {Y} is trivially integrable, since [Y, Y] = 0 identically.
2. The 2-dimensional distribution {Y(1) , Y(2) } in R3 given by

Y(1) (x, y, z) = (f(1) , g(1) , 0)(x, y, z),


Y(2) (x, y, z) = (f(2) , g(2) , 0)(x, y, z),

is integrable. To see why, first note that the third component of [Y(1) , Y(2) ] necessarily vanishes.
Next, note that any vector Z with zero third component can be expressed as a linear combination of
Y(1) and Y(2) . This is because Y(1) , Y(2) and Z are necessarily linearly dependent (they constitute
3 vectors in the xy -plane), so we can find coefficients a, b and c such that
$$a\,Y_{(1)} + b\,Y_{(2)} + c\,Z = 0,$$
with $a$, $b$, $c$ not all zero.
But $c$ cannot vanish (otherwise, we would have $a\,Y_{(1)} + b\,Y_{(2)} = 0$, contradicting the linear independence of $Y_{(1)}$ and $Y_{(2)}$), so that
$$Z = -(a/c)\,Y_{(1)} - (b/c)\,Y_{(2)}.$$

3. The 2-dimensional distribution given by
$$Y_{(1)}(x, y, z) = (1, 0, 0), \qquad Y_{(2)}(x, y, z) = (0, 1, f(x)),$$
is not integrable (provided $f'$ is not identically zero). To see this, we note that
$$[Y_{(1)}, Y_{(2)}] = (0, 0, f'(x)),$$
which cannot be expressed as a linear combination of $Y_{(1)}$ and $Y_{(2)}$ wherever $f'(x) \ne 0$.

4. The 2-dimensional distribution {S, D}, where the vector fields S and D are given in Problem Sheet
4, is not integrable.
Let $\{X_{(1)}, \ldots, X_{(k)}\}$ be another $k$-dimensional distribution. We say that $\{Y_{(1)}, \ldots, Y_{(k)}\}$ and $\{X_{(1)}, \ldots, X_{(k)}\}$
are equivalent if one set of vector fields can be expressed as a linear combination of the other, i.e.
$$X_{(i)} = \sum_{j=1}^{k} a_{ij}(x)\, Y_{(j)}$$
for some coefficients $a_{ij}(x)$. If this is the case, it follows that the matrix $a_{ij}(x)$ is invertible. To see why,
suppose it weren't. If $a_{ij}(x)$ were not invertible, we could find a nonzero $k$-dimensional vector $(v_1, \ldots, v_k)$ such
that
$$\sum_{i=1}^{k} v_i\, a_{ij}(x) = 0.$$
But this would imply that
$$\sum_{i=1}^{k} v_i\, X_{(i)} = \sum_{i=1}^{k} \sum_{j=1}^{k} v_i\, a_{ij}(x)\, Y_{(j)} = \sum_{j=1}^{k} \left(\sum_{i=1}^{k} v_i\, a_{ij}(x)\right) Y_{(j)} = 0,$$
contradicting the assumption that the $X_{(i)}$'s are linearly independent everywhere.
Lemma 1.12.2. Suppose $\{X_{(1)}, \ldots, X_{(k)}\}$ and $\{Y_{(1)}, \ldots, Y_{(k)}\}$ are equivalent distributions. Then $\{X_{(1)}, \ldots, X_{(k)}\}$
is integrable if and only if $\{Y_{(1)}, \ldots, Y_{(k)}\}$ is integrable.
Proof. For definiteness, suppose that $\{Y_{(1)}, \ldots, Y_{(k)}\}$ is integrable. We will prove that $\{X_{(1)}, \ldots, X_{(k)}\}$ is
integrable. Let
$$X_{(i)} = \sum_{j=1}^{k} a_{ij}(x)\, Y_{(j)},$$
and recall the product rule for the Jacobi bracket (Proposition 1.8.5),
$$[Y, f X] = Y(f)\, X + f\, [Y, X].$$
Now calculate:
$$[X_{(r)}, X_{(s)}] = \sum_{t=1}^{k} \sum_{u=1}^{k} [a_{rt} Y_{(t)},\, a_{su} Y_{(u)}] \qquad \text{(expanding the $X$'s in terms of $Y$'s)}$$
$$= \sum_{t=1}^{k} \sum_{u=1}^{k} \Big(a_{rt}\, Y_{(t)}(a_{su})\, Y_{(u)} - a_{su}\, Y_{(u)}(a_{rt})\, Y_{(t)} + a_{rt} a_{su}\, [Y_{(t)}, Y_{(u)}]\Big) \qquad \text{(using the product rule)}$$
$$= \sum_{t=1}^{k} \sum_{u=1}^{k} \Big(a_{rt}\, Y_{(t)}(a_{su})\, Y_{(u)} - a_{su}\, Y_{(u)}(a_{rt})\, Y_{(t)} + a_{rt} a_{su} \sum_{v=1}^{k} c_{tu}^{v}\, Y_{(v)}\Big) \qquad \text{(using integrability of the $Y$'s)}.$$
The last expression is clearly a linear combination of the $Y$'s, which in turn can be expressed as a linear
combination of the $X$'s, since
$$Y_{(j)} = \sum_{l=1}^{k} a^{-1}_{jl}(x)\, X_{(l)}.$$
Thus, $\{X_{(1)}, \ldots, X_{(k)}\}$ is indeed integrable, as claimed.

1.12.2 Alternative formulation of Frobenius Theorem


Theorem 1.12.3. Let $\{Y_{(1)}, \ldots, Y_{(k)}\}$ be an integrable $k$-dimensional distribution in $U$. Then in a
neighbourhood of a point $x_0 \in U$, there exists a $k$-dimensional surface $S$ which is tangent to each of the
vector fields $Y_{(j)}$.

Proof. We show in the Lemma below that in a neighbourhood of $x_0$, $\{Y_{(1)}, \ldots, Y_{(k)}\}$ is equivalent to a
distribution $\{X_{(1)}, \ldots, X_{(k)}\}$ for which
$$[X_{(r)}, X_{(s)}] = 0$$
for all $r$ and $s$. Let $\varphi_{(j)s}$ denote the flow of $X_{(j)}$. Define the surface $S$ to be the set of points
$$x(s_1, \ldots, s_k) = \varphi_{(1)s_1}\Big(\cdots \varphi_{(k)s_k}(x_0)\Big)$$
obtained by applying the flows of the $X_{(j)}$'s in succession. The flow times $s_1, \ldots, s_k$ serve as parameters
on $S$. Since the flows $\varphi_{(i)s_i}$ and $\varphi_{(j)s_j}$ commute, it is clear that
$$\varphi_{(j)t}\big(x(s_1, \ldots, s_k)\big) = x(s_1, \ldots, s_{j-1},\, s_j + t,\, s_{j+1}, \ldots, s_k).$$
That is, the effect of applying the $j$th flow to a point of $S$ for a time $t$ is to shift the value of the $j$th parameter
$s_j$ by an amount $t$. Therefore, under the flow, points of $S$ remain on $S$. Thus $S$ is tangent to each of the
$X_{(j)}$'s, and hence to each of the $Y_{(j)}$'s.

Lemma 1.12.4. An integrable $k$-dimensional distribution $\{Y_{(1)}, \ldots, Y_{(k)}\}$ in $U$ is equivalent to a distri-
bution $\{X_{(1)}, \ldots, X_{(k)}\}$ for which
$$[X_{(r)}, X_{(s)}] = 0$$
for all $r$ and $s$.

Proof. By applying a suitable linear transformation, we can choose coordinates so that, at $x_0$,
$$Y_{(j)}(x_0) = (0, \ldots, 0, 1, 0, \ldots, 0),$$
i.e. $Y_{(j)}(x_0)$ is the unit vector $e_{(j)}$ in the $j$th direction. Elsewhere, we write the $Y_{(j)}$'s in the form
$$Y_{(j)}(x) = \sum_{l=1}^{k} a_{jl}(x)\, e_{(l)} + \sum_{m=k+1}^{n} b_{jm}(x)\, e_{(m)},$$
so that
$$a(x_0) = I$$
(here we regard $a(x)$ as a $k \times k$ matrix). Since $I$ is invertible and $a$ is continuous, we may conclude that,
in a neighbourhood of $x_0$, $a$ is invertible. Let $d(x)$ denote the inverse of $a(x)$, i.e.
$$\sum_{j=1}^{k} d_{ij}(x)\, a_{jl}(x) = \delta_{il}$$
for $x$ in this neighbourhood. Then define
$$X_{(i)} = \sum_{j=1}^{k} d_{ij}(x)\, Y_{(j)}.$$
It follows that
$$X_{(i)}(x) = e_{(i)} + \sum_{m=k+1}^{n} f_{im}(x)\, e_{(m)},$$
where $f_{im} = \sum_{j=1}^{k} d_{ij}\, b_{jm}$. By assumption, $\{Y_{(1)}, \ldots, Y_{(k)}\}$, and therefore $\{X_{(1)}, \ldots, X_{(k)}\}$, is integrable.
Thus,
$$[X_{(r)}, X_{(s)}] = \sum_{t=1}^{k} g_{rs}^{t}\, X_{(t)}$$
for some coefficients $g_{rs}^{t}$.
From the form of the $X_{(i)}$'s, it is clear that the first $k$ components of $\sum_t g_{rs}^t X_{(t)}$ are given by the $g_{rs}^{t}$'s,
i.e.
$$[X_{(r)}, X_{(s)}]^t = g_{rs}^{t}, \quad \text{for } 1 \le t \le k.$$
It is also clear that
$$[X_{(r)}, X_{(s)}]^t = 0, \quad \text{for } 1 \le t \le k,$$
because the first $k$ components of the $X_{(j)}$'s are either zero or a constant. Comparing the two preceding
equations, we may conclude that
$$g_{rs}^{t} = 0,$$
which implies that
$$[X_{(r)}, X_{(s)}] = 0.$$

2 Algebraic k-forms
2.1 Dual space
In what follows, V denotes an n-dimensional vector space.

Let $\mathcal{F}(V)$ denote the set of functions $f : V \to \mathbb{R}$ on $V$. $\mathcal{F}(V)$ can be regarded as a vector space. The
zero element is the function, denoted 0, which is equal to zero everywhere. Addition of functions and
multiplication of functions by scalars is defined in the obvious way. For example, if $f$ and $g$ are functions,
we define the function $f + g$ by $(f + g)(v) = f(v) + g(v)$. Similarly, if $\lambda \in \mathbb{R}$, we define the function $\lambda f$ by
$(\lambda f)(v) = \lambda f(v)$. It is straightforward to verify that the usual vector-space properties are satisfied (e.g.,
commutativity, associativity, distributive law, etc, for addition and scalar multiplication).
As a vector space, F(V ) is infinite dimensional. We can identify various subspaces of F(V ), for
example the space of continuous functions C 0 (V ), or the space of smooth functions C (V ). These are
also infinite dimensional.
A function $f : V \to \mathbb{R}$ is linear if for all $u, v \in V$ and for all $\lambda, \mu \in \mathbb{R}$,
$$f(\lambda u + \mu v) = \lambda f(u) + \mu f(v).$$

Definition 2.1.1 (Dual space). The dual space of $V$, denoted $V^*$, is the subspace of $\mathcal{F}(V)$ consisting of
linear functions on $V$.

It is easy to verify that $V^*$ is a vector space. That is, the sum of two linear functions is a linear function,
and a scalar multiple of a linear function is a linear function (we won't give an explicit verification here).

Example 2.1.2 (Dual space of $\mathbb{R}^3$). Let $V = \mathbb{R}^3$. Let $a \in \mathbb{R}^3$, and define $f \in (\mathbb{R}^3)^*$ by
$$f(r) = a \cdot r.$$
Using the familiar properties of the dot product, it is easy to verify that $f$ is indeed a linear function
(i.e., $a \cdot (\lambda r + \mu s) = \lambda\, a \cdot r + \mu\, a \cdot s$). Indeed, every element of $(\mathbb{R}^3)^*$ can be represented in this way. To see
that this is the case, let $\hat{\imath}$, $\hat{\jmath}$ and $\hat{k}$ denote the standard basis in $\mathbb{R}^3$. Given $f \in (\mathbb{R}^3)^*$, let
$$f_x = f(\hat{\imath}), \qquad f_y = f(\hat{\jmath}), \qquad f_z = f(\hat{k}).$$
Then it is easy to check that
$$f(r) = (f_x\,\hat{\imath} + f_y\,\hat{\jmath} + f_z\,\hat{k}) \cdot r.$$
Thus, by means of the dot product, we can associate to every vector in $\mathbb{R}^3$ a linear function in $(\mathbb{R}^3)^*$, and
vice versa.
In fact, this construction of the dual space of $\mathbb{R}^3$ generalises to an arbitrary vector space $V$. Let
$e_{(1)}, \ldots, e_{(n)}$ denote a basis for $V$. For $v \in V$, we write $v = v^i e_{(i)}$ to denote its expansion in terms of this
basis. We define a set of $n$ elements of $V^*$, denoted $f^{(1)}, \ldots, f^{(n)}$, by the relation
$$f^{(j)}(v^i e_{(i)}) = v^j.$$
That is, $f^{(j)}$ picks out the $j$th component of $v$ when $v$ is expanded in terms of the $e_{(i)}$'s. In what follows
we will use the following notation. Let $a \in V^*$, and let
$$a_i = a(e_{(i)}).$$
Thus,
$$f^{(j)}_i = \delta^j_i.$$
It is easy to see that $f^{(j)}$ is a linear function on $V$, and hence an element of $V^*$. In fact, we have the
following:

Proposition 2.1.3 (Dual basis). $f^{(1)}, \ldots, f^{(n)}$ constitute a basis for $V^*$.

Proof. We need to show two things, namely that i) the $f^{(j)}$'s are linearly independent, and ii) every
element of $V^*$ can be expressed as a linear combination of the $f^{(j)}$'s.
First, we show that the $f^{(j)}$'s are linearly independent. Suppose that
$$\lambda_j f^{(j)} = 0.$$
Evaluate both sides on $e_{(i)}$ to obtain $\lambda_j \delta^j_i = 0$, or
$$\lambda_i = 0.$$
As $i$ is arbitrary, linear independence follows.

Next, let $a \in V^*$. We claim that
$$a = a_j f^{(j)}.$$
To verify, let us apply both sides to $v \in V$. On the left-hand side, we get $a(v)$. On the right-hand side,
we get
$$a_j f^{(j)}(v) = a_j v^j = a(e_{(j)})\, v^j = a(v^j e_{(j)}) \ \text{(by linearity)} = a(v).$$

Collectively, $f^{(1)}, \ldots, f^{(n)}$ is called the dual basis of $e_{(1)}, \ldots, e_{(n)}$. It follows from Proposition 2.1.3
that $V^*$ is $n$-dimensional, and that there is a vector space isomorphism between $V$ and $V^*$ which maps
$u = u^i e_{(i)}$ into $\sum_{i=1}^n u^i f^{(i)}$. In this way, to every $u \in V$, we can associate a unique element $f \in V^*$ by
$$f(v) = \sum_{i=1}^n u^i v^i.$$
We could regard this last expression as defining a dot product, and write $f(v) = u \cdot v$.
At this point, you might wonder why we introduce the dual space in the first place. If V and V
are isomorphic, why not regard them as being the same? The rest of this section will be devoted to
answering this question.
The first answer is that $V$ might not have a natural, or built-in, dot product. Of course, one can
always introduce a dot product, or inner product, as it is also called, on a vector space, by declaring that
a particular basis $e_{(1)}, \ldots, e_{(n)}$ is orthonormal (in effect, this is what we did above), so that if $u = u^i e_{(i)}$
and $v = v^j e_{(j)}$, then $u \cdot v = \sum_{i=1}^n u^i v^i$. But this raises the question, does this inner product have any
intrinsic meaning? If we were to choose a different basis $\bar{e}_{(j)}$, would the dot product change? (The
answer is yes, unless the $n \times n$ matrix $M$ defined by $\bar{e}_{(i)} = \sum_{j=1}^n M_{ij} e_{(j)}$ happens to satisfy $M^T = M^{-1}$.)
Certain vector spaces do have an intrinsically defined inner product. An example is $n$-dimensional
Euclidean space, which is endowed with, in addition to its vector space properties, an intrinsic notion of
geometrical distance, or length. The inner product can be expressed geometrically in terms of length by
the expression
$$u \cdot v = \tfrac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right),$$
independently of any particular choice of basis.


Other vector spaces do not have an intrinsically defined inner product. An example from mechanics
is 2-dimensional phase space P = {(q, p)}, where q represents the position of a particle moving along
a line, and p represents its momentum. P is a perfectly good vector space; vector addition and scalar
multiplication make sense, and have an intrinsic meaning within mechanics. However, there is no intrinsic
inner product. Note that an expression such as q1 q2 + p1 p2 is incoherent from the point of view of
mechanics, as positions and momenta have different physical dimensions.
To summarise, if V has an intrinsically defined inner product, then one can identify V and V , and
ignore the distinction between them. Otherwise, it is better to regard V and V as being distinct.

The second answer to the question is that, under a linear transformation, vectors in $V$ and $V^*$
transform differently. Let $e_{(1)}, \ldots, e_{(n)}$ and $\bar{e}_{(1)}, \ldots, \bar{e}_{(n)}$ be two bases for $V$. Then one set of basis
vectors can be expressed as linear combinations of the others, e.g.
$$\bar{e}_{(i)} = \sum_{j=1}^{n} M_{ij}\, e_{(j)},$$
where $M$ is an $n \times n$ matrix. Let $f^{(j)}$ and $\bar{f}^{(j)}$ denote the dual bases of $e_{(i)}$ and $\bar{e}_{(i)}$ respectively. Here,
too, one set of basis vectors can be expressed as linear combinations of the others,
$$\bar{f}^{(i)} = \sum_{j=1}^{n} N_{ij}\, f^{(j)}.$$
Given the definition of the dual basis, one can calculate the matrix $N$. It turns out that $N$ is not equal
to $M$, but rather, we have the following:

Proposition 2.1.4 (Transformation of dual basis).
$$N = \left(M^T\right)^{-1}.$$
Proof. See Problem Sheet 6.1(b).
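A small numerical illustration of Proposition 2.1.4 (not part of the notes; the basis and change-of-basis matrix are random):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

E = rng.normal(size=(n, n))      # columns are a basis e_(1), ..., e_(n) of R^n
F = np.linalg.inv(E)             # rows are the dual basis: F[j] @ E[:, i] = delta_{ji}

M = rng.normal(size=(n, n))      # change of basis: ebar_(i) = sum_j M_ij e_(j)
Ebar = E @ M.T                   # columns of Ebar are the new basis vectors
Fbar = np.linalg.inv(Ebar)       # rows are the new dual basis

# Proposition 2.1.4: fbar^(i) = sum_j N_ij f^(j) with N = (M^T)^{-1}
N = Fbar @ np.linalg.inv(F)      # coefficients expressing fbar in terms of f
print(np.allclose(N, np.linalg.inv(M.T)))   # True
```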


The third answer to the question is that elements of $V$ and $V^*$ have different geometrical meanings.
A vector $v \in V$ corresponds to a directed displacement, or an arrow. More precisely, $v$ can be thought
of as an instantaneous velocity $\dot{r}(0)$ along a curve $r(t)$ in $V$. A vector $a \in V^*$ in the dual space corresponds
to a function on $V$, and as a function it can be represented geometrically by sets of contours (sets on
which the function takes a constant value). Since $a$ is a linear function, it can be completely described
by just two contours, namely the 0-contour, the set on which $a$ vanishes, and the 1-contour, where $a$ is
equal to 1. Again, because $a$ is linear, the 0-contour is an $(n-1)$-dimensional plane through the origin,
and the 1-contour is an $(n-1)$-dimensional plane which is parallel to the 0-contour. The real number
$a(v)$ can be expressed purely geometrically (i.e., independently of choice of basis, units, etc) as a ratio
of lengths. See Figure 22. This geometric picture also provides some insight as to why vectors in $V$ and
$V^*$ transform differently under a linear transformation. See Figure 23.

2.2 Permutations
A permutation of $n$ objects, or permutation for short, is a bijection
$$\sigma : \{1, \ldots, n\} \to \{1, \ldots, n\}; \quad j \mapsto \sigma(j).$$
A standard way to display a permutation is as a table,
$$\sigma = \begin{pmatrix} 1 & 2 & \ldots & n \\ \sigma(1) & \sigma(2) & \ldots & \sigma(n) \end{pmatrix}.$$
Let $S_n$ denote the set of all permutations of $n$ objects. The identity map $e$, given by $e(j) = j$, is a
permutation. The composition of two permutations $\sigma$ and $\tau$, given by $(\sigma \circ \tau)(j) = \sigma(\tau(j))$, is a
permutation. Composition is usually denoted by $\sigma\tau$, i.e. without the $\circ$ symbol. To each permutation $\sigma$
corresponds a unique permutation $\sigma^{-1}$ such that $\sigma\sigma^{-1} = \sigma^{-1}\sigma = e$. In this way, $S_n$ forms a group, called
the symmetric group, or the permutation group.
The transposition $\tau_{rs} \in S_n$ is the permutation given by
$$\tau_{rs}(r) = s, \qquad \tau_{rs}(s) = r, \qquad \tau_{rs}(j) = j, \ \ j \ne r, s.$$

Proposition 2.2.1 (Permutations and transpositions). Every permutation can be written as a product
(composition) of transpositions.
Proof. A general argument as well as a specific example is given in Problem 6.2.

Figure 22: v V is represented by an arrow, and represents a velocity, or a displacement per unit
parameter. a V is represented by a pair of parallel hyperplanes which correspond to its zero and unit
level sets. The value of a(v) is given purely geometrically by the ratio of the length of v to the length of
the component of v which lies between the planes of a.

To each $\sigma \in S_n$ we may associate an $n \times n$ permutation matrix $P(\sigma)$ given by
$$P_{ij}(\sigma) = \delta_{i, \sigma(j)}.$$
Proposition 2.2.2 (Permutations and permutation matrices). For all $\sigma, \tau \in S_n$,
$$P(\sigma\tau) = P(\sigma) P(\tau).$$
Proof.
$$[P(\sigma) P(\tau)]_{ik} = \sum_{j=1}^{n} P_{ij}(\sigma) P_{jk}(\tau) = \sum_{j=1}^{n} \delta_{i, \sigma(j)}\, \delta_{j, \tau(k)} = \delta_{i, \sigma(\tau(k))} = P_{ik}(\sigma\tau).$$
It follows from Proposition 2.2.2 that $P(e) = I$, where $I$ is the $n \times n$ identity matrix, and that
$P(\sigma^{-1}) = P^{-1}(\sigma)$. In Problem 6.5 it is shown that $P(\sigma)$ is an orthogonal matrix, i.e. $P^{-1}(\sigma) = P^T(\sigma)$.
The sign of a permutation, denoted $\mathrm{sgn}\,\sigma$, is defined by
$$\mathrm{sgn}\,\sigma = \mathrm{sgn}\,\det P(\sigma).$$
In fact, since $\det P(\sigma)$ is either $1$ or $-1$ (Problem 6.5 again), we could also write $\mathrm{sgn}\,\sigma = \det P(\sigma)$.

We have the following results about the sign of a permutation:

Proposition 2.2.3 ($\mathrm{sgn}$ is multiplicative). For all $\sigma, \tau \in S_n$,
$$\mathrm{sgn}(\sigma\tau) = \mathrm{sgn}(\sigma)\, \mathrm{sgn}(\tau).$$
Proof.
$$\mathrm{sgn}(\sigma\tau) = \mathrm{sgn}\,\det P(\sigma\tau) = \mathrm{sgn}\,\det(P(\sigma) P(\tau)) \quad \text{(from Proposition 2.2.2)}$$
$$= \mathrm{sgn}\,(\det P(\sigma)\, \det P(\tau)) = \mathrm{sgn}\,\det P(\sigma)\; \mathrm{sgn}\,\det P(\tau) = \mathrm{sgn}(\sigma)\, \mathrm{sgn}(\tau).$$

Figure 23: The standard basis for $V$ and the dual basis for $V^*$. Under a transformation $A$ which sends
$e_{(1)}$ to $e_{(1)}$ and $e_{(2)}$ to $e_{(1)} + e_{(2)}$, the dual basis vectors $f^{(1)}$ and $f^{(2)}$ are sent to $f^{(1)} - f^{(2)}$ and $f^{(2)}$
respectively.

Proposition 2.2.4 ($\mathrm{sgn}$ of inverse).
$$\mathrm{sgn}(\sigma^{-1}) = \mathrm{sgn}(\sigma).$$
Proof. Since $P(\sigma^{-1}) = P^{-1}(\sigma)$ from Proposition 2.2.2, it follows that
$$\mathrm{sgn}(\sigma^{-1}) = \mathrm{sgn}\,\det P(\sigma^{-1}) = \mathrm{sgn}\,\det P^{-1}(\sigma) = \mathrm{sgn}\,(\det P(\sigma))^{-1} = \mathrm{sgn}\,\det P(\sigma) = \mathrm{sgn}\,\sigma.$$

Proposition 2.2.5 ($\mathrm{sgn}$ of transposition). If $\tau_{rs}$ is a transposition, then $\mathrm{sgn}\,\tau_{rs} = -1$.

Proof. First consider $\tau_{12}$. We have that
$$P(\tau_{12}) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & I \end{pmatrix},$$
where $I$ denotes the $(n-2)$-dimensional identity matrix. It is easy to calculate that $\det P(\tau_{12}) = -1$,
so that $\mathrm{sgn}\,\tau_{12} = -1$. For a general transposition, it is shown in Problem 6.3 that $\tau_{rs} = \sigma\, \tau_{12}\, \sigma^{-1}$,
where $\sigma$ is any permutation for which $\sigma(1) = r$ and $\sigma(2) = s$. From Propositions 2.2.3 and 2.2.4,
$$\mathrm{sgn}\,\tau_{rs} = (\mathrm{sgn}\,\sigma)^2\, \mathrm{sgn}\,\tau_{12} = -1.$$
Proposition 2.2.6 ($\mathrm{sgn}$ of general permutation). If $\sigma$ is a product of $k$ transpositions, then $\mathrm{sgn}\,\sigma = (-1)^k$.

Proof. By Propositions 2.2.1 and 2.2.3, $\mathrm{sgn}\,\sigma$ is the product of the signs of the $k$ transpositions. By
Proposition 2.2.5, the sign of each transposition is $-1$. Therefore, $\mathrm{sgn}\,\sigma = (-1)^k$.
It follows that $\mathrm{sgn}(\sigma)$ is $1$ or $-1$ according to whether $\sigma$ is given by the product of an even or odd
number of transpositions.
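The permutation-matrix definition of the sign lends itself to a quick computational check. The sketch below is not part of the notes; it verifies multiplicativity of the sign exhaustively on $S_3$.

```python
import numpy as np
from itertools import permutations

def perm_matrix(sigma):
    """P_ij(sigma) = delta_{i, sigma(j)}, with sigma given as the tuple (sigma(1), ..., sigma(n))."""
    n = len(sigma)
    P = np.zeros((n, n), dtype=int)
    for j in range(n):
        P[sigma[j] - 1, j] = 1
    return P

def sgn(sigma):
    return int(round(np.linalg.det(perm_matrix(sigma))))

# sgn is multiplicative (Proposition 2.2.3): check it exhaustively on S_3.
for s in permutations((1, 2, 3)):
    for t in permutations((1, 2, 3)):
        st = tuple(s[t[j] - 1] for j in range(3))   # composition (s o t)(j) = s(t(j))
        assert sgn(st) == sgn(s) * sgn(t)
print("sgn is multiplicative on S_3")
```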

Proposition 2.2.7 (Averaging over permutations). Let $f : S_n \to \mathbb{R}$ be a function on $S_n$. Then for all
$\tau \in S_n$,
$$\sum_{\sigma \in S_n} f(\sigma) = \sum_{\sigma \in S_n} f(\sigma\tau).$$
Also,
$$\sum_{\sigma \in S_n} f(\sigma) = \sum_{\sigma \in S_n} f(\sigma^{-1}).$$
Proof. The mappings $\sigma \mapsto \sigma\tau$ and $\sigma \mapsto \sigma^{-1}$ are bijections on $S_n$. Therefore, the sums above contain
precisely the same terms, albeit in a different order.

2.3 Algebraic k-forms


2.3.1 *Tensors [nonexaminable]
Algebraic k-forms are particular examples of more general objects, called tensors. While in this course
we will not deal with tensors generally, you are likely to come across them elsewhere in your studies, at
least particular instances. Therefore, we will begin with a brief discussion of tensors in general (which
can be omitted, and is non-examinable).
As motivation for the definition of the dual space $V^*$, we first considered general functions on $V$. As
motivation for the definition of tensors, we first consider general functions on $V^k \times (V^*)^l$. That is, we
consider (real-valued) functions whose arguments are $k$ vectors from $V$ and $l$ vectors from $V^*$. The set
of such functions forms a vector space, with addition and scalar multiplication defined in the usual way.
Tensors are a subset of this space of functions. A tensor of type $(k, l)$ is a map $t : V^k \times (V^*)^l \to \mathbb{R}$ which is
linear with respect to each of its arguments. That is,
$$t(v_{(1)}, \ldots, \lambda u + \mu w, \ldots, v_{(k)}, a^{(1)}, \ldots, a^{(l)}) = \lambda\, t(v_{(1)}, \ldots, u, \ldots, v_{(k)}, a^{(1)}, \ldots, a^{(l)}) + \mu\, t(v_{(1)}, \ldots, w, \ldots, v_{(k)}, a^{(1)}, \ldots, a^{(l)}),$$
$$t(v_{(1)}, \ldots, v_{(k)}, a^{(1)}, \ldots, \lambda b + \mu c, \ldots, a^{(l)}) = \lambda\, t(v_{(1)}, \ldots, v_{(k)}, a^{(1)}, \ldots, b, \ldots, a^{(l)}) + \mu\, t(v_{(1)}, \ldots, v_{(k)}, a^{(1)}, \ldots, c, \ldots, a^{(l)}).$$

It is easy to show that the space of tensors of type (k, l) forms a vector space (that is, the property of
being linear with respect to each argument is preserved by addition and scalar multiplication). It is also
easy to show that the dimension of this vector space is nk+l .
Some examples: The dual space V is the space of tensors of type (1, 0). V itself can be regarded as
the space of tensors of type (0, 1). Linear maps from V to V can be regarded as tensors of type (1, 1), as
can linear maps from V to V . Linear maps from V to V are tensors of type (2, 0), and linear maps
from V to V are tensors of type (0, 2).
Next, we define an operation called the tensor product. Suppose $t_1$ and $t_2$ are functions on sets $U_1$
and $U_2$, respectively. Then we can construct a function $t_1 \otimes t_2$ on $U_1 \times U_2$ by taking $t_1 \otimes t_2(u_1, u_2)$ to be
the product $t_1(u_1)\, t_2(u_2)$. Now suppose that $U_1$ and $U_2$ are vector spaces and that $t_1$ and $t_2$ are linear
functions (so that $t_1 \in U_1^*$ and $t_2 \in U_2^*$). Then $t_1 \otimes t_2$ is linear in each of its arguments, i.e.
$$t_1 \otimes t_2(\lambda v_1 + \mu w_1, u_2) = \lambda\, t_1 \otimes t_2(v_1, u_2) + \mu\, t_1 \otimes t_2(w_1, u_2),$$
and similarly
$$t_1 \otimes t_2(u_1, \lambda v_2 + \mu w_2) = \lambda\, t_1 \otimes t_2(u_1, v_2) + \mu\, t_1 \otimes t_2(u_1, w_2).$$
We can generalise this operation as follows. Suppose that $t_1$ is a tensor of type $(k_1, l_1)$ and $t_2$ is a tensor
of type $(k_2, l_2)$. We can construct a tensor of type $(k_1 + k_2, l_1 + l_2)$, denoted $t_1 \otimes t_2$, by taking
$$t_1 \otimes t_2\big(v_{(1)}, \ldots, v_{(k_1 + k_2)}, a^{(1)}, \ldots, a^{(l_1 + l_2)}\big) = t_1\big(v_{(1)}, \ldots, v_{(k_1)}, a^{(1)}, \ldots, a^{(l_1)}\big)\; t_2\big(v_{(k_1 + 1)}, \ldots, v_{(k_1 + k_2)}, a^{(l_1 + 1)}, \ldots, a^{(l_1 + l_2)}\big).$$
For example, if $t_1$ is a tensor of type $(2, 0)$, and $t_2$ is a tensor of type $(1, 1)$, then
$$t_1 \otimes t_2(u, v, w, a) = t_1(u, v)\, t_2(w, a).$$

t1 t2 is called the tensor product of t1 and t2 . It can be shown that an arbitrary tensor of type (k, l)
can be expressed as a sum of tensor products of k tensors of type (1, 0) and l tensors of type (0, 1).

2.3.2 Algebraic k-forms
An algebraic $k$-form on $V$ is a function on $V^k$ that is linear in each argument and which changes sign if
two arguments are interchanged. That is, letting $a$ denote an algebraic $k$-form, we have that
$$a : V^k \to \mathbb{R}; \quad \big(v_{(1)}, \ldots, v_{(k)}\big) \mapsto a\big(v_{(1)}, \ldots, v_{(k)}\big).$$
Linearity with respect to each argument means that, for $u, w \in V$ and $\lambda, \mu \in \mathbb{R}$,
$$a\big(v_{(1)}, \ldots, \lambda u + \mu w, \ldots, v_{(k)}\big) = \lambda\, a\big(v_{(1)}, \ldots, u, \ldots, v_{(k)}\big) + \mu\, a\big(v_{(1)}, \ldots, w, \ldots, v_{(k)}\big).$$
Changing sign under the interchange of two arguments means that, for any $j, l$ with $1 \le j < l \le k$,
$$a\big(v_{(1)}, \ldots, v_{(j)}, \ldots, v_{(l)}, \ldots, v_{(k)}\big) = -\,a\big(v_{(1)}, \ldots, v_{(l)}, \ldots, v_{(j)}, \ldots, v_{(k)}\big).$$
Denote the set of algebraic $k$-forms by $\Lambda^k(V)$. By convention, $\Lambda^0(V)$ is given by $\mathbb{R}$. Also, $\Lambda^1(V)$ is
identified with the dual space $V^*$ (the antisymmetry requirement is empty for $k = 1$).
With regard to the discussion in Section 2.3.1, algebraic $k$-forms are a special type of tensor of type
$(k, 0)$. What makes them special is the antisymmetry property under interchange of two arguments.

Proposition 2.3.1. k (V ) is a vector space.


Proof. Straightforward, and hence omitted. The point is that one can define the sum of two algebraic
k-forms and multiplication of an algebraic k-form by a scalar in the obvious way (that is, respectively, as
the sum of two functions on V k , and the multiplication of a function on V k by a scalar). You can then
show that the required properties of vector addition and scalar multiplication are satisfied.
The following shows that, under an arbitrary permutation of its arguments, the value of an algebraic
k-form either remains the same or changes sign according to the sign of the permutation.

Proposition 2.3.2. For $a \in \Lambda^k(V)$ and for $\sigma \in S_k$,
$$a\big(v_{(\sigma(1))}, \ldots, v_{(\sigma(k))}\big) = \mathrm{sgn}\,\sigma\; a\big(v_{(1)}, \ldots, v_{(k)}\big).$$
Proof. From Proposition 2.2.1, $\sigma$ can be expressed as a product of transpositions, let's say $m$ of them.
From the antisymmetry property of algebraic $k$-forms, each transposition produces a change of sign.
Therefore,
$$a\big(v_{(\sigma(1))}, \ldots, v_{(\sigma(k))}\big) = (-1)^m\, a\big(v_{(1)}, \ldots, v_{(k)}\big).$$
From Proposition 2.2.6, $(-1)^m = \mathrm{sgn}\,\sigma$.


Example 2.3.3. Given u, v and w R3 , let

a(u, v, w) = (u v) w.

Show that a is an algebraic 3-form on R3 . See Problem Sheet 6.

2.4 Basis k-forms


Let $e_{(1)}, \ldots, e_{(n)}$ denote a basis for $V$. Given $v \in V$, we write
$$v = v^i e_{(i)}.$$
We introduce some notation. Let $I = (i_1, \ldots, i_k)$ denote an ordered $k$-tuple of indices, where $1 \le i_r \le n$
($I$ is also called a multi-index). We introduce a Kronecker delta for pairs of $k$-tuples of indices, defined
by
$$\delta(I, J) = \begin{cases} 1, & i_1 = j_1, \ldots, i_k = j_k, \\ 0, & \text{otherwise.} \end{cases}$$
Given $\sigma \in S_k$, define
$$\sigma(I) = \big(i_{\sigma^{-1}(1)}, \ldots, i_{\sigma^{-1}(k)}\big).$$
That is, $\sigma(I)$ is a permutation of the indices comprising $I$. For example, if
$$I = (2, 4, 7, 6) \quad \text{and} \quad \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 1 & 4 \end{pmatrix},$$
then
$$\sigma(I) = (7, 2, 4, 6).$$
Proposition 2.4.1. Let $\sigma, \tau \in S_k$. Then
$$\sigma(\tau(I)) = (\sigma\tau)(I).$$
Proof. Let $J = (j_1, \ldots, j_k) = \tau(I)$ and $K = (k_1, \ldots, k_k) = \sigma(J)$. Then $j_r = i_{\tau^{-1}(r)}$, and $k_r = j_{\sigma^{-1}(r)}$, so
that
$$k_r = i_{\tau^{-1}(\sigma^{-1}(r))} = i_{(\sigma\tau)^{-1}(r)}.$$
Therefore, $K$, which we defined to be $\sigma(\tau(I))$, is also given by $(\sigma\tau)(I)$, which is what we wanted to
show.
Let $E_{(I)} \in V^k$ denote the $k$-tuple of basis vectors given by
$$E_{(I)} = \big(e_{(i_1)}, \ldots, e_{(i_k)}\big).$$
Given $a \in \Lambda^k(V)$, we write
$$a_I = a(E_{(I)}).$$
We will call the $a_I$'s the coefficients of $a$ with respect to the basis $e_{(i)}$. An alternative notation for the
coefficients is
$$a_{i_1 \cdots i_k} = a\big(e_{(i_1)}, \ldots, e_{(i_k)}\big).$$
Note that, by the antisymmetry property,
$$a_{i_1 \cdots i_k} = 0 \quad \text{if any two of the indices } i_1, \ldots, i_k \text{ are the same.}$$
The notation $a_I$ has the advantage of being more concise.

The linearity property implies that an algebraic $k$-form is completely determined by its coefficients.
To see this, note that
$$a\big(v_{(1)}, \ldots, v_{(k)}\big) = a\big(v_{(1)}^{i_1} e_{(i_1)}, \ldots, v_{(k)}^{i_k} e_{(i_k)}\big) = v_{(1)}^{i_1} \cdots v_{(k)}^{i_k}\, a\big(e_{(i_1)}, \ldots, e_{(i_k)}\big) = a_{i_1 \cdots i_k}\, v_{(1)}^{i_1} \cdots v_{(k)}^{i_k}.$$
For example, if $a$ is an algebraic 2-form, then
$$a(u, v) = a_{ij}\, u^i v^j.$$

Definition 2.4.2. Let $J = (j_1, \ldots, j_k)$. The basis $k$-form $F^{(J)}$ is the algebraic $k$-form on $V$ defined by
$$F^{(J)}_I := F^{(J)}(E_{(I)}) = \begin{cases} 0, & \text{if } j_r = j_s \text{ for some } r \ne s, \\ \mathrm{sgn}\,\sigma, & \text{if } J = \sigma(I), \\ 0, & \text{otherwise.} \end{cases} \qquad (26)$$
That is, $F^{(J)}(E_{(I)})$ vanishes if $J$ contains repeated indices, regardless of what $I$ is. Otherwise,
$F^{(J)}(E_{(I)})$ vanishes if $I$ is not a permutation of $J$, while if $I$ is a permutation of $J$, then $F^{(J)}(E_{(I)})$ is
equal to the sign of that permutation. An equivalent way to write the preceding formula for $F^{(J)}(E_{(I)})$,
which will be useful, is given by the following:

Proposition 2.4.3. Suppose $J$ consists of distinct indices. Then
$$F^{(J)}(E_{(I)}) = \sum_{\sigma \in S_k} \mathrm{sgn}\,\sigma\; \delta\big(\sigma(I), J\big).$$
Proof. This follows by comparison with (26). If $I$ is not a permutation of $J$, then $\delta(\sigma(I), J)$ vanishes for
all $\sigma$, and the sum vanishes. If $J = \pi(I)$ for some permutation $\pi$ (note that there can be only one such
$\pi$), then all terms in the sum vanish except for $\sigma = \pi$.

We should verify that the $F^{(J)}$'s really are algebraic $k$-forms. This is done in the following:
Proposition 2.4.4.
$$F^{(J)} \in \Lambda^k(V).$$
Proof. $F^{(J)}$ is defined by its values on $k$-tuples of basis vectors; that is,
$$F^{(J)}\big(v_{(1)}, \ldots, v_{(k)}\big) := v_{(1)}^{i_1} \cdots v_{(k)}^{i_k}\, F^{(J)}\big(e_{(i_1)}, \ldots, e_{(i_k)}\big).$$
Therefore, $F^{(J)}$ is automatically linear in each argument.

We need to check that $F^{(J)}$ satisfies the antisymmetry property. It is enough to consider the case
where the arguments of $F^{(J)}$ are basis vectors. To this end, we must show that
$$F^{(J)}\big(E_{(\tau(I))}\big) = \mathrm{sgn}\,\tau\; F^{(J)}\big(E_{(I)}\big)$$
for all $\tau \in S_k$. From Proposition 2.4.3,
$$F^{(J)}\big(E_{(\tau(I))}\big) = \sum_{\sigma \in S_k} \mathrm{sgn}\,\sigma\; \delta\big(\sigma(\tau(I)), J\big).$$
From Proposition 2.4.1,
$$\delta\big(\sigma(\tau(I)), J\big) = \delta\big((\sigma\tau)(I), J\big).$$
Since $\mathrm{sgn}\,\sigma = \mathrm{sgn}\,\tau\; \mathrm{sgn}(\sigma\tau)$, we may write that
$$F^{(J)}\big(E_{(\tau(I))}\big) = \mathrm{sgn}\,\tau \sum_{\sigma \in S_k} \mathrm{sgn}(\sigma\tau)\; \delta\big((\sigma\tau)(I), J\big).$$
By Proposition 2.2.7,
$$\sum_{\sigma \in S_k} \mathrm{sgn}(\sigma\tau)\; \delta\big((\sigma\tau)(I), J\big) = \sum_{\sigma \in S_k} \mathrm{sgn}(\sigma)\; \delta\big(\sigma(I), J\big) = F^{(J)}\big(E_{(I)}\big).$$
Therefore,
$$F^{(J)}\big(E_{(\tau(I))}\big) = \mathrm{sgn}\,\tau\; F^{(J)}\big(E_{(I)}\big),$$
as required.

This leads to the following determinant formula for the basis $k$-forms:

Proposition 2.4.5.
$$F^{(J)}\big(v_{(1)}, \ldots, v_{(k)}\big) = \det \begin{pmatrix}
v_{(1)}^{j_1} & v_{(2)}^{j_1} & \cdots & v_{(k)}^{j_1} \\
v_{(1)}^{j_2} & v_{(2)}^{j_2} & \cdots & v_{(k)}^{j_2} \\
\vdots & \vdots & & \vdots \\
v_{(1)}^{j_k} & v_{(2)}^{j_k} & \cdots & v_{(k)}^{j_k}
\end{pmatrix}.$$
Proof. Using linearity and Proposition 2.4.3, we have that
$$F^{(J)}\big(v_{(1)}, \ldots, v_{(k)}\big) = \sum_I v_{(1)}^{i_1} \cdots v_{(k)}^{i_k}\, F^{(J)}\big(E_{(I)}\big) = \sum_{\sigma \in S_k} \sum_I \mathrm{sgn}(\sigma)\, v_{(1)}^{i_1} \cdots v_{(k)}^{i_k}\, \delta\big(\sigma(I), J\big) = \sum_{\sigma \in S_k} \mathrm{sgn}(\sigma)\, v_{(1)}^{j_{\sigma(1)}} \cdots v_{(k)}^{j_{\sigma(k)}}.$$
This last expression can be recognised as the formula for the determinant of the matrix given in the
statement of the proposition. Note that, for the sake of being very explicit, we have put in the summation
over $I$, which really means
$$\sum_I = \sum_{i_1 = 1}^{n} \cdots \sum_{i_k = 1}^{n},$$
although it would be implied by the summation convention. The point, rather obvious but perhaps
worth spelling out, is that for given $\sigma$ and $J$, there is precisely one choice of indices $I$ for which $\sigma(I) = J$.

Example 2.4.6 (Examples of basis $k$-forms).

a) $F^{(2)}(u) = f^{(2)}(u) = u^2$. In general, for $k = 1$, $F^{(j)}$ coincides with the dual basis vector $f^{(j)}$.

b)
$$F^{(1,3)}(u, v) = u^1 v^3 - u^3 v^1 = \det \begin{pmatrix} u^1 & v^1 \\ u^3 & v^3 \end{pmatrix}.$$

c)
$$F^{(1,4,3)}(u, v, w) = u^1 v^4 w^3 + u^3 v^1 w^4 + u^4 v^3 w^1 - u^4 v^1 w^3 - u^3 v^4 w^1 - u^1 v^3 w^4 = \det \begin{pmatrix} u^1 & v^1 & w^1 \\ u^4 & v^4 & w^4 \\ u^3 & v^3 & w^3 \end{pmatrix}.$$
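The determinant formula of Proposition 2.4.5 is easy to evaluate by machine. The sketch below is not part of the notes; the vectors are chosen arbitrarily.

```python
import numpy as np

def basis_form(J, vectors):
    """Evaluate the basis k-form F^(J) on k vectors of R^n via the determinant
    formula of Proposition 2.4.5 (J uses 1-based indices, as in the notes)."""
    rows = [j - 1 for j in J]
    M = np.array([[v[r] for v in vectors] for r in rows])
    return np.linalg.det(M)

u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.0, 1.0, 5.0, 2.0])
w = np.array([2.0, 0.0, 1.0, 1.0])

print(basis_form((1, 3), [u, v]))          # u^1 v^3 - u^3 v^1 = 1*5 - 3*0 = 5
print(basis_form((1, 4, 3), [u, v, w]))    # the 3x3 determinant of Example 2.4.6(c)
print(basis_form((1, 4, 3), [u, v, w]) + basis_form((4, 1, 3), [u, v, w]))  # 0: swapping indices in J flips the sign
```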

The following shows that the basis $k$-forms are highly redundant; the basis $k$-forms $F^{(J)}$ and $F^{(\sigma(J))}$ differ
by at most a sign.
Proposition 2.4.7 (Basis $k$-forms and permutations). For $\sigma \in S_k$,
$$F^{(\sigma(J))} = \mathrm{sgn}\,\sigma\; F^{(J)}.$$
Proof. From Proposition 2.4.3,
$$F^{(\sigma(J))}\big(E_{(I)}\big) = \sum_{\pi \in S_k} \mathrm{sgn}\,\pi\; \delta\big(\pi(I), \sigma(J)\big). \qquad (27)$$
But
$$\delta\big(\pi(I), \sigma(J)\big) = \delta\big(\sigma^{-1}(\pi(I)), J\big) = \delta\big((\sigma^{-1}\pi)(I), J\big),$$
using Proposition 2.4.1, and
$$\mathrm{sgn}\,\pi = \mathrm{sgn}\,\sigma\; \mathrm{sgn}(\sigma^{-1}\pi).$$
Substituting the preceding into (27), we get
$$F^{(\sigma(J))}\big(E_{(I)}\big) = \mathrm{sgn}\,\sigma \sum_{\pi \in S_k} \mathrm{sgn}(\sigma^{-1}\pi)\; \delta\big((\sigma^{-1}\pi)(I), J\big) = \mathrm{sgn}\,\sigma \sum_{\pi \in S_k} \mathrm{sgn}(\pi)\; \delta\big(\pi(I), J\big) = \mathrm{sgn}\,\sigma\; F^{(J)}\big(E_{(I)}\big),$$
where we have used Proposition 2.2.7. As this holds for all $I$, it follows that $F^{(\sigma(J))} = \mathrm{sgn}\,\sigma\; F^{(J)}$, as
required.
Example 2.4.8.

a) $F^{(2,4)} = -F^{(4,2)}$

b) $F^{(2,4,2)} = 0$

c) $F^{(1,3,5,7)} = -F^{(7,3,5,1)} = F^{(7,5,3,1)}$

Proposition 2.4.9 (Expansion of general k-form). Let a ∈ Λ^k(V). Then

a = (1/k!) a_J F^(J),                                                  (28)

where we use the summation convention for J (that is, there is a sum over each index j_r in J = (j_1, . . . , j_k)).

Proof. We evaluate both sides on E_(I). On the left-hand side, we have

a(E_(I)) = a_I.                                                        (29)

On the right-hand side, the only terms which contribute are those for which the indices in J are distinct
(otherwise, F^(J) = 0). Using Proposition 2.4.3, we get

(1/k!) a_J F^(J)(E_(I)) = (1/k!) Σ_{σ∈S_k} sgn(σ) a_J δ(σ(I), J) = (1/k!) Σ_{σ∈S_k} sgn(σ) a_{σ(I)}.   (30)

From Proposition 2.3.2,

a_{σ(I)} = sgn(σ) a_I.

Therefore, (30) becomes

(1/k!) Σ_{σ∈S_k} sgn^2(σ) a_I = (1/k!) Σ_{σ∈S_k} a_I = a_I,

which is the same as (29).

Let J' denote a k-tuple of indices which are distinct and in ascending order. That is, J' = (j_1, . . . , j_k)
with j_1 < j_2 < · · · < j_k.

Proposition 2.4.10. The F^(J')s form a basis for Λ^k(V), and for all a ∈ Λ^k(V),

a = Σ_{J'} a_{J'} F^(J').                                              (31)

[Note that, in contrast to (28), there is no factor of 1/k!; its absence is balanced by the fact that the
sum over J' in (31) is restricted to ascending k-tuples.]

Proof. First, we show that the F^(J')s are linearly independent. Suppose

Σ_{J'} c_{J'} F^(J') = 0.

We want to show that c_{J'} = 0. To do this, we apply both sides to E_(I'). We have that

F^(J')(E_(I')) = Σ_{σ∈S_k} sgn(σ) δ(σ(I'), J') = δ(I', J'),

since σ(I') and J' can coincide only if σ = e (both I' and J' are in ascending order). Therefore,

Σ_{J'} c_{J'} F^(J')(E_(I')) = c_{I'} = 0,

as required.
Next, we show that (31) holds, which in particular implies that the F^(J')s span Λ^k(V). We start
with Proposition 2.4.9,

a = (1/k!) a_J F^(J).

As every J is related to a unique J' by a unique permutation σ, we can replace the (implicit) sum over
J by sums over J' and σ to get

a = (1/k!) Σ_{J'} Σ_{σ∈S_k} a_{σ(J')} F^(σ(J')).

But a_{σ(J')} = sgn(σ) a_{J'} from Proposition 2.3.2, and F^(σ(J')) = sgn(σ) F^(J') from Proposition 2.4.7. Thus,

a = (1/k!) Σ_{J'} Σ_{σ∈S_k} (sgn σ)^2 a_{J'} F^(J') = Σ_{J'} a_{J'} F^(J'),

as required.

The number of distinct k-tuples J' whose indices j_r are in strictly ascending order is the number
of ways of choosing k distinct things from n things, which is given by the binomial coefficient
n!/(k!(n - k)!). It follows from Proposition 2.3.1 that

dim Λ^k(V) = n!/(k!(n - k)!).

Note that, for k = 0, this is consistent with taking Λ^0(V) = R, since the binomial coefficient with k = 0
equals 1. Note too that Λ^n(V) is also one-dimensional, while Λ^k(V) for k > n is zero-dimensional, and
consists of a single element, namely the trivial form which maps everything to zero.
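As a quick illustrative check (not in the original notes; plain Python, no special library), the ascending
k-tuples and the dimension count can be enumerated directly:

from itertools import combinations
from math import comb

n = 4
for k in range(n + 2):
    ascending = list(combinations(range(1, n + 1), k))   # the ascending k-tuples J'
    assert len(ascending) == comb(n, k)                  # dimension of the space of k-forms
    print(k, comb(n, k), ascending[:3])
# For k > n there are no ascending k-tuples, matching the zero-dimensional case.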

2.5 Wedge product


Definition 2.5.1. Let a ∈ Λ^k(V) be an algebraic k-form and b ∈ Λ^l(V) be an algebraic l-form. Their
wedge product, denoted a ∧ b, is the algebraic (k + l)-form defined by

a ∧ b (v_(1), . . . , v_(k+l)) = (1/(k! l!)) Σ_{σ∈S_{k+l}} sgn(σ) a(v_(σ(1)), . . . , v_(σ(k))) b(v_(σ(k+1)), . . . , v_(σ(k+l))).   (32)

In words, the value of a ∧ b on a set of (k + l) vectors is obtained by permuting the vectors, evaluating
a on the first k of them and b on the last l of them, multiplying the values of a and b so obtained,
summing the result over all permutations counted with sign, and finally dividing by k! l!. Note
that permutations which differ only by permutations among the first k vectors and/or among the last l
vectors produce the same contribution to the sum. The factor of k! l! compensates for this.

The wedge product appears naturally in many problems in geometry, topology and physics (field
theories). For us, the wedge product will be important when we come to discuss differential forms. As
we will show below, general k-forms can be expressed as sums of wedge products of one-forms. Many
general results and calculations can be simplified by carrying them out for one-forms, and then extending
the results to general k-forms by means of the wedge product.

Example 2.5.2 (Examples of wedge product).

a) k = 0. If a is an algebraic zero-form, i.e. a ∈ Λ^0(V) = R, then the wedge product reduces to scalar
multiplication, i.e.
a ∧ b = ab.

b) If a and b are algebraic one-forms, then a ∧ b is an algebraic two-form, and

a ∧ b (u, v) = a(u)b(v) - a(v)b(u).

c) If a is an algebraic one-form and b is an algebraic two-form, then a ∧ b is an algebraic three-form,
and
a ∧ b (u, v, w) = a(u)b(v, w) + a(v)b(w, u) + a(w)b(u, v).

Note that in the formula (32), the sum over σ ∈ S_3 produces six terms, and there is also a factor
of 1/2 in front. However, each term in the sum appears, in effect, twice; for example a(u)b(v, w) is
accompanied by -a(u)b(w, v), which is equal to a(u)b(v, w), so that the six terms can be reduced to
three, and the factor of 1/2 cancels.
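Definition (32) is easy to implement directly. The following Python sketch (our own, not from the notes)
builds the wedge product of a k-form and an l-form, each represented as a function of its vector
arguments, and checks example b) on a pair of one-forms.

from itertools import permutations
from math import factorial

def sign(p):
    # Sign of a permutation given as a tuple of indices, via inversion counting.
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def wedge(a, k, b, l):
    """Return (a wedge b) as a function of k+l vectors; a, b take k resp. l vectors."""
    def ab(*vs):
        total = 0.0
        for p in permutations(range(k + l)):
            total += sign(p) * a(*(vs[i] for i in p[:k])) * b(*(vs[i] for i in p[k:]))
        return total / (factorial(k) * factorial(l))
    return ab

# Check example b): for one-forms a, b, (a ^ b)(u, v) = a(u) b(v) - a(v) b(u).
a = lambda x: 2 * x[0] - x[2]          # a = 2 f^(1) - f^(3)
b = lambda x: x[1] + 3 * x[3]          # b = f^(2) + 3 f^(4)
u, v = (1, 2, 0, 3), (0, 1, 4, 1)
print(wedge(a, 1, b, 1)(u, v), a(u) * b(v) - a(v) * b(u))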

The wedge product satisfies the following properties:

Proposition 2.5.3 (Properties of the wedge product).

i) a b is an algebraic (k + l)-form. That is, a b, as defined by (32), is linear in each argument and
changes sign under interchange of any pair of arguments.

ii) Linearity. If a is an algebraic k-form and b and c are algebraic l-forms, then

a ∧ (b + c) = a ∧ b + a ∧ c.

iii) (Anti)commutativity. If a is an algebraic k-form and b is an algebraic l-form, then

a ∧ b = (-1)^{kl} b ∧ a.

In other words, if either k or l is even, then a ∧ b = b ∧ a. If both k and l are odd, then a ∧ b = -b ∧ a.

iv) Associativity. If a is an algebraic k-form, b an algebraic l-form, and c an algebraic m-form, then

a ∧ (b ∧ c) = (a ∧ b) ∧ c.

v) Basis k-forms. Let J = (j_1, . . . , j_k). Then

F^(J) = f^(j_1) ∧ · · · ∧ f^(j_k).

The proofs of these results are a bit involved. They are not hard to understand, and the results
are easy to verify in examples. The difficulty is the book-keeping required for a general argument.
For this reason, the proofs are given separately in the following section, Section 2.6. The proofs are
non-examinable.
As an illustration, let us verify Proposition 2.5.3 v) for a simple case by showing that

F^(i,j) = f^(i) ∧ f^(j).

Let us apply both sides to vectors u and v. From Proposition 2.4.5,

F^(i,j)(u, v) = u^i v^j - u^j v^i,

while from (32),

f^(i) ∧ f^(j) (u, v) = f^(i)(u)f^(j)(v) - f^(i)(v)f^(j)(u) = u^i v^j - u^j v^i,

as claimed. A consequence of Proposition 2.5.3 v) is that a general algebraic k-form can be written as a
sum of wedge products of algebraic one-forms.
We may derive the following component formula for the wedge product. Let a be an algebraic k-form
and b an algebraic l-form given by

a = (1/k!) a_J F^(J),    b = (1/l!) b_M F^(M).

Then
a ∧ b = (1/(k! l!)) a_J b_M F^(J,M),

where J = (j_1, . . . , j_k), M = (m_1, . . . , m_l), (J, M) = (j_1, . . . , j_k, m_1, . . . , m_l), and J and M are summed over.
Sometimes we'll write this with the indices explicitly written out:

a ∧ b = (1/(k! l!)) a_{j_1...j_k} b_{m_1...m_l} f^(j_1) ∧ · · · ∧ f^(j_k) ∧ f^(m_1) ∧ · · · ∧ f^(m_l).
Here is an example of this formula:

(f^(1) ∧ f^(3) - 2 f^(1) ∧ f^(4)) ∧ (f^(1) - 3f^(2) + 4f^(3) + 5f^(4))

= 3 f^(1) ∧ f^(2) ∧ f^(3) - 6 f^(1) ∧ f^(2) ∧ f^(4) + 13 f^(1) ∧ f^(3) ∧ f^(4).

2.6 *Proof of properties of the wedge product [nonexaminable]
These notes contain proofs for the properties of the wedge product stated without proof in the previous
section. This material is not examinable. Throughout, let V denote an n-dimensional vector space, and
e(1) , . . . , e(n) a basis on V . Let f (1) , . . . , f (n) denote the corresponding basis on the dual space V .

Definition 2.6.1. Let a k (V ) and b l (V ). Their wedge product, denoted a b, is given by


1 X
a b(v(1)) , . . . , v(k+l) ) = sgn a(v((1)) , . . . , v((k)) )b(v((k+1)) , . . . , v((k+l)) ).
k! l!
Sk+l

Proposition 2.6.2. a b is a (k + l)-form.


Proof.
We need to show that a b is linear and antisymmetric in its arguments.

Antisymmetry: Given k + l vectors v(1) , . . . , v(k+l) , reorder them by a permutation to define k + l


vectors w(1) , . . . , w(k+l) given by
w(r) = v( (r)) .
Then

a b(v( (1)) , . . . , v( (k+l)) ) = a b(w(1) , . . . , w(k+l) )


1 X
= sgn a(w((1)) , . . . , w((k)) )b(w((k+1)) , . . . , w((k+l)) )
k! l!
Sk+l
1 X
= sgn a(v( (1)) , . . . , v( (k)) )b(v( (k+1)) , . . . , v( (k+l)) )
k! l!
Sk+l
1
sgn ( 1 ) a(v((1)) , . . . , v((k)) )b(v((k+1)) , . . . , v((k+l)) ),
X
=
k! l!
Sk+l

where in the last line we have used Proposition 2.2.7 to replace by 1 in the summand. Since
sgn ( 1 ) = sgn sgn , we get that

a b(v( (1)) , . . . , v( (k+l)) ) = a b(w(1) , . . . , w(k+l) )


1 X
= sgn sgn a(v((1)) , . . . , v((k)) )b(v((k+1)) , . . . , v((k+l)) )
k! l!
Sk+l

= sgn a b(v(1) , . . . , v(k+l) ),

as required.
Linearity: From the antisymmetry property just established, it suffices to prove linearity with respect
to the (k + l)th argument. The idea is that linearity for a b follows from the fact that a and b are linear.
However, a little book-keeping is needed to keep track of whether, under a permutation , the (k + l)th
argument belongs to a or b. Let v(1) , . . . v(k+l1) be k + l 1 vectors in V , w and x two additional vectors
in V , and , R.
Define three sets of k + l vectors, A(i) , B(i) and C(i) as follows: For i < k + l,

A(i) = B(i) = C(i) = v(i) ,

while for i = k + l,
A(k+l) = w, B(k+l) = x, C(k+l) = w + x.
We will show that

a b(C(1) , . . . , C(k+l) ) = a b(A(1) , . . . , A(k+l) ) + a b(B(1) , . . . , B(k+l) ),

which amounts to linearity in the (k + l)th argument.


We have that
1 X
a b(C(1)) , . . . , C(k+l) ) = sgn a(C((1)) , . . . , C((k)) )b(C((k+1)) , . . . , C((k+l)) ).
k! l!
Sk+l

For given Sk+l , define r by
(r) = k + l,
or, equivalently, r = 1 (k + l). If r k, we have that
a(C((1)) , . . . , C((k)) ) = a(C((1)) , . . . , w + x, . . . , C((k)) ),

where, on the right-hand side, w + x is the rth argument of a. By the linearity of a,

a(C((1)) , . . . , w + x, . . . , C((k)) )
= a(C((1)) , . . . , w, . . . , C((k)) ) + a(C((1)) , . . . , x, . . . , C((k)) )
= a(A((1)) , . . . , A((k)) ) + a(B((1)) , . . . , B((k)) ).

For this same ,


b(C((k+1)) , . . . , C((k+l)) ) = b(A((k+1)) , . . . , A((k+l)) ) = b(B((k+1)) , . . . , B((k+l)) ).

It follows that, for this same ,

a(C((1)) , . . . , C((k)) )b(C((k+1)) , . . . , C((k+l)) )


= a(A((1)) , . . . , A((k)) )b(A((k+1)) , . . . , A((k+l)) )
+ a(B((1)) , . . . , B((k)) )b(B((k+1)) , . . . , B((k+l)) ).

In case r > k, a similar argument (which we wont repeat) establishes that the preceding formula still
holds. Therefore
1 X
a b(C(1)) , . . . , C(k+l) ) = sgn
k! l!
Sk+l

a(A((1)) , . . . , A((k)) )b(A((k+1)) , . . . , A((k+l)) )


!
+ a(B((1)) , . . . , B((k)) )b(B((k+1)) , . . . , B((k+l)) )

= a b(A(1) , . . . , A(k+l) ) + a b(B(1) , . . . , B(k+l) ),

as required.

The following is a formula for the wedge product of more than two forms. It wasnt given in lectures,
but is used below to establish that the wedge product is associative as well as to express the basis k-forms
F (J) as wedge products of the f (j) s.
Proposition 2.6.3. Let a(1) , a(2) , . . . be a sequence of algebraic forms on V of degrees k1 , k2 , . . . Let
b(1) , b(2) , . . . denote the successive wedge products of the a(i) s, defined by

b(1) = a(1) ,
b(j) = b(j1) a(j) , j > 1.

That is, b(2) = a(1) a(2) , b(3) = (a(1) a(2) ) a(3) , etc. b(q) is an lq -form, where
lq = k 1 + + k q .

Then b(q) is given by

1
b(q) (v(1) , . . . , v(lq ) ) =
k1 ! kq !
X
sgn a(1) (v((1)) , . . . , v((k1 )) )a(2) (v((l1 +1)) , . . . , v((l1 +k2 )) )
Slq
!
a(q) (v((l , . . . , v((lq )) ) .
q1 +1))

Proof. We proceed by induction on q , the number of forms in the product. If q = 2, the formula above
for b(2) coincides with the definition of the wedge product. For q > 2, assume the formula holds for q 1,
and prove that it holds for q , as follows. From the definition of the wedge product,

b(q) (v(1) , . . . , v(lq ) ) = b(q1) a(q) (v(1) , . . . , v(lq ) )


!
1 X
= sgn b(q1) (v((1)) , . . . , v((lq1 )) )a(q) (v((lq1 +1)) , . . . , v((lq )) ) .
lq1 !kq !
Slq

From the induction hypothesis,


1 X
b(q1) (v((1)) , . . . , v((l
q1 ))
)= sgn
k1 ! kq1 !
Sl
q1

a(1) (v(( (1))) , . . . , v(( (k ))) )a(2) (v(( (l +1))) , . . . , v(( (l +k ))) )
1 1 1 2
!
a(q1) (v(( (l , . . . , v(( (l ) .
q2 +1))) q1 )))

Substitute above to get


1 1 X X
b(q) (v(1) , . . . , v(lq ) ) = sgn sgn
k1 ! kq1 ! lq1 !kq !
Sl Slq
q1

a(1) (v(( (1))) , . . . , v(( (k ))) )a(2) (v(( (l +1))) , . . . , v(( (l +k ))) )
1 1 1 2

a(q1) (v(( (l , . . . , v(( (l )


q2 +1))) q1 )))
!
a(q) (v((l , . . . , v((lq )) ) .
q1 +1))

Given Slq1 , define Slq by

(r) = (r), 1 r lq1 ,


(r) = r, lq1 < r lq .

That is, permutes the numbers 1 through lq1 in the same way as does while leaving the numbers lq1
through lq alone (note that lq s are an increasing sequence of numbers). Then the preceding expression
can be re-written as
1 1 X X
b(q) (v(1) , . . . , v(lq ) ) = sgn sgn
k1 ! kq1 ! lq1 !kq !
Sl Slq
q1

a(1) (v(
(1)) , . . . , v(
(k1 )) )a(2) (v( (l1 +k2 )) )
(l1 +1)) , . . . , v(
!
a(q1) (v(
(l , . . . , v( ) a(q) (v( , . . . , v(
(lq )) ) .
q2 +1)) (l q1 )) (l q1 +1))

Use Proposition 2.2.7 to replace by 1 in the sum over . Then sgn is replaced by sgn sgn 1 =
sgn sgn . It is clear that sgn = sgn (both permutations can be realised by the same set of trans-
positions), so that the factors of sgn cancel from the sum above. As the summand no longer depends
on , the sum over Slq1 yields a factor of lq1 !, which cancels the same factor appearing in the
denominator. We get that
1 X
b(q) (v(1) , . . . , v(lq ) ) = sgn
k1 ! kq !
Slq
!
a(1) (v((1)) , . . . , v((k )) )a(2) (v((l +1)) , . . . , v((l +k )) ) a(q) (v((l , . . . , v((lq )) ) ,
1 1 1 2 q1 +1))

as required.

Proposition 2.6.4. The wedge product satisfies the following properties:

a (b + c) = a b + a c, , R, a k (V ), b, c l (V ) (Linearity),
a b = (1)kl b a, a k (V ), b l (V ) ((Anti)commutativity),
a (b c) = (a b) c k (V ), b l (V ), c p (V ) (Associativity).

Proof. Linearity follows immediately from a direct calculation:

(a (b + c))(v(1) , . . . , v(k+l) )

1 X
= sgn a(v((1)) , . . . , v((k)) ) b(v((k+1)) , . . . , v((k+l)) )
k! l!
Sk+l
!
+ c(v((k+1)) , . . . , v((k+l)) )

1 X
= sgn a(v((1)) , . . . , v((k)) )b(v((k+1)) , . . . , v((k+l)) )
k! l!
Sk+l
1 X
+ sgn a(v((1)) , . . . , v((k)) )c(v((k+1)) , . . . , v((k+l)) )
k! l!
Sk+l

= a b(v(1) , . . . , v(k+l) ) + a c(v(1) , . . . , v(k+l) ).

For (anti)commutativity, define the cyclic permutation Sk+l by

(1) = k + 1, (2) = k + 2, . . . , (l) = k + l,


(l + 1) = 1, (l + 2) = 2, . . . , (l + k) = k.

We have that
1 X
a b(v(1) , . . . , v(k+l) ) = sgn a(v((1)) , . . . , v((k)) )b(v((k+1)) , . . . , v((k+l)) ).
k! l!
Sk+l

Replace by in the summand above (Proposition 2.2.7 again) to get that

a b(v(1) , . . . , v(k+l) )
1 X
= sgn sgn a(v((k+1)) , . . . , v((k+l)) )b(v((1)) , . . . , v((l)) )
k! l!
Sk+l

= sgn b a(v(1) , . . . , v(k+l) ),

ie
a b = sgn b a.

It remains to calculate sgn . We note that = k , where is the cyclic permutation given by

(i) = i + 1, i < k + l, (k + l) = 1.

From Problem 6.4,


sgn = (1)k+l1 ,
so that
sgn = (1)k(k+l1) .
But k2 k = k(k 1), so k2 k is always even. Therefore,

sgn = (1)kl ,

as required.

For associativity, we note that Proposition 2.6.3 gives an explicit formula for (a b) c, namely
1 1 1
(a b) c (v(1) , . . . , v(k+l+p) ) =
k! l! p!
X
sgn a(v((1)) , . . . , v((k)) )b(v((k+1)) , . . . , v((k+l)) )c(v((k+l+1)) , . . . , v((k+l+p)) ). (33)
Sk+l+p

For a (b c), use (anti)commutativity to bring it into a form where we can apply Proposition 2.6.3
That is,
a (b c) = (1)k(l+p) (b c) a,
so that

a (b c)(v(1) , . . . , v(k+l+p) ) = (1)k(l+p) )(b c) a (v(1) , . . . , v(k+l+p) )


1
= (1)k(l+p)
X
sgn b(v((1)) , . . . , v((l)) )c(v((l+1)) , . . . , v((l+p)) )a(v((l+p+1)) , . . . , v((l+p+k)) ).
k! l! p!
Sk+l+p

Arguing as we did for (anti)commutativity, let denote the permutation which shifts every number up
by k, ie (
r + k, 1 r l + p,
(r) =
r (l + p), l + p + 1 r l + p + k.
Replace by in the preceding (using Proposition 2.2.7) to obtain

a (b c)(v(1) , . . . , v(k+l+p) ) = (1)k(l+p) )(b c) a(v(1) , . . . , v(k+l+p) ) =


1 1 1
= (1)k(l+p) )
k! l! p!
X
sgn ( )b(v( (1)) , . . . , v( (l)) )c(v( (l+1)) , . . . , v( (l+p)) )a(v( (l+p+1)) , . . . , v( (l+p+k)) )
Sk+l+p
1 1 1
= (1)k(l+p) )
k! l! p!
X
sgn b(v((k+1)) , . . . , v((k+l)) )c(v((k+l+1)) , . . . , v((k+l+p)) )a(v((1)) , . . . , v((k) )). (34)
Sk+l+p

Equations (33) and (34) agree apart from the sign factor (-1)^{k(l+p)} sgn(τ). But τ = ρ^k, where ρ ∈ S_{k+l+p}
is the cyclic permutation that shifts every number up by 1, with k + l + p mapped back to 1. From
Problem 6.4, sgn(ρ) = (-1)^{k+l+p-1}. Therefore,

sgn(τ) = (sgn ρ)^k = (-1)^{k(k+l+p-1)} = (-1)^{k(l+p)},


since, as we noted above, k^2 - k = k(k - 1) is always even. Therefore,

(-1)^{k(l+p)} sgn(τ) = 1,

so that equations (33) and (34) agree, and associativity is confirmed.

Proposition 2.6.5. Let J = (j1 , . . . , jk ). Then

F (J) = f (j1 ) f (jk ) .

Proof. It is enough to show that both sides agree when their arguments are basis vectors. Let I =
(i1 , . . . , ik ) and E(I) = (e(i ) , . . . , e(i ) ). Then from Proposition 2.6.3,
1 k

f (j1 ) f (jk ) (E(I) ) = sgn f (j1 ) (e(i ) f (jk ) (e(i


X
)
(1) ) (k) )
Sk
j
sgn ij1
X X
= i k = sgn ((I), J).
(1) (k)
Sk Sk

But this last expression coincides with the definition of F (J) (E(I) ).

2.7 Contraction
Definition 2.7.1. Let v ∈ V. We define i_v : Λ^k(V) → Λ^{k-1}(V), the contraction with v, by

i_v c = 0,  for c ∈ Λ^0(V),
i_v a (w_(1), . . . , w_(k-1)) = a(v, w_(1), . . . , w_(k-1)),  for a ∈ Λ^k(V), k > 0,

where w_(1), . . . , w_(k-1) ∈ V. Thus, i_v maps k-forms to (k - 1)-forms by fixing the first argument of the
k-form to be v. The contraction on any zero-form is defined to be zero.

Proposition 2.7.2 (Properties of contraction).

iv (a + b) = iv a + iv b, a, b k (V ),
iv (a b) = (iv a) b + (1)k a (iv b), a k (V ), b l (V ).

The proof (which is non-examinable) is given in the following section.

Example 2.7.3.

a)
i_v f^(j) = v^j = f^(j)(v).

b)
i_v (f^(1) ∧ f^(2)) = v^1 f^(2) - v^2 f^(1).

c)
i_v (f^(2) ∧ f^(4) ∧ f^(3)) = v^2 f^(4) ∧ f^(3) - v^4 f^(2) ∧ f^(3) + v^3 f^(2) ∧ f^(4).

Proposition 2.7.4 (Coordinate formula for contraction).

Let
a = (1/k!) a_{i_1 ... i_k}(x) f^(i_1) ∧ · · · ∧ f^(i_k) ∈ Λ^k(V).

Then
i_v a = (1/(k - 1)!) v^j a_{j i_2 ... i_k}(x) f^(i_2) ∧ · · · ∧ f^(i_k).
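As an aside (not in the original notes), the coordinate formula has a one-line realisation if the
antisymmetric coefficients a_{i_1...i_k} are stored as a numpy array: contraction with v is a tensor
contraction over the first index. A minimal sketch, with names of our own choosing:

import numpy as np

def contract(v, A):
    """Coefficients of i_v a, given the antisymmetric coefficient array A of a."""
    return np.tensordot(v, A, axes=([0], [0]))

# Two-form b = f^(1) ^ f^(2) on R^3: coefficients b_{12} = 1, b_{21} = -1.
B = np.zeros((3, 3))
B[0, 1], B[1, 0] = 1.0, -1.0
v = np.array([2.0, 5.0, 7.0])
print(contract(v, B))   # components of i_v b = v^1 f^(2) - v^2 f^(1), i.e. [-5.  2.  0.]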

2.8 *Proof of properties of contraction [nonexaminable]


These notes contain a proof of Proposition 2.7.2, which was omitted from the lectures.
Let V be an n-dimensional vector space, and let v V . We define iv : k (V ) k (V ), the contraction
with v , by

iv c = 0, for c 0 (V ),
iv a(w(1) , . . . , w(k1) ) = a(v, w(1) , . . . , w(k1) ), for a k (V ), k > 0,

where w(1) , . . . , w(k1) V ). Thus, iv maps k-forms to (k 1)-forms by fixing the first argument of the
k-form to be v .

We want to prove the following (Proposition 2.7.2):

iv (a + b) = iv a + iv b, a, b k (V ),
iv (a b) = (iv a) b + (1)k a (iv b), a k (V ), b l (V ).

Proof. The first property, which states the contraction is a linear map, is easy to verify. Let a and b be
algebraic k -forms. Then iv (a + b) is a (k 1)-form, which is determined by its values on an arbitrary set
of k 1 vector fields, which we denote by w(1) , . . . , w(k1) . We have that

(iv (a + b))(w(1) , . . . , w(k1) ) = (a + b)(v, w(1) , . . . , w(k1) )


= a(v, w(1) , . . . , w(k1) ) + b(v, w(1) , . . . , w(k1) )
= iv a(w(1) , . . . , w(k1) ) + iv b(w(1) , . . . , w(k1) ).

Therefore, if we omit the arguments w(1) , . . . , w(k1) , we get the (quite obvious) relation between (k 1)-
forms,
iv (a + b) = iv a + iv b.

Next we consider the contraction acting on a wedge product. The argument here is more involved. Let
a be an algebraic k-form and b an algebraic l-form. Then iv (a b) is a (k + l 1)-form, which is determined
by its values on an arbitrary set of k + l 1 vector fields, which we denote by w(1) , . . . , w(k+l1) . We have
that
(iv (a b))(w(1) , . . . , w(k+l1) ) = (a b)(v, w(1) , . . . , w(k+l1) ).

It will be convenient to introduce an alternative notation for the vector fields v, w(1) , . . . , w(k+l1) , so
that they can all be referred to by a single index. Thus we write

x(1) = v, x(2) = w(1) , . . . , x(k+l) = w(k+l1) .

Then from the definition of the wedge product,


1 1 X
(a b)(x(1) , . . . , x(k+l) ) = sgn a(x((1)) , . . . , x((k)) )b(x((k+1)) , . . . , x((k+l)) ).
k! l!
Sk+l

r
Given 1 r k + l, let Sk+l denote the subset of Sk+l consisting of permutations for which (r) = 1.
Then every permutation Sk+l belongs to a unique Sk+l r , namely the one with r = 1 (1). We have
that
X k+l
X X
= ,
Sk+l r=1 S r
k+l

so that
k+l
X
(a b)(x(1) , . . . , x(k+l) ) = Tr , (35)
r=1
where
1 1 X
Tr = sgn a(x((1)) , . . . , x((k)) )b(x((k+1)) , . . . , x((k+l)) ).
k! l! r
Sk+l
r , it follows that (1) = (r) = 1, so that
Consider first the case where r k. Given Sk+l r1 r1
is a permutation which leaves 1 invariant, and therefore permutes the numbers 2 through k + l amongst
themselves. By shifting these numbers down by one, we can construct a permutation Sk+l1 given
by
(s) = (r1 )(s 1) + 1.

It is then straightforward to check that

x((1) ) = w(
(r1)) ,
x((r)) = v,

(s1)) for s 6= 1, r.
x((s)) = w(

Note that since r k, x((r)) appears in Tr as an argument of a. Also,

sgn = sgn r1 sgn = (1)r,1 1 sgn .

Therefore,

a(x((1)) , . . . , x((k)) )b(x((k+1)) , . . . , x((k+l)) )

= (1)r,1 1 a(w(
(r1)) , w(
(1)) , . . . , w(
(r2)) , v, w(
(r)) , . . . , w(
(k1)) )b(w(
(k)) , . . . , w(
(k+l1)) )
= a(v, w(
(1)) , . . . , w(
(k1)) )b(w(
(k)) , . . . , w(
(k+l1)) )
= (iv a)(w(
(1)) , . . . , w(
(k1)) )b(w(
(k)) , . . . , w(
(k+l1)) ),

where in the second-to-last equality an extra sign factor of (1)r,1 1 comes from interchanging the
vector fields v and w((r1) in a (if r = 1, no exchange is necessary). Substituting into the expression for
Tr , we get

1 1 X
Tr = sgn (iv a)(w((1)( , . . . , w((k1)) )b(w((k)) , . . . , w((k+l1)) )
k! l!
Sk+l1

1 1 1 X
= sgn (iv a)(w((1) , . . . , w((k1)) )b(w((k)) , . . . , w((k+l1)) )
k (k 1)! l!
Sk+l1

1
= ((iv a) b)(w(1) , . . . , w(k+l1) ).
k
Since Tr does not depend on r, summing over r between 1 and k just eliminates the factor of 1/k above.
Therefore,
k
X
Tr = ((iv a) b)(w(1) , . . . , w(k+l1) ). (36)
r=1
The case where r > k is treated similarly, but takes a bit more work. Let C1,r,k+1 Sk+l denote
the cyclic permutation which maps 1 to r, r to k + 1, and k + 1 to 1, leaving all other numbers between
r , it follows that C
1 and k + l unchanged. Given Sk+l 1,r,k+1 (1) = (r) = 1, so that C1,r,k+1 is
a permutation which leaves 1 invariant, and therefore permutes the numbers 2 through k + l amongst
themselves. By shifting these numbers down by one, we can construct a permutation Sk+l1 by

(s) = (C1,r,k+1 )(s 1) + 1.


It is then straightforward to check that

x((1)) = w(
(k)) ,
x((k+1)) = w(
(r1)) ,
x((r)) = v,

(s1)) , for s 6= 1, r, k + 1.
x((s)) = w(

Note that since r > k, x((r)) = v appears in Tr as an argument of b. Also,

sgn = sgn C1,r,k+1 sgn = (1)r,k+1 sgn

(note that if r = k + 1, then C1,r,k+1 reduces to the transposition r1 ). Therefore,

a(x((1)) , . . . , x((k)) )b(x((k+1)) , . . . , x((k+l)) )

= (1)r,k+1 a(w(
(k)) , w(
(1)) , . . . , w(
(k1)) )b(w(
(r1)) , w(
(k)) , . . . , w(
(r2)) , v, w(
(r)) , . . . , w(
(k+l1)) ).

We have that
k1
a(w(
(k)) , w(
(1)) , . . . , w(
(k1)) ) = (1) a(w(
(1)) , w(
(2)) , . . . , w(
(k)) ).

since a cyclic permutation of the arguments of a k-form produces a sign factor (1)k1 . Also,

b(w(
(r1)) , w(
(k)) , . . . , w(
(r2)) , v, w(
(r)) , . . . , w(
(k+l1)) )
= (1)r,k+1 1 b(v, w(
(k)) , . . . , w(
(r2)) , w(
(r1)) , w(
(r)) , . . . , w(
(k+l1)) )
= (1)r,k+1 1 (iv b)(w(
(k)) , . . . , w(
(k+l1)) ),

where in the second-to-last equality an extra sign factor of (1)r,k+1 comes from interchanging the
vector fields v and w((r1)) in b (if r = k + 1, no exchange is necessary). Substituting the preceding into
the expression for Tr , we get

Tr =
1 1
(1)k1 (1)r,k+1 1 (1)r,k+1
X
sgn a(w((1)) , . . . , w((k)) )(iv b)(w((k+1)) , . . . , w((k+l1))
k! l!
Sk+l1

1 1 1
= (1)k
X
sgn a(w((1)) , . . . , w((k)) )(iv b)(w((k+1)) , . . . , w((k+l1))
l (k)! (l 1)!
Sk+l1

1
= (1)k (a iv b)(w(1) , . . . , w(k+l1) ).
l
Since Tr does not depend on r, summing over r between k + 1 and k + l just eliminates the factor of 1/l
above. Therefore,
k+l
X
Tr = (a iv b)(w(1) , . . . , w(k+l1) ). (37)
r=k+1

Finally, we substitute (36) and (37) into (35) to obtain


 
(iv (a b))(w(1) , . . . , w(k+l1) ) = (iv a) b + (1)k a iv b (w(1) , . . . , wk+l1 ),

as required.

2.9 Algebraic forms on R3


We have that
dim Λ^0(R^3) = 1,   dim Λ^1(R^3) = dim Λ^2(R^3) = 3,   dim Λ^3(R^3) = 1.

Thus, algebraic zero-forms and three-forms on R3 can be identified with scalars, while algebraic one-forms
and two-forms can be identified with vectors. By making these identifications, the wedge product can
be seen to correspond with the familiar dot product and cross product of vector algebra.

A zero-form is by definition a scalar.

A one-form a 1 (R3 ) can be written as a = a1 f (1) + a2 f (2) + a3 f (3) . We may associate a vector
A = (A1 , A2 , A3 ) to a one-form, and vice versa, by making the (rather obvious) identifications

a1 = A1 , a2 = A2 , a3 = A3 .

A two-form b 2 (R3 ) can be written as

b = b12 f (1) f (2) + b23 f (2) f (3) + b31 f (3) f (1) .

We may associate a vector B = (B1 , B2 , B3 ) to a two-form, and vice versa, by making the (somewhat less
obvious) identifications
b23 = B1 , b31 = B2 , b12 = B3 .

A three-form c 3 (R3 ) can be written as c = c123 f (1) f (2) f (3) . We may associate a scalar C to a
three-form, and vice versa, by making the (fairly obvious) identification

c123 = C.

Let us translate the wedge product of algebraic forms on R3 into relations between vectors and scalars.
A wedge product involving a 0-form is just scalar multiplication, so we wont consider it further. The
wedge product of two one-forms yields a two-form, and the wedge product of a one-form and a two-form
is a three-form. These are the only two cases to consider, as all other wedge products (involving, for
example, two two-forms or a one-form and a three-form) produce forms of degree greater than three, and
therefore vanish automatically.

Let a and d be two one-forms, and let b = a ∧ d. Then b is a two-form. Let A, D and B denote the
vectors associated with a, d and b, respectively. As we show below,

B = A × D.

Let's show this explicitly. We have that

a ∧ d = (a_1 f^(1) + a_2 f^(2) + a_3 f^(3)) ∧ (d_1 f^(1) + d_2 f^(2) + d_3 f^(3))

= a_1 d_2 f^(1) ∧ f^(2) + a_1 d_3 f^(1) ∧ f^(3) + a_2 d_1 f^(2) ∧ f^(1) + a_2 d_3 f^(2) ∧ f^(3) + a_3 d_1 f^(3) ∧ f^(1) + a_3 d_2 f^(3) ∧ f^(2)
= (a_1 d_2 - a_2 d_1) f^(1) ∧ f^(2) + (a_2 d_3 - a_3 d_2) f^(2) ∧ f^(3) + (a_3 d_1 - a_1 d_3) f^(3) ∧ f^(1).

This two-form may be associated with the vector B as follows:

B = (a_2 d_3 - a_3 d_2, a_3 d_1 - a_1 d_3, a_1 d_2 - a_2 d_1) = (A_2 D_3 - A_3 D_2, A_3 D_1 - A_1 D_3, A_1 D_2 - A_2 D_1) = A × D.

Thus, the wedge product of two one-forms corresponds to the cross product of the associated vectors.
Note that the anticommutativity of the wedge product of one-forms, i.e. the fact that a ∧ d = -d ∧ a,
corresponds to the antisymmetry of the cross product,

A × D = -D × A.
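A short numerical check of this correspondence (our own, not part of the notes), using numpy's built-in
cross product:

import numpy as np

A = np.array([1.0, 2.0, 3.0])   # one-form a = A1 f^(1) + A2 f^(2) + A3 f^(3)
D = np.array([4.0, -1.0, 2.0])  # one-form d
# Coefficients of a ^ d, read off from the expansion above:
b12 = A[0] * D[1] - A[1] * D[0]
b23 = A[1] * D[2] - A[2] * D[1]
b31 = A[2] * D[0] - A[0] * D[2]
print(np.array([b23, b31, b12]))   # vector B associated with a ^ d
print(np.cross(A, D))              # agrees with A x D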

Next, let a be a one-form, b be a two-form, and c = a ∧ b. Then c is a three-form. Let A and B be
the vectors associated with a and b, and C the scalar associated with c. As we show below,

C = A · B.

Let's show this explicitly. We have that

a ∧ b = (a_1 f^(1) + a_2 f^(2) + a_3 f^(3)) ∧ (b_12 f^(1) ∧ f^(2) + b_23 f^(2) ∧ f^(3) + b_31 f^(3) ∧ f^(1))

= a_1 b_23 f^(1) ∧ f^(2) ∧ f^(3) + a_2 b_31 f^(2) ∧ f^(3) ∧ f^(1) + a_3 b_12 f^(3) ∧ f^(1) ∧ f^(2)
= (a_1 b_23 + a_2 b_31 + a_3 b_12) f^(1) ∧ f^(2) ∧ f^(3).

The coefficient of the basis three-form is given by

a_1 b_23 + a_2 b_31 + a_3 b_12 = A_1 B_1 + A_2 B_2 + A_3 B_3 = A · B.

Thus, the wedge product of a one-form and a two-form corresponds to the dot product of the associated
vectors. Note that the commutativity of the wedge product of a one-form and a two-form, i.e. the fact
that a ∧ b = b ∧ a, corresponds to the symmetry of the dot product,

A · B = B · A.

Let a, d and e be one-forms, and let A, D and E be the associated vectors. The associativity of the
wedge product, i.e. the fact that a ∧ (d ∧ e) = (a ∧ d) ∧ e, corresponds to the triple product identity,

A · (D × E) = (A × D) · E.

Let us also consider the contraction for algebraic forms on R^3. This, too, turns out to correspond
to familiar vector-algebra operations. Let v = v^1 e_(1) + v^2 e_(2) + v^3 e_(3) ∈ R^3 be a vector in R^3. Let
a = a_1 f^(1) + a_2 f^(2) + a_3 f^(3) ∈ Λ^1(R^3), and let A be the associated vector in R^3. Then i_v a is a scalar, and
is given by
i_v a = A · v.
Thus, contraction with a one-form corresponds to the dot product.

Let b = b_12 f^(1) ∧ f^(2) + b_23 f^(2) ∧ f^(3) + b_31 f^(3) ∧ f^(1) ∈ Λ^2(R^3) be an algebraic two-form on R^3, and let
B be the associated vector in R^3. Then i_v b is a one-form, and is given by

i_v b = (v^3 b_31 - v^2 b_12) f^(1) + (v^1 b_12 - v^3 b_23) f^(2) + (v^2 b_23 - v^1 b_31) f^(3),

which corresponds to the vector B × v. Thus, contraction with a two-form corresponds to the cross
product.

Let c = c_123 f^(1) ∧ f^(2) ∧ f^(3) ∈ Λ^3(R^3) be an algebraic three-form on R^3 with associated scalar c_123.
Then i_v c is a two-form, and is given by

i_v c = c_123 v^3 f^(1) ∧ f^(2) + c_123 v^1 f^(2) ∧ f^(3) + c_123 v^2 f^(3) ∧ f^(1),

which corresponds to the vector Cv. Thus, contraction with a three-form corresponds to scalar multiplication.
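These contraction formulas can likewise be spot-checked numerically. The following Python snippet
(our own illustration) verifies that i_v of a two-form corresponds to B × v and that i_v of a three-form
corresponds to the vector Cv:

import numpy as np

v = np.array([1.0, 2.0, -1.0])
B = np.array([3.0, 0.0, 5.0])             # two-form b with (b23, b31, b12) = B
b23, b31, b12 = B
iv_b = np.array([v[2]*b31 - v[1]*b12,     # coefficient of f^(1)
                 v[0]*b12 - v[2]*b23,     # coefficient of f^(2)
                 v[1]*b23 - v[0]*b31])    # coefficient of f^(3)
print(iv_b, np.cross(B, v))               # the two agree

C = 4.0                                   # three-form c = C f^(1) ^ f^(2) ^ f^(3)
# i_v c is the two-form with (b23, b31, b12) = (C v^1, C v^2, C v^3), i.e. the vector C v:
print(C * v)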

3 Differential forms
3.1 Definition of differential forms
Definition 3.1.1 (Differential forms). Let U be an open subset of R^n. A differential k-form on U, or
k-form for short, is a smooth map

ω : U → Λ^k(R^n);   x ↦ ω(x) = (1/k!) ω_J(x) F^(J).

Here, smooth means that the coefficient functions are smooth, i.e. ω_J(x) ∈ C^∞(U).
Let Ω^k(U) denote the space of differential k-forms on U. For k = 0, 0-forms are smooth functions on
U, i.e. Ω^0(U) = C^∞(U).
Given v_(1), . . . , v_(k) ∈ R^n, we write ω(x; v_(1), . . . , v_(k)) to denote ω(x) (which is an algebraic k-form)
evaluated on the k vectors v_(1), . . . , v_(k).

Addition, wedge product and contraction of differential forms with vectors are defined point-wise, in
analogy with the corresponding operators on algebraic k-forms.

Definition 3.1.2 (Addition of differential forms). Given , k (U ), then (x) and (x) are both
algebraic k-forms, and it makes sense to add them. Thus, we define + k (U ) by

( + )(x) = (x) + (x).

Definition 3.1.3 (Wedge product of differential forms). If is a k-form on U and is an l-form, then
we define the (k + l)-form by
( )(x) = (x) (x).

It follows from Proposition 2.5.3 that

= (1)kl ,
( + ) = + , where is a differential l-form, like ,
( ) = ( ) , where is a differential m-form.

For example, if we regard differential 1-forms on R3 as vector fields, then we are just defining the
sum and cross product of vector fields by the rather obvious formula (A + B)(r) = A(r) + B(r), and
(A B)(r) = A(r) B(r).

Definition 3.1.4 (Contraction of differential forms). Let U Rn be open. Given a differential k-form
k and a vector field X X (U ), the contraction of with X, denoted iX , is the differential
(k 1)-form defined by
(iX ) (x) = iX(x) (x).

Note that X(x) Rn is a vector in Rn , and (x) k (Rn ) is an algebraic k-form. Hence, iX(x) (x)
k1 (Rn ), where iv denotes the contraction map of Section 2.7, is an algebraic (k 1)-form that depends
smoothly on x (since X(x) and (x) depend smoothly on x), which therefore determines a differential
(k 1)-form.
Note that the contraction of a 0-form is always zero.

The properties of the contraction map for algebraic forms carry over to differential forms; specifically,
from Proposition 2.7.2, we have the following:

i_X(ω + ν) = i_X ω + i_X ν,                                          (38)
i_X(ω ∧ ν) = (i_X ω) ∧ ν + (-1)^k ω ∧ (i_X ν).                       (39)

Example 3.1.5. Let ω be the differential 1-form and ν the differential 2-form on R^3 given by

ω = (x + y) f^(2) + (x^2 - y^2) f^(3),

ν = z f^(1) ∧ f^(2) + xz f^(1) ∧ f^(3),

and let X be the vector field on R^3 given by

X = x e_(2) + e_(3) = (0, x, 1).

Then

ω ∧ ν = (x + y)xz f^(2) ∧ f^(1) ∧ f^(3) + (x^2 - y^2)z f^(3) ∧ f^(1) ∧ f^(2) = -(x + y)yz f^(1) ∧ f^(2) ∧ f^(3) = -(x + y)yz F^(1,2,3),

and
(i_X ν)(x, y, z) = i_(0,x,1)(z f^(1) ∧ f^(2)) + i_(0,x,1)(xz f^(1) ∧ f^(3)) = -xz f^(1) - xz f^(1) = -2xz f^(1).

3.2 The exterior derivative


Definition 3.2.1 (Exterior derivative). Let U ⊆ R^n be open. The exterior derivative, denoted d, is a
map d : Ω^k(U) → Ω^{k+1}(U) from differential k-forms to differential (k + 1)-forms, defined as follows:

k = 0. For g ∈ Ω^0(U) = C^∞(U),

dg = (∂g/∂x^i) f^(i).

k > 0. For ω ∈ Ω^k(U), we may write that

ω = (1/k!) ω_{i_1 ... i_k} f^(i_1) ∧ · · · ∧ f^(i_k),

where ω_{i_1 ... i_k}(x) ∈ C^∞(U) (cf Proposition 2.4.10). Then

dω(x) = (1/k!) dω_{i_1 ... i_k} ∧ f^(i_1) ∧ · · · ∧ f^(i_k).

Equivalently,

dω(x) = (1/k!) (∂ω_{i_1 ... i_k}/∂x^j) f^(j) ∧ f^(i_1) ∧ · · · ∧ f^(i_k).

Thus, the exterior derivative of a function corresponds to the gradient of the function; indeed, dg is
often called a gradient one-form. In particular, let us consider the ith coordinate function, which we
denote by x^i. We regard x^i as a function on U given by

x^i(x) = x^i.

That is, x^i, regarded as a function, picks out the ith component of its argument. Then

dx^i = (∂x^i/∂x^j) f^(j) = δ^i_j f^(j) = f^(i).

Thus, dx^i is equal to the constant one-form f^(i); we can think of f^(i) as a differential one-form ω_j f^(j)
with a single nonzero coefficient function, ω_i, which is equal to 1.
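For instance (a small sympy illustration of our own, with an arbitrary example function), the
coefficients of the gradient one-form dg are just the partial derivatives of g:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
g = x1**2 * x3 + sp.sin(x2)
dg_coeffs = [sp.diff(g, xi) for xi in (x1, x2, x3)]
print(dg_coeffs)   # [2*x1*x3, cos(x2), x1**2], i.e. dg = 2 x1 x3 dx1 + cos(x2) dx2 + x1^2 dx3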

Notation. It is conventional to write dx^i instead of f^(i). Similarly, we write

dx^{i_1} ∧ · · · ∧ dx^{i_k}   instead of   f^(i_1) ∧ · · · ∧ f^(i_k),

and sometimes
dx^I   instead of   F^(I).

Thus, from now on, we will write differential one-forms as
ω = ω_i dx^i.

In particular, a gradient one-form will be written as

dg = (∂g/∂x^i) dx^i,

which conforms to usage in standard calculus. A general k-form will be written as

ω = (1/k!) ω_{i_1 ... i_k} dx^{i_1} ∧ · · · ∧ dx^{i_k},

or sometimes as
ω = (1/k!) ω_I dx^I.
Proposition 3.2.2 (d and addition). Let ω, ν ∈ Ω^k(U). Then
d(ω + ν) = dω + dν.

Proof. We have that


    
1  
i1 ik 1 
i1 ik
d( + ) = d i1 ik + i1 ik dx dx = di1 ik + di1 ik dx dx
k! k!
1 1
= d dxi1 dxik + d dxi1 dxik = d + d.
k! i1 ik k! i1 ik

Proposition 3.2.3 (d and wedge product). Let ω ∈ Ω^k(U) and ν ∈ Ω^l(U). Then

d(ω ∧ ν) = (dω) ∧ ν + (-1)^k ω ∧ dν.

Proof. Since d is linear (c.f. Proposition 3.2.2), it suffices to consider the case where and consist of
a single term, rather than a sum of terms, i.e.
= f dxi1 dxik ,
= g dxj1 dxjl .

Then
= f g dxi1 dxik dxj1 dxjl ,
so that
 
d( ) = d f g dxi1 dxik dxj1 dxjl = d(f g) dxi1 dxik dxj1 dxjl .

But  
g f
d(f g) = (f g) dxj = f +g j dxj = f dg + g df.
xj xj x
Therefore,
d( ) = (g df + f dg) dxi1 dxik dxj1 dxjl .
We have that
   
g df dxi1 dxik dxj1 dxjl = df dxi1 dxik g dxj1 dxjl = d .

On the other hand, since dg is a 1-form, we have that


dg dxi1 dxik = (1)k dxi1 dxik dg.

Therefore,
 
f dgdxi1 dxik dxj1 dxjl = (1)k f dxi1 dxik dxj1 (dgdxj1 dxjl = (1)k d.

Combining the preceding results, we obtain


d( ) = d + (1)k d,

as required.

Proposition 3.2.4 (d^2 = 0).

For all ω ∈ Ω^k(U),
d(dω) = 0.

Proof. We proceed by induction on k.

First, we show d2 = 0 for 0-forms. Let f 0 (U ). From Definition 3.2.1 for d applied to a function,
we have that
f
df = dxj .
xj
Then from Definition 3.2.1 for d of a one-form, we have that
 
f
d(df ) = d dxj .
xj

But
2f
 
f
d = dxi .
xj xj xi
Then
2f
d(df ) = dxi dxj .
xj xi
This expression vanishes, since the first factor is symmetric in i and j (by the equality of mixed partials)
while the second factor is antisymmetric (since dxi dxj = dxj dxi ).

Let us show this last fact explicitly, writing in the summations over i and j. By the equality of mixed
partials (Proposition 1.3.16), we may write that

d(df) = (1/2) Σ_{i,j=1}^n ( ∂^2 f/∂x^j ∂x^i + ∂^2 f/∂x^i ∂x^j ) dx^i ∧ dx^j

      = (1/2) Σ_{i,j=1}^n (∂^2 f/∂x^i ∂x^j) dx^i ∧ dx^j + (1/2) Σ_{i,j=1}^n (∂^2 f/∂x^j ∂x^i) dx^i ∧ dx^j

      = (1/2) Σ_{i,j=1}^n (∂^2 f/∂x^i ∂x^j) dx^i ∧ dx^j + (1/2) Σ_{i,j=1}^n (∂^2 f/∂x^i ∂x^j) dx^j ∧ dx^i   (swapping i and j in the second sum)

      = (1/2) Σ_{i,j=1}^n (∂^2 f/∂x^i ∂x^j) ( dx^i ∧ dx^j + dx^j ∧ dx^i ) = 0.

Next, we show that d2 = 0 for k-forms. Since d is linear, it suffices to consider k-forms of the form

= f dxi1 dxik ,

where f is a smooth function. Then

d = df dxi1 dxik .

From Proposition 3.2.3,


 
d(d) = (d(df )) dxi1 dxik df d dxi1 dxik .

Since we have already shown that d(df ) = 0, it follows that the first term vanishes. The second term also
vanishes, since    
d dxi1 dxik = d 1 dxi1 dxik = 0,

by Definition 3.1.1 (the coefficient of the basis k-form is the constant function f = 1, whose exterior
derivative d(1) vanishes).
Example 3.2.5 (Exterior derivative on R3 ).

0-forms. Under the correspondence between scalar and vector fields on R^3, on the one hand, and
differential forms on R^3 on the other (as described in Section 2.9), a 0-form f corresponds to a
function, and the 1-form df corresponds to a vector field. We have that

df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz.

Thus, df corresponds to the gradient of f.

1-forms. Under the correspondence between scalar and vector fields and differential forms on R^3,
a 1-form ω on R^3, given by
ω = ω_1 dx + ω_2 dy + ω_3 dz,
corresponds to a vector field A = (A_1, A_2, A_3) with components given by A_j = ω_j. The 2-form dω
also corresponds to a vector field. We have that

dω = dω_1 ∧ dx + dω_2 ∧ dy + dω_3 ∧ dz
   = (∂ω_1/∂x dx + ∂ω_1/∂y dy + ∂ω_1/∂z dz) ∧ dx + (∂ω_2/∂x dx + ∂ω_2/∂y dy + ∂ω_2/∂z dz) ∧ dy + (∂ω_3/∂x dx + ∂ω_3/∂y dy + ∂ω_3/∂z dz) ∧ dz
   = (∂ω_2/∂x - ∂ω_1/∂y) dx ∧ dy + (∂ω_1/∂z - ∂ω_3/∂x) dz ∧ dx + (∂ω_3/∂y - ∂ω_2/∂z) dy ∧ dz.

Thus, dω corresponds to a vector field B with components

B_1 = ∂ω_3/∂y - ∂ω_2/∂z,
B_2 = ∂ω_1/∂z - ∂ω_3/∂x,
B_3 = ∂ω_2/∂x - ∂ω_1/∂y.

B may be recognised as the curl of A. Thus, dω corresponds to the curl of A.

2-forms. Under the correspondence between scalar and vector fields and differential forms on R^3,
a 2-form ν on R^3, given by
ν = ν_12 dx ∧ dy + ν_23 dy ∧ dz + ν_31 dz ∧ dx,

corresponds to a vector field B = (B_1, B_2, B_3) with components given by

B_1 = ν_23,   B_2 = ν_31,   B_3 = ν_12.

The 3-form dν corresponds to a scalar field. We have that

dν = dν_12 ∧ dx ∧ dy + dν_23 ∧ dy ∧ dz + dν_31 ∧ dz ∧ dx
   = (∂ν_12/∂z) dz ∧ dx ∧ dy + (∂ν_23/∂x) dx ∧ dy ∧ dz + (∂ν_31/∂y) dy ∧ dz ∧ dx
   = (∂ν_23/∂x + ∂ν_31/∂y + ∂ν_12/∂z) dx ∧ dy ∧ dz.

The 3-form dν therefore corresponds to the scalar field

∂ν_23/∂x + ∂ν_31/∂y + ∂ν_12/∂z,

which corresponds to
∂B_1/∂x + ∂B_2/∂y + ∂B_3/∂z = ∇ · B.

Thus, dν corresponds to the divergence of B.

As discussed in Problem Sheet 8, the fact that d^2 = 0 corresponds to the identities ∇ × (∇f) = 0 and
∇ · (∇ × A) = 0.
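These two identities are easy to confirm symbolically. The following sympy sketch (our own; the example
fields f and A are arbitrary choices) checks that curl(grad f) = 0 and div(curl A) = 0 componentwise:

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * sp.exp(y) + x * sp.cos(z)

grad = [sp.diff(f, v) for v in (x, y, z)]
curl_grad = [sp.diff(grad[2], y) - sp.diff(grad[1], z),
             sp.diff(grad[0], z) - sp.diff(grad[2], x),
             sp.diff(grad[1], x) - sp.diff(grad[0], y)]
print([sp.simplify(c) for c in curl_grad])    # [0, 0, 0]

A = [x * y * z, sp.sin(x) + z**2, x**3 * y]
curl_A = [sp.diff(A[2], y) - sp.diff(A[1], z),
          sp.diff(A[0], z) - sp.diff(A[2], x),
          sp.diff(A[1], x) - sp.diff(A[0], y)]
div_curl = sum(sp.diff(c, v) for c, v in zip(curl_A, (x, y, z)))
print(sp.simplify(div_curl))                  # 0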

3.3 The Pullback
Definition 3.3.1 (Pullback). Let U ⊆ R^m and V ⊆ R^n be open, and let F : U → V be a smooth map.
The pullback, denoted F*, is a map F* : Ω^k(V) → Ω^k(U); that is, the pullback maps differential forms
on V back to differential forms on U. Given ω ∈ Ω^k(V), F*ω is defined as follows. We note that, as F*ω
is a differential k-form on U, (F*ω)(x) is an algebraic k-form on R^m, which may be defined by specifying its
value when applied to k arbitrary vectors in R^m. Denoting these vectors by u_(1), . . . , u_(k), the definition
is given by
(F*ω)(x; u_(1), . . . , u_(k)) = ω(F(x); F'(x)u_(1), . . . , F'(x)u_(k)).

For 0-forms, i.e. functions, F*f = f ∘ F, in accord with Definition 1.9.1.

Figure 24: (F*ω)(x) is an algebraic 3-form, which is to be evaluated on three vectors, u, v and w, to produce
a number. This number is given by evaluating ω(F(x)), an algebraic 3-form, on the three vectors F'(x)u,
F'(x)v and F'(x)w.

Exercise 3.3.2.

i) Show that F*ω is well defined, i.e. F*ω ∈ Ω^k(U).

ii) For ω, ν ∈ Ω^k(V), show that F*(ω + ν) = F*ω + F*ν.

Example 3.3.3 (Component formula for pullback of one-forms). Let β ∈ Ω^1(V). Then

β(y) = β_j(y) dy^j.

Let α = F*β. We may write that

α(x) = α_i(x) dx^i.

We want to express the α_i s in terms of the β_j s.

To proceed, we recall the definition of α_i(x), namely

α_i(x) = α(x; e_(i)).

That is, α(x) is an algebraic one-form on R^m, and its ith component is obtained by applying it to the
basis vector e_(i). From Definition 3.3.1 for the pullback,

α(x; e_(i)) = (F*β)(x; e_(i)) = β(F(x); F'(x) e_(i)) = β_j(F(x)) (F'(x) e_(i))^j.

In general,
(F'(x) v)^j = (∂F^j/∂x^k)(x) v^k.

Therefore,
(F'(x) e_(i))^j = (∂F^j/∂x^k)(x) e_(i)^k = (∂F^j/∂x^k)(x) δ^k_i = (∂F^j/∂x^i)(x).

Substituting above, we get that

α_i(x) = (∂F^j/∂x^i)(x) β_j(F(x)),

or
α = F*β = (∂F^j/∂x^i) (β_j ∘ F) dx^i.

In particular,
F* dy^j = (∂F^j/∂x^i) dx^i.                                            (40)
Proposition 3.3.4 (Pullback of wedge product).
F ( ) = F F .

Proof. This follows directly from the definition (32) and Definition 3.3.1 of the pullback.
Example 3.3.5 (Component formula for pullback of k-forms). Let ω ∈ Ω^k(V). We may write that

ω = (1/k!) ω_{j_1 ... j_k} dy^{j_1} ∧ · · · ∧ dy^{j_k}.

Then α = F*ω is given by

α = (1/k!) (∂F^{j_1}/∂x^{i_1}) · · · (∂F^{j_k}/∂x^{i_k}) (ω_{j_1 ... j_k} ∘ F) dx^{i_1} ∧ · · · ∧ dx^{i_k}.

Remark 3.3.6. A k-form can be expressed as a sum of terms of the form f dx^{i_1} ∧ · · · ∧ dx^{i_k}, where f is
a function. Moreover, the basis k-form dx^{i_1} ∧ · · · ∧ dx^{i_k} can be written as dη, where η is the (k - 1)-form
given by
η = x^{i_1} dx^{i_2} ∧ · · · ∧ dx^{i_k}.
Therefore, a k-form can be written as a sum of terms of the form f dη, where η is a (k - 1)-form. This
observation will be the basis for a number of proofs to follow involving k-forms, where we will use
induction on k.
Proposition 3.3.7 (Pullback and d). Let k (V ). Then
F d = dF .

Proof. We proceed by induction on k.


First, we show this is true for 0-forms, i.e. functions. Let g 0 (V ) = C (V ). We compute F dg as
follows. We have that dg is given by
g
dg = dy j .
y j
Then from Example 3.3.3,
F j g
F dg(x) = (x) j (F (x)) dxi .
xi y
Next, we compute d(F g), as follows:
g F j
d(F g)(x) = i
g(F (x)) dxi = j
(F (x)) i (x) dxi .
x y x
Thus,
F dg = dF g. (41)
Next, we assume the result is true for (k1)-forms. Let be a k-form on V . By Remark 3.3.6, it suffices
to take = g d , where is a (k 1)-form on V (note that the pullback is linear, by Proposition 3.3.2).
We have that

F d(g d) = F (dg d) (since d2 = 0) = (F dg) (F d) (by Proposition 3.3.4)


= d(F g) dF (by induction hypothesis) = d(F g dF ) (since d2 F = 0)
= d(F g F d) (by induction hypothesis) = dF (g d) (by Proposition 3.3.4) .

Example 3.3.8 (Polar coordinates on R^2). Let U = {(r, θ)} = R^2, V = {(x, y)} = R^2. Let F : U → V be
given by
F(r, θ) = (r cos θ, r sin θ).

Then F* : Ω^k(V) → Ω^k(U) maps Cartesian-coordinate forms to polar-coordinate forms as follows:

0-forms.
Let g(x, y) be a function on V. Let f = F*g. Then

f(r, θ) = g(F(r, θ)) = g(r cos θ, r sin θ).

1-forms.
Let
β(x, y) = β_1(x, y) dx + β_2(x, y) dy.

First, we'll compute F*dx and F*dy. We could use the formulas in Example 3.3.3, but here is an
easier way:
F*dx = dF*x = d(r cos θ) = cos θ dr - r sin θ dθ,

where we have used Proposition 3.3.7. Similarly,

F*dy = dF*y = d(r sin θ) = sin θ dr + r cos θ dθ.

Then, letting α = F*β, we have that

α = β_1(r cos θ, r sin θ)(cos θ dr - r sin θ dθ) + β_2(r cos θ, r sin θ)(sin θ dr + r cos θ dθ)

= (cos θ β_1(r cos θ, r sin θ) + sin θ β_2(r cos θ, r sin θ)) dr + (r cos θ β_2(r cos θ, r sin θ) - r sin θ β_1(r cos θ, r sin θ)) dθ.

2-forms.
A general two-form on V is given by

ω(x, y) = c(x, y) dx ∧ dy.

First we pull back dx ∧ dy. Using Proposition 3.3.4, F*(dx ∧ dy) = (F*dx) ∧ (F*dy), and we have
already computed F*dx and F*dy above. Therefore,

F*(dx ∧ dy) = (F*dx) ∧ (F*dy) = (cos θ dr - r sin θ dθ) ∧ (sin θ dr + r cos θ dθ)

= r cos^2 θ dr ∧ dθ - r sin^2 θ dθ ∧ dr = (r cos^2 θ + r sin^2 θ) dr ∧ dθ = r dr ∧ dθ.

You might recognise this as reminiscent of the transformation of the area element from Cartesian
to polar coordinates; indeed, this is no coincidence, as we shall see later. Finally, we have that

F*ω(r, θ) = c(r cos θ, r sin θ) r dr ∧ dθ.
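The 2-form computation can be reproduced in one line with sympy (a check of our own): the coefficient
of dr ∧ dθ in F*(dx ∧ dy) is the Jacobian determinant of F, which is r.

import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
F = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
detJ = F.jacobian([r, theta]).det()
print(sp.simplify(detJ))    # r, so F*(dx ^ dy) = r dr ^ dtheta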

Proposition 3.3.9 (Pullback and composition). Let U Rm , V Rn and W Rp be open. Let


F : U V and G : V W be smooth maps. Then

(G F ) = F G .

Proof. Proceed by induction on k. For 0-forms, i.e. functions, the result is clear, and was shown in
Proposition 1.9.2. Suppose it holds for (k 1)-forms. By Remark 3.3.6 it suffices to consider k-forms of
the form f d, where f is a function and is a (k 1)-form. Then

(G F ) f d = (G F ) f (G F ) d (by Proposition 3.3.4)


= ((G F ) f ) d(G F ) (by Proposition 3.3.7 ) = (F G f ) dF G (by induction hypothesis)
= F G f F G d (by Proposition 3.3.7 ) = F G (f d) (by Proposition 3.3.4) .

3.4 The Lie derivative
Definition 3.4.1 (Lie derivative of forms). Let U ⊆ R^n be open, and let X ∈ X(U) be a smooth vector
field on U with flow Φ_t. Let ω ∈ Ω^k(U) be a differential k-form. The Lie derivative of ω with respect
to X, denoted L_X ω, is given by

L_X ω = ∂/∂t|_{t=0} Φ_t* ω.

Proposition 3.4.2 (Simple properties of the Lie derivative of forms).

i) L_X(ω + ν) = L_X ω + L_X ν
ii) L_X(ω ∧ ν) = (L_X ω) ∧ ν + ω ∧ L_X ν
iii) L_X dω = dL_X ω.

Proof. These properties follow directly from corresponding properties of the pullback.
i) We have that

LX ( + ) = ( + ) .
t t=0 t
Since the pullback is linear (Exercise 3.3.2 it follows that

t ( + ) = t + t .

Then

+ t = LX + LX .

LX ( + ) =
t t=0 t

ii) From Proposition 3.3.4, we have that

t ( ) = t t .
 

From the product rule, it follows that




t t
 
t ( ) =
t t=0
t t=0

   

t t=0 + t t=0
 
= t t
t t=0 t t=0
= (LX ) + (LX ) .

iii) From Proposition 3.3.7, we have that


dt = t d.
It follows that

LX d = t d = dt d = dLX .
t t=0 t t=0 t t=0 t

Proposition 3.4.3 (Homotopy formula). For ω ∈ Ω^k(U),

L_X ω = i_X dω + d i_X ω.

Proof. We proceed by induction. First, we show that the formula holds for 0-forms. For a 0-form f , the
left-hand side of the homotopy formula gives

f
LHS := LX f = f = f t = X f = Xi i ,
t t=0 t t t=0 x

in accord with Definition 1.9.3. On the right-hand side, we have that


f
RHS := iX df + diX f = iX df = Xi ,
xi

since iX f = 0. Thus, LHS and RHS coincide, and the homotopy formula holds for 0-forms.

Next, we assume that the homotopy formula holds for (k 1)-forms, and we show that it holds for
k-forms. Since all the operations we are considering namely LX , iX and d are linear, by Remark 3.3.6
it suffices to take
= f d,
where is a (k 1)-form.
Applied to = f d, the left-hand side of the homotopy formula gives

LHS := LX (f d) = (LX f ) d + f LX d,

by Proposition 3.4.2 ii). From the preceding,

LX f = iX df,

while from Proposition 3.4.2 iii),


LX d = dLX .
By the induction hypothesis, the homotopy formula applies to , since is a (k 1)-form. Therefore,

LX = iX d + diX ,

so that
d LX = diX d,
since d2 = 0 by Proposition 3.2.4. Combining the previous results, we get that

LHS = (LX f ) d + f LX d = (iX df ) d + f diX d.

The right-hand side of the homotopy formula applied to = f d gives

RHS := iX d(f d) + diX (f d).

Let us consider the first term on the right-hand side of the preceding. Since

d(f d) = df d + f d(d) = df d,

it follows from (39) that

iX d(f d) = iX (df d) = (iX df )d df (iX d).

Next, we consider the second term on the right-hand side of the preceding. We have that

diX (f d) = d (f iX d) = df (iX d) + f diX d.

Combining the preceding expressions, we get

RHS := (iX df )d df (iX d) + df (iX d) + f diX d = (iX df )d + f diX d.

Thus, LHS and RHS coincide, and the homotopy formula holds for k-forms.
Example 3.4.4. Let ω be the 2-form on R^3 given by

ω = x dx ∧ dy + y dy ∧ dz,

and let X be the vector field on R^3 given by

X = (y, 0, z).

We compute L_X ω using the homotopy formula. We have that dω = 0, so that i_X dω = 0. On the other
hand,
i_X ω = x i_X(dx ∧ dy) + y i_X(dy ∧ dz) = xy dy - yz dy = y(x - z) dy.
Then
L_X ω = d i_X ω = d(y(x - z) dy) = y(dx ∧ dy - dz ∧ dy) = y dx ∧ dy + y dy ∧ dz.

3.5 The Poincaré Lemma
From Proposition 3.2.4, we know that d(d) = 0 for any differential form . This fact motivates the
following question: Suppose is a k-form such that d = 0. Can we find a differential (k 1)-form
such that = d? Two specific examples of this question occur in vector calculus: i) if F = 0, can
we write F = for some scalar function , and ii) if B = 0, can we write B = A for some vector
field A?
Definition 3.5.1 (Closed and exact forms.). A differential k-form is closed if d = 0. is exact if
= d for some (k 1)-form .

In this language, Proposition 3.2.4 says that every exact form is closed. The question we are asking
is whether every closed form is exact.
The Poincare Lemma is a partial answer to this question. It provides a sufficient condition for every
closed form to be exact. It turns out that this condition is related to the topology of the space on which
the forms are defined. Examples of closed forms which are not exact occur in spaces which are, in a sense
that can be made to be precise, topologically nontrivial. In physics applications, these often correspond
to singularities, e.g. point charges or lines of current. The general area for these questions is called
differential topology. It is a means of studying the topology of manifolds through analytic methods,
specifically through certain special differential forms, which are characterised by their satisfying some
simple (natural) partial differential equation.

3.5.1 Time-dependent vector fields


We will need a generalisation of vector fields and one-parameter subgroups. This generalisation corre-
sponds to going from autonomous to non-autonomous systems of ODEs.

Definition 3.5.2 (One-parameter family of diffeomorphisms). Let U, V ⊆ R^n be open, and let I be an
open interval in R. A one-parameter family of diffeomorphisms is a smooth family of maps

Φ : I × U → V;   (t, x) ↦ Φ_t(x)

such that Φ_t is a diffeomorphism onto its image. That is, letting U_t = Φ_t(U) ⊆ V, the map Φ_t : U → U_t is a
diffeomorphism.
Note that a one-parameter subgroup of diffeomorphisms, given by Definition 1.6.11, is a special case
of a one-parameter family of diffeomorphisms. In particular, for a family, as opposed to a subgroup, we
do not assume that Φ_s ∘ Φ_t = Φ_{s+t} nor that Φ_0 = Id_U.

Given the one-parameter family Φ_t, we define a one-parameter family of vector fields X_t on U_t by

(∂/∂t) Φ_t(x) = X_t(Φ_t(x)),   or   (∂/∂t) Φ_t = X_t ∘ Φ_t.              (42)

In the language of mechanics, X_t(Φ_t(x)) is the velocity at time t along the trajectory x(t) = Φ_t(x). Since
Φ_t is a diffeomorphism and therefore invertible, we can rearrange (42) to obtain the following expression
for X_t evaluated at x (rather than at Φ_t(x)),

X_t(x) = ((∂/∂t) Φ_t)(Φ_t^{-1}(x)).                                      (43)

Note that, as Φ_t is not necessarily a one-parameter subgroup, we cannot assume that Φ_t^{-1} = Φ_{-t}.

Associated to X_t is a system of first-order differential equations,

ẋ(t) = X_t(x(t)).                                                        (44)

This system is nonautonomous, as X_t depends explicitly on t. (As in exercises from Problem Sheet 2,
we could recast (44) as an autonomous system on U × R, but we won't do this here.) Then Φ_t(x) is the
solution of (44) with initial condition Φ_0(x).

Given a k-form ω on V, we want to evaluate the t-derivative of its pullback with respect to Φ_t. A
formula is given by the following:

Proposition 3.5.3 (Lie derivative with respect to a time-dependent vector field).

∂/∂t|_{t=t_0} Φ_t* ω = Φ_{t_0}* L_{X_{t_0}} ω.

Some comments: t_0 is just the value of t where we evaluate the derivative. Sometimes we'll omit t_0,
and write the formula more concisely as

(∂/∂t) Φ_t* ω = Φ_t* L_{X_t} ω.

Note that, in the case where t_0 = 0 and Φ_t is the flow of a fixed vector field X, Proposition 3.5.3 is just
the definition of the Lie derivative, ie Definition 3.4.1.
Proof. From a first-order Taylor expansion about t = t0 , we have that

t + 
t + = t + O(2 )

0 0 t t=t
0
t + X
= t + O(2 )
t
0 0 0
= (Id + X t + O(2 ),
t ) (45)
0 0

t .
where we have used the definition (53) of X 0
Let s denote the flow (one-parameter subgroup of differmorphisms ) of the fixed (i.e., t-independent)
t , so that
vector field X 0
t = s

X 0
. (46)
s s=0
Then a first-order Taylor expansion of s about s = 0 gives
t + O(2 ) = (Id + X
 = 0 + X t ) + O(2 ), (47)
0 0

since 0 = Id. Substituting (47) into (45), we get that

t + O(2 ).
t + =  (48)
0 0

From (48) and the composition property of the pullback (Proposition 3.3.9), we have that
t + = (
t ) + O(2 ) =
t  + O(2 ). (49)
0 0 0

Then we compute

1  t

t = lim t0 + 0
(definition of derivative)
t t=t0 0 
1  t + O(2 )

= lim t0  0 0
(using (49) and 0 = Id)
0 
1

t

= lim t0  0 0
(O(2 ) term vanishes in the limit)
0 
1 
= lim t  0 (using linearity of pullback)
0  0
t lim 1  t is independent of )

= 0 0  0 (as 0

t t

= 0 t definition of derivative
t=0
t L
= (using Definition 3.4.1), (50)
0 Xt0

Example 3.5.4 (Time-dependent flow). Let U = V = R^n and I = (0, ∞). Define a one-parameter family
of diffeomorphisms Φ_t for t > 0 by
Φ_t(x) = tx,

ie Φ_t just dilates or contracts by a scalar factor t. Let's compute X_t using (43). By inspection,

Φ_t^{-1}(x) = (1/t) x,

and
(∂/∂t) Φ_t(x) = x.

Therefore,
X_t(x) = ((∂/∂t) Φ_t)(Φ_t^{-1}(x)) = (1/t) x.

3.5.2 Poincaré Lemma
Theorem 3.5.5 (Poincaré Lemma). Let Φ_t : U → U be a one-parameter family of diffeomorphisms
defined for 0 < t ≤ 1. Let ω ∈ Ω^k(U) be a closed k-form. Suppose that

Φ_1* ω = ω,    lim_{t→0} Φ_t* ω = 0.

Then
ω = dβ,
where
β = ∫_0^1 Φ_t*(i_{X_t} ω) dt,

and X_t is defined as above by

(∂/∂t) Φ_t(x) = X_t(Φ_t(x)).

Remark. In many applications, Φ_t is not invertible for t = 0, so that Φ_0 is not a diffeomorphism
(cf. Example 3.5.4). In this case, X_0 is not defined (cf. Eq. (44)).

Proof. From the assumptions above, we have that

ω = Φ_1* ω - lim_{ε→0} Φ_ε* ω = lim_{ε→0} ∫_ε^1 (∂/∂t) Φ_t* ω dt = ∫_0^1 (∂/∂t) Φ_t* ω dt.

From Proposition 3.5.3,
(∂/∂t) Φ_t* ω = Φ_t* L_{X_t} ω.

From Proposition 3.4.3,
L_{X_t} ω = d i_{X_t} ω + i_{X_t} dω.

But since ω is closed by assumption, the second term vanishes, and we have that

L_{X_t} ω = d i_{X_t} ω.

It follows that
Φ_t* L_{X_t} ω = Φ_t* d i_{X_t} ω.

Since d commutes with the pullback (Proposition 3.3.7), we may write that

Φ_t* L_{X_t} ω = d Φ_t* i_{X_t} ω.

Substituting into the equation for ω above, we get that

ω = ∫_0^1 d ( Φ_t* i_{X_t} ω ) dt.

Next, we note that d can be taken outside the t-integral. This is essentially due to the fact that d is linear;
that is, d(ω_1 + ω_2) = dω_1 + dω_2, and the integral over t can be understood as a limit of sums. Or else, one
can write down the explicit formula for d(Φ_t* i_{X_t} ω), and verify that it may be taken outside the integral.
(Below, in Lemma 4.1.6, we'll write out both of these arguments in more detail for completeness.) It
follows that
ω = d ( ∫_0^1 Φ_t*(i_{X_t} ω) dt ),

as required.

Definition 3.5.6 (Contractible set). An open set U ⊆ R^n is said to be contractible if there exists a
one-parameter family of diffeomorphisms Φ_t : U → U, with 0 < t ≤ 1, such that

Φ_1(x) = x,    lim_{t→0} Φ_t(x) = x*,

for some fixed x* ∈ U. That is, Φ_t interpolates between the identity map for t = 1 and, as t approaches
0, a map which sends every x in U to the single point x*.

Clearly, for any differential form ω,

Φ_1* ω = ω.

Also, since lim_{t→0} Φ_t(x) = x*, it follows that Φ_0'(x) = 0. Therefore, from Definition 3.3.1 for the pullback,

lim_{t→0} Φ_t* ω = 0.

Therefore, the hypotheses of the Poincaré Lemma are satisfied. It follows that, on a contractible
set, any closed k-form ω can be expressed as dβ for some (k - 1)-form β. From Example 3.5.4, we see
that R^n is contractible. Therefore, on R^n, every closed form is exact.
On noncontractible spaces, it is no longer necessarily the case that every closed form is exact.
Example 3.5.7. Let ω ∈ Ω^3(R^3). Show that ω = dβ, where β is a two-form.

We note that every three-form on R^3 is closed, and the Poincaré Lemma may be applied, with
Φ_t(x, y, z) = (tx, ty, tz), as in Example 3.5.4. Thus,

ω = dβ,

where
β = ∫_0^1 Φ_t*(i_{X_t} ω) dt,

and X_t is given as in Example 3.5.4 by

X_t(x, y, z) = (1/t)(x, y, z).
We may write that
ω(x, y, z) = c(x, y, z) dx ∧ dy ∧ dz.
We have that
i_{X_t} ω = c i_{X_t}(dx ∧ dy ∧ dz).

From the formula for the contraction with a wedge product (Proposition 2.7.2), we get that

i_{X_t}(dx ∧ dy ∧ dz) = (i_{X_t} dx) dy ∧ dz - (i_{X_t} dy) dx ∧ dz + (i_{X_t} dz) dx ∧ dy.

In general, i_{X_t} dx = dx(X_t) is the first component of X_t, i_{X_t} dy is the second component of X_t, etc.
Therefore,

i_{X_t}(dx ∧ dy ∧ dz) = (1/t)(x dy ∧ dz - y dx ∧ dz + z dx ∧ dy) = (1/t)(x dy ∧ dz + y dz ∧ dx + z dx ∧ dy),

and
i_{X_t} ω = (c(x, y, z)/t)(x dy ∧ dz + y dz ∧ dx + z dx ∧ dy).
Next, we compute the pullback of i_{X_t} ω by Φ_t. We have that

Φ_t* x = tx,    Φ_t* y = ty,    Φ_t* z = tz.

Therefore, since Φ_t* dx = dΦ_t* x, and similarly for dy and dz, we have that

Φ_t* dx = t dx,    Φ_t* dy = t dy,    Φ_t* dz = t dz.

Therefore (since the pullback of a wedge product is the wedge product of the pullbacks), we have that

Φ_t*(dx ∧ dy) = t^2 dx ∧ dy,    Φ_t*(dy ∧ dz) = t^2 dy ∧ dz,    Φ_t*(dz ∧ dx) = t^2 dz ∧ dx.

Therefore, the pullback of one of the terms in i_{X_t} ω (the first) is given by

Φ_t* ( (c(x, y, z)/t) x dy ∧ dz ) = (c(tx, ty, tz)/t) (tx) (t dy) ∧ (t dz) = t^2 c(tx, ty, tz) x dy ∧ dz.

The two other terms are treated similarly, and we get that

Φ_t*(i_{X_t} ω) = t^2 c(tx, ty, tz) (x dy ∧ dz + y dz ∧ dx + z dx ∧ dy).

Then
β(x, y, z) = ( ∫_0^1 t^2 c(tx, ty, tz) dt ) (x dy ∧ dz + y dz ∧ dx + z dx ∧ dy).

Under the correspondence between forms on R^3 and scalar and vector fields (cf Section 2.9 and Exam-
ple 3.2.5), the differential three-form ω corresponds to a scalar field c, and the differential two-form β
corresponds to a vector field E. dβ, which is a differential three-form, corresponds to ∇ · E, which is a
scalar field. The previous result, translated into the language of vector calculus, says that if E is the
vector field given by

E(x, y, z) = ( ∫_0^1 t^2 c(tx, ty, tz) dt ) r,

where r = (x, y, z), then
∇ · E = c.
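This final vector-calculus statement is easy to test symbolically for a particular choice of c. The sympy
sketch below (our own; c = xy + z^2 is an arbitrary example) builds E from the integral formula and
confirms that div E = c:

import sympy as sp

x, y, z, t = sp.symbols('x y z t')
c = x * y + z**2
scalar = sp.integrate(t**2 * c.subs({x: t*x, y: t*y, z: t*z}), (t, 0, 1))
E = [scalar * x, scalar * y, scalar * z]
divE = sum(sp.diff(Ei, v) for Ei, v in zip(E, (x, y, z)))
print(sp.simplify(divE - c))   # 0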

4 Integration of Differential Forms


We begin with some preliminary remarks.

Let U ⊆ R^n be an open set in R^n. Let ω ∈ Ω^n(U) be an n-form on U. ω can be written as

ω(x) = f(x) dx^1 ∧ · · · ∧ dx^n,

where f(x) is the x-dependent coefficient (a function) of the standard basis n-form. As the notation
suggests, differential forms are meant to be integrated. The integral of ω over U will be defined as
follows:
∫_U ω := ∫_U f(x) dx^1 · · · dx^n.
This proposed definition raises two questions:
1. Is the value of the integral dependent on the choice of coordinates? That is, suppose F : U V is
a diffeomorphism between open sets U, V Rn , which we might think of as a change of coordinates
(cf Example 3.3.8). Let ω ∈ Ω^n(V). Is it the case that

∫_V ω = ∫_U F*ω ?

The following simple example shows that the answer is No, in general. Let U and V be the unit
squares in R2 given by

U = {(x^1, x^2) ∈ R^2 | 0 ≤ x^1, x^2 ≤ 1},    V = {(y^1, y^2) ∈ R^2 | 0 ≤ y^1, y^2 ≤ 1}.

Note that we are thinking of U and V as being different sets; V has coordinates y, and U has
coordinates x. Let ω be the two-form on V given by

ω = dy^1 ∧ dy^2

(that is, the coefficient function is constant, and is equal to 1). Then from the provisional definition
above,
∫_V ω = ∫_V dy^1 dy^2 = 1.

Let F : U → V be given by
(y^1, y^2) = F(x^1, x^2) = (x^2, x^1).

That is, F effectively interchanges the coordinates. Then

F*ω = (F*dy^1) ∧ (F*dy^2) = dx^2 ∧ dx^1 = -dx^1 ∧ dx^2.

Then from the provisional definition,

∫_U F*ω = -1.

Thus, it appears that the sign of the integral can change under a change of coordinates. It will turn out
that this is the only ambiguity.

2. How do we integrate k-forms on U if k < n?

4.1 Singular k-cubes and integration of differential forms


We will address Question 2 above first, that is, how to integrate k-forms on U. In this discussion, we
let t = (t^1, . . . , t^k) denote coordinates on R^k.

Let

I^k = { t ∈ R^k | 0 ≤ t^i ≤ 1, 1 ≤ i ≤ k }

denote the closed unit k-cube in R^k.

Definition 4.1.1 (Singular k-cube). Let U ⊂ R^n be open. A singular k-cube on U is a smooth map

c : I^k → U.

(Note: since I^k is not open, we have not really defined what it means for a map on I^k to be smooth.
We will say that c : I^k → U is smooth if there exists an open set B ⊂ R^k with I^k ⊂ B and a smooth map
c̃ : B → U such that c̃(t) = c(t) for all t ∈ I^k. One says that c̃ is a smooth extension of c. We shall not
give too much attention to this point, although in rigorous treatments it requires care.)

You can think of a singular k-cube as being a parameterisation of a region of U. Typically you can
think of this region as being k-dimensional, but this needn't be the case (and we haven't really defined
what a k-dimensional region is). You should note that c need be neither 1-1 nor onto. For example, the
constant map c(t) = x_0, where x_0 ∈ U is fixed, whose image is a single point, is a singular k-cube.

Let c : I^k → U be a singular k-cube on U. Let ω ∈ Ω^k(U) be a k-form on U. Then c^*ω is a k-form on
I^k, and we may write that

c^*ω(t) = f(t) dt^1 ∧ ⋯ ∧ dt^k,

for some smooth function f(t) defined on I^k.


Definition 4.1.2. The integral of a k-form ω over a singular k-cube c, denoted ∫_c ω, is defined by

∫_c ω := ∫_{I^k} c^*ω := ∫_{I^k} f(t) dt^1 ⋯ dt^k.

Example 4.1.3.
We consider the integration of a two-form over a singular two-cube on U = R³ (so k = 2 and n = 3).
For convenience, denote coordinates on I² by (u, v) rather than (t^1, t^2), and coordinates on U = R³ by
(x, y, z). Let c : I² → R³ be given by

c(u, v) = (u, v, 2 − u² − v²).

Let

ω(x, y, z) = x dy ∧ dz.

Figure 25

We compute ∫_c ω as follows. First, we need to compute c^*ω. We have that

c^*x(u, v) = x(c(u, v)) = u,  c^*y(u, v) = y(c(u, v)) = v,  c^*z(u, v) = z(c(u, v)) = 2 − u² − v²,

and

c^*dy = d(c^*y) = dv,  c^*dz = d(c^*z) = −2u du − 2v dv.

Then

c^*ω = u dv ∧ (−2u du − 2v dv) = 2u² du ∧ dv.

Now we integrate:

∫_c ω = ∫_{I²} c^*ω = ∫_0^1 ∫_0^1 2u² du dv = 2/3.
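A quick symbolic cross-check of this pullback and integral (a sketch; the variable names are ours, not notation from the notes):

import sympy as sp

u, v = sp.symbols('u v')
cx, cy, cz = u, v, 2 - u**2 - v**2        # components of the singular 2-cube c

# c*(dy) and c*(dz) as coefficient pairs with respect to (du, dv)
dy = (sp.diff(cy, u), sp.diff(cy, v))
dz = (sp.diff(cz, u), sp.diff(cz, v))

# c*(x dy ^ dz): the coefficient of du ^ dv is x(c) * (dy_u*dz_v - dy_v*dz_u)
coeff = cx * (dy[0]*dz[1] - dy[1]*dz[0])
print(sp.simplify(coeff))                              # 2*u**2
print(sp.integrate(coeff, (u, 0, 1), (v, 0, 1)))       # 2/3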

Next, we address the first question in the preamble above: to what extent does the integral of a
k-form depend on the choice of coordinates? We need the following definition:

Definition 4.1.4 (Orientation-preserving maps). Let U, V ⊂ R^n be open. Let G : U → V be a diffeomorphism. Then det G′(x) ≠ 0 for all x ∈ U. We say that G is orientation-preserving if det G′ > 0 on U.

*The basic result we need is the following change-of-variables formula from multidimensional integra-
tion, which we will state without proof (a heuristic explanation is given below).

Theorem 4.1.5 (Change of variables formula). Let G : U → V be an orientation-preserving diffeomorphism, and let f ∈ C^∞(V) be integrable. (Integrable in this context means that the integral ∫_V |f(y)| d^n y is finite. Since f is smooth, this is automatically the case if V is a bounded set. If V is not bounded, then f(y) must vanish for large |y| sufficiently quickly in order for the integral to converge.) Then

∫_V f(y) dy^1 ⋯ dy^n = ∫_U f(G(x)) det G′(x) dx^1 ⋯ dx^n.

Proof. See Spivak and Hubbard. A rough sketch is given below.
The idea is that the small area element B based at y in V, as shown in the figure, is the image under
G of the square area element A based at x in U. The area of B (denoted vol, for volume, in the figure)
is given approximately by det G′(x) vol(A). The integral of f over V is obtained by partitioning V into
small area elements, multiplying the area of each element by the value of the function at a point of the
element, summing, and then taking the limit as the partition is made finer. The change of variables
formula is obtained in this limit.
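As an illustration (not an example from the lectures), the formula can be checked with polar coordinates on the unit disc, G(r, θ) = (r cos θ, r sin θ), for which det G′ = r > 0; the slit where polar coordinates fail to be injective has measure zero and does not affect the integrals.

import sympy as sp

r, th = sp.symbols('r theta', nonnegative=True)
y1, y2 = sp.symbols('y1 y2', real=True)

f = y1**2 + y2**2

# Left-hand side: integrate f directly over the unit disc (iterated integral in y2, then y1).
lhs = sp.integrate(f, (y2, -sp.sqrt(1 - y1**2), sp.sqrt(1 - y1**2)), (y1, -1, 1))

# Right-hand side: pull back by G and weight with det G' = r.
G1, G2 = r*sp.cos(th), r*sp.sin(th)
detG = sp.Matrix([[sp.diff(G1, r), sp.diff(G1, th)],
                  [sp.diff(G2, r), sp.diff(G2, th)]]).det()
rhs = sp.integrate(f.subs({y1: G1, y2: G2}) * detG, (r, 0, 1), (th, 0, 2*sp.pi))

print(sp.simplify(lhs - rhs))    # 0 (both sides equal pi/2)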

Proposition 4.1.6 (Independence of parameterisation).

Figure 26

Let B ⊂ R^k be an open set which contains the k-cube I^k. Let G : B → B be an orientation-preserving
diffeomorphism, and suppose that G(I^k) = I^k. That is, G maps the k-cube to itself, and may be
thought of as a smooth change of variables on I^k. Let c : I^k → U be a singular k-cube, and ω ∈ Ω^k(U).
Then

∫_c ω = ∫_{c∘G} ω.

Note that c∘G : I^k → U is a singular k-cube.

Proof. *Let

c^*ω(t) = f(t) dt^1 ∧ ⋯ ∧ dt^k.

Then the left-hand side of the assertion reads

∫_c ω = ∫_{I^k} f(t) dt^1 ⋯ dt^k.

On the right-hand side, we have that

∫_{c∘G} ω = ∫_{I^k} (c∘G)^*ω,

from the definition of integration of a k-form. By Proposition 3.3.9,

(c∘G)^*ω = G^*( c^*ω ) = G^*( f dt^1 ∧ ⋯ ∧ dt^k ).

We have that

G^* f = f ∘ G,

and

G^* dt^i = d(G^* t^i) = dG^i = (∂G^i/∂t^j) dt^j.

Then

G^*( f dt^1 ∧ ⋯ ∧ dt^k ) = (f ∘ G) (∂G^1/∂t^{j_1}) ⋯ (∂G^k/∂t^{j_k}) dt^{j_1} ∧ ⋯ ∧ dt^{j_k}
                        = (f ∘ G) Σ_{σ∈S_k} (∂G^1/∂t^{σ(1)}) ⋯ (∂G^k/∂t^{σ(k)}) dt^{σ(1)} ∧ ⋯ ∧ dt^{σ(k)},

since the only nonvanishing contributions to the sum come from terms where the indices j_1, . . . , j_k are
all distinct, and therefore are given by a permutation; that is, j_i = σ(i) for some σ ∈ S_k. From the
anticommutativity of the wedge product, it follows that

dt^{σ(1)} ∧ ⋯ ∧ dt^{σ(k)} = sgn σ · dt^1 ∧ ⋯ ∧ dt^k.

Therefore,

G^*( f dt^1 ∧ ⋯ ∧ dt^k ) = (f ∘ G) Σ_{σ∈S_k} sgn σ (∂G^1/∂t^{σ(1)}) ⋯ (∂G^k/∂t^{σ(k)}) dt^1 ∧ ⋯ ∧ dt^k.

But we recognise the combinatorial formula for the determinant (cf Proposition 2.4.5),

Σ_{σ∈S_k} sgn σ (∂G^1/∂t^{σ(1)}) ⋯ (∂G^k/∂t^{σ(k)}) = det G′.

Therefore,

G^*( f dt^1 ∧ ⋯ ∧ dt^k ) = (f ∘ G) det G′ dt^1 ∧ ⋯ ∧ dt^k.

It follows that

∫_{c∘G} ω = ∫_{I^k} f(G(t)) det G′(t) dt^1 ⋯ dt^k.

By the Change of Variables formula (Theorem 4.1.5) and the fact that G(I^k) = I^k by assumption, we
have that

∫_{I^k} f(G(t)) det G′(t) dt^1 ⋯ dt^k = ∫_{I^k} f(t) dt^1 ⋯ dt^k.

Therefore,

∫_{c∘G} ω = ∫_{I^k} f(t) dt^1 ⋯ dt^k = ∫_c ω,

as required.

Example 4.1.7. Here is a simple example of a change of parameterisation in a one-dimensional integral.
Really, it's just how you would integrate 1/(1 + t²) by making the trigonometric substitution t = tan s.
Below we carry this through, with some small changes of notation, to fit with the statement of Proposition 4.1.6.

Let ω be the one-form on U = R given by

ω(x) = 1/(1 + x²) dx.

Let c : I¹ → U be the singular one-cube on U given by

c(t) = t.

Then

c^*ω = 1/(1 + t²) dt,

and

∫_c ω = ∫_0^1 dt/(1 + t²).

Let G : I¹ → I¹ be given by

G(t) = tan(πt/4).

We want to compute

∫_{c∘G} ω = ∫_0^1 (c∘G)^*ω.

We have that

(c∘G)^*ω = G^*( 1/(1 + t²) dt ).

Also,

G^* t = tan(πt/4),

so that

G^* dt = (π/4) sec²(πt/4) dt.

Then

G^*( 1/(1 + t²) dt ) = (π/4) sec²(πt/4) / (1 + tan²(πt/4)) dt = (π/4) dt.

Then

∫_{c∘G} ω = ∫_0^1 (π/4) dt = π/4.
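A sympy version of this computation (a sketch) confirms that both parameterisations give the same value:

import sympy as sp

t = sp.Symbol('t')

I1 = sp.integrate(1/(1 + t**2), (t, 0, 1))              # integral over c

G = sp.tan(sp.pi*t/4)
pullback = (1/(1 + G**2)) * sp.diff(G, t)                # G*( dt/(1+t^2) )
I2 = sp.integrate(sp.simplify(pullback), (t, 0, 1))      # integral over c o G

print(I1, I2)                                            # pi/4  pi/4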

Finally, we want to consider singular k-cubes for k = 0, and the analogue of integration for 0-forms.
We define

I^0 = {0}.

That is, I^0 consists of a single point, namely 0. A singular 0-cube on U is a map c : I^0 → U; 0 ↦ c(0) ∈ U.
That is, a singular 0-cube maps 0 to a single point c(0) ∈ U. Given a zero-form f ∈ Ω^0(U), i.e. a function, we
make the following definition:

∫_c f := f(c(0)).

That is, integration of a 0-form over a singular 0-cube is really just evaluation, i.e. evaluating a function
at a point.

4.2 Boundaries
We want to consider boundaries of singular k-cubes. Roughly, this means looking at the map c restricted
to the different faces of I^k. This leads us to the more general notion of singular k-chains.

Definition 4.2.1 (Singular k-chain). A singular k-chain on U, denoted C, is a formal sum of a finite
number of singular k-cubes c_r : I^k → U with integer coefficients, i.e.

C = a_1 c_1 + ⋯ + a_s c_s,  a_r ∈ Z.

k-chains may be added. Here is an example: Let

C = c_1 + 2c_3,  C′ = −2c_1 + c_3 + c_4.

Then

C + C′ = −c_1 + 3c_3 + c_4.

k-chains may also be multiplied by integers. Here is an example: Given C as above,

2C = 2c_1 + 4c_3.

In general, let

C = Σ_{r=1}^s a_r c_r,  C′ = Σ_{r=1}^s a′_r c_r.

Note that some of the a_r's and a′_r's can be zero. Then

C + C′ = Σ_{r=1}^s (a_r + a′_r) c_r,  mC = Σ_{r=1}^s m a_r c_r,  m ∈ Z.

Given a singular k-chain C on U and a k-form ω ∈ Ω^k(U), we define

∫_C ω := Σ_{r=1}^s a_r ∫_{c_r} ω.   (51)

That is, the integral over a k-chain is the sum of the integrals over the k-cubes it comprises, weighted
by the integer coefficients.

Definition 4.2.2 (Faces of k-cubes).

Let c : I^k → U be a singular k-cube on U. Take j such that 1 ≤ j ≤ k and ε = 0 or 1. The (j, ε)th face
of c, denoted c_{(j,ε)}, is the singular (k − 1)-cube given by

c_{(j,ε)} : I^{k−1} → U,

where

c_{(j,ε)}(t^1, . . . , t^{k−1}) = c(t^1, . . . , t^{j−1}, ε, t^j, . . . , t^{k−1}).

That is, c_{(j,ε)} is obtained by fixing the jth argument of c to be ε. This restricts c to one of the faces
of I^k.
Definition 4.2.3 (Boundary of a singular k-cube). Let c : I^k → U be a singular k-cube. The boundary
of c, denoted ∂c, is the singular (k − 1)-chain given by

∂c = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j+ε} c_{(j,ε)}.

For k = 0, the boundary of a singular 0-cube is taken to be 0:

∂c = 0.

Figure 27: For j = 3 and ε = 1, the (j, ε)th face of the singular 3-cube c is obtained by restricting c to
the top face of the cube, as shown.

Example 4.2.4. We consider again the singular 2-cube on U = R³ from Example 4.1.3,

c : I² → R³;  (u, v) ↦ c(u, v) = (u, v, 2 − u² − v²).

The faces are given by

c_{(1,0)}(t) = c(0, t) = (0, t, 2 − t²),
c_{(1,1)}(t) = c(1, t) = (1, t, 1 − t²),
c_{(2,0)}(t) = c(t, 0) = (t, 0, 2 − t²),
c_{(2,1)}(t) = c(t, 1) = (t, 1, 1 − t²).

Figure 28

The boundary is given by

∂c = −c_{(1,0)} + c_{(1,1)} + c_{(2,0)} − c_{(2,1)}.

Arrows are drawn according to the sign (−1)^{j+ε}. If (−1)^{j+ε} = 1, then arrows point in the direction
of increasing t. If (−1)^{j+ε} = −1, then arrows point in the direction of decreasing t.
Let ω ∈ Ω^{k−1}(U) be a (k − 1)-form, and c : I^k → U a singular k-cube. Then

∫_{∂c} ω = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j+ε} ∫_{c_{(j,ε)}} ω.

Let us discuss the sign factor, (−1)^{j+ε} = (−1)^ε (−1)^j, which appears in the definition of the boundary.
The factor of (−1)^ε means that opposite faces (corresponding to faces with the same j but different ε's)
come with opposite signs. The factor (−1)^j is not as easily motivated, but is justified by, and can be
derived from, the following two results.

Given a k-chain

C = Σ_{r=1}^s a_r c_r,

we define its boundary, denoted ∂C, by

∂C = Σ_{r=1}^s a_r ∂c_r.

∂C is a (k − 1)-chain.

Proposition 4.2.5 (∂² = 0).

Let C be a singular k-chain on U. Then

∂²C = ∂(∂C) = 0.

Proof. See Section 4.4 for the (purely combinatorial) proof.

Example 4.2.6. Let c be a singular two-cube. Then

∂²c = ∂( −c_{(1,0)} + c_{(1,1)} + c_{(2,0)} − c_{(2,1)} ).

We have that

∂c_{(1,0)} = −(c_{(1,0)})_{(1,0)} + (c_{(1,0)})_{(1,1)} = −c(0, 0) + c(0, 1).

The boundaries of the other faces are computed similarly. We get

∂c_{(1,0)} = −c(0, 0) + c(0, 1),
∂c_{(1,1)} = −c(1, 0) + c(1, 1),
∂c_{(2,0)} = −c(0, 0) + c(1, 0),
∂c_{(2,1)} = −c(0, 1) + c(1, 1).

Combining these with the signs in ∂²c = −∂c_{(1,0)} + ∂c_{(1,1)} + ∂c_{(2,0)} − ∂c_{(2,1)}, we see that the
singular 0-cubes on the right-hand side cancel pairwise, so ∂²c = 0.
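The pairwise cancellation can also be checked mechanically. The sketch below represents the boundary as a list of signed faces and accumulates the signed corner points of the boundary of the boundary; the helper names (face, bdy) are ours, not notation from the notes.

from collections import defaultdict

def c(u, v):
    return (u, v, 2 - u**2 - v**2)

def face(cube, k, j, eps):
    """(j, eps)-th face of a singular k-cube (here k = 2 or 1)."""
    if k == 2:
        return lambda t: cube(*((eps, t) if j == 1 else (t, eps)))
    if k == 1:
        return cube(eps)              # a 0-cube: just a point

# boundary of c: list of (sign, 1-cube)
bdy = [((-1)**(j + eps), face(c, 2, j, eps)) for j in (1, 2) for eps in (0, 1)]

# boundary of the boundary: signed 0-cubes (corner points), accumulated in a dictionary
total = defaultdict(int)
for sgn, edge in bdy:
    for eps in (0, 1):
        total[face(edge, 1, 1, eps)] += sgn * (-1)**(1 + eps)

print(all(v == 0 for v in total.values()))   # True: every corner cancels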
*The following result gives some further insight into the signs which appear in the definition of the
boundary map. Let us consider a generalised boundary-like map, denoted ∂′, defined on k-chains as
follows. For a singular 0-cube c on U ⊂ R^n,

∂′c = 0.

For a singular k-cube with k ≥ 1,

∂′c = Σ_{j=1}^k Σ_{ε=0,1} n_k(j, ε) c_{(j,ε)},   (52)

where the n_k(j, ε) are integers. For C = Σ_r b_r c_r, where the c_r's are singular k-cubes on U ⊂ R^n,

∂′C = Σ_r b_r ∂′c_r.

In addition, for any 1-cube c with c(1) = c(0), we require that ∂′c = 0. This is equivalent to

n_1(1, 1) = −n_1(1, 0).   (53)

The standard boundary map, ∂, corresponds to taking

n_k(j, ε) = (−1)^{j+ε}.

Proposition 4.2.7.

Let U ⊂ R^n be open, and let C be a singular k-chain on U. Then, with ∂′ as above, ∂′² = 0 if and only
if

n_k(j, ε) = λ_k (−1)^{j+ε},   (54)

where λ_k is an integer.

Proof. See Section 4.4.

4.3 Stokes theorem

We begin with a calculation. Let U ⊂ R^n be open, c : I^k → U a singular k-cube, and ω ∈ Ω^{k−1}(U) a
(k − 1)-form. Then c^*ω is a (k − 1)-form on I^k, and may be written in the form

c^*ω = Σ_{i=1}^k g_i(t) dt^1 ∧ ⋯ ∧ d̂t^i ∧ ⋯ ∧ dt^k,   (55)

where the notation d̂t^i means that dt^i is omitted from the wedge product.

The following gives an expression for c_{(j,ε)}^*ω in terms of c^*ω. For the sake of clarity, we'll use
coordinates s = (s^1, . . . , s^{k−1}) on I^{k−1} and coordinates t = (t^1, . . . , t^k) on I^k.

Proposition 4.3.1. Let ω ∈ Ω^{k−1}(U) be a (k − 1)-form on U, and let c^*ω be given by (55). Then

c_{(j,ε)}^*ω(s) = g_j(s^1, . . . , s^{j−1}, ε, s^j, . . . , s^{k−1}) ds^1 ∧ ⋯ ∧ ds^{k−1}.

Proof. This is a straightforward calculation. The only difficulty is being careful about which coordinate
goes where. It is helpful to introduce the following map: Let

e_{(j,ε)} : I^{k−1} → I^k

be given by

e_{(j,ε)}(s) = (s^1, . . . , s^{j−1}, ε, s^j, . . . , s^{k−1}).

Thus, e_{(j,ε)} maps the (k − 1)-cube I^{k−1} onto one of the faces of I^k, namely the face obtained by setting
the jth coordinate equal to ε. In other words, if we let t^r(s) := e^r_{(j,ε)}(s), then

t^r(s) = s^r for r < j,  t^r(s) = ε for r = j,  t^r(s) = s^{r−1} for r > j.

Then

c_{(j,ε)} = c ∘ e_{(j,ε)}.

Now we may compute c_{(j,ε)}^*ω as follows:

c_{(j,ε)}^*ω = (c ∘ e_{(j,ε)})^*ω = e_{(j,ε)}^*( c^*ω ) = e_{(j,ε)}^*( Σ_{i=1}^k g_i(t) dt^1 ∧ ⋯ ∧ d̂t^i ∧ ⋯ ∧ dt^k )   (by Proposition 3.3.9).

We pull back each factor in turn. We have that

( e_{(j,ε)}^* g_i )(s) = g_i( e_{(j,ε)}(s) ) = g_i(s^1, . . . , s^{j−1}, ε, s^j, . . . , s^{k−1}).

Also,

e_{(j,ε)}^* dt^r = d( e_{(j,ε)}^* t^r ) = ds^r for r < j,  = dε = 0 for r = j,  = ds^{r−1} for r > j.

Then

e_{(j,ε)}^*( dt^1 ∧ ⋯ ∧ d̂t^i ∧ ⋯ ∧ dt^k ) = 0 if j ≠ i (since the factor e_{(j,ε)}^* dt^j = 0 appears),
                                        = ds^1 ∧ ⋯ ∧ ds^{k−1} if j = i.

It follows that

c_{(j,ε)}^*ω(s) = g_j(s^1, . . . , s^{j−1}, ε, s^j, . . . , s^{k−1}) ds^1 ∧ ⋯ ∧ ds^{k−1},

as required.
Theorem 4.3.2 (Stokes theorem).

Let U ⊂ R^n be open, and let c : I^k → U be a singular k-cube on U. Let ω ∈ Ω^{k−1}(U) be a (k − 1)-form
on U. Then

∫_c dω = ∫_{∂c} ω.

Proof. As in (55), we write

c^*ω = Σ_{i=1}^k g_i(t) dt^1 ∧ ⋯ ∧ d̂t^i ∧ ⋯ ∧ dt^k.

On the right-hand side of the assertion, we have

∫_{∂c} ω = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j+ε} ∫_{I^{k−1}} c_{(j,ε)}^*ω
        = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j+ε} ∫_0^1 ⋯ ∫_0^1 g_j(s^1, . . . , s^{j−1}, ε, s^j, . . . , s^{k−1}) ds^1 ⋯ ds^{k−1},   (56)

from Proposition 4.3.1.

On the left-hand side, we have

∫_c dω = ∫_{I^k} c^*(dω) = ∫_{I^k} d(c^*ω).

Then

d(c^*ω) = Σ_{i=1}^k d( g_i dt^1 ∧ ⋯ ∧ d̂t^i ∧ ⋯ ∧ dt^k )
        = Σ_{i=1}^k Σ_{j=1}^k (∂g_i/∂t^j) dt^j ∧ dt^1 ∧ ⋯ ∧ d̂t^i ∧ ⋯ ∧ dt^k
        = Σ_{j=1}^k (∂g_j/∂t^j) dt^j ∧ dt^1 ∧ ⋯ ∧ d̂t^j ∧ ⋯ ∧ dt^k,

since the only terms in the sum over i which contribute have i = j (otherwise, the fact that dt^i ∧ dt^i = 0
means the term vanishes). It follows that

d(c^*ω) = Σ_{j=1}^k (−1)^{j−1} (∂g_j/∂t^j) dt^1 ∧ ⋯ ∧ dt^k.

Then

∫_c dω = Σ_{j=1}^k (−1)^{j−1} ∫_0^1 ⋯ ∫_0^1 (∂g_j/∂t^j) dt^1 ⋯ dt^k.

Consider the jth term in the sum, and integrate with respect to t^j:

∫_0^1 (∂g_j/∂t^j)(t) dt^j = g_j(t^1, . . . , t^{j−1}, 1, t^{j+1}, . . . , t^k) − g_j(t^1, . . . , t^{j−1}, 0, t^{j+1}, . . . , t^k)
                        = Σ_{ε=0,1} (−1)^{ε+1} g_j(t^1, . . . , t^{j−1}, ε, t^{j+1}, . . . , t^k).

Let us change variables, replacing t^r by s^r for r < j and by s^{r−1} for r > j. Then

∫_c dω = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j−1} (−1)^{ε+1} ∫_0^1 ⋯ ∫_0^1 g_j(s^1, . . . , s^{j−1}, ε, s^j, . . . , s^{k−1}) ds^1 ⋯ ds^{k−1}.

Since (−1)^{j−1}(−1)^{ε+1} = (−1)^{j+ε}, this last expression coincides with (56).

Example 4.3.3.

a) For k = 1 and U = R, Stokes theorem is just the Fundamental Theorem of Calculus. Consider
a 0-form, i.e. a function f(x) on R. Let c : [0, 1] → R be the singular 1-cube given by c(t) = t. Then
Theorem 4.3.2 says that

∫_c df = ∫_0^1 (df/dt) dt = f(1) − f(0) = ∫_{∂c} f.

b) We continue with the example discussed in Examples 4.1.3 and 4.2.4. In this case, k = 2. Consider
the singular 2-cube on R³ given by

c(u, v) = (u, v, 2 − u² − v²).

Let ω be the 1-form given by

ω = y² dz.

We verify Stokes theorem by calculating ∫_{∂c} ω and ∫_c dω explicitly.
First, we compute ∫_{∂c} ω. We computed ∂c in Example 4.2.4. We compute c_{(j,ε)}^*ω for each face in
turn.
j = 1, ε = 0. We have that c_{(1,0)}(t) = (0, t, 2 − t²), so that

c_{(1,0)}^* x = 0,  c_{(1,0)}^* y = t,  c_{(1,0)}^* z = 2 − t²,

and

c_{(1,0)}^* dx = 0,  c_{(1,0)}^* dy = dt,  c_{(1,0)}^* dz = −2t dt.

Then

c_{(1,0)}^* ω = t²(−2t) dt = −2t³ dt.

j = 1, ε = 1. We have that c_{(1,1)}(t) = (1, t, 1 − t²), so that

c_{(1,1)}^* x = 1,  c_{(1,1)}^* y = t,  c_{(1,1)}^* z = 1 − t²,

and

c_{(1,1)}^* dx = 0,  c_{(1,1)}^* dy = dt,  c_{(1,1)}^* dz = −2t dt.

Then

c_{(1,1)}^* ω = t²(−2t) dt = −2t³ dt.

j = 2, ε = 0. We have that c_{(2,0)}(t) = (t, 0, 2 − t²), so that

c_{(2,0)}^* x = t,  c_{(2,0)}^* y = 0,  c_{(2,0)}^* z = 2 − t²,

and

c_{(2,0)}^* dx = dt,  c_{(2,0)}^* dy = 0,  c_{(2,0)}^* dz = −2t dt.

Then

c_{(2,0)}^* ω = 0.

j = 2, ε = 1. We have that c_{(2,1)}(t) = (t, 1, 1 − t²), so that

c_{(2,1)}^* x = t,  c_{(2,1)}^* y = 1,  c_{(2,1)}^* z = 1 − t²,

and

c_{(2,1)}^* dx = dt,  c_{(2,1)}^* dy = 0,  c_{(2,1)}^* dz = −2t dt.

Then

c_{(2,1)}^* ω = −2t dt.

It follows that

Σ_{j=1}^2 Σ_{ε=0,1} (−1)^{j+ε} c_{(j,ε)}^* ω = −(−2t³ dt) + (−2t³ dt) + 0 − (−2t dt) = 2t dt,

as the contributions from c_{(1,0)} and c_{(1,1)} cancel. We obtain

∫_{∂c} ω = ∫_0^1 2t dt = 1.

Next we compute ∫_c dω. We have that

c^*(dω) = d(c^*ω) = d( −2v²(u du + v dv) ) = −4uv dv ∧ du = 4uv du ∧ dv.

Then

∫_c dω = ∫_0^1 ∫_0^1 4uv du dv = 1.
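Both sides of this verification are easy to reproduce symbolically; the following sketch recomputes the four boundary contributions and the interior integral (the variable names are ours):

import sympy as sp

u, v, t = sp.symbols('u v t')
X, Y, Z = u, v, 2 - u**2 - v**2          # the singular 2-cube c

# Interior side: c*(dw) with dw = 2y dy ^ dz; the coefficient of du ^ dv.
dY = (sp.diff(Y, u), sp.diff(Y, v))
dZ = (sp.diff(Z, u), sp.diff(Z, v))
interior = sp.integrate(2*Y*(dY[0]*dZ[1] - dY[1]*dZ[0]), (u, 0, 1), (v, 0, 1))

# Boundary side: sum over faces of (-1)^(j+eps) times the integral of the pulled-back w = y^2 dz.
boundary = 0
for j, eps, (yf, zf) in [(1, 0, (t, 2 - t**2)), (1, 1, (t, 1 - t**2)),
                         (2, 0, (0, 2 - t**2)), (2, 1, (1, 1 - t**2))]:
    boundary += (-1)**(j + eps) * sp.integrate(yf**2 * sp.diff(zf, t), (t, 0, 1))

print(interior, boundary)                # 1  1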

4.4 *Proofs of results for the boundary map

Proof of Proposition 4.2.5. It suffices to show that ∂²c = 0 for a singular k-cube c (the argument for a
general k-chain then follows by linearity). As above,

∂c = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j+ε} c_{(j,ε)},

so that

∂²c = ∂(∂c) = Σ_{j=1}^k Σ_{ε=0,1} (−1)^{j+ε} ∂c_{(j,ε)} = Σ_{j=1}^k Σ_{l=1}^{k−1} Σ_{ε,δ=0,1} (−1)^{j+l+ε+δ} (c_{(j,ε)})_{(l,δ)}.

Here, (c_{(j,ε)})_{(l,δ)}, the (l, δ)th face of c_{(j,ε)}, is a singular (k − 2)-cube. Let us divide the preceding
expression into two contributions according to whether j > l or j ≤ l. We write

∂²c = S_1 + S_2,   (57)

where

S_1 = Σ_{j=1}^k Σ_{l=1}^{j−1} Σ_{ε,δ=0,1} (−1)^{j+l+ε+δ} (c_{(j,ε)})_{(l,δ)},

and

S_2 = Σ_{j=1}^k Σ_{l=j}^{k−1} Σ_{ε,δ=0,1} (−1)^{j+l+ε+δ} (c_{(j,ε)})_{(l,δ)}.

We will change summation variables in S_1 so that the domain of summation coincides with that of
S_2. First, note that we can start the j-sum in S_1 at j = 2, since there are no contributions from j = 1
(the l-sum is empty in this case). Thus,

S_1 = Σ_{j=2}^k Σ_{l=1}^{j−1} Σ_{ε,δ=0,1} (−1)^{j+l+ε+δ} (c_{(j,ε)})_{(l,δ)}.

Let n = j − 1, m = l, and then replace j, l by m, n and interchange ε and δ to get

S_1 = Σ_{n=1}^{k−1} Σ_{m=1}^{n} Σ_{ε,δ=0,1} (−1)^{m+n+1+ε+δ} (c_{(n+1,δ)})_{(m,ε)} = −Σ_{n=1}^{k−1} Σ_{m=1}^{n} Σ_{ε,δ=0,1} (−1)^{m+n+ε+δ} (c_{(n+1,δ)})_{(m,ε)}.

Interchange the m and n sums to get

S_1 = −Σ_{m=1}^{k−1} Σ_{n=m}^{k−1} Σ_{ε,δ=0,1} (−1)^{m+n+ε+δ} (c_{(n+1,δ)})_{(m,ε)}.

Finally, rename (m, n) as (j, l) to get

S_1 = −Σ_{j=1}^{k−1} Σ_{l=j}^{k−1} Σ_{ε,δ=0,1} (−1)^{j+l+ε+δ} (c_{(l+1,δ)})_{(j,ε)}.

Substitute into (57) to get

∂²c = Σ_{j=1}^{k−1} Σ_{l=j}^{k−1} Σ_{ε,δ=0,1} (−1)^{j+l+ε+δ} ( (c_{(j,ε)})_{(l,δ)} − (c_{(l+1,δ)})_{(j,ε)} ).   (58)

For j ≤ l, we claim that

(c_{(l+1,δ)})_{(j,ε)} = (c_{(j,ε)})_{(l,δ)}.   (59)

From (58), this establishes that ∂²c = 0. Verifying (59) involves carefully applying the definition of
faces,

c_{(j,ε)}(t^1, . . . , t^{k−1}) = c(t^1, . . . , t^{j−1}, ε, t^j, . . . , t^{k−1}),   (60)

in the appropriate order. Starting on the right-hand side, we have that

(c_{(j,ε)})_{(l,δ)}(t^1, . . . , t^{k−2}) = c_{(j,ε)}(t^1, . . . , t^{l−1}, δ, t^l, t^{l+1}, . . . , t^{k−2}).

Apply (60) a second time to express c_{(j,ε)} in terms of c. Since j ≤ l, we get that

(c_{(j,ε)})_{(l,δ)}(t^1, . . . , t^{k−2}) = c(t^1, . . . , t^{j−1}, ε, t^j, . . . , t^{l−1}, δ, t^l, t^{l+1}, . . . , t^{k−2}).   (61)

On the left-hand side of (59), we have that

(c_{(l+1,δ)})_{(j,ε)}(t^1, . . . , t^{k−2}) = c_{(l+1,δ)}(t^1, . . . , t^{j−1}, ε, t^j, t^{j+1}, . . . , t^{k−2}).

Apply (60) a second time to express c_{(l+1,δ)} in terms of c, and note that, since j ≤ l, the lth argument
in (t^1, . . . , t^{j−1}, ε, t^j, t^{j+1}, . . . , t^{k−2}) is t^{l−1}. Then

(c_{(l+1,δ)})_{(j,ε)}(t^1, . . . , t^{k−2}) = c(t^1, . . . , t^{j−1}, ε, t^j, . . . , t^{l−1}, δ, t^l, t^{l+1}, . . . , t^{k−2}).   (62)

The claim (59) follows from (61) and (62).

Proof of Proposition 4.2.7.

If (54) holds, then, for C a singular k-chain, we have that ∂′²C = λ_k λ_{k−1} ∂²C, which vanishes by
Proposition 4.2.5.

Next, assume that ∂′²C = 0 for all k-chains C. From (53), (54) holds for k = 1 if we define

λ_1 = n_1(1, 1).

We proceed by induction. We assume (54) holds for k and show that it holds for k + 1.
Let C be a singular (k + 1)-cube on U. Then

∂′²C = ∂′( Σ_{i=1}^{k+1} Σ_{ε=0,1} n_{k+1}(i, ε) C_{(i,ε)} )
     = Σ_{i=1}^{k+1} Σ_{ε=0,1} Σ_{j=1}^{k} Σ_{δ=0,1} n_{k+1}(i, ε) n_k(j, δ) (C_{(i,ε)})_{(j,δ)}
     = ( Σ_{i≤j} + Σ_{i>j} ) Σ_{ε,δ=0,1} n_{k+1}(i, ε) n_k(j, δ) (C_{(i,ε)})_{(j,δ)},   (63)

where in the last expression we have separated the sum over i and j according to whether i ≤ j or i > j.
From the definition of the faces of k-cubes, it follows that, for i ≤ j,

(C_{(i,ε)})_{(j,δ)} = (C_{(j+1,δ)})_{(i,ε)},

as both sides of the preceding give the (k − 1)-cube

C′(t^1, . . . , t^{k−1}) = C(t^1, . . . , t^{i−1}, ε, t^i, . . . , t^{j−1}, δ, t^j, . . . , t^{k−1}).

Therefore, (63) can be written as

∂′²C = Σ_{i≤j} Σ_{ε,δ=0,1} ( n_{k+1}(i, ε) n_k(j, δ) + n_{k+1}(j+1, δ) n_k(i, ε) ) (C_{(j+1,δ)})_{(i,ε)},

where we have used the identity to rewrite the i ≤ j terms and then combined terms involving the same
edges (C_{(j+1,δ)})_{(i,ε)} (an edge is a face of a face). In order that ∂′²C = 0 for all C, we must have

n_{k+1}(j+1, δ) n_k(i, ε) + n_{k+1}(i, ε) n_k(j, δ) = 0.

From the induction hypothesis, it follows that

n_{k+1}(j+1, δ)(−1)^{i+ε} = −n_{k+1}(i, ε)(−1)^{j+δ},   (64)

where we have cancelled a factor of λ_k. Setting i = ε = 1, we get that

n_{k+1}(j+1, δ) = −λ_{k+1}(−1)^{j+δ},   (65)

where we define λ_{k+1} as

λ_{k+1} = n_{k+1}(1, 1).

Setting l = j + 1 in (65), we get

n_{k+1}(l, δ) = λ_{k+1}(−1)^{l+δ},

which establishes (54) for 2 ≤ l ≤ k + 1. For l = 1, set i = 1 in (64) and use (54) for the left-hand side to
conclude that

λ_{k+1}(−1)^{j+1+δ}(−1)^{1+ε} = −n_{k+1}(1, ε)(−1)^{j+δ},

or

n_{k+1}(1, ε) = λ_{k+1}(−1)^{1+ε}.

This establishes (54) for l = 1.

5 Bibliography
The course does not follow a particular textbook. There are many books and online resources that cover
all or parts of the syllabus, but there is no single recommended text. Below is a list of some standard
texts, but you are encouraged to look for yourselves.

1. JH and BB Hubbard, Vector calculus, linear algebra and differential forms: A unified approach,
2 ed, Prentice Hall
This is a very good rigorous treatment of multivariable calculus. The main overlap with the syllabus
is in Chapter 6, which deals with algebraic and differential forms. The text takes a different point of
view to ours on several concepts, in particular the definition of the exterior derivative. Otherwise,
it contains proofs of many results referred to in the lectures, including the inverse function theorem,
multidimensional integrals and the change of variables formula. Recommended for supplementary
material, but should not be regarded as a textbook in lieu of the course notes. There is little on
part 1 of the syllabus (vector fields, flows, Jacobi bracket, Frobenius theorem, etc).
2. B Schutz, Geometrical methods in mathematical physics, Cambridge University Press
This text provides a more informal introduction to much of the material in the course, and much
more that isn't in the course, in the style of applied mathematics and theoretical physics. The
text introduces differentiable manifolds early on, whereas in the course we work mainly with open
subsets of Rn . Therefore, the presentation requires some translation from the general setting of
manifolds to Rn . The text covers vector fields, Jacobi bracket, the Frobenius theorem, algebraic
and differential forms, exterior derivative, Lie derivative and Stokes theorem. It also discusses Lie
groups and Riemannian geometry (not covered in the course) and a number of physical applications.
The presentation and notation is occasionally nonstandard, differing from what we use in the course.
Recommended for supplementary reading but not as a main text.
3. W Darling, Differential forms and Connections, Cambridge University Press
This is a mathematical but elementary introduction to differentiable manifolds and differential
forms. Algebraic forms on Rn are covered in Chapter 1, differential forms on Rn in Chapter 2,
and integration of differential forms in Chapter 8; hence there is substantial intersection with parts
2, 3 and 4 of the syllabus. The text introduces differentiable manifolds in a simplified, slightly

cheating way (avoiding the notion of topological space) which nevertheless enables a shortcut into
the general theory. It also covers Riemannian geometry and vector bundles, along with applications
all these topics are outside the course syllabus. Recommended for supplementary reading and a
different approach, but not as a substitute for the course notes. There is some overlap with part 1
of the syllabus (vector fields, flows, Jacobi bracket, Frobenius theorem, etc).
4. M Spivak, Calculus on Manifolds, Westview Press
Similar to Hubbard and Hubbard, Spivak is a rigorous treatment of calculus on Rn , and covers the
material in parts 2, 3 and 4 of the syllabus, providing proofs of all the main results. The style is
different; this is a short text written in the form of extended notes. This is a classic text. There is
very little on part 1 of the syllabus (vector fields, flows, Jacobi bracket, Frobenius theorem, etc).
5. M Spivak, A comprehensive introduction to differential geometry, vol 1, Publish or Perish, Berkeley
This is the first of a five-volume set, and follows on from Spivaks Calculus on Manifolds. This
is definitely at a more advanced level than our course, but interested students may want to try
it. There is an extensive treatment of part 1 of the syllabus (vector fields, flows, Jacobi bracket,
Frobenius theorem, etc), all in the context of manifolds, as well as the rest of the syllabus. Sub-
sequent volumes treat Riemannian geometry, vector bundles and characteristic classes in some
detail. Postgraduate level, for supplementary reading and future study.
6. V Arnold, Mathematical methods of classical mechanics, Springer-Verlag.
As the name suggests, this is a text on classical mechanics, and a classic at that. Chapters 4, 7 and
8 provide a condensed introduction to much of the material we cover in the course: differentiable
manifolds and the main topics of the course. Postgraduate level. Good for supplementary reading.

6 Summary notes
These summary notes were written before I produced the full set of notes, and were intended to present
the main results in outline form. At this point, I am including them provisionally, and only for the first
part of the unit to see whether you think they would be helpful. If people do find them helpful, I will
update and extend them.

1. Diffeomorphisms (Section 1.5). Let U and V be open subsets of R^n. A smooth map F ∈
C^∞(U, V) is a diffeomorphism if F is invertible with F^{-1} ∈ C^∞(V, U). Let Diff(U, V) denote the
set of diffeomorphisms between U and V. If U and V are the same, we write Diff(U) for the set
of diffeomorphisms of U to itself. The set of diffeomorphisms Diff(U) on U forms a group under
composition (Proposition 1.5.3).
2. ODEs, vector fields and flows (Section 1.6). Let X : U → R^n be a smooth vector field on an
open set U ⊂ R^n. Then the first-order system

ẋ(t) = X(x(t)),  x(0) = x_0   (66)

has a unique solution x(t, x_0) for −T < t < T for some T > 0 (which may depend on x_0) (Theorem 1.6.3). As a function of initial conditions, x(t, x_0) is smooth (Theorem 1.6.7). A vector field X
is complete if x(t, x_0) is defined for all t. Suppose X is smooth and complete. We define a map

Φ : R × U → R^n;  (t, x_0) ↦ Φ_t(x_0) = x(t, x_0).   (67)

Φ is called the flow of the vector field X. Φ has the following properties (Proposition 1.6.9):

(a) Φ_0 = Id_U.
(b) Φ_t ∘ Φ_s = Φ_{t+s}.
(c) Φ_t : U → Φ_t(U) is a diffeomorphism.
(d) Φ ∈ C^∞(R × U, R^n).

A map Φ with properties (a)-(d) is called a smooth one-parameter subgroup of diffeomorphisms.
Conversely, if Φ is a smooth one-parameter subgroup of diffeomorphisms on U ⊂ R^n, we define a
vector field X on U by

X(x) = (∂/∂t)|_{t=0} Φ_t(x).   (68)

Then Φ is the flow of X (Proposition 1.6.12).

Matrix exponential (Examples 1.6.10, 1.6.13). Let A ∈ R^{n×n} be an n × n matrix. Define

e^{tA} = I + tA + (1/2) t² A² + ⋯ = Σ_{j=0}^∞ (t^j A^j)/j!.   (69)

Proposition 1.6.9 says that

e^{tA} e^{sA} = e^{(s+t)A}.   (70)

Proposition 1.6.12 says that

(d/dt) e^{tA} |_{t=0} = A.   (71)
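A quick numerical illustration of (70) and (71) with scipy (a sketch, for an arbitrarily chosen matrix A):

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, 0.3]])
s, t = 0.7, 0.4

group_law = np.allclose(expm(t*A) @ expm(s*A), expm((s + t)*A))   # property (70)

h = 1e-6                                        # finite-difference derivative at t = 0, property (71)
deriv = (expm(h*A) - np.eye(2)) / h

print(group_law, np.allclose(deriv, A, atol=1e-4))   # True True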

3. Pushforward of vector fields (Section 1.7). Let U ⊂ R^n be open. Let 𝒳(U) denote the
space of smooth vector fields on U. Let V ⊂ R^n be open. Let F ∈ Diff(U, V). We define a map
F_* : 𝒳(U) → 𝒳(V); X ↦ F_*X, by either of the following equivalent formulas:

F_*X(y) = F′(F^{-1}(y)) X(F^{-1}(y)),  F_*X(F(x)) = F′(x) X(x).   (72)

F_*X, a smooth vector field on V, is called the pushforward of X by F. The definition (72) is
motivated by changing variables in the system of ODEs described by X. That is, if

ẋ = X(x)

and we define y(t) = F(x(t)), then y(t) satisfies the system

ẏ = Y(y),  where Y = F_*X.

At the level of flows, we have the following (Proposition 1.7.3): Let Φ_t be the flow of X. Then
Ψ_t, the flow of F_*X, is given by

Ψ_t = F ∘ Φ_t ∘ F^{-1}.   (73)

For a linear vector field X(x) = Ax, where A ∈ R^{n×n}, and a linear diffeomorphism F(x) = Sx, where
S ∈ R^{n×n} and S is invertible,

(F_*X)(y) = SAS^{-1} y.

(See Example 1.7.4.)

The pushforward by a composition of two maps is given by Proposition 1.7.5: Let F, G ∈ Diff(U)
be diffeomorphisms on U and X ∈ 𝒳(U) a smooth vector field. Then

(F ∘ G)_* X = F_* G_* X.   (74)
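For the linear case, (72) and (73) amount to the matrix identity e^{t SAS^{-1}} = S e^{tA} S^{-1}, which is easy to check numerically (a sketch with arbitrarily chosen A and S):

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
S = np.array([[2.0, 1.0], [1.0, 1.0]])       # invertible
Sinv = np.linalg.inv(S)
t = 0.8

lhs = expm(t * S @ A @ Sinv)                 # flow of the pushed-forward field SAS^{-1}
rhs = S @ expm(t * A) @ Sinv                 # F o Phi_t o F^{-1}
print(np.allclose(lhs, rhs))                 # True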

4. Jacobi bracket. Commuting flows (Section 1.8). Let U ⊂ R^n be open. Let X, Y ∈ 𝒳(U). The
Jacobi bracket of X and Y, denoted [X, Y], is the vector field in 𝒳(U) given by

[X, Y] = (X·∇)Y − (Y·∇)X.

Here, ∇ = (∂/∂x^1, . . . , ∂/∂x^n). Let Ψ_s be the flow of Y. Then (Proposition 1.8.2)

[X, Y] = (∂/∂s)|_{s=0} Ψ_{s*} X.   (75)

The Jacobi bracket has the following properties (Proposition 1.8.5):

(a) Linearity. [aX + bY, Z] = a[X, Z] + b[Y, Z], where a, b ∈ R.
(b) Antisymmetry. [X, Y] = −[Y, X].
(c) Leibniz rule. [X, fY] = f[X, Y] + ((X·∇)f) Y, where f : U → R is a smooth function on U.

Proposition 1.8.6: The pushforward of the Jacobi bracket is given by

F_*[X, Y] = [F_*X, F_*Y].

Proposition 1.8.8: Letting F be the flow, χ_r, of a third vector field Z and differentiating with
respect to r at r = 0, we obtain the Jacobi identity,

[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]].

See also Problem Sheet 3.4.

Proposition 1.8.10. The derivative of the pushforward by a flow at arbitrary time: Let X and Y
be vector fields and Ψ_s the flow of Y. Then

(∂/∂s) Ψ_{s*} X = [Ψ_{s*} X, Y].

Proposition 1.8.11. A vector field is invariant under push-forward by its flow: Let X be a vector
field with flow Φ_t. Then

Φ_{t*} X = X.

Theorem 1.8.12. Commutativity of flows and brackets: Let X and Y be vector fields with flows
Φ_t and Ψ_s respectively. Then

Φ_t ∘ Ψ_s = Ψ_s ∘ Φ_t for all s and t  ⟺  [X, Y] = 0.
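The bracket and the properties above are straightforward to check symbolically. The sketch below implements [X, Y] = (X·∇)Y − (Y·∇)X for vector fields on R² and verifies antisymmetry and the Jacobi identity for three arbitrarily chosen fields:

import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)

def bracket(X, Y):
    # [X, Y]^j = sum_i X^i dY^j/dx^i - Y^i dX^j/dx^i
    return tuple(sum(X[i]*sp.diff(Y[j], coords[i]) - Y[i]*sp.diff(X[j], coords[i])
                     for i in range(2)) for j in range(2))

X = (y, -x)                    # rotation field
Y = (x**2, x*y)                # sample field
Z = (sp.sin(y), x)             # sample field

antisym = all(sp.simplify(a + b) == 0 for a, b in zip(bracket(X, Y), bracket(Y, X)))
jacobi = all(sp.simplify(a - b - c) == 0
             for a, b, c in zip(bracket(X, bracket(Y, Z)),
                                bracket(bracket(X, Y), Z),
                                bracket(Y, bracket(X, Z))))
print(antisym, jacobi)         # True True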

5. Pullback and Lie derivative on smooth functions. Noncommuting flows (Sections 1.9 and 1.10).

Let U be an open set in R^n, and let F ∈ Diff(U). We define a map F^* : C^∞(U) → C^∞(U), called
the pullback by F, which maps smooth functions into smooth functions. (Later, we will extend the
definition of the pullback to differential forms.) Given f ∈ C^∞(U), we define the pullback of f by
F by

F^*f = f ∘ F,  i.e. F^*f(x) = f(F(x)).

It is easily verified that the pullback is a linear map; that is, if f, g ∈ C^∞(U) and a, b ∈ R, then
F^*(af + bg) = aF^*f + bF^*g.

(Remark: the pullback may be defined more generally for smooth maps G : U → V, where U and
V are open sets in R^m and R^n respectively, and G is not necessarily a diffeomorphism. We define
G^* : C^∞(V) → C^∞(U) by G^*g = g ∘ G for g a smooth function on V. Note that G^* goes in the
opposite direction to G; that is, G is a map from U to V, while G^* is a map from C^∞(V) to C^∞(U).
This is natural.)

Proposition 1.9.2. The pullback by a composition of diffeomorphisms F, G ∈ Diff(U) is given by

(F ∘ G)^* = G^* ∘ F^*.

Let X ∈ 𝒳(U). We define a map L_X : C^∞(U) → C^∞(U), called the Lie derivative with respect to X,
which maps smooth functions into smooth functions. (Later, we will extend the definition of the
Lie derivative to differential forms.) Given f ∈ C^∞(U), we define the Lie derivative of f by X to be

L_X f = (X·∇)f = X^i ∂f/∂x^i.

That is, L_X f is the directional derivative of f along X.

Proposition 1.9.4. Let X ∈ 𝒳(U) and let Φ be the flow of X. Then

Φ_t^* f = Σ_{j=0}^∞ (t^j/j!) L_X^j f = e^{t L_X} f.

In general, the series might not converge, even if f and Φ_t are smooth. In this case, the right-hand
side may be regarded as a formal power series; if the series is truncated at the nth term, then the
error is O(t^{n+1}). If f and Φ_t are analytic (i.e., their Taylor series converge, at least for x and t in
some neighbourhood), then the series for Φ_t^* f will also converge in some neighbourhood.

Proposition 1.9.6. Differential-operator form of the Jacobi bracket: Let X and Y be vector fields
and f a function. Then

L_X L_Y f − L_Y L_X f = L_{[X,Y]} f.

Theorem 1.10.1. Noncommuting flows: Let X and Y be vector fields with flows Φ_t and Ψ_s
respectively, and let

Θ_{(s,t)} = Ψ_{−s} ∘ Φ_t ∘ Ψ_s ∘ Φ_{−t}.

Let f be a smooth function. Then

Θ_{(s,t)}^* f = f − st L_{[X,Y]} f + O(3),

where O(3) denotes terms of third and higher order in s and t. Equivalently, if χ_r is the flow of
[X, Y], then

Θ_{(s,t)} = χ_{−st} + O(3).
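As a concrete illustration of Proposition 1.9.4 (a sketch with a one-dimensional example of our own choosing): for X(x) = x the flow is Φ_t(x) = e^t x, and the truncated Lie series Σ_{j≤N} (t^j/j!) L_X^j f reproduces the Taylor expansion of Φ_t^* f = f(e^t x) to order N.

import sympy as sp

x, t = sp.symbols('x t')
f = x**2 + x                      # sample function
N = 8

def lie(g):                       # L_X g = x * dg/dx for the field X(x) = x
    return x*sp.diff(g, x)

series_sum, term = 0, f
for j in range(N + 1):
    series_sum += t**j/sp.factorial(j) * term
    term = lie(term)

exact = f.subs(x, sp.exp(t)*x)                          # Phi_t^* f = f(e^t x)
taylor = sp.series(exact, t, 0, N + 1).removeO()        # Taylor expansion to order N
print(sp.simplify(sp.expand(taylor - series_sum)))      # 0: the two expansions agree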

